1. InterModel Vigorish (IMV): A novel approach for quantifying predictive accuracy with binary outcomes

With Ben Domingue, Jessica Faul, Jeremy Freese, Klint Kanopka, Alexandros Rigos, Ben Stenhaug and Ajay Tripathi. Working paper available here, code library available here.

Abstract: Understanding the ‘fit’ of models meant to predict binary outcomes has been a long-standing problem. We propose a novel metric—the InterModel Vigorish (IMV)—for quantifying the change in accuracy between two predictive systems in the case of a binary outcome. The IMV is based on an analogy to well-characterized physical systems with tractable probabilities: weighted coins. The IMV is always a statement about the change in fit relative to some baseline (which can simply be the prevalence), whereas other metrics (e.g., AUC) are stand-alone measures that need to be further manipulated to yield indices related to differences in fit across models. Moreover, the IMV’s value can be interpreted consistently, independent of the baseline prediction or the prevalence. We illustrate the properties of this metric in simulations and its value in empirical applications related to health, political affiliation, and item responses. We also reconsider results from the recent Fragile Families Challenge using the IMV metric.
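A minimal sketch of the weighted-coin idea, assuming each predictive system is summarised by its mean log-likelihood on the observed outcomes (function names, the bisection routine, and its tolerance are illustrative choices of mine; the working paper's definitions take precedence):

```python
import math

def coin_loglik(w):
    # Expected log-likelihood per toss of a coin with bias w,
    # when the probability bet on each toss is w itself
    return w * math.log(w) + (1 - w) * math.log(1 - w)

def coin_equivalent(mean_loglik):
    # Bias w in [0.5, 1) of the weighted coin whose expected
    # log-likelihood matches a model's mean log-likelihood
    # (coin_loglik is increasing on [0.5, 1), so bisection works)
    lo, hi = 0.5, 1 - 1e-12
    for _ in range(80):
        mid = (lo + hi) / 2
        if coin_loglik(mid) < mean_loglik:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def imv(baseline_mean_loglik, enhanced_mean_loglik):
    # Relative gain in equivalent-coin bias when moving from the
    # baseline system to the enhanced one
    w0 = coin_equivalent(baseline_mean_loglik)
    w1 = coin_equivalent(enhanced_mean_loglik)
    return (w1 - w0) / w0
```

A coin-flip baseline (mean log-likelihood log 0.5) maps to w = 0.5, so any genuinely informative enhanced model yields a positive IMV.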

2. The Role of the Third Sector in Public Health Service Procurement

With John Mohan. Code library available here.

Abstract: The role of external suppliers across statutory health insurance procurement processes varies widely and is a source of political contention throughout the modern world. We comprehensively examine the role of non-profit organisations in public health procurement within publicly funded healthcare that runs parallel to private provision in a ‘two-tier’ system. We build a unique ‘Big Data’ pipeline that scrapes tens of thousands of heterogeneous accounting datasets from across a commissioning hierarchy. These datasets provide granular, micro-level information on every element of procurement where the value of a transaction exceeds twenty-five thousand pounds, as mandated by transparency requirements introduced by David Cameron in 2010. We develop tools to scrape, parse, and reconcile suppliers with institutional registers. The processed dataset contains over four hundred and forty-five billion pounds’ worth of commissioning across over 1.9 million rows of clean data. Approximately 1% of procurement at each level comes from institutions listed on the Charity Commission for England and Wales: a share that is relatively consistent across time, despite changing contractual patterns. We show slight regional variation and analyse the ‘North-South’ divide. Linking to the International Classification of Non-profit Organizations, we show the involvement of many different types of charity, with more payments going to the ‘Social Services’ aggregate but the highest cumulative values going to the ‘Health’ aggregate. We analyse the distribution across organisations of various sizes and ages, from grassroots to ‘Super Major’ non-profits, and examine variation over time. We conclude with a re-evaluation of the effects of the controversial Health and Social Care Act of 2012 and the integration of the free market and volunteerism otherwise known as the ‘Big Society’.
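The reconciliation step mentioned above—matching free-text supplier strings in spending records against an institutional register—can be sketched as follows. This is a toy illustration of the general technique (normalise, then match), not the pipeline's actual code; the suffix list and toy data are mine:

```python
import re

# Common UK legal-form suffixes to discard before matching (illustrative list)
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "cic"}

def normalise(name):
    # Lowercase, replace punctuation with spaces, drop legal suffixes
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def reconcile(supplier_strings, register):
    # Map each raw supplier string to a register id where the normalised
    # names match exactly; unmatched suppliers map to None
    lookup = {normalise(name): rid for rid, name in register.items()}
    return {s: lookup.get(normalise(s)) for s in supplier_strings}
```

In practice a pipeline at this scale would add fuzzy matching and manual review on top of exact normalised matches, since accounting datasets spell the same supplier many ways.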

3. From a Seed of Doubt Grows a Forest of Uncertainty

With Arun Frey, Jiani Yan, Mark Verhagen. Abstract, code, and working paper coming soon!

4. The Legacy of Longevity: Persistent inequalities in UK life expectancy

With Aaron Reeves, Felix Tropf and Darryl Lundy. Working paper coming soon!

5. From the Small Capitalist to the Corporate Elite: Who owns and controls the means of production?

With Aaron Reeves. Work in Progress, code library coming soon!

Abstract: We generate unique Big, Open datasets on company officership and control and use a variety of algorithms to engineer new features, providing a range of advancements related to stratification, diversity and corporate governance. Our approach allows consideration beyond the traditionally studied ‘Corporate Elite’ (one of the three branches of the Millsian ‘Power Elite’), utilizing population-level data across all sub-strata of officers and owners (such as Wright’s definition of the petite bourgeoisie and ‘Small Capitalists’) and across all Standard Industrial Classification codes. We integrate and advance existing gender inference algorithms, geo-spatial methods, techniques to parse electronic accounts, and computationally efficient network approximations, applying them to a highly developed, market-oriented economy which leads the world in terms of Open Data. Our applications incorporate auxiliary information (such as indices of deprivation and shapefiles), focusing on variables of classical interest integral to the study of social stratification with respect to distributions across gender, age, nationality, occupation and spatial segregation. Through this new digital, computational approach, which makes use of an ‘interface method’, we are also able to provide new results related to board overlap, the rise of the service economy and the changing shape of the business class, in addition to re-examining the business case for diversity.
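Board overlap, one of the quantities mentioned above, is conventionally computed as a bipartite projection: officers and companies form a two-mode network, and projecting onto companies counts shared officers. A minimal sketch with toy data (not drawn from our dataset):

```python
import numpy as np

# Bipartite incidence matrix: rows = officers, columns = companies,
# entry 1 where an officer holds a position at that company
B = np.array([
    [1, 1, 0],   # officer 0 sits on the boards of companies 0 and 1
    [0, 1, 1],   # officer 1 sits on the boards of companies 1 and 2
    [1, 0, 1],   # officer 2 sits on the boards of companies 0 and 2
])

# Projection onto companies: entry (i, j) counts officers shared by
# companies i and j; the diagonal gives each company's board size
overlap = B.T @ B
```

At population scale the incidence matrix is extremely sparse, which is where computationally efficient approximations of this projection become necessary.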

6. A Grid Based Approach to Analysing Spatial Weighting Matrix Specification

Code library available here, with a link to the working paper version here (I hope to finish this paper one day…).

Abstract: We outline a grid-based approach that provides further evidence against the misconception that the results of spatial econometric models are sensitive to the exact specification of the exogenously set weighting matrix (otherwise known as the ‘biggest myth in spatial econometrics’). Our application estimates three large sets of specifications using an original dataset on the Prime Central London housing market. We show that while posterior model probabilities may indicate a strong preference for an extremely small number of models, and while the spatial autocorrelation parameter varies substantially, median direct effects remain stable across the entire permissible spatial weighting matrix space. We argue that spatial econometric models should be estimated across this entire space, rather than at the small number of points conventionally checked for robustness.
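The grid of weighting-matrix specifications can be sketched as follows, using row-standardised k-nearest-neighbour matrices as one illustrative family (the paper's actual specification families and grid may differ):

```python
import numpy as np

def knn_weights(coords, k):
    # Row-standardised k-nearest-neighbour spatial weighting matrix
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)         # a unit is never its own neighbour
    W = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]  # k nearest neighbours per row
    W[np.arange(len(coords))[:, None], idx] = 1.0
    return W / W.sum(axis=1, keepdims=True)

# Estimate the model at every point of a grid of specifications,
# rather than at a handful of points checked for robustness
rng = np.random.default_rng(0)
coords = rng.random((50, 2))            # toy property locations
grid = [knn_weights(coords, k) for k in range(1, 11)]
```

The same pattern extends to other families (distance cutoffs, inverse-distance decay), so the full permissible space is a union of such grids, with the model re-estimated under each W.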