When 4 ≈ 10,000: The power of social science knowledge in predictive performance

McKay, Steve (2018) When 4 ≈ 10,000: The power of social science knowledge in predictive performance. Socius, 4. ISSN 2378-0231

Documents
socius_submission_v3b_clean.docx - Whole Document
Available under License Creative Commons Attribution-NonCommercial 4.0 International.

119kB
Item Type: Article
Item Status: Live Archive

Abstract

Computer science has devised leading methods for predicting variables; can social science compete? The author sets
out a social scientific approach to the Fragile Families Challenge. Key insights included new variables constructed
according to theory (e.g., a measure of shame relating to hardship), lagged values of the target variables, using predicted
values of certain outcomes to inform others, and validated scales rather than individual variables. The models were
competitive: a four-variable logistic regression model was placed second for predicting layoffs, narrowly beaten by a
model using all the available variables (>10,000) and an ensemble of algorithms. Similarly, a relatively small random
forest model (25 variables) was ranked seventh in predicting material hardship. However, a similar approach overfitted
the prediction of grit. Machine learning approaches proved superior to linear regression for modeling the continuous
outcomes. Overall, social scientists can contribute to predictive performance while benefiting from learning more
about data science methods.

Keywords: fragile families, logistic regression, data science, random forests
Subjects: L Social studies > L400 Social Policy
Divisions: College of Social Science > School of Social & Political Sciences
ID Code: 34204
Deposited On: 28 Nov 2018 11:39
