While the police concentrate on looking for missing people, Donald Rubin (born 1943), Professor Emeritus at Harvard, focuses his attention on missing data. This world-famous American statistician has spent much of his illustrious career hunting out causes, effects, potential outcomes and data that had gone AWOL.
Indecisive or curious?
Born in Washington D.C., Donald Rubin was an excellent student and embarked on an accelerated PhD physics program at Princeton University. Along the way, he switched to psychology, before being told to swot up on stats. That he did, earning a PhD in statistics, though not before dabbling in computer science and teaching himself to program in Fortran.
The missing pieces
Fresh from university, he took on a consulting role at the US Educational Testing Service. Unleashed to research what he wanted (within reason), he set about establishing the causal model that would later be named after him. Drawing on the work of Polish mathematician Jerzy Neyman, he based his approach on the idea of potential outcomes: the outcomes an individual or group would experience under each alternative condition, such as with and without a treatment. The model explores what happens to individuals, or groups, if part of their environment changes. He also tackled the problem of missing data, that is to say, cases where no value is recorded for a particular variable, for example because of nonresponse or dropout. Particularly common in economics, sociology and other social sciences, missing data can have a significant impact on the validity of conclusions.
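The link between potential outcomes and missing data can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Rubin's own formulation: the function names, the constant treatment effect of 2.0 and the coin-flip assignment are all hypothetical choices made for illustration. Each unit has two potential outcomes, but only the one matching its assigned condition is ever observed, so the other is, in effect, missing.

```python
import random


def simulate_units(n, effect=2.0, seed=0):
    """Create n hypothetical units, each with two potential outcomes.

    y0 is the outcome without treatment, y1 the outcome with treatment.
    A fair coin flip decides which one is actually observed.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)          # outcome if untreated
        y1 = y0 + effect                  # outcome if treated (assumed constant effect)
        treated = rng.random() < 0.5      # randomized assignment
        observed = y1 if treated else y0  # the other potential outcome is never seen
        units.append((y0, y1, treated, observed))
    return units


def true_average_effect(units):
    """Average of y1 - y0; only computable because this is a simulation."""
    return sum(y1 - y0 for y0, y1, _, _ in units) / len(units)


def difference_in_means(units):
    """Estimate the average effect from observed outcomes only."""
    treated = [obs for _, _, t, obs in units if t]
    control = [obs for _, _, t, obs in units if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


units = simulate_units(10_000)
print("true average effect:", true_average_effect(units))
print("estimate from observed data:", difference_in_means(units))
```

Because assignment is randomized here, the simple difference in observed means should land close to the true average effect, even though no unit ever reveals both of its outcomes.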
“Often decisions about interventions must be made, even if based on limited empirical evidence, and we should help decision-makers make sensible decisions under clearly stated assumptions.”
Key causal concepts
Rubin didn’t stop there. Through a number of prestigious academic roles, he kept busy optimizing survey sampling, building on Bayesian inference and working on the Expectation-Maximization (EM) algorithm for finding maximum likelihood estimates when data are incomplete. He established the Propensity Score as a way of reducing or eliminating selection bias in observational studies by balancing covariates between the groups being compared. Time and time again he went back to his lifelong passion for data, developing, most notably, Multiple Imputation to help account for the uncertainty that missing values introduce.
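To give a flavour of how Multiple Imputation accounts for uncertainty, the sketch below pools results from several imputed datasets using the combining rules commonly known as Rubin's rules. The function name and the example numbers are hypothetical, and the code assumes that the same estimate and its variance have already been computed separately on each completed dataset.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results (hypothetical helper name).

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up numbers purely for illustration.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation term is what separates this from a single filled-in dataset: the more the imputations disagree with one another, the larger the reported variance.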
Beyond the big theories
His work on statistics and, in particular, causal inference has helped bring causality to the heart of social science, revolutionizing development economics and randomized field experiments, not to mention psychology and medicine, where it addresses dropout and noncompliance. This important contribution has been widely recognized, earning him numerous awards and positions, including fellowship of the American Statistical Association.
Key Dates
-
1974
The Rubin Causal Model papers
Donald Rubin publishes his major papers on the Rubin Causal Model. They would become the basis of much of his subsequent work.
-
2002
The 7th most cited author in Mathematics
Rubin is named 7th most cited author in mathematics for the decade 1991–2000 by Science Watch.
-
2015
The Causal Inference for Statistics textbook
Rubin publishes Causal Inference for Statistics, Social, and Biomedical Sciences, a textbook on causal inference that he wrote with econometrician Guido Imbens.