American statistician Leo Breiman (1928-2005) was a man of action. He didn’t just want to develop theories; he wanted to apply them to the real world. Whether working for UNESCO in Liberia, as an industry consultant or as a university professor, he used statistics to change everyday life for the better and take data in new directions.
Beyond university
Leo embraced academia. After a PhD in mathematics, he taught probability theory at the University of California, Los Angeles (UCLA). Yet, he quickly realized that he “wasn’t cut out to be an abstract mathematician.” Always keen to help, he hosted poor Mexicans learning English, volunteered to teach mathematics to emotionally disturbed youngsters and later became President of the Santa Monica School District Board. During a sabbatical as an educational statistician for UNESCO, he trekked through the rainforests of Liberia to count the number of schools and pupils.
Resigning his professorship in 1968, he started working as a consultant for the US government and industry. Drawing on his mathematical background, he developed statistical methods to predict patterns in everything from traffic to pollution.
“One problem in the field of statistics has been that everyone wants to be a theorist.”
In the right place
Fascinated by the newly emerging area of machine learning (which relies heavily on classification), he understood the important role algorithms would play in the future of statistics. During this time, he helped craft and test classification techniques, including the famous “Classification and Regression Trees” (CART), laying some of the essential groundwork for data mining and machine learning.
Representing decisions, describing data and predicting its value, CART is the basis of numerous modern decision tree concepts, from cost-complexity pruning to surrogate splits. He is also associated with key ideas such as bagging and Random Forests, which improved the stability, accuracy and scale of tree ensembles over the years by combining large numbers of individual decision trees to produce predictions more accurate and stable than those of any single tree.
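To make the ensemble idea concrete, here is a minimal sketch of bagging decision trees with a Random-Forest-style majority vote. It assumes scikit-learn and NumPy are available; the dataset, ensemble size and parameters are illustrative choices, not Breiman's original implementation.

```python
# Minimal sketch: bag decision trees on bootstrap samples, then aggregate
# by majority vote, the core idea behind Random Forests. Illustrative only;
# the dataset, ensemble size and max_features setting are assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

trees = []
for _ in range(100):
    # Bootstrap sample: draw rows with replacement (the "bagging" step).
    idx = rng.integers(0, len(X), size=len(X))
    # Limiting the features considered at each split adds the extra
    # randomization that distinguishes Random Forests from plain bagging.
    tree = DecisionTreeClassifier(max_features="sqrt")
    trees.append(tree.fit(X[idx], y[idx]))

# The forest's prediction is the most common class across all trees.
votes = np.stack([t.predict(X) for t in trees])            # (n_trees, n_samples)
forest_pred = np.array([np.bincount(col).argmax() for col in votes.T])

print("Ensemble agreement with training labels:", (forest_pred == y).mean())
```

In practice, scikit-learn's RandomForestClassifier packages this same bootstrap-plus-random-feature recipe into a single estimator.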
Embracing the computer
In 1980, he was ready to jump back into university life and accepted a position in the statistics department of the University of California, Berkeley. Focusing on applying statistics to computer science, he transformed the Statistical Laboratory, which had just one small computer, into one of the most sophisticated computing facilities in the country.
Throughout his life, his practical application of statistics, above all, bridged the gap between statistics and computer science and paved the way for advances in machine learning and data mining.
Key Dates
- 1968: Probability is published
Leo Breiman writes and publishes Probability, the celebrated graduate textbook, since republished several times.
- 2001: The concept of Random Forests is introduced
He introduces the concept of Random Forests (RF) in the peer-reviewed scientific journal Machine Learning.
- 2001: Statistical Modeling: The Two Cultures is published
He publishes Statistical Modeling: The Two Cultures, a paper encouraging statisticians to look beyond data modeling and use algorithms to solve real problems arising from massive data sets.