Wyatt Earp Effect and its Consequences for Data Analytics and Science | by Christianlauer | Nov, 2020
What do gunslingers like Wyatt Earp have in common with data analytics and data science, you might ask yourself? Both are about small probabilities (such as a gunman's chance of staying alive) and about how they can lead to selection bias.
What on Earth is the Wyatt Earp Effect?
"The Wyatt Earp Effect, or the Bewitching Power of Small Probabilities" is an article that F. Thomas Bruss wrote after watching the film Wyatt Earp, starring Kevin Costner. Although Bruss is a probabilist who often tells his students to be aware of selection bias, he was still surprised, and almost trapped, when he learned that Wyatt Earp survived all these shootouts without the slightest scratch and died at age 80. Wyatt Earp is a typical example of selection bias: he survived incredibly dangerous events, which was very improbable, and it is exactly because this makes him so extraordinary that movies are made about him.
In simple words: when you think about the thousands of gunslingers who existed and the fact that most of them died, it is no longer that surprising that at least one of them survived. But the movie picked that one man and not one of the less lucky ones, which is classic selection bias.
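The arithmetic behind this argument can be sketched in a few lines of Python. The numbers are purely hypothetical and do not come from Bruss's article: each gunslinger is assumed to survive a single gunfight with probability 0.5, to face 10 gunfights, and to be one of 5,000 independent gunslingers. The function names are made up for this illustration as well.

```python
import random


def p_one_survives(p_fight: float, n_fights: int) -> float:
    """Chance that one specific gunslinger survives every one of n_fights."""
    return p_fight ** n_fights


def p_at_least_one_survives(n_gunslingers: int, p_fight: float, n_fights: int) -> float:
    """Chance that at least one of n_gunslingers survives all fights,
    assuming their fates are independent."""
    p_single = p_one_survives(p_fight, n_fights)
    return 1.0 - (1.0 - p_single) ** n_gunslingers


def simulate(n_gunslingers: int, p_fight: float, n_fights: int,
             n_trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    trials_with_survivor = 0
    for _ in range(n_trials):
        for _ in range(n_gunslingers):
            # all() stops at the first lost fight
            if all(rng.random() < p_fight for _ in range(n_fights)):
                trials_with_survivor += 1
                break  # one survivor is enough for this trial
    return trials_with_survivor / n_trials


# Hypothetical numbers, chosen only for illustration.
P_FIGHT, N_FIGHTS, N_GUNSLINGERS = 0.5, 10, 5000

single = p_one_survives(P_FIGHT, N_FIGHTS)
group = p_at_least_one_survives(N_GUNSLINGERS, P_FIGHT, N_FIGHTS)
estimate = simulate(N_GUNSLINGERS, P_FIGHT, N_FIGHTS, n_trials=300)

print(f"one specific gunslinger survives: {single:.4%}")
print(f"at least one of {N_GUNSLINGERS} survives: {group:.2%}")
print(f"simulated estimate: {estimate:.2%}")
```

Under these made-up assumptions, one specific gunslinger survives with a probability of roughly 0.1%, yet the chance that at least one of the 5,000 makes it is above 99%. The surprise only appears when we look at the survivor alone and forget the population he was selected from.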
Relation to Data Analytics/Science
An easily understood example of selection bias is the following use case, in which the gross domestic product and the life satisfaction of people in certain countries were put into a regression model. Because only a few countries were selected, the message was: a higher gross domestic product results in more happiness. But after adding more countries and using a broader sample, the result was no longer that clear.
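A minimal sketch of this effect, using synthetic numbers rather than real country data: the five "selected" points were hand-made to lie almost on a line, and the six "left out" points were hand-made to break the pattern. The helper `fit_line` is a hypothetical name; it computes an ordinary least-squares slope and the Pearson correlation with the standard library only.

```python
import math


def fit_line(points):
    """Return (slope, pearson_r) of a least-squares line through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Synthetic (GDP per capita in thousand USD, life satisfaction score) pairs.
# Made up for illustration; these are NOT real country figures.
selected = [(10, 5.1), (20, 5.4), (30, 6.1), (40, 6.4), (50, 7.0)]
left_out = [(12, 6.8), (18, 7.1), (45, 5.8), (55, 5.6), (35, 5.5), (25, 7.0)]

slope_selected, r_selected = fit_line(selected)
slope_all, r_all = fit_line(selected + left_out)

print(f"selected countries: slope={slope_selected:.3f}, r={r_selected:.2f}")
print(f"all countries:      slope={slope_all:.3f}, r={r_all:.2f}")
```

With this toy data, the five selected points give a correlation close to 1, while all eleven points together give a correlation close to 0. The model did not change; only the sample did.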
Another use case, which I experienced myself during an ML project, was an automated image recognition algorithm for which only a small variety of images had been chosen. The goal was to detect images in which the condition and possible failures of the infrastructure are recognizable. But the training data mostly contained images of functional infrastructure. This resulted in an algorithm that labeled all the images without failures correctly but had problems recognizing images with damage.
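Why such a model can still look good on paper can be sketched as follows. The labels and the 95/5 split are assumptions for illustration, not figures from the project, and the "model" is a deliberately degenerate stand-in that always predicts "ok". Checking the class distribution and the per-class recall, rather than accuracy alone, is one common way to spot the problem.

```python
from collections import Counter

# Hypothetical validation labels; the 95/5 split is an assumption.
y_true = ["ok"] * 95 + ["damaged"] * 5

# A degenerate stand-in model that has only ever learned "ok".
y_pred = ["ok"] * len(y_true)


def accuracy(y_true, y_pred):
    """Share of all examples that were labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Share of the true `label` examples that the model actually found."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)


class_counts = Counter(y_true)
acc = accuracy(y_true, y_pred)
recall_ok = recall(y_true, y_pred, "ok")
recall_damaged = recall(y_true, y_pred, "damaged")

print(f"class counts:     {dict(class_counts)}")
print(f"accuracy:         {acc:.2f}")
print(f"recall 'ok':      {recall_ok:.2f}")
print(f"recall 'damaged': {recall_damaged:.2f}")
```

In this sketch the accuracy is 0.95 even though not a single damaged image is found, which mirrors the behavior described above: every image without a failure is labeled correctly, and the rare case that actually matters is missed.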
Selection bias is a very important topic to pay attention to during your data analytics and machine learning projects. Note, however, that it is only one of several; there are many more potential mistakes you should watch out for, e.g.:
- Correlation versus causality
- Unobserved influencing factors
- Missing data
- and more
Hopefully, this gives you a first idea of how selection bias can result in bad algorithms and how you can avoid it, while still enjoying the next western movie with your favorite whisky. Cheers!
Sources and further readings
F. Bruss, The Wyatt-Earp Effekt, Spektrum der Wissenschaft (2007), 110–113.
Aurélien Géron, Praxiseinstieg Machine Learning mit Scikit-Learn und TensorFlow: Konzepte, Tools und Techniken für intelligente Systeme (2008), 24–25.