Selection Bias

Also known as: Sampling bias (related)

Selection bias occurs when the participants or data in a study are not representative of the larger population because of how they were selected, typically through a non-random process. This distorts the results and limits how well they generalize to that population.

Category: Statistical Biases

Evidence: experimental

The Psychology Behind It

To know the truth about a group (e.g., "Do Americans like pizza?"), you need a representative sample, ideally a random one. If you only ask people inside a pizza restaurant, your data is garbage, because almost everyone there already likes pizza. That is selection bias.

It happens because true randomness is hard. We naturally sample what is convenient, available, or willing. This creates a "distorted mirror" where we think we are seeing the world, but we are only seeing a specific slice of it.
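
The pizza example can be made concrete with a small simulation. This is only a sketch: the 60% "likes pizza" rate, the restaurant visit probabilities, and the function name `simulate_pizza_survey` are all assumptions chosen for illustration, not real survey figures.

```python
import random


def simulate_pizza_survey(seed=0, population_size=100_000, sample_size=1_000):
    """Compare a random sample with a 'people inside the restaurant' sample."""
    rng = random.Random(seed)

    # Assumption for illustration: 60% of the population likes pizza.
    population = [rng.random() < 0.60 for _ in range(population_size)]

    # Assumption: pizza fans are far more likely to be found in the restaurant
    # (20% chance) than everyone else (1% chance).
    in_restaurant = [
        likes for likes in population
        if rng.random() < (0.20 if likes else 0.01)
    ]

    # Random sample: every person has the same chance of being picked.
    random_sample = rng.sample(population, sample_size)
    # Convenience sample: only people who happen to be in the restaurant.
    restaurant_sample = rng.sample(
        in_restaurant, min(sample_size, len(in_restaurant))
    )

    def share(sample):
        # Fraction of True values (people who like pizza).
        return sum(sample) / len(sample)

    return {
        "true_rate": share(population),
        "random_estimate": share(random_sample),
        "restaurant_estimate": share(restaurant_sample),
    }


if __name__ == "__main__":
    for name, value in simulate_pizza_survey().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed numbers, the expected share of pizza fans inside the restaurant is 0.6 × 0.20 / (0.6 × 0.20 + 0.4 × 0.01) ≈ 0.97, far above the assumed true rate of 0.60, while the random sample should land close to the true rate.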

Real-World Examples

The 1936 Literary Digest Poll

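The magazine collected about 2.4 million responses and predicted that Alf Landon would comfortably beat Franklin D. Roosevelt (FDR). Instead, FDR won in a landslide, carrying 46 of 48 states. Why? The magazine built its mailing list from its own subscribers, automobile registrations, and telephone directories. In 1936, in the middle of the Great Depression, those lists skewed toward wealthier households. The sample over-represented affluent voters who leaned against FDR and under-represented the poorer majority.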

Online Reviews

Product reviews tend to be heavily biased. Who writes a review? Mostly people who loved the product (5 stars) or hated it (1 star). The large middle group who thought it was "okay" rarely bothers to write. As a result, reviews show a far more polarized picture than the opinions of actual buyers.
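
A rough simulation shows how self-selection alone can produce this polarized picture. The opinion distribution, the review-posting probabilities, and the name `simulate_reviews` are assumptions made up for this sketch, not measured values.

```python
import random


def simulate_reviews(seed=1, n_buyers=50_000):
    """Return the share of 1- and 5-star opinions among all buyers
    and among the buyers who actually post a review."""
    rng = random.Random(seed)

    # Assumed true opinions: most buyers are in the middle.
    true_weights = {1: 0.05, 2: 0.15, 3: 0.40, 4: 0.30, 5: 0.10}
    # Assumed chance of posting a review, given the buyer's opinion:
    # strong feelings make people far more likely to write.
    post_prob = {1: 0.30, 2: 0.05, 3: 0.02, 4: 0.05, 5: 0.30}

    stars = list(true_weights)
    opinions = rng.choices(stars, weights=list(true_weights.values()), k=n_buyers)
    posted = [s for s in opinions if rng.random() < post_prob[s]]

    def extreme_share(ratings):
        # Fraction of ratings that are 1 or 5 stars.
        return sum(s in (1, 5) for s in ratings) / len(ratings)

    return extreme_share(opinions), extreme_share(posted)


if __name__ == "__main__":
    all_buyers, reviewers = simulate_reviews()
    print(f"extreme opinions among all buyers: {all_buyers:.2f}")
    print(f"extreme opinions among reviewers:  {reviewers:.2f}")
```

With these assumed inputs, only 15% of buyers hold an extreme opinion, yet the expected share of extreme ratings among posted reviews is about 60% (0.045 / 0.0755).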

Medical Studies

If a study on a new weight-loss drug recruits volunteers, it gets people who are motivated to lose weight. The drug might work for them but fail for the unmotivated general population.

Consequences

Selection bias can lead to:

  • False Conclusions: We believe things are true that are only true for a specific subgroup.
  • Bad Policy: Laws are passed based on the loud voices of a self-selected few (lobbyists, activists) rather than the silent majority.
  • Algorithm Bias: AI trained on biased data (e.g., resumes that come mostly from men) can learn to replicate that bias (e.g., favoring male candidates).

How to Mitigate It

Randomize, randomize, randomize.

  1. Random Sampling: Ensure every member of the population has an equal chance of being selected.
  2. Check the Source: Ask, "Who is in this dataset? Who is excluded? Why?"
  3. Weighting: If you know your sample is biased (e.g., too many men), reweight the data so that each group counts in proportion to its share of the true population.
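
The weighting step in item 3 can be sketched as simple post-stratification: each respondent is weighted by (population share of their group) / (sample share of their group). The function name `poststratified_mean`, the group labels, and the survey numbers below are hypothetical, and the sketch assumes the true population shares are known.

```python
from collections import Counter


def poststratified_mean(sample, population_shares):
    """Weighted mean of `value` over (group, value) pairs.

    sample: list of (group, value) tuples.
    population_shares: dict mapping group -> share of the true population.
    """
    counts = Counter(group for group, _ in sample)
    n = len(sample)

    # Weight = population share / sample share, so over-represented
    # groups are scaled down and under-represented groups are scaled up.
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}

    weighted_total = sum(weights[g] * value for g, value in sample)
    weight_sum = sum(weights[g] for g, _ in sample)
    return weighted_total / weight_sum


if __name__ == "__main__":
    # Hypothetical survey: 80 men (60% say yes) and 20 women (30% say yes),
    # drawn from a population that is assumed to be 50% men, 50% women.
    sample = (
        [("men", 1)] * 48 + [("men", 0)] * 32
        + [("women", 1)] * 6 + [("women", 0)] * 14
    )
    raw_mean = sum(value for _, value in sample) / len(sample)
    adjusted = poststratified_mean(sample, {"men": 0.5, "women": 0.5})
    print(f"raw mean: {raw_mean:.2f}, weighted mean: {adjusted:.2f}")
```

In this made-up survey the raw mean is 0.54, while the weighted mean is about 0.45 (0.5 × 0.6 + 0.5 × 0.3). Weighting only helps for groups that appear in the sample and for characteristics you can measure; it cannot recover people who were never reachable in the first place.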

Conclusion

Selection bias reminds us that "data" is not "truth." Data is only as good as the method used to collect it. If the net is flawed, the catch will be flawed.

Mitigation Strategies

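Randomized Controlled Trials (RCT): The gold standard for testing cause and effect. Randomly assigning participants to treatment and control groups removes selection bias in who receives the treatment, although it does not by itself make the recruited participants representative of the wider population.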

Effectiveness: high

Difficulty: moderate
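
Random assignment itself is simple to sketch. The helper below, `randomly_assign`, is a hypothetical name, and the sketch assumes a plain two-arm trial with equal group sizes; real trials often use stratified or blocked randomization instead.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's data is not modified
    rng.shuffle(shuffled)

    # First half goes to treatment, the rest to control.
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


if __name__ == "__main__":
    groups = randomly_assign([f"participant_{i}" for i in range(10)], seed=42)
    print("treatment:", groups["treatment"])
    print("control:  ", groups["control"])
```

Because chance alone decides who lands in which group, neither the researcher nor the participant can steer the more motivated or healthier people toward the treatment.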

Potential Decision Harms

A company recruits only from Ivy League schools, missing out on talented candidates from other backgrounds and creating a homogeneous culture.

Severity: major

Facial recognition software is trained mostly on white faces, leading to high error rates for people of color.

Severity: critical

Key Research Studies

Selection bias in web surveys

Bethlehem, J. (2010). International Statistical Review.

Analyzed how self-selection in online surveys leads to biased estimates that cannot always be corrected.


Related Biases

  • Neglect of Probability (also known as probability blindness): The tendency to completely disregard probability when making a decision under uncertainty.
  • Ludic Fallacy (also known as gaming fallacy): The misuse of games to model real-life situations.
  • Sampling Bias (also known as ascertainment bias): A bias in which a sample is collected in such a way that some members of the intended population have a lower or higher sampling probability than others.
  • Survivorship Bias (also known as survival bias): The logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility.
  • Texas Sharpshooter Fallacy (related: clustering illusion): An informal fallacy committed when differences in data are ignored but similarities are overemphasized, leading to a false conclusion.
  • Pareidolia (also known as face pareidolia): A specific form of apophenia involving the perception of images or sounds in random stimuli, such as seeing faces in clouds.