Category

Statistical Biases

Impact level

3 / 5

Last updated

Nov 2025

Selection Bias

Selection bias occurs when the participants or data in a study are not representative of the larger population because they were selected in a non-random way. This distorts the results and makes them inapplicable to the general population.

Also known as: Sampling bias (related)

01

Overview

The Psychology Behind It

To know the truth about a group (e.g., "Do Americans like pizza?"), you need a sample that represents the whole group, ideally a random one. If you only ask people inside a pizza restaurant, you overcount pizza lovers and your data is garbage. That is selection bias.

It happens because true randomness is hard. We naturally sample what is convenient, available, or willing. This creates a "distorted mirror" where we think we are seeing the world, but we are only seeing a specific slice of it.
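The pizza example can be simulated in a few lines. The sketch below is only an illustration: the 60% true rate, the restaurant probabilities, and the sample sizes are all made-up assumptions, chosen to show how a convenience sample drifts away from a random one.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: each person either likes pizza or not, and
# pizza lovers are far more likely to be found inside a pizza restaurant.
population = []
for _ in range(100_000):
    likes_pizza = random.random() < 0.60             # assumed true rate: 60%
    p_in_restaurant = 0.20 if likes_pizza else 0.01  # assumed probabilities
    in_restaurant = random.random() < p_in_restaurant
    population.append((likes_pizza, in_restaurant))

def share_liking_pizza(people):
    """Fraction of the given people who like pizza."""
    return sum(likes for likes, _ in people) / len(people)

# Random sample: everyone has the same chance of being asked.
random_sample = random.sample(population, 1_000)

# Convenience sample: only people who happen to be in the restaurant.
restaurant_pool = [person for person in population if person[1]]
restaurant_sample = random.sample(restaurant_pool, 1_000)

true_rate = share_liking_pizza(population)
random_estimate = share_liking_pizza(random_sample)
restaurant_estimate = share_liking_pizza(restaurant_sample)

print(f"true rate:           {true_rate:.2f}")
print(f"random sample:       {random_estimate:.2f}")
print(f"restaurant sample:   {restaurant_estimate:.2f}")
```

Under these assumptions the random sample lands close to the true rate, while the restaurant sample reports that almost everyone likes pizza.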

Real-World Examples

The 1936 Literary Digest Poll

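In 1936 the magazine mailed roughly 10 million mock ballots, got about 2.4 million back, and predicted Alf Landon would beat FDR in a landslide. Instead, FDR won in a landslide, carrying 46 of 48 states. Why? The mailing list was drawn from the magazine's own subscribers, automobile registrations, and telephone directories. In 1936, in the middle of the Great Depression, those lists skewed toward wealthier households, who were more likely to oppose FDR, and they left out much of the poorer majority. On top of that, only the people who chose to mail a ballot back were counted.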

Online Reviews

Product reviews are heavily biased. Who writes a review? Mostly people who loved the product (5 stars) or people who hated it (1 star). The vast middle ground of people who thought it was "okay" rarely bother writing. As a result, reviews show a far more polarized picture than buyers' actual experience.
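A small simulation makes the mechanism visible. Everything numeric here is an assumption for illustration (the "true" rating distribution and the chance that a buyer with a given rating posts a review); it is not measured data.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable

# Assumed "true" satisfaction of all buyers: mostly middling.
true_ratings = random.choices(
    [1, 2, 3, 4, 5], weights=[5, 15, 40, 30, 10], k=50_000
)

# Assumed chance that a buyer with each rating bothers to post a review.
post_probability = {1: 0.30, 2: 0.05, 3: 0.02, 4: 0.05, 5: 0.30}

# Self-selection: a review exists only if the buyer chose to write it.
posted = [r for r in true_ratings if random.random() < post_probability[r]]

def extreme_share(ratings):
    """Fraction of ratings that are 1 star or 5 stars."""
    counts = Counter(ratings)
    return (counts[1] + counts[5]) / len(ratings)

print(f"1- and 5-star share among all buyers:     {extreme_share(true_ratings):.2f}")
print(f"1- and 5-star share among posted reviews: {extreme_share(posted):.2f}")
```

With these made-up numbers, extreme ratings are a small minority among buyers but a majority among the reviews that actually get posted.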

Medical Studies

If a study on a new weight-loss drug recruits volunteers, it gets people who are already motivated to lose weight. The drug might work for them but fail for the less motivated general population.

Consequences

Selection bias can lead to:

  • False Conclusions: We believe things are true that are only true for a specific subgroup.
  • Bad Policy: Laws are passed based on the loud voices of a selected few (lobbyists, activists) rather than the silent majority.
  • Algorithm Bias: AI trained on biased data (e.g., resumes that come mostly from men) will learn to replicate that bias (e.g., ranking male candidates higher).

How to Mitigate It

Randomize, randomize, randomize.

  1. Random Sampling: Ensure every member of the population has a known, nonzero chance of being selected (an equal chance, in a simple random sample).
  2. Check the Source: Ask, "Who is in this dataset? Who is excluded? Why?"
  3. Weighting: If you know your sample is biased (e.g., too many men), weight the responses so each group counts in proportion to its share of the true population.
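The weighting step can be sketched as follows. This is a minimal post-stratification example under stated assumptions: the survey responses are invented, and the 50/50 population split is assumed to be known from an outside source such as a census.

```python
from collections import Counter

# Hypothetical survey responses as (group, answer) pairs; 1 = "yes", 0 = "no".
# Men are over-represented: 80 of 100 respondents.
sample = (
    [("men", 1)] * 60 + [("men", 0)] * 20
    + [("women", 1)] * 5 + [("women", 0)] * 15
)

# Assumed known population shares (e.g., from a census).
population_share = {"men": 0.5, "women": 0.5}

# Share of each group in the sample.
group_counts = Counter(group for group, _ in sample)
sample_share = {g: n / len(sample) for g, n in group_counts.items()}

# Post-stratification weight: population share divided by sample share.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

unweighted = sum(answer for _, answer in sample) / len(sample)
weighted = (
    sum(weights[g] * answer for g, answer in sample)
    / sum(weights[g] for g, _ in sample)
)

print(f"unweighted yes-rate: {unweighted:.2f}")  # 0.65
print(f"weighted yes-rate:   {weighted:.2f}")    # 0.50
```

Each woman's answer counts 2.5 times and each man's 0.625 times, which pulls the estimate from 0.65 back to 0.50. Weighting only corrects for the variables you weight on: it cannot recover a group that is missing from the sample, and it does not help if respondents differ from non-respondents within the same group.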

Conclusion

Selection bias reminds us that "data" is not "truth." Data is only as good as the method used to collect it. If the net is flawed, the catch will be flawed.

Cognitive processing

System 2 (deliberate). Biases often lean on quick judgments (System 1) unless you slow down and analyze (System 2).

Evidence & time

Evidence strength: experimental. Typical read: about 2 min.

02

Mitigation strategies

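Randomized Controlled Trials (RCT): The gold standard in science. Randomly assigning participants to treatment and control groups removes selection bias in who receives the treatment, because neither researchers nor participants choose the groups. It does not by itself make the participants representative of the wider population.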

Effectiveness: high

Difficulty: moderate
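Random assignment itself is simple to implement. The sketch below assumes a plain list of participant IDs and an even two-way split; the function name `randomly_assign` and the IDs are hypothetical, and real trials often use more elaborate schemes (such as stratified or block randomization).

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)      # separate generator; a seed makes it auditable
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs P001 ... P020.
participants = [f"P{i:03d}" for i in range(1, 21)]
treatment, control = randomly_assign(participants, seed=42)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because the shuffle decides who lands in which group, traits such as motivation or health are spread across both groups by chance rather than by anyone's choice.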

03

Potential decision harms

A company only recruits from Ivy League schools, missing out on talented candidates from other backgrounds and creating a homogenous culture.

Severity: major

Facial recognition software is trained mostly on white faces, leading to higher error rates for people of color.

Severity: critical

04

Key research studies

Selection bias in web surveys

Bethlehem, J. (2010) International Statistical Review

Analyzed how self-selection in online surveys leads to biased estimates that cannot always be corrected.
