Sampling Bias
The Psychology Behind It
Sampling bias is the "lazy researcher" bias. It is easier to ask your friends, your students, or people on the street corner than it is to design a truly random national sample. We gravitate towards "convenience sampling."
Additionally, certain groups are "hard to reach" (the homeless, the very rich, the busy). If a survey requires a 20-minute phone call, you are sampling "people with 20 minutes of free time who like talking on the phone," not the general public.
Real-World Examples
Self-Selection Bias
Internet polls are notorious for this. "Vote on our website!" Only people who visit that specific website and care enough to click will vote. The results describe those motivated visitors, not the wider world.
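To see how far an opt-in poll can drift, here is a minimal simulation sketch. Every number in it is an assumption chosen for illustration, not data: 30% of a hypothetical population supports a proposal, supporters click "vote" 20% of the time, and everyone else clicks 2% of the time.

```python
import random


def simulate_self_selected_poll(population_size=100_000, seed=0):
    """Return (true support rate, support rate reported by an opt-in poll)."""
    rng = random.Random(seed)
    true_yes = 0
    poll_yes = 0
    poll_total = 0
    for _ in range(population_size):
        # Assumption: 30% of the population supports the proposal.
        supports = rng.random() < 0.30
        true_yes += supports
        # Assumption: supporters are ten times more likely to bother voting.
        p_respond = 0.20 if supports else 0.02
        if rng.random() < p_respond:
            poll_total += 1
            poll_yes += supports
    return true_yes / population_size, poll_yes / poll_total


true_rate, poll_rate = simulate_self_selected_poll()
print(f"true support: {true_rate:.2f}, opt-in poll: {poll_rate:.2f}")
```

Under these assumed rates the poll should report support of roughly 0.06 / 0.074, or about 81%, even though true support is about 30%. Collecting more clicks does not help, because the distortion comes from who chooses to respond, not from how many respond.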
Pre-Screening Bias
In clinical trials, researchers often exclude patients with other conditions (comorbidities) to make the data cleaner. But in the real world, patients often have multiple conditions. The drug might work in the "clean" sample but fail in the "messy" real world.
Survivorship Sampling
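Analyzing the financial performance of companies that exist today ignores those that went bankrupt. This is survivorship bias, and it is also a sampling bias: the failures never had a chance to enter the sample, so average performance looks better than it really was.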
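The effect can be sketched with a toy simulation. The return distribution and the bankruptcy threshold below are assumptions made up for illustration; real company failures are far messier.

```python
import random
import statistics


def simulate_survivorship(n_companies=10_000, seed=1):
    """Return (mean return of all companies, mean return of survivors only)."""
    rng = random.Random(seed)
    # Assumption: returns are normally distributed with mean 0% and sd 20%.
    returns = [rng.gauss(0.0, 0.20) for _ in range(n_companies)]
    # Assumption: any company that loses more than 25% goes bankrupt
    # and disappears from the list of "current companies".
    survivors = [r for r in returns if r > -0.25]
    return statistics.mean(returns), statistics.mean(survivors)


all_mean, survivor_mean = simulate_survivorship()
print(f"all companies: {all_mean:+.3f}, survivors only: {survivor_mean:+.3f}")
```

The full set of companies averages close to zero by construction, while the survivors average a positive return, simply because the worst performers were filtered out before anyone looked.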
Consequences
Sampling bias can lead to:
- Echo Chambers: We think everyone agrees with us because we only sample our own social circle.
- Product Failure: A product tests well in a focus group of loyal fans but flops in the mass market.
- Medical Harm: Treatments are approved based on young, healthy samples and then cause unexpected side effects in elderly patients.
How to Mitigate It
Define the population, then sample the population.
- Stratified Sampling: Divide the population into groups (age, race, income) and sample randomly from each group to ensure representation.
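- Oversampling: Intentionally sample more heavily from small groups (such as minorities) so that you have enough data to analyze them, then weight the results back to each group's true share when estimating for the whole population.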
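- Response Rate Analysis: If only 10% of people answered your survey, ask how the 90% who didn't might differ. Compare respondents with what is known about the full population (for example, age or region), or follow up with a small sample of non-respondents.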
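Stratified sampling and oversampling can be sketched together in a few lines. This is an illustration under stated assumptions, not a production survey design: the function names `stratified_sample` and `weighted_mean` are hypothetical, and the population is invented (9,500 "majority" units and 500 "minority" units with different average values). Taking the same number of units from each group oversamples the small group, so each sampled unit carries a weight equal to the number of population members it stands for.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample of fixed size from each stratum.

    Returns (unit, weight) pairs, where weight = stratum size / sample size.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = min(n_per_stratum, len(members))
        weight = len(members) / k
        sample.extend((unit, weight) for unit in rng.sample(members, k))
    return sample


def weighted_mean(sample, value_of):
    """Estimate the population mean from weighted (unit, weight) pairs."""
    total_weight = sum(w for _, w in sample)
    return sum(value_of(u) * w for u, w in sample) / total_weight


# Hypothetical population: the small group has a much higher average value.
rng = random.Random(42)
population = (
    [("majority", rng.gauss(50, 10)) for _ in range(9_500)]
    + [("minority", rng.gauss(80, 10)) for _ in range(500)]
)

sample = stratified_sample(population, stratum_of=lambda u: u[0], n_per_stratum=200)
true_mean = sum(value for _, value in population) / len(population)
naive = sum(u[1] for u, _ in sample) / len(sample)          # ignores weights
weighted = weighted_mean(sample, value_of=lambda u: u[1])  # uses weights

print(f"true: {true_mean:.1f}, naive: {naive:.1f}, weighted: {weighted:.1f}")
```

The unweighted average lands far above the true mean because the small group makes up half of the sample but only 5% of the population. The weighted average corrects for this, while the 200 units from the small group still leave enough data to analyze that group on its own.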
Conclusion
A cup of water from the ocean tells you about the ocean only if you stirred the ocean first. Sampling bias is what happens when you dip your cup in a stagnant pool and think you know the sea.