Jan 16 2015
“You shouldn’t believe everything you read on the internet, but according to some of the more reliable sources, during World War II:
- Over 12,000 Bomber Command aircraft were shot down
- 55,500 aircrew died.
- The life expectancy of a Lancaster bomber was 3 weeks
- Tail-gunners were lucky if they survived four missions.”
This is a great story, both about effective visualization of a series of events in space-time and about proper interpretation in the face of sample bias.
Manufacturing, thankfully, is less dangerous than flying bombers in World War II was, but it is still more dangerous than it should be. Posting the locations of injuries on a map of the human body is also an effective way to identify which body parts are most commonly affected, and which safety improvements are most effective.
But are all injuries reported? Many organizations blame the victims for lowering their safety metrics, and discourage reporting. As a consequence, we can expect under-reporting and a bias towards injuries severe enough that reporting is unavoidable.
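To make the effect of under-reporting concrete, here is a minimal sketch in Python. All the numbers are hypothetical, chosen only for illustration: the body parts, the injury counts, and the assumption that minor injuries are reported 20% of the time while severe ones are always reported.

```python
# Hypothetical numbers, chosen only to illustrate the effect.
# Actual injuries per body part, split by severity.
actual = {
    "hand": {"minor": 90, "severe": 10},
    "back": {"minor": 20, "severe": 30},
}

# Assumed probability that an injury of each severity gets reported.
report_rate = {"minor": 0.2, "severe": 1.0}


def expected_reported(actual, report_rate):
    """Expected number of reported injuries per body part."""
    return {
        part: sum(count * report_rate[sev] for sev, count in by_sev.items())
        for part, by_sev in actual.items()
    }


actual_totals = {part: sum(by_sev.values()) for part, by_sev in actual.items()}
reported = expected_reported(actual, report_rate)

print("actual:  ", actual_totals)
print("reported:", reported)
```

Under these assumed rates, the body map built from reports would show more back injuries than hand injuries, even though hands are actually injured twice as often.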
If you get data on an entire population, or if you thoughtfully select a representative sample, you can avoid bias. Many of the most commonly used samples are biased, however, often in ways that are difficult to detect.
Customer surveys of product quality, for example, are biased by self-selection of the respondents. Are unhappy customers more likely to take the opportunity to vent than happy customers to praise? If so, to what extent? The effect of self-selection is even stronger for posting reviews on websites.
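The size of the self-selection effect depends on response rates that are usually unknown, but a rough sketch shows how much they can matter. The function below and all of its inputs are assumptions made up for illustration: 80% of customers are happy, happy customers respond 5% of the time, and unhappy customers respond 20% of the time.

```python
def observed_happy_share(true_happy_share, happy_response_rate, unhappy_response_rate):
    """Share of survey responses that come from happy customers."""
    # Fraction of all customers who are happy AND respond.
    happy = true_happy_share * happy_response_rate
    # Fraction of all customers who are unhappy AND respond.
    unhappy = (1 - true_happy_share) * unhappy_response_rate
    return happy / (happy + unhappy)


# Hypothetical inputs: 80% happy, 5% vs. 20% response rates.
share = observed_happy_share(0.80, 0.05, 0.20)
print(f"happy share among respondents: {share:.2f}")
```

With these made-up rates, about half of the responses come from unhappy customers, although they are only one fifth of the customer base. When both groups respond at the same rate, the survey recovers the true share.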