Statistics for fun - The Motive Center
I'll look for the How to Lie book.
I was forced to learn the subject because regulations require statistical analysis of water quality data. Unfortunately the regs are badly written and require things that aren't well justified by the nature of the data... in any case I had to come up to speed in order to understand what the regs meant and what was actually justifiable. Now I'm in danger of becoming a statistics geek. (Did you know the Challenger disaster was in large part caused by failure to properly handle censored data?)
No! What's the story on that? Censored data? Does that mean they didn't have the data they needed because it was censored?
"Censored" in statistics means data that are known only to be above or below some threshold value. In my field, the threshold values are laboratory limits of detection, which "censor" the numbers below the limits. Often, a large proportion of the data are below the detection limit. These limits are based on technical processes, not health risks - so the censored data might be important, yet conventional statistics cannot make use of them.
In the Challenger case, engineers were concerned about the possible failure of O-rings at low temperatures. They had done a number of tests at different temperature ranges, and the night before the launch, they sent a graph to managers in an attempt to convince them to scrub. The graph showed the number of damage incidents as a function of temperature, but it had a major flaw: tests where no damage had occurred were not plotted. (This is very common. In fact, it is the usual way people treat censored data: they exclude all the data below the threshold, because it isn't amenable to conventional analysis.) No data were available for the very low temperatures expected at launch time, because no testing had been done at those temps.
The graph showed only a few points, with no apparent pattern. The Rogers Commission produced a graph, constructed the same way as the original but with the non-detect values added, indicating a pattern of increasing failures with decreasing temperatures. A graph showing the proportion of failures by temperature range makes it strikingly obvious that low temperatures presented a greatly increased risk of failure.
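The difference between the two graphs can be sketched in a few lines of Python. The records below are invented for illustration and are NOT the actual O-ring data; the 65-degree split and the function names are likewise my own assumptions. The point is only to show how dropping the zero-damage results hides the pattern, and how a proportion by temperature range brings it back.

```python
# Each record is (temperature_F, damage_incidents).
# Made-up illustrative numbers, NOT the real Challenger/shuttle record.
records = [
    (52, 2), (56, 1), (60, 1), (64, 0), (66, 0), (68, 1), (69, 0),
    (71, 0), (73, 0), (74, 1), (76, 0), (77, 0), (79, 0), (80, 0),
]


def damaged_only(rows):
    """The flawed view: keep only results where damage was observed."""
    return [(temp, n) for temp, n in rows if n > 0]


def proportion_damaged(rows, threshold_f):
    """Fraction of results with any damage, below and at/above a threshold."""
    cold = [n > 0 for temp, n in rows if temp < threshold_f]
    warm = [n > 0 for temp, n in rows if temp >= threshold_f]
    return sum(cold) / len(cold), sum(warm) / len(warm)


plotted = damaged_only(records)
cold_rate, warm_rate = proportion_damaged(records, threshold_f=65)

print("points on the damage-only graph:", plotted)
print(f"proportion damaged below 65 F: {cold_rate:.2f}")
print(f"proportion damaged at or above 65 F: {warm_rate:.2f}")
```

In this toy data the damage-only view leaves five points scattered from 52 to 74 degrees, which does not look like much of a trend. Once the zero-damage results are counted, three of the four cold results show damage against two of the ten warm ones.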
(This is condensed from the introduction to Dennis Helsel's extremely useful book, "Nondetects and Data Analysis," Wiley-Interscience, 2005. It's the most powerful introduction to a book I've ever read.)