Statistics for fun - The Motive Center
I'm with you on statistics. I avoided the subject throughout my academic career and am now having to catch up; it would have been easier when I was 20-something. Now I'm seeing how important a basic understanding of probability is. It's not intuitive, and most people never learn to think in those terms, so they make the kinds of mistakes you allude to.
This confuses me:
"Then, when they had to get it down to a mere five per category, none of them had enough votes to get on the list. I'd bet some did."
Some did what?
I didn't "get" statistics until my third course in it, while getting my master's. It suddenly clicked for me. It's typically taught by statisticians and mathematicians who grasp it intuitively, and it probably took my brain that long to catch up. Nowadays, it's second nature to question certain numbers. I strongly recommend Darrell Huff's "How to Lie with Statistics" as a great introduction to how to think with and about statistics, instead of just doing them, which is what most courses teach you.
And now I don't even know what I meant by that phrase anymore. I think maybe I should have said "get close" after that. I'll fix.
I'll look for the How to Lie book.
I was forced to learn the subject because regulations require statistical analysis of water quality data. Unfortunately, the regs are badly written and require things that aren't well justified by the nature of the data. In any case, I had to come up to speed in order to understand what the regs meant and what was actually justifiable. Now I'm in danger of becoming a statistics geek. (Did you know the Challenger disaster was in large part caused by failure to properly handle censored data?)
No! What's the story on that? Censored data? Does that mean they didn't have the data they needed because it was censored?
"Censored" in statistics means data that are known only to be above or below some threshold value. In my field, the threshold values are laboratory limits of detection, which "censor" the numbers below the limits. Often, a large proportion of the data are below the detection limit. These limits are based on technical processes, not health risks, so the censored data might be important, yet conventional statistical methods cannot make use of them.
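To make the point about nondetects concrete, here is a minimal Python sketch using invented numbers and a hypothetical detection limit of 1.0 (no real water quality data). The `Observation` class and both function names are made up for illustration. The only assumption is that concentrations can't be negative, so each nondetect reported as "<1.0" lies somewhere between 0 and 1.0. That alone bounds the sample mean. Simply deleting the nondetects, the common shortcut, gives an estimate above even the upper bound. This is just a bounding argument, not one of the censored-data methods Helsel describes.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    value: float      # measured value; for a nondetect, the detection limit (lab reports "<value")
    nondetect: bool   # True if the result was below the detection limit


def mean_bounds(obs):
    """Lower and upper bounds on the sample mean.

    Assumes concentrations are non-negative, so each nondetect's true value
    lies in [0, detection limit). Substituting 0 gives the lower bound;
    substituting the detection limit gives the upper bound.
    """
    n = len(obs)
    low = sum(0.0 if o.nondetect else o.value for o in obs) / n
    high = sum(o.value for o in obs) / n
    return low, high


def mean_dropping_nondetects(obs):
    """The common shortcut: throw away nondetects and average what's left."""
    detects = [o.value for o in obs if not o.nondetect]
    return sum(detects) / len(detects)


# Hypothetical data: 3 detects and 5 nondetects at a detection limit of 1.0.
samples = [
    Observation(2.5, False),
    Observation(1.0, True),
    Observation(1.8, False),
    Observation(1.0, True),
    Observation(1.0, True),
    Observation(4.0, False),
    Observation(1.0, True),
    Observation(1.0, True),
]

if __name__ == "__main__":
    low, high = mean_bounds(samples)
    print(f"true sample mean lies between {low:.4f} and {high:.4f}")
    print(f"mean after dropping nondetects: {mean_dropping_nondetects(samples):.4f}")
```

With these numbers the mean must fall between about 1.04 and 1.66, while dropping the nondetects reports about 2.77. Every detect is at or above the detection limit, so deletion always biases the mean upward.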
In the Challenger case, engineers were concerned about the possible failure of O-rings at low temperatures. They had done a number of tests at different temperature ranges, and the night before the launch, they sent a graph to managers in an attempt to convince them to scrub. The graph showed the number of damage incidents as a function of temperature, but it had a major flaw: tests where no damage had occurred were not plotted. (This is very common; in fact, it is the usual way people treat censored data. They exclude all the data below the threshold because it isn't amenable to conventional analysis.) No data were available for the very low temperatures expected at launch time, because no testing had been done at those temperatures.
The engineers' graph showed only a few points, with no apparent pattern. The Rogers Commission later produced a graph constructed the same way as the original but with the non-detect values added, and it showed a pattern of increasing failures with decreasing temperature. A graph showing the proportion of failures by temperature range makes it strikingly obvious that low temperatures presented a greatly increased risk of failure.
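Here is a rough sketch of why the proportion-by-range view works and why dropping the zero-damage tests hides the pattern. The test records and temperature ranges below are invented for illustration. They are NOT the actual O-ring data, and `damage_proportion_by_range` is a hypothetical helper.

```python
# Hypothetical test records: (temperature in deg F, number of damage incidents).
# Invented for illustration; these are NOT the real O-ring test results.
TESTS = [
    (53, 3), (57, 1), (58, 1), (63, 1),
    (66, 0), (67, 0), (67, 0), (68, 0), (69, 0), (70, 1), (70, 1),
    (72, 0), (73, 0),
    (75, 2), (76, 0), (78, 0), (79, 0), (81, 0),
]

# Temperature ranges: lower bound inclusive, upper bound exclusive (also hypothetical).
# The coldest range has no tests at all, like the launch-day temperatures.
RANGES = [(30, 50), (50, 65), (65, 75), (75, 85)]


def damage_proportion_by_range(tests, ranges, drop_zero_damage=False):
    """Fraction of tests in each temperature range that showed any damage.

    If drop_zero_damage is True, tests with no damage are discarded first,
    mimicking a graph that omits the "non-detect" results.
    Ranges with no tests map to None.
    """
    if drop_zero_damage:
        tests = [t for t in tests if t[1] > 0]
    result = {}
    for lo, hi in ranges:
        in_range = [damage for temp, damage in tests if lo <= temp < hi]
        if in_range:
            damaged = sum(1 for d in in_range if d > 0)
            result[(lo, hi)] = damaged / len(in_range)
        else:
            result[(lo, hi)] = None  # no tests were run at these temperatures
    return result


if __name__ == "__main__":
    print("all tests:         ", damage_proportion_by_range(TESTS, RANGES))
    print("zero-damage dropped:", damage_proportion_by_range(TESTS, RANGES, drop_zero_damage=True))
```

With every test included, the damaged fraction falls from 1.0 below 65 °F to 2/9 and then 0.2 in the warmer ranges. With the zero-damage tests dropped, every populated range reads 1.0, so temperature appears to make no difference. In both versions the coldest range is empty: no data, not "no risk."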
(This is condensed from the introduction to Dennis Helsel's extremely useful book, "Nondetects and Data Analysis," Wiley-Interscience 2005. It's the most powerful introduction to a book I've ever read.)