Monday, January 24, 2011

"Lies, Damn Lies, and Statistics" (Mark Twain, et. al.)


I haven’t admitted this to many people, but I was somewhat of a math geek in college.  While others were taking courses like “Underwater Fire Prevention” and “Remedial Basketweaving” as electives (please, no comments about that, hehehehe), I was taking 4000-level math courses.  So I like to think I know what I’m talking about here.

In my last blog post, I mentioned the importance of “looking beyond the numbers”.  While statistics was not a strength of mine, I do like to try to analyze them.  For instance, many years ago I was watching an Atlanta Braves baseball game (and baseball is just chock full of statistics).  One of the opponents had made it to first base, which led the announcer to say “The Braves lead the league in double plays”.  That sounds impressive, doesn’t it?  But what does it really mean?  Think about it.  To turn a double play, there has to be at least one runner on base.  Is that a good thing in itself?  What about a team that is last in the league in turning double plays?  Does that mean that they are a poor defensive team?  What if they have a strong pitching staff and hardly any of the opponents get on base?  Then that team has almost no chance of making a double play, right?  My point is that there is much more to statistics and numbers than meets the eye.

How does this relate to schools?  Not long ago a local newspaper reported that one school system here had a higher average SAT score than another, making it sound like that particular school system was superior.  But is it true?

Here is an over-simplified example, but it makes the point. Let’s say that a report indicates that the average SAT score of School A is 1550, and the average score of School B is 1515.  It certainly sounds like School A is the better school, right?  This is what the media likes to report - the “surface numbers”.  Now, let’s look a little deeper.  Let’s assume that both schools have 10 people who are ELIGIBLE to take the SAT (but not all of them take it).  Here are the results: 

School A (name and score):
Abe – 1600
Betty – 1550
Charlie – 1500
Dave – Did not take the SAT
Evelyn – Did not take the SAT
Frank – Did not take the SAT
George – Did not take the SAT
Harry – Did not take the SAT
Isabel – Did not take the SAT
Jack – Did not take the SAT

School B (name and score):
Ashley – 1600
Brad – 1550
Cathy – 1500
Devin – 1475
Eric – 1450
Francis – Did not take the SAT
Greg – Did not take the SAT
Horace – Did not take the SAT
Isaac – Did not take the SAT
Jim – Did not take the SAT

Using this example, School A has an average SAT score of 1550, and School B has an average of 1515.  Note that the averages the media reports are based only on those who actually take the SAT, and do not include those who don’t.  Do we STILL believe that School A is better than School B?  If only the top three of School B took the test, the schools would be even!
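For anyone who wants to check the arithmetic, here is a minimal Python sketch of the example.  The scores are the ones listed above; the variable and function names (school_a, school_b, reported_average, participation) are just labels I made up for illustration, and None stands for a student who did not take the SAT.

```python
# Scores from the example above; None marks a student who did not take the SAT.
school_a = [1600, 1550, 1500, None, None, None, None, None, None, None]
school_b = [1600, 1550, 1500, 1475, 1450, None, None, None, None, None]


def reported_average(scores):
    """Average over test takers only, the way the reported figure is computed."""
    taken = [s for s in scores if s is not None]
    return sum(taken) / len(taken)


def participation(scores):
    """Fraction of eligible students who actually took the test."""
    taken = [s for s in scores if s is not None]
    return len(taken) / len(scores)


for name, scores in (("School A", school_a), ("School B", school_b)):
    print(f"{name}: average {reported_average(scores):.0f}, "
          f"participation {participation(scores):.0%}")

# Average of School B's three highest test takers only.
top_three_b = sorted(s for s in school_b if s is not None)[-3:]
print("School B, top three only:", sum(top_three_b) / len(top_three_b))
```

The reported averages come out to 1550 and 1515, with participation of 30% and 50%, and School B's top three alone average 1550, the same as School A.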

Statistically speaking, a larger sample size will give a more accurate indication.  If all 10 people took the test from either school, then we would know the true average for the school, right?  In the aforementioned example, one school had 50% participation, and the other had 30%.  The school with 30% has a LOT more room for error than the school with 50%!  (With this over-simplified example there might be enough information to be “statistically accurate”, but we’re going to ignore that for now). 
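One rough way to picture that “room for error” is to ask how high or low the true 10-student average could be if the missing students had taken the test.  The sketch below does that in Python.  It rests on an assumption that is mine and not part of the example: that every score falls between 400 and 1600.  The names (MIN_SCORE, MAX_SCORE, full_average_bounds) are made up for illustration, and this is a worst-case bound, not a proper statistical margin of error.

```python
# Assumption (not from the example): every score falls between 400 and 1600.
MIN_SCORE, MAX_SCORE = 400, 1600

# Known scores from the example; each school has 10 eligible students.
known_scores = {
    "School A": [1600, 1550, 1500],
    "School B": [1600, 1550, 1500, 1475, 1450],
}
ELIGIBLE = 10


def full_average_bounds(known, eligible, low=MIN_SCORE, high=MAX_SCORE):
    """Lowest and highest possible average if every eligible student had a score."""
    missing = eligible - len(known)
    lowest = (sum(known) + missing * low) / eligible
    highest = (sum(known) + missing * high) / eligible
    return lowest, highest


for name, known in known_scores.items():
    lowest, highest = full_average_bounds(known, ELIGIBLE)
    print(f"{name}: true average somewhere between {lowest} and {highest} "
          f"(a spread of {highest - lowest} points)")
```

Under that assumption, School A's true average could be anywhere from 745 to 1585, while School B's could be anywhere from 957.5 to 1557.5.  The school with fewer test takers has the wider range of possibilities.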

Again, which is the better school as far as SAT scores?  Your GUESS is as good as mine, but you cannot say with certainty that one school is better than the other.  Yet this is what the media bases its reports on – the raw numbers.  So I beg you, please look beyond the numbers when statistics are involved.  Not everything is as it seems.