The Philosopher's Stone: STATISTICS PUZZLE

Monday, September 3, 2012

Nate Silver, as all political junkies know, is the baseball stats guru who has morphed into a politics stats guru, running the blog 538.com, which now appears in the NY TIMES. [538 is the number of votes in the Electoral College, for overseas readers of this blog.] Silver achieved a brilliant forecasting record in 2008, successfully predicting the outcome in 49 of the 50 states and all 35 Senate races. Silver currently gives Obama a 75% chance of victory. Now, if my rudimentary grasp of statistics is correct, that means that roughly, over a long run, someone with Obama's current chances against his opponent should win three-quarters of the time, and someone with Romney's current chances against his opponent should win roughly one-quarter of the time. Right?

But in 2008, if my memory is correct, Silver was giving Obama and McCain winning chances in the various states in the range of 70-90%, not in the range 96-99%. Which means that either his statistical estimates were way off or 2008 was a really, really anomalous roll of the dice.
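
Here is a rough sketch of the arithmetic behind that dilemma. It assumes, as the puzzle implicitly does, that the 50 state calls are independent of one another and that every call is right with the same probability p. The function name and the particular values of p are illustrative only, not anything taken from Silver's model.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k successes in n independent trials, each succeeding with probability p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_states = 50
for p in (0.70, 0.80, 0.90):
    expected_misses = n_states * (1 - p)
    print(
        f"p = {p:.2f}: expected misses = {expected_misses:.0f}, "
        f"P(at least 49 of 50 right) = {prob_at_least(49, n_states, p):.2e}"
    )
```

Under these assumptions, getting 49 of 50 right is a long shot even at p = 0.90 (a few percent), which is what makes the 2008 record look puzzling if the state calls really were independent.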

It is like adventures in white-water rafting or sky diving, which are billed as very dangerous, but in which virtually no one in fact ever dies.

It seems to me that if Silver is so often correct when he predicts presidential or senatorial races, then he should be assigning probabilities close to 1, not in the 75% range.

Am I missing something?

11 comments:

Anonymouse, September 3, 2012 at 11:29 AM
I suspect that much of Romney's current chance of winning is simply the system recognizing that crazy stuff could happen between now and November: the economy could crash, Obama could say something really stupid, etc. The basic idea is that a lead in the polls three months out is less determinative of the final result than the same lead a week before the election. The projection system likely recognizes this fact, and adjusts the percentages accordingly.

If this is correct, then the percentages for Obama should have gradually risen in the fall of 2008. It's not easy to track down Silver's 2008 reports, but here's an article showing that, by Oct. 16 of 2008, he was giving Obama a 95% chance of winning:
http://rawstory.com/news/2008/Survey_6_out_of_9_early_1016.html
Robert Paul Wolff, September 3, 2012 at 12:05 PM
That is really interesting. Let's allow some time to pass, and, assuming nothing catastrophic, see whether the odds on Obama rise steadily as days go by.
Jacob T. Levy, September 3, 2012 at 12:14 PM
Yes, Silver's model is explicitly time-to-election dependent.
Kris Rhodes, September 3, 2012 at 12:16 PM
I guess the probabilities tell the frequencies output by his model, but don't necessarily have to be interpreted as meaning "these would be the frequencies if you ran the election a large number of times." Instead, they could be interpreted as meaning "Here's how confident I am that person X will win this election."

In other words, the assumption is that the frequency of wins within the model corresponds, not to frequencies of wins in the real world, but rather, to levels of confidence about predictions concerning the real world.

On that reading, the more Silver turns out right about individual elections, the greater the reported percentages should be in his future predictions--because the past record of success should rationally make him more and more confident about his results.
Oren, September 3, 2012 at 1:20 PM
Is your "mental model" here that the state probabilities are of independent events? So that if there were 16 states with Obama probabilities at 75%, you'd expect 4 of them to have gone for McCain?

That seems like the wrong model, because it ignores - among other things - the correlation of outcomes between states. The Fair model that seems to be the granddaddy of these things had - I think - just 3 or 4 factors. In the simplest case you could imagine just one factor driving the deviation of all states from their "default" red/blue tendency. Then the state-by-state vote outcomes would be perfectly correlated - with no remaining independent degrees of freedom. And an Obama win nationally would go with wins in all the states at > 50% probability in the forecasts.
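
A minimal sketch of that contrast, under loudly stated assumptions: 16 hypothetical states, each with a 75% chance for the favorite, and a standard normal "shock" added to a fixed margin. In the one-factor case a single shock is shared by all states; in the independent case each state draws its own. None of this is the Fair model or Silver's model; the function name and parameters are made up for illustration.

```python
import random
from statistics import NormalDist

def simulate(n_states=16, p_win=0.75, shared=True, runs=20_000, seed=0):
    """Return the distribution of the number of states the favorite wins."""
    rng = random.Random(seed)
    # Margin (in shock standard deviations) that gives each state probability p_win.
    margin = NormalDist().inv_cdf(p_win)
    counts = [0] * (n_states + 1)
    for _ in range(runs):
        if shared:
            # One common factor: every state moves together.
            shock = rng.gauss(0, 1)
            wins = n_states if margin + shock > 0 else 0
        else:
            # Independent shocks: each state is its own coin flip.
            wins = sum(margin + rng.gauss(0, 1) > 0 for _ in range(n_states))
        counts[wins] += 1
    return [c / runs for c in counts]

one_factor = simulate(shared=True)
independent = simulate(shared=False)
print("one factor:  P(all 16) =", one_factor[16], " P(none) =", one_factor[0])
print("independent: P(all 16) =", independent[16])
```

With a shared factor the favorite sweeps all 16 states about three-quarters of the time and loses all 16 otherwise, even though each individual state is "only" at 75%. With independent shocks a clean sweep is rare and about 4 upsets is typical.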
Unknown, September 3, 2012 at 2:54 PM
This is a slightly tricky business. If the outcome Silver says should happen 75% of the time occurs 99% of the time (in the long run, etc.), then yes, he really should be predicting a probability of 99%. (Otherwise his probability forecasts are not, as the jargon has it, "calibrated".) But I'd be extremely surprised if his models didn't include a lot of correlation across states (as in, maybe a 75% chance of winning PA., and a 75% chance of winning New Jersey, but a 90% chance of winning either both or neither - example made up). So his 49/50 states isn't really 50 separate trials. This might actually make a good homework problem for my students...
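
The made-up PA/NJ numbers above can be worked through directly. This sketch only uses the three figures from the example (75%, 75%, and 90% for "both or neither"); the function name is hypothetical, and the independent case is included purely for comparison.

```python
def joint_from_marginals(p_a, p_b, p_same):
    """Joint win probabilities for two states, given each state's win probability
    and the probability that they go the same way (both won or both lost)."""
    # P(both) + P(neither) = p_same  and  P(both) - P(neither) = p_a + p_b - 1
    p_both = (p_same + p_a + p_b - 1) / 2
    p_neither = p_same - p_both
    return {
        "both": p_both,
        "neither": p_neither,
        "a_only": p_a - p_both,
        "b_only": p_b - p_both,
    }

correlated = joint_from_marginals(0.75, 0.75, 0.90)
print(correlated)

# For comparison: if the two states were independent, "both or neither"
# would happen with probability 0.75 * 0.75 + 0.25 * 0.25.
print("independent P(same) =", 0.75 * 0.75 + 0.25 * 0.25)
```

In the correlated case the two states split only 10% of the time, against 37.5% under independence, so calling both correctly is far less of a feat than two separate 75% calls would suggest.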
Robert Paul Wolff, September 3, 2012 at 4:33 PM
Sigh. This is what happens when an idiot asks experts. I just want to know how much sleep I should lose at night worrying about whether Obama is going to win.
Marinus, September 3, 2012 at 7:20 PM
It turns out judging probabilities of one-time events is an immensely deep philosophical issue for which there are no satisfactory, or even commonly accepted models. I took a lot of joy and knowledge out of reading D.H. Mellor's 'Probability: A Philosophical Introduction', which makes this point powerfully.

One of the main issues with any attempt to give a model for the probability of one-off events is just the one you identify, that it's hard to explain how the probabilities can be anything except 1 or 0, since the events either happen or they don't. It gets worse: trying to make use of counterfactuals to flesh it out doesn't really help, because counterfactual versions of events are different events, and the same problem (that it looks like they should have probabilities either of 1 or 0) infects them as well.

Fascinating stuff!
Robert Paul Wolff, September 4, 2012 at 2:36 AM
I think there is only one rational solution: I need to go into a deep sleep and awaken at 8 p.m. on November 6th, just in time to watch the first results coming in [this is after early voting, of course.]
Dan Hicks, September 4, 2012 at 7:48 AM
I suspect that none of the respondents above have read this page: http://fivethirtyeight.blogs.nytimes.com/methodology/

Strictly speaking, the probabilities are frequencies of wins in a simulation, not levels of confidence or a probability of a single event. The 538 "model" is a computer simulation; given polling data in every state, plus some economic data, it calculates electoral votes and so a winner. (The fancy statistical analysis comes in here.) On every run of the model, the computer automatically adds a little random noise to the (known) input variables. (How much noise to add is, again, decided using fancy statistical analysis.) Doing this a few hundred thousand times is called a Monte Carlo simulation. Currently, Obama wins about 75% of the runs of the simulation. If you scroll down the 538 page far enough, you'll see a chart labeled "Electoral Vote Distribution." This shows how many times each candidate got a given number of electoral votes in runs of the simulation.
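
To make the idea concrete, here is a toy Monte Carlo sketch in the spirit of that description. Everything in it is an assumption for illustration: the five "states", their electoral votes and polling leads are invented, the noise is a simple national shock plus a per-state shock, and the run count is kept small for speed. It is not the 538 model.

```python
import random

# Hypothetical toy inputs: (name, electoral votes, polling lead for candidate X in points).
STATES = [
    ("A", 20, 6.0),
    ("B", 15, 2.0),
    ("C", 10, 0.5),
    ("D", 18, -1.5),
    ("E", 12, -5.0),
]

def run_simulation(runs=50_000, national_sd=2.0, state_sd=3.0, seed=1):
    """Return (fraction of runs X wins, histogram of X's electoral votes)."""
    rng = random.Random(seed)
    total_ev = sum(votes for _, votes, _ in STATES)
    needed = total_ev // 2 + 1  # simple majority in this toy setup
    ev_counts = {}
    wins = 0
    for _ in range(runs):
        # Noise shared by all states in this run, plus per-state noise below.
        national = rng.gauss(0, national_sd)
        ev = sum(
            votes
            for _, votes, lead in STATES
            if lead + national + rng.gauss(0, state_sd) > 0
        )
        ev_counts[ev] = ev_counts.get(ev, 0) + 1
        wins += ev >= needed
    return wins / runs, ev_counts

win_fraction, ev_counts = run_simulation()
print(f"X wins {win_fraction:.1%} of simulated runs")
for ev in sorted(ev_counts):
    print(f"{ev:3d} electoral votes: {ev_counts[ev]} runs")
```

The reported "chance of winning" is just the fraction of runs X wins, and the histogram plays the role of the "Electoral Vote Distribution" chart.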
P. J. Grath, September 4, 2012 at 9:29 AM
I hope you will not sleep through beautiful September and October! Only at night, please. Set aside a limited time period per day for the news, and eschew radio, television, and newspapers the rest of the 24 hours. The stories repeat so many times that you won't be in danger of missing anything.

Probabilities! No! (She ran away screaming....)