The comparison with hitting performance seems off somehow. First of all, if you were to ask me what the chances were that a batter would strike out, my first reaction would be to reply, "Who's pitching?" I am reminded of tourist attractions like white-water rafting, which are advertised as very exciting and risky but on which no one has ever been lost. How can it be risky if no one has ever died?

I am really ignorant of statistical theory so I am looking for guidance here.

## 9 comments:

I'm probably way more statistically ignorant than you, Bob, and I share your bafflement at the hitting and free-throw analogies. A clearer analogy for me is a horse race. The odds of the favorite winning are 4 to 1. At first blush one would think the favorite should win every time (s/he's the better horse, after all), but we know that's not what happens. Due to a constellation of factors (track conditions, how the favorite and all the other horses are feeling that day, and a bunch of other stuff we don't fully understand and can't predict), a horse other than the favorite sometimes wins. So, if I understand Cohn, he's saying, "In the past, when the pattern of polls looked like this, the favorite won 4 times for every time the opponent won. Given what we know now, Hillary is very likely to win, but there's a lot we don't know: October surprises, the fickleness of the electorate, who turns out to vote, unknown factors, etc., etc. So this could be the unlikely year that the opponent wins. I, Nate Cohn, would bet you a thousand dollars at 4 to 1 odds, but I wouldn't be willing to say, 'I'm so certain that Hillary will win that I'll just give you a thousand dollars if she doesn't.'" (Uh-oh, I'm losing my nerve because I'm afraid your question may have been at a more sophisticated level than my response, but what the hell.)
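To make the horse-race arithmetic concrete: if "4 to 1" is read as odds *in favor of* the favorite (four wins for every loss, as the comment above reads it), the implied probability is 4/(4+1) = 0.8, which still leaves the rest of the field one race in five. Bookmakers usually quote odds *against*, so it is worth checking which convention is meant. A minimal sketch; the function name is just for illustration.

```python
from fractions import Fraction


def odds_on_to_probability(favorable: int, unfavorable: int) -> Fraction:
    """Convert odds of `favorable` to `unfavorable` IN FAVOR of an outcome
    into the implied probability of that outcome (assumed convention)."""
    if favorable < 0 or unfavorable < 0 or favorable + unfavorable == 0:
        raise ValueError("odds must be non-negative and not both zero")
    return Fraction(favorable, favorable + unfavorable)


p_favorite = odds_on_to_probability(4, 1)
print(p_favorite, float(p_favorite))  # 4/5 0.8
print(float(1 - p_favorite))          # 0.2, the chance some other horse wins
```

An 80% favorite losing one time in five is exactly what the odds say should happen, not a failure of the odds.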

Perhaps there are different concepts/conceptions of probability inherent in these probability judgments? The subjectivist conception of probability seems better suited to an election forecast, where the result is a one-off, whereas a frequentist conception seems better suited to a baseball player's chances of striking out.

http://plato.stanford.edu/entries/probability-interpret/#ClaPro

I'd also recommend reading Nate Silver's explanation of his election model/forecast:

http://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast/

I am totally ignorant of probability theory, but doesn't the number of variables affect how we should approach the situation?

In the pitcher-hitter situation there are a limited number of variables: the ability of the hitter and of the pitcher, maybe the wind, maybe the sun. Someone who knows more about baseball than I do could list them easily.

In the coming election the number of variables which could affect the outcome is staggering: a sudden fall in the stock market due to a devaluation in China (or somewhere else) producing a worldwide economic crash, Trump really blowing it, Hillary really blowing it, a massive terrorist attack, Putin attacking the Ukraine (which he is "probably" not going to do, but here even more variables appear). Each variable depends on an incredible number of other variables (how many variables do we need to calculate the performance of the U.S. or the Chinese economy until election day?).

For the pitcher-hitter matchup, Sabermetrics uses either the Log5 method or the OBP method.
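For readers curious what Log5 actually computes, here is a minimal sketch (only Log5 is shown, not the OBP method). It combines a batter's rate, a pitcher's rate, and the league rate by multiplying the batter's and pitcher's odds and dividing by the league odds. The player rates below are hypothetical, chosen only to illustrate the formula.

```python
def log5(batter_rate: float, pitcher_rate: float, league_rate: float) -> float:
    """Log5 estimate of the probability of an event (e.g. a strikeout) in a
    single batter-pitcher matchup, from each player's rate and the league rate."""
    for r in (batter_rate, pitcher_rate, league_rate):
        if not 0.0 < r < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    # Odds form: matchup odds = batter odds * pitcher odds / league odds,
    # written here as unnormalized "yes" and "no" weights.
    yes = batter_rate * pitcher_rate / league_rate
    no = (1 - batter_rate) * (1 - pitcher_rate) / (1 - league_rate)
    return yes / (yes + no)


# Hypothetical: a strikeout-prone batter (0.30) vs. a strikeout pitcher (0.28),
# in a league where about 23.4% of at-bats end in a strikeout.
print(round(log5(0.30, 0.28, 0.234), 3))
```

A sanity check on the design: a league-average batter facing a given pitcher gets exactly the pitcher's rate, which is why "Who's pitching?" matters so much to the answer.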

But Cohn appears to be computing the ratio of strikeouts (SO) to at-bats (AB) for 2016, 26061/111249=0.23425828546. That would give you the league average, a good "prior" probability.
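The league-average arithmetic, using the 2016 totals quoted above, plus one common way such a league rate serves as a "prior": blending it with a player's own record as if it were worth some number of extra at-bats. That blending step (a beta-binomial style posterior mean) is my illustration, not something the comment or Cohn describes, and the 200 pseudo at-bats and the player's 12-for-30 record are hypothetical.

```python
# League totals quoted in the comment (2016 strikeouts and at-bats).
LEAGUE_SO = 26061
LEAGUE_AB = 111249

league_rate = LEAGUE_SO / LEAGUE_AB
print(f"{league_rate:.4f}")  # 0.2343


def shrunk_rate(so: int, ab: int, prior_rate: float, prior_ab: float) -> float:
    """Blend a player's observed SO/AB with a prior rate, treating the prior
    as `prior_ab` pseudo at-bats (assumed beta-binomial posterior mean)."""
    return (so + prior_rate * prior_ab) / (ab + prior_ab)


# Hypothetical player: 12 strikeouts in 30 at-bats, prior worth 200 at-bats.
print(round(shrunk_rate(12, 30, league_rate, 200), 3))
```

With only 30 at-bats the player's raw 0.400 gets pulled most of the way back toward the league average; as at-bats accumulate, the player's own record dominates.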

Probabilities in frequentist statistics are not always a useful way to think about the world. If the election were held over and over and over under the same conditions, HRC would win 80% of the time. But what does this mean? The replication simply can't be done in the real world: conditions always change and each election is different. I think the bigger issue is the tendency for people to try to reduce uncertainty to risk. It makes sense to think of the probability of drawing a certain card from a shuffled deck, or of rolling a six with a fair die, but as Keynes famously said, "sometimes we simply do not know."
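What the frequentist reading claims can at least be simulated, even if the real election cannot be rerun: repeat an event with an 80% win probability many times and watch the win fraction settle near 0.8. This is a thought experiment under the strong assumption that every "rerun" is identical and independent, which is exactly the assumption the comment questions. The function name and seed are illustrative.

```python
import random


def long_run_frequency(p_win: float, trials: int, seed: int = 0) -> float:
    """Simulate `trials` independent repetitions of an event with win
    probability `p_win` and return the fraction of wins."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    wins = sum(rng.random() < p_win for _ in range(trials))
    return wins / trials


for n in (10, 100, 10_000, 100_000):
    print(n, long_run_frequency(0.8, n))
```

With 10 reruns the fraction can easily land at 0.6 or 1.0; only over many reruns does it hug 0.8. A single election is one draw, which is why "80%" says little about this particular November.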

The experiment that is repeated is the polling experiment, not the voting in the polling booth. It is for this experiment that the sample means are estimated. Establishing the relation between polling results and what happens on election day is another matter.
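A sketch of the repeatable experiment this comment points to: draw many polls from the same electorate and look at how the poll shares scatter. It assumes idealized simple random sampling with no nonresponse or weighting, and the true share (52%), poll size (800), and number of polls are hypothetical. Nothing here models election day itself; that link is the separate question the comment raises.

```python
import random
import statistics


def simulate_polls(true_share: float, sample_size: int, n_polls: int, seed: int = 0):
    """Draw `n_polls` simple random samples of `sample_size` voters from a
    population in which `true_share` back the candidate; return each poll's
    observed share (idealized: no nonresponse, no weighting)."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_share for _ in range(sample_size)) / sample_size
        for _ in range(n_polls)
    ]


shares = simulate_polls(true_share=0.52, sample_size=800, n_polls=2000)
print("mean of poll shares:", round(statistics.mean(shares), 3))
# Share of polls showing the candidate strictly ahead (ties count as no lead).
print("fraction of polls showing a lead:", sum(s > 0.5 for s in shares) / len(shares))
```

Under these assumptions roughly seven polls in eight show the candidate ahead even though her true margin is fixed, so "how often a poll shows her winning" and "how likely she is to win" are different quantities.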

Now that, I. M. Flaud, is a really interesting point, which I had not considered. As I understand you, you are saying that what Sam Wang or Nate Silver or Nate Cohn is saying is that eighty percent of the time, or whatever, such a poll, conducted in such a way, will show Clinton winning. Is that right?
