Electoral College Meta-Analysis (election.princeton.edu)

From Prof. Sam Wang of Princeton University.

This page is available online at http://synapse.princeton.edu/~sam/pollcalc.html

Below is a meta-analysis directed at the question of who will win the Electoral College. Meta-analysis provides more objectivity and precision than looking at one or a few polls, and in the case of election prediction gives a more accurate current snapshot. The calculations are based on all available recent state polls, which are used to estimate the probability of a Bush/Kerry win, state by state. These are then used to go through all possible combinations of battleground state results. The effects of undecided voters, turnout, and polling bias are calculated using the bias analysis. Here are the full methods.

This site gets over 30,000 visitors per day (site meter) (November visits). A backup site is here. I do not accept donations. Express your support politically by giving through ActBlue. Republicans may donate through the NRSC. Use this site to help decide where to get out the vote. More than anything else, this election is about voter turnout.

Monday, November 1, 2004 at 12:00PM noon Eastern time

Median outcome, decided voters: Kerry 252 EV, Bush 286 EV (±40 EV MoE) (probability map)

Popular Meta-Margin among decided voters (explanation): Bush leads Kerry by 0.9%

Predicted median with undecideds: Kerry 280 EV, Bush 258 EV (probability map)

Electoral prediction with undecideds and turnout: Kerry 323 EV, Bush 215 EV (probability map)

Popular vote prediction with undecideds and turnout: Kerry 50%, Bush 48%, Nader/other 2%

Commentary (Recent comments)

Monday, November 1, 8:45PM: For all of you writing me about the cell phone idea, I do not think this objection has merit. I have read the recent Zogby poll. I will write about this after the election.

Monday, November 1, 7:15PM: Upon reflection the last bit involves double-counting of turnout. Perhaps this factor should be less than 2.5%. This will be reflected when I incorporate the last polls.

Monday, November 1, 6:30PM: As expected, I've been getting mail criticizing my estimates of the undecided and turnout adjustments. Errors in either direction (too high or too low) reflect a failure in my estimates regarding registration, registered vs. likely voter estimates, overseas voting patterns, and so on. Why don't we postpone discussion of that until tomorrow, when we know the difference between my projection and the true outcome. We can then use that difference as an approximate measure of my partisanship. Anyway, I have given you enough information to let you make your own predictions.

But...the least I owe you is a back-of-the-envelope calculation. Lower and upper bounds can be set using Gallup's last poll. This shows Kerry +2 among RVs and Bush +2 among LVs. The difference between these sets an upper bound of +4%. To set a lower bound: where data are available, new registrations give Democrats an approximately 0.5% edge in swing states; these new registrants are first-time voters who may fail LV screens. Also, the election will be high-turnout, suggesting that at least some of the 4% gap in Gallup will be filled. For instance, a turnout increase from 50% to 60% would lead to +0.8%. Together, these two factors sum to +1.3%, a lower bound.

Finally: new registrants and overseas voters skew anti-Bush by a large margin, and 527s are pouring massive amounts of money into GOTV activities. I simply do not know what these will amount to. Nor do I know to what extent the 4 million evangelicals or Rove's 72-hour operation we hear about are going to materialize for Bush. In the end I just set the difference somewhere between the lower and upper bounds, to get 2.5%.

Monday, November 1, 4:45PM: Gambling is a vice. But putting that aside, looking at TradeSports I think that a good cautious bet is to take equal positions against Bush to win the election (i.e. SELL) and against Bush to win 250 or more EV (again, SELL). One way to do this quickly is to click on 'Live Help' at their site and tell the support representative you want to do a manual credit card charge. Other sites to try are here, here, and here.

Monday, November 1, 12:00PM noon: Here are my final calculations and predictions (though I will still recalculate at the end of the day). Extended discussion and supporting links can be found in the November 1, 8:00AM update. I have a brochure for your use to follow the returns on Tuesday night and test my assumptions (and yours). I will release the brochure by 5:00PM Eastern time today. Delayed again. Sorry, sorry - this thing is taking forever...

The map below shows Kerry and Bush's win probabilities, individually by state. The closer a state is to a tossup, the closer it will be to white. The map uses undecided and turnout assumptions; maps that do not make these assumptions can be seen by clicking in the box above. Note that unlike the map, the median projection takes compound events into account. Thus the map EV total is often not the same as the median EV total. An explanation is here. Click on the map for an interactive pop-up (thanks to Drew Thaler). For difficult browsers here is a static map.

Overview: The basic calculation derives from polls only. Using statistical methods of meta-analysis, I use polls to calculate a starting point, referred to as "decided voters only." This result is an uncorrected snapshot of where the polls stand. In addition to this, I estimate the effects of last-minute undecided/uncommitted voter decisions and differential turnout. My estimates are supported by evidence, but are by no means certain. Results based on uncertain assumptions are clearly labeled. To let you try your own assumptions, a table of medians with different bias values is given here.

The basic decided voter result. The median of Kerry 252 EV, Bush 286 EV among decided voters was calculated from 152 polls taken in 23 battleground states, stepping probabilistically through all possible outcomes. Most of these polls were completed between October 24 and 31. The EV estimate carries a large amount of uncertainty: the 95% confidence interval is ±40 EV. Thus, if only decided voters counted, the nominal Kerry win probability would be 18%, or 4-1 in favor of Bush.

Decided voters only (% Kerry win probability): AR 2, AZ 1, CO 2, FL 24, HI 67, IA 38, ME 100, MI 90, MN 72, MO 1, NC 0, NJ 100, NV 8, NH 96, NM 4, OH 38, OR 99, PA 85, TN 1, VA 1, WA 100, WV 0, WI 25.
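For readers who want to see what "stepping through all possible outcomes" can look like, here is a minimal sketch in Python. It is not the MATLAB script used for this site. It assumes that states are independent, that every state is winner-take-all (Maine's district split is ignored), and that the non-battleground states are certain. The electoral vote counts are the 2004 apportionment, the function names are made up for illustration, and the rounded percentages are the ones listed just above, so the output should only land in the neighborhood of the published numbers.

```python
# Sketch only: exact Electoral College distribution from per-state win probabilities.
# Each entry is (2004 electoral votes, Kerry win probability in percent),
# taken from the rounded "decided voters only" list.
BATTLEGROUNDS = {
    "AR": (6, 2), "AZ": (10, 1), "CO": (9, 2), "FL": (27, 24), "HI": (4, 67),
    "IA": (7, 38), "ME": (4, 100), "MI": (17, 90), "MN": (10, 72), "MO": (11, 1),
    "NC": (15, 0), "NJ": (15, 100), "NV": (5, 8), "NH": (4, 96), "NM": (5, 4),
    "OH": (20, 38), "OR": (7, 99), "PA": (21, 85), "TN": (11, 1), "VA": (13, 1),
    "WA": (11, 100), "WV": (5, 0), "WI": (10, 25),
}

# Assumption: CA, CT, DC, DE, IL, MA, MD, NY, RI and VT are certain for Kerry.
SAFE_KERRY_EV = 149


def ev_distribution(states, safe_ev):
    """Return {Kerry EV total: probability}, folding in one state at a time."""
    total = sum(ev for ev, _ in states.values())
    dist = [0.0] * (total + 1)
    dist[0] = 1.0
    for ev, pct in states.values():
        p = pct / 100.0
        new = [0.0] * (total + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[k] += mass * (1.0 - p)   # Kerry loses this state
            new[k + ev] += mass * p      # Kerry wins this state
        dist = new
    return {safe_ev + k: mass for k, mass in enumerate(dist)}


def median_ev(dist):
    """Smallest EV total at which the cumulative probability reaches 50%."""
    running = 0.0
    for ev in sorted(dist):
        running += dist[ev]
        if running >= 0.5:
            return ev
    return max(dist)


dist = ev_distribution(BATTLEGROUNDS, SAFE_KERRY_EV)
kerry_median = median_ev(dist)
kerry_win = sum(mass for ev, mass in dist.items() if ev >= 270)
tie = dist.get(269, 0.0)

print(f"median Kerry EV: {kerry_median}")
print(f"Kerry win probability: {kerry_win:.1%}")
print(f"269-269 tie probability: {tie:.1%}")
```

Folding in one state at a time covers all 2^23 combinations without listing them, which is why the calculation is quick even with many states.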

Rank order of states: States currently in play in the 20-80% probability range, indicating a near-tie, are in bold. Turnout and how the undecideds break will shift which states are at a near-tie, but the order, from most Democratic to most Republican, should stay about the same.

Decided voters only: Democratic <- ME/WA/NJ/OR/NH(95-100%) / MI / PA / MN / HI / IA / OH / WI / FL / NV / NM/AR/CO/TN/MO/AZ/WV/VA/NC(0-5%) -> Republican

With undecided voters assigned. Undecided voters typically end up voting against the incumbent. In previous presidential races this has given a 2.5 ± 2.0% advantage to the challenger. I currently estimate that 3.0% of voters are undecided. A 3-1 Kerry-Bush split gives a +1.5% net advantage to Kerry. This leads to a median EV estimate: counting undecided voters, Kerry 280 EV, Bush 258 EV, and a nominal Kerry win probability of 71%, or 2.4-1 in Kerry's favor.

Turnout estimates and other corrections. The principal factor not measured by polls is turnout. Pollsters ask respondents questions to determine if they are likely to vote. However, this cannot capture efforts by voter turnout organizations. In addition, newly registered voters have no track record. Finally, telephone polls may not accurately sample the voting population. I estimate that these factors sum to an advantage in battleground states of 2 to 3% for Kerry. As I am sure you are all aware, this number cannot be known with certainty. With that caveat in mind, I use +2.5% as my turnout figure. Combined with the +1.5% undecided allocation this makes a +4.0% bias as plugged into the MATLAB script. This leads to my final prediction. Predicted electoral outcome (11/1/2004 noon EST): Kerry 323 EV, Bush 215 EV, nominal Kerry win probability near 100%.

Note that all of these probabilities are conditional on the turnout and undecided voter assumptions being correct. The chance that I am wrong makes the true probability substantially lower than this! As Niels Bohr (and Yogi Berra) said, "Prediction is hard, especially of the future." Just for the record, my gut estimate of the likelihood of a Kerry win is about 6-1 in favor.

Back to predictions: Based on the probabilities below, of the 23 states modeled, the expected number of states Kerry wins is approximately 16, for a total of 25 states plus the District of Columbia.

Prediction, undecideds assigned, plus turnout (% Kerry win probability): AR 48, AZ 10, CO 47, FL 90, HI 99, IA 96, ME 100, MI 100, MN 100, MO 34, NC 1, NJ 100, NV 72, NH 100, NM 60, OH 95, OR 100, PA 100, TN 8, VA 12, WA 100, WV 13, WI 91.
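The figure of about 16 states can be checked by hand, because the expected number of wins is just the sum of the individual win probabilities. A small sketch in Python follows. The count of nine safe Kerry states outside the 23 modeled (CA, CT, DE, IL, MA, MD, NY, RI, VT) is my assumption about how the total of 25 is reached.

```python
# Kerry win probabilities in percent (undecideds assigned, plus turnout), as listed above.
KERRY_WIN_PCT = {
    "AR": 48, "AZ": 10, "CO": 47, "FL": 90, "HI": 99, "IA": 96, "ME": 100,
    "MI": 100, "MN": 100, "MO": 34, "NC": 1, "NJ": 100, "NV": 72, "NH": 100,
    "NM": 60, "OH": 95, "OR": 100, "PA": 100, "TN": 8, "VA": 12, "WA": 100,
    "WV": 13, "WI": 91,
}

# Assumption: nine states outside the model are treated as certain for Kerry.
SAFE_KERRY_STATES = 9

# Expected number of wins = sum of the per-state win probabilities.
expected_battlegrounds = sum(KERRY_WIN_PCT.values()) / 100.0
expected_total_states = expected_battlegrounds + SAFE_KERRY_STATES

print(f"expected battleground wins: {expected_battlegrounds:.2f}")
print(f"expected total states (DC not counted): {expected_total_states:.1f}")
```

Unlike the median EV calculation, this sum does not require the states to be independent.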

The popular vote. To estimate the popular vote I use two approaches: (a) one based on presidential preference polls and (b) one based on Bush's job approval numbers. In 16 national polls the medians (± SEM) are Bush 48.0 ± 0.4%, Kerry 47.0 ± 0.4%. Assuming 2.0% for Nader/other, the fraction of undeclared voters ("undecideds") is 3.0%. Assuming Cook's incumbent rule that undecideds split 3:1 for the challenger (2.25% and 0.75%), this gives a net of 1.5 ± 1.2% to Kerry. This predicts a national popular vote (not corrected for turnout) of Kerry 49.3 ± 0.9%, Bush 48.7 ± 0.9%, Nader/other 2%. The second measure uses job approval ratings. In ten polls taken since mid-October the median ± SEM is 49.0 ± 0.9%. Based on historical trends, this places an upper bound on Bush's share of the popular vote. Thus, both approaches indicate that Bush's popular vote share will be 49% or less.

I use the turnout factor to make a final estimate. National turnout should be less enhanced than battleground state turnout, so I assume that the margin will be increased by about 2.0%. Predicted popular outcome: Kerry 50.3%, Bush 47.7%, Nader/other 2%. National polls come from davidwissing.com, RealClearPolitics, and yougov.com. Job approval numbers come from pollingreport.com.
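The popular vote arithmetic above is simple enough to write out. Here is a sketch in Python. The function names are my own, and I assume that the 2.0% turnout effect is applied as one point added to Kerry and one point taken from Bush, which is what the stated result implies.

```python
def allocate_undecideds(incumbent, challenger, other, challenger_share=0.75):
    """Split the undeclared remainder 3:1 for the challenger (Cook's incumbent rule)."""
    undecided = 100.0 - incumbent - challenger - other
    incumbent_final = incumbent + undecided * (1.0 - challenger_share)
    challenger_final = challenger + undecided * challenger_share
    return incumbent_final, challenger_final


def shift_margin(incumbent, challenger, shift):
    """Move the margin by `shift` points toward the challenger, half from each side."""
    return incumbent - shift / 2.0, challenger + shift / 2.0


# Medians of the national polls quoted above, with 2.0% assumed for Nader/other.
bush, kerry = allocate_undecideds(48.0, 47.0, 2.0)
print(f"undecideds assigned: Kerry {kerry:.2f}%, Bush {bush:.2f}%")

# Assumed turnout effect on the national margin: 2.0 points toward Kerry.
bush_final, kerry_final = shift_margin(bush, kerry, 2.0)
print(f"with turnout: Kerry {kerry_final:.2f}%, Bush {bush_final:.2f}%")
```

The same allocation function can be used with other splits by changing challenger_share, for instance 0.5 for an even break.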

An electoral tie. A 269-269 EV tie would throw the election into the House and Senate, which would most likely lead to the re-election of Bush and Cheney. However, this would be an emotionally divisive event. The probability of an electoral tie is: Decided voters only, 3.6% (26-to-1 against). With undecideds, 2.9% (34-to-1 against). Final prediction with turnout included, 0.02% (5600-to-1 against).

The power of your vote (the jerseyvotes calculation). Previously I have discussed where you are most effective in your door-to-door activism. My unit is the jerseyvote, which is the power of a New Jersey voter to influence the national election. Among decided voters only, the current value of a single vote in the top states is (measured in jerseyvotes): Hawaii 11,900, Iowa 10,300, Nevada 10,000, Florida 9,200, Wisconsin 8,200, Ohio 6,500, New Mexico 6,400. Counting undecideds, the top states are the same, but in different order. Other values of relevance are (decided voters) New Hampshire 1,400 and Pennsylvania 2,800. As you can see, a jerseyvote's value to American politics is what the Reichsmark's was to the Weimar German economy.

Key states. These statistics summarize polls completed between October 25 and 31.

In Florida (14 polls), Bush leads in 8 polls, Kerry leads in 5 polls, and one poll is tied. Bush's average (± SEM) margin is 1.4 ± 0.9% in polls. I predict a Kerry victory by 2%. Polls close at 6:00-8:00PM Eastern time.

In Ohio (14 polls), Bush leads in 9 polls, Kerry leads in 4 polls, and one poll is tied. Bush's average (± SEM) margin is 0.6 ± 0.8% in polls. I predict a Kerry victory by 3%. Polls close at 7:30PM Eastern time.

In Pennsylvania (14 polls), Kerry leads by 2.1 ± 0.8% in polls. I predict a Kerry victory by 6%. Polls close at 8:00PM Eastern time. General poll closing times are diagrammed here.

Bias analysis: The potential effects of differential turnout, splitting undecided voters, or systematic polling bias are as follows. The baseline from which bias is defined is decided voters only. Decisions by undecided voters and get-out-the-vote activities on Election Day will be major determinants of how large this bias effect is.

4 points towards Kerry: Kerry 323 EV, Bush 215 EV, Kerry win 99.9%.
3 points towards Kerry: Kerry 311 EV, Bush 227 EV, Kerry win 98%.
2 points towards Kerry: Kerry 291 EV, Bush 247 EV, Kerry win 85%.
1 point towards Kerry: Kerry 272 EV, Bush 266 EV, Kerry win 52%.
no swing (decideds only, flat turnout): Kerry 252 EV, Bush 286 EV, Kerry win 18%.
1 point towards Bush: Kerry 235 EV, Bush 303 EV, Bush win 97%.
2 points towards Bush: Kerry 217 EV, Bush 321 EV, Bush win 99.8%.
3 points towards Bush: Kerry 203 EV, Bush 335 EV, Bush win 99.99%.
4 points towards Bush: Kerry 188 EV, Bush 350 EV, Bush win 100%.
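A rough way to see how the medians and win probabilities in this table relate: if the EV total is treated as approximately normal around the median, with the ±40 EV 95% interval quoted for decided voters, the win probability follows from how far the median sits from 270. This is a shortcut of my own for illustration and not the exact calculation. It also assumes that the same ±40 EV spread applies at every bias value.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kerry_win_probability(median_ev, moe_95=40.0, needed=270):
    """Normal approximation: chance that Kerry's EV total reaches `needed`."""
    sd = moe_95 / 1.96  # convert a 95% margin of error to a standard deviation
    return 1.0 - normal_cdf((needed - median_ev) / sd)


# Selected rows of the table above: bias toward Kerry (points) -> median Kerry EV.
MEDIANS = {0: 252, 2: 291, 3: 311, 4: 323}

approx = {bias: kerry_win_probability(ev) for bias, ev in MEDIANS.items()}
for bias, p in approx.items():
    print(f"bias {bias:+d}: median {MEDIANS[bias]} EV, approximate Kerry win {p:.0%}")
```

The approximation lands within a few points of the tabulated 18%, 85%, and 98%, which suggests that the table's win probabilities mostly reflect the distance of the median from 270.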

Monday, November 1, 9:00AM: Today I was interviewed live by Wall Street Journal This Morning. The program was carried on over 80 stations across the U.S. and on the Sirius and XM satellite networks. Listen to it here (MP3, 1.3 MB).

Saturday, October 30, 5:30PM: That was a fascinating experience. So much was left out but I did get to say my piece. After we cut, the FOX guy asked me what I really thought would happen. I said I thought Kerry would get over 300 electoral votes. Of course this happened afterwards. Typical television - very compressed. Real analysis soon - sorry about lack of substance on this post.

Thursday, October 28, 8:15PM: More letters.

Thursday, October 28, 3:00PM: Edward Witten writes, "On mydd.com, I read yesterday a rumor that a NYT poll of Florida showing Kerry ahead by +9 percent was buried as being implausible. I don't know if the rumor is true, and if it is I am sure the poll was flawed, as Kerry is surely not leading Florida by that amount. But to me it illustrates the fragility of trying to predict the election from the available state polls. Including or excluding a single, undoubtedly flawed, poll showing a +9 percent lead in Florida for Kerry (or Bush) would probably have a significant impact on your overall assessment of the outcome of the election." To some extent he has a point. If such a poll exists, then the decided-voters calculation moves Kerry up to Kerry 271 EV, Bush 267 EV, with a win probability of 51%. Using the median rather than the mean would circumvent this problem a bit, and I will make that switch soon. In any event, the confidence interval is ±36 EV. Therefore neither result is statistically significant. Any way you slice it, the election is a toss-up and will depend on turnout and undecideds.

Thursday, October 28, 9:45AM: Since posting a few letters here I have received many more. Here are some selected from yesterday's mail, October 27. Of special interest is one from Jim G. from New Hampshire, an undecided voter. He is articulate and thoughtful, and though we don't agree on a few things I strongly recommend his letter to all of you.

Evidently I spoke too soon regarding Bush giving up on Ohio. His travel plans include three stops there until Election Day. He is also pushing in Michigan, where Kerry has recently slipped a bit.

Wednesday, October 27, 5:30PM: A brief note on the vagaries of opinion polling. When we read polls we often make the implicit assumption that people report what is really going on inside their heads. However, this is a subjective report. The famous example in these closing days is the "undecided voter." But are these people undecided in the sense that we mean colloquially? Are they one monolithic category of person?

It's been pointed out that many undecided voters are unfavorable about the incumbent, and usually break for the challenger. This phenomenon may simply reflect the fact that some people are unable or unwilling to state a set preference. To cite a homely example, you may find yourself unable to articulate what you want for dinner, but you can react immediately to what you don't want.

Recently Scott Rasmussen reported data that he says support the notion that late-deciding voters prefer Bush. The survey drew on 136 late-deciding voters, far too few to reach statistical significance. This is a message poll aimed at driving the discussion in his preferred direction. Also, the survey assumes that the voters who decided during the survey period are similar in characteristics to those who wait until the last minute, possibly until they are standing in the voting booth. This is untested.

A parting thought on undecided voters: we are not going to resolve this by further argument! The best we can do is come up with a way to measure what they do, and wait until after the election. I will try to provide this as part of my final Election Night briefing document.

Other examples of respondent inaccuracy are the party-ID question, which can depend on when in the survey it is asked (especially if asked after the presidential preference!) and the question of who people voted for in the last election (on average, people show a tendency to think that they voted for the winner even if they did not).

Finally, once again: the probability map is not the same as the median calculation. This is why they do not match. If you were thinking about writing me to point this out, read this first.

Wednesday, October 27, 12:30PM: My regular email address works once again. Send your correspondence there.

Charles Cullen writes asking today's probability of a 269-269 electoral tie. Using decided voters only it's 3.9% - a lot! With the undecideds assumption it's 0.4%. In this scenario the newly elected House and Senate would determine the president and vice-president, leading to Bush-Cheney (if the Senate remains Republican) or Bush-Edwards (if the Senate goes Democratic).

Wednesday, October 27, 7:00AM: One of the pleasures of running a popular Web site is the correspondence. Click here for some selected letters from the last few days of various types - illuminating, entertaining, and unintentionally hilarious.

Wednesday, October 27, 5:30AM: Hawaii has been added because of two recent polls showing possible leads for Bush. This seems very unlikely. In any event, what is really needed is a third poll.

With that, let's think about a favorite subject of mine: why individual polls seem surprising or contradictory. I can think of three reasons:

1. Reporters often don't understand statistics. A poll showing Bush up by 5% and another showing Kerry up by 1% are in fact consistent with one another because of random sampling error. For more on this read yesterday's entry by Mystery Pollster (Mark Blumenthal). A better way to get a good answer is to examine many polls at once. For the record, Charles Forelle at the WSJ is a very notable exception - in his article about this Web site and others, he captured the subject perfectly!

2. Man bites dog. When a poll's finding sounds interesting, it gets more attention than a boring result. Therefore reports of outliers tend to grab headlines, creating apparent discrepancies.

3. Competition among organizations. News organizations usually rely on their own data alone. If they do this, they cannot achieve the increased accuracy that comes from comparing multiple polls. Indeed, little incentive exists to improve accuracy, since low accuracy leads to more frequent news stories, and therefore more readers or viewers.

Tuesday, October 26, 11:00AM: Statistically based analysis of the Electoral College is featured in today's Wall Street Journal. Welcome to new readers!

The overall raw polling outcome (decided voters only) is still a statistical tie. This is true even with more than 100 polls used in today's calculation. Bush has tiny leads among decided voters in Florida and Ohio, indicating that the outcome in these key states will be determined by undecided voters and turnout.

Finally, a note on national polls. In 8 national polls (two-way choice) the median result is Bush 49, Kerry 47. Assuming that methods are similar, the fact that this margin is larger than the Meta-Margin above supports the idea that the distribution of support for the candidates gives Kerry a small advantage.

Saturday, October 23, 10:00AM: I am working on a reference sheet to give you late next week. In addition to bottom-line predictions, this reference will give you a list of things to watch for on Election Night, along with key combinations that Kerry and Bush need. The content will change a bit in the coming week as the last polls come in. However, some outlines are now coming into view.

Under today's polling conditions, four states are clearly in serious contention: Florida, Iowa, Ohio, and Wisconsin. To a lesser extent so are NV, WV, and some others. Depending on undecideds/turnout/bias, states come into or go out of play, but in those situations Kerry or Bush typically wins the Electoral College by a more comfortable margin. So let's concentrate on this near-tie condition. After assigning other states as indicated by polls (PA and MI to Kerry, MO to Bush, and so on) and playing with combinations, several patterns emerge.

First - if Kerry wins Florida, the election is over - he wins. Kerry can also win by taking Ohio plus one of the smaller states. In the other direction, Bush must win not only Florida, but also either Ohio or all of the smaller states. In light of these facts, the Saletan piece (see below) indicates that the Bush campaign's actions may amount to a defensive move - otherwise why give up on Ohio?

It's also possible to identify states that look moderately solid, but might flip if the combination of undecided/turnout/bias factors adds up. This is interesting because this shift is likely to be similar across states. Therefore these states can act as an early-warning system for a surprising election night. For instance, Arkansas, North Carolina and Virginia currently look like Bush states, and Maine looks like a Kerry state. If a surprise occurs in any of these states, this might presage a significant offset between decided-voter polls and the real outcome.

Wednesday, October 20, 4:00PM: I've received lots of feedback on undecided voter assignment, much of it constructive. This has led me to rearrange the way the results are presented.

First: the calculation is now set to its old definition from two days ago. Many of you are very familiar with the raw (decided voters only) calculation by now. Switching back was suggested by many readers of various political persuasions. Whether people liked the direction of the outcome or not, many were uncomfortable with the mixing of current numbers and previous election outcomes. Also, this site has many calculations that are based on decided voters only, and it only adds confusion to redo those.

Second: the assignment of undecided voters is now done probabilistically, like the rest of the calculation. Past elections from 1956 to 1996 show a wide range of undecided breaks for the challenger: [+3 +6 +2 +1 +6 +0 +2 +4], median 2.5%, estimated SD 2.0% (analysis). This year may be unusual, though note in 1972 a break of 2% away from Nixon, at the height of the Vietnam conflict and after the invasion of Cambodia. Anyway, because the contribution is variable, the undecideds-assigned calculation (MATLAB script) takes this variation into account. The results are listed in the box above. This is my own current prediction. Also, the state probabilities are now given both with decideds only and with undecideds added. Thanks to Alan Cobo-Lewis and Rachel Findley for key discussions.
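The summary statistics for the eight historical breaks can be reproduced with Python's standard library. This is only a check on the list as quoted. How the SD of 2.0% was estimated is not stated here, so both common definitions are shown.

```python
import statistics

# Undecided breaks toward the challenger (percentage points), as listed above.
BREAKS = [3, 6, 2, 1, 6, 0, 2, 4]

median_break = statistics.median(BREAKS)
sample_sd = statistics.stdev(BREAKS)        # divides by n - 1
population_sd = statistics.pstdev(BREAKS)   # divides by n

print(f"median break: {median_break}")
print(f"sample SD: {sample_sd:.2f}, population SD: {population_sd:.2f}")
```

The median comes out at 2.5, and both SD figures are a little above 2, in line with the rounded 2.0% used in the calculation.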

Third: There are now two maps (see box). The static image below is set with undecideds assigned.

Now, to the interesting bits. Look at the state probabilities. Because the undecideds could break evenly or for the challenger, many states are still toss-ups, including Florida and Ohio. The lingering uncertainty reinforces the idea that the election is close enough to be determined by turnout. Even if the undecideds break evenly, a 2% difference in turnout could change the result drastically, which you can see in one direction by comparing the maps. Have I mentioned before that I think turnout is important? Turnout is very, very important.

Whew, that was tiring. I think I need a wee dram!

Tuesday, October 19, noon: Today I implement the first major change to this calculation - I am allocating undecided voters. To do this I use past presidential election voting patterns, specifically the incumbent rule as described by Charlie Cook of the National Journal. This gives a more accurate snapshot and is a step toward making an actual prediction.

Rationale: It is known to poll analysts that voters who are undecided usually end up voting against the incumbent. In particular, compared with their final poll numbers, incumbents get between 2% less and 1% more. In contrast, challengers do better on average by 3%. These figures are consistent with Cook's estimate that undecideds split at least 75% for the challenger. In today's summary of national polls, the average Bush-Kerry split is 48.5-45.5, which sums to 94%. Assuming 2% for Nader and other candidates, the remaining undecideds are 4%. Splitting these by Cook's rule gives 1% to Bush and 3% to Kerry, reducing the margin by 2%.

Therefore, for the main calculation I will assume that the undecided-voter shift is +2.0% towards Kerry, shift state polls by this amount (using the variable already provided in the script), and proceed with the calculation. Based on state polls in Florida, Ohio and Pennsylvania, I estimate that the proportion of undecided voters in these states is similar to the national figures. Because national polls come more frequently, I will use them to calculate the shift. The size of this shift may change in the final days, and I will be monitoring this.

This new estimate is likely to be more accurate. However, it is also the first change to the calculation that is not neutral: it goes beyond the polling numbers themselves, and it is in a direction that is favorable to my candidate. For example, Florida, Iowa, Ohio and Wisconsin are still toss-ups, but they are now above the 50% probability threshold for Kerry. Therefore I will continue to report the results without this adjustment. This is listed in the box above on the line labeled "Decided voters only." The corresponding Meta-Margin can be calculated by subtracting 2.0% from the value listed.

To read more about the incumbent rule, see Charlie Cook, Guy Molyneux, the Los Angeles Times, Mark Shields, the Mystery Pollster, and a contrarian.

I have also simplified the box by removing the line about the Colorado ballot initiative, which, based on a recent poll and Salazar's opposition, seems likely to fail.

Finally, I will continue listing rankings and probabilities for all states. I have decided that there is little benefit to leaving these out.

Hitting the streets: How much do you affect the election by getting out the vote? Also, where are your efforts most valuable? To help guide your efforts, here is a synthesis of previous posts. Once you decide, I recommend contacting your local Democratic (or Republican) organization or America Coming Together.

This question can be answered by calculating how much the Electoral College win probability is changed by one person's vote. This affects where you should go because as an individual, you can only get out a finite number of votes. Today the best states to go to are Iowa, Ohio, Nevada, and Florida. Nevada, while small, is on the list because it is a near-tossup and has relatively few voters per electoral vote. Here is a case study. If you are a New Jersey resident, your vote has some value, but it is low since the state is very likely to go Democratic by a substantial margin. In contrast, driving a voter to the polls in Pennsylvania is worth nearly 300 times as much. If you go to Ohio each vote is worth even more, over 500 "jerseyvotes." The top states are IA (686 jerseyvotes), OH (528), NV (508), FL (372), NM (304), WI (295), PA (295), MO (199), AR (151).

Although the calculation is unbiased, I am not. I am a Democrat. To see a list of races I consider critical, see my ActBlue page. My advice to all voters (including Republicans) is the same: Go to battleground states. Register voters. Make phone calls and knock on doors (a very effective strategy) to canvass for voters. Vote absentee or vote early (online resource), and on Election Day, work to get out the vote.

State-by-State Probabilities

Current percentage probabilities of a Kerry win in each battleground state, computed from the last three polls or going back seven days, whichever gives more polls. States in boldface had a new poll completed and released since the last day of updates. The probabilities are calculated assuming that the SEM cannot go below 2%. Click on a state to view a tabulation of most of the polls. Some of the others come from these data sources (some are subscription-only). Other sources are electoral-vote.com and RealClearPolitics. All data are visible in this MATLAB script.
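To make the role of the 2% floor concrete, here is a sketch in Python of one way to turn a polling margin into a win probability. It assumes a normal distribution for the true margin, which is my assumption and may differ from what the MATLAB script does. The function name is hypothetical. The inputs are the Florida, Ohio, and Pennsylvania averages quoted in the "Key states" summary of the November 1 update.

```python
import math

SEM_FLOOR = 2.0  # the SEM is not allowed to go below 2 points


def kerry_state_win_probability(kerry_margin, sem, bias=0.0):
    """Chance that Kerry's true margin is positive, under a normal assumption.

    kerry_margin: average Kerry-minus-Bush margin in the polls (points)
    sem:          standard error of that average (points)
    bias:         optional shift toward Kerry (undecideds, turnout), in points
    """
    effective_sem = max(sem, SEM_FLOOR)
    z = (kerry_margin + bias) / effective_sem
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# (Kerry margin, SEM) from the November 1 key-state summary; Bush leads are negative.
KEY_STATES = {"FL": (-1.4, 0.9), "OH": (-0.6, 0.8), "PA": (2.1, 0.8)}

decided = {
    state: kerry_state_win_probability(margin, sem)
    for state, (margin, sem) in KEY_STATES.items()
}
for state, p in decided.items():
    print(f"{state}: Kerry win probability {p:.0%}")
```

With the floor in place these come out near 24%, 38%, and 85%, which agree with the decided-voter values listed for November 1. Without the floor, a 0.9-point SEM would make Florida look far more certain than a handful of polls can justify.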

This is a history of the calculation. For this calculation each poll is assigned to the last date on which polling was done. The marked events are inspired by a similar graph by electoral-vote.com. In my graph, the effect of events is clearer because I use polling margins and because I average over three polls. Fahrenheit 9/11, adding John Edwards to the ticket, and the Democratic convention seemed to have measurable effects within a few days. The passing of Ronald Reagan and the assault on Kerry's war heroism did not. The last update was October 12. Note: Around the time of the first debate I started using more polls per state. This and the start of Rasmussen daily tracking polls have complicated updates. Therefore after September 25 the graph is simply a record of previous daily updates - not quite the same. For instance, in this graph the bounce from the first debate looks delayed. In fact, it was immediate. This graph will be done more properly soon.

History of meta-analysis over time

Selected comments from the author (all comments)

Saturday, October 30, 2004, 11:00AM: Update comes later. Thanks to readers for pointing out the WV mistake (it should be solidly red). The calculation and maps above are updated to correct this error; the bias calculations have to wait until later. I must now go and get duded up for the FOX News thing (tune in, 2:45PM Eastern)...

Friday, October 29, 2004, 11:30PM: After due consideration, I need to stick with using the mean in order to be consistent with previous analyses. Thanks for your patience - that's what you get for a project that is essentially open-source - that is, I shoot first and take your feedback later. In retrospect the best approach might have been to use a filter weighted by time going back more than seven days. This will have to wait until 2008.

Pollster John Zogby was on The Daily Show last night. He said what I have been telling you for many weeks: a break of undecideds towards Kerry is likely to be enough to get him over the top. I did not watch the program, but I think Zogby did not mention turnout. Too bad - I remain convinced that turnout will be the decisive variable in this race.

The polling-related comments section is getting a bit out of hand. If anyone is aware of a chat room that focuses on this site, please let me know. I would love to restrict my comment thread to technical and other serious inquiries.

Friday, October 29, 2004, 10:00AM: Tune in to FOX News on Saturday (tomorrow) around 2:45PM, when I am scheduled to talk about the meta-analysis.

I am traveling today until evening, so the next update may be late in coming. If nothing shows up in the next four hours, check back late tonight. I also hope to have the briefing sheet I promised before, but other events have slowed me down a bit and I might not have a good version until Sunday. In the meantime, the fastest state poll updates can be found at race2004.net and The Hedgehog Report.

Monday, October 25, 2004, 6:30PM

First, I apologize for the relative lack of updates and commentary! I face logistical hurdles, including my professional society's annual meeting (where I am), intermittent Internet access, and university mail system failures.

This site is mentioned in an article on polls in today's Newsday. However, there is one error - my margin of Bush over Kerry counts decided voters only, and does not include undecided voters.

I no longer list the overall probability of a Kerry win with undecideds allocated. This is because the uncertainty of how undecideds will break is accounted for state by state, but the compound probability calculation assumes independence among states, which is unlikely. The true probability is roughly equal to the probability that the undecided advantage (which I assume is 2.5 ± 2.0% for Kerry) and the Meta-margin (currently 0.5% for Bush) sum to a positive value for Kerry. Today this probability is around 75%. To take away the stat-speak: given the history of what undecided voters do, today I give Kerry 3-1 odds over Bush. The median EV count with undecideds assigned is still OK.

Regarding the Hawaii question, it's possible that this state is competitive but right now there are not enough data to say. Stay tuned.

Sunday, October 24, 2004, 8:00PM

A story in today's Washington Post confirms what I have been suspecting: the Bush campaign is in trouble, and Bush-Cheney campaign insiders recognize this. It's consistent with the defensive move of pulling out of Ohio for a last stand in Florida.

By the way, I am having email troubles on the university server. To reach me cc your messages to mindgeek at gmail dot com.

Friday, October 22, 2004, 3:30AM

At Slate, Will Saletan points out that based on Bush's travels, his campaign may consider Florida more of a must-win state than Ohio. Looking at today's decided-only numbers, this has merit. If Bush takes Ohio his win probability is 72%, but if he loses Ohio it drops to 40% - not even a twofold difference. However, Florida is different. If Bush wins Florida, his win probability is 88%; if he loses, it's only 20%. To show why this is, Saletan describes electoral scenarios involving smaller states (WI, IA, NV, NM) that Bush could cobble together to make up for the loss of Ohio.

October 18, 2004

Although I don't analyze national polls, I am asked about them frequently. For instance, how to interpret the latest Gallup poll reporting a Bush-Kerry margin of 8%? My brief reply: if you look carefully at all available polls, the race is closer than this single poll indicates. Consider the following.

Imagine that the race were perfectly tied and the margin of error were 4 points. In this case six measurements of the Bush-Kerry margin could easily be: Kerry +2, Bush +2, tie, Bush +6, Kerry +6, Kerry +1. Add the fact that the CNN/USA Today/Gallup poll is somewhat favorable to Bush compared with other polls, and one can see the problems with interpreting any one poll. If a national horserace summary is required for the sake of curiosity, then looking at an average (here is more data) or a median is better. If one does this, Bush is currently about 2 points up on Kerry among decided voters.

This brings me to the biggest point of all: undecided voters are not counted in point spreads, yet history suggests that most of them vote against the incumbent. This suggests that Bush's true threshold separating victory from defeat is about 49%; he is currently slightly below that. This is the big story among pollwatchers this week. For a discussion see these L.A. Times and CNN articles.

October 15, 2004

Colorado Democratic Senate candidate Ken Salazar has come out clearly against Amendment 36, the electoral vote splitting initiative. It seems likely to fail.

I have been asked to evaluate a tactical-voting idea proposed by Nader supporters, votepair.org. The idea is for swing state voters to agree to change their Nader vote in exchange for a vote change in a non-swing state. Nader is a much smaller factor this year than in 2000, but it is still worth considering how much value you get for your trade. (This relies on the same calculation that I did on Wednesday the 13th to ask where you should get out the vote.)

Regarding Nader: Today the NY Times says that Nader is a threat to the Democrats. Possibly, but this is not supported by the data! In the nine states listed in the article, I looked at 20 polls since October 1 with both two-way results (Kerry-Bush) and three-way results (Kerry-Bush-Nader). Nader does not change the outcome in any of these - and several are ties.

October 14, 2004

The Washington Post hosts an online chat with Charlie Cook. This is excellent, a treat. He slams the quality of polling information available on the Internet. Despite being a purveyor, I agree with him. How's that for mind-bending?

The effects of the last debate won't show up until next week. Polls typically take at least two days to complete, and pollsters usually start fresh after a big event. This is why results after the second debate are only trickling in now. In the meantime, more people think Kerry won than think Bush won the third debate, by between 1 and 14 points (CBS 39-25, ABC 42-41, CNN/Gallup 52-39, Democracy Corps 41-36). In addition to overall numbers, Kerry was favored among undecideds by 14 points (CBS), and among independents by 7 points (ABC) or 20 points (CNN/Gallup). Among independents in battleground states, where it matters most, the margin was 9 points (Democracy Corps [D]). A summary of polls for all three debates can be found here.

October 13, 2004

Lately I have been asked what-if questions (What if Bush wins Ohio? What if Kerry wins Wisconsin?). I have three ways of answering this type of question. The last answer may be of practical use in guiding your activism!

1. Flipping states: How much is the win probability affected by guaranteeing a given state? If Kerry wins Florida, then his overall win probability today jumps to 83% (five-to-one odds). If Bush wins Ohio then his win probability is 87% (eight-to-one odds).

2. Shifting the margin: What is the benefit of changing the margin by one point? You could imagine a campaign strategist making use of this to help decide where to place ads. For both candidates the three best states are Ohio, Florida and Pennsylvania. No surprises there.

3. Hitting the streets: How much do you affect the election by going somewhere to get out the vote? The way to do this calculation is to see how much the Electoral College win probability is changed by incrementing a state's margin by some fraction F, where F is inversely proportional to the state's voting population. This is because as an individual, you can only get out a finite number of votes.

Today, the best states to go to, in descending order, are: Iowa, Ohio, Nevada, and Florida. Things change a little bit if the margin is different from the estimate (for instance towards Kerry because of the incumbent rule as originated by Guy Molyneux and reviewed by the Mystery Pollster and Mark Shields), but the top four states always include Ohio and Nevada. Why Nevada? Nevada is a near-tossup and has a disproportionately high share of electoral votes.
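For readers who want to try the "hitting the streets" idea themselves, here is a minimal sketch in Python. It is not the MATLAB script behind this site. The state margins, SEMs, voter counts, the safe-EV total, and the `effort` scale are all made-up illustration values, and states are treated as independent.

```python
from statistics import NormalDist

# Hypothetical inputs (not real poll data):
# state -> (electoral votes, mean Kerry-minus-Bush margin in points, SEM in points, voters in millions)
STATES = {
    "FL": (27, -1.0, 1.5, 7.5),
    "OH": (20, 0.5, 1.5, 5.5),
    "IA": (7, 0.5, 1.5, 1.5),
    "NV": (5, -1.0, 1.5, 0.8),
}
SAFE_KERRY_EV = 240  # hypothetical EV from states treated as certain
EV_TO_WIN = 270


def win_probability(bump_state=None, bump=0.0):
    """P(Kerry reaches 270 EV), optionally adding `bump` points to one state's margin."""
    dist = {SAFE_KERRY_EV: 1.0}  # EV total -> probability
    for name, (votes, margin, sem, _voters) in STATES.items():
        if name == bump_state:
            margin += bump
        p = NormalDist().cdf(margin / sem)  # probability Kerry carries this state
        new = {}
        for ev, prob in dist.items():
            new[ev] = new.get(ev, 0.0) + prob * (1 - p)
            new[ev + votes] = new.get(ev + votes, 0.0) + prob * p
        dist = new
    return sum(prob for ev, prob in dist.items() if ev >= EV_TO_WIN)


def gotv_value(effort=0.1):
    """Gain in win probability per state when its margin rises by F = effort / voters."""
    base = win_probability()
    return {
        name: win_probability(name, effort / voters) - base
        for name, (_votes, _margin, _sem, voters) in STATES.items()
    }


gains = gotv_value()
for name in sorted(gains, key=gains.get, reverse=True):
    print(f"{name}: {gains[name]:+.5f}")
```

Because F shrinks as the voting population grows, a small near-tossup state can outrank a large one even though it carries fewer electoral votes.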

October 5, 2004

As previously mentioned, I have been looking at the party-ID numbers in the Gallup data. I have found evidence that party ID is not fixed over time. The Gallup poll internal numbers contain the fraction of voters who call themselves Republican, Democratic, or independent. The average GOP fraction is 39%, but fluctuates. The fluctuation has been the source of much discussion and is said to be too high. As it turns out, the amount of fluctuation can be predicted from binomial statistics if the fraction of Republicans (for instance) is fixed over time. The expected standard deviation is sqrt(r*(1-r)/N), where r is the fraction 0.39 and N is the number of people per poll, about 1000. These numbers predict a standard deviation of 1.5%. From Gallup's data, the actual standard deviation is 2.9%, almost twice this. This suggests that party ID, as Gallup measures it, shifts over time. This supports the defense by Gallup that weighting by party ID distorts the result.
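As a quick check of the arithmetic, here is a small Python sketch of the binomial standard deviation. The values r = 0.39, N = 1000 and the observed 2.9% come from the text; the function name is only for illustration.

```python
import math


def binomial_sd(r, n):
    """SD of a sampled fraction when the true fraction is r and n people are polled."""
    return math.sqrt(r * (1 - r) / n)


expected_sd = binomial_sd(0.39, 1000)  # GOP fraction, people per poll
observed_sd = 0.029                    # actual SD in Gallup's data
ratio = observed_sd / expected_sd

print(f"expected SD from sampling alone: {expected_sd:.1%}")
print(f"observed / expected: {ratio:.2f}")
```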

However, using unweighted data has its own problem, namely that the sample may be consistently biased in one direction or the other. In 2000, Rasmussen did not weight and predicted a margin that was 9 points more favorable to Bush than the final outcome. This is the accusation currently being made against Gallup.

But the cure may be as bad as the disease, as exemplified by Rasmussen's new approach. Rasmussen now weights, and now his presidential tracking poll fluctuates very little. Because party ID and preferred candidate (Bush/Kerry) are strongly correlated, this means that his weighting procedure will always work to reduce the margin of the leading candidate. This may explain why his poll is so stable - statistically, too stable to be right. In recent 3-day tracking data (analyzing every third day) the standard deviation is 0.7% (random fluctuation alone predicts an SD of 1.6%).

The real problem with weighting is as follows: The horserace result depends on assumptions about party ID. If party ID covaries with sentiment, then real changes will be filtered out, and it will be very hard to learn from weighted data who is ahead, a basic fact we want from polls. We can see an example of this today because a recent poll from Zogby shows little change from the previous poll.

Therefore I currently think that both weighting by party ID (Rasmussen now) and not weighting at all (Gallup now, Rasmussen in 2000) have serious problems. A better way to weight would be to use a question or questions with fixed answers, such as "Who did you vote for in the last election, Bush, Gore or Nader?" Time magazine does this, but does not weight. Of course, the unreliability of memory is a problem. Zogby Interactive does a sensible version of this: party ID is asked at a different time than candidate preference, which might de-link the variables. If anyone knows of other organizations that go beyond simple party-ID-weighting, please let me know.

October 2, 2004

The median electoral vote (EV) estimate is very sensitive to swings in reported opinion because of the winner-take-all mechanism of awarding EV. Under near-tie conditions I find that the change is about 30 EV per 1-point change in the popular margin. Therefore Kerry's approximately 110-EV slide since mid-August represents a 4-point swing, equivalent to 2% of voters switching from Kerry to Bush. October 5 and 9 corrections: Looking at the numbers more carefully, from August 1 to mid-September the national popular margin swung by about 8 points, 4% of voters switching. This works out to 12-15 EV gain for one candidate per 1-point change in margin or turnout. For comparison, past EV outcomes were 2000 (Bush) 271-266, 1996 (Clinton) 379-159, 1992 (Clinton) 370-168, 1988 (Bush elder) 426-111-1, 1984 (Reagan) 525-13, 1980 (Reagan) 489-49, 1976 (Carter) 297-240-1. October 9: Historically, the Electoral College margin has shown, on average, a 29 EV margin per 1% popular margin. This is consistent with my calculations this year. Since June, neither Kerry nor Bush has gotten much past 320 EV, demonstrating that this is the close race that both campaigns have predicted all along.

In addition to the final polls, the outcome will ultimately be effectively adjusted by three big factors: (a) undecideds, (b) new voter registration, and (c) turnout. Undecideds usually break for the challenger, though this is not certain. Newly registered voters should in principle be reflected in polls, though how many of them pass likely-voter criteria is unclear. Turnout is a big unknown (though a known one). In 2000 Democrats did better than expected from pre-election polls. In 2002, Republicans did better. This year, unusual levels of progressive activism would seem to favor Democrats. But prediction is hard, especially of the future.

September 29, 2004

Relevant to the current Gallup controversy: Here is a table of Gallup national presidential polls, along with Party ID statistics for each poll. The GOP-Democratic margin in the poll correlates quite closely with the Bush-Kerry margin. In fact, the correlation coefficient is 0.73 (r²=0.53, P<0.001). Put into lay terms, this means that Party ID gap and Bush-Kerry margin vary together, and variation in one can account for over half the variation in the other. A linear fit between the two is near 1: on average, every extra Republican in the sample added one to Bush's margin.

This seems consistent with the idea that how much Gallup samples from each group (Republicans, Independents, Democrats) affects poll outcome. But could it be the other way around: could voter sentiment affect self-reported party ID? One test of this is to see if the group sizes fluctuate as much as would be expected by chance. For a sample of 1000 voters composed of 39% R, 34% D, and 25% I (the average of all those polls), the percentage of R's would be expected to have a standard deviation of 1.5%. The actual SD is almost twice as large, 2.9%. Therefore party ID does vary more than sampling error would suggest. Maybe it varies with sentiment. Or maybe conditions at Gallup change (for instance, the time of day and week that calls were made). Hard to know. One thing is clear: their average party ID breakdown does not match known values (35% R, 38-39% D, 26% I).

Gallup Party ID bias

August 23, 2004

You can use the bias calculation to estimate where things are headed. If you think turnout will boost your candidate by N points, add that. If you think that one candidate will gain X points at the expense of the other, add 2*X. For instance, if turnout will increase Kerry's vote by 2 points, but Bush will pick up 1.5% of voters from Kerry, then the bias is 2 - (1.5 * 2) = -1%, or 1% to Bush.

August 20, 2004

For margin of error junkies: Rachel Findley pointed out this comparison of polls to election outcomes, which finds that only 84% of election outcomes fall within the reported 95% confidence interval. This discrepancy allows a way to estimate polling errors that go beyond sampling error. If the additional error is normally distributed, an appropriate correction would be to increase the reported margin of error by a factor of 1.4.
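The factor of 1.4 can be reproduced with a few lines of Python. This is a sketch under the assumption stated above, that the extra error is normally distributed; the function name is made up for illustration.

```python
from statistics import NormalDist


def moe_inflation(nominal_coverage, actual_coverage):
    """Factor by which to widen a reported MoE so that it truly covers
    `nominal_coverage` of outcomes, given that it covers only `actual_coverage`."""
    std_normal = NormalDist()
    z_nominal = std_normal.inv_cdf(0.5 + nominal_coverage / 2)  # two-sided z for 95%
    z_actual = std_normal.inv_cdf(0.5 + actual_coverage / 2)    # two-sided z for 84%
    return z_nominal / z_actual


factor = moe_inflation(0.95, 0.84)
print(f"inflate reported MoE by a factor of {factor:.2f}")
```

The logic: a reported MoE that captures only 84% of outcomes corresponds to about 1.4 true standard deviations rather than 1.96, so the true standard deviation is larger by the ratio of the two.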

However, this correction does not apply to my calculation because instead of relying on reported MoE, I use inter-poll data to make an independent estimate of variance.

Methods

These calculations are based on state polls from many polling organizations (data sources). The primary source is Race 2004, which emphasizes likely voters (LV). Whether Nader is included depends on his ballot status in that state; if unknown then he is included. Polling organizations that provide rolling averages (Rasmussen FL, MI, MN, OH, PA) are updated twice a week. The data are fed through a MATLAB script to mathematically compute all of the above results.

The first step is to calculate the probability of winning a state, taking into account the variability of polls. This is done by calculating simple statistics on the polls: average and standard error of the mean (SEM). This is then converted to a probability of a win using the normal distribution (bell-shaped curve).
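Here is a minimal Python sketch of this first step (the site itself uses a MATLAB script, which is not reproduced here). The three poll margins are made up, and using the sample standard deviation for the SEM is an assumption.

```python
import math
from statistics import NormalDist, mean, stdev


def state_win_probability(margins):
    """Probability that the true margin is positive, given poll margins in points
    (candidate minus opponent). Uses the mean and the standard error of the mean."""
    avg = mean(margins)
    sem = stdev(margins) / math.sqrt(len(margins))  # sample SD / sqrt(n)
    if sem == 0:
        # All polls agree exactly: treat the state as certain (or a coin flip at zero).
        return 1.0 if avg > 0 else 0.0 if avg < 0 else 0.5
    return NormalDist().cdf(avg / sem)  # area of the bell curve above zero margin


polls = [4.0, -1.0, 3.0]  # hypothetical margins from the last three polls
p = state_win_probability(polls)
print(f"win probability: {p:.1%}")
```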

The second step of the calculation is complex: it calculates the probability of every possible outcome. For 17 states the total number of possibilities is 2^17 = 131,072. Adding Colorado, Tennessee, North Carolina and Virginia makes over 2 million possibilities. In order to reduce computing time, probabilities less than 0.1% or greater than 99.9% are classified as certain outcomes. Each possibility corresponds to a different number of electoral votes (EV).
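The enumeration can be sketched in Python as follows. This is an illustration rather than the site's MATLAB code: the win probabilities and the safe-EV total are invented, and states are treated as independent.

```python
from itertools import product

# Hypothetical inputs: (state, electoral votes, probability that Kerry wins it).
states = [
    ("FL", 27, 0.45),
    ("OH", 20, 0.50),
    ("PA", 21, 0.70),
    ("WI", 10, 0.40),
    ("NJ", 15, 0.9995),  # above 99.9%, so treated as certain
]
SAFE_KERRY_EV = 215  # hypothetical EV from states not listed above


def ev_distribution(states, base_ev):
    """Map each possible EV total to its probability by enumerating every outcome."""
    uncertain = []
    for _name, votes, p in states:
        if p > 0.999:
            base_ev += votes               # certain win
        elif p >= 0.001:
            uncertain.append((votes, p))   # genuinely in play
        # p < 0.001: certain loss, adds nothing
    dist = {}
    for outcome in product([0, 1], repeat=len(uncertain)):  # 2^n combinations
        prob, ev = 1.0, base_ev
        for won, (votes, p) in zip(outcome, uncertain):
            prob *= p if won else 1 - p
            ev += votes * won
        dist[ev] = dist.get(ev, 0.0) + prob
    return dist


dist = ev_distribution(states, SAFE_KERRY_EV)
p_win = sum(prob for ev, prob in dist.items() if ev >= 270)
print(f"{len(dist)} distinct EV totals, P(Kerry >= 270) = {p_win:.3f}")
```

Classifying near-certain states up front matters because each state removed from the enumeration halves the number of combinations.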

Those are then tabulated to come up with a 50th-percentile (median) outcome, as well as a 95-percent confidence interval. The 95-percent confidence interval is particularly useful because, like the famous Margin of Error (MoE), it gives the range of outcomes that would occur 95 percent of the time based on the available information. Note that this confidence interval is very similar to the 50th-percentile outcome from a 1-point bias towards Bush or towards Kerry. Note, October 21: The confidence interval varies in size somewhat. When large states (such as OH, WI, FL) are toss-ups the confidence band can be up to twice as large.
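Reading percentiles off a tabulated distribution might look like this in Python. The distribution below is invented for illustration, and the percentile convention (smallest EV total whose cumulative probability reaches the target) is one reasonable choice, not necessarily the one used on this site.

```python
def percentile(dist, q):
    """Smallest EV total whose cumulative probability reaches q (0 < q <= 1)."""
    cumulative = 0.0
    for ev in sorted(dist):
        cumulative += dist[ev]
        if cumulative >= q - 1e-12:  # small tolerance for floating-point rounding
            return ev
    return max(dist)


# Hypothetical EV distribution: EV total -> probability.
dist = {240: 0.02, 250: 0.08, 260: 0.20, 270: 0.30, 280: 0.25, 290: 0.10, 300: 0.05}

median = percentile(dist, 0.5)
low, high = percentile(dist, 0.025), percentile(dist, 0.975)
print(f"median {median} EV, 95% interval {low}-{high} EV")
```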

Although this calculation takes into account the variability of polls, it is important to note what it does not do. It is integrated over the last three polls (mostly 1-4 weeks), so fast swings do not show up. It does not reject any polls, nor does it account for potential bias or predict future opinion shift in any way.

October 6: I am now using a far faster way of calculating the probability distribution in closed form (see discussion). Thanks to Lee of Quant Consulting.
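The method itself is described in the linked discussion, not here. As an assumption on my part about what such a shortcut can look like, one standard approach is to build the EV distribution one state at a time (equivalent to multiplying out one polynomial per state), which replaces 2^n combinations with a number of steps proportional to n times the total EV. A Python sketch with made-up probabilities:

```python
def ev_distribution(states):
    """dist[k] = probability of winning exactly k EV from the listed states,
    assuming the states are independent. `states` is a list of (votes, p_win)."""
    total = sum(votes for votes, _p in states)
    dist = [0.0] * (total + 1)
    dist[0] = 1.0  # before any state is counted, 0 EV with certainty
    for votes, p in states:
        new = [0.0] * (total + 1)
        for ev, prob in enumerate(dist):
            if prob == 0.0:
                continue
            new[ev] += prob * (1 - p)      # state lost: total unchanged
            new[ev + votes] += prob * p    # state won: add its EV
        dist = new
    return dist


# Hypothetical (electoral votes, win probability) pairs.
states = [(27, 0.45), (20, 0.50), (21, 0.70), (10, 0.40)]
dist = ev_distribution(states)
print(f"P(sweep) = {dist[78]:.4f}, P(shutout) = {dist[0]:.4f}")
```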

Poll selection

Polls are unfiltered and equally weighted, in part because selecting data leads to unintended biases. Therefore even though some polling organizations give demonstrable and consistent outlier results, such as state-level Fox News polls (example), Fairleigh Dickinson Public Mind, and the Badger Poll, all polls are still included. I have also included Zogby Interactive polls, which have relatively untested methods but do not show measurable bias in either direction from the average.

However, even when all of the above polls are excluded the result is virtually identical. Thus the method can be tailored but is also robust enough to give a reasonable answer even with no selection of data. For a more full discussion of my methods, see this DailyKos thread.

Bias

These calculations would be affected if there is an overall poll bias, which can have a large effect in a close race. Bias could happen if polling methods do not accurately sample actual voting patterns. However, in 2000, Ryan Lizza at The New Republic compiled state polls. On the day before the election, that compilation indicated that the outcome would hinge on Florida. This matches what happened, arguing against major built-in biases in state polls.

Other factors may have an effect of unknown size, such as increased motivation by Democrats or the possibility that undecided voters will break against the incumbent.

A key measure of the current closeness of the race is the Popular Meta-Margin (a.k.a. Swing Index). This is the across-the-board percentage shift in opinion (or poll bias) that would be needed to make the electoral college an exact toss-up. This is analogous to the popular margin in national polls, but is more relevant to what it would take in terms of real electoral mechanisms.
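To make the definition concrete, here is a Python sketch that finds the toss-up shift by bisection. The state numbers and safe-EV total are hypothetical, states are treated as independent, and bisection is simply one convenient way to solve for the shift; I am not claiming it is how the site's MATLAB script does it.

```python
from statistics import NormalDist

# Hypothetical inputs: (electoral votes, mean Kerry-minus-Bush margin in points, SEM in points).
STATES = [(27, -1.0, 1.5), (20, 0.5, 1.5), (21, 3.0, 1.5), (10, -2.0, 2.0)]
SAFE_KERRY_EV = 230  # hypothetical EV from states treated as certain
EV_TO_WIN = 270


def kerry_win_probability(shift):
    """P(Kerry reaches 270 EV) after adding `shift` points to every state margin."""
    std_normal = NormalDist()
    dist = {SAFE_KERRY_EV: 1.0}  # EV total -> probability
    for votes, margin, sem in STATES:
        p = std_normal.cdf((margin + shift) / sem)
        new = {}
        for ev, prob in dist.items():
            new[ev] = new.get(ev, 0.0) + prob * (1 - p)
            new[ev + votes] = new.get(ev + votes, 0.0) + prob * p
        dist = new
    return sum(prob for ev, prob in dist.items() if ev >= EV_TO_WIN)


def toss_up_shift(lo=-20.0, hi=20.0, tol=1e-6):
    """Across-the-board shift toward Kerry at which the win probability is 50%."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kerry_win_probability(mid) < 0.5:
            lo = mid   # Kerry still the underdog: needs a bigger shift
        else:
            hi = mid
    return (lo + hi) / 2


shift = toss_up_shift()
leader = "Bush" if shift > 0 else "Kerry"
print(f"Meta-Margin: {leader} leads by {abs(shift):.2f} points")
```

A positive shift means Kerry needs that many points to reach a toss-up, so Bush leads by that amount; a negative shift means the reverse.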

Bias occurs if (a) polling organizations give skewed results (on Election Eve 2000, they favored Bush by 2.5%) or if (b) one side turns its voters out better or worse than predicted. The closeness of the race for the last few months (less than 3% in either direction) indicates a heavy role for registration of new voters and election-day turnout.

Predicting the future

You can use the bias calculation to estimate where things are headed. If you think turnout efforts will boost your candidate by N points, add that. If you think that one candidate will gain X points at the expense of the other, add 2*X.

For instance, if you predict that turnout will increase Kerry's vote by 2 points, but Bush will pick up 1.5% of voters from Kerry, then the bias to use is 2 - (1.5 * 2) = -1%, or 1% to Bush.
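The same arithmetic as a tiny Python function, for plugging in your own guesses. The function and argument names are made up for illustration; the numbers are the ones from the example above.

```python
def net_bias(turnout_boost, switchers):
    """Net shift in the margin (points) toward your candidate.

    turnout_boost: points your candidate gains from turnout.
    switchers: points of voters your candidate takes from the opponent
               (negative if the opponent takes them from you).
    Each switcher counts twice: one vote lost on one side, one gained on the other.
    """
    return turnout_boost + 2 * switchers


# Turnout adds 2 points for Kerry, but Bush picks up 1.5% of voters from Kerry.
bias = net_bias(2.0, -1.5)
print(f"bias: {bias:+.1f} points for Kerry")  # -1.0, i.e. 1% to Bush
```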

Site History

My academic specialties are biophysics and neuroscience. In these fields I make heavy use of probability and statistics in analyzing complex experimental data, and have published many papers using these approaches. Polling provides an interesting everyday example.

I originally did this calculation to help think about how to allocate my campaign contributions. I believe that one can make the biggest difference by donating at the margin, where probabilities for success are 20-80%. To read a discussion click here. This site now gets over 10,000 hits per day.

In addition to Kerry's race, the Senate is within reach. My recommendations for Democratic donations are listed on my ActBlue page.

For those of you wanting to reinforce the national election (to see why, go to the bias analysis above), I recommend the voter registration and turnout organization America Coming Together. For the optimists there is also the DCCC.

To point out the obvious, the converse interpretation of these calculations for Republicans is to direct resources toward the White House and the Senate.

Thanks to Drew Thaler for the site facelift and for the interactive map!

Other election resources

Data sources: Race 2004, electoral-vote.com, and RealClearPolitics. [database]
People who apply probabilistic methods to polls: Matthew Hubbard, Larry Allen, Andrea Moro.
DailyKos Senate analysis
Charlie Cook's National Journal analyses of the presidential race (older column) and Senate
OurCongress.org (House)
Bush approval rating. Less than 46 means he is toast, more than 53 means Kerry is toast; in between is uncharted territory. [Pollkatz Graph]
National horserace numbers
Ed Fitzgerald's survey of other electoral vote analyses
Analysis by William Saletan at Slate on current polls
Running commentary by Ruy Teixeira on reading polls
Comparison of polls with ultimate election outcomes - suggests that the published margin of error is too optimistic by a factor of 1.4