PredictWise Blog

Hidden Errors and Overconfident Pollsters

Written with Sharad Goel and Houshmand Shirani-Mehr

Election forecasts, whether on HuffingtonPost's Pollster, New York Times’ Upshot, FiveThirtyEight, or PredictWise, report a margin of error of typically 3 percentage points. That means that 95% of the time the election outcome should lie within that interval. We find, however, that the true error is actually much larger than that, and moreover, polls historically understate support for Democratic candidates.

To estimate the true margin of error, we looked at all polls for senatorial races in 2012 that were published on the two major poll aggregation sites (Huffington Post’s Pollster and Real Clear Politics). Then, using the standard formula, we computed their theoretical margin of error. Finally, we simply plotted the percentage of polls where the outcome of the election actually fell within the standard confidence range.

Note: Data are from Huffington Post’s Pollster and Real Clear Politics. The thick horizontal line, a little below 70%, represents overall percent of outcomes in the reported 95% confidence ranges.

For polls conducted right before Election Day, the actual election outcome falls within a poll’s stated 95% confidence interval about 75% of the time. That means that whereas the polls’ margin of error says they should capture 95% of outcomes, they in fact capture only 75%. In other words, the reported margins of error are far too optimistic.
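The coverage check described above can be sketched in a few lines. This is only an illustration: it assumes the "standard formula" is the usual simple-random-sampling margin for a single candidate's share, 1.96 × sqrt(p(1 − p)/n), and the polls in `example_polls` are made-up numbers, not the 2012 Senate data.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """95% margin of error for one candidate's share under simple random sampling."""
    return z * math.sqrt(share * (1.0 - share) / sample_size)

def coverage_rate(polls):
    """Fraction of polls whose interval (poll share +/- margin) contains the actual result."""
    hits = 0
    for poll_share, sample_size, actual_share in polls:
        if abs(poll_share - actual_share) <= margin_of_error(poll_share, sample_size):
            hits += 1
    return hits / len(polls)

# Hypothetical polls, NOT real data: (poll share, sample size, actual election share).
example_polls = [
    (0.50, 1000, 0.52),
    (0.48, 800, 0.53),
    (0.51, 600, 0.50),
    (0.47, 1200, 0.51),
]

print(round(margin_of_error(0.50, 1000), 3))  # roughly 3 percentage points
print(coverage_rate(example_polls))
```

If the reported margins were honest, `coverage_rate` on real polls should come out near 0.95; the gap between that and the observed figure is the overconfidence we are describing.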

Why are the reported confidence intervals too narrow? First, polls only measure attitudes at the time they were conducted, not on Election Day, and the standard error estimates neglect to account for this. (To be fair, the pollsters typically add the disclaimer that results reflect the likely outcome of a hypothetical election held on the day of the poll.) But, close to Election Day, there is probably little real change in support, and the reported confidence intervals are still too small. This discrepancy is attributable to polling companies reporting only one of four major sources of error, as we describe below.

Sampling Error: This is the one source of error that pollsters do report, and it captures the error associated with only measuring opinions in a random sample of the population, as opposed to among all voters.

Coverage Error: Pollsters aim to contact each likely voter with equal probability, and deviations from this goal result in coverage error. This was relatively easy in the world of ubiquitous landline phones (remember those?), but with the rise of cell phones and the internet it is not so easy to determine how to mix polling methods so that any given likely voter is contacted. This problem is getting worse each year, as landline penetration decreases. Coverage error is exacerbated by shifting modes of voting, such as voting by mail or early voting, which complicate traditional screens used to determine who is likely to vote.

Non-Response Error: After identifying a random set of likely voters, pollsters still need them to actually answer the polling questions. If those who are willing to be interviewed systematically differ from those who are not, this introduces yet another source of error, non-response error. This problem is also getting worse each year, as people are increasingly reluctant to answer cell-phone calls from unknown numbers or to take ten minutes to answer a poll in a busy world.

Survey Error: The exact wording of the questions, the order of the questions, the tone of the interviewer, and numerous other survey design factors all affect the result, leading to still another error source.

As Nate Cohn outlined in the New York Times on Thursday, the latter three error sources are more likely to undercount Democrats than Republicans. For example, Democrats are more likely than Republicans to have a cell-phone from a different area code than where they currently live (like all three of the authors of this article), which in turn results in coverage error since such individuals cannot be included in state-level polls. Cohn notes that among cell-phone only adults, people whose area code does not match where they live lean Democratic by 14 points, whereas those that matched lean Democratic by 8 points. For an example of non-response and survey error, Cohn notes that Hispanics who are uncomfortable taking a poll in English are more likely to vote Democratic than demographically similar Hispanics.

Thus, we expect the actual polling errors to be larger than the stated errors, and moreover, we expect polling results to favor the Republicans. This pattern is strikingly apparent when we plot the observed differences between poll predictions and actual election outcomes for the 2012 Senate races. Positive numbers indicate the poll skewed in favor of the Republicans. Alongside the observed differences, we plot the theoretical distribution of poll results if sampling error were the only factor.

Note: Data are from Huffington Post’s Pollster and Real Clear Politics.

The observed distribution clearly skews toward the Republican candidates. Further, the observed distribution is wider than the theoretical one, in large part because the polls are conducted over several weeks prior to the election, while the theoretical distribution does not take into account how much candidate support varies over the course of the campaign.

How much do these overly optimistic forecasts matter? First, the theoretical 3 percentage point margin of error is already substantial, and puts nearly every competitive race within that range. Second, when you add in the unaccounted-for errors, election outcomes in contested races are simply far less certain; and coverage and non-response errors will likely only get worse each cycle. Third, while aggregating a bunch of polls for each election reduces the variance, it does not eliminate the bias, so these overconfident predictions pose a problem for aggregate forecasts as well. In short, those fancy models that show probability of victory are only as good as their ingredients, and if the polls are wrong, the poll aggregations will be wrong as well.
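To make the variance-versus-bias point concrete, here is a small sketch. It assumes every poll in a race shares the same bias and has independent sampling noise with the same standard deviation, which is a simplification; the function name and the numbers are ours, chosen purely for illustration. Under those assumptions the root-mean-square error of an average of n polls is sqrt(bias² + sd²/n).

```python
import math

def rmse_of_poll_average(bias, sampling_sd, n_polls):
    """RMSE of the mean of n_polls polls that share one bias and have
    independent sampling noise with standard deviation sampling_sd."""
    return math.sqrt(bias ** 2 + sampling_sd ** 2 / n_polls)

# Hypothetical values in percentage points, not estimates from our data.
for n in (1, 4, 16, 64):
    print(n, round(rmse_of_poll_average(bias=2.0, sampling_sd=1.5, n_polls=n), 2))
```

The noise term shrinks as more polls are averaged, but the error never drops below the shared bias, which is why aggregation alone cannot rescue a forecast built on skewed polls.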

Sharad Goel is an Assistant Professor at Stanford University

David Rothschild is an economist at Microsoft Research and runs PredictWise

Houshmand Shirani-Mehr is a graduate student at Stanford University

Election Update - 11/2, 2 Days

The Democrats are likely to lose the senate for two years. My predictions have been consistently more bullish on Republican victory than any of the other main forecasters: New York Times’ Upshot, FiveThirtyEight, HuffingtonPost’s Pollster, Princeton’s Sam Wang, etc. And, to be frank, the data is more generous to the Democrats than my gut, but I am obliged to run with the data.

The Democrats will have 47 seats if they take all of their certain races, along with New Hampshire and North Carolina. Of course, New Hampshire and North Carolina are not certain, but for the sake of this exercise, let us assume the Democrats take those seats. There are just eight other seats that are even remotely in play, and the Democrats would have to win three of them to get to a 50/50 tie, where Joe Biden is the tie-breaker.

Three of the races are extreme longshots: Arkansas, Kentucky, and Louisiana. In both Arkansas and Kentucky the Republican has been consistently leading by more than 5 percentage points. Neither of these states is particularly susceptible to polling error; they do not have fast-moving populations, high levels of Hispanics, etc. So, it is unlikely they will suffer a catastrophic polling failure. In Louisiana Landrieu is within striking distance, but is hurt by the majority voting system. Senator Landrieu will not get 50% of the vote in the original vote and Democrats tend to suffer in runoffs, because Democratic voters are less likely than Republicans to bother voting twice. Winning any of these three elections has become extremely unlikely.

The Democrats really need to get three of the five other races, but they all pose their particular problems. First, Colorado is drifting back to the incumbent Democratic governor and, to be frank, the senatorial polling is a bit of a mystery. The Democratic incumbent is liked and the state is reasonably blue. Despite consistent polling showing Udall losing, Colorado is a state where polling error is possible and early voting is confusing. Second, Iowa, like Colorado, is one where I would have expected the Democrats to mount a closer challenge, but the polling is consistent for the Republican. This state is a little less blue than Colorado and less prone to polling error, but the Democratic candidate has been closer all race. Third, Alaska is a Republican state and the incumbent Democratic senator is polling consistently behind. He would not be in this race at all except for two crazy outlier polls showing him dominating. Fourth, Georgia shows the Republican in the lead and, again, the Democrats are not poised to do well in a possible runoff. Finally, the race in Kansas is a toss-up, but with the Republican governor almost definitely going to lose, expect people to split their vote in the ballot box and keep the Republican senator.

Actually, the Democrats really need to get three of the four races that are not Kansas. I doubt an independent Senator Orman will cast the deciding vote in the senate for the Democrats, because that would be political suicide for him in 2020. Instead, if the senate is 49 Democrats and 50 Republicans, expect Orman to caucus with the Republicans in 2015-16 and then quietly caucus with the Democratic majority that will take over the senate in January 2017.

All of this raises the question: can the Democrats capture Iowa, Colorado, and either Georgia or Alaska? It is possible, but if Sam Wang or Nate Silver were backing up their probabilities with real-money bets, I would take the other side. To translate, Sam Wang is implicitly saying he is fine with getting $60 if the Democrats control the senate and paying me $40 if not, while Nate Silver is implicitly saying he is fine with getting $70 if the Democrats control the senate and paying me $30 if not. I consider a fair wager to be $80 if the Democrats control the senate and $20 if not.
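For anyone who wants to check the translation from wagers to probabilities, here is a quick sketch. The function names are mine, and the only assumption is that each forecaster would treat a zero-expected-value bet as fair.

```python
def implied_probability(payout_if_event, loss_if_not):
    """Probability of the event at which a bet that wins payout_if_event
    and loses loss_if_not has zero expected value."""
    return loss_if_not / (payout_if_event + loss_if_not)

def expected_value_other_side(my_probability, payout_if_event, loss_if_not):
    """Expected value of taking the opposite side of that bet: I pay
    payout_if_event if the event happens and collect loss_if_not if it does not."""
    return (1.0 - my_probability) * loss_if_not - my_probability * payout_if_event

# The three wagers quoted in the text (event = Democrats control the senate).
print(implied_probability(60, 40))  # Sam Wang's implied probability
print(implied_probability(70, 30))  # Nate Silver's implied probability
print(implied_probability(80, 20))  # the wager I consider fair

# My expected gain per bet if my fair-wager probability is right.
my_probability = implied_probability(80, 20)
print(expected_value_other_side(my_probability, 60, 40))
print(expected_value_other_side(my_probability, 70, 30))
```

A positive expected value on the other side of both bets is just another way of saying my forecast is more bullish on the Republicans than theirs.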

Here is New York Times and FiveThirtyEight compared with PredictWise. The one key difference is the other forecasters are much more bullish on the Democratic pickup in Georgia. I admit one key issue is that there is no historical identification for what will happen in a runoff that determines the balance of power in the U.S. Senate.

Updating Predictions: house, senatorial, senatorial balance of power, and gubernatorial.

Republicans are going to win the U.S. House

The Republicans are going to hold on to the House. Our latest forecast has the Republicans controlling 237 seats to 198 seats for the Democrats following this election. After the 2012 election the Republicans controlled 234 seats to 201 seats; we are projecting a gain of 3 seats. In the 2012 election the Democrats received 59.6 million votes to the Republicans’ 58.2 million, for 50.6% of the two-party vote. Currently Huffington Post’s Pollster and Real Clear Politics have the Republicans ahead in the national popular House vote by between 1.5 and 2.5 percentage points.

The forecasts are generated with two types of data: fundamental data and the Cook Reports. The fundamental model is very simple: past election results, changes in demographics, and whether an incumbent is running. The Cook Reports add a subjective rating that they update every few days. I simply took this data, fit a probit regression on previous years, and used the coefficients to project for 2014.
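For readers unfamiliar with probit models, the projection step looks roughly like the sketch below. Everything specific in it is hypothetical: the feature names, their coding, the coefficient values, and the three districts are invented for illustration and are not the fitted model. The demographic term is left out to keep it short.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, the link function of a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def republican_win_probability(district, coefficients, intercept):
    """Probit prediction: Phi(intercept + sum of coefficient * feature)."""
    index = intercept + sum(coefficients[name] * district[name] for name in coefficients)
    return normal_cdf(index)

# HYPOTHETICAL coefficients standing in for ones fitted on previous years.
COEFFICIENTS = {
    "prior_rep_margin": 0.06,  # past result, in percentage points
    "rep_incumbent": 0.50,     # 1 = Republican incumbent, -1 = Democratic, 0 = open seat
    "cook_rating": 0.70,       # assumed coding: -3 (solid D) to +3 (solid R)
}
INTERCEPT = 0.0

# HYPOTHETICAL districts, not real 2014 data.
districts = [
    {"prior_rep_margin": 12.0, "rep_incumbent": 1, "cook_rating": 3},
    {"prior_rep_margin": -2.0, "rep_incumbent": -1, "cook_rating": -1},
    {"prior_rep_margin": 0.0, "rep_incumbent": 0, "cook_rating": 0},
]

probabilities = [republican_win_probability(d, COEFFICIENTS, INTERCEPT) for d in districts]
expected_rep_seats = sum(probabilities)  # expected seats = sum of win probabilities

for p in probabilities:
    print(round(p, 3))
print("expected Republican seats:", round(expected_rep_seats, 2))
```

Summing the per-district probabilities gives an expected seat count, which is one simple way to get from district-level predictions to a chamber-level number like the one above.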

What I am going to be following closely on Election Day, besides the exact number of seats, is the national popular vote. The Republicans are poised to get a much lower percentage than they did in 2010, just higher than they did in 2012.

Starting 2016 we hope to use district-by-district polling, but until then, this should give a pretty strong prediction for Election Day 2014.

Updating Predictions: house, senatorial, senatorial balance of power, and gubernatorial.

Election Update - 10/31, 4 Days

As we near Election Day there are really 8 competitive elections out of 36. This is normal compared with previous years. If we assume that the other 28 elections are now done, the Republicans are going to go into Election Night with 47 seats and the Democrats with 45 seats. If you want to expand the realm of possibility to anything that is not 0% or 100% (assuming the Republicans have not already taken Arkansas and Kentucky), then there are 10 seats in play and the chamber is 45 to 45.

The Democrats are heavily favored in two elections: New Hampshire and North Carolina; the Republicans are heavily favored in Louisiana and Colorado. Colorado is the most interesting of these four states, as their new voting scheme could mess up the polling. Early voting does look ok for Democrats in Colorado, but it needs to be great.

The remaining states, Kansas, Georgia, Iowa, and Alaska, are all very tight. Kansas has a wildcard situation in that the leading candidate, by the slimmest of margins, is an independent. Further, the incumbent Republican governor is losing. So, people may vote differently than the polls suggest if they panic about giving the Democrats too big a victory or about where Orman, the independent, will caucus. Georgia is also tough for the Democrats, because Nunn does not have enough to win without a runoff and a runoff lowers her likelihood of victory. Iowa is the only one of the four where the uncertainty is still mostly about the campaign, not Election Day. The Republican is a bit of a wild card and has been stumbling down the stretch, avoiding spontaneous appearances and interviews, trying to “run out the clock.” Somehow Begich, the incumbent Democratic senator in Alaska, keeps the race close and everything comes down to whether he can truly generate a huge turnout on Election Day that overcomes challenger Sullivan’s slight lead in the polls.

Here is New York Times and FiveThirtyEight compared with PredictWise. FiveThirtyEight is surprisingly bullish on the Democrats in Kentucky.

Updating Predictions: senatorial, senatorial balance of power, and gubernatorial.

Election Update - 10/28, 7 Days

The Democrats have had a few good days in the polls, but it is unlikely to be enough to hold onto the senate. Currently the Democrats are about 25% to hold the senate, up from a low of 20% yesterday.

Despite a crazy outlier poll today, Shaheen looks more and more likely to hold in New Hampshire against former Massachusetts senator Brown. And, Hagan is holding the lead in North Carolina for another week against challenger Tillis. But, the reason it has been a good week for the Democrats is that there were so few races left in their column after last week!

Both Georgia and Kansas have stayed really tight. Georgia is increasingly likely to go to a runoff where, despite the current polling, the Democratic candidate, Nunn, will be in trouble. As turnout decreases the Democratic candidates lose voters.

Iowa is back in play, as the next most likely to flip. This is not surprising as the Republican Ernst is a wildcard. She was my pre-season pick for an embarrassing and costly gaffe. Alaska has had some crazy polls recently, but any poll with Don Young only up by 1 point is pretty suspect (despite his recent erratic behavior). The Democratic incumbent Begich is still in a lot of trouble holding his seat.

Here is New York Times and FiveThirtyEight compared with PredictWise.

The crazy thing is that as these states go up and down there is not that much movement in the likely outcome. The most likely outcome has steadily been 48 seats for the Democrats, at about 22%. But, the likelihood of the Democrats controlling the senate has fallen as the chances of 49 or 50 seats have fallen and the chance of 47 seats has risen.
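To see how individual race probabilities roll up into a seat distribution, here is a rough sketch. It assumes the races are independent, which the forecast itself does not assume and which is optimistic given that polling errors tend to be shared across states. The base seat count and the race probabilities below are made-up numbers, not the current PredictWise values.

```python
def seat_distribution(base_seats, win_probabilities):
    """Return {seat_count: probability}, treating each race as an
    independent coin flip with the given win probability."""
    distribution = {base_seats: 1.0}
    for p in win_probabilities:
        updated = {}
        for seats, prob in distribution.items():
            updated[seats] = updated.get(seats, 0.0) + prob * (1.0 - p)
            updated[seats + 1] = updated.get(seats + 1, 0.0) + prob * p
        distribution = updated
    return distribution

def probability_of_at_least(distribution, threshold):
    """Total probability of reaching at least threshold seats."""
    return sum(prob for seats, prob in distribution.items() if seats >= threshold)

# HYPOTHETICAL inputs: 45 safe Democratic seats and ten races in play.
hypothetical_races = [0.85, 0.80, 0.45, 0.40, 0.35, 0.30, 0.25, 0.10, 0.05, 0.05]
distribution = seat_distribution(45, hypothetical_races)

most_likely_seats = max(distribution, key=distribution.get)
print("most likely seat count:", most_likely_seats)
# 50 seats is enough for control because the vice president breaks ties.
print("chance of 50 or more:", round(probability_of_at_least(distribution, 50), 3))
```

Even in this toy version the most likely seat count can sit still while the chance of reaching 50 moves, because that chance depends only on the upper tail of the distribution.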

Updating Predictions: senatorial, senatorial balance of power, and gubernatorial.