PredictWise Blog

Election Update - 10/23, 12 Days

Things just keep getting worse for the Democrats in the senate. We now have the Democrats at just 19% to hold onto the senate. The amazing thing is that the continuous slide is not due to any serious slip-ups, but just the gradual shifting of leaning-Republican races to strong-Republican ones, plus one (or two) big surprises.

First, it was never really likely that the Democrats were going to carry Georgia, Alaska, Louisiana, South Dakota, Kentucky, or Arkansas. Several of the seats were blue, but the states are red. Over the last few months anything that should be red has just gotten a little redder, and that cements as time goes by. It is nearly Election Day and there have been no campaign-altering incidents (e.g., no talk of rape, like Mourdock in 2012, or macaca, like George Allen in 2006).

Second, Colorado, a blue state, is proving very difficult for the Democrats in the polls; both the senatorial and gubernatorial incumbents are losing. I do not have a great explanation for this, but it is consistent in both races. I have heard arguments on both sides and feel confident in saying it is equally likely the polls are off in either direction; there is no reason, ex-ante, to assume the polls are off in the direction of the Republicans.

Third, Iowa is still a wild card: the race is very tight, Ernst is very conservative, and her handlers appear worried about her talking unscripted. A few hours ago she skipped her planned meeting with the editorial board of the largest paper in Iowa. The 27% in Iowa reflects the uncertainty in the polls; there is some additional uncertainty around the candidate that the model is not capable of fully capturing.

Here are the New York Times and FiveThirtyEight forecasts compared with PredictWise:

Updating Predictions: senatorial, senatorial balance of power, and gubernatorial.

Election Update - 10/19, 16 Days

The last week and a half has been an unmitigated disaster for the Democrats in the polls and their fleeting chances of holding onto the majority of the senate.

The Democrats' only bright spot has been Georgia, where Michelle Nunn has pushed into a tight race with David Perdue. The most likely outcome of the election is a runoff between the two candidates, as it is likely neither will get 50% of the vote. That is why, despite leading many polls, Nunn is still slightly less than 50% to win: Libertarian supporters are a little more likely to break for Perdue in a runoff.

The biggest movement is in Kansas, where voters appear to be second-guessing their choice of a Democratic-leaning independent. The next biggest movement is more surprising, as the incumbent Democratic senator Udall in CO has fallen steadily behind the Republican challenger Gardner. Finally, the Iowa senate race has been a bit more of a regular roller coaster as Ernst, the Republican, holds a slight, but steady, lead over Braley.

South Dakota has been added to the short list, but Rounds, the Republican, still holds a strong lead in the three-candidate race. Polling is sparse in South Dakota though and we are still waiting to see what will happen next!

Here are the New York Times and FiveThirtyEight forecasts compared with PredictWise. Not too much difference:

Updating Predictions: senatorial, senatorial balance of power, and gubernatorial.

Senate v. Electoral College

The race for the control of the U.S. senate feels a lot like the race for control of the Electoral College (i.e., president), but there are a few crucial differences. First, the only thing that matters after the Electoral College convenes is who won the Electoral College, but minority party senators still get to vote for the next six years (and may tip the majority in the next election or sooner). Second, the Electoral College is 51 elections about the exact same two people, while senatorial elections are about 36 different sets of candidates. Thus, movements in the Electoral College are highly correlated, but senatorial elections are very independent.

Individual elections matter after determining the balance of power in the senate, but do not in the Electoral College. The president is only elected every four years, so it is hard to measure the impact of margin of victory, but the most powerful president in the last few decades lost the popular vote and barely won the Electoral College. In the senate, it is hard for individual senators to enact change, but it is easy for individual senators to block change. Also, looking forward to 2016, the Democrats will defend just 10 seats to the Republicans' 24. If the Democrats do lose the senate in 2014, they are likely to win it back in 2016 on the strength of senators who won in 2014.

Start of campaign season to Election Day

The Electoral College elections are extremely correlated. I introduced my Electoral College predictions on February 16, 2012, and there were 26 races where the probability for Obama was between 5% and 95%. When I ranked those 26 states from most likely to least likely and compared that to the rank of percentage of votes on Election Day, the correlation was 0.93. Most of the movement was at the less identified fringe, where Arizona was slightly more Democratic than a few similar states with 45% of two-party vote share for Obama (still a landslide) and Maine was also slightly more Democratic than similarly secure states with 58%.

When we lined up all of the states in February and pointed out the pivotal states in the middle they went: FL, VA, OH, NH, CO, IA, PA, in that order. Nine months later the vote share in order was: FL, OH, VA, PA, CO, NH, IA. Over the course of nine months the secure states all drifted towards their likely winners, but the true battleground states moved up and down in lock-step as videos turned to debates turned to Sandy.

Assume there are three states, A, B, and C, with probabilities of voting Democratic of 25%, 50%, and 75%. You can assume that if one state votes Democratic it will be C, if two states vote Democratic it will be B and C, and if three states vote Democratic it will be A, B, and C. If you assume the possibility of any other combinations at 0%, you are generally going to be fine.
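To make the rank comparison concrete, here is a minimal sketch that computes a rank correlation for just the seven pivotal states listed above. It assumes a Spearman rank correlation, which may not be exactly the statistic behind the 0.93, and it covers only seven of the 26 states, so the number it prints is not comparable to the 0.93. The function and variable names are made up for illustration.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orderings must contain the same items")
    n = len(order_a)
    # Position of each item in the second ordering.
    rank_b = {item: i for i, item in enumerate(order_b)}
    # Sum of squared rank differences across items.
    d_squared = sum((i - rank_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Orderings of the pivotal states as given in the text.
february_order = ["FL", "VA", "OH", "NH", "CO", "IA", "PA"]
election_day_order = ["FL", "OH", "VA", "PA", "CO", "NH", "IA"]

rho = spearman_rho(february_order, election_day_order)
print(round(rho, 2))
```

The pivotal states in the middle are packed tightly together, so small swaps among them move this seven-state figure much more than they would move a figure computed over all 26 states.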

The senatorial elections are much less correlated. 16 of 33 elections were between 5% and 95% when I introduced my senatorial predictions in June of 2012. The third most likely for the Republicans in this group was Indiana at 82% for the GOP. The most Republican of the toss-up states was Missouri at 52% for the GOP. Indiana fell to the Democrats in a reasonably tight race and Missouri fell in a landslide. Both Republican candidates made questionable statements on rape and their polls plummeted; their statements certainly entered the public debate, but there is little evidence that their individual falls seriously affected other candidates. Over the five-month period of my data the correlation between initial rank of probability and final rank of vote share was 0.78.

Assume there are three states, A, B, and C, with probabilities of voting Democratic of 25%, 50%, and 75%. You can assume that if one state votes Democratic it will be C, if two states vote Democratic it will be B and C, and if three states vote Democratic it will be A, B, and C. If you assume the possibility of any other combinations at 0%, you are going to have a problem.
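The three-state example can be worked through directly. The sketch below compares two idealized extremes: full independence, and perfect "lock-step" correlation where a single shared shock decides every state. Both are simplifying assumptions for illustration; real elections sit somewhere in between, and the function names are hypothetical.

```python
from itertools import product

# Probabilities of voting Democratic from the three-state example.
probs = {"A": 0.25, "B": 0.50, "C": 0.75}


def independent_outcomes(probs):
    """Probability of every set of Democratic winners if states are independent."""
    states = list(probs)
    outcomes = {}
    for wins in product([False, True], repeat=len(states)):
        p = 1.0
        for state, won in zip(states, wins):
            p *= probs[state] if won else 1 - probs[state]
        winners = frozenset(s for s, won in zip(states, wins) if won)
        outcomes[winners] = p
    return outcomes


def lockstep_outcomes(probs):
    """Perfect correlation: one shared shock, so states fall strictly in rank order."""
    ranked = sorted(probs, key=probs.get, reverse=True)  # most Democratic first
    cutoffs = [probs[s] for s in ranked] + [0.0]
    outcomes = {frozenset(): 1 - cutoffs[0]}
    for k in range(1, len(ranked) + 1):
        # Exactly the top k states go Democratic.
        outcomes[frozenset(ranked[:k])] = cutoffs[k - 1] - cutoffs[k]
    return outcomes


independent = independent_outcomes(probs)
lockstep = lockstep_outcomes(probs)

# Probability mass on combinations the rank-order shortcut treats as impossible.
off_path = sum(p for winners, p in independent.items() if winners not in lockstep)
print(off_path)
```

Under independence, a quarter of the probability sits on combinations such as "A wins but C loses," which the rank-order shortcut sets to 0%. That is the problem for senatorial elections.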

Election Day

Election Day poses a different type of uncertainty than the course of the election. Election Day uncertainty can be correlated over both types of elections if polling is systematically biased toward one party or the other. State-to-state polling for senatorial elections is still less likely to be systematically biased than state-to-state polling for the Electoral College, as the polling itself is less correlated between companies and time. Yet, it is legitimate to assume that the uncertainty left on Election Day, unlike uncertainty during the campaign season, is relatively correlated for senatorial elections.

What does this mean for 2014?

The Democrats currently control 34 seats, the Republicans 30, and there are 36 seats up for election. So, the Democrats, who need 50 seats for a majority, need to win 16 seats to control the senate and the Republicans, who need 51 seats for a majority, need to win 21 seats.

If this were the Electoral College, I would be comfortable lining up the states from most likely Democratic to least likely Democratic. In that list, Georgia is the swing state (if Orman goes Democratic at 49) or Colorado (if Orman goes Democratic at 50). Thus, I could say the likelihood of the Democrats controlling the senate was 35% or 27%, depending on your Orman assumption. Or 31% if you assume Orman flips a coin (50% to caucus with either party) in the scenario that the Democrats hold 49 other seats and he wins. This ranking method can be attributed to Ray Fair and was talked about extensively in 2012.

But, the senate is different, in that North Carolina is 72% likely to go Democratic and Alaska is 15% likely to go Democratic. If this were the Electoral College, I would say that the possibility of Alaska going Democratic and North Carolina going Republican was about 0%. States just do not leapfrog like that when the movement is so correlated. But, the possibility of the Alaska senatorial election going Democratic and the North Carolina election going Republican is about 15%*28% = 4% (maybe a little less, due to some correlation).

In practice, this does not change the answer that much; assuming near independence (and 50% likelihood Orman goes Democratic if they control 49 seats) we get a probability of 27% that the Democrats control the senate. Near independence versus near perfect correlation lowers the probability just a few percentage points. But, it does dramatically alter the possible coalition that the Democrats or Republicans bring to the next senate; Begich from Alaska may toil in the minority and Hagan from North Carolina could lose, even if the Democrats hold the senate.
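For readers who want to see the independence arithmetic, here is a minimal sketch. It first reproduces the Alaska and North Carolina calculation from the text, then shows how a distribution over seat counts can be built up race by race when the races are treated as fully independent. Only the Alaska and North Carolina probabilities come from this post; every other number in the list is a hypothetical placeholder, so the final figure is illustrative and is not the PredictWise forecast. The sketch also ignores the Orman coin flip and any residual correlation.

```python
def seat_distribution(win_probs):
    """P(exactly k wins) for k = 0..n, treating the races as independent."""
    dist = [1.0]  # before any race: zero wins with certainty
    for p in win_probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)   # this race is lost
            new[k + 1] += mass * p     # this race is won
        dist = new
    return dist


def prob_at_least(win_probs, seats_needed):
    """P(at least seats_needed wins) under independence."""
    return sum(seat_distribution(win_probs)[max(seats_needed, 0):])


# From the text: Democratic win probabilities in Alaska and North Carolina.
alaska, north_carolina = 0.15, 0.72
alaska_dem_nc_rep = alaska * (1 - north_carolina)
print(round(alaska_dem_nc_rep, 3))

# HYPOTHETICAL: the last three probabilities are placeholders, not forecasts.
competitive_races = [north_carolina, alaska, 0.50, 0.40, 0.30]
print(prob_at_least(competitive_races, seats_needed=3))
```

The same seat distribution also says which coalitions are plausible, which is where independence and lock-step correlation differ most.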

Updating Predictions: senatorial, senatorial balance of power, and gubernatorial.

Why Microsoft Prediction Lab

I launched a new website, with a few friends, including Miro Dudik and David Pennock, called Microsoft Prediction Lab. The website consolidates research into both non-representative polling and prediction games. I have spent years understanding how various raw data (polling, prediction markets, and social media and online data) can be transformed into indicators of present interest and sentiment, as well as predictions, of varying populations. I have also studied how decision makers allocate resources with the low-latency and quantifiable market intelligence that we produce. Microsoft Prediction Lab allows us to continuously innovate not only on the path of raw data to analytics to consumption, but on the collection of the data itself.

Microsoft Prediction Lab serves two symbiotic purposes; for it to be a successful laboratory, it must also be a successful product, and vice-versa. The project is designed to promote engagement and showcase the bleeding-edge work of Microsoft Research (and other collaborators). Further, the research is making an impact in how people create predictions in the multi-billion-dollar election industry, and that will spread into other domains soon.

Markets: Markets have been an efficient method of aggregating data for millennia, and prediction markets have been forecasting elections for over a century, but there is room for improvement. Here are a few of the innovations we are exploring in Microsoft Prediction Lab. First, we are examining how well markets can work without currency by using incentives like teams, leaderboards, etc. Second, we are examining how we can lower the barriers to entry into markets by making more intuitive interfaces and wording the questions efficiently depending on the user’s knowledge of markets and expectations. Third, we are adapting the right questions for the right people to ensure that information flow is maximized from the users to the market. Fourth, once the data is collected we are using fully combinatorial market makers. Individual probabilities are interesting, but combinatorial and conditional probabilities pose a meaningful and interesting challenge.

Polls: The only acceptable form of polling in the multi-billion dollar survey research field utilizes representative “probability” samples; my colleagues and I argue that with proper statistical adjustment, non-representative polling data can translate into accurate predictions, and often in a much more timely and cost-effective fashion. We demonstrated this by applying multilevel regression and post-stratification (MRP) to a 2012 election survey on the Xbox gaming platform. This was an incredibly non-representative sample. But, not only did the transformed top-line projections from this data closely track standard indicators, we used the unique nature of the data’s size and panel to answer a meaningful political puzzle. We found that reported swings in public opinion polls are generally not due to actual shifts in vote intention, but rather are the result of temporary periods of relatively low response rates among supporters of the reportedly slumping candidate. We raise the possibility that decades of large, reported swings in public opinion, including the perennial “convention bounce”, are mostly artifacts of sampling bias. More broadly, the work on the Xbox, and subsequent studies with Sharad Goel, show great promise for using non-representative polling data to measure public opinion and general social science questions at a lower cost, with more speed and flexibility.
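The post-stratification half of MRP is easy to illustrate. The sketch below is a simplification under stated assumptions: it uses raw per-cell support where the full method would use estimates from a multilevel regression, it has a single demographic variable, and every number in it is hypothetical rather than taken from the Xbox study.

```python
def raw_sample_mean(cell_support, sample_counts):
    """Support for a candidate if respondents are simply pooled."""
    total = sum(sample_counts.values())
    return sum(cell_support[c] * sample_counts[c] for c in sample_counts) / total


def poststratify(cell_support, population_share):
    """Reweight per-cell support by each cell's share of the target population."""
    total = sum(population_share.values())
    return sum(cell_support[c] * population_share[c] for c in population_share) / total


# HYPOTHETICAL data: a sample that heavily over-represents young respondents.
sample_counts = {"18-29": 700, "30-64": 250, "65+": 50}
cell_support = {"18-29": 0.60, "30-64": 0.50, "65+": 0.40}
population_share = {"18-29": 0.20, "30-64": 0.55, "65+": 0.25}

raw = raw_sample_mean(cell_support, sample_counts)
adjusted = poststratify(cell_support, population_share)
print(round(raw, 3), round(adjusted, 3))
```

With these made-up numbers the pooled sample overstates support, because the over-represented group is also the most supportive; reweighting to population shares pulls the estimate back. The regression step matters in practice because many cells in a real survey have few or no respondents.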

Visit the new site at:

Methods for gubernatorial and senatorial predictions

We use really simple and transparent methods for creating forecasts for the gubernatorial and senatorial elections. Everything I do is outlined in this forthcoming paper. The method is unchanged from 2012, but the coefficients are updated with 2012 data.

I consider three different types of data: fundamental, polling, and prediction markets. Fundamental data includes: incumbency, past election results, change in economic indicators, presidential approval, state ideology, and biographical data. Polling data includes aggregated traditional polls from Huffington Post’s Pollster and Real Clear Politics. Prediction market data includes prices on contracts from Betfair.

All of the data needs to be transformed from raw data into predictions. For fundamental data I take advantage of historical correlations, tested for out-of-sample robustness, to match current variables to likely outcomes. For polling I ameliorate several different biases, including the anti-incumbency bias (where incumbents poll lower early than they do on Election Day) and reversion to mean (where big leads tend to contract). For prediction markets I focus on the favorite-longshot bias where prices tend to be under-confident.

I transform the raw data into three separate probabilities of victory and then combine them to form a single probability of victory. The combined probability of victory is accurate, updates regularly, answers the key question of most stakeholders, and scales easily from Electoral College to senatorial to gubernatorial.
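As an illustration of what a favorite-longshot correction can look like, here is a minimal sketch of one common approach: push under-confident prices away from 50% by stretching them on the probit scale. This is an assumed functional form, not necessarily the one in the paper, and the `stretch` value is a hypothetical placeholder rather than an estimated coefficient.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def debias_price(price, stretch=1.5):
    """Map a market price in (0, 1) to a more confident probability.

    stretch > 1 moves prices away from 0.5; stretch = 1 leaves them unchanged.
    The default is a HYPOTHETICAL placeholder, not a fitted value.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return _STD_NORMAL.cdf(stretch * _STD_NORMAL.inv_cdf(price))


for market_price in (0.30, 0.50, 0.70, 0.90):
    print(market_price, round(debias_price(market_price), 3))
```

A price of 50% stays at 50%, favorites move up, and longshots move down symmetrically, which is the direction of adjustment the favorite-longshot bias calls for.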

There is no question that there are more complex forecasts out there, but they are no more accurate than my forecasts. Why? Because they lack the identification to verify their “improvements”. And, because of their complexity, their forecasts do not easily scale to gubernatorial or House elections.

Updating Predictions: senatorial, senatorial balance of power, and gubernatorial.