PredictWise Blog

Why Microsoft Prediction Lab

I launched a new website, with a few friends, including Miro Dudik and David Pennock, called Microsoft Prediction Lab. The website consolidates research into both non-representative polling and prediction games. I have spent years studying how various kinds of raw data (polling, prediction markets, and social media and other online data) can be transformed into indicators of present interest and sentiment, as well as predictions, for varying populations, and then how decision makers allocate resources using the low-latency, quantifiable market intelligence that we produce. Microsoft Prediction Lab allows us to innovate continuously not only on the path from raw data to analytics to consumption, but also on the collection of the data itself.

Microsoft Prediction Lab serves two symbiotic purposes: for it to be a successful laboratory, it must also be a successful product, and vice versa. The project is designed to promote engagement and showcase the bleeding-edge work of Microsoft Research (and other collaborators). Further, the research is changing how people create predictions in the multibillion-dollar election industry, and that will spread into other domains soon.

Markets: Markets have been an efficient method of aggregating data for millennia, and prediction markets have been forecasting elections for over a century, but there is room for improvement. Here are a few of the innovations we are exploring in Microsoft Prediction Lab. First, we are examining how well markets can work without currency by using incentives like teams and leaderboards. Second, we are examining how we can lower the barriers to entry into markets by building more intuitive interfaces and wording the questions efficiently, depending on the user’s knowledge of markets and expectations. Third, we are matching the right questions to the right people to ensure that information flow from the users to the market is maximized. Fourth, once the data is collected, we are using fully combinatorial market makers. Individual probabilities are interesting, but combinatorial and conditional probabilities pose a meaningful and interesting challenge.
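To make the idea of combinatorial and conditional probabilities concrete, here is a minimal sketch of one standard automated market maker, the logarithmic market scoring rule (LMSR), run over every joint outcome of two binary events. This is an illustration only: the post does not say which market maker Prediction Lab uses, and the class name, the event names, and the liquidity value are all assumptions made for the example.

```python
import math
from itertools import product


class JointLMSR:
    """Toy LMSR market maker over all joint outcomes of a few binary events.

    Illustrative only; not the Prediction Lab implementation.
    """

    def __init__(self, events, liquidity=100.0):
        self.events = list(events)
        self.b = liquidity  # assumed value; larger b means prices move more slowly
        # One share counter per joint outcome, e.g. (True, False).
        self.shares = {
            outcome: 0.0
            for outcome in product([True, False], repeat=len(self.events))
        }

    def cost(self):
        # LMSR cost function b * log(sum(exp(q_i / b))), computed stably.
        scaled = [q / self.b for q in self.shares.values()]
        m = max(scaled)
        return self.b * (m + math.log(sum(math.exp(s - m) for s in scaled)))

    def prices(self):
        # Prices are a softmax of the share counts and sum to 1.
        m = max(self.shares.values()) / self.b
        weights = {o: math.exp(q / self.b - m) for o, q in self.shares.items()}
        total = sum(weights.values())
        return {o: w / total for o, w in weights.items()}

    def buy(self, outcome, quantity):
        # A trader pays the change in the cost function.
        before = self.cost()
        self.shares[outcome] += quantity
        return self.cost() - before

    def probability(self, **fixed):
        # Sum the prices of all joint outcomes consistent with the fixed events.
        idx = {self.events.index(name): value for name, value in fixed.items()}
        return sum(
            p for o, p in self.prices().items()
            if all(o[i] == v for i, v in idx.items())
        )

    def conditional(self, event, given):
        # P(event | given) = P(event and given) / P(given)
        joint = self.probability(**{event: True, given: True})
        return joint / self.probability(**{given: True})


market = JointLMSR(["event_a", "event_b"])  # hypothetical event names
paid = market.buy((True, True), 50.0)       # bet that both events happen
print(round(paid, 2))
print(round(market.probability(event_a=True), 3))
print(round(market.conditional("event_a", "event_b"), 3))
```

Because shares are tracked per joint outcome, a single bet on "both events happen" moves the individual probabilities and the conditional probability at once. The number of joint outcomes doubles with every event added, which is one reason fully combinatorial markets are a hard problem.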

Polls: The only acceptable form of polling in the multibillion-dollar survey research field utilizes representative “probability” samples; my colleagues and I argue that with proper statistical adjustment, non-representative polling data can translate into accurate predictions, often in a much more timely and cost-effective fashion. We demonstrated this by applying multilevel regression and post-stratification (MRP) to a 2012 election survey on the Xbox gaming platform. This was an incredibly non-representative sample. But not only did the transformed top-line projections from this data closely track standard indicators, we also used the unique size and panel structure of the data to answer a meaningful political puzzle. We found that reported swings in public opinion polls are generally not due to actual shifts in vote intention, but rather are the result of temporary periods of relatively low response rates among supporters of the reportedly slumping candidate. We raise the possibility that decades of large, reported swings in public opinion (including the perennial “convention bounce”) are mostly artifacts of sampling bias. More broadly, the work on the Xbox, and subsequent studies with Sharad Goel, show great promise for using non-representative polling data to measure public opinion and address general social science questions at a lower cost, with more speed and flexibility.
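The post-stratification half of MRP can be sketched in a few lines. The example below is deliberately simplified: it uses raw cell averages where MRP would use estimates from a multilevel regression, and the demographic cells, responses, and population counts are invented for illustration rather than taken from the Xbox data.

```python
from collections import defaultdict


def poststratify(responses, population_counts):
    """Reweight cell-level support to match known population cell sizes.

    responses: iterable of (cell, supports_candidate) pairs from the sample.
    population_counts: dict mapping each cell to its size in the target
        population (hypothetical numbers in this example).
    """
    n_by_cell = defaultdict(int)
    yes_by_cell = defaultdict(int)
    for cell, supports in responses:
        n_by_cell[cell] += 1
        yes_by_cell[cell] += int(supports)

    weighted_support = 0.0
    covered_population = 0
    for cell, count in population_counts.items():
        if n_by_cell[cell] == 0:
            # Full MRP would borrow strength from similar cells via the
            # multilevel model; this sketch simply skips empty cells.
            continue
        cell_estimate = yes_by_cell[cell] / n_by_cell[cell]
        weighted_support += count * cell_estimate
        covered_population += count
    return weighted_support / covered_population


# A sample that over-represents one group, as an opt-in gaming panel might.
sample = (
    [("young_male", True)] * 6 + [("young_male", False)] * 2
    + [("older_female", True)] * 1 + [("older_female", False)] * 1
)
population = {"young_male": 20, "older_female": 80}  # made-up population shares

raw_share = sum(supports for _, supports in sample) / len(sample)
adjusted_share = poststratify(sample, population)
print(raw_share, adjusted_share)
```

In this toy sample the raw share overstates support because the over-represented cell leans toward the candidate; reweighting each cell by its population size pulls the estimate back toward what the target population would report.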

Visit the new site at: Prediction.Microsoft.com.

Methods for gubernatorial and senatorial predictions

We use really simple and transparent methods for creating forecasts for the gubernatorial and senatorial elections. Everything I do is outlined in this forthcoming paper. The method is unchanged from 2012, but the coefficients are updated with 2012 data.

I consider three different types of data: fundamental, polling, and prediction markets. Fundamental data includes incumbency, past election results, change in economic indicators, presidential approval, state ideology, and biographical data. Polling data includes aggregated traditional polls from Huffington Post’s Pollster and Real Clear Politics. Prediction market data includes prices on contracts from Betfair.

All of the data needs to be transformed from raw data into predictions. For fundamental data I take advantage of historical correlations, tested for out-of-sample robustness, to match current variables to likely outcomes. For polling I ameliorate several different biases, including the anti-incumbency bias (where incumbents poll lower early than they do on Election Day) and reversion to the mean (where big leads tend to contract). For prediction markets I focus on the favorite-longshot bias, where prices tend to be under-confident.
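As one illustration of what correcting under-confident prices can look like, the sketch below stretches a market price away from 50% on the probit scale. Both the functional form and the `stretch` value are assumptions chosen for the example; the actual transformation and its fitted coefficients are in the paper, not in this post.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def debias_price(price, stretch=1.5):
    """Push an under-confident market price toward 0 or 1.

    price: contract price read as a probability, strictly between 0 and 1.
    stretch: hypothetical scaling factor; values above 1 make favorites
        more likely and longshots less likely, and 1 leaves the price alone.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    z_score = _STANDARD_NORMAL.inv_cdf(price)
    return _STANDARD_NORMAL.cdf(stretch * z_score)


for raw_price in (0.20, 0.50, 0.80):
    print(raw_price, round(debias_price(raw_price), 3))
```

The transformation is symmetric around 50%, so a toss-up stays a toss-up while an 80% favorite is treated as somewhat more likely to win than its price alone suggests.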

I transform the raw data into three separate probabilities of victory and then combine them to form a single probability of victory. The combined probability of victory is accurate, updates regularly, answers the key question of most stakeholders, and scales easily from the Electoral College to senatorial to gubernatorial elections.
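The post does not spell out the combination rule, so the following is only a sketch of the simplest possibility, a weighted average of the three probabilities. The weights and input probabilities are invented for illustration, and the real method may well weight the sources differently as Election Day approaches.

```python
def combine_probabilities(probabilities, weights):
    """Weighted average of several probabilities of victory.

    probabilities: dict mapping a source name to its probability of victory.
    weights: dict mapping the same source names to non-negative weights
        (hypothetical values; not the coefficients used on the site).
    """
    if set(probabilities) != set(weights):
        raise ValueError("probabilities and weights must cover the same sources")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(probabilities[name] * weights[name] for name in probabilities)
    return weighted_sum / total_weight


example_probabilities = {"fundamentals": 0.55, "polls": 0.70, "markets": 0.75}
example_weights = {"fundamentals": 1.0, "polls": 2.0, "markets": 2.0}
print(round(combine_probabilities(example_probabilities, example_weights), 2))
```

Because the inputs are already probabilities for the same race, the same function applies unchanged to a state in the Electoral College, a Senate seat, or a governorship.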

There is no question that there are more complex forecasts out there, but they are no more accurate than my forecasts. Why? Because they lack the identification to verify their “improvements”. And, because of their complexity, their forecasts do not easily scale to gubernatorial or House elections.

Updating Predictions: senatorial, senatorial balance of power, and gubernatorial.

Election Update - 10/10, 25 Days

The balance of power in the Senate is both extremely tight and extremely important. I get that. But, race for race, the gubernatorial elections are fascinating to follow. We have seven races between 30% and 70%, and all of them have national implications. The most interesting part of this list is that in six of the seven races the incumbent (or seat) is Republican, and many of these incumbents are thought leaders of their party. Depending on how many of the seats turn Democratic, the narrative of a Republican wave will be in serious jeopardy. From most likely Republican to most likely Democratic:

1) Wisconsin (36% Democratic): Scott Walker, Republican incumbent, is running against Mary Burke, Democrat. Walker survived a recall vote in 2012 that was directly related to his slashing of benefits for public union employees (excluding police and firefighters). He also championed a voter ID law that the Supreme Court blocked on October 9, 2014. Both of these issues have become very prominent for the Republicans in the last few years, and Walker is a leader within the party for championing them.

2) Arizona (38% Democratic): The incumbent Republican, Jan Brewer, is still the most interesting aspect of this race between Doug Ducey (R) and Fred DuVal (D). Arizona faced boycotts over its immigration policy, specifically SB 1070. Further, with the shooting of Congresswoman Gabby Giffords and her subsequent push for gun control, Arizona is now a hotbed of discussion on two major national issues.

3) Kansas (48% Democratic): Sam Brownback, the Republican incumbent, is in serious trouble against Paul Davis, the Democratic challenger. Brownback actually implemented serious austerity in Kansas. He cut taxes and cut the budget. This may sound like standard Republican policy, but it is extremely rare to see both cuts so deep (and frequently the taxes are cut, but the budget is not). And, it has been a serious disaster so far. That is how an incumbent governor in a very red state is in serious trouble.

4) Florida (49% Democratic): The current governor, Republican Rick Scott, is up against former governor, Democrat Charlie Crist. Scott is known nationally for two Republican initiatives: cutting funding for public transportation infrastructure and drug testing people on government assistance. But, this election is more about a clash of personalities.

5) Colorado (50% Democratic): Incumbent Democrat John Hickenlooper is in big trouble against Republican Bob Beauprez. This time it is signature Democratic policies that are under attack, as Hickenlooper championed cannabis, gun control, and eliminating capital punishment.

6) Maine (54% Democratic): Incumbent Republican Paul LePage is in serious trouble against Democratic challenger Mike Michaud. LePage has found his way into the national spotlight for a string of insensitive or inflammatory remarks, and for his stands against union workers, both real and symbolic.

7) Alaska (56% Independent): Independent Bill Walker is leading incumbent Republican Sean Parnell. Sean Parnell took over the seat when Sarah Palin resigned after the 2008 presidential election.

Nobel Peace Prize

The odds for the Nobel Peace Prize are pretty uniform across the major bookmakers. Here is my translation of the odds into probabilities. Pope Francis is the strong front-runner, followed closely by Denis Mukwege, the doctor from the Congo.
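For readers curious about the translation step, here is a minimal sketch of the most basic version: convert each decimal price to an implied probability and rescale so the field sums to 100%. The candidate labels and odds are placeholders rather than the actual Nobel prices, and this simple normalization is an assumption for illustration; it makes no further correction for biases such as the favorite-longshot bias.

```python
def odds_to_probabilities(decimal_odds):
    """Turn bookmaker decimal odds into probabilities that sum to 1.

    decimal_odds: dict mapping each candidate to decimal odds (total payout
        per unit staked, so 2.0 means even money). Values are hypothetical.
    """
    implied = {name: 1.0 / odds for name, odds in decimal_odds.items()}
    # The implied probabilities sum to more than 1 because of the
    # bookmaker's margin (the "overround"); dividing it out normalizes them.
    overround = sum(implied.values())
    return {name: p / overround for name, p in implied.items()}


placeholder_odds = {"Candidate A": 1.8, "Candidate B": 3.6, "Candidate C": 3.6}
for name, probability in odds_to_probabilities(placeholder_odds).items():
    print(name, round(probability, 3))
```

When several bookmakers post similar odds, the same function can be applied to each and the resulting probabilities averaged.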

Election Update - 10/8, 27 Days

The senatorial forecasts have remained remarkably steady over the last week, with two small exceptions. First, Kansas has come into stronger focus and the Independent Orman is pulling away from the incumbent Roberts. Second, Michigan, which was leaning heavily Democratic, is now solidly Democratic (and off of our chart!).

I have also updated the forecast for balance of power. Those of you who watch the site closely may have seen a small but noticeable discrete jump when I went live with my updates at about 2 AM ET on 10/8/2014. The reason is that I was previously assuming that Orman was 100% likely to caucus with the Democrats. Now, the calculations assume that Orman is 100% likely to caucus with the Democrats if they get 50 or more seats, 100% likely to caucus with the Republicans if they get 51 or more seats, and 50% likely to caucus with either party if the final tally (excluding him) is 50 Republicans and 49 Democrats. Thus, what I have done is derive the balance of power as if the Kansas race did not exist and then add in the possible effect of the Kansas election. I will update this assumption as Orman’s choice comes into focus. Further, the prediction market that directly forecasts balance of power is a few percentage points different from the aggregated balance of power prediction generated from the state-by-state elections. In order to ensure consistency, the topline balance of power numbers now reflect the aggregated forecast of the states, but all data is noted.
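The caucus rule above can be sketched as a short calculation. The seat distribution and Orman's win probability below are made-up inputs, and the sketch adds two assumptions of its own: the Kansas race is treated as independent of the other 99 seats, and Democrats are counted as controlling a 50-50 Senate (which matches the 50-seat and 51-seat thresholds in the rule).

```python
def democratic_control_probability(dem_seat_dist, p_orman_wins,
                                   p_orman_joins_dems_if_pivotal=0.5):
    """Probability that Democrats control the Senate under the caucus rule.

    dem_seat_dist: dict mapping a count of Democratic seats among the 99
        non-Kansas seats to its probability (hypothetical numbers).
    p_orman_wins: probability that Orman wins Kansas (hypothetical).
    p_orman_joins_dems_if_pivotal: chance Orman caucuses with the Democrats
        when the other seats split 50 Republicans to 49 Democrats.
    """
    total = 0.0
    for dem_seats, probability in dem_seat_dist.items():
        rep_seats = 99 - dem_seats
        if dem_seats >= 50:
            # Democrats already have control; Kansas cannot change that.
            total += probability
        elif rep_seats == 50:
            # Pivotal case: control hinges on Orman winning and on his choice.
            total += probability * p_orman_wins * p_orman_joins_dems_if_pivotal
        # With 51 or more Republican seats, Republicans control regardless.
    return total


toy_distribution = {48: 0.3, 49: 0.3, 50: 0.4}  # made-up seat probabilities
old_assumption = democratic_control_probability(toy_distribution, 0.6, 1.0)
new_assumption = democratic_control_probability(toy_distribution, 0.6, 0.5)
print(round(old_assumption, 2), round(new_assumption, 2))
```

Moving from the old assumption (Orman always sides with the Democrats) to the new one changes only the pivotal 50-to-49 scenario, which is why the switch shows up as a single discrete jump rather than a gradual drift.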

The forecasts from the major news organizations are converging, which is not surprising. The more distinctive fundamentals-based forecasts that dominate early-stage forecasting have now been completely supplanted by heavy polling, which everyone sees.

Here are the New York Times and FiveThirtyEight forecasts compared with PredictWise. Not too much difference:
