DavidMRothschild on October 17, 2014 @ 5:48PM
The race for control of the U.S. senate feels a lot like the race for control of the Electoral College (i.e., the presidency), but there are a few crucial differences. First, once the Electoral College convenes, the only thing that matters is who won it, but minority-party senators still get to vote for the next six years (and may tip the majority in the next election or sooner). Second, the Electoral College is 51 elections about the exact same two people, while the senatorial elections are about 36 different sets of candidates. Thus, movements in the Electoral College are highly correlated, but senatorial elections are largely independent.
Individual races still matter after the balance of power in the senate is determined; in the Electoral College, they do not. The president is elected only every four years, so it is hard to measure the impact of the margin of victory, but the most powerful president of the last few decades lost the popular vote and barely won the Electoral College. In the senate, it is hard for individual senators to enact change, but easy for them to block it. Also, looking ahead to 2016, the Democrats will defend just 10 seats to the Republicans' 24. If the Democrats do lose the senate in 2014, they are likely to win it back in 2016, building on the senators they elect in 2014.
Start of campaign season to Election Day
The Electoral College elections are extremely correlated. I introduced my Electoral College predictions on February 16, 2012, and there were 26 states where Obama's probability of victory was between 5% and 95%. When I ranked those 26 states from most likely to least likely to vote for Obama and compared that ranking to the rank of his vote share on Election Day, the correlation was 0.93. Most of the movement was at the less closely watched fringes: Arizona ended up slightly more Democratic than a few similar states, with 45% of the two-party vote for Obama (still a landslide), and Maine also ended up slightly more Democratic than similarly secure states, with 58%. When we lined up all of the states in February, the pivotal states in the middle ran, from least to most Democratic: FL, VA, OH, NH, CO, IA, PA. Nine months later, the order by vote share was: FL, OH, VA, PA, CO, NH, IA. Over those nine months the secure states all drifted toward their likely winners, but the true battleground states moved up and down in lock-step as the campaign turned from videos to debates to Hurricane Sandy. Assume there are three states, A, B, and C, whose probabilities of voting Democratic are 25%, 50%, and 75%, respectively. You can assume that if one state votes Democratic it will be C, if two states vote Democratic they will be B and C, and if three states vote Democratic they will be A, B, and C. If you set the probability of every other combination to 0%, you are generally going to be fine.
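For readers who want to check this kind of number, here is a minimal sketch of a rank correlation, assuming the "correlation between ranks" above is a Spearman rank correlation. As a hypothetical illustration it uses only the seven pivotal states listed above, not the full 26-state comparison behind the 0.93, so it is not expected to reproduce that figure. The function names are illustrative.

```python
def ranks(order):
    """Map each state to its 1-based position in an ordered list."""
    return {state: i + 1 for i, state in enumerate(order)}


def spearman_from_orders(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items
    (no ties): rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orderings must contain the same states")
    ra, rb = ranks(order_a), ranks(order_b)
    n = len(order_a)
    d_squared = sum((ra[s] - rb[s]) ** 2 for s in order_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Pivotal states in February 2012 and on Election Day, least to most Democratic.
february = ["FL", "VA", "OH", "NH", "CO", "IA", "PA"]
election_day = ["FL", "OH", "VA", "PA", "CO", "NH", "IA"]

print(round(spearman_from_orders(february, election_day), 3))  # 0.714
```

Even within this narrow band of true toss-ups, where small shifts reorder states easily, the two orderings stay clearly positively related.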
The senatorial elections are much less correlated. When I introduced my senatorial predictions in June 2012, 16 of the 33 races had probabilities between 5% and 95%. The third most likely Republican win in this group was Indiana, at 82% for the GOP. The most Republican of the toss-up states was Missouri, at 52% for the GOP. Indiana fell to the Democrats in a reasonably tight race, and Missouri fell in a landslide. Both Republican candidates made questionable statements about rape, and their polls plummeted; their statements certainly entered the public debate, but there is little evidence that their individual falls seriously affected other candidates. Over the five-month period of my data, the correlation between the initial rank of probability and the final rank of vote share was 0.78. Assume the same three states, A, B, and C, with probabilities of voting Democratic of 25%, 50%, and 75%. You can assume that if one state votes Democratic it will be C, if two states vote Democratic they will be B and C, and if three states vote Democratic they will be A, B, and C. If you set the probability of every other combination to 0%, you are going to have a problem.
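The three-state thought experiment can be made concrete with a small sketch. It assumes two stylized extremes: perfect correlation, modeled as one shared uniform draw that decides every state, and full independence. The state labels and helper names are hypothetical.

```python
from itertools import product

# Hypothetical states A, B, C and their probabilities of voting Democratic.
PROBS = {"A": 0.25, "B": 0.50, "C": 0.75}


def independent_joint(probs):
    """Probability of every Democratic/Republican combination when states are independent."""
    states = sorted(probs)
    joint = {}
    for outcome in product([True, False], repeat=len(states)):
        p = 1.0
        for state, dem in zip(states, outcome):
            p *= probs[state] if dem else 1 - probs[state]
        joint[outcome] = p
    return states, joint


def correlated_joint(probs):
    """Perfect correlation: one shared uniform draw u decides every state, and a
    state goes Democratic iff u < its probability, so only nested outcomes occur."""
    states = sorted(probs)
    cuts = sorted({0.0, 1.0, *probs.values()})
    joint = {}
    for lo, hi in zip(cuts, cuts[1:]):
        u = (lo + hi) / 2  # every u in [lo, hi) yields the same outcome
        outcome = tuple(u < probs[s] for s in states)
        joint[outcome] = joint.get(outcome, 0.0) + (hi - lo)
    return states, joint


def prob_of(states, joint, event):
    """Total probability of the outcomes for which event(results) is true."""
    return sum((p for o, p in joint.items() if event(dict(zip(states, o)))), 0.0)


def leapfrog(results):
    """A goes Democratic while the more Democratic state C goes Republican."""
    return results["A"] and not results["C"]


print(prob_of(*independent_joint(PROBS), leapfrog))  # 0.0625
print(prob_of(*correlated_joint(PROBS), leapfrog))   # 0.0
```

Under the correlated model only four of the eight combinations have positive probability, exactly the nested ones; under independence all eight do, which is why assuming the nested pattern breaks down for senatorial races.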
Election Day poses a different type of uncertainty than the course of the campaign. Election Day uncertainty can be correlated across both types of elections if polling is systematically biased toward one party or the other. State-to-state polling for senatorial elections is still less likely to be systematically biased than state-to-state polling for the Electoral College, as the polling itself is less correlated across pollsters and over time. Yet it is legitimate to assume that the uncertainty left on Election Day, unlike the uncertainty during the campaign season, is relatively correlated for senatorial elections.
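One simple way to picture correlated Election Day error is a one-factor model, which is our assumption here rather than a description of the author's model: each state's polling error is a shared national bias plus independent state-level noise. The standard deviations below are hypothetical.

```python
import math
import random


def implied_error_correlation(bias_sd, state_sd):
    """Correlation between two states' errors when error = shared bias + own noise."""
    return bias_sd**2 / (bias_sd**2 + state_sd**2)


def simulate_errors(bias_sd, state_sd, n_states, n_sims, seed=0):
    """Draw polling errors (in points) for n_states states in each simulation."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        bias = rng.gauss(0.0, bias_sd)  # one systematic bias shared by every state
        sims.append([bias + rng.gauss(0.0, state_sd) for _ in range(n_states)])
    return sims


def sample_correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


sims = simulate_errors(bias_sd=2.0, state_sd=3.0, n_states=2, n_sims=20000)
first, second = [s[0] for s in sims], [s[1] for s in sims]
print(round(implied_error_correlation(2.0, 3.0), 3))  # 0.308
print(round(sample_correlation(first, second), 2))    # should land near the implied value
```

Setting bias_sd to zero makes the states' errors independent; making it large relative to state_sd pushes them toward moving in lock-step.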
What does this mean for 2014?
The Democrats currently control 34 of the seats not up for election, the Republicans 30, and there are 36 seats up for election. So the Democrats, who need 50 seats for a majority (Vice President Biden breaks ties), need to win 16 seats to control the senate, and the Republicans, who need 51 seats for a majority, need to win 21 seats.
If this were the Electoral College, I would be comfortable lining up the states from most likely Democratic to least likely Democratic. In that list, the swing state is Georgia if Orman, the independent in Kansas, caucuses with the Democrats when they hold 49 other seats, or Colorado if he joins them only when they already hold 50. Thus, I could say the likelihood of the Democrats controlling the senate was 35% or 27%, depending on your Orman assumption, or 31% if you assume Orman flips a coin (50% to caucus with either party) in the scenario where the Democrats hold 49 other seats and he wins. This ranking method can be attributed to Ray Fair and was discussed extensively in 2012.
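A minimal sketch of the ranking approach, under its implicit assumption that races move perfectly together: if one shared swing decides every race, the Democrats win at least k races exactly when they win the k-th most likely one. The function names are hypothetical, and the blend only reproduces the coin-flip arithmetic from the 35% and 27% figures in the text.

```python
def prob_at_least_k_ranked(dem_probs, k):
    """Ranking method: with perfectly correlated races, P(at least k Democratic
    wins) equals the k-th largest individual probability."""
    if k <= 0:
        return 1.0
    ranked = sorted(dem_probs, reverse=True)
    if k > len(ranked):
        return 0.0
    return ranked[k - 1]


def blend_orman(p_joins_at_49, p_joins_at_50, q=0.5):
    """Mix the two Orman scenarios; q is the chance he caucuses with the
    Democrats when they hold 49 other seats (0.5 is the coin flip)."""
    return q * p_joins_at_49 + (1 - q) * p_joins_at_50


print(prob_at_least_k_ranked([0.25, 0.50, 0.75], 2))  # 0.5
print(round(blend_orman(0.35, 0.27), 2))              # 0.31
```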
But the senate is different: North Carolina is 72% likely to go Democratic and Alaska is 15% likely to go Democratic. If this were the Electoral College, I would say that the probability of Alaska going Democratic while North Carolina goes Republican was about 0%. States just do not leapfrog like that when the movement is so correlated. But the probability of the Alaska senatorial election going Democratic and the North Carolina election going Republican is about 15% × 28% ≈ 4% (maybe a little less, due to some correlation).
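The leapfrog arithmetic can be sketched as follows. To express "a little less, due to some correlation", this assumes a simple mixture: with a hypothetical weight w the two races share one swing (perfect correlation), and otherwise they are independent. The mixture is our illustrative choice, not the author's model.

```python
def leapfrog_prob(p_low, p_high, w=0.0):
    """P(the less likely state goes Democratic while the more likely state goes
    Republican), mixing independence (weight 1 - w) with one shared swing (weight w)."""
    independent = p_low * (1 - p_high)
    # Shared uniform draw u: need u < p_low and u >= p_high, impossible if p_low <= p_high.
    correlated = max(0.0, p_low - p_high)
    return w * correlated + (1 - w) * independent


alaska, north_carolina = 0.15, 0.72  # P(Democratic win), from the text
print(round(leapfrog_prob(alaska, north_carolina), 3))         # 0.042
print(round(leapfrog_prob(alaska, north_carolina, w=0.3), 3))  # 0.029
```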
In practice, this does not change the answer much: assuming near independence (and a 50% likelihood that Orman caucuses with the Democrats if they hold 49 seats), we get a 27% probability that the Democrats control the senate. Moving from near-perfect correlation to near independence lowers the probability by just a few percentage points. But it does dramatically alter the possible coalition that the Democrats or Republicans bring to the next senate: Begich from Alaska may toil in the minority, and Hagan from North Carolina could lose, even if the Democrats hold the senate.
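Under independence, the chance of control comes from the full distribution of seat counts rather than from a single pivotal state. Below is a minimal sketch, assuming fully independent races (the text assumes near independence) and the Orman rule described above; dem_probs should hold the Democratic win probabilities for every race except Kansas. The function names and the toy inputs are hypothetical, not the current forecasts.

```python
def seat_distribution(dem_probs):
    """Exact distribution of the number of Democratic wins across independent
    races (a Poisson-binomial distribution), built up one race at a time."""
    dist = [1.0]  # dist[k] = P(k Democratic wins among races added so far)
    for p in dem_probs:
        new = [0.0] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            new[k] += pk * (1 - p)  # Republican wins this race
            new[k + 1] += pk * p    # Democrat wins this race
        dist = new
    return dist


def prob_dem_control(dem_probs, seats_held, orman_win, q=0.5, majority=50):
    """Democrats control with 50 seats (the vice president breaks ties). If they
    hold exactly 49 and Orman wins Kansas, he joins them with probability q."""
    total = 0.0
    for wins, pk in enumerate(seat_distribution(dem_probs)):
        seats = seats_held + wins
        if seats >= majority:
            total += pk
        elif seats == majority - 1:
            total += pk * orman_win * q
    return total


# Toy check with the three hypothetical states A, B, C: Democrats need 2 of 3.
print(prob_dem_control([0.25, 0.50, 0.75], seats_held=48, orman_win=0.0))  # 0.5
```

For these three states the independent answer, 0.5, equals the second-largest probability, which is what the ranking method would report: the headline number barely moves, while the set of possible seat combinations changes completely.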
DavidMRothschild on October 17, 2014 @ 10:36AM
With a few friends, including Miro Dudik and David Pennock, I launched a new website called Microsoft Prediction Lab. The website consolidates research into both non-representative polling and prediction games. I have spent years studying how various kinds of raw data (polling, prediction markets, and social media and other online data) can be transformed into indicators of present interest and sentiment, as well as into predictions, for varying populations, and then how decision makers allocate resources using the low-latency, quantifiable market intelligence that we produce. Microsoft Prediction Lab allows us to continuously innovate not only along the path from raw data to analytics to consumption, but also in the collection of the data itself.
Microsoft Prediction Lab serves two symbiotic purposes: for it to be a successful laboratory, it must also be a successful product, and vice versa. The project is designed to promote engagement and showcase the bleeding-edge work of Microsoft Research (and other collaborators). Further, the research is making an impact on how people create predictions in the multi-billion-dollar election industry, and that impact will soon spread into other domains.
Markets: Markets have been an efficient method of aggregating data for millennia, and prediction markets have been forecasting elections for over a century, but there is room for improvement. Here are a few of the innovations we are exploring in Microsoft Prediction Lab. First, we are examining how well markets can work without currency by using incentives like teams, leaderboards, etc. Second, we are examining how we can lower the barriers to entry into markets by making more intuitive interfaces and wording questions efficiently depending on the user’s knowledge of markets and expectations. Third, we are tailoring the right questions to the right people to maximize the flow of information from the users to the market. Fourth, once the data is collected, we are using fully combinatorial market makers. Individual probabilities are interesting, but combinatorial and conditional probabilities pose a meaningful and interesting challenge.
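The post does not say which market maker Prediction Lab uses. As a hedged illustration only, here is a brute-force sketch of a combinatorial market maker built on Hanson's logarithmic market scoring rule (LMSR), a standard automated market maker. Prices live on the joint outcomes of several races, so any combination or conditional probability can be read off them. The class name and the liquidity value b are hypothetical, and enumerating every joint outcome only works for a handful of races.

```python
import math
from itertools import product


class JointLMSR:
    """LMSR market maker over every joint outcome of a few binary races."""

    def __init__(self, races, b=100.0):
        self.races = list(races)
        self.b = b  # liquidity: larger b means prices move less per share
        self.outcomes = list(product([True, False], repeat=len(self.races)))
        self.q = {o: 0.0 for o in self.outcomes}  # shares sold on each joint outcome

    def _cost(self, q):
        # C(q) = b * log(sum over outcomes of exp(q_o / b))
        return self.b * math.log(sum(math.exp(v / self.b) for v in q.values()))

    def _matches(self, outcome, event):
        return event(dict(zip(self.races, outcome)))

    def prices(self):
        z = sum(math.exp(v / self.b) for v in self.q.values())
        return {o: math.exp(v / self.b) / z for o, v in self.q.items()}

    def prob(self, event):
        """Price of a contract paying 1 if event(results) is true."""
        return sum(p for o, p in self.prices().items() if self._matches(o, event))

    def conditional(self, event, given):
        """P(event | given), read directly from the joint prices."""
        both = self.prob(lambda r: event(r) and given(r))
        return both / self.prob(given)

    def buy(self, event, shares):
        """Trader buys `shares` contracts on event; returns what the trader pays."""
        new_q = {o: v + (shares if self._matches(o, event) else 0.0)
                 for o, v in self.q.items()}
        paid = self._cost(new_q) - self._cost(self.q)
        self.q = new_q
        return paid


mm = JointLMSR(["NC", "AK"])
print(round(mm.prob(lambda r: r["NC"]), 2))  # 0.5 before any trades
mm.buy(lambda r: r["NC"] and r["AK"], 50)    # a trader bets both go Democratic
print(mm.conditional(lambda r: r["AK"], lambda r: r["NC"]) > mm.prob(lambda r: r["AK"]))  # True
```

A single bet on the combination moves the conditional price of Alaska given North Carolina above its unconditional price, which is the kind of information a market on individual races alone cannot capture.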
Polls: In the multi-billion-dollar survey research field, the only accepted form of polling uses representative “probability” samples; my colleagues and I argue that with proper statistical adjustment, non-representative polling data can translate into accurate predictions, often in a much more timely and cost-effective fashion. We demonstrated this by applying multilevel regression and post-stratification (MRP) to a 2012 election survey on the Xbox gaming platform. This was an incredibly non-representative sample. Yet not only did the transformed top-line projections from this data closely track standard indicators, but we also used the unique size and panel structure of the data to answer a meaningful political puzzle. We found that reported swings in public opinion polls are generally not due to actual shifts in vote intention, but rather are the result of temporary periods of relatively low response rates among supporters of the reportedly slumping candidate. We raise the possibility that decades of large, reported swings in public opinion, including the perennial “convention bounce”, are mostly artifacts of sampling bias. More broadly, the work on the Xbox, and subsequent studies with Sharad Goel, show great promise for using non-representative polling data to measure public opinion and answer general social science questions at a lower cost, with more speed and flexibility.
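The post-stratification half of MRP is simple enough to sketch. In real MRP, the per-cell estimates come from a multilevel regression fitted to the survey; here they are hypothetical inputs, as are the two age cells and their counts, chosen only to mimic a heavily non-representative sample.

```python
def poststratify(cell_estimates, cell_counts):
    """Weight each cell's estimated support by that cell's share of the counts.

    With population counts this is post-stratification; with the sample's own
    counts (and raw cell means as estimates) it gives the unadjusted average.
    """
    total = sum(cell_counts.values())
    return sum(cell_estimates[cell] * n / total for cell, n in cell_counts.items())


# Hypothetical per-cell support for a candidate (in MRP: regression output).
support = {"18-29": 0.60, "30+": 0.45}
sample_counts = {"18-29": 900, "30+": 100}    # a sample that skews young
population_counts = {"18-29": 30, "30+": 70}  # hypothetical electorate shares

print(round(poststratify(support, sample_counts), 3))      # 0.585, raw sample
print(round(poststratify(support, population_counts), 3))  # 0.495, post-stratified
```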
Visit the new site at: Prediction.Microsoft.com.
DavidMRothschild on October 13, 2014 @ 1:29PM
We use really simple and transparent methods for creating forecasts for the gubernatorial and senatorial elections. Everything I do is outlined in this forthcoming paper. The method is unchanged from 2012, but the coefficients are updated with 2012 data.
I consider three different types of data: fundamental, polling, and prediction markets. Fundamental data includes incumbency, past election results, changes in economic indicators, presidential approval, state ideology, and biographical data. Polling data includes traditional polls as aggregated by Huffington Post’s Pollster and Real Clear Politics. Prediction market data includes prices on contracts from Betfair.
All of the data needs to be transformed from raw data into predictions. For fundamental data, I take advantage of historical correlations, tested for out-of-sample robustness, to match current variables to likely outcomes. For polling, I ameliorate several different biases, including the anti-incumbency bias (where incumbents poll lower early in the race than they do on Election Day) and reversion to the mean (where big leads tend to contract). For prediction markets, I focus on the favorite-longshot bias, where prices tend to be under-confident.
I transform the raw data into three separate probabilities of victory and then combine them into a single probability of victory. The combined probability of victory is accurate, updates regularly, answers the key question of most stakeholders, and scales easily from the Electoral College to senatorial to gubernatorial elections.
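The post does not give the transformation used for market prices. One common way to correct under-confident prices is to stretch them away from 50% in log-odds space; the sketch below does that, with a stretch factor a = 1.5 that is a hypothetical value, not the author's estimate.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def debias_price(price, a=1.5):
    """Push an under-confident market price away from 0.5 in log-odds space.

    a > 1 makes favorites more likely and longshots less likely; a = 1.5 is a
    hypothetical calibration constant, not a published coefficient.
    """
    if not 0 < price < 1:
        raise ValueError("price must be strictly between 0 and 1")
    return expit(a * logit(price))


print(round(debias_price(0.80), 3))  # 0.889: a favorite becomes more likely
print(round(debias_price(0.20), 3))  # 0.111: a longshot becomes less likely
print(debias_price(0.50))            # 0.5: a toss-up is unchanged
```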
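The post does not spell out how the three probabilities are combined. As one hedged possibility, here is a weighted linear pool (a weighted average of the probabilities); the weights and inputs are hypothetical, and averaging in log-odds space is a common alternative.

```python
def combine(probs, weights):
    """Weighted linear opinion pool of several probabilities of victory."""
    if set(probs) != set(weights):
        raise ValueError("need one weight per data source")
    total = sum(weights.values())
    return sum(probs[k] * weights[k] / total for k in probs)


# Hypothetical inputs for one race.
probs = {"fundamentals": 0.60, "polls": 0.70, "markets": 0.80}
weights = {"fundamentals": 1.0, "polls": 2.0, "markets": 2.0}
print(round(combine(probs, weights), 2))  # 0.72
```

Because the function only sees probabilities, the same step works unchanged whether the race is for the Electoral College, the senate, or a governorship.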
There is no question that there are more complex forecasts out there, but they are no more accurate than my forecasts. Why? Because they lack the identification to verify their “improvements”. And, because of their complexity, their forecasts do not easily scale to gubernatorial or House elections.
DavidMRothschild on October 10, 2014 @ 7:59AM
The balance of power in the senate is both extremely tight and extremely important. I get that. But, race for race, the gubernatorial elections are fascinating to follow. We have seven races between 30% and 70%, and all of them have national implications. The most interesting part of this list is that in six of the seven races the incumbent (or the party holding the seat) is Republican, and many of these Republicans are thought leaders of their party. Depending on how many of these seats turn Democratic, the narrative of a Republican wave will be in serious jeopardy. From most likely Republican to most likely Democratic:
1) Wisconsin (36% Democratic): Scott Walker, the Republican incumbent, is running against Mary Burke, the Democrat. Walker survived a recall vote in 2012 that was directly related to his slashing benefits for public-sector union employees (excluding police and firefighters). He also championed a voter ID law that the Supreme Court blocked on October 9, 2014. Both of these issues have become very prominent for Republicans in the last few years, and Walker is a leader within the party for championing them.
2) Arizona (38% Democratic): The incumbent Republican, Jan Brewer, is still the most interesting aspect of this race between Doug Ducey (R) and Fred DuVal (D). Arizona faced boycotts over its immigration policy, specifically SB 1070. Further, with the shooting of Congresswoman Gabby Giffords and her subsequent push for gun control, Arizona is now a hotbed of discussion on two major national issues.
3) Kansas (48% Democratic): Sam Brownback, the Republican incumbent, is in serious trouble against Paul Davis, the Democratic challenger. Brownback actually implemented serious austerity in Kansas. He cut taxes and cut the budget. This may sound like standard Republican policy, but it is extremely rare to see both cuts so deep (and frequently the taxes are cut, but the budget is not). And, it has been a serious disaster so far. That is how an incumbent governor in a very red state is in serious trouble.
4) Florida (49% Democratic): The current governor, Republican Rick Scott, is up against former governor, Democrat Charlie Crist. Scott is known nationally for two Republican initiatives: cutting funding for public transportation infrastructure and drug testing people on government assistance. But, this election is more about a clash of personalities.
5) Colorado (50% Democratic): Incumbent Democrat John Hickenlooper is in big trouble against Republican Bob Beauprez. This time it is signature Democratic policies that are under attack, as Hickenlooper championed cannabis, gun control, and eliminating capital punishment.
6) Maine (54% Democratic): Incumbent Republican Paul LePage is in serious trouble against Democratic challenger Mike Michaud. LePage has found his way into the national spotlight through a string of insensitive or inflammatory remarks, and through his stand against union workers, both real and symbolic.
7) Alaska (56% Independent): Independent Bill Walker is leading incumbent Republican Sean Parnell. Sean Parnell took over the seat when Sarah Palin resigned after the 2008 presidential election.
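To put the list in numbers, here is a small sketch using the probabilities above for the six Republican-held seats (Colorado, a Democratic seat, is left out, and Alaska's figure is for the independent). The comparison of independence with perfect correlation is an illustrative bound, not a forecast of how correlated these races actually are.

```python
import math

# P(a currently Republican-held governorship goes to a non-Republican), from the list above.
flip_probs = {
    "Wisconsin": 0.36, "Arizona": 0.38, "Kansas": 0.48,
    "Florida": 0.49, "Maine": 0.54, "Alaska": 0.56,
}

expected_flips = sum(flip_probs.values())
# At least one flip if races are independent: 1 - P(no race flips).
p_any_independent = 1 - math.prod(1 - p for p in flip_probs.values())
# At least one flip if races move in lock-step: the most likely flip decides it.
p_any_correlated = max(flip_probs.values())

print(round(expected_flips, 2))     # 2.81
print(round(p_any_independent, 3))  # 0.979
print(p_any_correlated)             # 0.56
```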