DavidMRothschild on October 27, 2014 @ 9:02PM
The New York Times’ Upshot published an article on its latest New York Times/CBS News/YouGov poll that highlighted work I have done with Justin Wolfers of the University of Michigan on expectation polling. The expectation question asks, “Regardless of how you are voting, which candidate do you think is most likely to be elected?” Historically, this question has been extremely effective at pointing toward the eventual winner of the election and even at indicating the vote share. (For more background, you can read the paper or this Q&A with Justin from 2012.)
When it comes to the 2014 senatorial elections, the most interesting differences between expectation polling and traditional intention polling are the sizable leads in the expectation question for the Republican incumbent Roberts in Kansas and for Republican Perdue in the open Georgia seat. Both races are dead heats according to the litany of quantitative forecasters (Upshot, FiveThirtyEight, Huffington Post’s Pollster, etc.), whose forecasts are driven mainly by traditional polling. The traditional intention question in this YouGov poll also has both races within the margin of error. But the Republicans have commanding leads in the expectation question.
The chart below shows the percentage of Democratic and Republican supporters who expect the candidate from the other party to win the election. I count only the supporters who expected one of the two major candidates to win (i.e., I disregard respondents who say they do not know who will win). I plotted the races from left to right according to the Republican candidate’s expected vote share on PredictWise.
Note: there is no Democratic candidate in Kansas, but rather an independent running against a Republican.
Despite millions of dollars of polls (each poll in Pollster’s and Real Clear Politics’ lists costs tens of thousands of dollars) and millions of lines of historical data feeding into PredictWise’s algorithms, the expectation question in this single poll breaks perfectly on the 50% line. In every race where PredictWise expects the Republican to win, the share of Democrats who cross over and expect the Republican to win is higher than the share of Republicans who cross over and expect the Democrat to win. Similarly, in every race where PredictWise expects the Democrat to win, a higher percentage of Republicans cross over and expect the Democratic candidate to win.
There is a strong upward slope depending on how decisive PredictWise expects the election to be, but all of the identification comes from the supporters of the opposing party. The percentage of Republicans who expect the Republican candidate to win is not very interesting once the Republican has a commanding lead (right side of the chart). But the left side of the chart provides a lot of identification of the expected vote share. While the relationship is not strictly monotonic, generally, the higher the vote share the Democratic candidate is likely to receive, the more Republicans cross over and think the Democratic candidate will win.
We know that a lot of information goes into people’s expectations: their own voting intention, the voting intentions of their social network, and what they hear from the media. Further, this mix of information varies by demographic group. Yet what matters for forecasting is not the exact information that goes into each forecast, but that it is meaningful and that its relationship with the outcome is stable.
The data from expectation polling is so powerful that it is the equivalent of each respondent going out, asking 10 random likely voters whom they will vote for, adding his or her own vote, and then telling the pollster who won this private poll of 11 voters. We do not think that is what people actually do when they answer the question, but that is how powerful the poll is for forecasting.
We have no problem with the fact that 30 to 50 percent of partisans think their candidate is going to win even in races that end in landslide losses; actually, it makes perfect sense! In our model, that percentage mirrors the impact of supporters including themselves in their own poll: a Republican supporter starts his or her personal poll with one definite Republican vote. We know that in reality it is also driven by some wishful thinking, but that is okay, because this relationship between expectations and outcomes has been stable through dozens of election cycles with varying degrees of media coverage.
So PredictWise, with its millions of dollars of inputs and millions of lines of data, is calling Georgia and Kansas toss-ups. But the expectations of local voters, from just one poll, are a mighty powerful data point. Thus, I think Georgia and Kansas will likely go Republican.
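The “private poll of 10 random likely voters plus yourself” thought experiment can be made concrete with a binomial calculation. The sketch below is only an illustration of that heuristic, not the model in the Rothschild and Wolfers paper; the function names are hypothetical, and it assumes the 10 other voters are independent random draws from the electorate. Under these assumptions, a supporter of a candidate with a 40% vote share wins his or her private poll roughly 37% of the time, which is in the 30 to 50 percent range described above.

```python
from math import comb


def expect_own_win(own_share: float, n_others: int = 10) -> float:
    """Hypothetical helper: probability that a supporter 'wins' a private poll
    of themselves plus n_others random likely voters, given their candidate's
    true two-party vote share. Assumes n_others is even, so there are no ties."""
    # The supporter already holds one vote for their own candidate, so with
    # n_others = 10 they need at least 5 of the 10 others (6 of 11 in total).
    need = n_others // 2
    return sum(
        comb(n_others, k) * own_share**k * (1 - own_share) ** (n_others - k)
        for k in range(need, n_others + 1)
    )


def crossover_rate(own_share: float, n_others: int = 10) -> float:
    """Share of a candidate's supporters who expect the opponent to win."""
    return 1.0 - expect_own_win(own_share, n_others)


for share in (0.35, 0.40, 0.45, 0.50, 0.55):
    print(f"own share {share:.2f}: expect own win {expect_own_win(share):.3f}, "
          f"cross over {crossover_rate(share):.3f}")
```

Because every supporter starts with one vote in hand, even a candidate at exactly 50% is expected to win by more than half of his or her own supporters; this is the self-inclusion effect the paragraph describes.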
DavidMRothschild on October 23, 2014 @ 4:02PM
Things just keep getting worse for the Democrats in the senate. We now have the Democrats at just 19% to hold onto the senate. The amazing thing is that the continuous slide is not driven by any serious slip-ups, but simply by races gradually shifting from leaning Republican to strongly Republican, plus one (or two) big surprises.
First, it was never really likely that the Democrats were going to carry Georgia, Alaska, Louisiana, South Dakota, Kentucky, or Arkansas. Several of these seats were blue, but the states are red. Over the last few months, anything that should be red has gotten a little redder, and that cements as time goes by. It is nearly Election Day and there have been no campaign-altering incidents (i.e., no talk of rape, like Mourdock in 2012, or “macaca,” like George Allen in 2006).
Second, Colorado, a blue state, is proving very difficult for the Democrats in the polls; both the senatorial and gubernatorial incumbents are losing. I do not have a great explanation for this, but it is consistent across both races. I have heard arguments on both sides and feel confident saying it is equally likely the polls are off in either direction; there is no reason, ex ante, to assume the polls are off in the Republicans’ direction.
Third, Iowa is still a wild card: the race is very tight, Ernst is very conservative, and her handlers appear worried about her speaking unscripted. A few hours ago she skipped her planned meeting with the editorial board of the largest paper in Iowa. The 27% in Iowa reflects the uncertainty in the polls; there is slightly more uncertainty about the candidate that the model cannot fully capture.
Here are the New York Times and FiveThirtyEight forecasts compared with PredictWise:
DavidMRothschild on October 19, 2014 @ 10:26AM
The last week and a half has been an unmitigated disaster for the Democrats in the polls and their fleeting chances of holding onto the majority of the senate.
The Democrats’ only bright spot has been Georgia, where Michelle Nunn has pushed into a tight race with David Perdue. The most likely outcome of the election is a runoff between the two candidates, as neither is likely to get 50% of the vote. This is why, despite leading many polls, Nunn is still slightly less than 50% to win: Libertarian supporters are a little more likely to break for Perdue in a runoff.
The biggest movement is in Kansas, where voters appear to be second-guessing their choice of a Democratic-leaning independent. The next biggest movement is more surprising: the incumbent Democratic senator Udall in Colorado has fallen steadily behind the Republican challenger Gardner. Finally, the Iowa senate race has been more of a regular roller coaster, with Ernst, the Republican, holding a slight but steady lead over Braley.
South Dakota has been added to the short list, but Rounds, the Republican, still holds a strong lead in the three-candidate race. Polling is sparse in South Dakota, though, and we are still waiting to see what will happen next!
Here are the New York Times and FiveThirtyEight forecasts compared with PredictWise. Not too much difference:
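Why a candidate can lead the polls and still be below 50% to win follows from splitting the win probability into an outright win and a runoff win. The sketch below applies the law of total probability; the function name and all input numbers are hypothetical illustrations, not PredictWise’s actual Georgia estimates.

```python
def win_probability(p_outright_win: float, p_runoff: float, p_win_runoff: float) -> float:
    """Total probability over the two paths to victory: winning outright on
    Election Day (50%+ of the vote), or reaching a runoff and winning it."""
    if not 0.0 <= p_outright_win + p_runoff <= 1.0:
        raise ValueError("outright-win and runoff probabilities must sum to at most 1")
    return p_outright_win + p_runoff * p_win_runoff


# Hypothetical inputs: a runoff is the most likely outcome, and Libertarian
# voters tilt the runoff slightly toward Perdue (so Nunn wins it 45% of the time).
nunn = win_probability(p_outright_win=0.15, p_runoff=0.70, p_win_runoff=0.45)
print(f"Nunn: {nunn:.3f}")
```

With a 70% chance of a runoff, the runoff term dominates, so even a small Libertarian tilt in the runoff pulls the overall probability below 50%.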
DavidMRothschild on October 17, 2014 @ 5:48PM
The race for control of the U.S. senate feels a lot like the race for control of the Electoral College (i.e., the presidency), but there are a few crucial differences. First, the only thing that matters after the Electoral College convenes is who won it, but senators in the minority party still get to vote for the next six years (and may tip the majority in the next election or sooner). Second, the Electoral College is 51 elections about the exact same two people, while senatorial elections are about 36 different sets of candidates. Thus, movements in the Electoral College are highly correlated, but senatorial elections are largely independent.
In the senate, individual elections matter even after the balance of power is determined; in the Electoral College, they do not. The president is only elected every four years, so it is hard to measure the impact of the margin of victory, but the most powerful president of the last few decades lost the popular vote and barely won the Electoral College. In the senate, it is hard for individual senators to enact change, but easy for them to block it. Also, looking forward to 2016, the Democrats will defend just 10 seats to the Republicans’ 24. If the Democrats do lose the senate in 2014, they are likely to win it back in 2016 on the strength of the senators who won in 2014.
Start of campaign season to Election Day
The Electoral College elections are extremely correlated. I introduced my Electoral College predictions on February 16, 2012, and there were 26 states where Obama’s probability of victory was between 5% and 95%. When I ranked those 26 states from most likely to least likely and compared that ranking to the rank of Obama’s vote share on Election Day, the correlation was 0.93. Most of the movement was at the less-identified fringes: Arizona was slightly more Democratic than a few similar states, at 45% of the two-party vote share for Obama (still a landslide), and Maine was also slightly more Democratic than similarly secure states, at 58%. When we lined up all of the states in February and pointed out the pivotal states in the middle, they went: FL, VA, OH, NH, CO, IA, PA, in that order. Nine months later, the order by vote share was: FL, OH, VA, PA, CO, NH, IA. Over the course of nine months, the secure states all drifted toward their likely winners, but the true battleground states moved up and down in lock-step as videos turned into debates and debates turned into Sandy.
Assume there are three states, A, B, and C, whose probabilities of voting Democratic are 25%, 50%, and 75%, respectively. You can assume that if one state votes Democratic it will be C, if two states vote Democratic they will be B and C, and if three states vote Democratic they will be A, B, and C. If you assume the probability of any other combination is 0%, you are generally going to be fine.
Senatorial elections are much less correlated. 16 of 33 elections were between 5% and 95% when I introduced my senatorial predictions in June 2012. The third most likely Republican win in this group was Indiana, at 82% for the GOP. The most Republican of the toss-up states was Missouri, at 52% for the GOP. Indiana fell to the Democrats in a reasonably tight race, and Missouri fell in a landslide. Both Republican candidates made questionable statements about rape and their polls plummeted; their statements certainly entered the public debate, but there is little evidence that their individual falls seriously affected other candidates. Over the five-month period of my data, the correlation between the initial rank of probability and the final rank of vote share was 0.78.
Again, assume there are three states, A, B, and C, whose probabilities of voting Democratic are 25%, 50%, and 75%, respectively. You can assume that if one state votes Democratic it will be C, if two states vote Democratic they will be B and C, and if three states vote Democratic they will be A, B, and C. If you assume the probability of any other combination is 0%, you are going to have a problem.
Election Day poses a different type of uncertainty than the course of the campaign. Election Day uncertainty can be correlated in both types of elections if polling is systematically biased toward one party or the other. State-to-state polling for senatorial elections is still less likely to be systematically biased than state-to-state polling for the Electoral College, as the polling itself is less correlated across companies and time. Yet it is legitimate to assume that the uncertainty left on Election Day, unlike the uncertainty during the campaign season, is relatively correlated across senatorial elections.
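The “correlation” between February rank and Election Day rank is a rank correlation. One standard way to compute it is Spearman’s rank correlation, sketched below under the assumption of no ties; the helper names and the six-state toy data are hypothetical, not the actual 2012 inputs.

```python
def ranks(values):
    """Rank positions (1 = smallest); assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x, y):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n (n^2 - 1)).
    This shortcut formula is exact only when there are no ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical toy data: early win probabilities vs. final two-party vote shares.
prob_dem_feb = [0.10, 0.30, 0.45, 0.55, 0.70, 0.90]
vote_share_nov = [0.44, 0.47, 0.51, 0.50, 0.53, 0.58]
print(f"rank correlation: {spearman(prob_dem_feb, vote_share_nov):.3f}")
```

Here only the middle two states swap places, so the rank correlation stays high; a value like 0.93 means the ordering barely changed over the campaign.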
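The three-state example can be checked by enumerating every outcome under two assumptions: perfect correlation (one shared uniform draw decides all three states) and full independence. The function and variable names below are hypothetical. Under perfect correlation, all of the probability lands on the nested outcomes {}, {C}, {B, C}, and {A, B, C}; under independence, those four outcomes carry only 75% of the probability, so a quarter of the mass sits on “leapfrog” combinations such as A without C.

```python
from itertools import product

STATES = ("A", "B", "C")
P_DEM = {"A": 0.25, "B": 0.50, "C": 0.75}
NESTED = {frozenset(), frozenset("C"), frozenset("BC"), frozenset("ABC")}


def independent_outcomes(p):
    """Probability of every set of Democratic winners when states are independent."""
    dist = {}
    for flips in product((True, False), repeat=len(STATES)):
        prob = 1.0
        for state, dem in zip(STATES, flips):
            prob *= p[state] if dem else 1 - p[state]
        winners = frozenset(s for s, dem in zip(STATES, flips) if dem)
        dist[winners] = prob
    return dist


def comonotone_outcomes(p):
    """Perfect correlation: one shared uniform draw U, and a state goes
    Democratic iff U < its probability, so outcomes are always nested."""
    cuts = sorted({0.0, 1.0, *p.values()})
    dist = {}
    for lo, hi in zip(cuts, cuts[1:]):
        u = (lo + hi) / 2  # any U strictly inside (lo, hi) yields the same winners
        winners = frozenset(s for s in STATES if u < p[s])
        dist[winners] = dist.get(winners, 0.0) + (hi - lo)
    return dist


def nested_mass(dist):
    """Total probability of the outcomes a pure ranking model allows."""
    return sum(pr for winners, pr in dist.items() if winners in NESTED)


for name, dist in (("correlated", comonotone_outcomes(P_DEM)),
                   ("independent", independent_outcomes(P_DEM))):
    print(f"{name}: P(nested outcome) = {nested_mass(dist):.4f}")
```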
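One common way to represent Election Day uncertainty that is correlated across races is to give every race one shared (systematic) polling error plus its own independent error. The simulation below is a sketch under that assumption; the function names and the error sizes, in percentage points, are hypothetical. With equal shared and race-specific standard deviations, the errors in any two races have a correlation of about 0.5.

```python
import random


def simulate_poll_errors(n_sims, shared_sd, state_sd, n_states=2, seed=0):
    """Each simulated Election Day draws one shared polling error (bias toward
    one party) plus an independent error per race; returns per-race error lists."""
    rng = random.Random(seed)
    errors = [[] for _ in range(n_states)]
    for _ in range(n_sims):
        shared = rng.gauss(0.0, shared_sd)
        for s in range(n_states):
            errors[s].append(shared + rng.gauss(0.0, state_sd))
    return errors


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


race_1, race_2 = simulate_poll_errors(20_000, shared_sd=2.0, state_sd=2.0)
print(f"correlation between two races' errors: {pearson(race_1, race_2):.2f}")
```

The implied correlation is shared_sd² / (shared_sd² + state_sd²), so the larger the systematic component relative to race-specific noise, the more the races move together on Election Day.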
What does this mean for 2014?
The Democrats currently hold 34 seats that are not up for election, the Republicans hold 30, and there are 36 seats up for election. So the Democrats, who need 50 seats for a majority (with the vice president breaking ties), need to win 16 seats to control the senate, and the Republicans, who need 51 seats for a majority, need to win 21 seats.
If this were the Electoral College, I would be comfortable lining up the states from most likely Democratic to least likely Democratic. In that list, the pivotal state is Georgia if Orman caucuses with the Democrats when they hold 49 other seats, or Colorado if he joins them only when they already hold 50. Thus, I could say the likelihood of the Democrats controlling the senate was 35% or 27%, depending on your Orman assumption, or 31% if you assume Orman flips a coin (50% to caucus with either party) in the scenario where the Democrats hold 49 other seats and he wins. This ranking method can be attributed to Ray Fair and was discussed extensively in 2012.
But the senate is different: North Carolina is 72% likely to go Democratic and Alaska is 15% likely to go Democratic. If this were the Electoral College, I would say that the probability of Alaska going Democratic and North Carolina going Republican was about 0%. States just do not leapfrog like that when the movement is so correlated. But the probability of the Alaska senatorial election going Democratic and the North Carolina election going Republican is about 15% × 28% ≈ 4% (maybe a little less, due to some correlation).
In practice, this does not change the answer much; assuming near independence (and a 50% likelihood that Orman goes Democratic if the Democrats control 49 seats), we get a 27% probability that the Democrats control the senate. Near independence versus near-perfect correlation lowers the probability by just a few percentage points. But it does dramatically alter the possible coalition that the Democrats or Republicans bring to the next senate; Begich of Alaska may toil in the minority, and Hagan of North Carolina could lose even if the Democrats hold the senate.
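The ranking method has a simple probabilistic reading: if all races move together (one shared uniform draw decides every race), the Democrats win at least k races exactly when they win the k most likely ones, so the probability of reaching k seats equals the k-th largest single-race probability. The sketch below assumes that perfect correlation; the function name, the seven-race list, and the scaled-down seat thresholds are hypothetical, and only the 35% and 27% pivotal values come from the post.

```python
def control_prob_ranked(dem_probs, seats_needed):
    """Perfect-correlation (ranking) model: P(win at least seats_needed races)
    is the probability of the pivotal race, the seats_needed-th most likely."""
    if seats_needed <= 0:
        return 1.0
    ranked = sorted(dem_probs, reverse=True)
    if seats_needed > len(ranked):
        return 0.0
    return ranked[seats_needed - 1]


# Hypothetical race probabilities, scaled down to seven races for illustration.
toy = [0.95, 0.80, 0.72, 0.35, 0.27, 0.15, 0.05]
pivot_if_orman_joins_at_49 = control_prob_ranked(toy, 4)  # one fewer race needed
pivot_if_orman_joins_at_50 = control_prob_ranked(toy, 5)
# Orman flips a coin in the 49-seat scenario: average the two pivotal probabilities.
coin_flip = 0.5 * pivot_if_orman_joins_at_49 + 0.5 * pivot_if_orman_joins_at_50
print(pivot_if_orman_joins_at_49, pivot_if_orman_joins_at_50, round(coin_flip, 2))
```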
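Under near independence, the number of Democratic wins follows a Poisson-binomial distribution, which can be computed exactly by adding one race at a time. The sketch below assumes full independence (the post notes reality has some correlation) and encodes the Orman rule as: the Democrats control the senate if they win enough other races outright, or are exactly one short and Orman wins and caucuses with them. Function names, the toy probabilities, and the scaled-down threshold are hypothetical; the leapfrog line reuses the post’s 15% and 72% figures.

```python
def seat_distribution(probs):
    """Exact distribution of the number of races won, assuming independent
    races (Poisson-binomial), built up one race at a time."""
    dist = [1.0]  # dist[k] = P(exactly k wins among the races added so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            new[k] += pk * (1 - p)      # lose this race
            new[k + 1] += pk * p        # win this race
        dist = new
    return dist


def dem_control_prob(other_probs, needed, p_orman, p_orman_joins_dems=0.5):
    """Democrats control if they win at least `needed` non-Kansas races, or
    exactly needed - 1 and Orman wins Kansas and caucuses with them."""
    dist = seat_distribution(other_probs)
    at_least = sum(dist[needed:])
    one_short = dist[needed - 1] if 0 < needed <= len(dist) else 0.0
    return at_least + one_short * p_orman * p_orman_joins_dems


# Hypothetical toy inputs: four other races, Democrats need two of them.
toy = [0.90, 0.72, 0.50, 0.15]
print(f"P(control) = {dem_control_prob(toy, needed=2, p_orman=0.5):.3f}")

# Leapfrog under independence: Alaska goes Democratic AND North Carolina does not.
leapfrog = 0.15 * (1 - 0.72)
print(f"P(AK Dem and NC Rep) = {leapfrog:.3f}")
```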
DavidMRothschild on October 17, 2014 @ 10:36AM
With a few friends, including Miro Dudik and David Pennock, I launched a new website called Microsoft Prediction Lab. The website consolidates research into both non-representative polling and prediction games. I have spent years understanding how various kinds of raw data (polling, prediction markets, and social media and other online data) can be transformed into indicators of present interest and sentiment, as well as predictions, for varying populations, and how decision makers allocate resources using the low-latency, quantifiable market intelligence that we produce. Microsoft Prediction Lab allows us to continuously innovate not only on the path from raw data to analytics to consumption, but also on the collection of the data itself.
Microsoft Prediction Lab serves two symbiotic purposes: for it to be a successful laboratory, it must also be a successful product, and vice versa. The project is designed to promote engagement and showcase the bleeding-edge work of Microsoft Research (and other collaborators). Further, the research is making an impact on how people create predictions in the several-billion-dollar election industry, and that will spread into other domains soon.
Markets: Markets have been an efficient method of aggregating data for millennia, and prediction markets have been forecasting elections for over a century, but there is room for improvement. Here are a few of the innovations we are exploring in Microsoft Prediction Lab. First, we are examining how well markets can work without currency by using incentives like teams and leaderboards. Second, we are examining how to lower the barriers to entry into markets by building more intuitive interfaces and wording the questions efficiently depending on the user’s knowledge of markets and expectations. Third, we are matching the right questions to the right people to maximize the flow of information from the users to the market. Fourth, once the data is collected, we are using fully combinatorial market makers. Individual probabilities are interesting, but combinatorial and conditional probabilities pose a meaningful and interesting challenge.
Polls: The only accepted form of polling in the multi-billion-dollar survey research field relies on representative “probability” samples; my colleagues and I argue that, with proper statistical adjustment, non-representative polling data can translate into accurate predictions, often in a much more timely and cost-effective fashion. We demonstrated this by applying multilevel regression and post-stratification (MRP) to a 2012 election survey on the Xbox gaming platform. This was an incredibly non-representative sample. Yet not only did the transformed top-line projections from this data closely track standard indicators, but we also used the unique size and panel structure of the data to answer a meaningful political puzzle. We found that reported swings in public opinion polls are generally not due to actual shifts in vote intention, but rather the result of temporary periods of relatively low response rates among supporters of the reportedly slumping candidate. We raise the possibility that decades of large, reported swings in public opinion, including the perennial “convention bounce,” are mostly artifacts of sampling bias. More broadly, the work on the Xbox, and subsequent studies with Sharad Goel, show great promise for using non-representative polling data to measure public opinion and answer general social science questions at a lower cost and with more speed and flexibility.
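To make “combinatorial market maker” concrete, here is a minimal sketch of one well-known design, Hanson’s logarithmic market scoring rule (LMSR), run over every joint outcome of several binary events. This is an assumption for illustration, not Prediction Lab’s actual market maker; the class name, event names, and liquidity parameter b are hypothetical. Because prices live on joint outcomes, marginal and conditional probabilities can both be read off the same market.

```python
import math
from itertools import product


class JointLMSR:
    """LMSR over all joint outcomes of several binary events (a sketch)."""

    def __init__(self, events, b=100.0):
        self.events = list(events)
        self.b = b  # liquidity: larger b means prices move less per share
        self.outcomes = list(product((True, False), repeat=len(self.events)))
        self.q = [0.0] * len(self.outcomes)  # outstanding shares per joint outcome

    def _cost(self, q):
        # C(q) = b * log(sum(exp(q_i / b))), shifted by max(q) for stability.
        m = max(q)
        return m + self.b * math.log(sum(math.exp((x - m) / self.b) for x in q))

    def prices(self):
        """Current price of each joint outcome (a softmax of q / b)."""
        m = max(self.q)
        w = [math.exp((x - m) / self.b) for x in self.q]
        z = sum(w)
        return [x / z for x in w]

    def _mask(self, predicate):
        return [predicate(dict(zip(self.events, o))) for o in self.outcomes]

    def prob(self, predicate, given=None):
        """P(predicate), or P(predicate | given), from current prices."""
        p = self.prices()
        hit = self._mask(predicate)
        cond = self._mask(given) if given else [True] * len(p)
        num = sum(pi for pi, h, c in zip(p, hit, cond) if h and c)
        den = sum(pi for pi, c in zip(p, cond) if c)
        return num / den

    def buy(self, predicate, shares):
        """Buy shares of a security paying 1 in every joint outcome where
        predicate holds; returns the cost charged to the trader."""
        hit = self._mask(predicate)
        new_q = [x + shares * h for x, h in zip(self.q, hit)]
        cost = self._cost(new_q) - self._cost(self.q)
        self.q = new_q
        return cost


# Hypothetical events: Republicans win Georgia / Kansas.
mkt = JointLMSR(["GA_R", "KS_R"], b=100.0)
cost = mkt.buy(lambda w: w["GA_R"] and w["KS_R"], 50)
print(f"paid {cost:.2f}; P(KS_R | GA_R) = "
      f"{mkt.prob(lambda w: w['KS_R'], given=lambda w: w['GA_R']):.3f}")
```

A single trade on the joint outcome moves both marginal and conditional prices, which is the kind of information a market of separate, unlinked contracts cannot capture.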
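The post-stratification step of MRP reweights estimates for demographic cells by each cell’s share of the target population rather than its share of the sample. The sketch below covers only that step: in real MRP, the cell estimates come from a multilevel regression that partially pools sparse cells, whereas here they are simply given. The two cells, their support levels, and all shares are hypothetical, chosen to mimic a sample that heavily over-represents one group.

```python
def poststratify(cell_estimates, population_shares):
    """Weighted average of cell-level estimates, weighted by each cell's share
    of the target population (e.g., likely voters), not of the sample."""
    total = sum(population_shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(cell_estimates[c] * population_shares[c] for c in population_shares)


# Hypothetical cells: support for a candidate within each group.
estimates = {"young_men": 0.40, "everyone_else": 0.55}
sample_shares = {"young_men": 0.65, "everyone_else": 0.35}      # skewed sample
population_shares = {"young_men": 0.10, "everyone_else": 0.90}  # target electorate

raw = poststratify(estimates, sample_shares)  # equals the unadjusted sample mean
adjusted = poststratify(estimates, population_shares)
print(f"raw sample estimate: {raw:.4f}, post-stratified: {adjusted:.4f}")
```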
Visit the new site at: Prediction.Microsoft.com.