DavidMRothschild on September 26, 2013 @ 10:01AM
We obsess about the aggregated prices that emerge from markets, whether it is oil, the Dow Jones, or the prediction market contract on who will be the next president of the United States. The price is a reflection of the subjective beliefs of individual traders, and we spend too little time considering the individual traders’ expectations, strategies, and motivations that combine to create that price. Rajiv Sethi of Barnard College and I were very lucky to examine a unique dataset this summer, which allowed us to learn more about how individual traders behave in markets; specifically, we examined trade-level data for all trades that occurred in the final two weeks of the 2012 election for either Obama or Romney to win on Intrade, the largest political prediction market in 2012. Our main academic finding is that traders are surprisingly one directional, almost always buying contracts favoring one of the two candidates. A secondary finding that has garnered popular attention is that one trader heavily influenced the price of these contracts, possibly for potential political gain, by investing nearly $4 million in Romney positions over two weeks.
The academic question our paper examines is the trading strategies and motivations of the traders. And our finding is that most traders are either performing arbitrages (i.e., buying and selling contracts that guarantee them a small return) or trading in one direction (i.e., only going long on Obama or Romney). The archetypical informational trader gets new information and goes long on whichever side it favors. What we see instead is that traders get new information and use it to keep trading for their chosen candidate. An example would be a situation where there was bad news for the Romney campaign. Obama traders would go long for Obama and push up the price for Obama, but after a little while, the Romney traders would push back when they thought the price had moved too far towards Obama. Everyone gets to keep going long for their candidate, but the new price still reflects the new information. Rajiv goes into this in more detail in an earlier blog post.
But, you are probably not reading this article to learn about trading strategies, so I will pivot to the possible market manipulation.
Many observers noted in real-time that a peculiar wedge opened up between Intrade and another prediction market, Betfair. Intrade was consistently 5-10 percentage points more bullish on Romney (i.e., a contract that paid out $1.00 if Obama won could trade for $0.70 on Intrade and $0.75 or $0.80 on Betfair). Our new dataset shows that one trader was making that happen by providing massive amounts of liquidity to the Romney side of all trades (accounting for roughly 1/3 of all action favoring Romney) and creating “fire-walls” of several hundred thousand dollars to keep the Romney price static at times of high information flow (e.g., if the price for an Obama contract was $0.70 the trader would note on the order book the willingness to sell $100,000 or more of Obama to win at $0.70 so that Obama traders could keep buying at that $0.70 price indefinitely without the price going up). Note that we have no personal information on this trader, just the trader’s trades.
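The "fire-wall" mechanism can be illustrated with a toy order book. This is only a minimal sketch, not Intrade's matching engine: the function name, the shape of the book, and every dollar amount other than the $0.70 price and the $100,000 wall are hypothetical.

```python
def price_after_buying(asks, dollars_to_spend):
    """Walk up a list of (price, dollars_available) sell orders, cheapest first.

    Returns the price of the last sell order that buyers reach after
    spending `dollars_to_spend`, i.e. the last traded price.
    """
    last_price = None
    remaining = dollars_to_spend
    for price, available in sorted(asks):
        if remaining <= 0:
            break
        last_price = price
        remaining -= min(remaining, available)
    return last_price


# Hypothetical thin book for "Obama to win": a few small sell orders.
thin_book = [(0.70, 2_000), (0.72, 2_000), (0.75, 3_000), (0.80, 5_000)]

# The same book plus one trader's $100,000 wall of sell orders at $0.70.
walled_book = thin_book + [(0.70, 100_000)]

buying_pressure = 10_000  # assumed dollars of Obama buying after good news

print(price_after_buying(thin_book, buying_pressure))    # 0.8
print(price_after_buying(walled_book, buying_pressure))  # 0.7
```

In the thin book the same buying pressure walks the price from $0.70 to $0.80; with the wall in place every buyer is filled at $0.70 and the displayed price does not move.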
We provide three possible explanations for the trader’s strategy: (1) the trader was convinced that Romney was underpriced throughout the period and was expressing a price view, (2) the trader was hedging an exposure held elsewhere, or (3) the trader was attempting to distort prices in the market for some other purpose.
(1) Simply going long Romney is unlikely because at any point the trader could have gone to Betfair and purchased the same contract for less money. There are many costs to utilizing Betfair: it trades in British pounds, it blocks U.S. IP addresses, etc. But a trader of this size could have overcome those costs for less money than the trader lost by using Intrade instead of Betfair.
(2) Hedging is a more plausible explanation for the trader’s behavior, but still unlikely. Earlier academic literature shows that some market indexes and the likelihood of the election outcome can be correlated, but we could not find similar patterns in 2012 (e.g., historically, sudden increases in the likelihood of a Democratic victory are negatively correlated with the S&P, despite historical correlations of stronger stock markets under Democratic presidencies). We cannot eliminate the possibility that the trader was hedging some more detailed securities like specific energy contracts that traders may have concluded would be more tightly impacted by the election outcome (e.g., the trader could have been going long Romney to cover potential losses in renewable energy contracts should Romney have won the election).
(3) Distorting the price for some other purpose, possibly political, is the most likely motivation of the trader. Placing hundreds of thousands of dollars on the order book, precisely at times of a lot of new information, is a strategy that maximizes the impact of the investment, but not the return. One of the trader’s most active periods was between 7:30 PM and 9:00 PM ET on Election Day, when new information was arriving by the second. At that time the trader essentially placed so many potential trades at about $0.70 per $1.00 for Obama that the trader was telling the market that s/he would match anyone who wanted to buy Obama at that price. This froze the price for the crucial 1.5 hours between the first major reports of election returns and the last swing state poll closings. Ultimately, the trader spent about $375,000 during that 1.5 hours and, as soon as the trader left the market at 9:00 PM, Obama shot up past $0.90 per $1.00 contract.
If this trader were attempting to manipulate the market, it is a three step process to make it successful: (1) change price, (2) convince people it is real, and (3) have people change behavior because of it.
(1) One trader was able to control the price of a liquid market with a massive wall of limit orders for two reasons. First, markets, by design, move with large quantities of money regardless of whether it is 1,000 people with $100 or 1 person with $100,000. A sea of traders could move a stock price or Warren Buffett alone could move a stock price, as the market does not care about the motive or quantity of traders. Second, the government’s harassment of Intrade, and other online markets, made it difficult and risky to join and keep money in Intrade, which limited participants and readily available money. Thus, even if it was tempting to buy Obama long at $0.70 per $1.00 contract on Election Day, it is likely that few traders had the money sitting in the market necessary to buy up all of the contracts that the Romney trader was willing to sell. You would need to already be a trader and have tens or hundreds of thousands of dollars sitting idle in your account.
(2) This is the hardest step; it is not as easy to convince people that the price level is real, but maybe people who want evidence will appreciate any data source that validates them. Sites that present prediction market data frequently aggregated Intrade with other markets to ameliorate the concern that any one data source can be wrong. Overall, prediction market data was very successful in providing accurate and timely predictions of the 2012 election. Yet, if someone was looking for a reason to be hopeful about Romney, Intrade’s price provided a solid piece of data for them.
(3) If people are convinced it is real, the impact on the campaign is going to happen. There is a cascading effect to being a viable candidate; the more viable a candidate appears, the more money, volunteers, support, and turnout the candidate receives. Thus, the more viable the candidate appears, the more viable the candidate becomes. When it comes to Election Day, one piece of positive news may be the validation someone needs to stop off at the polling place on the way home from a long day at work.
If manipulation could be successful, it would be worth it. If a few million dollars could boost fundraising and morale, then it would be a good investment next to one more television advertisement in a flooded Ohio, Florida, or Virginia market. Roughly $28 million was spent on TV advertising in just one state, Ohio, in the last week of the election alone.
We cannot say for sure if this market was manipulated, but someone definitely shifted the price heavily towards Romney and maintained that price imbalance until 9 PM on Election Day, when the polls closed in the last swing state and the election was finally up to the vote counters. Yet, despite this trader’s efforts, most observers, even if they were following just prediction markets, still received a very accurate forecast of the election.
This column syndicates with the HuffingtonPost.
DavidMRothschild on September 04, 2013 @ 9:05AM
On August 4 I tweeted that “Smart money is on de Blasio edging out [Bill] Thompson” for the second spot on the runoff. I followed that up by noting that Christine Quinn’s trajectory was troubling; it is not a good sign for a runoff if you are heading in the wrong direction. Both statements proved prescient as de Blasio was fourth in the polls on August 4 and is the current heavy favorite to be the next mayor of New York City. Meanwhile Quinn’s downward trajectory may push her out of a potential runoff, or even eliminate the need for a runoff. But, Twitter does leave a little too much to the imagination, so here are some more details on the New York City mayoral contest.
The election in New York City is, potentially, a three-step process. First, both parties have a primary on September 10. Second, if no candidate receives over 40% of the vote in the primary, there is a runoff between the top two candidates on October 1. Third, there is an election between the Democratic and Republican candidates on Tuesday, November 5.
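The runoff rule is simple enough to write down directly. The sketch below is illustrative only: the function name is made up, and the candidate labels and vote shares are hypothetical placeholders, not poll or election results.

```python
def runoff_candidates(primary_shares, threshold=0.40):
    """Apply the primary rule described above.

    primary_shares maps candidate -> share of the primary vote (0 to 1).
    Returns None if the leader received over `threshold` (no runoff),
    otherwise the top two candidates, who advance to the runoff.
    """
    ranked = sorted(primary_shares, key=primary_shares.get, reverse=True)
    if primary_shares[ranked[0]] > threshold:
        return None
    return ranked[:2]


# Hypothetical shares for three placeholder candidates.
print(runoff_candidates({"A": 0.36, "B": 0.22, "C": 0.20}))  # ['A', 'B']
print(runoff_candidates({"A": 0.43, "B": 0.22, "C": 0.20}))  # None
```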
Through the end of July Quinn and Anthony Weiner were trading spots on the top of the polls with de Blasio and Thompson battling it out for third and fourth. The New York Times poll that was in the field from August 2-7 showed the completion of Weiner’s fall to nearly single digits, but Quinn still had a lead and there was a huge number of undecided voters; de Blasio was third with 14% and Thompson second with 16%. After that, August saw a string of six straight polls with de Blasio leading, finally blowing past 40% with the latest Quinnipiac poll that was in the field from August 28-September 1.
Meanwhile, Quinn continued to plateau through July with a string of polls with her leading, but always in the 22% to 32% range and no upward trajectory. As the frontrunner, this was troubling, because she has a lot of time to make her case to the Democratic electorate. Since then she has shown a consistent downward trajectory with the last three polls putting her below 20% and, crucially, below Thompson.
The smart money I was referring to on August 4 was the bookies like Paddy Power, Stan James, and a few others; but, it was not as easy as picking out the best odds. First, Quinn had the best odds of all, so she was still the favorite to get one of two spots in the potential runoff. Second, de Blasio had slightly more favorable odds than Thompson. The polls showed them nearly tied, but the bettors favored de Blasio. Thus, the smart money had de Blasio pulling ahead of Thompson. Third, with Quinn still dominating the polls the bookies had no reason to push Quinn down below de Blasio and Thompson; it was still safe money to assume she had a higher probability than either de Blasio or Thompson. But did she have a higher probability than the ultimate winner of the "not-Quinn" fight between de Blasio and Thompson? In my reading of the data, no.
The likely Republican candidate, Joe Lhota, is trailing any Democratic candidate by a wide margin. After five terms of Republican mayors, the Big Apple looks poised to put a Democrat back in Gracie Mansion, and it is about 60-65% likely that Bill de Blasio will be the next mayor of NYC.
Then again, my friend Andrew Gelman of Columbia has a timely reminder for us in his blog: primary elections are hard to predict.
DavidMRothschild on June 11, 2013 @ 11:49AM
We start with three different types of data …
Voter Intention Polling Data: Polling data has been the most prominent component of election forecasts for decades. From 1936 to about 2000, it was standard to just display the raw data, the results of individual voter intention polls, as an implicit forecast of an election. By 2004 poll aggregation became common on the internet (see Pollster.com). Although aggregated polls provide both stability and accuracy relative to individual poll results, as an implicit estimated vote share they still succumb to two well-known poll-based biases, especially earlier in the cycle: polls demonstrate larger margins than the election results and they have an anti-incumbency bias (i.e., early leads in polls fade towards Election Day and incumbent party candidates have higher vote shares on Election Day than their poll values in the late summer into the early fall) (see James Campbell). In 2008 some websites finally began publishing versions of aggregated and then debiased poll-based forecasts (see Nate Silver, 2008). Further, they shifted the outcome to the probability of victory in the Electoral College or senatorial elections versus the expected vote shares. Yet, raw, daily polls still dominate popular press coverage and simple aggregation and debiasing of polls is just starting to permeate the academic literature.
Fundamental Data: There is also a long history of econometric models that forecast elections with fundamental data. These models use a variety of economic and political indicators such as: past election results, incumbency, presidential approval ratings, economic indicators, ideological indicators, biographical information, policy indices, military situations, and facial features of the candidates. There are numerous examples of articles that forecast the national presidential vote share; there is a nine-page reference list in my paper with Patrick Hummel. However, there are few models that focus on Electoral College or senatorial elections; most simply forecast national vote shares. Further, models that include late arriving or non-duplicable data dominate the literature and press; these models cannot create forecasts until late in the cycle, if they can create forecasts before the election at all.
Prediction Market Data: The modern history of prediction markets is not as long as the other two data sources. The Iowa Electronic Market launched the modern era of prediction markets in 1988, introducing a winner-takes-all market in 1992. This type of market trades binary options which pay, for example, $10 if the chosen candidate wins and $0 otherwise. Thus, an investor who pays $6 for a “Democrat to Win” stock, and holds the stock through Election Day, earns $4 if the Democrat wins and loses $6 if the Democrat loses. In that scenario, if there are no transaction or opportunity costs, the investor should be willing to pay up to the price that equals her estimated probability of the Democrat winning the election. The market price is the value at which, if a marginal investor were willing to buy above it, investors would sell the stock and drive the price back down to that market price (and vice-versa if an investor were willing to sell below it); thus, the price is an aggregation of the subjective probability beliefs of all investors. Scholars have found that prediction market prices can create more accurate forecasts than poll-based forecasts in the last few cycles (see Berg et al. or my earlier paper) and in historical elections (see Paul Rhode and Koleman Strumpf). Like polls and fundamental data, prediction market prices also suffer a bias, the favorite-longshot bias. Unfortunately, both the press and academia, if they acknowledge prediction markets at all, only cite raw prediction market prices as forecasts, thus failing to correct for this bias.
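The arithmetic of the $10 winner-takes-all contract can be checked with a few lines of code. This is a minimal sketch that assumes, as the paragraph does, no transaction or opportunity costs; the function names are illustrative.

```python
def expected_profit(price, payout, probability):
    """Expected profit from buying one binary contract and holding it to settlement.

    The buyer nets (payout - price) if the event happens and loses
    the price paid if it does not.
    """
    profit_if_win = payout - price
    loss_if_lose = -price
    return probability * profit_if_win + (1 - probability) * loss_if_lose


def break_even_price(payout, probability):
    """Highest price a risk-neutral buyer with this belief should pay."""
    return probability * payout


payout, price = 10.0, 6.0
print(payout - price)  # 4.0 earned if the Democrat wins; the $6 is lost otherwise

# A buyer who believes the Democrat has a 60% chance roughly breaks even at $6,
# so $6 on a $10 contract corresponds to a subjective probability of 60%.
print(round(expected_profit(price, payout, 0.60), 6))
print(break_even_price(payout, 0.60))
```

A buyer who thinks the probability is higher than price / payout expects to profit and keeps buying, which is the pressure that pushes the price toward the market's aggregate belief.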
We ask what makes a good forecast …
One simple question motivates our method: what combination of these three key data types creates the most accurate, relevant, and timely forecasts (i.e., the most efficient and useful forecasts for the relevant stakeholders)? First, the answer is crucial for researchers studying electoral politics, or any other domain with forecasts, because accurate and granular forecasts allow them to connect shocks to the campaign with changes in the underlying likelihood of the relevant outcomes. Second, forecast accuracy is important for practitioners (i.e., campaigns or investors in campaigns) who want to make efficient choices when they spend time and money in the multi-billion dollar industry of political campaigns.
Accuracy: There have been few meaningful attempts to combine these different data types into a single forecast, even though the literature is clear that combining data is generally very effective in increasing accuracy. There are a few exceptions, but most papers only investigate the national vote share and use simpler interpretations of the raw data. Overall, three related, but largely non-intersecting academic literatures persist, despite their shared goal of accurately forecasting election outcomes.
Relevancy: There is little discussion about what is the most relevant forecast. Academic forecasts tend to estimate vote share for two key reasons: academic literature focuses on incremental improvements on historical forecasts and estimated vote share is the historical standard, and observers frequently interpret raw polls as naïve estimations of vote share, making it the simplest rubric. Expected vote share is certainly still extremely important for election workers, especially broken down by targetable demographics, but the marketplace for the general population is very clear that it desires probability of victory in the Electoral College. Further, state-by-state forecasts for the Electoral College not only offer a more compelling indicator for researchers and practitioners or investors, they also provide much more identification than forecasts of the national vote share.
Timeliness: There is no emphasis on the utility of the forecast when it is released; forecasts are judged by academia and the press at the time they are released or they are judged as if they were released on the eve of the event. Yet, both election researchers and practitioners benefit from early forecasts, when there are more resources left to allocate. And, they both benefit from timely forecasts, which provide a granular account of the election for researchers and are up-to-date when the practitioners or investors need to make a decision.
We create a model that combines the three types of data and maximizes the three attributes of a good forecast …
First, we aggregate then debias the raw voter intention polling data, using parameters that we calibrate separately by: election type, days before the election, and the certainty of the raw data. The resulting forecast is the most accurate poll-based forecast readily available.
Second, we examine and clarify the transformation that debiases raw prediction market data, yielding an improved prediction market-based forecast.
Third, we combine the three forecasts based on polling data, fundamental data (using my work with Patrick Hummel), and prediction market data. The weighting parameters for our model demonstrate and capitalize on the shifting strength of the different forecast types across the studied timeframe; 130 days out, the forecast averages the separate forecasts from all three data types, but the fundamental model’s unique information decreases until Election Day, when the forecast is an average of the polling and prediction market-based forecasts.
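The shifting weights can be sketched as follows. The post only pins down the two endpoints (an equal average of all three forecasts 130 days out, and an average of polls and markets on Election Day); the linear decay in between, the function name, and the choice to average raw probabilities are assumptions made for illustration, not the calibrated parameters of the actual model.

```python
def combined_forecast(days_out, polls, markets, fundamentals, horizon=130):
    """Weighted average of three probability-of-victory forecasts.

    Assumption: the fundamentals weight decays linearly from 1/3 at
    `horizon` days before the election to 0 on Election Day, and the
    poll-based and market-based forecasts split the remainder equally.
    """
    frac = min(max(days_out / horizon, 0.0), 1.0)
    w_fund = frac / 3.0
    w_other = (1.0 - w_fund) / 2.0
    return w_other * polls + w_other * markets + w_fund * fundamentals


# Hypothetical inputs for a single race.
polls, markets, fundamentals = 0.60, 0.90, 0.30

print(round(combined_forecast(130, polls, markets, fundamentals), 6))  # 0.6
print(round(combined_forecast(0, polls, markets, fundamentals), 6))    # 0.75
```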
We see how we did in 2012 …
We do not like to dwell too much on single election cycles, as the correlation between the outcomes somewhat diminishes the explanatory power of even 84 (51 Electoral College and 33 senatorial) different outcomes. Yet, PredictWise’s forecast does well in predicting the 2012 election; the below chart shows the errors every 4 hours for the last 130 days of the election in 2012. Unlike the within-sample from previous years, on which we created the model, it was not dominant at every point in the cycle, but it was the most consistent forecast. For a span of about 30 days early in the cycle when poll-based forecasts had a lower error than prediction market-based forecasts, PredictWise’s forecast was either below or near the poll-based forecast. From the end of the summer until the last month of the campaign, a span of about 45 days when prediction market-based forecasts had a lower error than polls, PredictWise’s forecast again held closely to the lowest errors. At any given moment from 130 days before the election to Election Day in 2012, PredictWise’s forecast is likely to have a lower error than either the completely poll-based or completely prediction market-based forecast.
Accuracy of probability of victory estimates for Electoral College and senatorial elections by fundamental data, voter intention poll, and prediction markets-based forecasts, along with PredictWise for 2012
There is no comparison with FiveThirtyEight in this post, because there is no comparison with FiveThirtyEight. I have compared PredictWise to three single data forecasts created with: voter intention polling, fundamental, and prediction markets data. FiveThirtyEight is some unknown combination of polling and fundamental data. First, FiveThirtyEight did not post senatorial predictions until about Labor Day. Without predictions during the toughest part of the process, it is impossible to compare our forecasts. Second, FiveThirtyEight updated sparingly until the very end, so their forecasts were frequently stale. And, for the record, we had 50 of 51 Electoral College races correct on February 16, 2012 … forecast records on the eve of the election are not useful to the stakeholders and do not interest us.
DavidMRothschild on May 18, 2013 @ 11:54AM
May 18 at 6:05 ET: Halfway through the voting and only two viable countries left: Denmark (89%) and Ukraine (7%). Of course, these were our initial top and second predictions for first place.
5:58 PM ET via Twitter: Calling it for Denmark with 17 of 39 countries voting! #ev2013
5:51 PM ET via Twitter: It is now officially a 2 team race between Azerbaijan and Denmark at #EV2013!
4:29 PM ET via Twitter: Top worldwide trend on Twitter is #EV2013. Unsure what it means, 6th top trend #Eurovision2013: real-time forecast: ow.ly/lapuc
May 18 at 12:00 ET: On Monday, May 13, I cemented my pre-Eurovision 2013 predictions with Denmark at 41% likely to win. After a strong semi-final performance, and the field shrinking from 39 to 26 competitors, my prediction on the eve of the final now has Denmark at 55% likely to win. Norway is the second most likely winner at 14%, followed by Russia and Ukraine at 5%. You can follow predictions live as they update during the competition tonight here:
DavidMRothschild on March 02, 2013 @ 8:10PM
I judge my predictions on four major attributes: relevancy, timeliness, accuracy, and cost-effectiveness. I am very proud of my 2013 Oscar predictions, because they excelled in all four attributes: they predicted all 24 categories (and all combinations of categories), moved in real-time, were very accurate, and built on a scalable and flexible prediction model.
Relevancy is the only one of my major attributes that relies on the subjective input of stakeholders, rather than an objective measure; I relied on people with more domain specific information than myself about what I should predict and, after watching my first Oscar show from start to finish, it certainly felt that any relevant set of predictions should have all 24 categories. Of the six major categories that fall into the standard set of predictions, only two, best supporting actor and actress, are scattered into the first 20 awards. The show does not get to the biggest four awards, best: picture, director, actor, and actress until well past 11:30 PM ET. If I were watching the Oscars casually with my family or friends I would certainly want information on all 24 categories to sustain interest throughout the telecast. Further, predictions in all 24 categories are necessary to predict the total quantity of awards won by any given movie.
The real-time nature of my predictions proved extremely interesting in both quantifying and understanding the major trends of the awards season; further, predictions that are created just before Oscar day are not available to interested people during these earlier events. Both major trends that I illustrated the day before the Oscars played a big role on Oscar night: Argo's rise with the award show victories and Zero Dark Thirty's fall with the increased concern over its depiction of torture. A third trend is evident in both the major categories where Django Unchained competed. Winner of best supporting actor, Christoph Waltz, moved from a small 10 percent likelihood of victory at the start of the season to 40 percent on Oscar day, a hair behind Lincoln's Tommy Lee Jones. And, taking advantage of Zero Dark Thirty's fall, Django Unchained came from behind for a commanding lead in the prediction for best original screenplay by Oscar day.
The first judge of accuracy is the error; my error is meaningfully smaller than the best comparisons. A simple way to calculate the error is to take the mean of the squared error for each nominee, where the error is (1 - probability of victory) for a winner and (0 - probability of victory) for a loser. A full set of predictions is 122, with 22 categories of 5, 1 category of 9, and 1 category of 3. My final predictions at 4 PM ET on Oscar day had an MSE of 0.067. One comparison is my earliest set of predictions, which had an MSE of 0.108; the error got smaller and smaller as the award shows and other information spread into my predictions. Nate Silver's FiveThirtyEight only predicted the big six categories and Mr. Silver provided prediction points, rather than probabilities. But, converting his predictions into probabilities by dividing each nominee's points by the sum of points in the category, he had an MSE of 0.075 for those six categories to my meaningfully smaller 0.056. A final comparison is with my only input that had all 24 categories; the Oscar day prediction of Betfair, the prediction market, has virtually the same error as mine. Which is why I also consider calibration.
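The error calculation and the points-to-probabilities conversion described above can be sketched in a few lines. The function names, the category, and the nominees "A" through "C" are hypothetical placeholders; none of the numbers below are actual Oscar predictions.

```python
def mean_squared_error(predictions, winners):
    """Mean squared error over every nominee in every category.

    predictions: {category: {nominee: probability of victory}}
    winners:     {category: winning nominee}
    The error is (1 - p) for the winner and (0 - p) for each loser.
    """
    squared_errors = []
    for category, nominees in predictions.items():
        for nominee, prob in nominees.items():
            outcome = 1.0 if winners[category] == nominee else 0.0
            squared_errors.append((outcome - prob) ** 2)
    return sum(squared_errors) / len(squared_errors)


def points_to_probabilities(points):
    """Divide each nominee's points by the sum of points in the category."""
    total = sum(points.values())
    return {nominee: value / total for nominee, value in points.items()}


# Hypothetical single category with three nominees.
predictions = {"category_1": {"A": 0.7, "B": 0.2, "C": 0.1}}
winners = {"category_1": "A"}
print(round(mean_squared_error(predictions, winners), 4))  # 0.0467

print(points_to_probabilities({"A": 3, "B": 1}))  # {'A': 0.75, 'B': 0.25}

# Size of a full set of predictions, as counted in the text.
print(22 * 5 + 1 * 9 + 1 * 3)  # 122
```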
The second judge of accuracy is the calibration; my calibration is very strong. The easiest way to check calibration is to chart the percentage of predictions that occur for bucketed groups of predictions (e.g., for all of my predictions around 20 percent, how many occur?). As you can see from the chart, when I made a prediction that was around 20 percent, around 20 percent of the predictions occurred. Admittedly, my gut was a little concerned with predictions like 99 percent for Life of Pi to win best visual effects or 97 percent for Les Miserables to win best sound mixing, but that is why I trust my data/models, not my gut. Betfair, which has nearly identical errors, is systematically underconfident; while my predictions dance around the magical 45 degree line (i.e., perfect calibration line), 100 percent of Betfair prices that round to 50, 80, 90, and 100 occur, while prices that round to 70 occur 80 percent of the time.
Sources: Betfair, Intrade, Hollywood Stock Exchange
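The bucketing behind a calibration chart can be sketched like this. It is an illustrative version that assumes buckets at the nearest 10 percent; the function name and the sample predictions are hypothetical, not the data behind the chart.

```python
from collections import defaultdict


def calibration_table(probabilities, outcomes):
    """Bucket predictions to the nearest 10 percent and measure how often each occurs.

    probabilities: predicted probabilities between 0 and 1
    outcomes:      True/False for whether each prediction occurred
    Returns {bucket_percent: (number_of_predictions, observed_frequency)}.
    """
    buckets = defaultdict(list)
    for prob, occurred in zip(probabilities, outcomes):
        bucket_percent = int(round(prob * 10)) * 10
        buckets[bucket_percent].append(1 if occurred else 0)
    return {
        bucket: (len(hits), sum(hits) / len(hits))
        for bucket, hits in sorted(buckets.items())
    }


# Hypothetical predictions: five near 20 percent, two near 90 percent.
probs = [0.18, 0.19, 0.20, 0.21, 0.22, 0.88, 0.91]
occurred = [True, False, False, False, False, True, True]

for bucket, (count, frequency) in calibration_table(probs, occurred).items():
    print(bucket, count, frequency)
```

A well-calibrated forecaster has an observed frequency close to the bucket value in every bucket, which is the 45 degree line on the chart; with only a handful of predictions per bucket, as in any single awards season, the observed frequencies are noisy.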
The third judge of accuracy has to do with the models themselves and is not borne out in any one set of outcomes: is the prediction model robust for the future or over-fitted to the past and/or present? First, my models examine the historical data, but are carefully crafted using both in-sample and out-of-sample data to ensure they predict the future, rather than describe the past. Second, I always calibrate and release my models without any data from the current set of events. Models released too close to an event frequently suffer from inadvertent "look-ahead bias," where the forecaster, knowing what the other forecasters and his/her gut are saying, inadvertently massages the model to provide the prediction he/she wants. That is why I release my models to run at the start of any season without ever checking what the current season's data will predict before they are released.
The cost effective nature of my modeling is the key to predicting all 24 categories. Along with traditional fundamental data, my model relies primarily on easily scalable data, like prediction markets and user generated experimental data. It is scalable and cost effective models/data that will eventually allow us to make incredible quantities of predictions in a wide-range of domains. Adding accuracy in an existing set of predictions like best actor or expected national vote share in the presidential election is fun and could be meaningful, but creating accurate, real-time predictions in ranges of questions that could not exist before is the real challenge and goal of my work.
This column syndicates with the HuffingtonPost.