DavidMRothschild on March 02, 2013 @ 7:10PM
I judge my predictions on four major attributes: relevancy, timeliness, accuracy, and cost-effectiveness. I am very proud of my 2013 Oscar predictions, because they excelled in all four attributes: they covered all 24 categories (and all combinations of categories), moved in real-time, were very accurate, and were built on a scalable and flexible prediction model.
Relevancy is the only one of my major attributes that relies on the subjective input of stakeholders, rather than an objective measure. I relied on people with more domain-specific information than myself about what I should predict and, after watching my first Oscar show from start to finish, it certainly felt that any relevant set of predictions should have all 24 categories. Of the six major categories that fall into the standard set of predictions, only two, best supporting actor and actress, are scattered into the first 20 awards. The show does not get to the biggest four awards (best picture, director, actor, and actress) until well past 11:30 PM ET. If I were watching the Oscars casually with my family or friends, I would certainly want information on all 24 categories to sustain interest throughout the telecast. Further, predictions in all 24 categories are necessary to predict the total quantity of awards won by any given movie.
The real-time nature of my predictions proved extremely interesting in both quantifying and understanding the major trends of the awards season; further, predictions that are created just before Oscar day are not available to interested people during these earlier events. Both major trends that I illustrated the day before the Oscars played a big role on Oscar night: Argo's rise with the award show victories and Zero Dark Thirty's fall with the increased concern over its depiction of torture. A third trend is evident in both of the major categories where Django Unchained competed. The winner of best supporting actor, Christoph Waltz, moved from a small 10 percent likelihood of victory at the start of the season to 40 percent on Oscar day, a hair behind Lincoln's Tommy Lee Jones. And, taking advantage of Zero Dark Thirty's fall, Django Unchained came from behind to a commanding lead in the prediction for best original screenplay by Oscar day.
The first judge of accuracy is the error; my error is meaningfully smaller than the best comparisons. A simple way to calculate the error is to take the mean of the squared error for each nominee, where the error is (1 - probability of victory) for a winner and (0 - probability of victory) for a loser. A full set of predictions covers 122 nominees: 22 categories of 5, 1 category of 9, and 1 category of 3. My final predictions at 4 PM ET on Oscar day had a mean squared error (MSE) of 0.067. One comparison is my earliest set of predictions, which had an MSE of 0.108; the error got smaller and smaller as the award shows and other information fed into my predictions. Nate Silver's FiveThirtyEight only predicted the big six categories, and Mr. Silver provided prediction points rather than probabilities. But, converting his predictions into probabilities by dividing each nominee's points by the sum of points in the category, he had an MSE of 0.075 for those six categories to my meaningfully smaller 0.056. A final comparison is with my only input that had all 24 categories: the Oscar day prediction of Betfair, the prediction market, has virtually the same error as mine, which is why I also consider calibration.
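The error calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the data layout (one dictionary of probabilities per category plus the winner's name), and the sample numbers are all hypothetical and are not the actual predictions or code behind the figures quoted here.

```python
def mean_squared_error(categories):
    """Mean of the squared error over every nominee in every category.

    `categories` is a list of (probabilities, winner) pairs, where
    `probabilities` maps each nominee to a probability of victory.
    (Hypothetical layout, chosen for illustration.)
    """
    squared_errors = []
    for probabilities, winner in categories:
        for nominee, probability in probabilities.items():
            # Outcome is 1 for the winner and 0 for every loser.
            outcome = 1.0 if nominee == winner else 0.0
            squared_errors.append((outcome - probability) ** 2)
    return sum(squared_errors) / len(squared_errors)


def points_to_probabilities(points):
    """Convert prediction points to probabilities by dividing each
    nominee's points by the sum of points in the category."""
    total = sum(points.values())
    return {nominee: value / total for nominee, value in points.items()}


# Made-up example: one three-nominee category in which "A" wins.
example = [({"A": 0.5, "B": 0.3, "C": 0.2}, "A")]
example_mse = mean_squared_error(example)

# Made-up points for a two-nominee category.
example_probabilities = points_to_probabilities({"A": 3, "B": 1})
```

Because every nominee contributes one squared error, a nine-nominee category carries more weight in the average than a three-nominee one, which matches the per-nominee count of 122 above.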
The second judge of accuracy is the calibration; my calibration is very strong. The easiest way to check calibration is to chart the percentage of predictions that occur for bucketed groups of predictions (e.g., for all of my predictions around 20 percent, how many occur?). As you can see from the chart, when I made a prediction that was around 20 percent, around 20 percent of the predictions occurred. Admittedly, my gut was a little concerned with predictions like 99 percent for Life of Pi to win best visual effects or 97 percent for Les Miserables to win best sound mixing, but that is why I trust my data/models, not my gut. Betfair, which has nearly identical errors, is systematically underconfident: while my predictions dance around the magical 45 degree line (i.e., the perfect calibration line), 100 percent of Betfair prices that round to 50, 80, 90, and 100 occur, while prices that round to 70 occur 80 percent of the time.
Sources: Betfair, Intrade, Hollywood Stock Exchange
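The bucketed calibration check described above can be sketched as follows. It is a minimal illustration, assuming predictions arrive as (probability, occurred) pairs and are rounded to the nearest 10 percent; the function name, the bucket width, and the sample data are hypothetical and are not the data behind the chart.

```python
from collections import defaultdict


def calibration_table(predictions, bucket_width=0.1):
    """Map each bucket center to the share of its predictions that occurred.

    `predictions` is an iterable of (probability, occurred) pairs.
    (Hypothetical layout, chosen for illustration.)
    """
    buckets = defaultdict(list)
    for probability, occurred in predictions:
        # Round to the nearest bucket center, e.g. 0.18 -> 0.2.
        # The outer round() removes floating-point noise such as 0.30000000000000004.
        center = round(round(probability / bucket_width) * bucket_width, 10)
        buckets[center].append(1 if occurred else 0)
    return {
        center: sum(hits) / len(hits)
        for center, hits in sorted(buckets.items())
    }


# Made-up data: five predictions near 20 percent, two near 80 percent.
sample = [
    (0.18, False), (0.22, True), (0.21, False), (0.19, False), (0.20, False),
    (0.79, True), (0.81, True),
]
table = calibration_table(sample)
```

A perfectly calibrated set of predictions would give values equal to the bucket centers, which is the 45 degree line. With only a handful of predictions in a bucket, as in this made-up sample, the observed share is noisy, so small departures from the line should be read with caution.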
The third judge of accuracy has to do with the models themselves and is not borne out in any one set of outcomes: is the prediction model robust for the future, or over-fitted to the past and/or present? First, my models examine the historical data, but are carefully crafted using both in-sample and out-of-sample data to ensure they predict the future, rather than describe the past. Second, I always calibrate and release my models without any data from the current set of events. Models released too close to an event frequently suffer from inadvertent "look-ahead bias," where the forecaster, knowing what the other forecasters and his/her gut are saying, inadvertently massages the model to provide the prediction he/she wants. That is why I release my models to run at the start of any season without ever checking, before they are released, what the current season's data will predict.
The cost-effective nature of my modeling is the key to predicting all 24 categories. Along with traditional fundamental data, my model relies primarily on easily scalable data, like prediction markets and user-generated experimental data. It is scalable and cost-effective models and data that will eventually allow us to make incredible quantities of predictions in a wide range of domains. Adding accuracy to an existing set of predictions, like best actor or expected national vote share in the presidential election, is fun and could be meaningful, but creating accurate, real-time predictions in ranges of questions that could not exist before is the real challenge and goal of my work.
This column syndicates with the HuffingtonPost.
Relevant, real-time, accurate, and scalable: 2013 Oscar predictions are a win for predictive science
DavidMRothschild on February 25, 2013 @ 12:33AM
Predicting the Oscars for me is not about the Oscars per se, but the science of predicting. The challenge was to make predictions in all 24 categories, when most predictions only do 6. The challenge was to make predictions that move in real-time during the time period between the nominations and the Oscars, when most predictions are static. The challenge was to make predictions that were accurate, not just in binary correctness, but in calibrated probabilities. The challenge was to make these predictions cost-effective, so that they could not only scale to 24 categories, but also be useful in making predictions in varying domains. Prediction market data, including Betfair, Hollywood Stock Exchange, and Intrade, combined with some user-generated data from WiseQ, allowed me to meet all of these challenges.
I was able to produce predictions for all 24 categories, expanding down the list through film editing, sound mixing, etc. I showed how these predictions moved in real-time during the period between the Oscar nominations and the Oscars. For example, Argo zoomed upward in the best picture and adapted screenplay categories as Zero Dark Thirty plunged in best actress and original screenplay. I was very accurate, with 19 of 24 categories correct and the winners in the other 5 categories showing reasonably high probabilities. Prediction market data and experimental prediction games harnessed the wisdom of the crowds to allow me to scale easily to all 24 categories. These same data/models will allow me to easily expand to all sorts of domains in the near future.
DavidMRothschild on February 23, 2013 @ 11:27AM
I created my Oscar predictions in real-time because real-time movement is an important part of my basic research into predictions, not because I thought the Oscars would provide an interesting domain for movement. On that last point, I was wrong. In category after category, significant movement in the likely winner provides a window into the power of certain events that occurred on the road to the Oscars. These events include regularly scheduled events, such as awards shows, and idiosyncratic events, such as prominent commentary on certain movies.
Every prediction I do is in real-time for two reasons. First, real-time predictions provide the most updated prediction for the end user whenever that user needs it. For example, it is easy to see with economic or financial predictions that knowing the likely outcome is an important part of major decisions that happen continuously. Movement is a good thing in predictions, because it demonstrates that predictions are absorbing new information that affects the outcome we are predicting. Second, real-time predictions provide a granular track record to explore when and why movements occur (i.e., what things actually impact the final outcome). Granular predictions allow me to judge the value of a debate, a big advertisement buy, a vice-presidential choice, or an awards show, something that cannot be isolated with less regular indicators.
The most obvious movement has been in the best picture category, where Lincoln's original lead has collapsed as award show after award show favored Argo. Shortly after the nominations were released, Argo was in a distant second place to Lincoln at just 8 percent likely to win. Yet all of these award show wins have brought Argo to 93 percent.
This theme carried into the adapted screenplay category, where a commanding lead by Lincoln is now a tight proxy fight with Argo. Our data demonstrate a strong positive correlation between the outcomes of these two categories. Lincoln started off with a smaller lead, 70 percent likely to win best adapted screenplay. And the change has not been as dramatic, with Argo now leading slightly at 57 percent.
Sources: Betfair, Hollywood Stock Exchange, Intrade, WiseQ (detailed at PredictWise.com)
Zero Dark Thirty's likelihood has fallen in nearly every one of its strongest categories including best actress and original screenplay. The implication is that the increased scrutiny of Zero Dark Thirty's depiction of torture will hurt it with the voters. Just after the nominations were released, Zero Dark Thirty's Jessica Chastain was a viable 28 percent to win best actress, but that has plummeted to 5 percent in the last few weeks. Similarly, Zero Dark Thirty was 65 percent likely to win for best original screenplay. Amour and Django Unchained were distant second and third at about 13 and 17 percent likelihood. Today we have Django Unchained leading with 47 percent and Zero Dark Thirty nearly tied with Amour around 25 percent.
By the time this Oscar night concludes we will have a much richer understanding of the value of the awards show and the cost of negative publicity.
If you think you are a better prognosticator than I, please play the new WiseQ Oscars Game and show me how smart you are!
This column syndicates with the HuffingtonPost.