PredictWise Blog

World Cup Day 3: Looking for a draw


Today’s Games: There are four games on the docket today, with Groups C and D both playing the first round of their round robin. Group C kicks off with Colombia 52% against Greece 18%, with a healthy 30% for a draw. This game is followed by Ivory Coast 36% and Japan 35%, with another 29% for a draw. This is an incredibly tight group: Colombia is slightly favored, but otherwise each team enters group play with at least a 30% likelihood of advancing. Group D features Uruguay 67% against Costa Rica 10%, with 23% for a draw. But the most anticipated game of the day is England 34% versus Italy 33%, with 33% for a draw; this game is as evenly matched as they get! Amazingly, I have England, Italy, and Uruguay virtually tied at 66% to make the round of sixteen.

Yesterday’s Games: The Mexico and Chile victories were as expected yesterday, but the Netherlands’ blowout of Spain was quite the experience. Obviously, the Dutch, 2010’s finalists, had a non-negligible chance of beating Spain, but a 5-1 thrashing was certainly a low-probability event (the disclaimer being that once a game gets really lopsided, the last few goals are not necessarily as meaningful).

Overall View: Spain is only slightly less likely to beat Chile in their head-to-head game than it was prior to the blowout loss. Chile itself did not look spectacular in beating the unfortunate Aussie team. But with a win over Chile now critical to advancement, Spain’s likelihood of advancing is down to 50%, and its likelihood of taking the top seed in the group is also down. Thus, it is unsurprising that Spain’s likelihood of winning the tournament is down by well over 50%. Much of that likelihood was sopped up by the Netherlands, now almost assured to advance and likely to take the top position in Group B.

Updating Predictions: likelihood of any game and likelihood of any team reaching any round. Details on method and full coverage.


World Cup Day 2: Let the games begin!


Today’s Games: There are three games on tap today, with Mexico 44% versus Cameroon 25% (31% for draw) finishing out the first round of Group A games. With Croatia losing its opening match against Brazil, Mexico is now in the driver’s seat for the second spot in Group A if they can beat Cameroon. The other two games are both in Group B, starting with a rematch of the 2010 World Cup final between Spain 53% and the Netherlands 18% (29% for draw). Today’s most likely outcome is a repeat of the 2010 final, where Spain won 1-0. Finally, Chile 66%, expected to fight with the Netherlands for the second spot out of Group B, is heavily favored over Australia 11% (23% for draw).

Yesterday’s Games: There was just one game yesterday and it went as expected, with Brazil winning 3-1, but there were two key lessons. First, Brazil could lose a game; even assuming they are a near lock to make it to the round of sixteen, they still need to win four games in a row, which makes me very comfortable with them at 25% to win the tournament. Second, with the score tied 1-1, we got a great demonstration of home field advantage: a very questionable penalty that favored the struggling home team, Brazil.
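To put a rough number on the "four games in a row" point, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that Brazil reaches the round of sixteen for certain and then wins each knockout game independently with the same probability; neither assumption comes from my actual model, and the function names are hypothetical.

```python
def per_game_from_title(title_prob: float, games: int = 4) -> float:
    """Per-game win probability implied by a title probability.

    Assumes every knockout game is won independently with the same
    probability (an illustrative simplification, not the real model).
    """
    return title_prob ** (1.0 / games)


def title_from_per_game(per_game: float, games: int = 4) -> float:
    """Probability of winning `games` knockout games in a row."""
    return per_game ** games


if __name__ == "__main__":
    implied = per_game_from_title(0.25)
    print(f"Per-game win probability implied by a 25% title chance: {implied:.3f}")
    print(f"Title chance if every game were won at 77%: {title_from_per_game(0.77):.3f}")
```

Under those assumptions, 25% to win the tournament corresponds to winning each knockout game roughly 71% of the time, which is a high bar against round-of-sixteen and later opponents.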

Overall View: Nothing changed in the overall outcome based on yesterday’s game.

Updating Predictions: likelihood of any game and likelihood of any team reaching any round. Details on method and full coverage.

World Cup Day 1: Brazil opens up the World Cup


Today’s Games: There is just one game today with Brazil playing Croatia in São Paulo, Brazil. Brazil is 77% likely to win, 17% to draw and just 6% to lose (or Croatia to win!). This game belongs to Group A, where Brazil is a near lock to advance and Croatia 46% and Mexico 41% are fighting for the other spot in the round of sixteen. Below is the likely outcome broken out with a little more detail. Nearly all of the unquoted outcomes include Brazil scoring 4 or more goals in a landslide victory, but the most likely outcome is Brazil winning 2-0.

Overall View: Brazil is 25% to win the tournament, with Argentina at 18%, Spain at 14%, and Germany at 13%. Barring a shocking loss by Brazil in today’s game, the tournament will start to get a lot more interesting tomorrow when Spain plays the Netherlands.

Updating Predictions: likelihood of any game and likelihood of any team reaching any round. Details on method and full coverage.

Published on June 11, 2014 as a thought piece at

Speaking at Adweek Europe earlier this year, I gave my take on low-latency data, how it currently applies to my data modelling for sports and political events, and how brands can and will be able to use the insights drawn from data of this kind to tweak ad campaigns in real time in the future. As the summer of football begins, I thought it would be interesting to try to use data to predict the World Cup winners.

Brazil is 75% likely to win the opening game of the World Cup and 18% to draw, but if Brazil starts off poorly against Croatia those predictions will change with just a few seconds of latency, so sports fans have updated, quantifiable information all game. Which begs the question: why are these low-latency and quantifiable statistics readily available for sports and other entertainment, but not for business, such as newly launched advertising campaigns?

It is useful for me to create statistics for politics, sports, and entertainment, because the raw data and outcomes are public and regular. Thus, when I build the infrastructure to capture and analyze data, and make the resulting statistics available for consumption, I am able to observe and update the process on a regular basis. These live events are big business themselves (my former boss did just buy the lowly NBA team the LA Clippers for $2 billion), but there is no question I am in this forecasting business to answer business questions as well. And the infrastructure is ready to supply businesses and advertisers with low-latency, quantifiable statistics, but the demand from them to use it is not there.

Providing statistics for live events is actually more difficult than providing equivalent business and advertising statistics, in that the speed is insanely fast. The infrastructure necessary to collect and analyze the data, and then publish the likelihood of victory following a long pass but before the next play, is more robust than most business concerns need. As with so many forms of meaningful technology, sports and entertainment have actually led to the creation of an extremely robust infrastructure. The market intelligence community could provide companies with similarly low-latency, quantitative answers to show, for example, how their advertising campaign is progressing with different demographics.

The problem is the lack of demand; advertisers are not using low-latency and highly quantifiable answers to adjust their campaigns. I know stakeholders are consuming and using statistics for sports and other live events, as I can see the readership of my articles and tables before, during, and after events. I have no doubt advertisers would read similar statistics about their live campaigns. But it is easy to see how live events adjust what they deliver to consumers as the event progresses; they provide advertisements and information that reflect the shifting outcomes (e.g., the stars of the game or who will be in the next round). Similarly, political campaigns adjust their spending daily or even hourly, sending different quantities and designs of advertisements to different demographics as the results shift. This contrasts with traditional advertising, which does not shift its campaigns dramatically from hour to hour or day to day.

Advertisers have both legitimate loss-aversion issues and a principal-agent problem. First, companies are legitimately more concerned with downside than upside. It was a phenomenon in the advertising world when Oreo tweeted out an advertisement that directly responded to the events of the Super Bowl in 2013. But they had a very expensive war room to ensure no mistakes were made. While they may have won the day with a great advertisement, a bad advertisement can sink a company for years. Second, agencies are even more concerned than their employers, as a bad advertisement will get them fired even after strings of successes. I am fine if my live feed makes the occasional hiccup or a TV producer accidentally produces the wrong statistic, but there is no room for a bad advertisement, so there is a huge incentive to slow down and not take any risk.

Ultimately, if we deliver these low-latency and quantifiable statistics from “big data” and no one uses them to allocate resources, then it is not a revolution, it is a parlor trick. I generally talk publicly about the work that my colleagues and I do in gathering and understanding “big data”, but we are also engaged in many projects to make it more efficient for people to utilize this data.

New technology, from both Microsoft and others, can streamline the vetting process to restrict downside loss. Digital creativity software is starting to allow advertisers to quickly and cheaply tailor advertisements for different demographics or outcomes. Focused delivery options allow advertisers to tightly target the advertisements they send out. Online focus groups can minimize the cost and time needed to ensure that an advertisement is not making some big mistake, and translation devices can check that there is no mistake in multiple languages.

It seemed like magic when Disney created an advertisement with the Super Bowl MVP right after the game, and it is scary that it still seemed like magic 26 years later when Oreo tweeted out their advertisement. But the same technology that makes it easy for TV producers and second-screen experiences to provide customized low-latency answers as the game progresses will soon allow advertisers to provide customized low-latency advertisements as the game progresses (or as any of their other advertising campaigns progress).

I now have the full World Cup probabilities listed. This includes the likely outcome of every game and the likelihood of every team making it to every round. The predictions are as accurate as possible, based on historical correlations in both the World Cup specifically and my methods in general, and answer the question that most stakeholders ultimately care about: who is going to win. The predictions update every few minutes, allowing us to examine the impact of events during a game, and of early games, on later games. There are many other predictions to follow, but I am confident that my accuracy and continuous updating make my predictions the most useful to interested stakeholders.

Two predictions that have been forwarded to me several times are by Bloomberg and Goldman Sachs. First, let me concede that, while I try my best, these two strictly dominate me in style; these are really pretty reports! But I do beat them as far as being a useful prediction.

These are pure fundamental models. Bloomberg does not provide much detail of this model. But, Goldman uses most of the same variables I use in my fundamental model: Elo rankings, goals for, goals against, dummy for type of match, home field, and home continent. There are a few small differences: I include friendly matches, I do not include home continent, etc. All of this is going to add up to noise for the predictions (i.e., not really make much of a difference). In short, both of our methods are completely sound.

The problem with pure fundamental models is that even the best fundamental models are lacking because the World Cup is an event held just once every four years without any regular season: there is a lot of idiosyncrasy in the event which is hard to capture in historical datasets. Goldman states, “To be clear, our model does not use any information on the quality of team or individual players that is not reflected in a team’s track record. For example, if a key player who was responsible for a team’s recent successes is injured, this will have no bearing on our predictions.” While this can be corrected for, to an extent, with individual-level data, they are referring to the idiosyncratic data that is included in the prediction market data. Thus, my prediction market-based predictions are going to be more accurate and updatable as the event progresses.

Goldman goes on to say that “There is no role for human judgment as the approach is purely statistical.” They should be applauded for that, but chided for not recognizing that the data exists; they do not need to add human judgment to note the effect of an injury.

This point is reflected well in the scatterplot of my forecasts for the 32 teams reaching the round of sixteen and the round of eight against theirs. If the predictions were the same, they would run along the 45-degree diagonal; by definition the average prediction is 50% for reaching the round of 16 and 25% for reaching the round of 8. Notice that the predictions from Bloomberg and Goldman are much flatter than mine: favorites are less favored and underdogs are more favored. This makes sense; it means we are all well calibrated, in that the fundamental-based models accept that they have less information and more uncertainty.
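For readers who want to see what "flatter" means numerically, here is a minimal sketch with made-up numbers for a single four-team group (two teams advance, so each set of forecasts sums to 2 and averages 50%). The probabilities are hypothetical, not taken from my forecasts or from either report, and putting the market-based forecast on the horizontal axis is my assumption for the illustration.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


# Hypothetical probabilities of advancing from one four-team group.
market_based = [0.9, 0.7, 0.3, 0.1]   # stand-in for a market-based forecast
fundamental = [0.7, 0.6, 0.4, 0.3]    # stand-in for a fundamentals-only forecast

if __name__ == "__main__":
    print("mean, market-based:", mean(market_based))
    print("mean, fundamental: ", mean(fundamental))
    print("slope of fundamental vs. market-based:",
          ols_slope(market_based, fundamental))
```

Both sets average 50%, as they must, but the slope is well below 1: the fundamentals-only numbers are pulled toward the middle, which is the flattening visible in the scatterplot.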

There are two further peculiarities about the Goldman report: they compare their predictions to a broker, not a market, and they have Brazil as 48.5% likely to win. First, given that they are one of the leading investment banks in the world, I am surprised they would compare their probabilities to the bid price of a broker. Ladbrokes needs to make a profit, and they do so by selling their predictions for more than they are worth. To guarantee a $1 return you would need to invest $1.18 to buy all of the teams to win. Second, the very, very under-identified home field and home continent advantage is what drives their prediction for Brazil to 48.5%. There have not been that many World Cups, so it is hard to identify the true advantage of hosting one. It is similar to the home-state effect in presidential elections, which is also poorly identified. Brazil is the clear favorite to win; I have them at 23%. But would Goldman advise their clients to buy Brazil long at 48.5%?!?
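The $1.18 figure is the broker's margin: the quoted prices for all teams add up to more than $1. A minimal sketch of how to back out comparable probabilities follows. The four-team market and its prices are invented so that they sum to 1.18, and proportional rescaling is only one simple way to strip out the margin, not necessarily what any broker, Goldman, or I actually do.

```python
def total_cost(prices):
    """Cost of buying a $1 payout on every outcome (1.0 in a fair market)."""
    return sum(prices.values())


def normalize(prices):
    """Rescale quoted prices proportionally so they sum to 1."""
    total = total_cost(prices)
    return {team: price / total for team, price in prices.items()}


# Hypothetical prices (cost of a $1 payout if the team wins the tournament).
quoted = {"Team A": 0.59, "Team B": 0.295, "Team C": 0.177, "Team D": 0.118}

if __name__ == "__main__":
    print(f"Cost to guarantee $1: ${total_cost(quoted):.2f}")
    for team, prob in normalize(quoted).items():
        print(f"{team}: quoted {quoted[team]:.3f} -> implied {prob:.3f}")
```

In this toy market the favorite's quoted price of 0.59 shrinks to an implied 50% once the margin is removed, which is why comparing a model's probabilities directly to a broker's raw prices is misleading.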

With that context, let me rewrite the entire column in a different way: if Goldman Sachs had a model for the price of an asset (e.g., MSFT stock in a month or Colombia to win the World Cup), but something just happened that shifted the underlying value of that asset far away from the model (e.g., a new CEO for MSFT or an injury to Colombia’s star player), would Goldman advise their clients to value the asset at the model’s price or at the price on the open market? I would go with the market price …

Full coverage of the World Cup at: