DavidMRothschild on May 29, 2014 @ 3:26PM
There is no new data here, but some new organization. I thought it would be interesting to view the tournament by team with all games and likelihoods in one table. I hope to add the likelihood of reaching any given round by Monday morning:
World Cup at a Glance
DavidMRothschild on May 26, 2014 @ 8:55PM
Predicting the World Cup is not that much different from predicting other sports outcomes or even economic indicators or awards shows. First, we determine what the stakeholders want to know. For the World Cup, we determine that it is the likelihood of win, loss, or draw for either team in any game and the likelihood of any team advancing to any round (including winning the tournament); for reasons of tie-breakers and expediency we also consider goal differential. Second, as always, we ensure that these forecasts update as the games progress. Finally, we always consider the same set of data to ensure accuracy.
In the course of our regular forecasting we always review four different data types: fundamental data, online and social media, prediction markets, and polls of experts. Online and social media data are not significant for the World Cup, at this point. This type of data clearly provides value in understanding the support and interest of people from around the world, but lacking historical context, it is impossible to identify whether it has any predictive power relative to more traditional data. And, while polls of experts can be useful in predicting sports, we are going to keep things simple and transparent for this World Cup and focus on fundamental data and prediction markets.
I am going to walk through the fundamental data at some length before describing the prediction market data quickly. That is because the fundamental data is much more interesting, while the prediction market data is the same as it always is, in all domains. But prediction market data is a lot more predictive than the fundamental data and, despite my fun in running the fundamental data, it forms the basis of all of the forecasts we are going to generate.
Using fundamental data to predict how teams will do across a season, or in an upcoming game, is a relatively stable task across major sports. The key fundamental variables are always the same: scoring differential, home and away, and wins/losses in the past season. Of course, different sports count scores in different ways (e.g., American football has scores that range from 1 point to 6 points) and count wins/losses differently as well (e.g., soccer awards between 0 and 3 points per game). Generally, home and away (and strength of schedule) are balanced, but that too is not always the case (e.g., baseball loads the schedule heavily with teams in the same division); home field is a huge advantage in soccer (e.g., in a fun example, this article notes that injury time in Spain heavily favors the home team). That being said, give me the scoring differential of each team from the previous year, its schedule including home and away, and its final outcome in wins/losses, and I can predict both season and game-by-game outcomes with precision.
We can improve upon this baseline prediction in two ways: account for shifts in personnel and factor out luck. All of the major sports now have models of wins or points above replacement, an idea that grew out of baseball's sabermetric community. This metric describes how valuable a certain player is compared with a baseline player at his/her position. There is still some debate on this metric and it varies a lot by sport, but a reasonable version of it will allow a researcher to get pretty close to quantifying the impact of substituting one player for another. Further research has examined the role of luck in the wins/points of any team in a given year, to separate what was in the control of the players from what was either lucky or unlucky.
Soccer is a standard case of what I just described: goal differential, home/away, and the previous year's points will get you pretty far in predicting future outcomes. Add in wins above replacement for changes in personnel, factor out luck, and you can be as good as anything.
Playoff predictions are just a compilation of game-by-game predictions, using the current regular season's data. There are two small quirks to consider: effort and playoff design. First, in certain sports there are definable times when teams are not at full strength or maximum effort, such as late in the season for teams with nothing to play for; in those situations we need to account for this differential effort. Second, compiling the likelihood of a team advancing in any given round depends on the design of the playoffs. Single-elimination rounds are straightforward applications of the formula for the likelihood of one team beating another in a single game, but best-of-seven series and round robins have their quirks (e.g., NBA teams are more likely to win game 2 if they lose game 1 than if they won game 1).
In short, major team sports from around the world are all pretty similar in predicting regular season and playoff success, but the World Cup has one crazy quirk: it has no regular season. There are directly comparable variables, but they are noisier (i.e., much less precise). Countries play three types of matches against other countries on a semi-regular basis between World Cups: friendlies, regional tournaments, and World Cup qualification tournaments. All of these games combined are a fraction of what a regular season is in most leagues.
These games provide data similar to what we normally have: there is a goal differential, there is home/away, and, in lieu of past season wins/points, we have world rankings (compiled by FIFA based on teams' performance over the last four years) and Elo ratings (which are based on head-to-head matches). Unlike a regular season, where the choice of opponents and location is balanced (or the choice set is transparent), each team's schedule is endogenously chosen by the countries to maximize the return for their team, and more wins in a tournament mean more games against better teams. Also, there are major personnel changes over any four-year period, especially with players going in and out for friendlies and lesser tournaments.
Specifically we start with the following:
1) Average goal differential broken up by home/away/neutral and by friendly/tournament/World Cup qualifier. The friendly/tournament/World Cup qualifier split lets us examine the predictive power of games that are likely to have lower effort and more variable personnel.
2) World ranking and Elo rating act as the equivalent of points/wins from previous years, and the Elo rating absorbs the strength of schedule a team has played.
We take this data for past World Cup cycles and regress the results of all of the World Cup games on it to get coefficients for the various variables. We can then plug in the 2014 data to get baseline forecasts for any given game going into the World Cup: both goal differential and the likelihood of a win, loss, or draw.
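A minimal sketch of that regression step, assuming ordinary least squares on per-game feature differences (team A minus team B). The three features, the function names `ols` and `predict`, and all of the data are hypothetical placeholders; the post does not publish its exact specification or coefficients. The synthetic games are generated without noise, so the fit should simply recover the made-up coefficients.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(features: List[List[float]], target: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) beta = X'y.
    An intercept column is prepended, so beta[0] is the intercept."""
    x = [[1.0] + row for row in features]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta: List[float], row: List[float]) -> float:
    """Predicted goal differential for one game's feature row."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical features per game (team A minus team B):
#   [gap in avg goal differential in competitive matches,
#    gap in avg goal differential in friendlies,
#    Elo gap / 100]
true_beta = [0.1, 0.8, 0.3, 0.2]  # made-up intercept and coefficients
games = [[1.0, 0.5, 2.0], [-0.5, 0.2, -1.0], [0.3, -0.4, 0.5],
         [2.0, 1.0, 3.0], [-1.2, -0.8, -2.5], [0.0, 0.6, 0.4]]
outcomes = [predict(true_beta, g) for g in games]  # noise-free synthetic results
beta = ols(games, outcomes)
print([round(b, 3) for b in beta])
```

With real past-cycle data the outcomes would be noisy, and the fitted coefficients are what get applied to the 2014 feature rows.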
The differences in goal differential swamp the rankings in predicting both goal differential and the probability of victory in any game. This is not surprising, as these rankings are just reflections of win/loss/draw (slightly weighted by strength of opponent), which is trumped by goal differential. Further, away games are slightly more predictive than home games, which is not surprising, as there is just one home team in the World Cup.
Yet these predictions for World Cup games are a lot less precise than the predictions for a regular season or playoff game in a soccer league. With all of the idiosyncratic variables of a World Cup, where teams with no regular season play at neutral sites, the fundamental data is going to provide forecasts of scores with larger margins of error and probabilities of victory that tend more towards a toss-up than we would normally produce.
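One way to see why a noisier forecast drifts toward a toss-up, assuming (purely for illustration) that goal differential is normally distributed around the forecast and that anything within half a goal of zero counts as a draw. The function `outcome_probs` and its `draw_band` parameter are hypothetical, not the post's model. Holding the expected edge fixed and widening the standard deviation lowers the favorite's win probability and raises the underdog's.

```python
from statistics import NormalDist


def outcome_probs(expected_gd: float, sd: float, draw_band: float = 0.5):
    """Return (P(win), P(draw), P(loss)) for the first-named team, treating
    goal differential as Normal(expected_gd, sd) and calling anything inside
    +/- draw_band a draw. A crude illustrative assumption."""
    dist = NormalDist(mu=expected_gd, sigma=sd)
    p_loss = dist.cdf(-draw_band)
    p_win = 1.0 - dist.cdf(draw_band)
    p_draw = 1.0 - p_win - p_loss
    return p_win, p_draw, p_loss


# Same expected edge of one goal, two levels of precision.
print([round(p, 3) for p in outcome_probs(1.0, 1.0)])  # tighter, league-like
print([round(p, 3) for p in outcome_probs(1.0, 2.0)])  # wider, World Cup-like
```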
That is where prediction market data comes into play; it does its best when there is idiosyncratic data to incorporate. Prediction markets buy and sell contracts that are, canonically, worth $1 if the outcome occurs and $0 if not. Thus, the price of a contract for Brazil to win the World Cup, or any particular game, is highly predictive of the probability of the outcome occurring. Massive amounts of historical data help us translate raw prediction market prices into very precise probabilities of outcomes; this is especially true for the World Cup, where the prediction markets have very robust action on all games.
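The simplest step from prices to probabilities, sketched under the assumption that we hold prices for a complete set of mutually exclusive outcomes: rescale them so they sum to one, which removes the market's margin. The function name `implied_probabilities` and the prices are hypothetical, and this does not reproduce the adjustments learned from historical data that the paragraph above describes.

```python
def implied_probabilities(prices):
    """Convert contract prices (in dollars, each paying $1 if true) for a set
    of mutually exclusive, exhaustive outcomes into probabilities by rescaling
    them to sum to one. Historical calibration is not applied here."""
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("prices must be positive")
    return {outcome: price / total for outcome, price in prices.items()}


# Hypothetical prices for one match; they sum to $1.04, not $1.00.
prices = {"home win": 0.55, "draw": 0.27, "away win": 0.22}
probs = implied_probabilities(prices)
print({k: round(v, 3) for k, v in probs.items()})
```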
Armed with fundamental data and prediction market-based forecasts for every game, we jump into the actual World Cup action. The first round is a round robin within groups of four teams, with each team playing three games for a total of six games per group. After that there is a standard 16-team single-elimination tournament where the winner of one group plays the runner-up of its paired group (e.g., the winner of group A plays the runner-up of group B and the winner of B plays the runner-up of A).
The easiest way to think about the round robin is that there are 729 possible outcomes in a six-game round robin (3 outcomes for each of 6 games is 3^6). Assuming independence between games (that the outcome of one game does not affect the outcome of another), we can easily determine the likelihood of each of the 729 possible outcomes from the likelihoods of the three outcomes of each of the six games.
At that point we have the second round set, with certain probabilities, and can determine the likely winners of games between potential second-round teams, and so forth. This provides both the likely outcome of any game and the likelihood of any team reaching any given round.
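The enumeration can be sketched as follows, assuming independent games and 3 points for a win, 1 for a draw, and 0 for a loss. The names are hypothetical, and teams level on points at the cutoff are split at random, a simplification: the actual tie-breakers (goal difference, goals scored, and so on) would require modeling scores, not just results. The usage example is a hypothetical group in which every result is equally likely, so each team should come out at 50%.

```python
from itertools import product
from typing import Dict, List, Tuple

# A game is (team_a, team_b, (p_a_wins, p_draw, p_b_wins)).
Game = Tuple[str, str, Tuple[float, float, float]]


def advance_probabilities(teams: List[str], games: List[Game],
                          slots: int = 2) -> Dict[str, float]:
    """Enumerate all 3**len(games) results of a round robin, assuming the
    games are independent, and return each team's chance of finishing in
    the top `slots`. Ties at the cutoff are split at random (an assumption)."""
    result = {t: 0.0 for t in teams}
    # 0 = team_a wins, 1 = draw, 2 = team_b wins
    for outcome in product(range(3), repeat=len(games)):
        prob = 1.0
        points = {t: 0 for t in teams}
        for (a, b, probs), o in zip(games, outcome):
            prob *= probs[o]
            if o == 0:
                points[a] += 3
            elif o == 1:
                points[a] += 1
                points[b] += 1
            else:
                points[b] += 3
        for t in teams:
            above = sum(1 for u in teams if points[u] > points[t])
            tied = sum(1 for u in teams if points[u] == points[t])  # includes t
            open_slots = slots - above
            if open_slots > 0:
                result[t] += prob * min(1.0, open_slots / tied)
    return result


even = (1 / 3, 1 / 3, 1 / 3)
teams = ["W", "X", "Y", "Z"]  # hypothetical group
games = [(a, b, even) for i, a in enumerate(teams) for b in teams[i + 1:]]
print(len(games), 3 ** len(games))  # prints: 6 729
print(advance_probabilities(teams, games))
```

Plugging in each game's win/draw/loss probabilities in place of `even` gives the group-stage advancement odds under the independence assumption.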
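Compiling the knockout rounds can be sketched as a recursion over a fixed bracket: a team's chance of winning a round is its chance of still being alive times the sum, over every possible opponent from the other half of its sub-bracket, of that opponent being alive and being beaten. This assumes the bracket slots are already known (in practice each slot is itself a distribution over group finishers) and that matches are independent; `knockout_round_probs`, the strengths, and the Bradley-Terry style `p_beats` rule are hypothetical.

```python
from typing import Callable, Dict, List


def knockout_round_probs(bracket: List[str],
                         p_beats: Callable[[str, str], float]) -> Dict[str, List[float]]:
    """For a fixed single-elimination bracket (size a power of two, listed in
    bracket order), return each team's probability of winning each round.
    p_beats(a, b) is the chance a eliminates b in one match (extra time and
    penalties included), assumed independent of earlier results."""
    n = len(bracket)
    assert n > 0 and n & (n - 1) == 0, "bracket size must be a power of two"
    alive = {t: 1.0 for t in bracket}  # P(team is still in before this round)
    history: Dict[str, List[float]] = {t: [] for t in bracket}
    size = 1  # size of each half of the sub-brackets being merged
    while size < n:
        nxt = {}
        for start in range(0, n, 2 * size):
            left = bracket[start:start + size]
            right = bracket[start + size:start + 2 * size]
            for side, other in ((left, right), (right, left)):
                for t in side:
                    nxt[t] = alive[t] * sum(alive[o] * p_beats(t, o) for o in other)
        for t in bracket:
            history[t].append(nxt[t])
        alive = nxt
        size *= 2
    return history


# Hypothetical strengths and a Bradley-Terry style head-to-head rule.
strength = {"A": 4.0, "B": 1.0, "C": 2.0, "D": 2.0}


def p_beats(a: str, b: str) -> float:
    return strength[a] / (strength[a] + strength[b])


rounds = knockout_round_probs(["A", "B", "C", "D"], p_beats)
print({t: [round(p, 3) for p in r] for t, r in rounds.items()})
```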
Of course, independence is not necessarily the correct choice for the World Cup; early games in the round robin affect later games in the round robin. I already noted that in the NBA some teams are more likely to win after a loss (due to either increased effort or referees' calls). The opposite effect is that we may learn that a team is better than we thought ex-ante because it won an earlier game. In the NBA teams play 82 regular season games, so we do not learn much if one happens to win a game in the playoffs, but in the World Cup teams play 0 regular season games, so we learn a lot when they win a game. Thus, the consensus in our data is that we should slightly update our estimates of teams after they win in the group stage. This is not significant in the later rounds, where all teams are winners, but it is in the round robin.
Prediction markets shine when a lot of idiosyncratic data makes fundamental predictions imprecise. That is when we need the wisdom of the crowd to quantify the likely outcome. Thus, while we work through both the fundamental data and the prediction market-based forecasts, we put the weight of our prediction on the prediction market data.
Check out all of our World Cup coverage at: www.PredictWise.com/WorldCup.
DavidMRothschild on May 22, 2014 @ 6:52PM
The San Antonio Spurs are 48% likely, and the Miami Heat are 40% likely, to win the NBA championship. But the Heat, at 1-1 in the Eastern Conference finals, are just 77% likely to make the NBA finals, and the Spurs, at 2-0 in the Western Conference finals, are 92% likely to make it to the NBA finals. What does that mean if the two teams make it past the Indiana Pacers and Oklahoma City Thunder, respectively? They will likely enter the finals with the Heat as the slightest of favorites.
The probability that the Heat win the NBA finals, should they make it, is derived by taking their likelihood of winning the championship and dividing it by their likelihood of making the finals. That makes them 51% likely to win the finals, conditional on making it. The same math makes the Spurs 52% likely to win the finals, conditional on making it. But, should the Heat make the finals, there is an 8% chance they face the Thunder (100% minus the Spurs' 92%), and should the Spurs make it, there is a 23% chance they face the Pacers (100% minus the Heat's 77%).
The Pacers are extremely unlikely to win, should they make the finals, while the Thunder are slightly more likely than the Spurs to win, should they make the finals. Thus the Spurs' 52% is inflated relative to their likelihood of winning against the Heat, and the Heat's 51% is deflated relative to their likelihood of winning against the Spurs.
The estimate from the current numbers is that the Heat will be the slightest of favorites if they play the Spurs in the NBA finals; this is true despite the Spurs having home-court advantage. But the NBA uses a 2 home-3 away-2 home schedule in the finals, which is not as favorable to the team with home-court advantage as the 2-2-1-1-1 format played in the previous three rounds.
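The arithmetic above as a short sketch, assuming the two conference finals are independent so that, for example, the Heat's chance of meeting the Thunder is simply 1 minus the Spurs' 92%. The helper name `conditional_title_prob` is hypothetical. With these rounded inputs the Heat's conditional figure comes out at about 52% rather than 51%; the gap is consistent with rounding in the published percentages.

```python
def conditional_title_prob(p_title: float, p_reach_final: float) -> float:
    """P(win finals | reach finals) = P(win title) / P(reach finals),
    since winning the title implies reaching the finals."""
    return p_title / p_reach_final


heat = conditional_title_prob(0.40, 0.77)
spurs = conditional_title_prob(0.48, 0.92)
# Chance each team's finals opponent is the underdog, given it gets there
# (assumes the two conference finals are independent).
heat_faces_thunder = 1 - 0.92
spurs_face_pacers = 1 - 0.77
print(round(heat, 3), round(spurs, 3),
      round(heat_faces_thunder, 2), round(spurs_face_pacers, 2))
```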
Follow my NBA coverage at: www.PredictWise.com/NBA
DavidMRothschild on May 18, 2014 @ 1:07PM
The United States is in a group with Germany, Portugal, and Ghana. The top two of the four teams in points after a round robin will advance to the second round. I am giving the US about 25% to advance out of the group stage, and this seemed high to some of my readers. This initial reaction is not surprising when you consider that Germany and Portugal are ranked second and third, respectively, in the FIFA World Rankings and Ghana beat the US in two straight World Cups. But the numbers make sense.
This 25% is actually a very complicated calculation, and it starts with the six individual games that will be played in Group G's round robin. A win gets a team 3 points, a draw 1 point, and a loss 0 points.
Germany and Portugal are heavy favorites in their games, but this is not American football or a best-of-seven series in baseball, hockey, or basketball; single low-scoring games leave open reasonable probabilities of upsets or draws. Thinking about these games independently, Germany is between 65 and 75% to beat Ghana and the US, while Portugal is between 55 and 60% to beat Ghana and the US. But this is soccer, where draws happen: Germany is about 20% to draw Ghana and the US, and Portugal is between 20 and 25% to draw Ghana and the US.
There are six games with three possible outcomes each, leaving a total of 729 possible overall outcomes after the games are played. Knowing the independent likelihood of any of these three outcomes for the six games, I can compute the probability of any of the 729 overall outcomes and which teams would qualify in any of them. With this independence assumption, the US and Ghana are both a little over 25% to qualify with Portugal at about 65% and Germany at about 85%.
Thinking about the progression of games, the first set of games is the US versus Ghana and Portugal versus Germany. In Portugal versus Germany, Germany is about 50% likely to win, Portugal is about 20% to win, and there is a 30% likelihood of a draw. The US is about 33% likely to win versus Ghana.
1) If the US loses against Ghana they have a negligible chance of advancing.
2) If the US draws against Ghana they are only about 15% to advance.
3) But there is a 33% likelihood they beat Ghana, and if they do, they are about 50% likely to advance! Two more points will guarantee they advance, and one more point puts them at just over 50% to advance. No team with five or more points has failed to advance (think about it; that means they are 3-0-0, 2-1-0, 2-0-1, or 1-0-2 in wins-losses-draws), and a little over half the teams with four points advance (they are 1-1-1). The US will play Portugal then Germany. Let's do the math:
Guaranteed to Advance (36%): win and win (2.7%), win and draw (4.7%), draw and win (2.7%), draw and draw (4.7%), win and loss (13.9%), or loss and win (7.4%). A win and win, win and draw, or draw and win would also be enough to advance if they tied Ghana. A win and loss or a loss and win would give them about a 50% likelihood of advancing if they tied Ghana.
Over 50% to Advance (26.5%): draw and loss (13.8%) or loss and draw (12.7%).
Guaranteed to not-Advance (37.5%): loss and loss (37.5%).
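The path percentages above are products of the two remaining game probabilities. A sketch assuming independence: the Portugal probabilities come from the Group G preview in this blog (21% win, 22% draw, 57% loss for the US), while the Germany probabilities are not given in the post and are rough illustrative values backed out of the path percentages. With those inputs the three buckets land within about a percentage point of the 36% / 26.5% / 37.5% split.

```python
from itertools import product

# US results vs Portugal, from the Group G preview.
portugal = {"W": 0.21, "D": 0.22, "L": 0.57}
# US vs Germany: NOT stated directly in the post; rough values backed out
# of the path percentages above, for illustration only.
germany = {"W": 0.13, "D": 0.22, "L": 0.65}
POINTS = {"W": 3, "D": 1, "L": 0}

buckets = {"guaranteed": 0.0, "over_50": 0.0, "out": 0.0}
for r1, r2 in product("WDL", repeat=2):
    p = portugal[r1] * germany[r2]  # independence between the two games
    extra = POINTS[r1] + POINTS[r2]  # points added to the 3 from beating Ghana
    if extra >= 2:
        buckets["guaranteed"] += p  # 5+ points, treated as safe in the post
    elif extra == 1:
        buckets["over_50"] += p  # 4 points
    else:
        buckets["out"] += p  # stuck on 3 points

print({k: round(v, 3) for k, v in buckets.items()})
```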
I will talk more about possible deviations from independence in future blog posts (i.e., how I expect my game-by-game predictions to shift as the earlier games unfold).
Check out all of our World Cup coverage at: www.PredictWise.com/WorldCup.
DavidMRothschild on May 15, 2014 @ 5:18PM
Brazil (24%) is the most likely team to win the 2014 World Cup, followed by Argentina (16%), Germany (14%), and Spain (13%). The likelihood that one of these four teams wins the World Cup is exactly 66.7%, or 2 in 3. The United States is 0.4% … not 4%, but 0.4% to win. As usual, our World Cup data draws heavily on betting lines and markets, most notably Betfair.
But before anyone wins any World Cups, they need to get out of the group stage. There are eight groups of four teams each that play a full round robin, with the top two teams advancing after those six games. We have the breakdown of all eight groups below:
Group A: Brazil is nearly certain to advance out of the group at 94%. Mexico and Croatia are fighting it out at about 45% each. Cameroon is a long shot to get out at about 15%. If I had to pick a pivotal game, it would be one of the final games in this group, on June 23, which pits Croatia against Mexico; we currently have the game at 37% for Croatia, 36% for Mexico, and 27% for a draw.
Group B: Spain is highly likely to advance with 84% likelihood. The Netherlands (58%) and Chile (50%) are fighting for the second spot. Australia is a real long shot to get out of the group at about 8%. Similar to Group A, I would put the pre-tournament pivotal game at Chile versus the Netherlands on June 18, the game that pits the second and third most likely teams. We have the Netherlands favored to win at 43%, but Chile is 28% to win, and there is a 29% likelihood of a draw.
Group C: This group has the most parity of any group. A group with perfect parity would have all four teams at 50%, so I checked the average distance of each group's teams from 50%. This group is at 13% and the next lowest, Group E, is at 19% (the average across all groups is 21%). Colombia is the most likely to get out of the group at 76%, but Japan (46%) and Ivory Coast (48%) are both pretty serious competitors. And Greece has a reasonable likelihood at 30%. I will be closely watching one of their opening games on June 14, between Ivory Coast and Japan, as it is likely to set the pace for second place in the group. We have Ivory Coast at 37% likely to win, Japan at 32%, and a draw at 31%.
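The parity measure is simple enough to sketch directly: the mean absolute distance of a group's advancement probabilities from 50%. The figures below are the percentages quoted in this post's eight group previews (the "about" figures for Groups A and B taken at face value); `parity_score` is a hypothetical name. This reproduces 13% for Group C, 19% for Group E, and an overall average of about 21%.

```python
def parity_score(advance_pct):
    """Average distance (in percentage points) of a group's advancement
    probabilities from 50%; lower means more parity."""
    return sum(abs(p - 50) for p in advance_pct) / len(advance_pct)


groups = {
    "A": [94, 45, 45, 15],  # Mexico, Croatia, Cameroon quoted as "about"
    "B": [84, 58, 50, 8],
    "C": [76, 46, 48, 30],
    "D": [68, 67, 57, 8],
    "E": [81, 57, 50, 12],
    "F": [93, 52, 42, 13],
    "G": [85, 63, 26, 26],
    "H": [86, 66, 36, 14],
}
scores = {g: parity_score(p) for g, p in groups.items()}
print(scores["C"], scores["E"], round(sum(scores.values()) / len(scores), 1))
# prints: 13.0 19.0 21.0
```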
Group D: This group is really tight at the top, with Italy (68%), Uruguay (67%), and England (57%) all bunched together. Costa Rica is a long shot at 8%. Looking towards one of the first games, the Italy and England game on June 14 should be exciting, with Italy (37%) slightly favored over England (32%) and a 31% likelihood of a draw.
Group E: France is likely to advance at 81%, with Switzerland (57%) and Ecuador (50%) likely to battle it out for second. Honduras is not likely at 12%, but non-negligible. Again, the battle of the middle teams is going to be huge, with the group's first game on June 15 between Switzerland and Ecuador; Switzerland is favored at 42% to 27% for Ecuador, and 31% for a draw.
Group F: Argentina is nearly a lock at 93% to advance, with Bosnia and Herzegovina at 52% and Nigeria at 42%. Iran is unlikely to advance at 13%. While the tightest game is Bosnia versus Nigeria, I have my eye on the opening match between Bosnia and Argentina on June 15. At 69%, this is the game Argentina is least likely to win, with 11% for a Bosnian upset and 20% for a draw.
Group G: Germany is very likely to advance at 85%, Portugal is looking good at 63%, and the USA and Ghana are battling for respect at 26% each. Of course, neither Ghana nor the USA is negligible, but it would be a huge victory to knock either Germany or Portugal out in the group stage. While the tightest game in the group is the USA and Ghana, a victory there is not going to get the USA out of the group stage on its own. I am focused on the June 22 game against Portugal, where the USA has 21% to win and 22% to draw (and 57% to lose).
Group H: Finally, Belgium is the most likely at 86%, followed by Russia at 66%. South Korea has 36% and Algeria is at just 14%. Russia and South Korea battle it out on June 17 in one of the group's opening games. Russia is favored at 46%, but South Korea is 24% to win and 30% to draw.
Check out all of our World Cup coverage at: www.PredictWise.com/WorldCup.