The Anatomy of the Euro 16 Oracle from De Tijd

Predictions for the European Football Championship come in a wide variety of flavours: from pundits' guesses and reader polls to predictions based on numbers and statistics. De Tijd chose the latter and built the De Tijd Oracle. This article explains how the Oracle works.

Chess at the base

The starting point for the statistical model is a ranking of all participating countries. This ranking is based on the so-called Elo rating system, designed by the Hungarian-American chess player Arpad Elo. In his system, chess players exchange points: the losing player loses points to the winning player. The system takes the relative strength of the players into account: a victory over a low-ranked player yields only a limited number of points for a top player.
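A minimal sketch of this point exchange, assuming the textbook Elo update R' = R + K(S − E), where S is the actual result (1, 0.5 or 0) and E the expected result. The step size K = 20 and the function names are hypothetical; De Tijd does not publish its parameters, and the goal-difference and recency adjustments described below are left out.

```python
def expected_score(r_a, r_b):
    """Expected score of player A against player B on the 400-point Elo scale."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_exchange(r_a, r_b, score_a, k=20.0):
    """Return the updated ratings (A, B) after one game.

    score_a is 1 for a win by A, 0.5 for a draw and 0 for a loss.
    k = 20 is a hypothetical step size, not De Tijd's value.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    # Zero-sum exchange: whatever A gains, B loses.
    return r_a + delta, r_b - delta


# A 2400-rated player gains only about 0.6 points for beating an 1800-rated
# player, but 10 points for beating an equally rated one.
print(elo_exchange(2400, 1800, 1.0))
print(elo_exchange(2400, 2400, 1.0))
```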

Translated into football: a victory over Liechtenstein or Andorra yields only a few points, while a victory over Germany yields a lot. The system also takes goal difference into account, so a narrow 1-0 win over a country like Liechtenstein could actually cost a higher-ranked team points.

Our ranking takes into account every game ever played, but more recent games carry more weight. Home advantage is also accounted for.
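The article does not say how the recency weighting or the home advantage are implemented. One possible approach, sketched here with purely hypothetical constants (HOME_ADVANTAGE, HALF_LIFE_YEARS and BASE_K are invented for illustration), is to add a fixed bonus to the home side's rating when computing the expected result, and to shrink the step size for older games so they move the ratings less.

```python
HOME_ADVANTAGE = 100.0   # hypothetical bonus (Elo points) for the home side
HALF_LIFE_YEARS = 10.0   # hypothetical: a game's weight halves every 10 years
BASE_K = 20.0            # hypothetical base step size


def expected_home_score(r_home, r_away, neutral=False):
    """Expected score of the home team, with a bonus unless the venue is neutral."""
    bonus = 0.0 if neutral else HOME_ADVANTAGE
    return 1.0 / (1.0 + 10 ** ((r_away - (r_home + bonus)) / 400))


def recency_k(years_ago):
    """Step size that decays with a game's age, so recent games weigh more."""
    return BASE_K * 0.5 ** (years_ago / HALF_LIFE_YEARS)


def update(r_home, r_away, score_home, years_ago, neutral=False):
    """Zero-sum rating update for one game, with home bonus and recency weight."""
    k = recency_k(years_ago)
    delta = k * (score_home - expected_home_score(r_home, r_away, neutral))
    return r_home + delta, r_away - delta
```

Processing all games in chronological order with `update` would then give more influence to recent results while still using the full history.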

Chances per game

The Elo points in the ranking are translated into game predictions. To do so, the model calculates the point difference between the two teams and assumes that a difference of 400 points gives the higher-ranked team 10 times the winning chance of its opponent. The following formula is used:

$$P_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

where $R_A$ and $R_B$ are the Elo scores of teams A and B, and $P_A$ is team A's chance of winning.

Let's take the second game of Group E as an example. Belgium's Elo score is 77 points higher than Italy's. If we feed this difference into the formula, Italy's chance of winning the game is:

$$P_{\text{Italy}} = \frac{1}{1 + 10^{77/400}} \approx 0.39 = 39\,\%$$

Belgium's chance of winning the game is therefore 61 %.

We can now calculate this for every combination of two teams. If team A has $R_A$ Elo points and team B has $R_B$ Elo points, then team A has a $P_A = 1/(1 + 10^{(R_B - R_A)/400})$ chance of winning and team B a $P_B = 1 - P_A$ chance.
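The formula is short enough to check directly. This sketch (the function name `win_probability` is ours, not De Tijd's) confirms that a 400-point gap gives 10-to-1 odds and reproduces the 39 % / 61 % split for a 77-point gap. The absolute ratings 1077 and 1000 are placeholders: only the difference matters.

```python
def win_probability(r_a, r_b):
    """Chance that team A beats team B, ignoring draws (P_A + P_B = 1)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


# A 400-point gap means 10-to-1 odds for the stronger team.
p = win_probability(2000, 1600)
print(round(p / (1 - p), 6))  # 10.0

# Belgium vs Italy: Belgium is 77 points ahead (placeholder absolute ratings).
p_bel = win_probability(1077, 1000)
print(round(1 - p_bel, 2), round(p_bel, 2))  # 0.39 0.61
```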

But of course a draw is also possible. To account for this, the model starts from the historical share of draws and raises it when the Elo scores of the two teams are close together. For the Belgium - Italy game, this results in a 47 % chance of a Belgian win, a 27 % chance of an Italian win and a 26 % chance of a draw.
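The article does not give the exact draw formula, so the following is only one hypothetical way to "raise the draw rate when teams are close". The constants BASE_DRAW, EXTRA_DRAW and WIDTH are invented, the bell-shaped boost is our choice, and the remaining probability is shared in proportion to the Elo win chances. With these made-up constants it does not reproduce De Tijd's 47 / 26 / 27 split exactly.

```python
import math

BASE_DRAW = 0.22   # hypothetical historical draw rate
EXTRA_DRAW = 0.06  # hypothetical maximum boost for evenly matched teams
WIDTH = 200.0      # hypothetical: how quickly the boost fades with the Elo gap


def win_probability(r_a, r_b):
    """Elo chance that A beats B, ignoring draws."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def outcome_probabilities(r_a, r_b):
    """Return (P(A wins), P(draw), P(B wins)) for one game."""
    gap = r_a - r_b
    # Draw chance is highest when the gap is zero and falls towards BASE_DRAW.
    p_draw = BASE_DRAW + EXTRA_DRAW * math.exp(-(gap / WIDTH) ** 2)
    p_a = win_probability(r_a, r_b)
    # Share the non-draw probability in proportion to the Elo win chances.
    return (1 - p_draw) * p_a, p_draw, (1 - p_draw) * (1 - p_a)


print(outcome_probabilities(1877, 1800))
```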

From single games to the whole tournament

Now that we have the numbers for every game, we can make estimates for the whole tournament. To do so, the tournament is simulated many times.
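The basic building block of such a simulation is drawing one random result from a game's three probabilities. This is a plain Monte Carlo sketch (the function `simulate_match` is hypothetical), using the 47 % / 26 % / 27 % split for Belgium - Italy. With only 100 runs the counts fluctuate around 47 / 26 / 27; with many more runs the observed frequencies settle close to the model's probabilities.

```python
import random
from collections import Counter


def simulate_match(p_win_a, p_draw, rng):
    """Return one random result: 'A' (team A wins), 'draw' or 'B'."""
    u = rng.random()  # uniform in [0, 1)
    if u < p_win_a:
        return "A"
    if u < p_win_a + p_draw:
        return "draw"
    return "B"


rng = random.Random(2016)  # fixed seed so reruns give the same numbers
# Belgium (A) vs Italy (B)
few = Counter(simulate_match(0.47, 0.26, rng) for _ in range(100))
many = Counter(simulate_match(0.47, 0.26, rng) for _ in range(100_000))
print(few)
print({result: n / 100_000 for result, n in many.items()})
```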

Let's take Group E as an example. If we simulate all the games in this group 100 times, Belgium will win the game against Italy about 47 times, the game will end in a draw about 26 times and Italy will win it about 27 times. We run this simulation for every game in the group and calculate the final group standings for each of the 100 simulations. In 91 of the 100 simulations, Belgium qualified for the round of 16; in the other 9, they were sent home after the group stage.
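The group step can be sketched the same way: play every pairing once, award 3 points for a win and 1 for a draw, rank the teams and count outcomes over many runs. The sketch also counts first places, the quantity behind the group-winning chances. Several simplifications are ours: the team names and ratings are hypothetical, the draw model reuses invented constants, ties on points are broken at random instead of by UEFA's tie-break rules, and only top-two finishes are counted, although at Euro 2016 the four best third-placed teams also reached the round of 16.

```python
import math
import random
from collections import Counter
from itertools import combinations

BASE_DRAW, EXTRA_DRAW, WIDTH = 0.22, 0.06, 200.0  # hypothetical draw model


def outcome_probabilities(r_a, r_b):
    """Return (P(A wins), P(draw)) for one game."""
    p_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    p_draw = BASE_DRAW + EXTRA_DRAW * math.exp(-((r_a - r_b) / WIDTH) ** 2)
    return (1 - p_draw) * p_a, p_draw


def simulate_group(ratings, rng):
    """Play every pairing once and return the teams ordered by points."""
    points = {team: 0 for team in ratings}
    for a, b in combinations(ratings, 2):
        p_a, p_draw = outcome_probabilities(ratings[a], ratings[b])
        u = rng.random()
        if u < p_a:
            points[a] += 3
        elif u < p_a + p_draw:
            points[a] += 1
            points[b] += 1
        else:
            points[b] += 3
    # Simplification: ties on points are broken at random, not by UEFA rules.
    tiebreak = {team: rng.random() for team in ratings}
    return sorted(ratings, key=lambda t: (points[t], tiebreak[t]), reverse=True)


def group_chances(ratings, n_sims=10_000, seed=0):
    """Estimate each team's chance of finishing first and of finishing top two."""
    rng = random.Random(seed)
    first, top_two = Counter(), Counter()
    for _ in range(n_sims):
        order = simulate_group(ratings, rng)
        first[order[0]] += 1
        top_two.update(order[:2])
    return ({t: first[t] / n_sims for t in ratings},
            {t: top_two[t] / n_sims for t in ratings})


# Hypothetical ratings for a four-team group (not De Tijd's numbers).
ratings = {"Team A": 1950, "Team B": 1880, "Team C": 1800, "Team D": 1720}
win_group, reach_top_two = group_chances(ratings)
print(win_group)
print(reach_top_two)
```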

This simulation is run for each of the six groups, which leads to the following chances of reaching the round of 16:

It's also possible to estimate each team's chance of winning its group: we simply count the number of simulations in which that team finished first in the group. This leads to the following result:

OK, but who will be champion?

For each game of the round of 16, the quarter-finals, the semi-finals and the final, a prediction is made with the formula discussed earlier. The team with the higher chance of winning is marked as the winner and advances to the next round.
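A sketch of this deterministic knockout phase, assuming the bracket is given as a list in which neighbouring teams meet (positions 0 and 1, 2 and 3, and so on). The function `play_knockout` and the example ratings are hypothetical. Because the favourite always advances, the highest-rated team in the bracket always wins in this sketch, so any variation in the title chances comes from the group-stage simulations that decide who reaches the bracket. Equal ratings go to the first-listed team, an arbitrary choice.

```python
def win_probability(r_a, r_b):
    """Elo chance that A beats B, ignoring draws."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def play_knockout(bracket, ratings):
    """Advance the favourite of each pairing until one team is left."""
    teams = list(bracket)
    if not teams or len(teams) & (len(teams) - 1):
        raise ValueError("bracket size must be a power of two")
    while len(teams) > 1:
        # Pair neighbours: (0, 1), (2, 3), ...; the favourite advances.
        teams = [
            a if win_probability(ratings[a], ratings[b]) >= 0.5 else b
            for a, b in zip(teams[::2], teams[1::2])
        ]
    return teams[0]


ratings = {"A": 1900, "B": 2000, "C": 1800, "D": 1700}  # hypothetical
print(play_knockout(["A", "B", "C", "D"], ratings))  # B
```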

These are the chances of winning the tournament:

The calculated chances are, of course, influenced by the results of the games already played at the Euros. These results are used to update the Elo ranking: after every game, the Elo scores are recalculated, the simulations are rerun and the estimated chances are updated. The latest chances of reaching the knockout stage or winning the tournament can be found here.