Bayesian networks for unbiased assessment of referee bias in football

Introduction and methodology

The notion that football referees are biased towards certain teams or in certain contexts is widely accepted by football pundits and supporters. In fact, whether or not such bias exists is an area of increasing interest that attracts the attention of researchers from the domains of sport science, psychology, statistics and computer science. Irrespective of the true underlying causes, there is no doubt that ‘playing at home’ has a significant impact on a team’s success.

Referees themselves are believed to contribute to home advantage by favouring home teams in the award of penalty kicks, free kicks, yellow/red cards and/or extra time (Nevill et al., 1996; 1999; 2002; Sutter & Kocher, 2004; Boyko et al., 2007; Downward & Jones, 2007; Dawson et al., 2007; Dohmen, 2008; Buraimo et al., 2010; Goumas, 2012). However, these apparent biases could be explained by team performance. The increased number of fouls, yellow cards, red cards, penalties and so on in favour of the home team might simply be the result of the home team performing better than the away team. For example, if the home team is in control of the ball (possession) more often than not, then we would expect it to be awarded more fouls and penalties, and to receive fewer yellow and red cards than the opponent, because its control of possession puts it on the receiving end of more tackles. We should also expect a higher proportion of these tackles to be committed nearer to the opponent’s goal, as greater possession also tends to correspond to a marked territorial advantage. Hence, any credible attempt to determine referee bias in football matches must take account of these kinds of causal explanatory factors.

Unlike previous studies, our work examined this notion by taking into consideration relevant explanatory factors which, if ignored, can lead to a biased assessment of referee bias. The causal factors considered were possession, time spent in the opposition penalty box while in control of the ball, pass accuracy, the ability to win aerial duels, the ability to dribble the ball and the ability to intercept the opponent’s passes. The assessment was carried out with Bayesian networks, a type of probabilistic model suited to representing complex real-world scenarios in a causal manner and to answering complicated questions about them.
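
The way an explanatory factor can account for an apparent bias may be illustrated with a toy network. The following is a minimal sketch only, not the model used in the study: it assumes two binary causes (referee bias and dominant team performance) of one binary effect (many penalties awarded), and every probability in it is a hypothetical value chosen for illustration.

```python
from itertools import product

# Hypothetical priors, chosen for illustration only (not from the study).
P_BIAS = {True: 0.5, False: 0.5}   # True = referee favours the team
P_PERF = {True: 0.5, False: 0.5}   # True = team dominates play

# Hypothetical P(many penalties | bias, dominant performance).
P_MANY = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.6,
    (False, False): 0.1,
}

def posterior_bias(many_penalties, performance=None):
    """Return P(bias | evidence) by enumerating the joint distribution.

    `performance=None` means team performance is not observed.
    """
    numerator = 0.0
    denominator = 0.0
    for bias, perf in product([True, False], repeat=2):
        if performance is not None and perf != performance:
            continue  # inconsistent with the observed performance
        p_pen = P_MANY[(bias, perf)]
        if not many_penalties:
            p_pen = 1.0 - p_pen
        joint = P_BIAS[bias] * P_PERF[perf] * p_pen
        denominator += joint
        if bias:
            numerator += joint
    return numerator / denominator

# Belief in bias from penalties alone, then after also observing dominance.
belief_before = posterior_bias(many_penalties=True)
belief_after = posterior_bias(many_penalties=True, performance=True)
print(round(belief_before, 4), round(belief_after, 4))
```

Under these assumed numbers, the belief in bias falls once dominant performance is observed, because performance already accounts for part of the penalty count.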

Results and analysis

Using data for the 2011-12 English Premier League (EPL) season, the penalty kick bias is assessed before and after the explanatory factors for team performance are taken into consideration. In Table 1, the variable B represents the probability of positive referee bias before team performance is considered, and the variable B’ represents the probability of positive referee bias after team performance is considered. The variable P is the number of penalty kicks awarded. Teams are ranked by B’ in descending order (a higher B’ indicating higher positive referee bias) for home grounds, away grounds and overall.

The referee bias assessment (both the B and B’ probability values) is performed relative to the team that benefited least from referee decisions in each case (home/away/overall). For example, at home grounds (Table 1), Arsenal with 1 penalty awarded appears to have benefited the least (even compared with Sunderland and Tottenham, with 0 penalties awarded) after the explanatory factors for team performance are taken into consideration. As a result, Arsenal is ranked 20th in terms of positive referee bias for penalty kicks awarded, and the remaining teams are assessed against Arsenal. A probability value of 0.5 implies that there is no difference in referee bias between the team under assessment and Arsenal. A value greater than 0.5 (and up to 1) indicates positive referee bias relative to Arsenal (or to Manchester City in the case of away games), and a value below 0.5 indicates negative referee bias relative to the same reference team.
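
To make the reading of these relative probabilities concrete, the sketch below estimates the probability that one team's underlying award rate exceeds that of a reference team. It is an illustration only and not how B and B’ are computed in the study: it assumes independent Beta posteriors with uniform priors, and the function name, the notion of counted "chances" and the example counts are all hypothetical.

```python
import random

def prob_more_favoured(team_awards, team_chances,
                       ref_awards, ref_chances,
                       draws=20000, seed=0):
    """Monte Carlo estimate of P(team rate > reference rate).

    Assumption: each rate has a Beta(1 + awards, 1 + chances - awards)
    posterior, i.e. a uniform prior updated with binomial counts.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    wins = 0
    for _ in range(draws):
        team_rate = rng.betavariate(1 + team_awards,
                                    1 + team_chances - team_awards)
        ref_rate = rng.betavariate(1 + ref_awards,
                                   1 + ref_chances - ref_awards)
        wins += team_rate > ref_rate
    return wins / draws

# Hypothetical counts: identical records should give a value near 0.5,
# a clearly higher award rate a value well above 0.5.
same = prob_more_favoured(1, 19, 1, 19)
higher = prob_more_favoured(9, 19, 1, 19)
print(same, higher)
```

A value near 0.5 here plays the same role as in Table 1: the data give no reason to think the team is treated differently from the reference team.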

The results suggest that the model successfully explains much of the apparent bias when team performance is taken into consideration. Specifically, many B beliefs show highly significant discrepancies between teams, which are revised into non-significant B’ beliefs once the explanatory factors are considered by the model. For example, a clear prior home ‘bias’ of 84.04% in favour of Liverpool is revised to an insignificant 60.78%, while a clear prior home bias against Tottenham (18.42%) is revised to an insignificant 52.01%. However, in some important cases the posterior bias beliefs remain strong. In particular, in home matches, Manchester United (with 9 penalties and an inferred belief of 86.09%) and Manchester City (with 8 penalties and an inferred belief of 86.03%) are the two teams most favoured by bias.
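
The size of these revisions can be checked with simple arithmetic on the figures quoted above. The sketch below measures how far each belief lies from the neutral value of 0.5 before and after team performance is considered. The "share explained" ratio is a hypothetical summary introduced here for illustration and is not a statistic reported by the study.

```python
# Home beliefs quoted in the text: (B before, B' after team performance).
beliefs = {
    "Liverpool": (0.8404, 0.6078),
    "Tottenham": (0.1842, 0.5201),
}

def distance_from_neutral(p):
    """Absolute distance of a belief from the neutral value 0.5."""
    return abs(p - 0.5)

def share_explained(before, after):
    """Hypothetical summary: fraction of the prior distance from 0.5
    that disappears once team performance is considered."""
    return 1.0 - distance_from_neutral(after) / distance_from_neutral(before)

for team, (b, b_prime) in beliefs.items():
    print(team,
          round(distance_from_neutral(b), 4),
          round(distance_from_neutral(b_prime), 4),
          round(share_explained(b, b_prime), 2))
```

On this reading, both teams end up much closer to 0.5 than they started, with the Tottenham belief becoming almost exactly neutral.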

Interestingly, the two Manchester clubs were the only serious title contenders in an extremely close title race, and both appear to have benefited from referee decisions that cannot be fully justified by the explanatory factors taken into consideration in this study. Conversely, Arsenal, a team of similar popularity and wealth which finished third, benefited least of all 20 teams from referee bias at home.

While popular lay theories suggest that referees tend to favour elite clubs in general, and Manchester United in particular, at their home stadiums, it is possible that the combination of home advantage and being a title favourite in a close title race is more predictive of positive referee bias for penalty kicks awarded. Testing such a hypothesis properly would require applying the model over multiple seasons. In addition, no relevant (official) data exist on foul quality, possibly because consensus on foul quality is very difficult to reach (e.g. it is very common for even ‘unbiased’ experts to disagree when judging penalties awarded). Both the restriction to a single season and the absence of foul quality data could further explain the residual bias in penalties awarded.

It appears that the explanatory variables taken into consideration by our model (which represent different aspects of team performance) have explained most of the biases in free kicks and penalty kicks awarded between home and away teams. Crowd attendance and crowd density are found not to be related to positive referee bias after team performance is considered (though the crowd is believed to at least affect team performance). We anticipate that our model lays out a coherent and rational strategy for conducting such research.