Predictaball is an automated football prediction system, providing match outcome predictions for the top 4 European leagues, alongside team ratings. Please use the menu on the left to view the forecasted probabilities for today's games (updated daily at 11:30 CET) along with the predictions for previous matches. The team ratings are also displayed, along with information on how both of these systems are implemented.

Predictaball can also be found on Twitter, where predictions for Premier League matches are shared.

Rating system

For full details of the implementation of the rating system, please refer to this post on my website.

In short, an Elo-like system is used that incorporates both home advantage and margin of victory (MoV). A large amount of inspiration is taken from FiveThirtyEight's NBA and NFL Elo models, and I can't thank them enough for making details of their system publicly available.

Elo-like systems are defined by two equations: one governing the expected outcome of a game given the two teams' strengths, and one governing how a team's rating is updated after a game.

Expected outcome

The expected outcome of a match, E, is given by the following equation: $$E = \frac{1}{1 + 10^{\frac{-dr}{400}}}$$ This encodes the expected result (a categorical outcome with 3 possible values) as a continuous score in the range [0, 1], because Elo systems were originally described for games with only 2 outcomes. In football modelling, a draw is treated as a score of 0.5. dr is the difference between the two teams' pre-match ratings, adjusted for home advantage: $$dr = elo_{home} - elo_{away} + HA$$ The home advantage value (HA) is calculated separately for each league and is updated each season.
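As a quick illustration of the expected-outcome formula, here is a minimal Python sketch of E and dr. The function name `expected_score` and the example ratings are illustrative assumptions; the per-league HA values Predictaball actually uses are not reproduced here, so HA is simply passed in as an argument.

```python
def expected_score(elo_home: float, elo_away: float, home_advantage: float) -> float:
    """Expected home score E in [0, 1] from the Elo logistic curve.

    Hypothetical helper: home_advantage stands in for the per-league HA value.
    """
    dr = elo_home - elo_away + home_advantage  # rating gap including home advantage
    return 1.0 / (1.0 + 10 ** (-dr / 400))


if __name__ == "__main__":
    # Illustrative ratings only, with HA = 0 (neutral terms).
    print(expected_score(1500, 1500, 0))            # 0.5
    print(round(expected_score(1600, 1500, 0), 2))  # 0.64
```

Two equally rated teams with HA = 0 give E = 0.5, and a 100-point advantage lifts E to roughly 0.64; any positive HA shifts E further towards the home side.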

Rating update

After a game, a team's rating is updated according to the following equation: $$elo_{new} = elo_{pregame} + KG(O-E)$$ The observed outcome O is defined as {0, 0.5, 1} for an away win, draw, or home win respectively.
K acts as the system's gain, controlling the impact of a single match on a team's rating. A low value places greater emphasis on long-term form, giving a stable rating that is slow to reflect recent form.
A large value of K places greater importance on recent results, with long-term form being less important; however, it makes the ratings susceptible to noisy (i.e. unexpected) results.
Predictaball uses K = 20.
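The update rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Predictaball's code: the helper names are hypothetical, G defaults to 1 (the draw or one-goal case), and `update_both` assumes the away side loses exactly what the home side gains, as in standard Elo, which is not stated explicitly above.

```python
K = 20  # gain used by Predictaball


def update_rating(elo_pregame: float, observed: float, expected: float,
                  g: float = 1.0, k: float = K) -> float:
    """elo_new = elo_pregame + K * G * (O - E), from the home team's perspective."""
    return elo_pregame + k * g * (observed - expected)


def update_both(elo_home: float, elo_away: float, observed: float,
                expected: float, g: float = 1.0, k: float = K) -> tuple[float, float]:
    """ASSUMPTION: zero-sum update, the away rating moves by the same amount in reverse."""
    delta = k * g * (observed - expected)
    return elo_home + delta, elo_away - delta


if __name__ == "__main__":
    # Evenly matched sides (E = 0.5), home win (O = 1), G = 1.
    print(update_both(1500, 1500, observed=1.0, expected=0.5))  # (1510.0, 1490.0)
```

For two evenly matched sides, a home win moves each rating by K × 0.5 = 10 points; doubling K to 40 would double that swing, which is the stability versus responsiveness trade-off described above.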

G is an additional multiplier that allows other factors, such as the goal difference, to influence the change in rating. For draws or when MOV = 1, G = 1; otherwise: $$G = \log(1.7\,MOV)\frac{2}{2+0.001(elo_{win} - elo_{lose})}$$ Including MOV in the equation is relatively self-explanatory, but the second term is less obvious. It is there to handle the autocorrelation problem identified by FiveThirtyEight, whereby stronger teams tend to have their ratings inflated because they are more likely to win by large margins. This term acts as a penalty, shrinking the multiplier when the winner was already rated well above the loser (and enlarging it when the lower-rated team wins).
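A sketch of the G multiplier follows. It assumes `log` is the natural logarithm (with base 10, a two-goal win would produce a smaller multiplier than a one-goal win), treats MOV as the absolute goal difference, and uses a hypothetical function name.

```python
import math


def mov_multiplier(mov: int, elo_win: float, elo_lose: float) -> float:
    """G from the formula above; ASSUMES natural log and mov = |goal difference|."""
    if mov <= 1:  # draws (mov == 0) and one-goal wins
        return 1.0
    # Margin-of-victory term scaled by the autocorrelation penalty.
    return math.log(1.7 * mov) * 2.0 / (2.0 + 0.001 * (elo_win - elo_lose))


if __name__ == "__main__":
    for winner, loser in [(1700, 1500), (1500, 1500), (1500, 1700)]:
        print(winner, loser, round(mov_multiplier(3, winner, loser), 2))
```

For a three-goal win, G is roughly 1.48 when the winner was rated 200 points above the loser, 1.63 for evenly rated sides, and 1.81 when the winner was rated 200 points below, which is the penalty behaviour described above.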

Match prediction model

Predicting goals scored

New for the 2018-2019 season, I've taken the same hierarchical structure as the ordinal multinomial regression summarised below and adapted it to use a Poisson outcome to predict the number of goals scored by each team. In the following notation, the superscripts h and a denote the home and away teams respectively, i.e. for a given game i: $$\text{goals}^{\text{h}}_{i} \sim \text{Poisson}(\mu^{\text{h}}_{i})$$ $$\text{goals}^{\text{a}}_{i} \sim \text{Poisson}(\mu^{\text{a}}_{i})$$ where $$\log(\mu^{\text{h}}_{i}) = \alpha^{\text{h}}_{\text{league}_{i}} + \beta_{\text{h}} \Delta \text{elo}_{i}$$ $$\log(\mu^{\text{a}}_{i}) = \alpha^{\text{a}}_{\text{league}_{i}} + \beta_{\text{a}} \Delta \text{elo}_{i}$$ This corresponds to a hierarchical intercept that varies by league and a single match-level predictor, the team rating difference: $$\Delta \text{elo}_{i} = \text{elo}^{\text{h}}_{i} - \text{elo}^{\text{a}}_{i}$$ By drawing a large number of samples of home and away goals from the posterior predictive distribution, the probability of each match outcome is simply the proportion of samples in which the home side scores more than, the same as, or fewer than the away side. The main motivation for modelling goals scored rather than the outcome directly is that it enables me to simulate a full season at a time: the rating update requires the goal difference, so simulated scorelines allow ratings to be updated between simulated matches.
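To make the step from goal predictions to outcome probabilities concrete, the sketch below computes the two Poisson means through the log link and then estimates P(home win), P(draw) and P(away win) both by simulating scorelines and by summing the Poisson probabilities directly. All parameter values are hypothetical. For simplicity it plugs in single point estimates and treats home and away goals as independent given their means, whereas the real model samples from the full posterior, which also carries parameter uncertainty.

```python
import math
import random

# HYPOTHETICAL point estimates for one league; not Predictaball's fitted values.
ALPHA_HOME = 0.35
ALPHA_AWAY = 0.10
BETA_HOME = 0.002
BETA_AWAY = -0.002


def goal_rates(delta_elo: float) -> tuple[float, float]:
    """Poisson means via the log link: log(mu) = alpha + beta * delta_elo."""
    mu_home = math.exp(ALPHA_HOME + BETA_HOME * delta_elo)
    mu_away = math.exp(ALPHA_AWAY + BETA_AWAY * delta_elo)
    return mu_home, mu_away


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means seen in football."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def outcome_probs_mc(delta_elo: float, n_samples: int = 50_000, seed: int = 1) -> dict[str, float]:
    """Estimate outcome probabilities as proportions of simulated scorelines."""
    rng = random.Random(seed)
    mu_home, mu_away = goal_rates(delta_elo)
    counts = {"home": 0, "draw": 0, "away": 0}
    for _ in range(n_samples):
        goals_home = sample_poisson(mu_home, rng)
        goals_away = sample_poisson(mu_away, rng)
        if goals_home > goals_away:
            counts["home"] += 1
        elif goals_home == goals_away:
            counts["draw"] += 1
        else:
            counts["away"] += 1
    return {outcome: c / n_samples for outcome, c in counts.items()}


def outcome_probs_exact(delta_elo: float, max_goals: int = 20) -> dict[str, float]:
    """Same probabilities by summing products of the two Poisson pmfs."""
    mu_home, mu_away = goal_rates(delta_elo)

    def pmf(lam: float, k: int) -> float:
        return math.exp(-lam) * lam ** k / math.factorial(k)

    probs = {"home": 0.0, "draw": 0.0, "away": 0.0}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            key = "home" if h > a else "draw" if h == a else "away"
            probs[key] += pmf(mu_home, h) * pmf(mu_away, a)
    return probs


if __name__ == "__main__":
    print(outcome_probs_exact(0))
    print(outcome_probs_mc(0))
```

The two estimates agree to within Monte Carlo error. The simulation route is the one that extends naturally to whole seasons, since each simulated scoreline supplies the goal difference needed for the next rating update.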

Predicting outcome

The following model was used for the 2017-2018 season, but I've left the details up for posterity. It was a hierarchical Bayesian ordinal multinomial regression, taking only the league and the Elo difference between the two teams as inputs. Specifically, it was fitted in JAGS as follows: $$\text{Outcome}_{i} \sim \text{Multinomial}(1, \phi_{i})$$ where φ_i holds the probabilities of the three outcomes: $$\phi_{i, \text{away}} = T_{i, 1}$$ $$\phi_{i, \text{draw}} = T_{i, 2} - T_{i, 1}$$ $$\phi_{i, \text{home}} = 1 - T_{i, 2}$$ An ordinal model works by identifying two thresholds on the scale of the linear predictor to obtain the 3 probabilities, so that the natural ordering of the outcomes (away win, draw, home win) is respected, unlike in a standard multinomial model. $$\text{logit}(T_{i, 1}) = \alpha_{\text{league}_{i}, 1} - \mu_{i}$$ $$\text{logit}(T_{i, 2}) = \alpha_{\text{league}_{i}, 2} - \mu_{i}$$ For the draw probability to be positive, each league's thresholds must satisfy $$\alpha_{\text{league}_{i}, 1} < \alpha_{\text{league}_{i}, 2}$$ The linear predictor is based on a single match-level covariate, the rating difference: $$\mu_{i} = \beta \Delta \text{elo}_{i}$$ This is a hierarchical model with varying intercepts, which depend on the league the match was played in, but a fixed slope. I found that using varying slopes (i.e. allowing β to depend on the league as well) didn't improve the model accuracy but substantially increased the time taken to fit the model, and I generally prefer parsimonious models anyway.
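The mapping from rating difference to the three outcome probabilities in the ordinal model can be sketched as below. The threshold and slope values are hypothetical (the fitted JAGS estimates are not given here), and a single league's intercepts are passed in directly rather than drawn from the hierarchical structure.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probs(delta_elo: float, alpha1: float, alpha2: float, beta: float) -> dict[str, float]:
    """Cumulative-logit probabilities for (away, draw, home) given one league's thresholds."""
    if not alpha1 < alpha2:
        raise ValueError("thresholds must satisfy alpha1 < alpha2")
    mu = beta * delta_elo           # linear predictor
    t1 = inv_logit(alpha1 - mu)     # P(outcome <= away win)
    t2 = inv_logit(alpha2 - mu)     # P(outcome <= draw)
    return {"away": t1, "draw": t2 - t1, "home": 1.0 - t2}


if __name__ == "__main__":
    # HYPOTHETICAL parameter values, for illustration only.
    for d in (-200, 0, 200):
        probs = ordinal_probs(d, alpha1=-1.0, alpha2=0.2, beta=0.004)
        print(d, {k: round(v, 3) for k, v in probs.items()})
```

Increasing Δelo shifts probability mass from away wins towards home wins, while the ordering constraint on the thresholds keeps all three probabilities positive and summing to one.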

Previously, I spent more time optimising the model, using more complex techniques such as Evolutionary Algorithms and hierarchical Bayesian models to form team strength measures. However, I now prefer to place a greater emphasis on accurately modelling a team's skill level, and using a relatively simple model to obtain match outcome probabilities.

About Predictaball

I started Predictaball in the summer of 2014 as a way of applying my methodological work on building ensemble classifiers with Evolutionary Algorithms to a real-world problem. I thought football prediction might be fun, despite not being an overly keen football fan, and got an initial implementation running.

I also used it as an opportunity to develop additional skills, such as web scraping, bot automation in the form of a Twitter bot that posts Premier League forecasts, and now web development. Since this initial naive implementation, I've taken more of an interest in sports statistics and have updated various aspects of the overall system, while learning new skills in the process.

The two main improvements on the modelling side have been the move to a Bayesian modelling framework and the adoption of an Elo-like rating system for team strength. I even had a brief attempt at producing an automated staking system, but quickly found it wasn't profitable (who would have thought?!). I've also made considerable improvements behind the scenes, particularly to modularise the system so that other leagues, and even other sports, can easily be added. This required rewriting the backend code, as well as migrating the database to Postgres and heavily normalising it. Building this web app has also forced me to better separate the different components (update code, database, and web app), so I've moved them to the cloud, which also provides better reliability.

I've got several ideas in the pipeline for improving things. I'm keen to use machine learning techniques to improve the rating system, perhaps by separating home and away strength. I'm interested in forecasting the number of goals scored, both to potentially improve match outcome accuracy and to provide access to a far greater number of betting markets. I'd also like to form an overall rating system across European leagues, rather than having a separate scale for each league, although this could be quite challenging due to the relatively small number of inter-league matches. Finally, I'd like to extend Predictaball to include additional sports: the backend is ready, I just need to obtain some data and build predictive models.

Predictaball is developed by Stuart Lacy, and can be found on Twitter.