Bayesian Hierarchical Model

[1]:
import sys

sys.path.append("../../")

import penaltyblog as pb
WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

Get data from football-data.co.uk

[2]:
fb = pb.scrapers.FootballData("ENG Premier League", "2019-2020")
df = fb.get_fixtures()

df.head()
[2]:
id                                           date        datetime             season     competition         div  time   team_home       team_away         fthg  ftag  ...  b365_cahh  b365_caha  pcahh  pcaha  max_cahh  max_caha  avg_cahh  avg_caha  goals_home  goals_away
1565308800---liverpool---norwich             2019-08-09  2019-08-09 20:00:00  2019-2020  ENG Premier League  E0   20:00  Liverpool       Norwich           4     1     ...  1.91       1.99       1.94   1.98   1.99      2.07      1.90      1.99      4           1
1565395200---bournemouth---sheffield_united  2019-08-10  2019-08-10 15:00:00  2019-2020  ENG Premier League  E0   15:00  Bournemouth     Sheffield United  1     1     ...  1.95       1.95       1.98   1.95   2.00      1.96      1.96      1.92      1           1
1565395200---burnley---southampton           2019-08-10  2019-08-10 15:00:00  2019-2020  ENG Premier League  E0   15:00  Burnley         Southampton       3     0     ...  1.87       2.03       1.89   2.03   1.90      2.07      1.86      2.02      3           0
1565395200---crystal_palace---everton        2019-08-10  2019-08-10 15:00:00  2019-2020  ENG Premier League  E0   15:00  Crystal Palace  Everton           0     0     ...  1.82       2.08       1.97   1.96   2.03      2.08      1.96      1.93      0           0
1565395200---tottenham---aston_villa         2019-08-10  2019-08-10 17:30:00  2019-2020  ENG Premier League  E0   17:30  Tottenham       Aston Villa       3     1     ...  2.10       1.70       2.18   1.77   2.21      1.87      2.08      1.80      3           1

5 rows × 111 columns

Train the Model

[3]:
clf = pb.models.BayesianHierarchicalGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"]
)
clf.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (10 chains in 10 jobs)
NUTS: [home, intercept, tau_att, atts_star, tau_def, def_star]
WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
100.00% [40000/40000 00:06<00:00 Sampling 10 chains, 0 divergences]
Sampling 10 chains for 1_500 tune and 2_500 draw iterations (15_000 + 25_000 draws total) took 33 seconds.

The model’s parameters

[4]:
clf
[4]:
Module: Penaltyblog

Model: Bayesian Hierarchical

Number of parameters: 42
Team                 Attack               Defence
------------------------------------------------------------
Arsenal              0.102                -0.041
Aston Villa          -0.147               0.201
Bournemouth          -0.169               0.176
Brighton             -0.195               0.035
Burnley              -0.119               -0.019
Chelsea              0.293                0.052
Crystal Palace       -0.369               -0.027
Everton              -0.097               0.064
Leicester            0.258                -0.14
Liverpool            0.475                -0.261
Man City             0.652                -0.215
Man United           0.242                -0.221
Newcastle            -0.214               0.087
Norwich              -0.477               0.285
Sheffield United     -0.204               -0.187
Southampton          0.029                0.122
Tottenham            0.177                -0.052
Watford              -0.251               0.161
West Ham             -0.005               0.145
Wolves               0.017                -0.165
------------------------------------------------------------
Home Advantage: 0.23
Intercept: 0.117

Predict Match Outcomes

[5]:
probs = clf.predict("Liverpool", "Wolves")
probs
[5]:
Module: Penaltyblog

Class: FootballProbabilityGrid

Home Goal Expectation: 1.9292334267223095
Away Goal Expectation: 0.8804912965555178

Home Win: 0.6196094231207583
Draw: 0.21509241011631333
Away Win: 0.16529816435433928
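
The two goal expectations above can be reproduced by hand from the parameter table. The sketch below assumes the usual log-linear form for this kind of model (the log of each scoring rate is the intercept plus the attacking team's attack value plus the defending team's defence value, with the home advantage added for the home side). That form is an assumption checked against the printed numbers, not something taken from the library's source, and `goal_expectations` is a hypothetical helper. Because the table values are rounded, the results agree with the output above only to about three decimal places.

```python
import math

# Rounded posterior means copied from the parameter table above.
intercept = 0.117
home_advantage = 0.23
attack = {"Liverpool": 0.475, "Wolves": 0.017}
defence = {"Liverpool": -0.261, "Wolves": -0.165}


def goal_expectations(home_team, away_team):
    """Assumed form: log(rate) = intercept + attack + opposing defence (+ home advantage)."""
    home_rate = math.exp(
        intercept + home_advantage + attack[home_team] + defence[away_team]
    )
    away_rate = math.exp(intercept + attack[away_team] + defence[home_team])
    return home_rate, away_rate


home_rate, away_rate = goal_expectations("Liverpool", "Wolves")
print(round(home_rate, 3), round(away_rate, 3))
```

Under this reading, a positive attack value raises a team's own scoring rate, while a positive defence value raises the rate of whoever plays against it, so lower defence values indicate a stronger defence.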

1x2 Probabilities

[6]:
probs.home_draw_away
[6]:
[0.6196094231207583, 0.21509241011631333, 0.16529816435433928]
[7]:
probs.home_win
[7]:
0.6196094231207583
[8]:
probs.draw
[8]:
0.21509241011631333
[9]:
probs.away_win
[9]:
0.16529816435433928
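
The 1x2 values above are consistent with summing a grid of scoreline probabilities in which home and away goals are independent Poisson variables with the two goal expectations printed earlier. The sketch below rebuilds that calculation with the standard library only. The helper names and the `max_goals=15` truncation are assumptions made for illustration, not the library's implementation.

```python
import math

# Goal expectations printed by clf.predict("Liverpool", "Wolves") above.
HOME_RATE = 1.9292334267223095
AWAY_RATE = 0.8804912965555178


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def home_draw_away(home_rate, away_rate, max_goals=15):
    """Sum an independent-Poisson score grid into [home win, draw, away win]."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return [home, draw, away]


outcome_probs = home_draw_away(HOME_RATE, AWAY_RATE)
print([round(p, 4) for p in outcome_probs])
```

Truncating at 15 goals per side leaves out a negligible amount of probability mass at these scoring rates, so the three values sum to almost exactly 1.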

Probability of Total Goals >1.5

[10]:
probs.total_goals("over", 1.5)
[10]:
0.7705724022249051
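
If the two goal counts are treated as independent Poisson variables, their sum is also Poisson with rate equal to the sum of the two goal expectations, so the over/under probability has a short closed form. The sketch below is a hand check under that assumption, using the goal expectations printed earlier. `prob_total_over` is a hypothetical helper, not part of penaltyblog.

```python
import math

# Goal expectations printed by clf.predict("Liverpool", "Wolves") above.
HOME_RATE = 1.9292334267223095
AWAY_RATE = 0.8804912965555178


def prob_total_over(line, home_rate, away_rate):
    """P(total goals > line), assuming independent Poisson goal counts."""
    total_rate = home_rate + away_rate
    # Every whole-number total up to floor(line) fails to beat the line.
    highest_not_over = math.floor(line)
    p_not_over = sum(
        math.exp(-total_rate) * total_rate**k / math.factorial(k)
        for k in range(highest_not_over + 1)
    )
    return 1.0 - p_not_over


over_1_5 = prob_total_over(1.5, HOME_RATE, AWAY_RATE)
print(round(over_1_5, 4))
```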

Probability of Asian Handicap 1.5

[11]:
probs.asian_handicap("home", 1.5)
[11]:
0.37250377745763735
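
The value above matches the probability that the home side wins by more than 1.5 goals, that is by two or more, on an independent-Poisson score grid. That reading is inferred from the printed numbers rather than from the library's documentation, so treat the sketch below as a hand check under that assumption. `prob_home_margin_over` and its `max_goals` cut-off are hypothetical.

```python
import math

# Goal expectations printed by clf.predict("Liverpool", "Wolves") above.
HOME_RATE = 1.9292334267223095
AWAY_RATE = 0.8804912965555178


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def prob_home_margin_over(line, home_rate, away_rate, max_goals=15):
    """P(home goals - away goals > line) on an independent-Poisson grid."""
    return sum(
        poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
        if h - a > line
    )


home_by_two_plus = prob_home_margin_over(1.5, HOME_RATE, AWAY_RATE)
print(round(home_by_two_plus, 4))
```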

Probability of both teams scoring

[12]:
probs.both_teams_to_score
[12]:
0.5003828781086023
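
Under the same independence assumption, both teams score exactly when neither side is held to zero, so the probability factorises into two terms. The sketch below checks the value above from the goal expectations printed earlier. `prob_both_teams_score` is a hypothetical helper, not the library's method.

```python
import math

# Goal expectations printed by clf.predict("Liverpool", "Wolves") above.
HOME_RATE = 1.9292334267223095
AWAY_RATE = 0.8804912965555178


def prob_both_teams_score(home_rate, away_rate):
    """P(home >= 1 and away >= 1), assuming independent Poisson goal counts."""
    p_home_scores = 1.0 - math.exp(-home_rate)  # 1 - P(home scores 0)
    p_away_scores = 1.0 - math.exp(-away_rate)  # 1 - P(away scores 0)
    return p_home_scores * p_away_scores


btts = prob_both_teams_score(HOME_RATE, AWAY_RATE)
print(round(btts, 4))
```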

Train the model with recent matches weighted more heavily

[13]:
weights = pb.models.dixon_coles_weights(df["date"], 0.001)

clf = pb.models.BayesianHierarchicalGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weights
)
clf.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (10 chains in 10 jobs)
NUTS: [home, intercept, tau_att, atts_star, tau_def, def_star]
WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
100.00% [40000/40000 00:07<00:00 Sampling 10 chains, 0 divergences]
Sampling 10 chains for 1_500 tune and 2_500 draw iterations (15_000 + 25_000 draws total) took 33 seconds.
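
To give a feel for what a decay rate of 0.001 does, the sketch below applies the exponential time decay described by Dixon and Coles, where a match played t days before a reference date gets weight exp(-xi * t). It assumes `dixon_coles_weights` follows that formula with the most recent match as the reference date. The helper `exponential_time_weights` and the three sample dates are hypothetical and are not taken from the library or the dataset.

```python
import math
from datetime import date


def exponential_time_weights(match_dates, xi, reference_date=None):
    """Weight each match by exp(-xi * days before the reference date)."""
    if reference_date is None:
        # Assumption: measure age relative to the most recent match.
        reference_date = max(match_dates)
    return [math.exp(-xi * (reference_date - d).days) for d in match_dates]


# Illustrative dates spanning roughly one season.
sample_dates = [date(2019, 8, 9), date(2020, 1, 1), date(2020, 7, 26)]
sample_weights = exponential_time_weights(sample_dates, xi=0.001)

for d, w in zip(sample_dates, sample_weights):
    print(d.isoformat(), round(w, 3))
```

With xi = 0.001 a match from the start of the season still carries roughly 70% of the weight of the latest one, so the decay is gentle. Larger values of xi discount older matches more aggressively.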