Bayesian Hierarchical Model
[1]:
import sys
sys.path.append("../../")
import penaltyblog as pb
Get data from football-data.co.uk
[2]:
fb = pb.scrapers.FootballData("ENG Premier League", "2019-2020")
df = fb.get_fixtures()
df.head()
[2]:
| id | date | datetime | season | competition | div | time | team_home | team_away | fthg | ftag | ... | b365_cahh | b365_caha | pcahh | pcaha | max_cahh | max_caha | avg_cahh | avg_caha | goals_home | goals_away |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1565308800---liverpool---norwich | 2019-08-09 | 2019-08-09 20:00:00 | 2019-2020 | ENG Premier League | E0 | 20:00 | Liverpool | Norwich | 4 | 1 | ... | 1.91 | 1.99 | 1.94 | 1.98 | 1.99 | 2.07 | 1.90 | 1.99 | 4 | 1 |
| 1565395200---bournemouth---sheffield_united | 2019-08-10 | 2019-08-10 15:00:00 | 2019-2020 | ENG Premier League | E0 | 15:00 | Bournemouth | Sheffield United | 1 | 1 | ... | 1.95 | 1.95 | 1.98 | 1.95 | 2.00 | 1.96 | 1.96 | 1.92 | 1 | 1 |
| 1565395200---burnley---southampton | 2019-08-10 | 2019-08-10 15:00:00 | 2019-2020 | ENG Premier League | E0 | 15:00 | Burnley | Southampton | 3 | 0 | ... | 1.87 | 2.03 | 1.89 | 2.03 | 1.90 | 2.07 | 1.86 | 2.02 | 3 | 0 |
| 1565395200---crystal_palace---everton | 2019-08-10 | 2019-08-10 15:00:00 | 2019-2020 | ENG Premier League | E0 | 15:00 | Crystal Palace | Everton | 0 | 0 | ... | 1.82 | 2.08 | 1.97 | 1.96 | 2.03 | 2.08 | 1.96 | 1.93 | 0 | 0 |
| 1565395200---tottenham---aston_villa | 2019-08-10 | 2019-08-10 17:30:00 | 2019-2020 | ENG Premier League | E0 | 17:30 | Tottenham | Aston Villa | 3 | 1 | ... | 2.10 | 1.70 | 2.18 | 1.77 | 2.21 | 1.87 | 2.08 | 1.80 | 3 | 1 |

5 rows × 111 columns
Train the Model
[3]:
clf = pb.models.BayesianHierarchicalGoalModel(
df["goals_home"], df["goals_away"], df["team_home"], df["team_away"]
)
clf.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (10 chains in 10 jobs)
NUTS: [home, intercept, tau_att, atts_star, tau_def, def_star]
100.00% [40000/40000 00:06<00:00 Sampling 10 chains, 0 divergences]
Sampling 10 chains for 1_500 tune and 2_500 draw iterations (15_000 + 25_000 draws total) took 33 seconds.
The model’s parameters
[4]:
clf
[4]:
Module: Penaltyblog
Model: Bayesian Hierarchical
Number of parameters: 42
Team Attack Defence
------------------------------------------------------------
Arsenal 0.102 -0.041
Aston Villa -0.147 0.201
Bournemouth -0.169 0.176
Brighton -0.195 0.035
Burnley -0.119 -0.019
Chelsea 0.293 0.052
Crystal Palace -0.369 -0.027
Everton -0.097 0.064
Leicester 0.258 -0.14
Liverpool 0.475 -0.261
Man City 0.652 -0.215
Man United 0.242 -0.221
Newcastle -0.214 0.087
Norwich -0.477 0.285
Sheffield United -0.204 -0.187
Southampton 0.029 0.122
Tottenham 0.177 -0.052
Watford -0.251 0.161
West Ham -0.005 0.145
Wolves 0.017 -0.165
------------------------------------------------------------
Home Advantage: 0.23
Intercept: 0.117
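The ratings above combine additively on the log scale: each side's scoring rate is the exponential of the intercept, its own attack rating, the opponent's defence rating, plus the home advantage term for the home side. A minimal sketch (not the library's internals) that reproduces the Liverpool vs Wolves goal expectations from the rounded parameters above:

```python
import math

intercept = 0.117
home_advantage = 0.23

# Rounded ratings taken from the parameter table above
attack = {"Liverpool": 0.475, "Wolves": 0.017}
defence = {"Liverpool": -0.261, "Wolves": -0.165}

# Log-linear Poisson rates: attack of the scoring side + defence of the conceding side
lam_home = math.exp(intercept + home_advantage + attack["Liverpool"] + defence["Wolves"])
lam_away = math.exp(intercept + attack["Wolves"] + defence["Liverpool"])

print(lam_home, lam_away)  # roughly 1.93 and 0.88
```

These match the goal expectations that `clf.predict("Liverpool", "Wolves")` reports in the next cell, up to rounding of the printed ratings.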
Predict Match Outcomes
[5]:
probs = clf.predict("Liverpool", "Wolves")
probs
[5]:
Module: Penaltyblog
Class: FootballProbabilityGrid
Home Goal Expectation: 1.9292334267223095
Away Goal Expectation: 0.8804912965555178
Home Win: 0.6196094231207583
Draw: 0.21509241011631333
Away Win: 0.16529816435433928
1x2 Probabilities
[6]:
probs.home_draw_away
[6]:
[0.6196094231207583, 0.21509241011631333, 0.16529816435433928]
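The 1x2 probabilities convert directly to fair (zero-margin) decimal odds by taking reciprocals, which is useful when comparing against bookmaker prices. Using the rounded values printed above:

```python
# Home, draw, away probabilities (rounded from the output above)
probs_1x2 = [0.6196, 0.2151, 0.1653]

# Fair decimal odds are the reciprocal of each probability
fair_odds = [round(1 / p, 2) for p in probs_1x2]
print(fair_odds)  # [1.61, 4.65, 6.05]
```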
[7]:
probs.home_win
[7]:
0.6196094231207583
[8]:
probs.draw
[8]:
0.21509241011631333
[9]:
probs.away_win
[9]:
0.16529816435433928
Probability of Total Goals > 1.5
[10]:
probs.total_goals("over", 1.5)
[10]:
0.7705724022249051
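Because the sum of two independent Poisson variables is itself Poisson, the over 1.5 figure can be approximated from the two goal expectations alone. This is a point-estimate sketch (the library works from the full probability grid), assuming independent Poisson goal counts:

```python
import math

lam_home, lam_away = 1.9292, 0.8805  # goal expectations from the prediction above
lam_total = lam_home + lam_away

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Over 1.5 total goals = 1 - P(0 total goals) - P(1 total goal)
p_over_15 = 1 - poisson_pmf(0, lam_total) - poisson_pmf(1, lam_total)
print(round(p_over_15, 4))  # close to the 0.7706 reported above
```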
Probability of the home team covering a -1.5 Asian handicap
[11]:
probs.asian_handicap("home", 1.5)
[11]:
0.37250377745763735
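The figure above is consistent with the probability that the home side wins by two or more goals. A hedged sketch of that interpretation under the same independent-Poisson approximation (truncating the score grid at 15 goals per side):

```python
import math

lam_home, lam_away = 1.9292, 0.8805  # goal expectations from the prediction above

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

# P(home covers -1.5) = P(home_goals - away_goals >= 2)
p_cover = sum(
    poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
    for h in range(16)
    for a in range(16)
    if h - a >= 2
)
print(round(p_cover, 4))  # close to the 0.3725 reported above
```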
Probability of both teams scoring
[12]:
probs.both_teams_to_score
[12]:
0.5003828781086023
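Under independence, both teams scoring is just the product of each side scoring at least once, i.e. `(1 - P(home = 0)) * (1 - P(away = 0))`. A quick check against the figure above:

```python
import math

lam_home, lam_away = 1.9292, 0.8805  # goal expectations from the prediction above

# P(X >= 1) = 1 - exp(-lambda) for a Poisson variable
p_btts = (1 - math.exp(-lam_home)) * (1 - math.exp(-lam_away))
print(round(p_btts, 4))  # close to the 0.5004 reported above
```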
Train the model with more recent matches weighted more heavily
[13]:
weights = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.BayesianHierarchicalGoalModel(
df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weights
)
clf.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (10 chains in 10 jobs)
NUTS: [home, intercept, tau_att, atts_star, tau_def, def_star]
100.00% [40000/40000 00:07<00:00 Sampling 10 chains, 0 divergences]
Sampling 10 chains for 1_500 tune and 2_500 draw iterations (15_000 + 25_000 draws total) took 33 seconds.
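The `dixon_coles_weights` helper down-weights older fixtures with exponential time decay, phi(t) = exp(-xi * t), where t is a match's age in days and xi is the decay rate (0.001 above). A minimal sketch of the idea with hypothetical dates, not the library's exact implementation:

```python
import math
from datetime import date

xi = 0.001  # decay rate: larger values discount old matches faster

# Hypothetical fixture dates spanning the 2019-2020 season
match_dates = [date(2019, 8, 9), date(2020, 1, 1), date(2020, 7, 26)]
latest = max(match_dates)

# Weight each match by exp(-xi * days_before_latest); the newest match gets weight 1
weights = [math.exp(-xi * (latest - d).days) for d in match_dates]
print([round(w, 3) for w in weights])
```

These weights are then passed to the model's likelihood so that recent results influence the team ratings more than early-season ones.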