The NBA season has started and I'm ready to make some money now that the Supreme Court has legalized sports betting! So I searched around for sports betting models and came across a post on rugby prediction. Here it is replicated with NBA basketball data. The model is very simple and surely won't win me any money but it's a fun introduction to PyMC3, hierarchical (multilevel) models and MCMC.
Scores are based on the team's offensive (attack) and defensive strengths, as well as a home court advantage for the home team. Scores are assumed Poisson distributed. Data is for the 2017-18 regular season I found on Kaggle.
Without further ado, here are the Wednesday October 31 predictions. Like how Madden NFL runs a single simulation for their Super Bowl prediction, this is base on a single game simulation.
DET vs BKN BKN -12.5 100-113
IND vs NY IND -20.5 106-86
DEN vs CHI CHI -4.5 112-117
UTA vs MIN UTA -8.5 101-93
NO vs GS GS -13.5 104-118
DAL vs LAL LAL -1.5 107-109
SA vs PHO SA -20.5 111-91
import numpy as np
import pandas as pd
import pymc3 as pm, theano.tensor as tt
import matplotlib.pyplot as plt
import seaborn as sns
from theano import shared
%matplotlib inline
df = pd.read_csv('nba_games_2017-18.csv')
teams = df.home_team.unique()
teams = pd.DataFrame(teams, columns=['team'])
teams['i'] = teams.index
df = pd.merge(df, teams, left_on='home_team', right_on='team', how='left')
df = df.rename(columns={'i': 'i_home'}).drop('team', 1)
df = pd.merge(df, teams, left_on='away_team', right_on='team', how='left')
df = df.rename(columns={'i': 'i_away'}).drop('team', 1)
df.head()
observed_home_goals = df.home_score.values
observed_away_goals = df.away_score.values
home_team = df.i_home.values
away_team = df.i_away.values
home_team_shared = shared(home_team)
away_team_shared = shared(away_team)
num_teams = len(df.i_home.drop_duplicates())
num_games = len(home_team)
with pm.Model() as model:
# global model parameters
home = pm.Flat('home')
sd_att = pm.HalfStudentT('sd_att', nu=1, sd=0.5)
sd_def = pm.HalfStudentT('sd_def', nu=1, sd=0.5)
intercept = pm.Flat('intercept')
# team-specific model parameters
atts_star = pm.Normal("atts_star", mu=0, sd=sd_att, shape=num_teams)
defs_star = pm.Normal("defs_star", mu=0, sd=sd_def, shape=num_teams)
atts = pm.Deterministic('atts', atts_star - tt.mean(atts_star))
defs = pm.Deterministic('defs', defs_star - tt.mean(defs_star))
home_theta = tt.exp(intercept + home + atts[home_team_shared] +
defs[away_team_shared])
away_theta = tt.exp(intercept + atts[away_team_shared] +
defs[home_team_shared])
# likelihood of observed data
home_points = pm.Poisson(
'home_points', mu=home_theta, observed=observed_home_goals)
away_points = pm.Poisson(
'away_points', mu=away_theta, observed=observed_away_goals)
with model:
trace = pm.sample(2000, tune=1000)
pm.traceplot(trace)