# Model Comparisons
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
import patsy

sns.set(rc={'figure.figsize': (11.7, 8.27)})
```
Below, I fit four logistic regression models:

- model 1: intercept only
- model 2: adding predictor x1
- model 3: adding both predictors x1 and x2
- model 4: adding the x1:x2 interaction

How do we know which model fits the data best?
```python
def inverse_logit(x):
    return 1 / (1 + np.exp(-x))

n = 500
np.random.seed(0)
df = pd.DataFrame({
    "x1": np.random.normal(1, 1, n),
    "x2": np.random.normal(-2, 1, n),
})
df["eta"] = -3 + 2*df["x1"] - 1.5*df["x2"]
df["probs"] = inverse_logit(df["eta"])
df["y"] = np.random.binomial(size=n, n=1, p=df["probs"])
df["y"].value_counts()
```
```python
def run_logistic_model(formula):
    y, X = patsy.dmatrices(formula, df, return_type='dataframe')
    mod = sm.GLM(y, X, family=sm.families.Binomial())
    res = mod.fit()
    print(res.summary())
    return res  # return the fitted results so they can be reused below
```
```python
mod1 = run_logistic_model("y ~ 1")
```

```
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  500
Model:                            GLM   Df Residuals:                      499
Model Family:                Binomial   Df Model:                            0
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -286.53
Date:                Thu, 01 Jun 2023   Deviance:                       573.06
Time:                        12:45:52   Pearson chi2:                     500.
No. Iterations:                     4   Pseudo R-squ. (CS):              0.000
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.0460      0.102     10.259      0.000       0.846       1.246
==============================================================================
```
```python
mod2 = run_logistic_model("y ~ x1")
```

```
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  500
Model:                            GLM   Df Residuals:                      498
Model Family:                Binomial   Df Model:                            1
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -227.85
Date:                Thu, 01 Jun 2023   Deviance:                       455.70
Time:                        12:45:52   Pearson chi2:                     483.
No. Iterations:                     5   Pseudo R-squ. (CS):             0.2092
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0734      0.142      0.517      0.605      -0.205       0.352
x1             1.3618      0.152      8.947      0.000       1.063       1.660
==============================================================================
```
```python
mod3 = run_logistic_model("y ~ x1 + x2")
```

```
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  500
Model:                            GLM   Df Residuals:                      497
Model Family:                Binomial   Df Model:                            2
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -190.93
Date:                Thu, 01 Jun 2023   Deviance:                       381.87
Time:                        12:45:52   Pearson chi2:                     447.
No. Iterations:                     6   Pseudo R-squ. (CS):             0.3178
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -2.2766      0.351     -6.495      0.000      -2.964      -1.590
x1             1.6063      0.179      8.950      0.000       1.255       1.958
x2            -1.1719      0.156     -7.521      0.000      -1.477      -0.866
==============================================================================
```
```python
mod4 = run_logistic_model("y ~ x1 + x2 + x1:x2")
```

```
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  500
Model:                            GLM   Df Residuals:                      496
Model Family:                Binomial   Df Model:                            3
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -190.90
Date:                Thu, 01 Jun 2023   Deviance:                       381.80
Time:                        12:45:52   Pearson chi2:                     445.
No. Iterations:                     6   Pseudo R-squ. (CS):             0.3179
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -2.2236      0.402     -5.529      0.000      -3.012      -1.435
x1             1.5279      0.344      4.436      0.000       0.853       2.203
x2            -1.1481      0.179     -6.405      0.000      -1.499      -0.797
x1:x2         -0.0439      0.166     -0.264      0.792      -0.370       0.282
==============================================================================
```
## Likelihood Ratio Tests
The likelihood is the probability of observing the data given the model's fitted parameter values, and we prefer the model that maximizes it. Given two nested models, we can test whether the larger one yields a significantly higher likelihood. This comparison is called a likelihood ratio (LR) test.

The LR test statistic is chi-square distributed under the null hypothesis that the smaller model is adequate. This statistic is also called the deviance (the log-likelihood-ratio statistic):

$$D = -2\ln\frac{L(m_1)}{L(m_2)} = 2\big(\ln L(m_2) - \ln L(m_1)\big),$$

where $L(m_1)$ and $L(m_2)$ are the likelihoods of the smaller and larger model, respectively.
Suppose we want to compare model 1 (intercept only) to model 2 (which adds x1). From the model summaries above:

$$D = 2\big(-227.85 - (-286.53)\big) = 573.06 - 455.70 = 117.36$$

The test statistic is chi-square distributed with 1 degree of freedom, since m2 adds 1 predictor to m1. (If we were comparing m3 to m1, there would be 2 degrees of freedom.)
```python
from scipy.stats.distributions import chi2

p = chi2.sf(117.36, 1)
print("{:.20f}".format(p))
```

```
0.00000000000000000000
```
Since p < 0.05 (in fact, far smaller), we can say m2 fits significantly better than m1.
Comparing model 4 to model 3 (the correctly specified model), the deviance test statistic is

$$D = 2\big(-190.90 - (-190.93)\big) = 0.06$$

The chi-square test below shows that the interaction model does not offer a significantly better fit.
```python
p = chi2.sf(0.06, 1)
print("{:.20f}".format(p))
```

```
0.80649594050734008110
```
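As noted earlier, the two models being compared need not differ by a single predictor: testing m3 against m1 adds two predictors, so the statistic is referred to a chi-square distribution with 2 degrees of freedom. Using the deviances read off the summaries above (573.06 for m1 and 381.87 for m3):

```python
from scipy.stats.distributions import chi2

# Deviances from the summaries above: m1 = 573.06, m3 = 381.87
lr_stat = 573.06 - 381.87       # = 191.19
p = chi2.sf(lr_stat, df=2)      # m3 adds 2 predictors to m1 -> 2 degrees of freedom
print("{:.20f}".format(p))
```

Unsurprisingly, p is again effectively zero: the two predictors together improve dramatically on the intercept-only model.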