Interpretation#
import numpy as np
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
import patsy
sns.set(rc={'figure.figsize':(11.7,8.27)})
n = 500
np.random.seed(0)
df = pd.DataFrame({
"x1":np.random.exponential(2,n),
"x2":np.random.normal(2,1,n),
})
eta = np.exp(2 + 0.25*df["x1"] + 0.25*df["x2"])
df["n_students_flagged"] = np.random.poisson(lam=eta)
df["n_students_flagged"].value_counts()
df["n_enrolled"] = np.round(df["n_students_flagged"],0) + np.random.binomial(200,0.8,n)
f = 'n_students_flagged ~ x1 + x2'
y, X = patsy.dmatrices(f, df, return_type='dataframe')
mod = sm.GLM(y, X, family=sm.families.Poisson())
print(mod.fit().summary())
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: n_students_flagged No. Observations: 500
Model: GLM Df Residuals: 497
Model Family: Poisson Df Model: 2
Link Function: Log Scale: 1.0000
Method: IRLS Log-Likelihood: -1486.9
Date: Thu, 01 Jun 2023 Deviance: 571.92
Time: 12:45:54 Pearson chi2: 563.
No. Iterations: 5 Pseudo R-squ. (CS): 1.000
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 2.0270 0.025 79.737 0.000 1.977 2.077
x1 0.2469 0.003 86.086 0.000 0.241 0.253
x2 0.2447 0.010 25.097 0.000 0.226 0.264
==============================================================================
We interpret the coefficients by exponentiating them and treating them as multiplicative effects.
\[E(Y|X) = log(\mu) = X\beta\]
\[\mu = exp(X\beta)\]
\[\mu = exp(b_0) * exp(b_1x_1) * exp(b_2x_2)\]
The coefficient of \(x_1\) is the expected difference in \(log(y)\) for a one unit increase in \(x_1\). We can say that for a one unit increase in \(x_1\), we expect a \(exp(b_1)\) increase in y.
print(f"{round(100*(np.exp(0.2469) - 1),2)}% increase")
28.01% increase