Interpretation#

import numpy as np
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
import patsy
sns.set(rc={'figure.figsize':(11.7,8.27)})
n = 500
np.random.seed(0)
df = pd.DataFrame({
    "x1":np.random.exponential(2,n),
    "x2":np.random.normal(2,1,n),
})

eta = np.exp(2 + 0.25*df["x1"] + 0.25*df["x2"])
df["n_students_flagged"] = np.random.poisson(lam=eta)
df["n_students_flagged"].value_counts()
df["n_enrolled"] = np.round(df["n_students_flagged"],0) + np.random.binomial(200,0.8,n)
f = 'n_students_flagged ~ x1 + x2'
y, X = patsy.dmatrices(f, df, return_type='dataframe')
mod = sm.GLM(y, X, family=sm.families.Poisson())
print(mod.fit().summary())
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:     n_students_flagged   No. Observations:                  500
Model:                            GLM   Df Residuals:                      497
Model Family:                 Poisson   Df Model:                            2
Link Function:                    Log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -1486.9
Date:                Thu, 01 Jun 2023   Deviance:                       571.92
Time:                        12:45:54   Pearson chi2:                     563.
No. Iterations:                     5   Pseudo R-squ. (CS):              1.000
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.0270      0.025     79.737      0.000       1.977       2.077
x1             0.2469      0.003     86.086      0.000       0.241       0.253
x2             0.2447      0.010     25.097      0.000       0.226       0.264
==============================================================================

We interpret the coefficients by exponentiating them and treating them as multiplicative effects.

\[E(Y|X) = log(\mu) = X\beta\]
\[\mu = exp(X\beta)\]
\[\mu = exp(b_0) * exp(b_1x_1) * exp(b_2x_2)\]

The coefficient of \(x_1\) is the expected difference in \(log(y)\) for a one unit increase in \(x_1\). We can say that for a one unit increase in \(x_1\), we expect a \(exp(b_1)\) increase in y.

print(f"{round(100*(np.exp(0.2469) - 1),2)}% increase")
28.01% increase