Applying the Breusch-Pagan Test Using Python (2nd October, Monday)

The linear regression fits did not give us a very high R² value.

So I went ahead and checked whether the data was homoscedastic or heteroscedastic, using the Breusch-Pagan test. I discussed the p-value and the Breusch-Pagan test in my earlier posts; together they help determine whether the variance of the errors stays constant across the data points.

Here the null hypothesis is that the data is homoscedastic (the residuals have constant variance), and the alternative hypothesis is that the data is heteroscedastic (the residual variance changes with the predictors).
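To make the mechanics of the test concrete, here is a minimal sketch that computes it by hand for a single predictor, using only the standard library. It uses the studentized (n × R²) form of the statistic, which, as far as I understand, is what `het_breuschpagan` reports by default. The data is synthetic and the helper names (`simple_ols`, `breusch_pagan_one_predictor`) are my own, not part of any library.

```python
import math
import random


def simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


def breusch_pagan_one_predictor(x, y):
    """Return (LM statistic, p-value) for the n * R^2 form of the test."""
    intercept, slope, _ = simple_ols(x, y)
    squared_resid = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression: squared residuals on the same predictor.
    _, _, aux_r2 = simple_ols(x, squared_resid)
    lm = len(x) * aux_r2
    # Chi-square survival function with 1 degree of freedom (one predictor).
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Synthetic data (an assumption for illustration, not the CDC data).
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(500)]
y_hetero = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]  # noise grows with x
y_homo = [2 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]         # constant noise

lm_het, p_het = breusch_pagan_one_predictor(x, y_hetero)
lm_hom, p_hom = breusch_pagan_one_predictor(x, y_homo)
print("Heteroscedastic data: LM =", lm_het, "p-value =", p_het)
print("Homoscedastic data:   LM =", lm_hom, "p-value =", p_hom)
```

A small p-value means the squared residuals are explained by the predictor, which is exactly what heteroscedasticity looks like. With two predictors, as in my regression, the same idea applies but the statistic is compared against a chi-square distribution with 2 degrees of freedom.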

import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

excel_file_path = 'C:\\Users\\Tiyasa\\Desktop\\cdc-diabetes-2018.xlsx'

xls = pd.read_excel(excel_file_path, sheet_name=None)

# Access individual DataFrames by sheet name
df1 = xls['Diabetes']
df2 = xls['Obesity']
df3 = xls['Inactivity']

# The Inactivity sheet names its key column 'FIPDS'; rename it to match the other sheets
df3.rename(columns={'FIPDS': 'FIPS'}, inplace=True)

# Inner join df1 and df2 on the 'FIPS' column
merged_df = pd.merge(df1, df2, on='FIPS', how='inner')

# Inner join the result with df3 on the 'FIPS' column
final_merged_df = pd.merge(merged_df, df3, on='FIPS', how='inner')

# Prepare the input features (X) and target variable (y)
X = final_merged_df[['% OBESE', '% INACTIVE']]
y = final_merged_df['% DIABETIC']

# Add a constant to the input features for the intercept term
X = sm.add_constant(X)

# Fit a linear regression model
model = sm.OLS(y, X).fit()

# Perform the Breusch-Pagan test on the residuals
# (returns LM statistic, LM p-value, F statistic, F p-value; only the LM p-value is kept)
_, p_value, _, _ = het_breuschpagan(model.resid, X)
print("Breusch-Pagan Test p-value:", p_value)

# Interpret the results at the 5% significance level
if p_value < 0.05:
    print("Heteroscedasticity is detected (reject the null hypothesis).")
else:
    print("No significant evidence of heteroscedasticity (fail to reject the null hypothesis).")

Output:

Breusch-Pagan Test p-value: 3.555846910402186e-05
Heteroscedasticity is detected (reject the null hypothesis).
