ANOVA test and determining its feasibility

Analysis of Variance (ANOVA) is a statistical method used to assess and compare the means of three or more groups or treatments, aiming to determine if there are noteworthy differences among them. ANOVA works under the assumption that the data follows a normal distribution and that the variances within the groups are roughly similar. It achieves this by dividing the total data variance into distinct sources of variation, aiding in the detection of statistically significant differences in means. However, it’s crucial to avoid applying ANOVA to evaluate mean differences among the six racial groups’ age data in the police shooting dataset, given the substantial variation in variances among these groups. ANOVA relies on the fundamental assumption of roughly equal variances among groups, and its use may not be suitable when significant variance discrepancies exist.

Code to compare variances:

import pandas as pd

file_path = r’C:\Users\Tiyasa\Desktop\Courses_Sem1\MTH 522\fatal-police-shootings-data.xls’

df = pd.read_excel(file_path)

 

# Drop missing values in the “age” and “race” columns

df.dropna(subset=[‘age’, ‘race’], inplace=True)

 

# Group the data by the “race” column

grouped = df.groupby(‘race’)

print(“Variances per race:”)

 

# Calculate and print the variance for each race category

for race, group in grouped:

    age_data = group[‘age’]

    variance_age = age_data.var()

 

    print(f”Race: {race}”)

    print(f”Variance: {variance_age:.2f}”)

    print()

Output:

Variances per race:

Race: A

Variance: 134.38

 

Race: B

Variance: 129.70

 

Race: H

Variance: 115.42

 

Race: N

Variance: 80.90

 

Race: O

Variance: 139.15

 

Race: W

Variance: 173.24