When working with data as a data scientist or data analyst, you will often run into ANOVA, which many industries and companies use to compare the means of two or more distinct populations.
Many major companies and industries (banking, insurance, etc.) use SAS, but with the rise of open source and the popularity of languages such as Python and R, these companies are exploring moving their code to Python.
A commonly used procedure for least squares means ANOVA in SAS is PROC MIXED. In this article, you’ll learn the Python equivalent of PROC MIXED for least squares means ANOVA.
PROC MIXED Equivalent in Python for Least Squares Means ANOVA
Doing least squares means ANOVA in Python is straightforward. It takes only a few lines of code to fit your ANOVA model.
We will use the statsmodels package to fit our regression model and get the least squares means ANOVA results.
Let’s say we have data like the following, with a categorical column, type (Cat or Dog), and a numeric column, height:
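As a rough sketch, a DataFrame of this shape could be built as shown below. The individual values and their order are an assumption: they are hypothetical numbers chosen so that the per-group counts, means, and quartiles match the summary printed later in this article.

```python
import pandas as pd

# Hypothetical values (an assumption, not the article's original rows):
# 9 cats and 6 dogs whose group summaries match the ones shown in this article.
data = pd.DataFrame({
    "type": ["Cat"] * 9 + ["Dog"] * 6,
    "height": [15, 20, 20, 25, 25, 25, 25, 30, 30,   # Cat
               20, 25, 30, 35, 40, 45],              # Dog
})

# Quick sanity check of the group sizes and means
print(data.groupby("type")["height"].agg(["count", "mean"]))
```

Any DataFrame with one categorical column and one numeric column would work the same way in the code that follows.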
In SAS, to do a least squares means ANOVA, we would do something like the following:
The code above produces the following results:
To get the same results in Python, you can do the following with the statsmodels package:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols
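# `data` is a pandas DataFrame with a categorical `type` column and a numeric `height` column
model = 'height ~ C(type)'
# The keyword argument is `typ`, not `type`. With a single factor, Type I and Type II
# sums of squares are identical; typ=1 gives the table layout shown below.
anova = sm.stats.anova_lm(ols(model, data=data).fit(), typ=1)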
print(anova)
#output:
#            df      sum_sq     mean_sq         F    PR(>F)
#C(type)    1.0  266.944444  266.944444  5.540133  0.034981
#Residual  13.0  626.388889   48.183761       NaN       NaN
# Per-group summary; in a one-way model the group means equal the least squares means
print(data.groupby("type")["height"].describe())
#output:
#      count       mean       std   min    25%   50%    75%   max
#type
#Cat     9.0  23.888889  4.859127  15.0  20.00  25.0  25.00  30.0
#Dog     6.0  32.500000  9.354143  20.0  26.25  32.5  38.75  45.0
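We can see here that the results are the same as SAS: the ANOVA table gives the F test for type, and the group means give the least squares means. In a one-way model like this one, the least squares mean of each level is simply that group's mean, which is why a groupby summary is enough to reproduce them.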
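An LSMEANS table usually reports a standard error and confidence limits alongside each estimate. One possible way to get comparable numbers from statsmodels is to predict from the fitted model at each level of type. The sketch below assumes the same hypothetical data values as before (they are not the article's original rows), and the names `levels` and `lsmeans` are just illustrative. The standard errors use the pooled residual variance from the model, so they differ from the per-group standard deviations in the describe output.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical data (an assumption): values chosen to match the group summaries above
data = pd.DataFrame({
    "type": ["Cat"] * 9 + ["Dog"] * 6,
    "height": [15, 20, 20, 25, 25, 25, 25, 30, 30,
               20, 25, 30, 35, 40, 45],
})

fit = ols("height ~ C(type)", data=data).fit()

# One row per level of the factor; the model is evaluated at each of them
levels = pd.DataFrame({"type": ["Cat", "Dog"]})
lsmeans = fit.get_prediction(levels).summary_frame(alpha=0.05)
lsmeans.index = levels["type"]

# mean          -> estimated mean for the level
# mean_se       -> standard error based on the pooled residual variance
# mean_ci_lower / mean_ci_upper -> 95% confidence limits for the mean
print(lsmeans[["mean", "mean_se", "mean_ci_lower", "mean_ci_upper"]])
```

For models with more than one factor or with unbalanced covariates, least squares means are no longer plain group means, and the prediction grid would need to average over the other terms; that case is outside this sketch.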
I hope that this article has been useful for you in trying to get the Python equivalent of PROC MIXED.