To group a pandas DataFrame by multiple columns, you can use the groupby() function and pass a list of column names to group by.
import pandas as pd
df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"],
"gender":["F","F","F","F","M","M","M","F","M"],
"age":[1,2,3,4,5,6,7,8,9],
"weight":[10,20,15,20,25,10,15,30,40]})
print(df)
print(df.groupby(["animal_type","gender"])
#Output:
animal_type gender age weight
0 dog F 1 10
1 cat F 2 20
2 dog F 3 15
3 cat F 4 20
4 dog M 5 25
5 dog M 6 10
6 cat M 7 15
7 cat F 8 30
8 dog M 9 40
When working with different collections of data, the ability to create groups in our data and perform operations on those groups easily is very valuable.
One such case is if you want to group a pandas DataFrame by multiple columns in Python.
To group a pandas DataFrame by multiple columns, you can use the groupby() function and pass a list of column names to group by.
After grouping, you can use any of the pandas DataFrame GroupBy functions to further perform different operations on the groups you’ve created.
Below is a simple example showing you how to use groupby() to group by multiple columns in pandas.
import pandas as pd
df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"],
"gender":["F","F","F","F","M","M","M","F","M"],
"age":[1,2,3,4,5,6,7,8,9],
"weight":[10,20,15,20,25,10,15,30,40]})
print(df)
print(df.groupby(["animal_type","gender"])
#Output:
animal_type gender age weight
0 dog F 1 10
1 cat F 2 20
2 dog F 3 15
3 cat F 4 20
4 dog M 5 25
5 dog M 6 10
6 cat M 7 15
7 cat F 8 30
8 dog M 9 40
How to Group By Columns and Find Mean in pandas DataFrame
One example of grouping by multiple columns in pandas and performing another operation is if you want to group by columns and find the mean of a column in pandas.
To get the mean of a variable after grouping the pandas DataFrame by specific columns, you can use the groupby() and mean() functions.
Below is a simple example showing you how you can group by and then get the average of a variable of each group in a pandas DataFrame in Python.
In the example below, I’ve renamed the mean of rows to ‘age_mean’ and then reset the index so that we can work with the resulting DataFrame easier.
import pandas as pd
df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"],
"gender":["F","F","F","F","M","M","M","F","M"],
"age":[1,2,3,4,5,6,7,8,9],
"weight":[10,20,15,20,25,10,15,30,40]})
print(df)
print(df.groupby(["animal_type","gender"])["age"].mean().rename('age_mean').reset_index())
#Output:
animal_type gender age weight
0 dog F 1 10
1 cat F 2 20
2 dog F 3 15
3 cat F 4 20
4 dog M 5 25
5 dog M 6 10
6 cat M 7 15
7 cat F 8 30
8 dog M 9 40
animal_type gender age_mean
0 cat F 4.666667
1 cat M 7.000000
2 dog F 2.000000
3 dog M 6.666667
This same code structure can work if you want to find the minimum, maximum, count, sum, etc. of a variable after grouping in pandas.
Hopefully this article has been useful for you to learn how to group a pandas DataFrame by multiple columns in Python.