To group a pandas DataFrame by multiple columns and then find the standard deviation of a variable within each group, you can use the groupby() and std() functions.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type","gender"])["age"].std().rename('age_standard_deviation').reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  animal_type gender  age_standard_deviation
0         cat      F                3.055050
1         cat      M                     NaN
2         dog      F                1.414214
3         dog      M                2.081666

When working with data, it is very useful to be able to group and aggregate data by multiple columns to understand the various segments of our data.

One such case is if you want to group your data and get the standard deviation of a variable for each group.

To get the standard deviation of a variable by groups of columns in a pandas DataFrame, you can use the groupby() and std() functions.

Below is a simple example showing how you can group a pandas DataFrame by multiple columns and then get the standard deviation of a variable for each group in Python.

In the example below, I've renamed the resulting Series to 'age_standard_deviation' and then reset the index so that the resulting DataFrame is easier to work with.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type","gender"])["age"].std().rename('age_standard_deviation').reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  animal_type gender  age_standard_deviation
0         cat      F                3.055050
1         cat      M                     NaN
2         dog      F                1.414214
3         dog      M                2.081666
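The NaN for the cat/M group appears because that group has only one row, and std() computes the sample standard deviation (ddof=1) by default, which is undefined for a single value. As a minimal sketch, assuming you would rather have the population standard deviation, you can pass ddof=0. The variable names here are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"],
                   "gender":["F","F","F","F","M","M","M","F","M"],
                   "age":[1,2,3,4,5,6,7,8,9]})

grouped = df.groupby(["animal_type","gender"])["age"]

# Default behavior: sample standard deviation (ddof=1).
# A group with a single row has no defined sample standard deviation, so it is NaN.
sample_std = grouped.std()

# Population standard deviation (ddof=0).
# A group with a single row gets 0.0 instead of NaN.
population_std = grouped.std(ddof=0)

print(sample_std)
print(population_std)
```

Which one is appropriate depends on whether your rows are a sample or the whole population, so treat ddof=0 as an option rather than a fix.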

Using groupby() and std() on a Single Column in a pandas DataFrame

You can use groupby() to group a pandas DataFrame by one column or multiple columns.

If you want to group a pandas DataFrame by one column and then calculate the standard deviation of a variable in each group with std(), you can do the following.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type"])["age"].std().rename('age_standard_deviation').reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  animal_type  age_standard_deviation
0         cat                2.753785
1         dog                3.033150
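As an alternative to calling rename() after std(), you could use named aggregation with agg(), which sets the output column name in the same step. This is just a sketch of another way to get the same result, and the name age_standard_deviation is simply the one used above.

```python
import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"],
                   "age":[1,2,3,4,5,6,7,8,9]})

# Named aggregation: output_column_name=(input_column, aggregation_function)
result = (
    df.groupby("animal_type")
      .agg(age_standard_deviation=("age", "std"))
      .reset_index()
)

print(result)
```

This should produce the same two-row DataFrame as the rename() version above.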

If you want to group by a single column and find the standard deviations of multiple variables, you can do the following. In this case, the column names will be the names of the original columns.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["gender"])[["age","weight"]].std().reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  gender       age     weight
0      F  2.701851   7.416198
1      M  1.707825  13.228757
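If you want to sanity check one of these numbers, you can compute the standard deviation for a single group by hand. The sketch below compares the pandas result for the ages in the "F" group with Python's built-in statistics.stdev(), which also computes the sample standard deviation. The variable names are only for illustration.

```python
import statistics
import pandas as pd

df = pd.DataFrame({"gender":["F","F","F","F","M","M","M","F","M"],
                   "age":[1,2,3,4,5,6,7,8,9],
                   "weight":[10,20,15,20,25,10,15,30,40]})

# Standard deviation of both variables for each gender.
by_gender = df.groupby("gender")[["age","weight"]].std()

# Pull out the ages for the "F" group and compute the sample standard deviation directly.
female_ages = df.loc[df["gender"] == "F", "age"].tolist()
manual_std = statistics.stdev(female_ages)

print(by_gender.loc["F", "age"])
print(manual_std)
```

Both values should agree up to floating point rounding.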

Using groupby() and std() on Multiple Columns in a pandas DataFrame

If you want to group a pandas DataFrame by multiple columns and then get the standard deviations of a single variable for each group with std(), you can do the following.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type","gender"])["age"].std().rename('age_standard_deviation').reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  animal_type gender  age_standard_deviation
0         cat      F                3.055050
1         cat      M                     NaN
2         dog      F                1.414214
3         dog      M                2.081666
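When grouping by several columns, some groups can end up with very few rows. One way to spot these, sketched below, is to compute the row count alongside the standard deviation by passing a list of aggregations to agg(). The names summary and too_small are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"],
                   "gender":["F","F","F","F","M","M","M","F","M"],
                   "age":[1,2,3,4,5,6,7,8,9]})

# Compute both the number of rows and the standard deviation for each group.
summary = (
    df.groupby(["animal_type","gender"])["age"]
      .agg(["count", "std"])
      .reset_index()
)

# Groups with fewer than two rows have no sample standard deviation (NaN).
too_small = summary[summary["count"] < 2]

print(summary)
print(too_small)
```

With this data, the only group flagged is cat/M, which has a single row.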

If you want to group by multiple columns and find the standard deviations of multiple variables, you can do the following. In this case, the column names will be the names of the original columns.

import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"], 
                   "gender":["F","F","F","F","M","M","M","F","M"], 
                   "age":[1,2,3,4,5,6,7,8,9], 
                   "weight":[10,20,15,20,25,10,15,30,40]})

print(df)
print(df.groupby(["animal_type","gender"])[["age","weight"]].std().reset_index())

#Output:
  animal_type gender  age  weight
0         dog      F    1      10
1         cat      F    2      20
2         dog      F    3      15
3         cat      F    4      20
4         dog      M    5      25
5         dog      M    6      10
6         cat      M    7      15
7         cat      F    8      30
8         dog      M    9      40

  animal_type gender       age     weight
0         cat      F  3.055050   5.773503
1         cat      M       NaN        NaN
2         dog      F  1.414214   3.535534
3         dog      M  2.081666  15.000000
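Because the output columns keep the original names, it may not be obvious that they hold standard deviations. One option, sketched below, is to call add_suffix() before resetting the index so that only the aggregated columns are renamed. The "_std" suffix is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"animal_type":["dog","cat","dog","cat","dog","dog","cat","cat","dog"],
                   "gender":["F","F","F","F","M","M","M","F","M"],
                   "age":[1,2,3,4,5,6,7,8,9],
                   "weight":[10,20,15,20,25,10,15,30,40]})

# add_suffix() runs while the group keys are still in the index,
# so only "age" and "weight" are renamed.
renamed = (
    df.groupby(["animal_type","gender"])[["age","weight"]]
      .std()
      .add_suffix("_std")
      .reset_index()
)

print(renamed)
```

The order matters here: calling reset_index() first would turn the group keys into regular columns and they would get the suffix too.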

Hopefully this article has been useful for you to learn how to group by one or more columns and calculate the standard deviation in pandas with groupby() and std().

Last Update: March 1, 2024