To find the covariance between columns in a DataFrame or Series in pandas, the easiest way is to use the pandas cov() function.
df.cov()
You can also use the pandas Series cov() function to calculate the covariance between two Series.
s1.cov(s2)
The pandas cov() function returns the pairwise covariance of the numeric columns of a DataFrame, or the covariance between two Series.
Let’s say we have the following DataFrame.
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43, 23, 71, 49, 52, 37]})
print(df)
# Output:
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37
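To get the covariance matrix of the numeric columns, we can use the pandas cov() function. Because the DataFrame also has a non-numeric "Name" column, we pass numeric_only=True (available since pandas 1.5). In pandas 2.0 and later, numeric_only defaults to False, and calling cov() on a DataFrame with a string column raises an error.
print(df.cov(numeric_only=True))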
# Output:
             Weight      Height         Age
Weight  1189.501177  218.115103  157.815667
Height   218.115103   90.154177    8.200333
Age      157.815667    8.200333  257.766667
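To see where these numbers come from, here is a minimal sketch that recomputes one entry of the matrix by hand. It assumes the default pandas behavior of returning the sample covariance (dividing by n - 1), and it rebuilds only the "Weight" and "Age" columns from the example. The variable names manual_cov and pandas_cov are just for illustration.

```python
import pandas as pd

# Same Weight and Age values as the example DataFrame above.
df = pd.DataFrame({'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Age': [43, 23, 71, 49, 52, 37]})

x = df['Weight']
y = df['Age']
n = len(df)

# Sample covariance: sum of the products of deviations from each mean,
# divided by n - 1.
manual_cov = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

# The same entry taken from the pandas covariance matrix.
pandas_cov = df.cov().loc['Weight', 'Age']

print(manual_cov)
print(pandas_cov)
```

Both values should agree up to floating-point rounding, and both should match the "Weight"/"Age" entry in the matrix above.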
Calculating Covariance between Series in pandas
We can also use the pandas Series cov() function to find the covariance between two Series.
Let’s say we have the same DataFrame from the example in the first section of this article.
To compute the covariance, we just need to get two Series from the DataFrame and then call cov() on one of them, passing the other as the argument.
# Selecting a single column already returns a Series.
s1 = df["Weight"]
s2 = df["Age"]
print(s1.cov(s2))
# Output:
157.8156666666667
As you can see, this is the same covariance estimate we saw in the first example for the columns “Weight” and “Age”.
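If you would rather use NumPy, the following sketch shows one way to get the same number with numpy.cov(). It assumes the same "Weight" and "Age" values as above, rebuilt here as standalone Series so the snippet runs on its own. Note that numpy.cov() returns the full 2x2 covariance matrix rather than a single number, so the covariance between the two inputs is the off-diagonal entry.

```python
import numpy as np
import pandas as pd

# Same values as the "Weight" and "Age" columns in the example.
s1 = pd.Series([130.54, 160.20, 209.45, 150.35, 117.73, 187.52])
s2 = pd.Series([43, 23, 71, 49, 52, 37])

# numpy.cov returns a 2x2 matrix: variances on the diagonal,
# the covariance between the two inputs off the diagonal.
cov_matrix = np.cov(s1.to_numpy(), s2.to_numpy())
numpy_cov = cov_matrix[0, 1]

# pandas returns the single covariance value directly.
pandas_cov = s1.cov(s2)

print(np.isclose(numpy_cov, pandas_cov))
```

Both functions divide by n - 1 by default, so the results should match here. One difference to keep in mind: the pandas version skips missing values, while numpy.cov() returns nan if either input contains one.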
Hopefully this article has been helpful for you to understand how to compute covariance for columns in a DataFrame or Series using pandas.