To find the covariance between the columns of a DataFrame in pandas, the easiest way is to use the pandas DataFrame cov() function.

df.cov()

You can also use the pandas Series cov() function to calculate the covariance between two Series.

s1.cov(s2)

The pandas cov() function works in two ways: called on a DataFrame, it returns the covariance matrix for every pair of columns, and called on a Series, it returns the covariance with one other Series. In both cases the default result is the sample covariance, which divides by n - 1.

Let’s say we have the following DataFrame.

import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43, 23, 71, 49, 52, 37]})

print(df)
# Output: 
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

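To get the covariance matrix between the numeric columns, we can use the pandas cov() function in the following Python code. In pandas 2.0 and later, cov() no longer skips non-numeric columns by default, so we pass numeric_only=True to leave out the "Name" column.

# numeric_only=True excludes the string column "Name"
print(df.cov(numeric_only=True))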

# Output:
             Weight      Height         Age
Weight  1189.501177  218.115103  157.815667
Height   218.115103   90.154177    8.200333
Age      157.815667    8.200333  257.766667
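
To make one entry of this matrix concrete, here is a minimal sketch that recomputes the "Weight"/"Age" value by hand, assuming the default sample covariance (sum of the products of deviations from the mean, divided by n - 1). The helper name sample_cov is hypothetical and only used for illustration, and the numeric columns are selected explicitly so the example does not depend on how a given pandas version treats the "Name" column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43, 23, 71, 49, 52, 37]})


def sample_cov(x, y):
    """Hypothetical helper: sample covariance of two equal-length Series."""
    # Products of deviations from each mean, summed and divided by n - 1.
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)


# Select the numeric columns explicitly before building the matrix.
cov_matrix = df[['Weight', 'Height', 'Age']].cov()

manual = sample_cov(df['Weight'], df['Age'])
from_matrix = cov_matrix.loc['Weight', 'Age']

print(manual)
print(from_matrix)

# Each diagonal entry is the variance of that column.
print(cov_matrix.loc['Age', 'Age'], df['Age'].var())
```

The two printed covariance values should agree up to floating-point rounding, and the last line shows that the diagonal of the matrix holds each column's variance.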

Calculating Covariance between Series in pandas

We can also use the pandas Series cov() function to find the covariance between two Series.

Let’s say we have the same DataFrame from the example in the first section of this article.

To compute the covariance using the Series cov() function, we just need to select two columns from the DataFrame, each of which is already a Series, and then call the function on one with the other as the argument.

s1 = df["Weight"]
s2 = df["Age"]
print(s1.cov(s2))

# Output:
157.8156666666667

As you can see, this is the same covariance estimate we saw in the first example for the columns “Weight” and “Age”.
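
As a cross-check, the sketch below compares the pandas result with NumPy's numpy.cov() function, which is a separate function from the Series cov() used above. It assumes the defaults of both libraries, which normalize by n - 1. Note that numpy.cov() returns a 2x2 matrix for two inputs, so the covariance between them sits at position [0, 1]. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

weight = pd.Series([130.54, 160.20, 209.45, 150.35, 117.73, 187.52])
age = pd.Series([43, 23, 71, 49, 52, 37])

# pandas: a single number, the covariance between the two Series.
pandas_cov = weight.cov(age)

# NumPy: a 2x2 matrix with variances on the diagonal
# and the covariance in the off-diagonal entries.
numpy_matrix = np.cov(weight.to_numpy(), age.to_numpy())
numpy_cov = numpy_matrix[0, 1]

print(pandas_cov)
print(numpy_cov)
print(np.isclose(pandas_cov, numpy_cov))
```

If the two libraries ever disagree on your own data, the usual things to check are missing values, which pandas excludes pairwise while NumPy propagates them as NaN, and a non-default ddof setting.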

Hopefully this article has helped you understand how to compute the covariance between columns in a DataFrame or between two Series using pandas.

Last Update: February 26, 2024