To find the correlation between series or columns in a DataFrame in pandas, the easiest way is to use the pandas corr() function.

df["Column1"].corr(df["Column2"])

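If you want to compute the pairwise correlations between all numeric columns in a DataFrame, you can call corr() directly on the DataFrame. If the DataFrame also contains non-numeric columns, pass numeric_only=True; since pandas 2.0, non-numeric columns are no longer dropped automatically and will cause an error.

df.corr(numeric_only=True)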

You can also use the pandas corrwith() function to compute the correlation of the columns of a DataFrame with another Series.

df.corrwith(df2["Column"])

Finding the correlation between columns or Series using pandas is easy. Calling corr() on a Series gives the correlation between two Series, and calling it on a DataFrame gives the pairwise correlations between its numeric columns.

Let’s say we have the following DataFrame.

import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 160.20, 209.45, 150.35, 187.52, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42] })

print(df)
# Output: 
    Name  Weight  Height
0    Jim  160.20   50.10
1  Sally  160.20   68.94
2    Bob  209.45   71.42
3    Sue  150.35   48.56
4   Jill  187.52   59.37
5  Larry  187.52   63.42

To get the pairwise correlation between the columns “Weight” and “Height”, we can use the pandas corr() function in the following Python code:

print(df["Height"].corr(df["Weight"]))

# Output:
0.6754685833670168
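To see what corr() is doing by default, we can compute the Pearson coefficient by hand and compare. The snippet below is a minimal sketch that assumes the same DataFrame as above; the variable names (h_dev, w_dev, manual_r, pandas_r) are ours and only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 160.20, 209.45, 150.35, 187.52, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42]})

h = df["Height"].to_numpy()
w = df["Weight"].to_numpy()

# Deviations of each value from its column mean
h_dev = h - h.mean()
w_dev = w - w.mean()

# Pearson r = sum of cross-products / sqrt(product of the sums of squares)
manual_r = (h_dev * w_dev).sum() / np.sqrt((h_dev ** 2).sum() * (w_dev ** 2).sum())

# The same coefficient from pandas (Pearson is the default method)
pandas_r = df["Height"].corr(df["Weight"])

print(manual_r, pandas_r)
```

The two numbers should agree up to floating-point rounding.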

The pandas corr() function allows us to compute a few different types of correlation: the Pearson correlation (the default), the Kendall Tau correlation, and the Spearman rank correlation. You can also pass your own function, which should take two 1-D arrays and return a single number.

To calculate these correlation coefficients, just pass method="kendall" or method="spearman" to the corr() function.

Note that when calling corr() on a Series, pandas relies on SciPy to compute the Kendall and Spearman coefficients, so SciPy needs to be installed. You do not have to import it yourself.

df["Height"].corr(df["Weight"], method="pearson")
df["Height"].corr(df["Weight"], method="kendall")
df["Height"].corr(df["Weight"], method="spearman")
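As an illustration of passing your own function, the sketch below defines a hypothetical helper, spearman_via_ranks, that computes a Spearman-style coefficient as the Pearson correlation of the ranks. It assumes the same DataFrame as above, and it is only one way to write such a function. The helper receives two 1-D NumPy arrays and returns a float.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 160.20, 209.45, 150.35, 187.52, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42]})

def spearman_via_ranks(a, b):
    """Hypothetical custom method: Pearson correlation of the ranks."""
    # Ties receive the average rank, which is the default for rank()
    ranks_a = pd.Series(a).rank()
    ranks_b = pd.Series(b).rank()
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

# Pass the function itself as the method argument
custom = df["Height"].corr(df["Weight"], method=spearman_via_ranks)

# Equivalent route: rank both columns first, then use the default Pearson method
ranked = df["Height"].rank().corr(df["Weight"].rank())

print(custom, ranked)
```

Both routes rank the data and then apply Pearson, so they should print the same value up to rounding, and neither needs SciPy.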

Calculating the Correlation between Multiple Columns in pandas

There are many times when analyzing a dataset that we want to see the correlations between all variables. We can use the pandas corr() method to calculate the correlation over all numeric columns.

Let’s say we have the same DataFrame from above, but now we’ve added another column “Age”.

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43,23,71,49,52,37] })

print(df)
# Output: 
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

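We can get the pairwise correlation coefficients for all numeric columns by calling the corr() function on the DataFrame. In this case, corr() returns a correlation matrix. Because the "Name" column holds strings, we pass numeric_only=True so that it is skipped; in pandas 2.0 and later, leaving it out raises an error.

print(df.corr(numeric_only=True))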

#Output:
          Weight    Height       Age
Weight  1.000000  0.666055  0.285006
Height  0.666055  1.000000  0.053793
Age     0.285006  0.053793  1.000000
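Once we have the correlation matrix, we often want to pull out a single coefficient or rank the columns by how strongly they relate to one variable. The sketch below shows one way to do that, assuming the same df as above. Selecting the numeric columns with select_dtypes() first is our own choice here, so the example does not depend on how a given pandas version treats the "Name" column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43, 23, 71, 49, 52, 37]})

# Keep only the numeric columns before computing correlations
numeric = df.select_dtypes(include="number")
corr_matrix = numeric.corr()

# Look up a single coefficient by row and column label
weight_height = corr_matrix.loc["Weight", "Height"]
print(weight_height)

# Correlation of every other column with Weight, strongest first
weight_corrs = corr_matrix["Weight"].drop("Weight").sort_values(ascending=False)
print(weight_corrs)
```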

Finding Correlation with pandas corrwith() function

We can also use the pandas corrwith() function to calculate the correlation between each column of a DataFrame and a Series, or between the matching columns of two DataFrames.

Let’s say we have the same dataset from above, along with another DataFrame whose values we’d like to compare with the columns of our DataFrame from the previous example.

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43,23,71,49,52,37] })

df_new = pd.DataFrame({'Test_Score':[90,87,92,96,84,79]})

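We can find the correlation between each numeric column of df and the "Test_Score" column of df_new using the pandas corrwith() function. As with corr(), we pass numeric_only=True so that the "Name" column is skipped.

print(df.corrwith(df_new["Test_Score"], numeric_only=True))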

#Output:
Weight   -0.016455
Height   -0.359045
Age       0.408819
dtype: float64
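When corrwith() is given a Series, the result should match calling corr() on each column in turn. The following sketch checks this, assuming the same df and df_new as above; the names numeric, score, via_corrwith and via_loop are ours and only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43, 23, 71, 49, 52, 37]})

df_new = pd.DataFrame({'Test_Score': [90, 87, 92, 96, 84, 79]})

# Work on the numeric columns only, so the string column is not involved
numeric = df.select_dtypes(include="number")
score = df_new["Test_Score"]

# One call that correlates every column with the Series
via_corrwith = numeric.corrwith(score)

# The same thing written out column by column
via_loop = pd.Series({col: numeric[col].corr(score) for col in numeric.columns})

print(via_corrwith)
print(np.allclose(via_corrwith.to_numpy(), via_loop.to_numpy()))
```

Note that the rows are matched by index label, so both objects should share the same index (here the default 0 to 5) for the comparison to be meaningful.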

Hopefully this article has helped you understand how to find the correlation coefficients between columns in a DataFrame or between Series using pandas.

Last Update: February 26, 2024