To find percentiles of a numeric column in a DataFrame, or the percentiles of a Series in pandas, the easiest way is to use the pandas quantile() function.
df.quantile(0.25)
You can also use the numpy percentile() function.
np.percentile(df["Column"], 25)
When working with data, we often calculate summary statistics to understand it better. Percentiles, or quantiles, describe how the data is distributed.
Finding a percentile for a single column, or a quantile for every column or row of a DataFrame, is easy in pandas. The quantile() function works on both a Series and a DataFrame, and takes the quantile as a fraction between 0 and 1.
Let’s say we have the following DataFrame.
import pandas as pd

df = pd.DataFrame({'Age': [43,23,71,49,52,37],
                   'Test_Score': [90,87,92,96,84,79]})
print(df)
# Output:
   Age  Test_Score
0   43          90
1   23          87
2   71          92
3   49          96
4   52          84
5   37          79
To get the 50th percentile (the 0.5 quantile), or the median, for all columns, we can call the pandas quantile() function and pass 0.5.
print(df.quantile(0.5))
# Output:
Age           46.0
Test_Score    88.5
Name: 0.5, dtype: float64
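As a quick sanity check that the 0.5 quantile really is the median, the sketch below compares quantile(0.5) with the pandas median() method on the same DataFrame. The variable names q50, med and same are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Age': [43, 23, 71, 49, 52, 37],
                   'Test_Score': [90, 87, 92, 96, 84, 79]})

# The 0.5 quantile (50th percentile) of each column
q50 = df.quantile(0.5)

# The median of each column, computed with the dedicated method
med = df.median()

# Compare the two results column by column
same = bool((q50 == med).all())
print(same)  # True
```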
If we only want to get the percentile of one column, we can do this using the pandas quantile() function in the following Python code:
print(df["Test_Score"].quantile(0.5))
# Output:
88.5
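The quantile() function can also work across rows instead of columns by passing axis=1. The sketch below shows only the mechanics: averaging an age with a test score is not meaningful for this particular data, and the name row_medians is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'Age': [43, 23, 71, 49, 52, 37],
                   'Test_Score': [90, 87, 92, 96, 84, 79]})

# axis=1 computes the quantile across the columns of each row,
# so the result has one value per row of the DataFrame
row_medians = df.quantile(0.5, axis=1)
print(row_medians)
```

With only two columns, each row's median is simply the midpoint of its two values, for example (43 + 90) / 2 = 66.5 for the first row.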
Calculating Multiple Percentiles at Once with pandas
We can use the pandas quantile() function to calculate multiple percentiles at once. To calculate multiple quantiles, we pass a list of quantile values to the quantile() function.
Let’s say we have the same data from above. Let’s calculate the 25th, 50th and 75th percentiles of our data.
print(df.quantile([0.25,0.5,0.75]))
# Output:
        Age  Test_Score
0.25  38.50       84.75
0.50  46.00       88.50
0.75  51.25       91.50
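Values such as 84.75 do not appear in the data because, by default, pandas interpolates linearly between the two nearest sorted values. Here is a minimal sketch of that calculation for the "Test_Score" column; linear_quantile is a hypothetical helper written for illustration, not a pandas function.

```python
import pandas as pd

scores = pd.Series([90, 87, 92, 96, 84, 79], name="Test_Score")

def linear_quantile(values, q):
    """Hypothetical helper: quantile q (between 0 and 1) by linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)              # fractional position in the sorted data
    lower = int(pos)                          # index of the value at or below pos
    upper = min(lower + 1, len(ordered) - 1)  # index of the next value up
    frac = pos - lower                        # how far pos sits between the two
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

manual = linear_quantile(scores, 0.25)
print(manual)                 # 84.75
print(scores.quantile(0.25))  # 84.75
```

For the 25th percentile the position is 0.25 * (6 - 1) = 1.25, which falls a quarter of the way between the sorted values 84 and 87, giving 84 + 0.25 * 3 = 84.75.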
Using numpy percentile to Calculate Percentiles in pandas DataFrame
We can also use the numpy percentile() function to calculate percentile values for the columns in our pandas DataFrames.
Let’s get the 25th, 50th, and 75th percentiles of the “Test_Score” column using the numpy percentile() function. We can do this easily in the following Python code. The difference here is that numpy percentile() expects values on a 0 to 100 scale rather than fractions between 0 and 1 (i.e. 50 instead of 0.50).
import numpy as np

print(np.percentile(df["Test_Score"],[25,50,75]))
# Output:
[84.75 88.5 91.5]
As you can see above, these are the same values we received from the pandas quantile() function.
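The agreement can also be checked in code. The sketch below compares numpy percentile() (0 to 100 scale), numpy quantile() (0 to 1 scale, like pandas) and pandas quantile() on the same column, assuming all three are left on their default linear interpolation. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [43, 23, 71, 49, 52, 37],
                   'Test_Score': [90, 87, 92, 96, 84, 79]})

fractions = [0.25, 0.5, 0.75]

# numpy percentile() takes values between 0 and 100
from_percentile = np.percentile(df["Test_Score"], [25, 50, 75])

# numpy quantile() takes fractions between 0 and 1, like pandas
from_np_quantile = np.quantile(df["Test_Score"], fractions)

# pandas quantile() returns a Series; convert it to an array to compare
from_pandas = df["Test_Score"].quantile(fractions).to_numpy()

print(np.allclose(from_percentile, from_np_quantile))  # True
print(np.allclose(from_percentile, from_pandas))       # True
```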
Hopefully this article has been helpful for you to understand how to find percentiles of numbers in a Series or DataFrame in pandas.