To find the largest values in a Series or DataFrame column using pandas, the easiest way is to use the pandas nlargest() function.
df.nlargest(n,"column")
By default, the pandas nlargest() function returns the n rows with the largest values in the given column, sorted in descending order.
Finding the largest values of a column or Series using pandas is easy. On a DataFrame, we pass nlargest() the number of rows we want and the column to rank by; on a Series, we only pass the number of values.
Let’s say we have the following DataFrame.
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})
print(df)
# Output:
    Name  Weight
0    Jim  160.20
1  Sally  123.81
2    Bob  209.45
3    Sue  150.35
4   Jill  102.43
5  Larry  187.52
To get the rows with the 2 largest values in the column “Weight”, we can use the pandas nlargest() function in the following Python code:
print(df.nlargest(2, "Weight"))
# Output:
    Name  Weight
2    Bob  209.45
5  Larry  187.52
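The same method is also available on a Series, where no column name is needed. A minimal sketch, reusing the weights from the example above; the variable names weights and top_two are only illustrative:

```python
import pandas as pd

# Same weights as in the DataFrame above, held in a standalone Series
weights = pd.Series([160.20, 123.81, 209.45, 150.35, 102.43, 187.52],
                    name="Weight")

# Series.nlargest() only needs n; the original index labels are preserved
top_two = weights.nlargest(2)
print(top_two)
```

The result is itself a Series, so the index labels tell you which positions the largest values came from.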
Please note, you can only use the pandas nlargest() function on a column or Series with numeric values. If we pass “Name” to nlargest() in our example, pandas will raise a TypeError because the “Name” column is made up of strings.
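To see what happens with a string column, here is a small sketch on a shortened version of the example DataFrame. Falling back to sort_values() followed by head() is one possible workaround for string columns, not something nlargest() does for you:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob'],
                   'Weight': [160.20, 123.81, 209.45]})

# nlargest() rejects the string column "Name"
try:
    df.nlargest(2, "Name")
    raised = False
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")

# Possible workaround (an assumption about what you might want):
# sort alphabetically in descending order and keep the first 2 rows
by_name = df.sort_values("Name", ascending=False).head(2)
print(by_name)
```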
If you are looking to find the n smallest values, you can use the pandas nsmallest() function.
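As a quick illustration, nsmallest() takes the same arguments as nlargest(). The sketch below rebuilds the example DataFrame so it can run on its own; the name lightest is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

# The 2 rows with the smallest weights, in ascending order
lightest = df.nsmallest(2, "Weight")
print(lightest)
```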
Finding the N Largest Values in a Column using pandas
The nlargest() function has a few different options for handling rows with the same values in your DataFrame.
Let’s say our DataFrame from above has changed a little bit and we now have some values which occur multiple times in the “Weight” column:
df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 160.20, 209.45, 150.35, 187.52, 187.52]})
print(df)
# Output:
    Name  Weight
0    Jim  160.20
1  Sally  160.20
2    Bob  209.45
3    Sue  150.35
4   Jill  187.52
5  Larry  187.52
By default (keep='first'), when several rows are tied for the nth largest value, the pandas nlargest() function returns the rows that occur first.
print(df.nlargest(2, "Weight"))
# Output:
   Name  Weight
2   Bob  209.45
4  Jill  187.52
In this case, since Jill came before Larry, Jill’s row is returned.
If we want to return the last occurrence, we can pass keep='last' to nlargest():
print(df.nlargest(2, "Weight", keep='last'))
# Output:
    Name  Weight
2    Bob  209.45
5  Larry  187.52
If we want to keep every row that is tied for the nth largest value, we can pass keep='all' to nlargest(). In that case, the result can contain more than n rows.
print(df.nlargest(2, "Weight", keep='all'))
# Output:
    Name  Weight
2    Bob  209.45
4   Jill  187.52
5  Larry  187.52
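To compare the three keep options side by side, the following sketch runs each of them on the DataFrame with duplicate weights and collects the returned names. The dictionary names_by_keep is just an illustrative helper:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 160.20, 209.45, 150.35, 187.52, 187.52]})

# Collect the names returned for each tie-handling option
names_by_keep = {}
for keep in ("first", "last", "all"):
    result = df.nlargest(2, "Weight", keep=keep)
    names_by_keep[keep] = list(result["Name"])
    print(keep, names_by_keep[keep])
```

Only keep='all' returns three rows here, because Jill and Larry are tied for the second largest weight.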
Finding the n Largest Values over Multiple Columns in a DataFrame
We can also use the pandas nlargest() function to find the n largest values over multiple columns. We just need to pass multiple column names to the function.
Let’s say we have another column on the DataFrame from above:
df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 160.20, 209.45, 150.35, 187.52, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42]})
print(df)
# Output:
    Name  Weight  Height
0    Jim  160.20   50.10
1  Sally  160.20   68.94
2    Bob  209.45   71.42
3    Sue  150.35   48.56
4   Jill  187.52   59.37
5  Larry  187.52   63.42
To rank the rows by “Weight” and use “Height” to break ties, we just need to pass both column names in a list like in the following Python code.
print(df.nlargest(3, ["Weight", "Height"]))
# Output:
    Name  Weight  Height
2    Bob  209.45   71.42
5  Larry  187.52   63.42
4   Jill  187.52   59.37
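This will order the rows by the first column, and then use the second column specified to order rows that are tied on the first, and so on. Here, Larry and Jill have the same weight, so Larry is listed first because he is taller.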
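One way to check this ordering is to compare it with sorting on both columns in descending order and taking the first rows. This is a sketch for the example data rather than a statement about how nlargest() is implemented; the names top_nlargest and top_sorted are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 160.20, 209.45, 150.35, 187.52, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42]})

# Top 3 rows by Weight, with Height used to break ties
top_nlargest = df.nlargest(3, ["Weight", "Height"])

# Same ranking expressed as a full sort followed by head()
top_sorted = df.sort_values(["Weight", "Height"], ascending=False).head(3)

print(list(top_nlargest["Name"]))
print(list(top_sorted["Name"]))
```

Both approaches pick the same rows in the same order for this data. nlargest() is usually the more convenient choice when you only need the top few rows.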
Hopefully this article has been helpful for you to understand how to find the largest values in a Series or DataFrame using pandas.