To find the sum of columns in a DataFrame, or the sum of the values of a Series in pandas, the easiest way is to use the pandas sum() function.
df.sum()  # calculate the sum of every column
df["Column"].sum()  # calculate the sum of one column
You can also use the numpy sum() function.
import numpy as np
np.sum(df["Column"])  # calculate the sum of one column
When working with data, we often want to calculate summary statistics to understand it better. One such statistic is the sum, the total obtained by adding a list of numbers together.
Finding the sum of one column, or of every column in a DataFrame, is easy in pandas. We can use the pandas sum() function to total a single column of numbers or each column of a DataFrame.
Let’s say we have the following DataFrame.
import pandas as pd

df = pd.DataFrame({'Age': [43, 23, 71, 49, 52, 37],
                   'Test_Score': [90, 87, 92, 96, 84, 79]})
print(df)
# Output:
   Age  Test_Score
0   43          90
1   23          87
2   71          92
3   49          96
4   52          84
5   37          79
To get the sum for all columns, we can call the pandas sum() function.
print(df.sum())
# Output:
Age           275
Test_Score    528
dtype: int64
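Because df.sum() returns a Series indexed by column name, an individual total can be looked up by its label. The sketch below rebuilds the example DataFrame so that it runs on its own. The variable names totals and row_totals are only illustrative, and the axis=1 call, which sums across each row, goes slightly beyond what this article covers.

```python
import pandas as pd

df = pd.DataFrame({"Age": [43, 23, 71, 49, 52, 37],
                   "Test_Score": [90, 87, 92, 96, 84, 79]})

# df.sum() adds down each column and returns a Series keyed by column name
totals = df.sum()
print(totals["Age"])  # total for one column, looked up by label

# axis=1 adds across each row instead, giving one total per row
row_totals = df.sum(axis=1)
print(row_totals.tolist())
```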
If we only want the sum of one column, we can select that column and call the pandas sum() function on it, as in the following Python code:
print(df["Test_Score"].sum())
# Output:
528
If you want to see how the sum builds up step by step, you can use the pandas cumsum() function, which returns the running total at each row: a Series when called on a single column, or a DataFrame with one running total per column when called on a DataFrame.
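As a minimal sketch of the cumulative sum, the example below rebuilds the same DataFrame so that it is self-contained. The names running and running_all are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Age": [43, 23, 71, 49, 52, 37],
                   "Test_Score": [90, 87, 92, 96, 84, 79]})

# Running total of a single column: the result is a Series
running = df["Test_Score"].cumsum()
print(running.tolist())

# Running totals of every column: the result is a DataFrame
running_all = df.cumsum()
print(running_all)

# The last row of the cumulative sums should match the plain column sums
matches = bool((running_all.iloc[-1] == df.sum()).all())
print(matches)
```

The final value of each running total is the same number that sum() returns for that column, which is a quick way to check the two against each other.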
Using numpy sum to Calculate a Sum in a pandas DataFrame
We can also use the numpy sum() function to calculate the sum of the numbers in a column of a pandas DataFrame.
For example, the following Python code gets the sum of the numbers in the column “Test_Score”:
import numpy as np

print(np.sum(df["Test_Score"]))
# Output:
528
As you can see above, this is the same value we received from the pandas sum() function.
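To check that agreement in code rather than by eye, a small sketch like the following can compare the results directly. It rebuilds the example DataFrame so that it runs on its own. The variable names are illustrative, and the to_numpy() conversion is an extra step, not something this article requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [43, 23, 71, 49, 52, 37],
                   "Test_Score": [90, 87, 92, 96, 84, 79]})
scores = df["Test_Score"]

pandas_total = scores.sum()              # pandas sum() on the Series
numpy_total = np.sum(scores)             # numpy sum() applied to the Series
array_total = np.sum(scores.to_numpy())  # numpy sum() on the underlying array

# All three approaches should give the same total
print(pandas_total == numpy_total == array_total)
```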
Hopefully this article has been helpful for you to understand how to find the sum of numbers in a Series or DataFrame in pandas.