To calculate the cumulative sum over columns in a DataFrame, or the cumulative sum of the values of a Series in pandas, the easiest way is to use the pandas cumsum() function.

df.cumsum() # Calculate cumulative sum for all columns
df["Column"].cumsum() #calculate cumulative sum for 1 column

You can also use the numpy cumsum() function to calculate the cumulative sum for a column or Series.

np.cumsum(df["Column"])

When working with data, many times we want to calculate summary statistics to understand our data better. One such statistic is the cumulative sum, or the additive total of a list of numbers after each element of the list.

Finding the cumulative sum of a column, or the cumulative sum for all columns or rows in a DataFrame using pandas is easy. We can use the pandas cumsum() function to find the cumulative sum of a column of numbers, or for all columns or rows in a DataFrame.

Let’s say we have the following DataFrame.

df = pd.DataFrame({'Age': [43,23,71,49,52,37], 
      'Test_Score':[90,87,92,96,84,79]})

print(df)
# Output: 
   Age  Test_Score
0   43          90
1   23          87
2   71          92
3   49          96
4   52          84
5   37          79

To get the cumulative sum for all columns, we can call the pandas cumsum() function.

print(df.cumsum())

# Output:
   Age  Test_Score
0   43          90
1   66         177
2  137         269
3  186         365
4  238         449
5  275         528

If we only want to get the cumulative sum of just one column, we can do this using the pandas cumsum() function in the following Python code:

print(df["Test_Score"].cumsum())

# Output:
0     90
1    177
2    269
3    365
4    449
5    528
Name: Test_Score, dtype: int64

Calculating the Cumulative Sum by Row in pandas DataFrame

We can also calculate the cumulative sum by row by pass “axis=1” to the cumsum() function.

Below is how to find the cumulative sum across the rows of a pandas DataFrame using the same DataFrame from above.

print(df.cumsum(axis=1))

# Output:
   Age  Test_Score
0   43         133
1   23         110
2   71         163
3   49         145
4   52         136
5   37         116

Calculating the Cumulative Sum of a Column With Missing Values

As many of us know, when working with data, sometimes we have to work with messy data or data with missing values. Let’s take our DataFrame from above and add a few NaN values.

df = pd.DataFrame({'Age': [43,np.NaN,71,49,np.NaN,37], 
      'Test_Score':[90,87,92,np.NaN,84,79]})

print(df)
# Output: 
    Age  Test_Score
0  43.0        90.0
1   NaN        87.0
2  71.0        92.0
3  49.0         NaN
4   NaN        84.0
5  37.0        79.0

If you want to calculate the cumulative sum of a column with missing values, by default, the cumsum() function will ignore those missing values.

print(df.cumsum())

# Output:
     Age  Test_Score
0   43.0        90.0
1    NaN       177.0
2  114.0       269.0
3  163.0         NaN
4    NaN       353.0
5  200.0       432.0

If you want the cumsum() function to include the NaN values in the sum operation, you can pass “skipna=False”.

print(df.cumsum(skipna=False))

# Output:
    Age  Test_Score
0  43.0        90.0
1   NaN       177.0
2   NaN       269.0
3   NaN         NaN
4   NaN         NaN
5   NaN         NaN

Using the numpy cumsum() Function to Calculate Cumulative Sum of a Column

We can also use the numpy cumsum() function to calculate the cumulative sum of columns in a pandas DataFrame.

Let’s say we have the same dataset from above.

To get the cumulative sum of the numbers in the column “Test_Score”, we can use the numpy cumsum() function in the following Python code:

print(np.cumsum(df["Test_Score"]))

# Output:
0     90
1    177
2    269
3    365
4    449
5    528
Name: Test_Score, dtype: int64

As you can see above, this is the same value we received from the pandas cumsum() function.

Hopefully this article has been helpful for you to understand how to find the cumulative sum of numbers in a Series or columns in a DataFrame in pandas.

Categorized in:

Python,

Last Update: March 20, 2024