In Python, we can use the pandas resample() function to resample time series data in a DataFrame or Series object. Resampling is a technique which allows you to increase or decrease the frequency of your time series data.
Let’s say we have the following time series data.
import pandas as pd
import numpy as np
# 61 daily timestamps with random integers from 0 to 9 (your values will differ)
df = pd.DataFrame({'time':pd.date_range(start='05-01-2022',end='06-30-2022', freq="D"), 'value':np.random.randint(10,size=61)})
print(df.head(10))
#Output:
time value
0 2022-05-01 2
1 2022-05-02 4
2 2022-05-03 7
3 2022-05-04 9
4 2022-05-05 6
5 2022-05-06 9
6 2022-05-07 2
7 2022-05-08 4
8 2022-05-09 2
9 2022-05-10 1
To resample this daily data to monthly data, set the time column as the index, call resample() with 'M' (month end), and aggregate each month with mean().
df.set_index('time', inplace=True)
# 'M' groups by calendar month, labeled by month end; pandas 2.2+ prefers the alias 'ME'
resampled_df = df.resample('M').mean()
print(resampled_df)
#Output:
value
time
2022-05-31 4.741935
2022-06-30 3.300000
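Because the example above uses random values, its output changes on every run. As a reproducible sketch (the values 0 to 60 are made up for illustration), the version below passes the time column through resample()'s `on=` parameter instead of setting the index, and uses the `pd.offsets.MonthEnd()` offset object in place of the 'M' string, which pandas 2.2+ deprecates in favor of 'ME'.

```python
import numpy as np
import pandas as pd

# Illustrative data: 61 daily timestamps with values 0, 1, ..., 60
df = pd.DataFrame({
    "time": pd.date_range(start="2022-05-01", end="2022-06-30", freq="D"),
    "value": np.arange(61),
})

# on="time" resamples on a column instead of the index;
# MonthEnd() is the offset object behind the 'M' / 'ME' aliases
monthly = df.resample(pd.offsets.MonthEnd(), on="time").mean()
print(monthly)
```

May holds the values 0 to 30, so its mean is 15.0; June holds 31 to 60, so its mean is 45.5.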
When working with time series data, the ability to change the frequency of the data can be very useful.
The Python pandas module gives us many great tools for working with time series data, and its resample() function makes it easy to increase or decrease the frequency of your data.
Increasing the frequency of your time series data, or upsampling, is like taking monthly data and making it daily data. Upsampling creates new timestamps that have no values yet, which you can then fill or interpolate.
Decreasing the frequency of your time series data, or downsampling, is like taking daily data and aggregating it into monthly data, for example by taking the sum or mean of each month.
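To make the two directions concrete, here is a minimal sketch on an illustrative daily series of 61 ones (the names `s`, `down`, and `up` are hypothetical). Downsampling shrinks the number of rows, while upsampling adds rows whose values are initially missing.

```python
import numpy as np
import pandas as pd

# Illustrative daily series from May 1 to June 30, 2022 (61 days), all values 1.0
idx = pd.date_range("2022-05-01", "2022-06-30", freq="D")
s = pd.Series(np.ones(len(idx)), index=idx)

# Downsampling: the daily points collapse into one row per month
down = s.resample(pd.offsets.MonthEnd()).sum()

# Upsampling: new 12-hour slots appear between the daily points, holding NaN
up = s.resample("12h").mean()

print(len(s), len(down), len(up))  # 61 2 121
```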
The pandas documentation describes the many different ways you can use resample().
In the rest of this article, you’ll learn how to resample time series data in a few common ways with the pandas resample() function.
How to Resample Time Series Data and Interpolate with the pandas resample() Function
One way we can use resample() is to increase the frequency of our time series data. Increasing the frequency of time series data is called upsampling; it is like taking monthly data and making it daily.
Let’s say we have the following data which has data points every 12 hours.
import pandas as pd
import numpy as np
df = pd.DataFrame({'time':pd.date_range(start='05-01-2022',end='05-31-2022', freq="12H"), 'value':np.random.randint(10,size=61)})
print(df.head(10))
#Output:
time value
0 2022-05-01 00:00:00 1
1 2022-05-01 12:00:00 7
2 2022-05-02 00:00:00 9
3 2022-05-02 12:00:00 8
4 2022-05-03 00:00:00 9
5 2022-05-03 12:00:00 0
6 2022-05-04 00:00:00 6
7 2022-05-04 12:00:00 3
8 2022-05-05 00:00:00 7
9 2022-05-05 12:00:00 6
Let’s increase the frequency of our data to every 6 hours with resample(). By default, resample() works on the index, so first we set the time column as the index.
Then, we can increase the frequency of our data by passing “6H” (6 hours) to resample(). Since the new 6-hour slots have no data, mean() returns NaN for them.
df.set_index('time', inplace=True)
# "6H" means every 6 hours; pandas 2.2+ prefers the lowercase alias "6h"
resampled_df = df.resample("6H").mean()
print(resampled_df.head(10))
#Output:
value
time
2022-05-01 00:00:00 1.0
2022-05-01 06:00:00 NaN
2022-05-01 12:00:00 7.0
2022-05-01 18:00:00 NaN
2022-05-02 00:00:00 9.0
2022-05-02 06:00:00 NaN
2022-05-02 12:00:00 8.0
2022-05-02 18:00:00 NaN
2022-05-03 00:00:00 9.0
2022-05-03 06:00:00 NaN
As you can see, we’ve now added data points between the existing data points, but the values of these new data points are NaN.
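When upsampling, each new bin holds at most one original value, so mean() simply returns that value or NaN. As a sketch, the Resampler's asfreq() method produces the same grid more explicitly; the three values below are copied from the first rows printed above and are only illustrative.

```python
import pandas as pd

# Illustrative values taken from the first rows of the 12-hour data above
df = pd.DataFrame(
    {"value": [1, 7, 9]},
    index=pd.date_range("2022-05-01", periods=3, freq="12h"),
)

# asfreq() places each existing value on the new 6-hour grid and leaves the rest as NaN
upsampled = df.resample("6h").asfreq()
print(upsampled)
```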
To fill these NaN values, we have a few options. The bfill() method “back fills” each NaN with the next available value.
resampled_df = df.resample("6H").bfill()
print(resampled_df.head(10))
#Output:
value
time
2022-05-01 00:00:00 1
2022-05-01 06:00:00 7
2022-05-01 12:00:00 7
2022-05-01 18:00:00 9
2022-05-02 00:00:00 9
2022-05-02 06:00:00 8
2022-05-02 12:00:00 8
2022-05-02 18:00:00 9
2022-05-03 00:00:00 9
2022-05-03 06:00:00 0
You can also use ffill() to “forward fill” each NaN with the previous available value.
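A small sketch comparing the two fill directions side by side, using three illustrative values copied from the first rows of the 12-hour data (the column names "ffill" and "bfill" are just labels chosen here):

```python
import pandas as pd

# Illustrative values taken from the first rows of the 12-hour data above
df = pd.DataFrame(
    {"value": [1, 7, 9]},
    index=pd.date_range("2022-05-01", periods=3, freq="12h"),
)

filled = pd.DataFrame({
    "ffill": df["value"].resample("6h").ffill(),  # copy the previous value forward
    "bfill": df["value"].resample("6h").bfill(),  # copy the next value backward
})
print(filled)
```

At 06:00, ffill() repeats the 00:00 value (1), while bfill() takes the 12:00 value (7).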
If you want to estimate the missing values from the points on either side, rather than copy a neighboring value, use the pandas interpolate() function to fill the NaN values in the newly created time series.
Below is an example of how you can interpolate a time series with the pandas resample() and interpolate() functions.
resampled_df = df.resample("6H").interpolate(method="linear")
print(resampled_df.head(10))
#Output:
value
time
2022-05-01 00:00:00 1.0
2022-05-01 06:00:00 4.0
2022-05-01 12:00:00 7.0
2022-05-01 18:00:00 8.0
2022-05-02 00:00:00 9.0
2022-05-02 06:00:00 8.5
2022-05-02 12:00:00 8.0
2022-05-02 18:00:00 8.5
2022-05-03 00:00:00 9.0
2022-05-03 06:00:00 4.5
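Since the data above is random, the following sketch reproduces the same table deterministically by copying the first six 12-hour values from the output (treat them as illustrative). Each new 6-hour point lies exactly halfway between two originals, so linear interpolation gives their average.

```python
import pandas as pd

# Illustrative values copied from the first six 12-hour rows shown above
df = pd.DataFrame(
    {"value": [1, 7, 9, 8, 9, 0]},
    index=pd.date_range("2022-05-01", periods=6, freq="12h"),
)

# Each new 6-hour point sits halfway between two originals,
# so linear interpolation gives their average, e.g. (1 + 7) / 2 = 4.0
interpolated = df.resample("6h").interpolate(method="linear")
print(interpolated["value"].tolist())
# [1.0, 4.0, 7.0, 8.0, 9.0, 8.5, 8.0, 8.5, 9.0, 4.5, 0.0]
```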
How to Resample Time Series Data and Aggregate Data with the pandas resample() Function
You can also use resample() to decrease the frequency of your time series data. Decreasing the frequency of your time series data is called downsampling and is like going from daily data to monthly data.
Let’s say we have the same dataset from above, with data points every 12 hours.
To resample this data and convert it to daily data, we can use resample() and pass “D” for days as the new frequency. Let’s also aggregate the resampled data and get the sum for each day.
Below is how you can downsample and aggregate time series data with the pandas resample() function.
resampled_df = df.resample('D').sum()
print(resampled_df.head(10))
#Output:
value
time
2022-05-01 8
2022-05-02 17
2022-05-03 9
2022-05-04 9
2022-05-05 13
2022-05-06 5
2022-05-07 9
2022-05-08 10
2022-05-09 8
2022-05-10 6
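Downsampling is not limited to sum(). As a sketch using the first six 12-hour values printed earlier (treated here as illustrative data), the Resampler's agg() method computes several aggregations per day in one call. The first day's sum of 8 matches the first row of the output above.

```python
import pandas as pd

# Illustrative values copied from the first six 12-hour rows shown earlier
df = pd.DataFrame(
    {"value": [1, 7, 9, 8, 9, 0]},
    index=pd.date_range("2022-05-01", periods=6, freq="12h"),
)

# agg() applies several aggregation functions to each daily bin
daily = df["value"].resample("D").agg(["sum", "mean", "max"])
print(daily)
```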
Hopefully this article has been useful for you to learn how to resample time series data in Python with the pandas resample() function.