Let's explore pandas.DataFrame.resample with Examples



Introduction to Pandas DataFrame.resample()

resample() is a convenience method for frequency conversion and resampling of time series data. It works on a DataFrame indexed by timestamps (a DatetimeIndex). It groups the rows into regular time bins such as months, weeks, quarters, hours or minutes, and then summarizes each bin with a statistical function such as mean(), min(), sum() or max().

As the names suggest, min() returns the smallest value in each bin, max() returns the largest value, sum() returns the total and mean() returns the average.
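resample() is easy to confuse with rolling(). The following minimal sketch rebuilds the same 12-row hourly frame used in this article, using the lowercase 'h' alias that newer pandas prefers. It shows the difference: rolling() computes a moving window and returns one row per input row, while resample() collapses rows into non-overlapping time bins.

```python
import pandas as pd

# Same sample data as in this article: 12 hourly timestamps, values 0..11
idx = pd.date_range("2021-02-01", periods=12, freq="h")
df = pd.DataFrame({"values": range(12)}, index=idx)

# rolling(): moving window over the last 4 rows, one output row per input row
rolled = df.rolling(4).sum()

# resample(): non-overlapping 4-hour bins, one output row per bin
binned = df.resample("4h").sum()

print(len(rolled))                # 12 (the first 3 rows are NaN: window not yet full)
print(binned["values"].tolist())  # [6, 22, 38]
```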

Syntax:

DataFrame.resample('M' / 'W' / 'Q' / 'T' / 'H').function()

Parameters

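  1. M represents monthly and resamples the data by month, W represents weekly and resamples the data by week, Q represents quarterly and resamples the data by quarter, H represents hourly and resamples the data by hour, and T represents minutely and resamples the data by minute. The rule is passed as a string, and a number can be prefixed to widen each bin, for example '4H' for four-hour bins.
  2. function is the aggregation applied to each bin: mean(), min(), sum() or max().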
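pandas 2.2 and later deprecate the 'M', 'Q', 'H' and 'T' aliases used in this article in favor of 'ME', 'QE', 'h' and 'min', and emit a FutureWarning for the old spellings. Older releases do not recognize 'ME' or 'QE' at all. If your code must run on both, one option is to probe each alias with pandas' to_offset() and fall back when it is rejected. The sketch below does this; pick_alias() is a hypothetical helper, not a pandas function.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def pick_alias(new: str, old: str) -> str:
    """Hypothetical helper (not part of pandas): return `new` if the installed
    pandas accepts that frequency alias, otherwise fall back to `old`."""
    try:
        to_offset(new)
        return new
    except ValueError:
        return old


# Month-end and quarter-end aliases that work on older and newer pandas
MONTH_END = pick_alias("ME", "M")
QUARTER_END = pick_alias("QE", "Q")

idx = pd.date_range("2021-02-01", periods=12, freq="h")
df = pd.DataFrame({"values": range(12)}, index=idx)

print(df.resample(MONTH_END).mean())    # one row labelled 2021-02-28, value 5.5
print(df.resample(QUARTER_END).mean())  # one row labelled 2021-03-31, value 5.5
```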

 

Create sample Pandas DataFrame

Here we create a DataFrame with 12 rows. Its index holds 12 timestamps at hourly frequency, starting at 2021-02-01 00:00:00. Its single column, values, holds the numbers 0 to 11.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

#display
print(dataframe)

Output:

                     values
2021-02-01 00:00:00       0
2021-02-01 01:00:00       1
2021-02-01 02:00:00       2
2021-02-01 03:00:00       3
2021-02-01 04:00:00       4
2021-02-01 05:00:00       5
2021-02-01 06:00:00       6
2021-02-01 07:00:00       7
2021-02-01 08:00:00       8
2021-02-01 09:00:00       9
2021-02-01 10:00:00      10
2021-02-01 11:00:00      11

 

1. How to resample a DataFrame at monthly frequency

1.1 Using mean()

Here we resample the data to monthly frequency and aggregate it with the mean() function, so we pass 'M' to the resample() function.

Syntax:

dataframe.resample('M').mean()

Example:

In this example, we create the hourly DataFrame and resample it by month to get the average value. All 12 rows fall in February 2021, so the output is a single row labeled with the last day of that month, 2021-02-28. That row holds the mean of 0 to 11, which is 5.5.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on monthly frequency
print(dataframe.resample('M').mean())

Output:

            values
2021-02-28     5.5
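Each monthly bin is simply a calendar month. All 12 sample rows fall in February, so the result above has only one row. The following minimal sketch uses hypothetical daily data spanning February and March 2021 instead. It resamples with 'MS' (month start), which uses the same calendar-month bins as 'M' but labels each bin with its first day. It then builds the same bins explicitly with groupby() on monthly periods.

```python
import pandas as pd

# Hypothetical daily data: one row per day from 2021-02-01 to 2021-03-31, values 0..58
idx = pd.date_range("2021-02-01", "2021-03-31", freq="D")
df = pd.DataFrame({"values": range(len(idx))}, index=idx)

# resample into calendar months (bins labelled 2021-02-01 and 2021-03-01)
monthly = df.resample("MS").sum()

# the same bins built explicitly: group each row by the month it belongs to
by_month = df.groupby(df.index.to_period("M")).sum()

print(monthly["values"].tolist())   # [378, 1333]
print(by_month["values"].tolist())  # [378, 1333]
```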

 

1.2 Using min()

Here we resample the data to monthly frequency and aggregate it with the min() function, so we pass 'M' to the resample() function.

Syntax:

dataframe.resample('M').min()

Example:

In this example, we create the hourly DataFrame and resample it by month to get the minimum value. All 12 rows fall in February 2021, so the output is a single row labeled 2021-02-28 holding the minimum value, 0.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on monthly frequency
print(dataframe.resample('M').min())

Output:

            values
2021-02-28       0

 

1.3 Using sum()

Here we resample the data to monthly frequency and aggregate it with the sum() function, so we pass 'M' to the resample() function.

Syntax:

dataframe.resample('M').sum()

Example:

In this example, we create the hourly DataFrame and resample it by month to get the total. All 12 rows fall in February 2021, so the output is a single row labeled 2021-02-28 holding the sum of 0 to 11, which is 66.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on monthly frequency
print(dataframe.resample('M').sum())

Output:

            values
2021-02-28      66

 

1.4 Using max()

Here we resample the data to monthly frequency and aggregate it with the max() function, so we pass 'M' to the resample() function.

Syntax:

dataframe.resample('M').max()

Example:

In this example, we create the hourly DataFrame and resample it by month to get the maximum value. All 12 rows fall in February 2021, so the output is a single row labeled 2021-02-28 holding the maximum value, 11.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on monthly frequency
print(dataframe.resample('M').max())

Output:

            values
2021-02-28      11

 

2. How to resample a DataFrame at weekly frequency

2.1 Using mean()

Here we resample the data to weekly frequency and aggregate it with the mean() function, so we pass 'W' to the resample() function.

Syntax:

dataframe.resample('W').mean()

Example:

In this example, we create the hourly DataFrame and resample it by week to get the average value. By default, 'W' weeks end on Sunday. All 12 rows fall in the week ending Sunday, February 7, 2021, so the output is a single row labeled 2021-02-07 holding the average value, 5.5.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on weekly frequency
print(dataframe.resample('W').mean())

Output:

            values
2021-02-07     5.5

 

2.2 Using min()

Here we resample the data to weekly frequency and aggregate it with the min() function, so we pass 'W' to the resample() function.

Syntax:

dataframe.resample('W').min()

Example:

In this example, we create the hourly DataFrame and resample it by week to get the minimum value. All 12 rows fall in the week ending Sunday, February 7, 2021, so the output is a single row labeled 2021-02-07 holding the minimum value, 0.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on weekly frequency
print(dataframe.resample('W').min())

Output:

            values
2021-02-07       0

 

2.3 Using sum()

Here we resample the data to weekly frequency and aggregate it with the sum() function, so we pass 'W' to the resample() function.

Syntax:

dataframe.resample('W').sum()

Example:

In this example, we create the hourly DataFrame and resample it by week to get the total. All 12 rows fall in the week ending Sunday, February 7, 2021, so the output is a single row labeled 2021-02-07 holding the total, 66.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on weekly frequency
print(dataframe.resample('W').sum())

Output:

            values
2021-02-07      66

 

2.4 Using max()

Here we resample the data to weekly frequency and aggregate it with the max() function, so we pass 'W' to the resample() function.

Syntax:

dataframe.resample('W').max()

Example:

In this example, we create the hourly DataFrame and resample it by week to get the maximum value. All 12 rows fall in the week ending Sunday, February 7, 2021, so the output is a single row labeled 2021-02-07 holding the maximum value, 11.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on weekly frequency
print(dataframe.resample('W').max())

Output:

            values
2021-02-07      11
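The four weekly examples above each call resample() separately. As a minimal sketch, you can compute all four statistics in one pass with the resampler's agg() method and a list of function names. Selecting the values column first keeps the result's columns flat.

```python
import pandas as pd

idx = pd.date_range("2021-02-01", periods=12, freq="h")
df = pd.DataFrame({"values": range(12)}, index=idx)

# one row per week (weeks end on Sunday), one column per statistic;
# for this data: mean 5.5, min 0, sum 66, max 11 for the week ending 2021-02-07
summary = df["values"].resample("W").agg(["mean", "min", "sum", "max"])
print(summary)
```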

 

3. How to resample a DataFrame at quarterly frequency

3.1 Using mean()

Here we resample the data to quarterly frequency and aggregate it with the mean() function, so we pass 'Q' to the resample() function.

Syntax:

dataframe.resample('Q').mean()

Example:

In this example, we create the hourly DataFrame and resample it by quarter to get the average value. All 12 rows fall in the first quarter of 2021, so the output is a single row labeled with the last day of that quarter, 2021-03-31, holding the average value, 5.5.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on quarterly frequency
print(dataframe.resample('Q').mean())

Output:

            values
2021-03-31     5.5
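The quarter-end label comes from the default quarterly anchor. Quarters end in December (Q-DEC), so February falls in the first quarter. The following small sketch uses a pandas Period to show that quarter and its boundaries.

```python
import pandas as pd

# The first sample timestamp, converted to the quarter that contains it
q = pd.Timestamp("2021-02-01 00:00").to_period("Q")

print(q)                                  # 2021Q1
print(q.start_time.strftime("%Y-%m-%d"))  # 2021-01-01
print(q.end_time.strftime("%Y-%m-%d"))    # 2021-03-31
```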

 

3.2 Using min()

Here we resample the data to quarterly frequency and aggregate it with the min() function, so we pass 'Q' to the resample() function.

Syntax:

dataframe.resample('Q').min()

Example:

In this example, we create the hourly DataFrame and resample it by quarter to get the minimum value. All 12 rows fall in the first quarter of 2021, so the output is a single row labeled 2021-03-31 holding the minimum value, 0.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on quarterly frequency
print(dataframe.resample('Q').min())

Output:

            values
2021-03-31       0

 

3.3 Using sum()

Here we resample the data to quarterly frequency and aggregate it with the sum() function, so we pass 'Q' to the resample() function.

Syntax:

dataframe.resample('Q').sum()

Example:

In this example, we create the hourly DataFrame and resample it by quarter to get the total. All 12 rows fall in the first quarter of 2021, so the output is a single row labeled 2021-03-31 holding the total, 66.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on quarterly frequency
print(dataframe.resample('Q').sum())

Output:

            values
2021-03-31      66

 

3.4 Using max()

Here we resample the data to quarterly frequency and aggregate it with the max() function, so we pass 'Q' to the resample() function.

Syntax:

dataframe.resample('Q').max()

Example:

In this example, we create the hourly DataFrame and resample it by quarter to get the maximum value. All 12 rows fall in the first quarter of 2021, so the output is a single row labeled 2021-03-31 holding the maximum value, 11.

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample the data on quarterly frequency
print(dataframe.resample('Q').max())

Output:

            values
2021-03-31      11

 

Miscellaneous examples using pandas resample()

Example 1: Get minute (T) based sampling

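Here we resample the data by minute with all four functions, so we pass 'T' (and '4T' for four-minute bins) to the resample() function.

In this example, we create the hourly DataFrame and resample it to minute frequency. Because the source data is hourly, this is upsampling: most of the new one-minute bins contain no rows. For those empty bins, max(), mean() and min() return NaN, while sum() returns 0.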

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample into 1-minute bins: maximum
print(dataframe.resample('T').max())

# resample into 1-minute bins: mean
print(dataframe.resample('T').mean())

# resample into 1-minute bins: minimum
print(dataframe.resample('T').min())

# resample into 1-minute bins: sum
print(dataframe.resample('T').sum())

# resample into 4-minute bins: maximum
print(dataframe.resample('4T').max())

# resample into 4-minute bins: mean
print(dataframe.resample('4T').mean())

# resample into 4-minute bins: minimum
print(dataframe.resample('4T').min())

# resample into 4-minute bins: sum
print(dataframe.resample('4T').sum())

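Output (only the row-count footer of the last one-minute result, 661 rows, is shown, followed by the four 4-minute results with 166 rows each; pandas abbreviates long frames with ...):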

[661 rows x 1 columns]
                     values
2021-02-01 00:00:00     0.0
2021-02-01 00:04:00     NaN
2021-02-01 00:08:00     NaN
2021-02-01 00:12:00     NaN
2021-02-01 00:16:00     NaN
...                     ...
2021-02-01 10:44:00     NaN
2021-02-01 10:48:00     NaN
2021-02-01 10:52:00     NaN
2021-02-01 10:56:00     NaN
2021-02-01 11:00:00    11.0

[166 rows x 1 columns]
                     values
2021-02-01 00:00:00     0.0
2021-02-01 00:04:00     NaN
2021-02-01 00:08:00     NaN
2021-02-01 00:12:00     NaN
2021-02-01 00:16:00     NaN
...                     ...
2021-02-01 10:44:00     NaN
2021-02-01 10:48:00     NaN
2021-02-01 10:52:00     NaN
2021-02-01 10:56:00     NaN
2021-02-01 11:00:00    11.0

[166 rows x 1 columns]
                     values
2021-02-01 00:00:00     0.0
2021-02-01 00:04:00     NaN
2021-02-01 00:08:00     NaN
2021-02-01 00:12:00     NaN
2021-02-01 00:16:00     NaN
...                     ...
2021-02-01 10:44:00     NaN
2021-02-01 10:48:00     NaN
2021-02-01 10:52:00     NaN
2021-02-01 10:56:00     NaN
2021-02-01 11:00:00    11.0

[166 rows x 1 columns]
                     values
2021-02-01 00:00:00       0
2021-02-01 00:04:00       0
2021-02-01 00:08:00       0
2021-02-01 00:12:00       0
2021-02-01 00:16:00       0
...                     ...
2021-02-01 10:44:00       0
2021-02-01 10:48:00       0
2021-02-01 10:52:00       0
2021-02-01 10:56:00       0
2021-02-01 11:00:00      11

[166 rows x 1 columns]
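As the output above shows, sum() reports 0 for an empty bin. The following minimal sketch uses 15-minute bins (written with the 'min' alias) to keep the result short. Passing min_count=1 to sum() turns empty bins into NaN instead, and the resampler's ffill() fills each new row with the last observed value.

```python
import pandas as pd

idx = pd.date_range("2021-02-01", periods=12, freq="h")
df = pd.DataFrame({"values": range(12)}, index=idx)

up = df.resample("15min")  # upsample hourly data into 15-minute bins (45 bins)

zeros = up.sum()               # empty bins -> 0
missing = up.sum(min_count=1)  # empty bins -> NaN (at least 1 value required)
filled = up.ffill()            # empty bins -> last observed value

t = pd.Timestamp("2021-02-01 01:15")
print(zeros.loc[t, "values"], missing.loc[t, "values"], filled.loc[t, "values"])
```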

 

Example 2: Get hourly (H) based sampling

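Here we resample the data by hour with all four functions, so we pass 'H' (and '4H' for four-hour bins) to the resample() function.

In this example, we create the hourly DataFrame and resample it by hour. Because the data is already hourly, 'H' puts each row in its own bin. With '4H', each bin holds four consecutive rows: 00:00 to 03:00, 04:00 to 07:00 and 08:00 to 11:00.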

# import pandas
import pandas

# create 12 dates
data = pandas.date_range('2/1/2021', periods=12, freq='H')

# set the data to the dataframe
dataframe = pandas.DataFrame(data=range(12), index=data, columns=['values'])

# resample into 1-hour bins: maximum
print(dataframe.resample('H').max())

# resample into 1-hour bins: mean
print(dataframe.resample('H').mean())

# resample into 1-hour bins: minimum
print(dataframe.resample('H').min())

# resample into 1-hour bins: sum
print(dataframe.resample('H').sum())

# resample into 4-hour bins: maximum
print(dataframe.resample('4H').max())

# resample into 4-hour bins: mean
print(dataframe.resample('4H').mean())

# resample into 4-hour bins: minimum
print(dataframe.resample('4H').min())

# resample into 4-hour bins: sum
print(dataframe.resample('4H').sum())

Output:

                     values
2021-02-01 00:00:00       0
2021-02-01 01:00:00       1
2021-02-01 02:00:00       2
2021-02-01 03:00:00       3
2021-02-01 04:00:00       4
2021-02-01 05:00:00       5
2021-02-01 06:00:00       6
2021-02-01 07:00:00       7
2021-02-01 08:00:00       8
2021-02-01 09:00:00       9
2021-02-01 10:00:00      10
2021-02-01 11:00:00      11
                     values
2021-02-01 00:00:00     0.0
2021-02-01 01:00:00     1.0
2021-02-01 02:00:00     2.0
2021-02-01 03:00:00     3.0
2021-02-01 04:00:00     4.0
2021-02-01 05:00:00     5.0
2021-02-01 06:00:00     6.0
2021-02-01 07:00:00     7.0
2021-02-01 08:00:00     8.0
2021-02-01 09:00:00     9.0
2021-02-01 10:00:00    10.0
2021-02-01 11:00:00    11.0
                     values
2021-02-01 00:00:00       0
2021-02-01 01:00:00       1
2021-02-01 02:00:00       2
2021-02-01 03:00:00       3
2021-02-01 04:00:00       4
2021-02-01 05:00:00       5
2021-02-01 06:00:00       6
2021-02-01 07:00:00       7
2021-02-01 08:00:00       8
2021-02-01 09:00:00       9
2021-02-01 10:00:00      10
2021-02-01 11:00:00      11
                     values
2021-02-01 00:00:00       0
2021-02-01 01:00:00       1
2021-02-01 02:00:00       2
2021-02-01 03:00:00       3
2021-02-01 04:00:00       4
2021-02-01 05:00:00       5
2021-02-01 06:00:00       6
2021-02-01 07:00:00       7
2021-02-01 08:00:00       8
2021-02-01 09:00:00       9
2021-02-01 10:00:00      10
2021-02-01 11:00:00      11
                     values
2021-02-01 00:00:00       3
2021-02-01 04:00:00       7
2021-02-01 08:00:00      11
                     values
2021-02-01 00:00:00     1.5
2021-02-01 04:00:00     5.5
2021-02-01 08:00:00     9.5
                     values
2021-02-01 00:00:00       0
2021-02-01 04:00:00       4
2021-02-01 08:00:00       8
                     values
2021-02-01 00:00:00       6
2021-02-01 04:00:00      22
2021-02-01 08:00:00      38
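With the default settings, each '4H' bin starts at a multiple of four hours from midnight and includes its left edge, so 00:00 to 03:00 form the first bin. The following minimal sketch makes this explicit. It groups the rows by their timestamp rounded down with DatetimeIndex.floor() and gets the same sums as resample().

```python
import pandas as pd

idx = pd.date_range("2021-02-01", periods=12, freq="h")
df = pd.DataFrame({"values": range(12)}, index=idx)

# resample(): 4-hour bins [00:00, 04:00), [04:00, 08:00), [08:00, 12:00)
by_resample = df.resample("4h").sum()

# the same grouping by hand: round each timestamp down to its 4-hour boundary
by_floor = df.groupby(df.index.floor("4h")).sum()

print(by_resample["values"].tolist())  # [6, 22, 38]
print(by_floor["values"].tolist())     # [6, 22, 38]
```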

 

Summary

In this tutorial, we discussed how to use the resample() function and covered the sum(), min(), max() and mean() aggregations at monthly, weekly, quarterly, minute and hourly frequencies.

Frequency conversion (for example with asfreq()) converts data to the new frequency intervals and can fill missing data with NaN, forward filling or backward filling. Resampling provides more elaborate control.

Resampling can be either downsampling or upsampling. Downsampling converts data to a lower frequency with wider bins, for example from daily to monthly data, as in the monthly, weekly and quarterly examples above. Upsampling converts data to a higher frequency with narrower bins, as in the minute example. In both cases, the value for each new label is computed by the aggregation function you pass to pandas, rather than by simple filling.
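The following minimal sketch contrasts the two approaches on the article's hourly sample data. asfreq() only picks the value already present at each new timestamp, optionally forward-filling gaps when upsampling. resample() aggregates every row inside each bin.

```python
import pandas as pd

idx = pd.date_range("2021-02-01", periods=12, freq="h")
df = pd.DataFrame({"values": range(12)}, index=idx)

# Frequency conversion: keep only the rows at 00:00, 04:00 and 08:00
converted = df.asfreq("4h")

# Downsampling: aggregate all four rows in each 4-hour bin
downsampled = df.resample("4h").mean()

# Upsampling with frequency conversion: new 30-minute rows copy the previous value
upsampled = df.asfreq("30min", method="ffill")

print(converted["values"].tolist())    # [0, 4, 8]
print(downsampled["values"].tolist())  # [1.5, 5.5, 9.5]
print(len(upsampled))                  # 23
```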

 


 

Deepak Prasad

He is the founder of GoLinuxCloud and brings over a decade of expertise in Linux, Python, Go, Laravel, DevOps, Kubernetes, Git, Shell scripting, OpenShift, AWS, Networking, and Security. With extensive experience, he excels in various domains, from development to DevOps, Networking, and Security, ensuring robust and efficient solutions for diverse projects. You can connect with him on his LinkedIn profile.
