Introduction to pandas reset_index()
The reset_index() function in pandas resets the index of a DataFrame to the default integer index 0, 1, 2, ..., n - 1, where n is the number of rows.
Indexes in pandas are the labels for rows. They are essentially a 'name' given to each row, and they can be numbers, dates, times, or strings. The index usually identifies each row, although pandas does not require index labels to be unique.
However, during data manipulation (group-by operations, sorting, slicing, merging, concatenating, and so on), the index can end up out of order or with gaps, or the existing index may not suit a particular analysis. This is when reset_index() comes into play.
By default, reset_index() moves the index into a new column of the DataFrame and creates a new index with default integer values. This is particularly useful when the index needs to be treated as a regular column, or when the index is meaningless and should be replaced by the default before a new one is set.
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({"A": [1, 2, 3]}, index=['x', 'y', 'z'])
print(df)
# Output:
#    A
# x  1
# y  2
# z  3
# Resetting the index
df_reset = df.reset_index()
print(df_reset)
# Output:
#    index  A
# 0      x  1
# 1      y  2
# 2      z  3
In this example, reset_index() moves the index (x, y, z) into a new column named 'index' and creates a new integer index.
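The column is only called 'index' when the old index has no name of its own. The short sketch below repeats the example with a named index; the name 'label' is an arbitrary choice made for illustration and is not part of the original example.

```python
import pandas as pd

# Same data as before, but the index carries a name ('label' is hypothetical)
df = pd.DataFrame({"A": [1, 2, 3]}, index=pd.Index(["x", "y", "z"], name="label"))

# reset_index() reuses the index name for the new column
df_reset = df.reset_index()
print(df_reset.columns.tolist())
```

Here the old index values should end up in a column called 'label' instead of 'index'.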
Syntax and Parameters
The reset_index() function in pandas is used to reset the index of a DataFrame or Series. For a DataFrame, the syntax of the function is:
DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
The function takes the following parameters:
- level: Used when the DataFrame has a MultiIndex (hierarchical index). It specifies which levels to remove from the index and can be an int, str, tuple, or list. The default is None, which resets all levels.
- drop: A boolean. With drop=True, the old index is not inserted as a new column; it is simply discarded and replaced by the default integer index. This is useful when you don't want to keep the index values. The default is False.
- inplace: A boolean. With inplace=True, the function modifies the original DataFrame in place and returns None. The default is False, which returns a new DataFrame and leaves the original unchanged.
- col_level: Used when the columns are a MultiIndex. It selects the column level into which the labels of the new column(s) (created when drop=False) are inserted. The default is 0, which means the first level.
- col_fill: Also for MultiIndex columns. It determines the label given to the other column levels of the new column(s). The default is an empty string; if None is passed, the index name is repeated.
Recent pandas releases (1.5 and later) also accept the keyword arguments allow_duplicates and names.
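The parameters are easier to tell apart when they are applied to the same data. The following is a minimal sketch, not taken from the original article: the 'region'/'year' index, the 'sales' column, and the 'id'/'meta' labels are all hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical two-level row index
idx = pd.MultiIndex.from_tuples(
    [("north", 2023), ("north", 2024), ("south", 2023)],
    names=["region", "year"],
)
df = pd.DataFrame({"sales": [10, 20, 30]}, index=idx)

# level: move only 'year' into a column; 'region' stays as the index
by_region = df.reset_index(level="year")
print(by_region)

# drop=True: discard the old index instead of keeping it as columns
dropped = df.reset_index(drop=True)
print(dropped)

# col_level / col_fill: only relevant when the columns are a MultiIndex
cols = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y")])
wide = pd.DataFrame([[1, 2], [3, 4]], index=pd.Index([10, 20], name="id"), columns=cols)
placed = wide.reset_index(col_level=1, col_fill="meta")
print(placed.columns.tolist())

# inplace=True: df itself is modified and the call returns None
result = df.reset_index(inplace=True)
print(result)
print(df.columns.tolist())
```

In the col_level example, the index name 'id' is placed on the second column level and 'meta' fills the first level of that new column.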
Some Practical Examples
Example-1: Change column name
In this example, the current index is reset by calling reset_index(), and the resulting DataFrame has a column named 'index' that holds the old index values. The rename() function is then used to rename the 'index' column to 'original_index'.
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, index=['a', 'b', 'c'])
# Reset the index
df = df.reset_index()
# Rename the 'index' column to 'original_index'
df = df.rename(columns={'index': 'original_index'})
# Print the DataFrame
print(df)
#    original_index  A  B  C
# 0               a  1  4  7
# 1               b  2  5  8
# 2               c  3  6  9
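An alternative that avoids the separate rename() step is to name the index before resetting it, since reset_index() uses the index name for the new column. This is a sketch of that variation, not the approach used in the example above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, index=['a', 'b', 'c'])

# Name the index first; reset_index() then uses that name for the new column
renamed = df.rename_axis('original_index').reset_index()
print(renamed.columns.tolist())
```

Both routes should produce the same column layout; which one reads better is largely a matter of taste.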
Example-2: Start the index at 1
By default, when you reset the index of a DataFrame in pandas using the reset_index() function, the new index starts at 0. If you want the new index to start at 1, you can shift the index values after resetting the index.
In this example, the current index is reset by calling reset_index(), and then all the index values are incremented by 1.
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, index=['a', 'b', 'c'])
# Reset the index
df = df.reset_index()
# Add 1 to all index values
df.index = df.index + 1
# Print the DataFrame
print(df)
#    index  A  B  C
# 1      a  1  4  7
# 2      b  2  5  8
# 3      c  3  6  9
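If the old labels do not need to be kept as a column, a 1-based index can also be assigned directly. The sketch below assumes exactly that and uses pd.RangeIndex; it is a variation on the example above rather than a replacement for it.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Replace the labels with 1..n directly; the old labels are discarded
df.index = pd.RangeIndex(start=1, stop=len(df) + 1)
print(df.index.tolist())
```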
Example-3: Reset the index after groupby()
In pandas, when you group a DataFrame by one or more columns with groupby() and then aggregate the groups, the resulting DataFrame has the grouping columns as the index.
If you want the grouping columns back as regular columns, you can call reset_index() on the aggregated result.
Here is an example:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3],
'B': [4, 5, 6, 4, 5, 6],
'C': [7, 8, 9, 7, 8, 9]})
# Group the DataFrame by column 'A'
grouped_df = df.groupby('A').sum()
# Reset the index
grouped_df = grouped_df.reset_index()
# Print the grouped DataFrame
print(grouped_df)
#    A   B   C
# 0  1   8  14
# 1  2  10  16
# 2  3  12  18
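The same result can usually be obtained in one step by passing as_index=False to groupby(), which keeps the grouping column as a regular column. The sketch below compares the two routes on the data from this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3],
                   'B': [4, 5, 6, 4, 5, 6],
                   'C': [7, 8, 9, 7, 8, 9]})

# Route 1: aggregate, then move the group keys out of the index
via_reset = df.groupby('A').sum().reset_index()

# Route 2: ask groupby() not to use the keys as the index in the first place
via_flag = df.groupby('A', as_index=False).sum()

print(via_reset.equals(via_flag))
```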
Example-4: Reset Pandas series index
A pandas Series is a one-dimensional labeled array that can contain any data type. Like a DataFrame, a Series has an index: the label or name given to each item in the Series.
The reset_index() function can also be used to reset the index of a Series. Calling it replaces the current index with a standard 0-based integer index. Because the old index is kept as a new column named 'index', the result is a DataFrame rather than a Series, and the values go into a column named 0 unless the Series has a name.
Here is an example of using the reset_index() function for a Series.
import pandas as pd
# Create a sample Series
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
# Print the Series
print(s)
# a    1
# b    2
# c    3
# dtype: int64
# Reset the index; the result is a DataFrame, not a Series
s = s.reset_index()
# Print the resulting DataFrame
print(s)
#    index  0
# 0      a  1
# 1      b  2
# 2      c  3
In this example, the original Series has an index of ['a', 'b', 'c']. When we call reset_index(), the current index is replaced with a default integer index, starting from 0, and the old index becomes a column named 'index' in the resulting DataFrame, next to the values in a column named 0.
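Two options of Series.reset_index() change what comes back: name sets the label of the value column, and drop=True discards the old labels so the result stays a Series. A brief sketch follows; the column name 'value' is an arbitrary choice for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# name= labels the value column of the resulting DataFrame
as_frame = s.reset_index(name='value')
print(as_frame.columns.tolist())

# drop=True throws the old labels away, so the result is still a Series
as_series = s.reset_index(drop=True)
print(type(as_series).__name__)
```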
Example-5: Reset index after sort_values()
In pandas, when you sort a DataFrame or Series using the sort_values() function, the rows are reordered but each row keeps its original index label, so the index no longer runs in sequence. If you want to renumber a sorted DataFrame or Series, you can use the reset_index() function after calling sort_values().
Here is an example of using the reset_index() function on a DataFrame after sorting.
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [3, 2, 1], 'B': [6, 5, 4], 'C': [9, 8, 7]}, index=['c', 'b', 'a'])
# Sort the DataFrame by column 'A'
sorted_df = df.sort_values('A')
# Reset the index
sorted_df = sorted_df.reset_index()
# Print the sorted DataFrame
print(sorted_df)
#    index  A  B  C
# 0      a  1  4  7
# 1      b  2  5  8
# 2      c  3  6  9
In this example, the DataFrame is sorted by the column 'A' using the sort_values() function; the rows move, but they keep their original labels. The reset_index() function is then used to replace those labels with a new integer index, and the original index values are added to the DataFrame as a new column named 'index'.
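When the old labels are not needed after sorting, sort_values() can renumber the result itself through its ignore_index argument (available since pandas 1.0). The sketch below, on a reduced version of the data above, shows that this should match sorting followed by reset_index(drop=True).

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1], 'B': [6, 5, 4]}, index=['c', 'b', 'a'])

# Plain sort: rows are reordered and carry their labels with them
sorted_only = df.sort_values('A')
print(sorted_only.index.tolist())

# ignore_index=True: the sorted result gets a fresh 0..n-1 index
renumbered = df.sort_values('A', ignore_index=True)
print(renumbered.index.tolist())
```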
Example-6: Reset index after filter
In pandas, when you filter a DataFrame using query() or boolean indexing, the result keeps the original index labels of the rows that match the filter condition, so the index may have gaps.
If you want to renumber the filtered DataFrame, you can use the reset_index() function after applying the filter.
Here is an example of using the reset_index() function on a DataFrame after filtering with the query() function.
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, index=['a', 'b', 'c'])
# Filter the DataFrame
filtered_df = df.query("A > 1")
# Reset the index
filtered_df = filtered_df.reset_index()
# Print the filtered DataFrame
print(filtered_df)
#    index  A  B  C
# 0      b  2  5  8
# 1      c  3  6  9
In this example, the query() function selects only the rows where A > 1, and those rows keep their original labels ('b' and 'c'). The reset_index() function is then used to reset the index, and the original index values are added to the DataFrame as a new column named 'index'.
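The gaps left by filtering are easiest to see with a default integer index. The following sketch uses boolean indexing instead of query() and passes drop=True, on the assumption that the old row numbers are not worth keeping as a column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})  # default index 0..3

# Boolean indexing keeps the labels of the surviving rows
filtered = df[df['A'] > 2]
print(filtered.index.tolist())

# drop=True renumbers the rows without adding an 'index' column
clean = filtered.reset_index(drop=True)
print(clean.index.tolist())
```

After the reset, label-based access such as clean.loc[0] refers to the first remaining row again.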
Summary
Pandas' reset_index() function is used to reset the index of a DataFrame or Series. Calling it replaces the current index with a standard 0-based integer index and, unless you pass drop=True, adds the old index as a new column (named 'index' when the index has no name).
It is useful when the current index is no longer needed or a fresh one is required. For example, sorting reorders the existing labels, and a grouped aggregation turns the grouping keys into the index. In both cases, reset_index() restores the default integer index and can keep the old labels as a column.
It is also used after filtering a DataFrame. When you filter using the query() function or boolean indexing, the result keeps the original index labels of the matching rows, so the index may have gaps. Resetting the index makes it start from 0 again and run without gaps.
A default integer index also lets you refer to rows by their position in the current order, which keeps the DataFrame readable and consistent.
In summary, the reset_index() function gives a DataFrame the default integer index starting from 0 and, optionally, keeps the old index as a new column. It is typically used after sorting, grouping, filtering, and other operations that change or disorder the index.