Pandas rename column using DataFrame.rename() function

Pandas is an open source Python library for data analysis. It gives Python the ability to work with spreadsheet-like data for fast data loading, manipulating, aligning, and merging, among other functions.

To give Python these enhanced features, Pandas introduces two new data types to Python:

  • Series: It represents a single column of a DataFrame
  • DataFrame: It represents your entire spreadsheet or rectangular data

A Pandas DataFrame can also be thought of as a dictionary or collection of Series objects.
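The dictionary analogy can be sketched as follows. This is a minimal illustration with made-up sample values (the variable names company, car and selected are assumptions for this example, not part of the tutorial's dataset file):

```python
import pandas as pd

# Two Series objects that share the same default index (0, 1, 2)
company = pd.Series(['Mahindra', 'Tata', 'Hyundai'])
car = pd.Series(['XUV300', 'Nexon', 'Creta'])

# A DataFrame built from a dictionary of Series;
# each dictionary key becomes a column label
df = pd.DataFrame({'Company': company, 'Car': car})

# Selecting a single column returns a Series again
selected = df['Car']
print(type(selected).__name__)
print(df.columns.tolist())
```

Selecting a column with df['Car'] behaves much like a dictionary lookup, which is why the comparison is a useful mental model.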

 

Install Python Pandas Module

The pandas module may not be installed by default on your Linux environment. In that case, importing it fails with:

ModuleNotFoundError: No module named 'pandas'

So you must first install this module. Since I am using RHEL 8, I will use dnf

# dnf -y install python3-pandas.x86_64

HINT: On a RHEL 8 environment, I was getting the error nothing provides libqhull.so.7()(64bit) needed by python3-matplotlib-3.0.3-3.el8.x86_64 while installing python3-pandas. To overcome this, enable the "codeready-builder" repo using subscription-manager repos --enable "codeready-builder-for-rhel-8-{ARCH}-rpms". Here, replace {ARCH} with your architecture value.

 

Loading your dataset

  • When given a data set, we first load it and begin looking at its structure and contents. The simplest way of looking at a data set is to examine and subset specific rows and columns
  • Since Pandas is not part of the Python standard library, we have to first tell Python to load (import) the library.
  • When working with Pandas functions, it is common practice to give pandas the alias pd
import pandas as pd
  • With the library loaded, we can use the read_csv function to load a CSV data file. To access the read_csv function from Pandas, we use dot notation.
  • I have created a sample comma-separated CSV file (cars.csv) for this tutorial. By default, the read_csv function reads a comma-separated file:
#!/usr/bin/env python3

import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable 
df = pd.read_csv('/root/cars.csv')

# Print the content of cars.csv
print(df)

Output from this script:

# python3 /tmp/dataframe_ex.py
    Company       Car Country       State   Owner
0  Mahindra    XUV300   India   Karnataka  Deepak
1      Tata     Nexon   India  Tamil Nadu    Amit
2   Hyundai     Creta   India   New Delhi   Rahul
3    Maruti    Brezza   India       Bihar  Saahil
4      Ford  Ecosport   India      Kerela    Anup
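If you do not have the cars.csv file at hand, the same dataset can be loaded from an in-memory string. This is only a sketch: the csv_text variable is an assumption made for this example, with its rows copied from the output shown above, and io.StringIO stands in for the file on disk.

```python
import io
import pandas as pd

# Sample data copied from the output of the script above
csv_text = """Company,Car,Country,State,Owner
Mahindra,XUV300,India,Karnataka,Deepak
Tata,Nexon,India,Tamil Nadu,Amit
Hyundai,Creta,India,New Delhi,Rahul
Maruti,Brezza,India,Bihar,Saahil
Ford,Ecosport,India,Kerela,Anup
"""

# read_csv accepts a file-like object as well as a file path
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.columns.tolist())
```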

 

pandas.DataFrame.rename

  • The DataFrame rename() method accepts dictionaries that map each old label to its new label.
  • We can use this method to rename a single column or multiple columns of a pandas DataFrame.
  • The rename() method has an inplace keyword parameter that is False by default. In that case the method copies the underlying data and returns a new DataFrame, leaving the original unchanged.
  • We can pass inplace=True to rename the labels of the existing DataFrame in place.
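The difference between the two inplace settings can be seen in a small sketch. The two-row DataFrame below is a made-up subset of the sample data, used only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Mahindra', 'Tata'], 'Car': ['XUV300', 'Nexon']})

# Default (inplace=False): a new DataFrame is returned and df is unchanged
renamed = df.rename(columns={'Company': 'Brand'})
print(df.columns.tolist())
print(renamed.columns.tolist())

# inplace=True: df itself is modified and the call returns None
result = df.rename(columns={'Company': 'Brand'}, inplace=True)
print(result)
print(df.columns.tolist())
```

Because the in-place call returns None, writing df = df.rename(..., inplace=True) would overwrite the DataFrame with None, which is a common mistake.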

Syntax:

DataFrame.rename(self, mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')

The values shown in the syntax are the defaults, used unless you provide others. The method returns a DataFrame with the renamed axis labels, or None when inplace=True.

Parameters:
mapper: dict-like or function
Dict-like or functions transformations to apply to that axis’ values. Use either mapper and axis to specify the axis to target with mapper, or index and columns.

index: dict-like or function
Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper).

columns: dict-like or function
Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper).

axis: int or str
Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). The default is ‘index’.

copy: bool, default True
Also copy underlying data.

inplace: bool, default False
Whether to modify the DataFrame in place instead of returning a new one. If True, the value of copy is ignored.

level: int or level name, default None
In case of a MultiIndex, only rename labels in the specified level.

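errors: {‘ignore’, ‘raise’}, default ‘ignore’
If ‘raise’, a KeyError is raised when a dict-like mapper, index, or columns contains labels that are not present in the axis being transformed. If ‘ignore’, existing keys are renamed and extra keys are ignored.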
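The effect of the errors parameter can be sketched as below. The label 'Model' is a deliberately non-existent column chosen for this illustration, and the example assumes a pandas version that supports the errors parameter shown in the syntax above:

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Mahindra'], 'Car': ['XUV300']})

# errors='ignore' (the default): the unknown label 'Model' is silently skipped
same = df.rename(columns={'Model': 'SUV'})
print(same.columns.tolist())

# errors='raise': a label missing from the columns raises KeyError
try:
    df.rename(columns={'Model': 'SUV'}, errors='raise')
    raised = False
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```

Passing errors='raise' can help catch typos in column names that would otherwise go unnoticed.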

 

Method 1: Using column label

Rename single column

The syntax to rename a single column of the DataFrame would be:

pandas.DataFrame.rename(columns = {'<old_value>':'<new_value>'}, inplace = True/False)

In this example we will rename the column "Company" to "Brand":

#!/usr/bin/env python3

import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Rename Single Column
df.rename(columns = {'Company':'Brand'}, inplace = True)

# print the output
print(df)

Execute the script and you can check that our inplace rename for column was successful:

# python3 /tmp/dataframe_ex.py
      Brand       Car Country       State   Owner
0  Mahindra    XUV300   India   Karnataka  Deepak
1      Tata     Nexon   India  Tamil Nadu    Amit
2   Hyundai     Creta   India   New Delhi   Rahul
3    Maruti    Brezza   India       Bihar  Saahil
4      Ford  Ecosport   India      Kerela    Anup

 

Rename multiple columns

To rename multiple columns in place, we will use the syntax below:

pandas.DataFrame.rename(columns = {'<old_value_1>':'<new_value_1>', '<old_value_2>':'<new_value_2>'}, inplace = True/False)

We just need to add one '<old_value>':'<new_value>' pair inside the curly braces of columns for each column we want to rename.
In this example we rename the column Company to Brand and Car to SUV:

#!/usr/bin/env python3

import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Rename Multiple Columns
df.rename(columns = {'Company':'Brand', 'Car':'SUV'}, inplace = True)

# print the output
print(df)

Output from this script confirms the replacement was successful:

# python3 /tmp/dataframe_ex.py
      Brand       SUV Country       State   Owner
0  Mahindra    XUV300   India   Karnataka  Deepak
1      Tata     Nexon   India  Tamil Nadu    Amit
2   Hyundai     Creta   India   New Delhi   Rahul
3    Maruti    Brezza   India       Bihar  Saahil
4      Ford  Ecosport   India      Kerela    Anup

 

Method 2: Using axis-style

DataFrame.rename() also supports an “axis-style” calling convention, where you specify a single mapper and the axis to apply that mapping to.

The syntax to use this method would be:

pandas.DataFrame.rename({'<old_value_1>':'<new_value_1>', '<old_value_2>':'<new_value_2>'}, axis='columns', inplace = True/False)

In this sample Python script I will rename two columns: Company to Brand and Car to SUV.

#!/usr/bin/env python3

import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Rename Multiple Columns
df.rename({'Company': 'Brand', 'Car':'SUV'}, axis='columns',  inplace = True)

# print the output
print(df)

Output from this script:

# python3 /tmp/dataframe_ex.py
      Brand       SUV Country       State   Owner
0  Mahindra    XUV300   India   Karnataka  Deepak
1      Tata     Nexon   India  Tamil Nadu    Amit
2   Hyundai     Creta   India   New Delhi   Rahul
3    Maruti    Brezza   India       Bihar  Saahil
4      Ford  Ecosport   India      Kerela    Anup
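As a rough check, the axis-style call and the columns keyword can be compared directly. The one-row DataFrame and the mapping variable are assumptions made for this sketch:

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Mahindra'], 'Car': ['XUV300']})
mapping = {'Company': 'Brand', 'Car': 'SUV'}

# Three ways to target the column labels
by_keyword = df.rename(columns=mapping)
by_axis_name = df.rename(mapping, axis='columns')
by_axis_number = df.rename(mapping, axis=1)

print(by_keyword.equals(by_axis_name))
print(by_keyword.equals(by_axis_number))

# Without axis, the mapper targets the index (row labels),
# so the column labels stay as they were
by_default = df.rename(mapping)
print(by_default.columns.tolist())
```

The last call shows why axis='columns' matters in the axis-style form: leaving it out applies the mapping to the row labels instead.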

 

Some more examples:

In this example we will change all column headers to uppercase by passing str.upper to the rename() function:

#!/usr/bin/env python3

import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Convert all column labels to uppercase
df.rename(str.upper, axis='columns', inplace = True)

# print the output
print(df)

Output from this script:

# python3 /tmp/dataframe_ex.py
    COMPANY       CAR COUNTRY       STATE   OWNER
0  Mahindra    XUV300   India   Karnataka  Deepak
1      Tata     Nexon   India  Tamil Nadu    Amit
2   Hyundai     Creta   India   New Delhi   Rahul
3    Maruti    Brezza   India       Bihar  Saahil
4      Ford  Ecosport   India      Kerela    Anup

 

Similarly, you can use str.lower to transform the column headers to lowercase.
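A short sketch of the lowercase variant, together with a custom function, might look like this. The 'col_' prefix is an arbitrary choice made for illustration only:

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Mahindra'], 'Car': ['XUV300']})

# str.lower is applied to every column label
lowered = df.rename(str.lower, axis='columns')
print(lowered.columns.tolist())

# Any callable that takes a label and returns a new label can be used
prefixed = df.rename(columns=lambda label: 'col_' + label.lower())
print(prefixed.columns.tolist())
```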

 

Rename columns using read_csv with names

The names parameter of the read_csv function defines the column names. If you pass more names than the file has columns, pandas adds an extra column for each additional name, filled with missing values (NaN).
Use header = 0 so that the existing header row of the file is replaced by the new names instead of being read as data.

In this example we define a new list, new_columns, that stores the new column names. Make sure the length of the new list is the same as the number of columns in your input CSV.

#!/usr/bin/env python3

import pandas as pd

# Define new list with the new column names
# The length of the new list must be same as existing column length
new_columns = ['Brand', 'SUV', 'Country', 'State', 'Owner']

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv', names = new_columns, header = 0 )

# print the output
print(df)

Output from this script:

# python3 /tmp/dataframe_ex.py
      Brand       SUV Country       State   Owner
0  Mahindra    XUV300   India   Karnataka  Deepak
1      Tata     Nexon   India  Tamil Nadu    Amit
2   Hyundai     Creta   India   New Delhi   Rahul
3    Maruti    Brezza   India       Bihar  Saahil
4      Ford  Ecosport   India      Kerela    Anup
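To see what header = 0 changes, the sketch below reads the same data with and without it. The two-column csv_text string is a made-up stand-in for cars.csv:

```python
import io
import pandas as pd

csv_text = "Company,Car\nMahindra,XUV300\nTata,Nexon\n"
new_columns = ['Brand', 'SUV']

# header=0: the first line of the file is treated as the header and replaced
replaced = pd.read_csv(io.StringIO(csv_text), names=new_columns, header=0)
print(len(replaced))

# No header argument: when names is given, the first line is read as a data row
kept = pd.read_csv(io.StringIO(csv_text), names=new_columns)
print(len(kept))
print(kept.iloc[0].tolist())
```

Without header = 0, the old header line ends up as the first row of data, which is rarely what you want when renaming.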

 

Re-assign column attributes using tolist()

  • tolist() converts the column Index (or a Series) to a Python list.
  • It is possible to assign a Python list directly to the columns attribute.
  • This assignment works only when the list has the same number of elements as there are column labels.
  • The following code calls the tolist method on the columns Index object to create a Python list of labels.
  • It then modifies a couple of values in the list and reassigns the list to the columns attribute:
#!/usr/bin/env python3

import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')
# Store the column labels and convert them to a list
columns = df.columns
column_list = columns.tolist()

# Rename column labels with list assignments
column_list[0] = 'Brand'
column_list[1] = 'SUV'

# Reassign the new column values
df.columns = column_list

# print the output
print(df)

The output from this script:

# python3 /tmp/dataframe_ex.py
      Brand       SUV Country       State   Owner
0  Mahindra    XUV300   India   Karnataka  Deepak
1      Tata     Nexon   India  Tamil Nadu    Amit
2   Hyundai     Creta   India   New Delhi   Rahul
3    Maruti    Brezza   India       Bihar  Saahil
4      Ford  Ecosport   India      Kerela    Anup
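The reason for going through a list is that a pandas Index cannot be modified element by element. A small sketch on a made-up one-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Mahindra'], 'Car': ['XUV300']})

# Assigning to a single position of the Index is not allowed
try:
    df.columns[0] = 'Brand'
    index_error = None
except TypeError as exc:
    index_error = exc
    print('TypeError:', exc)

# Converting to a list first gives a mutable copy of the labels
column_list = df.columns.tolist()
column_list[0] = 'Brand'
df.columns = column_list
print(df.columns.tolist())
```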

 

Define new Column List using Pandas DataFrame

I would not call this a rename; instead, you define a new column list and replace the existing one using the columns attribute of the DataFrame object. Make sure the length of the new column list is the same as the one you are replacing. This is similar to what we did earlier using read_csv with the names parameter.

#!/usr/bin/env python3

import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Print the BEFORE content
print(df.columns)

# Rename the Column List
df.columns = ['Brand', 'SUV', 'Country', 'State', 'Owner']

# Print the AFTER content
print(df.columns)

The output from this script:

# python3 /tmp/dataframe_ex.py
Index(['Company', 'Car', 'Country', 'State', 'Owner'], dtype='object')
Index(['Brand', 'SUV', 'Country', 'State', 'Owner'], dtype='object')
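The length requirement can be demonstrated with a deliberately short list. In this sketch the three-column DataFrame is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Mahindra'], 'Car': ['XUV300'], 'Country': ['India']})

# Two labels for three columns: pandas rejects the assignment
try:
    df.columns = ['Brand', 'SUV']
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print('ValueError:', exc)

# The original labels are still in place after the failed assignment
print(df.columns.tolist())
```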

 

Conclusion

In this tutorial we learned different methods to rename columns using the pandas DataFrame.rename() function and related techniques. I have used CSV and read_csv for all the examples, but you can also load any suitable type of object into a DataFrame using pd.DataFrame(var) and then rename its columns. You can also rename row labels through the index, which we will learn in a separate chapter.
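As a minimal sketch of the pd.DataFrame(var) route, the example below builds a DataFrame from a list of dictionaries and then renames its columns. The records variable is an assumption made for this illustration:

```python
import pandas as pd

# A plain Python object: a list of dictionaries, one per row
records = [
    {'Company': 'Mahindra', 'Car': 'XUV300'},
    {'Company': 'Tata', 'Car': 'Nexon'},
]

# Build the DataFrame, then rename the columns as in the earlier examples
df = pd.DataFrame(records)
df = df.rename(columns={'Company': 'Brand', 'Car': 'SUV'})
print(df.columns.tolist())
```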

Lastly, I hope this Python tutorial on renaming columns using a pandas DataFrame was helpful. So, let me know your suggestions and feedback using the comment section.

 

References

I have used below external references for this tutorial guide
DataFrame.rename()
