Pandas is an open source Python library for data analysis. It gives Python the ability to work with spreadsheet-like data for fast data loading, manipulating, aligning, and merging, among other functions.
To give Python these enhanced features, Pandas introduces two new data types to Python:
- Series: It represents a single column of the DataFrame
- DataFrame: It represents your entire spreadsheet or rectangular data
A Pandas DataFrame can also be thought of as a dictionary or collection of Series objects.
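As a minimal sketch of this idea, the snippet below builds a small DataFrame from a dictionary of lists and pulls one column back out as a Series. The two rows are a subset of the sample values used later in this tutorial, and the variable names are only illustrative.

```python
import pandas as pd

# Illustrative data: each dictionary key becomes a column label
data = {
    'Company': ['Mahindra', 'Tata'],
    'Car': ['XUV300', 'Nexon'],
}
df = pd.DataFrame(data)

# Selecting a single column returns a Series
company = df['Company']
print(type(df).__name__)       # DataFrame
print(type(company).__name__)  # Series
print(company.tolist())        # ['Mahindra', 'Tata']
```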
Install Python Pandas Module
The pandas module may not be installed by default in your Linux environment, in which case importing it fails with:
ModuleNotFoundError: No module named 'pandas'
So you must first install this module. Since I am using RHEL 8, I will use dnf
# dnf -y install python3-pandas.x86_64
You may see the following error while installing python3-pandas:
libqhull.so.7()(64bit) needed by python3-matplotlib-3.0.3-3.el8.x86_64
To overcome this, enable the "codeready-builder" repo using subscription-manager repos --enable "codeready-builder-for-rhel-8-{ARCH}-rpms". Here, replace {ARCH} with your architecture value.
Alternatively, you can install pip3 and then use pip to install the pandas module:
# dnf -y install python3-pip
Now you can use pip3 to install the pandas module:
# pip3 install pandas
Loading your dataset
- When given a data set, we first load it and begin looking at its structure and contents. The simplest way of looking at a data set is to examine and subset specific rows and columns
- Since Pandas is not part of the Python standard library, we have to first tell Python to load (import) the library.
- When working with Pandas functions, it is common practice to give pandas the alias pd
import pandas as pd
- With the library loaded, we can use the read_csv function to load a CSV data file. To access the read_csv function from Pandas, we use dot notation.
- I have created a sample CSV file (cars.csv) for this tutorial, separated by the comma character. By default the read_csv function reads a comma-separated file:
#!/usr/bin/env python3
import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Print the content of cars.csv
print(df)
Output from this script:
# python3 /tmp/dataframe_ex.py
Company Car Country State Owner
0 Mahindra XUV300 India Karnataka Deepak
1 Tata Nexon India Tamil Nadu Amit
2 Hyundai Creta India New Delhi Rahul
3 Maruti Brezza India Bihar Saahil
4 Ford Ecosport India Kerela Anup
pandas.DataFrame.rename
- The rename DataFrame method accepts dictionaries that map the old value to the new value.
- We can use this method to rename a single column or multiple columns in a pandas DataFrame
- The rename() method provides an inplace named parameter that is False by default, so the method copies the underlying data and returns a new DataFrame.
- We can pass inplace=True to rename the data in place.
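The difference between the two settings can be sketched as follows. This is a small illustration on an in-memory DataFrame (the one-row data is made up for the example), not part of the cars.csv scripts used below.

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Tata'], 'Car': ['Nexon']})

# Default (inplace=False): a new DataFrame is returned, df is unchanged
renamed = df.rename(columns={'Company': 'Brand'})
print(list(df.columns))       # ['Company', 'Car']
print(list(renamed.columns))  # ['Brand', 'Car']

# inplace=True: df itself is modified and the call returns None
result = df.rename(columns={'Company': 'Brand'}, inplace=True)
print(result)                 # None
print(list(df.columns))       # ['Brand', 'Car']
```

Because the in-place call returns None, assigning its result back to df would discard the DataFrame.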
Syntax:
DataFrame.rename(self, mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None, errors='ignore')
The values shown in the syntax are the defaults, used unless you provide others. The method returns a DataFrame with the renamed axis labels.
Parameters:
mapper: dict-like or function
Dict-like or functions transformations to apply to that axis’ values. Use either mapper and axis to specify the axis to target with mapper, or index and columns.
index: dict-like or function
Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper).
columns: dict-like or function
Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper).
axis: int or str
Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). The default is ‘index’.
copy: bool, default True
Also copy underlying data.
inplace: bool, default False
Whether to return a new DataFrame. If True then value of copy is ignored.
level: int or level name, default None
In case of a MultiIndex, only rename labels in the specified level.
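errors: {‘ignore’, ‘raise’}, default ‘ignore’
If ‘raise’, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the Index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.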
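To see what the errors parameter changes in practice, here is a short sketch. The column name 'Model' is a deliberately non-existent label chosen for illustration, and the one-row DataFrame is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Tata'], 'Car': ['Nexon']})

# 'Model' is not an existing column; with the default errors='ignore'
# the unknown key is skipped silently
unchanged = df.rename(columns={'Model': 'SUV'})
print(list(unchanged.columns))  # ['Company', 'Car']

# With errors='raise' the same call raises a KeyError
try:
    df.rename(columns={'Model': 'SUV'}, errors='raise')
    raised = False
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```

Passing errors='raise' is a cheap way to catch a misspelled column name that would otherwise go unnoticed.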
Method 1: Using column label
Pandas rename single column
The syntax to rename a single column in the same DataFrame is:
pandas.DataFrame.rename(columns = {'<old_value>':'<new_value>'}, inplace = True/False)
In this example we will rename the column "Company" to "Brand":
#!/usr/bin/env python3
import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Rename Single Column
df.rename(columns = {'Company':'Brand'}, inplace = True)

# print the output
print(df)
Execute the script to confirm that the in-place column rename was successful:
# python3 /tmp/dataframe_ex.py
Brand Car Country State Owner
0 Mahindra XUV300 India Karnataka Deepak
1 Tata Nexon India Tamil Nadu Amit
2 Hyundai Creta India New Delhi Rahul
3 Maruti Brezza India Bihar Saahil
4 Ford Ecosport India Kerela Anup
Pandas rename multiple columns
To rename multiple columns in place, use the syntax below:
pandas.DataFrame.rename(columns = {'<old_value_1>':'<new_value_1>', '<old_value_2>':'<new_value_2>'}, inplace = True/False)
We just need to add more '<old_value>':'<new_value>' pairs within the curly braces of columns to rename several columns at once.
In this example we replace column name Company with Brand and Car with SUV:
#!/usr/bin/env python3
import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Rename Multiple Columns
df.rename(columns = {'Company':'Brand', 'Car':'SUV'}, inplace = True)

# print the output
print(df)
Output from this script confirms the replacement was successful:
# python3 /tmp/dataframe_ex.py
Brand SUV Country State Owner
0 Mahindra XUV300 India Karnataka Deepak
1 Tata Nexon India Tamil Nadu Amit
2 Hyundai Creta India New Delhi Rahul
3 Maruti Brezza India Bihar Saahil
4 Ford Ecosport India Kerela Anup
Method 2: Using axis-style
DataFrame.rename() also supports an “axis-style” calling convention, where you specify a single mapper and the axis to apply that mapping to.
The syntax to use this method would be:
pandas.DataFrame.rename({'<old_value_1>':'<new_value_1>', '<old_value_2>':'<new_value_2>'}, axis='columns', inplace = True/False)
In this sample Python script I will rename two columns: Company to Brand and Car to SUV.
#!/usr/bin/env python3
import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Rename Multiple Columns
df.rename({'Company': 'Brand', 'Car':'SUV'}, axis='columns', inplace = True)

# print the output
print(df)
Output from this script:
# python3 /tmp/dataframe_ex.py
Brand SUV Country State Owner
0 Mahindra XUV300 India Karnataka Deepak
1 Tata Nexon India Tamil Nadu Amit
2 Hyundai Creta India New Delhi Rahul
3 Maruti Brezza India Bihar Saahil
4 Ford Ecosport India Kerela Anup
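The two calling conventions are interchangeable. The sketch below, which uses a small in-memory DataFrame made up for illustration, checks that columns=mapper, axis='columns' and axis=1 give the same result, and shows that axis='index' applies the mapper to row labels instead.

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Tata', 'Ford'], 'Car': ['Nexon', 'Ecosport']})
mapping = {'Company': 'Brand', 'Car': 'SUV'}

by_keyword = df.rename(columns=mapping)
by_axis = df.rename(mapping, axis='columns')
by_number = df.rename(mapping, axis=1)

print(by_keyword.equals(by_axis))  # True
print(by_axis.equals(by_number))   # True

# With axis='index' the mapper is applied to the row labels instead
rows = df.rename({0: 'first', 1: 'second'}, axis='index')
print(list(rows.index))            # ['first', 'second']
print(list(rows.columns))          # ['Company', 'Car']
```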
Some more examples:
In this example we will change all headers to uppercase characters using str.upper with the rename() function:
#!/usr/bin/env python3
import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Convert all column headers to uppercase
df.rename(str.upper, axis='columns', inplace = True)

# print the output
print(df)
Output from this script:
# python3 /tmp/dataframe_ex.py
COMPANY CAR COUNTRY STATE OWNER
0 Mahindra XUV300 India Karnataka Deepak
1 Tata Nexon India Tamil Nadu Amit
2 Hyundai Creta India New Delhi Rahul
3 Maruti Brezza India Bihar Saahil
4 Ford Ecosport India Kerela Anup
Similarly, you can use str.lower to transform the column headers to lowercase.
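A minimal sketch of the lowercase variant follows, using an in-memory DataFrame made up for illustration. Since the mapper can be any function that takes a label and returns a new one, the second call uses a hypothetical lambda that adds a prefix.

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Tata'], 'Car': ['Nexon']})

# str.lower is called once per column label
lower = df.rename(str.lower, axis='columns')
print(list(lower.columns))     # ['company', 'car']

# Any callable taking a label and returning a new label works
prefixed = df.rename(lambda label: 'car_' + label.lower(), axis='columns')
print(list(prefixed.columns))  # ['car_company', 'car_car']
```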
Pandas rename columns using read_csv with names
The names parameter of the read_csv function defines the column names. If you pass more names than the file has columns, an extra column with that name is added and filled with missing (NaN) values.
Use header=0 so that the existing header row in the file is replaced by the new names instead of being read as data.
In this example we define a new list, new_columns, that stores the new column names. Make sure the length of the new list matches the number of columns in your input CSV.
#!/usr/bin/env python3
import pandas as pd

# Define new list with the new column names
# The length of the new list must be same as existing column length
new_columns = ['Brand', 'SUV', 'Country', 'State', 'Owner']

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv', names = new_columns, header = 0 )

# print the output
print(df)
Output from this script:
# python3 /tmp/dataframe_ex.py
Brand SUV Country State Owner
0 Mahindra XUV300 India Karnataka Deepak
1 Tata Nexon India Tamil Nadu Amit
2 Hyundai Creta India New Delhi Rahul
3 Maruti Brezza India Bihar Saahil
4 Ford Ecosport India Kerela Anup
Re-assign column attributes using tolist()
tolist() converts a Series or Index object to a Python list.
- It is possible to reassign the columns attribute directly to a Python list.
- This assignment works only when the list has the same number of elements as the existing column labels.
- The following code uses the tolist method on the columns Index object to create a Python list of labels.
- It then modifies a couple of values in the list and reassigns the list to the columns attribute:
#!/usr/bin/env python3
import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Store the column header and converts to list
columns = df.columns
column_list = columns.tolist()

# Rename column labels with list assignments
column_list[0] = 'Brand'
column_list[1] = 'SUV'

# Reassign the new column values
df.columns = column_list

# print the output
print(df)
The output from this script:
# python3 /tmp/dataframe_ex.py
Brand SUV Country State Owner
0 Mahindra XUV300 India Karnataka Deepak
1 Tata Nexon India Tamil Nadu Amit
2 Hyundai Creta India New Delhi Rahul
3 Maruti Brezza India Bihar Saahil
4 Ford Ecosport India Kerela Anup
Define new Column List using Pandas DataFrame
Strictly speaking this is not a rename: you define a new column list and replace the existing one using the columns attribute of the DataFrame object. Make sure the new column list has the same length as the one you are replacing. This is similar to what we did earlier using read_csv with the names parameter.
#!/usr/bin/env python3
import pandas as pd

# Pass the filename to the dataset to read_csv
# and assign the resulting DataFrame to variable
df = pd.read_csv('/root/cars.csv')

# Print the BEFORE content
print(df.columns)

# Rename the Column List
df.columns = ['Brand', 'SUV', 'Country', 'State', 'Owner']

# Print the AFTER content
print(df.columns)
The output from this script:
# python3 /tmp/dataframe_ex.py
Index(['Company', 'Car', 'Country', 'State', 'Owner'], dtype='object')
Index(['Brand', 'SUV', 'Country', 'State', 'Owner'], dtype='object')
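The length requirement is enforced by pandas itself. As a small sketch on an in-memory DataFrame (made up for illustration), assigning a list of the wrong length raises a ValueError:

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Tata'], 'Car': ['Nexon']})

try:
    # Two columns exist, but only one new label is supplied
    df.columns = ['Brand']
    mismatch_detected = False
except ValueError as exc:
    mismatch_detected = True
    print('ValueError:', exc)

# The failed assignment leaves the original labels in place
print(list(df.columns))  # ['Company', 'Car']
```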
Conclusion
In this tutorial we learned different methods to rename columns in a Pandas DataFrame. I have used a CSV file and read_csv for all the examples, but you can also convert other types of objects to a DataFrame using pd.DataFrame(var) and then rename its columns in the same way. You can also replace row labels using the index property, which we will cover in a separate chapter.
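For instance, here is a minimal sketch that skips the CSV file entirely. The list of dictionaries named records is a hypothetical stand-in for var, with values borrowed from the sample data.

```python
import pandas as pd

# Illustrative records; each dictionary becomes one row
records = [
    {'Company': 'Tata', 'Car': 'Nexon'},
    {'Company': 'Ford', 'Car': 'Ecosport'},
]
df = pd.DataFrame(records)

# The same rename() call works regardless of how the DataFrame was built
df = df.rename(columns={'Company': 'Brand', 'Car': 'SUV'})
print(list(df.columns))     # ['Brand', 'SUV']
print(df['Brand'].tolist())  # ['Tata', 'Ford']
```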
Lastly, I hope this Python tutorial on renaming columns in a Pandas DataFrame was helpful. Let me know your suggestions and feedback using the comment section.