Introduction to Pandas set index method
Pandas is one of the most popular and widely downloaded Python libraries, well known for its powerful data-handling features. In this tutorial we will learn about the pandas set_index() method, which sets one or more existing columns, or an array of matching length such as a Series or Index, as the index of a DataFrame. We will cover the basic syntax and walk through different scenarios with examples, including several ways of setting an index. By the end of this tutorial, you will have a solid working knowledge of the pandas set_index() method.
Install Python Pandas Module
By default the pandas module may not be installed on your distro, so you may get the following error while trying to execute Python code that uses pandas:
ModuleNotFoundError: No module named 'pandas'
So you must first install this module. On RHEL/Rocky Linux/CentOS/Fedora distributions you can use the dnf package manager, while on Debian/Ubuntu you can use apt:
# dnf -y install python3-pandas.x86_64
You may get an error reporting that libqhull.so.7()(64bit) is needed by python3-matplotlib-3.0.3-3.el8.x86_64 while installing python3-pandas. To overcome this, enable the "codeready-builder" repo using subscription-manager repos --enable "codeready-builder-for-rhel-8-{ARCH}-rpms". Here, replace {ARCH} with your architecture value.
Alternatively, you can install pip3 and then use pip to install the pandas module:
# dnf -y install python3-pip
Now you can use pip3 to install the pandas module:
# pip3 install pandas
Getting started with pandas set index method
As mentioned in the section above, the pandas set_index() method sets the index of a DataFrame using one or more existing columns (or arrays of matching length, such as a Series or Index). It takes keys, drop, append, inplace, and verify_integrity as parameters and returns a new DataFrame with the new index, unless inplace is True. Here is the basic syntax of the pandas set_index() method.
name_of_dataframe.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters and their uses:
keys: A column label or a list of column labels. Arrays such as a Series or Index are also accepted.
drop: A Boolean value. If True, the columns used for the index are dropped from the DataFrame's columns.
append: If True, the columns are appended to the existing index instead of replacing it.
inplace: If True, the changes are made to the DataFrame itself instead of returning a new one.
verify_integrity: If True, the new index is checked for duplicates.
Let us say we have the following data in the dataframe.csv file.
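The append and verify_integrity parameters are not exercised by the examples in this tutorial, so here is a minimal sketch of what they do. The small frame and its column names (team, player, score) are made up for illustration and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical frame for illustration only (not the tutorial's CSV data)
df = pd.DataFrame(
    {"team": ["a", "a", "b"], "player": ["x", "y", "z"], "score": [1, 2, 3]}
)

# append=True keeps the existing 0..2 index and adds "player" as a second level
appended = df.set_index("player", append=True)
print(appended.index.nlevels)

# verify_integrity=True raises ValueError because "team" contains a duplicate ("a")
try:
    df.set_index("team", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
print(type(duplicate_error).__name__)
```

With the default verify_integrity=False, the same call on "team" would succeed and simply produce an index with repeated labels.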
student_id,Name,Gender,marks
5,Erlan,Male,34
10,Alex,Male,44
15,soro,Female,46
20,Khan,Male,33
25,ateeq,Male,28
30,MD,Female,49
In the following section, we will use this dataset to apply the pandas set index method.
Pandas set index using column
Setting an index using columns in pandas is straightforward. We can set a single column or multiple columns as the index of a dataframe by passing a column label, or a list of column labels, to the set_index() method.
See the following example, which sets a single column as the index.
# importing pandas
import pandas as pd
# reading csv file using pandas
my_dataframe = pd.read_csv("dataframe.csv")
# converting data into pandas dataframe
dataframe = pd.DataFrame(my_dataframe)
# printing dataframe before setting index
print("Before setting index\n{}".format(dataframe))
# set index using column
Student_name = dataframe.set_index('Name')
# printing dataframe after setting index
print("\nAfter setting index\n{}".format(Student_name))
In the example above, notice that we have set the name of students as an index. In a similar way, we can set multiple columns as indexes by passing a list of column labels.
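As a quick check of what the single-column call produces, the sketch below inlines the same rows with io.StringIO (an assumption made so the snippet runs without dataframe.csv on disk) and inspects the result. It also shows that the original dataframe keeps its Name column, since set_index() returns a new object by default.

```python
import io
import pandas as pd

# Same rows as dataframe.csv, inlined so the snippet runs without the file
CSV_TEXT = """student_id,Name,Gender,marks
5,Erlan,Male,34
10,Alex,Male,44
15,soro,Female,46
20,Khan,Male,33
25,ateeq,Male,28
30,MD,Female,49
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

indexed = df.set_index("Name")

print(indexed.index.name)            # Name
print(list(indexed.columns))         # ['student_id', 'Gender', 'marks']
print(indexed.loc["Alex", "marks"])  # 44 (rows can now be looked up by name)
print("Name" in df.columns)          # True (the original frame is unchanged)
```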
Pandas set index using multiple columns
Now let us set the index using multiple columns. In the example below, we will pass a list of existing column labels, 'Name' and 'marks', to set a multi-index.
# importing pandas
import pandas as pd
# reading csv file using pandas
my_dataframe = pd.read_csv("dataframe.csv")
# converting data into pandas dataframe
dataframe = pd.DataFrame(my_dataframe)
# printing dataframe before setting index
print("Before setting index\n{}".format(dataframe))
# set index using multiple columns
Student_name = dataframe.set_index(['Name', 'marks'])
# printing dataframe after setting index
print("\nAfter setting index\n{}".format(Student_name))
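To see the structure of the resulting multi-index, the following sketch inspects its levels. The rows are inlined with io.StringIO as an assumption, so the snippet does not depend on dataframe.csv being present.

```python
import io
import pandas as pd

# Same rows as dataframe.csv, inlined so the snippet runs without the file
CSV_TEXT = """student_id,Name,Gender,marks
5,Erlan,Male,34
10,Alex,Male,44
15,soro,Female,46
20,Khan,Male,33
25,ateeq,Male,28
30,MD,Female,49
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

multi = df.set_index(["Name", "marks"])

print(multi.index.nlevels)       # 2
print(list(multi.index.names))   # ['Name', 'marks']
print(list(multi.columns))       # ['student_id', 'Gender']
```

Each row is now identified by a (Name, marks) pair, and both columns have left the regular column list.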
Pandas set index using list and column
While working with different data frames, we might come across a situation where we want a two-level row index: one level built from a new list of labels and the other from an existing column. We can create this two-level index with the set_index() method by passing both the new labels, wrapped in pd.Index, and the column label. See the example below:
# importing pandas
import pandas as pd
# reading csv file using pandas
my_dataframe = pd.read_csv("dataframe.csv")
# converting data into pandas dataframe
dataframe = pd.DataFrame(my_dataframe)
# printing dataframe before setting index
print("Before setting index\n{}".format(dataframe))
# creating custom index labeling
custom_index = pd.Index(["name1", "name2", "name3", "name4", "name5", "name6"])
# set index using list and a column
Student_name = dataframe.set_index([custom_index, 'Name'])
# printing dataframe after setting index
print("\nAfter setting index\n{}".format(Student_name))
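One thing to watch for with custom labels is that they must have exactly one entry per row. The sketch below, which inlines the rows with io.StringIO as an assumption and generates the labels in a loop rather than typing them out, shows both the working case and the ValueError raised when the label count does not match.

```python
import io
import pandas as pd

# Same rows as dataframe.csv, inlined so the snippet runs without the file
CSV_TEXT = """student_id,Name,Gender,marks
5,Erlan,Male,34
10,Alex,Male,44
15,soro,Female,46
20,Khan,Male,33
25,ateeq,Male,28
30,MD,Female,49
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

# One generated label per row: name1 ... name6
labels = pd.Index([f"name{i}" for i in range(1, len(df) + 1)])
two_level = df.set_index([labels, "Name"])
print(list(two_level.index.names))  # [None, 'Name'] (the custom level is unnamed)

# Too few labels: pandas refuses to build the index
try:
    df.set_index([pd.Index(["name1", "name2"]), "Name"])
    length_error = None
except ValueError as exc:
    length_error = exc
print(type(length_error).__name__)  # ValueError
```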
Pandas set index using python range
While working with data frames we might need a sequence of numbers as the index. For example, in our case we might want to assign a roll number to every student, starting from 1. One possible way is to type out all the numbers and wrap them in pd.Index, as we did with the custom labels in the previous section, but this is impractical for a large dataset. In such cases we can build the index from Python's range function and pass it to the set_index() method. See the example below:
# importing pandas
import pandas as pd
# reading csv file using pandas
my_dataframe = pd.read_csv("dataframe.csv")
# converting data into pandas dataframe
dataframe = pd.DataFrame(my_dataframe)
# printing dataframe before setting index
print("Before setting index\n{}".format(dataframe))
# creating custom range for indexing
range_Index = pd.Index(range(1, 13, 2))
# set index using range
Student_name = dataframe.set_index(range_Index)
# printing dataframe after setting index
print("\nAfter setting index\n{}".format(Student_name))
In the above example, we create a range from 1 up to (but not including) 13 with a step size of two, which gives the index values 1, 3, 5, 7, 9, and 11.
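If consecutive roll numbers are wanted instead of odd ones, the range can be derived from the number of rows. This is a minimal sketch; the index name roll_no is a made-up label, and the rows are inlined with io.StringIO so the snippet runs without the CSV file.

```python
import io
import pandas as pd

# Same rows as dataframe.csv, inlined so the snippet runs without the file
CSV_TEXT = """student_id,Name,Gender,marks
5,Erlan,Male,34
10,Alex,Male,44
15,soro,Female,46
20,Khan,Male,33
25,ateeq,Male,28
30,MD,Female,49
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

# range(1, len(df) + 1) yields 1..6, one number per row; "roll_no" is a hypothetical name
roll_numbers = pd.Index(range(1, len(df) + 1), name="roll_no")
numbered = df.set_index(roll_numbers)

print(list(numbered.index))    # [1, 2, 3, 4, 5, 6]
print(numbered.index.name)     # roll_no
print(list(numbered.columns))  # all four original columns are kept
```

Because the index comes from a separate object rather than a column, no column is removed from the dataframe.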
Pandas set index by column number
Sometimes we might want to set one or more columns as the index of our dataframe without knowing the column labels. In such a situation we can use column positions instead: we look up the labels at those positions through dataframe.columns and pass the resulting list to the set_index() method. See the example below:
# importing pandas
import pandas as pd
# reading csv file using pandas
my_dataframe = pd.read_csv("dataframe.csv")
# converting data into pandas dataframe
dataframe = pd.DataFrame(my_dataframe)
# printing dataframe before setting index
print("Before setting index\n{}".format(dataframe))
# looking up the column labels at positions 0 and 3
column_number = list(dataframe.columns[[0,3]])
# using the looked-up labels in set index method
Student_name = dataframe.set_index(column_number)
# printing dataframe after setting index
print("\nAfter setting index\n{}".format(Student_name))
Notice that in the above example, positions 0 and 3 are zero-based, so the first and fourth columns (student_id and marks) are used as the index.
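The positional lookup can be checked on its own before it is passed to set_index(). The sketch below inlines the rows with io.StringIO (an assumption, to avoid needing the file) and prints the labels that the positions resolve to.

```python
import io
import pandas as pd

# Same rows as dataframe.csv, inlined so the snippet runs without the file
CSV_TEXT = """student_id,Name,Gender,marks
5,Erlan,Male,34
10,Alex,Male,44
15,soro,Female,46
20,Khan,Male,33
25,ateeq,Male,28
30,MD,Female,49
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Positions are zero-based: 0 is the first column, 3 is the fourth
labels = list(df.columns[[0, 3]])
print(labels)  # ['student_id', 'marks']

by_position = df.set_index(labels)
print(list(by_position.columns))  # ['Name', 'Gender']
```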
Pandas set index without dropping column
When we pass a column name to the set_index() method, the column becomes the index and, by default, is dropped from the dataframe's columns because drop defaults to True. If we want to keep the column after setting it as the index, we have to set the drop parameter to False. See the example below, which sets the index without dropping the column.
# importing pandas
import pandas as pd
# reading csv file using pandas
my_dataframe = pd.read_csv("dataframe.csv")
# converting data into pandas dataframe
dataframe = pd.DataFrame(my_dataframe)
# looking up the column labels at positions 0 and 3
column_number = list(dataframe.columns[[0,3]])
#drop is True by default
Student_name = dataframe.set_index(column_number)
# printing with default drop
print("By default drop is true:\n{}".format(Student_name))
# setting drop to false
Student_names = dataframe.set_index(column_number, drop=False)
# printing with drop to be false
print("Drop to False\n{}".format(Student_names))
Notice that the columns used as the index were not dropped from the dataframe because we set drop to False in the second call.
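The difference between the two settings can be confirmed by checking whether the column is still present afterwards. This sketch uses a single column, Name, for brevity and inlines the rows with io.StringIO as an assumption.

```python
import io
import pandas as pd

# Same rows as dataframe.csv, inlined so the snippet runs without the file
CSV_TEXT = """student_id,Name,Gender,marks
5,Erlan,Male,34
10,Alex,Male,44
15,soro,Female,46
20,Khan,Male,33
25,ateeq,Male,28
30,MD,Female,49
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

dropped = df.set_index("Name")             # drop=True is the default
kept = df.set_index("Name", drop=False)    # column stays available

print("Name" in dropped.columns)  # False
print("Name" in kept.columns)     # True
print(kept.index.name)            # Name (it is both the index and a column)
```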
Pandas set index with inplace
So far, every time we applied the set_index() method, pandas returned a new dataframe and left the original unchanged because the modification was not in place. Setting inplace to True sets the index on the existing dataframe rather than returning a new one. See the example below, which sets inplace to True.
# importing pandas
import pandas as pd
# reading csv file using pandas
my_dataframe = pd.read_csv("dataframe.csv")
# converting data into pandas dataframe
dataframe = pd.DataFrame(my_dataframe)
# printing dataframe before setting index
print("Before setting index\n{}".format(dataframe))
# inplace to true
dataframe.set_index('Name', inplace=True)
print("after inplace\n{}".format(dataframe))
Notice that the changes have been applied to the original data.
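A common slip with inplace=True is assigning the result back to a variable. The call returns None in that case, as this short sketch shows (rows inlined with io.StringIO as an assumption, so the file is not required).

```python
import io
import pandas as pd

# Same rows as dataframe.csv, inlined so the snippet runs without the file
CSV_TEXT = """student_id,Name,Gender,marks
5,Erlan,Male,34
10,Alex,Male,44
15,soro,Female,46
20,Khan,Male,33
25,ateeq,Male,28
30,MD,Female,49
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

# With inplace=True the dataframe itself is modified and nothing is returned
result = df.set_index("Name", inplace=True)

print(result)          # None
print(df.index.name)   # Name
```

So `df = df.set_index('Name', inplace=True)` would replace the dataframe with None; use either the in-place form or the assignment form, not both.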
Reset index method in pandas
So far we have learned various ways to set the index of our dataframe. Pandas also provides a reset_index() method, which resets the index: it moves the current index back into the columns and replaces it with the default integer index, running from 0 to the number of rows minus one. See the example below:
# importing pandas
import pandas as pd
# reading csv file using pandas
my_dataframe = pd.read_csv("dataframe.csv")
# converting data into pandas dataframe
dataframe = pd.DataFrame(my_dataframe)
# looking up the column labels at positions 0 and 3
column_number = list(dataframe.columns[[0,3]])
# setting the index
Student_name = dataframe.set_index(column_number)
# printing dataframe after setting index
print("Setting index:\n{}".format(Student_name))
# resetting index
Student_names = Student_name.reset_index()
print("Resetting index:\n{}".format(Student_names))
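The sketch below checks what reset_index() leaves behind, and also tries its drop parameter, which discards the old index instead of turning it back into columns. The rows are inlined with io.StringIO as an assumption so the snippet runs on its own.

```python
import io
import pandas as pd

# Same rows as dataframe.csv, inlined so the snippet runs without the file
CSV_TEXT = """student_id,Name,Gender,marks
5,Erlan,Male,34
10,Alex,Male,44
15,soro,Female,46
20,Khan,Male,33
25,ateeq,Male,28
30,MD,Female,49
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

indexed = df.set_index(["student_id", "marks"])

# Default: index levels come back as the leading columns
restored = indexed.reset_index()
print(list(restored.columns))  # ['student_id', 'marks', 'Name', 'Gender']
print(list(restored.index))    # [0, 1, 2, 3, 4, 5]

# drop=True: the old index levels are discarded rather than restored
discarded = indexed.reset_index(drop=True)
print(list(discarded.columns))  # ['Name', 'Gender']
```

Note that the restored columns appear first, so the column order can differ from the original file.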
Summary
set_index() is a built-in pandas method that sets the index of a dataframe, either from existing columns or from arrays such as a Series or Index. In this tutorial we learned its syntax and, through various examples, different ways to set the index of a pandas dataframe. We also covered how to reset the index using the reset_index() method.