Different methods to select multiple columns in pandas DataFrame
In this tutorial we will discuss how to select multiple columns using the following methods:
- Using column names with []
- Using the columns attribute
- Using the loc[] indexer
- Using the iloc[] indexer
- Using the drop() method
Create pandas DataFrame with example data
A DataFrame is a data structure that stores data in a two-dimensional format. It is similar to a table that stores data in rows and columns. Rows represent the records (tuples) and columns represent the attributes.
We can create a DataFrame by using the pandas.DataFrame() constructor.
Syntax:
pandas.DataFrame(input_data, index=index, columns=columns)
Parameters:
It mainly takes three parameters:
- input_data is the data to store, for example a list of rows or a dictionary
- columns gives the column names for the data
- index gives the row labels
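To make the three parameters concrete, here is a minimal sketch that builds a small DataFrame from a list of rows. The sample rows are hypothetical and only borrow values from the food data used later in this tutorial.

```python
import pandas

# Hypothetical sample rows: each inner list is one record
rows = [['foo-23', 'ground-nut oil'],
        ['foo-13', 'almonds']]

# columns names the attributes, index labels the rows
dataframe = pandas.DataFrame(rows,
                             columns=['id', 'name'],
                             index=['item-1', 'item-2'])

print(dataframe)
```

Passing columns and index as keyword arguments keeps the call readable and avoids relying on their positional order.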
We can also create a DataFrame from a dictionary. In that case the dictionary keys become the column names, so the columns parameter can be omitted.
Example: Python program to create a DataFrame of market data from a dictionary of food items, specifying the row labels with index.
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display the dataframe
print(dataframe)
Output:
id name cost quantity
item-1 foo-23 ground-nut oil 567.00 1
item-2 foo-13 almonds 562.56 2
item-3 foo-02 flour 67.00 3
item-4 foo-31 cereals 76.09 2
Method 1 : Select multiple columns using column name with []
In this method we select columns by passing column names to the [] operator. To select multiple columns we pass a list of column names, which gives the double brackets [[]].
The result is a new DataFrame containing the selected columns with all of their rows.
Syntax:
dataframe[['column',.......,'column']]
where,
- dataframe is the input dataframe
- column is the column name
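The difference between single and double brackets can be seen in a short sketch. It assumes a cut-down version of the food data; the variable names are chosen only for illustration.

```python
import pandas

food = pandas.DataFrame({'id': ['foo-23', 'foo-13'],
                         'name': ['ground-nut oil', 'almonds'],
                         'cost': [567.00, 562.56]})

single = food['name']            # one label gives a Series
double = food[['name']]          # a list with one label gives a DataFrame
several = food[['cost', 'id']]   # column order follows the list

print(type(single).__name__)
print(type(double).__name__)
print(list(several.columns))
```

Because the result follows the order of the list, this form can also be used to reorder columns.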
Example 1: Python program to select the id and name columns
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display id and name columns from the dataframe
print(dataframe[['id','name']])
Output:
id name
item-1 foo-23 ground-nut oil
item-2 foo-13 almonds
item-3 foo-02 flour
item-4 foo-31 cereals
Example 2: Python program to select the id, cost and quantity columns
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display id ,cost and quantity columns from the dataframe
print(dataframe[['id','cost','quantity']])
Output:
id cost quantity
item-1 foo-23 567.00 1
item-2 foo-13 562.56 2
item-3 foo-02 67.00 3
item-4 foo-31 76.09 2
Method 2 : Select multiple columns using the columns attribute
The columns attribute returns the column labels of the pandas DataFrame as an Index. To get multiple columns, we slice this Index with a range of column positions and pass the result to []. Indexing starts at 0 and the stop position is not included.
Syntax:
dataframe[dataframe.columns[start_index:stop_index]]
where,
- dataframe is the input dataframe
- columns is the attribute that holds the column labels
- start_index is the position of the first column to select
- stop_index is the position at which to stop; the column at this position is not included
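It can help to look at the intermediate step on its own. The sketch below, which assumes a two-row version of the food data, shows that slicing columns produces the labels that are then passed to [].

```python
import pandas

food = pandas.DataFrame({'id': ['foo-23', 'foo-13'],
                         'name': ['ground-nut oil', 'almonds'],
                         'cost': [567.00, 562.56],
                         'quantity': [1, 2]})

# columns is an Index of labels, not a method, so no parentheses
print(list(food.columns))

# positions 1 and 2 are selected; position 3 is excluded
selected_labels = food.columns[1:3]
print(list(selected_labels))

# the sliced labels are then used like any list of column names
subset = food[selected_labels]
print(subset)
```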
Example 1: Python program to select name, cost and quantity columns
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display name ,cost and quantity columns from the dataframe
print(dataframe[dataframe.columns[1:4]])
Output:
name cost quantity
item-1 ground-nut oil 567.00 1
item-2 almonds 562.56 2
item-3 flour 67.00 3
item-4 cereals 76.09 2
Example 2: Python program to select name and cost columns
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display name ,cost columns from the dataframe
print(dataframe[dataframe.columns[1:3]])
Output:
name cost
item-1 ground-nut oil 567.00
item-2 almonds 562.56
item-3 flour 67.00
item-4 cereals 76.09
Method 3 : Select multiple columns using loc[]
Here we use the loc[] indexer to select multiple columns by label.
We specify the names of the columns to be selected inside loc[].
Syntax:
dataframe.loc[:,['column',........,'column']]
where,
- dataframe is the input dataframe
- column refers to the column names
- : selects all rows
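Besides a list of names, loc[] also accepts a slice of labels. This goes slightly beyond the syntax above, so treat it as an optional variation: unlike positional slicing, a label slice includes its end label. The sketch assumes a two-row version of the food data.

```python
import pandas

food = pandas.DataFrame({'id': ['foo-23', 'foo-13'],
                         'name': ['ground-nut oil', 'almonds'],
                         'cost': [567.00, 562.56],
                         'quantity': [1, 2]},
                        index=['item-1', 'item-2'])

# explicit list of column names
by_list = food.loc[:, ['name', 'cost']]

# label slice: both 'name' and 'cost' are included
by_slice = food.loc[:, 'name':'cost']

print(by_slice)
```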
Example 1: Python program to select name, cost and quantity columns.
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display name ,cost and quantity columns from the dataframe
print(dataframe.loc[:,['name','cost','quantity']])
Output:
name cost quantity
item-1 ground-nut oil 567.00 1
item-2 almonds 562.56 2
item-3 flour 67.00 3
item-4 cereals 76.09 2
Example 2: Python program to select name and cost columns
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display name ,cost columns from the dataframe
print(dataframe.loc[:,['name','cost']])
Output:
name cost
item-1 ground-nut oil 567.00
item-2 almonds 562.56
item-3 flour 67.00
item-4 cereals 76.09
Method 4 : Select multiple columns using iloc[]
Here we use the iloc[] indexer to select multiple columns by position.
We specify the positions of the columns to be selected inside iloc[].
Syntax:
dataframe.iloc[:,start_column_index:end_column_index]
where,
- dataframe is the input dataframe
- start_column_index is the position of the first column to select
- end_column_index is the position at which to stop; the column at this position is not included
- : selects all rows
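A slice only covers adjacent columns. As a small sketch of two variations, iloc[] also accepts a list of positions for columns that are not next to each other, and negative positions that count from the end. The two-row data is again a cut-down version of the food data.

```python
import pandas

food = pandas.DataFrame({'id': ['foo-23', 'foo-13'],
                         'name': ['ground-nut oil', 'almonds'],
                         'cost': [567.00, 562.56],
                         'quantity': [1, 2]})

# slice: positions 1 and 2, position 3 is excluded
by_slice = food.iloc[:, 1:3]

# list of positions: columns need not be adjacent
by_list = food.iloc[:, [0, 2]]

# negative slice: the last two columns
last_two = food.iloc[:, -2:]

print(list(by_slice.columns))
print(list(by_list.columns))
print(list(last_two.columns))
```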
Example 1: Python program to select name, cost and quantity columns.
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display name ,cost and quantity columns from the dataframe
print(dataframe.iloc[:,1:4])
Output:
name cost quantity
item-1 ground-nut oil 567.00 1
item-2 almonds 562.56 2
item-3 flour 67.00 3
item-4 cereals 76.09 2
Example 2: Python program to select name and cost columns
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display name ,cost columns from the dataframe
print(dataframe.iloc[:,1:3])
Output:
name cost
item-1 ground-nut oil 567.00
item-2 almonds 562.56
item-3 flour 67.00
item-4 cereals 76.09
Method 5 : Select multiple columns using drop() method
Here we remove the unwanted columns by using drop(). The DataFrame it returns contains only the remaining columns, so this also works as a way to select multiple columns.
Syntax:
dataframe.drop(['column'],axis=1)
where,
- dataframe is the input dataframe
- column refers to the column name to be dropped
- axis=1 indicates that the labels refer to columns
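Two details are worth checking in a short sketch: drop() returns a new DataFrame and leaves the original unchanged by default, and the columns keyword can be used in place of axis=1. The data is a two-row version of the food data.

```python
import pandas

food = pandas.DataFrame({'id': ['foo-23', 'foo-13'],
                         'name': ['ground-nut oil', 'almonds'],
                         'cost': [567.00, 562.56],
                         'quantity': [1, 2]})

# columns=[...] is equivalent to passing the labels with axis=1
remaining = food.drop(columns=['id', 'quantity'])

print(list(remaining.columns))

# the original DataFrame still has all four columns
print(list(food.columns))
```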
Example 1: Python program to select name, cost and quantity columns by dropping the id column.
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display name ,cost and quantity columns from the dataframe
print(dataframe.drop(['id'],axis=1))
Output:
name cost quantity
item-1 ground-nut oil 567.00 1
item-2 almonds 562.56 2
item-3 flour 67.00 3
item-4 cereals 76.09 2
Example 2: Python program to select name and cost columns by dropping id and quantity columns.
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display name and cost columns from the dataframe
print(dataframe.drop(['id','quantity'],axis=1))
Output:
name cost
item-1 ground-nut oil 567.00
item-2 almonds 562.56
item-3 flour 67.00
item-4 cereals 76.09
Summary
In this tutorial we discussed how to select multiple columns using [], the columns attribute, loc[], iloc[] and the drop() method. drop() is convenient when it is easier to name the few columns to leave out than to list every column to keep.
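As a quick check, the sketch below selects the name and cost columns with each of the five methods and compares the results. It reuses the food data from the examples; the variable names are chosen only for illustration.

```python
import pandas

food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}
dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

with_brackets = dataframe[['name', 'cost']]
with_columns = dataframe[dataframe.columns[1:3]]
with_loc = dataframe.loc[:, ['name', 'cost']]
with_iloc = dataframe.iloc[:, 1:3]
with_drop = dataframe.drop(['id', 'quantity'], axis=1)

# equals() compares values, dtypes, row labels and column labels
others = (with_columns, with_loc, with_iloc, with_drop)
same = all(with_brackets.equals(other) for other in others)
print(same)
```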