Different methods to select columns in pandas DataFrame
In this tutorial we will discuss how to select single columns using the following methods:
- Select a column using the column name with the "." operator
- Select a column using the column name with []
- Get all column names using the columns attribute
- Get information about all columns using the info() method
- Describe the column statistics using the describe() method
- Select a particular value in a column
Create pandas DataFrame with example data
A DataFrame is a data structure used to store data in a two-dimensional format. It is similar to a table that stores data in rows and columns. Rows represent the records/tuples and columns represent the attributes.
We can create a DataFrame by using the pandas.DataFrame() constructor.
Syntax:
pandas.DataFrame(input_data,columns,index)
Parameters:
It mainly takes three parameters:
- input_data represents the data, for example a list or a dictionary
- columns represents the column names for the data
- index represents the row labels
When the input is a dictionary, the columns parameter can be skipped because the dictionary keys become the column names. If index is also skipped, the rows are numbered from 0.
Let’s see an example.
Example:
Python program to create a DataFrame for market data from a dictionary of food items, specifying the row labels.
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#display the dataframe
print(dataframe)
Output:
id name cost quantity
item-1 foo-23 ground-nut oil 567.00 1
item-2 foo-13 almonds 562.56 2
item-3 foo-02 flour 67.00 3
item-4 foo-31 cereals 76.09 2
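The columns parameter from the syntax above matters when the input is a list of rows rather than a dictionary. A minimal sketch, assuming the same food data reshaped into a list of lists (the names food_rows and frame are illustrative only and do not appear elsewhere in this tutorial):

```python
import pandas

# hypothetical reshaping of the food data: one inner list per row
food_rows = [['foo-23', 'ground-nut oil', 567.00, 1],
             ['foo-13', 'almonds', 562.56, 2],
             ['foo-02', 'flour', 67.00, 3],
             ['foo-31', 'cereals', 76.09, 2]]

# columns names the attributes, index names the rows
frame = pandas.DataFrame(food_rows,
                         columns=['id', 'name', 'cost', 'quantity'],
                         index=['item-1', 'item-2', 'item-3', 'item-4'])

# display the dataframe
print(frame)
```

This should produce the same table as the dictionary-based example, since only the way the data is passed in differs.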
Method 1 : Select column using column name with "." operator
In this method we select a column using the . operator followed by the column name.
The result shows the row labels with the values in the column, followed by the column name and data type.
Syntax:
dataframe.column
where,
- dataframe is the input dataframe
- column is the column name
Example 1: In this example we select the id and name columns.
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#display id column from the dataframe
print(dataframe.id)
print()
#display name column from the dataframe
print(dataframe.name)
Output:
item-1 foo-23
item-2 foo-13
item-3 foo-02
item-4 foo-31
Name: id, dtype: object
item-1 ground-nut oil
item-2 almonds
item-3 flour
item-4 cereals
Name: name, dtype: object
Example 2: In this example we select the cost and quantity columns.
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#display cost column from the dataframe
print(dataframe.cost)
print()
#display quantity column from the dataframe
print(dataframe.quantity)
Output:
item-1 567.00
item-2 562.56
item-3 67.00
item-4 76.09
Name: cost, dtype: float64
item-1 1
item-2 2
item-3 3
item-4 2
Name: quantity, dtype: int64
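One limitation of the . operator is worth noting: it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below uses a small hypothetical frame (stock, with made-up column names count and unit price) to show where the . operator falls short and [] still works:

```python
import pandas

# hypothetical data: 'count' clashes with DataFrame.count(),
# and 'unit price' contains a space
stock = pandas.DataFrame({'count': [1, 2, 3],
                          'unit price': [5.0, 6.5, 7.0]})

# attribute access finds the DataFrame.count method, not the column
print(callable(stock.count))

# bracket access always returns the column
print(stock['count'].tolist())
print(stock['unit price'].tolist())
```

A name such as unit price cannot be written after a dot at all, so [] is the only option for it.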
Method 2 : Select column using column name with []
In this method we select a column using [] with the column name as a string.
The result shows the row labels with the values in the column, followed by the column name and data type.
Syntax:
dataframe['column']
where,
- dataframe is the input dataframe
- column is the column name
Example 1: In this example we select the id and name columns.
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#display id column from the dataframe
print(dataframe['id'])
print()
#display name column from the dataframe
print(dataframe['name'])
Output:
item-1 foo-23
item-2 foo-13
item-3 foo-02
item-4 foo-31
Name: id, dtype: object
item-1 ground-nut oil
item-2 almonds
item-3 flour
item-4 cereals
Name: name, dtype: object
Example 2: In this example we select the cost and quantity columns.
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#display cost column from the dataframe
print(dataframe['cost'])
print()
#display quantity column from the dataframe
print(dataframe['quantity'])
Output:
item-1 567.00
item-2 562.56
item-3 67.00
item-4 76.09
Name: cost, dtype: float64
item-1 1
item-2 2
item-3 3
item-4 2
Name: quantity, dtype: int64
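The [] syntax also accepts a list of column names, which goes slightly beyond the single-column examples above. A brief sketch on the same food data (the variable name subset is illustrative):

```python
import pandas

food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}
dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# a list of names inside [] selects several columns at once
subset = dataframe[['id', 'cost']]
print(subset)

# the result is a DataFrame, whereas dataframe['id'] is a Series
print(type(subset).__name__)
print(type(dataframe['id']).__name__)
```

Note the double brackets: the outer pair is the selection operator and the inner pair is the Python list of names.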
Method 3 : Get all column names using columns attribute
In this method we get only the names of all columns using the columns attribute. It is an attribute rather than a method, so it is written without parentheses.
Syntax:
dataframe.columns
where, dataframe is the input dataframe
Example:
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#display all columns
print(dataframe.columns)
Output:
Index(['id', 'name', 'cost', 'quantity'], dtype='object')
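The Index object returned by columns can be converted to a plain Python list or used to check whether a column exists. A short sketch on the same data (column_names, has_cost and has_price are illustrative names; price is a deliberately missing column):

```python
import pandas

food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}
dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# convert the Index of column names to a plain list
column_names = list(dataframe.columns)
print(column_names)

# membership test: does a column with this name exist?
has_cost = 'cost' in dataframe.columns
has_price = 'price' in dataframe.columns
print(has_cost, has_price)
```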
Method 4 : Get all the columns information using info() method
Using the info() method we get the data type and the number of non-null values for each column of the dataframe, along with the index range and memory usage.
Syntax:
dataframe.info()
Example:
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#display all columns information
#info() prints its report itself and returns None, so print() adds a final None line
print(dataframe.info())
Output:
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, item-1 to item-4
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 4 non-null object
1 name 4 non-null object
2 cost 4 non-null float64
3 quantity 4 non-null int64
dtypes: float64(1), int64(1), object(2)
memory usage: 160.0+ bytes
None
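Since info() only prints its report, the same details are easier to reuse in code through the dtypes attribute and the count() method. A minimal sketch, assuming the goal is to hold the data types and non-null counts in variables (dtypes and non_null are illustrative names):

```python
import pandas

food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}
dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# data type of every column, as a Series indexed by column name
dtypes = dataframe.dtypes
print(dtypes)

# number of non-null values in every column
non_null = dataframe.count()
print(non_null)
```

Both results are pandas Series, so an individual entry can be read with, for example, dtypes['cost'].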
Method 5 : Describe the column statistics using describe() method
By default this method returns the count, mean, standard deviation, minimum, quartiles and maximum of the numeric columns only, which is why id and name do not appear in the output.
Syntax:
dataframe.describe()
Example: Get the statistics from the above dataframe columns
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#display all columns statistics information
print(dataframe.describe())
Output:
cost quantity
count 4.000000 4.000000
mean 318.162500 2.000000
std 284.799307 0.816497
min 67.000000 1.000000
25% 73.817500 1.750000
50% 319.325000 2.000000
75% 563.670000 2.250000
max 567.000000 3.000000
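To include the text columns as well, describe() accepts include='all', and it can also be called on a single selected column. A short sketch on the same data (full_stats and cost_stats are illustrative names):

```python
import pandas

food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}
dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# summarise every column; text columns get count, unique, top and freq
full_stats = dataframe.describe(include='all')
print(full_stats)

# statistics for one selected column only
cost_stats = dataframe['cost'].describe()
print(cost_stats)
```

In the combined table, statistics that do not apply to a column (such as mean for id) are shown as NaN.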
Method 6 : Select particular value in a column
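Here we select a particular value in a column using the methods above. Note that value positions start from 0; using the position we can select a value from the selected column. Because this dataframe uses string row labels, a plain dataframe['column'][position] relies on a positional fallback that recent pandas versions deprecate, so .iloc is the safer way to select by position.
Syntax:
dataframe['column'].iloc[position]
where,
- column is the column name
- position refers to the value position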
Example:
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#.iloc selects by position; plain [0] on string row labels is deprecated in recent pandas
#display cost column first value from the dataframe
print(dataframe['cost'].iloc[0])
#display quantity column second value from the dataframe
print(dataframe['quantity'].iloc[1])
#display id column first value from the dataframe
print(dataframe['id'].iloc[0])
#display name column second value from the dataframe
print(dataframe['name'].iloc[1])
#display id column third value from the dataframe
print(dataframe['id'].iloc[2])
Output:
567.0
2
foo-23
almonds
foo-02
Example: The same values can also be retrieved using the . operator
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
#.iloc selects by position; plain [0] on string row labels is deprecated in recent pandas
#display cost column first value from the dataframe
print(dataframe.cost.iloc[0])
#display quantity column second value from the dataframe
print(dataframe.quantity.iloc[1])
#display id column first value from the dataframe
print(dataframe.id.iloc[0])
#display name column second value from the dataframe
print(dataframe.name.iloc[1])
#display id column third value from the dataframe
print(dataframe.id.iloc[2])
Output:
567.0
2
foo-23
almonds
foo-02
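Because the rows in this dataframe carry labels (item-1 to item-4), a value can also be looked up by row label instead of by position. A minimal sketch using .loc and .at alongside .iloc for comparison (first_cost, almonds and third_id are illustrative names):

```python
import pandas

food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}
dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# by position: first value of the cost column
first_cost = dataframe['cost'].iloc[0]
print(first_cost)

# by label: row 'item-2', column 'name'
almonds = dataframe.loc['item-2', 'name']
print(almonds)

# .at is a label-based accessor for a single value
third_id = dataframe.at['item-3', 'id']
print(third_id)
```

Label-based access keeps working if the rows are reordered, whereas a position always refers to whatever row currently sits at that place.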
Summary
In this tutorial we discussed how to select particular columns from a dataframe, and how to get the column names along with information such as data types and statistics.
Finally, we also saw how to get a particular value from a selected column using the previous methods.