6 ways to select columns from pandas DataFrame

Different methods to select columns in pandas DataFrame

In this tutorial we will discuss how to select columns, and inspect them, using the following methods:

  • Select column using column name with "." operator
  • Select column using column name with []
  • Get all column names using columns attribute
  • Get all the columns information using info() method
  • Describe the column statistics using describe() method
  • Select particular value in a column

 

Create pandas DataFrame with example data

A DataFrame is a data structure that stores data in a two-dimensional format. It is similar to a table that stores data in rows and columns. Rows represent the records (tuples) and columns represent the attributes.

We can create a DataFrame by using the pandas.DataFrame() constructor.

Syntax:

pandas.DataFrame(input_data, columns=columns, index=index)

Parameters:

It mainly takes three parameters:

  1. input_data is the data itself, for example a list of rows or a dictionary of columns
  2. columns represents the column names for the data
  3. index represents the row labels
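The three parameters above can be sketched with a small list of rows. This is only an illustration: the names rows and frame, and the two sample records, are assumptions made for this sketch.

```python
import pandas

# Hypothetical rows for illustration: each inner list is one record
rows = [['foo-23', 'ground-nut oil', 567.00],
        ['foo-13', 'almonds', 562.56]]

# columns names the attributes, index labels the rows
frame = pandas.DataFrame(rows,
                         columns=['id', 'name', 'cost'],
                         index=['item-1', 'item-2'])

print(frame)
```

Passing columns and index as keyword arguments avoids depending on their positional order in the constructor.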

We can also create a DataFrame from a dictionary. The dictionary keys become the column names, so the columns parameter can be omitted, and index is optional as well.

Let’s see an example.

Example:

Python program to create a DataFrame for market data from a dictionary of food items, specifying the row labels with index.

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

#display the dataframe
print(dataframe)

Output:

            id            name    cost  quantity
item-1  foo-23  ground-nut oil  567.00         1
item-2  foo-13         almonds  562.56         2
item-3  foo-02           flour   67.00         3
item-4  foo-31         cereals   76.09         2

 

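Method 1 : Select column using column name with “.” operator

In this method we select a column using the . operator followed by the column name. This works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.

It returns the column as a Series, which displays the row labels with their values, followed by the column name and data type.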

Syntax:

dataframe.column

where,

  1. dataframe is the input dataframe
  2. column is the column name

Example 1: In this example we are going to select the id and name columns

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

#display id column from the dataframe
print(dataframe.id)

print()

#display name column from the dataframe
print(dataframe.name)

Output:

item-1    foo-23
item-2    foo-13
item-3    foo-02
item-4    foo-31
Name: id, dtype: object

item-1    ground-nut oil
item-2           almonds
item-3             flour
item-4           cereals
Name: name, dtype: object

 

Example 2: In this example we are going to select the cost and quantity columns

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

#display cost column from the dataframe
print(dataframe.cost)

print()

#display quantity column from the dataframe
print(dataframe.quantity)

Output:

item-1    567.00
item-2    562.56
item-3     67.00
item-4     76.09
Name: cost, dtype: float64

item-1    1
item-2    2
item-3    3
item-4    2
Name: quantity, dtype: int64
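One caveat with the . operator is worth a quick sketch: it cannot reach a column whose name contains a space, and a name that matches a DataFrame method returns the method instead of the column. The DataFrame stock and its column names below are hypothetical, chosen only to show both cases.

```python
import pandas

# Hypothetical data: 'unit price' contains a space, 'count' matches a DataFrame method
stock = pandas.DataFrame({'unit price': [567.00, 67.00],
                          'count': [1, 3]},
                         index=['item-1', 'item-2'])

# stock.count is the DataFrame.count method, not the 'count' column
print(callable(stock.count))

# selecting with [] returns the column in both cases
print(stock['count'])
print(stock['unit price'])
```

For names like these, selecting with [] is the dependable choice.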

 

Method 2 : Select column using column name with []

In this method we select a column by passing the column name, as a string, inside [].

It returns the column as a Series, which displays the row labels with their values, followed by the column name and data type.

Syntax:

dataframe['column']

where,

  1. dataframe is the input dataframe
  2. column is the column name

Example 1: In this example we are going to select the id and name columns

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

#display id column from the dataframe
print(dataframe['id'])

print()

#display name column from the dataframe
print(dataframe['name'])

Output:

item-1    foo-23
item-2    foo-13
item-3    foo-02
item-4    foo-31
Name: id, dtype: object

item-1    ground-nut oil
item-2           almonds
item-3             flour
item-4           cereals
Name: name, dtype: object

 

Example 2: In this example we are going to select the cost and quantity columns

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

#display cost column from the dataframe
print(dataframe['cost'])

print()

#display quantity column from the dataframe
print(dataframe['quantity'])

Output:

item-1    567.00
item-2    562.56
item-3     67.00
item-4     76.09
Name: cost, dtype: float64

item-1    1
item-2    2
item-3    3
item-4    2
Name: quantity, dtype: int64
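The examples above select one column per call. As a small extension, passing a list of names inside [] returns several columns at once, as a DataFrame rather than a Series. The variable name subset is an assumption for this sketch.

```python
import pandas

# same food data as in the examples above
food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# a list of names inside [] returns a DataFrame with just those columns
subset = dataframe[['id', 'cost']]
print(subset)
```

Note the double brackets: the outer pair is the selection, the inner pair is the Python list of column names.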

 

Method 3 : Get all column names using columns attribute

In this method we get only the names of all columns using the columns attribute. It is an attribute rather than a method, so it is used without parentheses.

Syntax:

dataframe.columns

where, dataframe is the input dataframe

Example:

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

#display all columns 
print(dataframe.columns)

Output:

Index(['id', 'name', 'cost', 'quantity'], dtype='object')
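The Index returned by columns can be used like a sequence. A minimal sketch follows, converting it to a plain list and checking whether a column exists before selecting it. The name 'price' is a hypothetical column that is not in the data.

```python
import pandas

# same food data as in the example above
food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# convert the Index of column names to a plain Python list
names = list(dataframe.columns)
print(names)

# membership test: True for an existing column, False otherwise
print('cost' in dataframe.columns)
print('price' in dataframe.columns)
```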

 

Method 4 : Get all the columns information using info() method

The info() method shows the data type of each column and the number of non-null values in each column of the dataframe.

Syntax:

dataframe.info()

Example:

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

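#display all columns information
#info() prints the summary itself and returns None, so the outer print() adds the final "None" line
print(dataframe.info())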

Output:

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, item-1 to item-4
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   id        4 non-null      object 
 1   name      4 non-null      object 
 2   cost      4 non-null      float64
 3   quantity  4 non-null      int64  
dtypes: float64(1), int64(1), object(2)
memory usage: 160.0+ bytes
None
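The trailing None in the output above appears because info() prints its summary and returns None. When the information is needed as a value, one option is the dtypes attribute, and another is to capture the summary text through the buf parameter of info(). The names buffer and summary are assumptions for this sketch.

```python
import io
import pandas

# same food data as in the example above
food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# dtypes returns a Series mapping each column name to its data type
print(dataframe.dtypes)

# info() writes into the buffer instead of printing to the screen
buffer = io.StringIO()
dataframe.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```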

 

Method 5 : Describe the column statistics using describe() method

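This method returns the count, mean, standard deviation, minimum, quartiles and maximum for each column. By default only the numeric columns are included, which is why id and name do not appear in the output below.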

Syntax:

dataframe.describe()

Example: Get the statistics from the above dataframe columns

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

#display all columns statistics information
print(dataframe.describe())

Output:

             cost  quantity
count    4.000000  4.000000
mean   318.162500  2.000000
std    284.799307  0.816497
min     67.000000  1.000000
25%     73.817500  1.750000
50%    319.325000  2.000000
75%    563.670000  2.250000
max    567.000000  3.000000
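The output above covers only cost and quantity. As a sketch of how the text columns can be included too, describe() accepts include='all', which adds rows such as unique, top and freq for non-numeric columns and leaves NaN where a statistic does not apply. The variable name stats is an assumption for this sketch.

```python
import pandas

# same food data as in the example above
food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# include='all' describes every column, not only the numeric ones
stats = dataframe.describe(include='all')
print(stats)
```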

 

Method 6 : Select particular value in a column

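Here, we select a particular value from a column chosen with the methods above. Note: value positions start from 0. Passing a position to .iloc on the selected column returns the value at that position. On a column with non-integer row labels, as in this example, a bare [position] relies on a positional fallback that recent pandas versions deprecate, so .iloc is the safer choice.

Syntax:

dataframe['column'].iloc[position]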

where,

  1. column is the column name
  2. position refers to the value position

Example:

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

#.iloc selects by position; recent pandas versions treat a bare integer key as a row label instead

#display cost column first value from the dataframe
print(dataframe['cost'].iloc[0])

#display quantity column second value from the dataframe
print(dataframe['quantity'].iloc[1])

#display id column first value from the dataframe
print(dataframe['id'].iloc[0])

#display name column second value from the dataframe
print(dataframe['name'].iloc[1])

#display id column third value from the dataframe
print(dataframe['id'].iloc[2])

Output:

567.0
2
foo-23
almonds
foo-02

 

Example: We can also select the column with the . operator and then pick the value by position

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

#.iloc selects by position; recent pandas versions treat a bare integer key as a row label instead

#display cost column first value from the dataframe
print(dataframe.cost.iloc[0])

#display quantity column second value from the dataframe
print(dataframe.quantity.iloc[1])

#display id column first value from the dataframe
print(dataframe.id.iloc[0])

#display name column second value from the dataframe
print(dataframe.name.iloc[1])

#display id column third value from the dataframe
print(dataframe.id.iloc[2])

Output:

567.0
2
foo-23
almonds
foo-02
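Because the rows in this dataframe carry labels such as item-1, a value can also be picked by row label and column name instead of by position. The sketch below uses the loc and at indexers for label-based access and iloc for purely positional access. The variable names first_cost, second_name and third_id are assumptions for this illustration.

```python
import pandas

# same food data as in the examples above
food_input = {'id': ['foo-23', 'foo-13', 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [567.00, 562.56, 67.00, 76.09],
              'quantity': [1, 2, 3, 2]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# loc: row label first, then column name
first_cost = dataframe.loc['item-1', 'cost']
print(first_cost)

# at: same idea, intended for reading a single value
second_name = dataframe.at['item-2', 'name']
print(second_name)

# iloc: row position first, then column position (both start from 0)
third_id = dataframe.iloc[2, 0]
print(third_id)
```

Label-based access keeps working if the rows are reordered, while positional access depends on the current row order.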

 

Summary

In this tutorial we discussed how to select particular columns from a dataframe, and how to get the column names along with information such as data types and statistics. Finally, we saw how to get a particular value from a selected column using the earlier methods.

 


 
