Table of Contents
Different methods to filter pandas DataFrame by column value
In this tutorial we will discuss how to filter pandas DataFrame by column value using the following methods:
- Filter by single column value using relational operators
- Filter by multiple column values using relational operators
- Filter by single column value using
loc[]
function - Filter by multiple column values using
loc[]
function
Create pandas.DataFrame with example data
DataFrame is a data structure used to store the data in two dimensional format. It is similar to table that stores the data in rows and columns. Rows represents the records/ tuples and columns refers to the attributes.
We can create the DataFrame by using pandas.DataFrame() method.
Syntax:
pandas.DataFrame(input_data,columns,index)
Parameters:
It will take mainly three parameters
- input_data is represents a list of data
- columns represent the columns names for the data
- index represent the row numbers/values
We can also create a DataFrame using dictionary by skipping columns and indices.
Let’s see an example.
Example:
Python Program to create a dataframe for market data from a dictionary of food items
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)
#display the dataframe
print(dataframe)
Output:
id name cost quantity
0 foo-23 ground-nut oil 567.00 1
1 foo-13 almonds 562.56 2
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
Method-1:Filter by single column value using relational operators
Here we are going to filter the dataframe using value present in single column using relational operators.
Relational operators include <,>,<=,>= !=,==
.
We have to specify along with column name, such that we can easily filter the dataframe by comparing with the given values.
Syntax:
dataframe[dataframe['column'] operator value]
where,
- dataframe is the input dataframe
column
refers the dataframe column name where value is filtered in this columnoperator
is the relational operatorvalue
is the string/numeric data compared with actual column value in the dataframe
Example 1:
In this example. we are going to filter the dataframe from cost and quantity column values by comparing with numeric values.
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)
#filter cost column values that are greater than 100
print(dataframe[dataframe['cost'] > 100])
print()
#filter quantity column values that are greater than or equal to 2
print(dataframe[dataframe['quantity'] >= 2] )
print()
#filter cost column values that are less than 100
print(dataframe[dataframe['cost'] < 100])
print()
#filter quantity column values that are less than or equal to 2
print(dataframe[dataframe['quantity'] <= 2] )
Output:
id name cost quantity
0 foo-23 ground-nut oil 567.00 1
1 foo-13 almonds 562.56 2
id name cost quantity
1 foo-13 almonds 562.56 2
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
id name cost quantity
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
id name cost quantity
0 foo-23 ground-nut oil 567.00 1
1 foo-13 almonds 562.56 2
3 foo-31 cereals 76.09 2
Example 2:
In this example. we are going to filter the dataframe from id and name column values by comparing with string values.
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)
#filter id column with value 'foo-02'
print(dataframe[dataframe['id'] == 'foo-02'] )
print()
#filter id column with value expect 'foo-23'
print(dataframe[dataframe['id'] != 'foo-23'] )
print()
#filter name column with value expect 'almonds'
print(dataframe[dataframe['name'] != 'almonds'])
print()
#filter name column with value 'flour'
print(dataframe[dataframe['name'] == 'flour'] )
Output:
id name cost quantity
2 foo-02 flour 67.0 3
id name cost quantity
1 foo-13 almonds 562.56 2
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
id name cost quantity
0 foo-23 ground-nut oil 567.00 1
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
id name cost quantity
2 foo-02 flour 67.0 3
Method - 2: Filter by multiple column values using relational operators
Here we are going to filter the dataframe using value present in single column using relational operators with multiple conditions.
Relational operators include <,>,<=,>= !=,==.
We can specify the condition using and(& ) , or(|) operators.
Syntax:
dataframe[(dataframe['column'] operator value) and/or operator (dataframe['column'] operator value) and/or operator .................... ]
where,
- dataframe is the input dataframe
column
refers the dataframe column name where value is filtered in this columnoperator
is the relational operatorvalue
is the string/numeric data compared with actual column value in the dataframeand/or
operator refers to &,| operators for conditions
Example:
In this example, we are going to filter the dataframe by columns values with multiple conditions
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)
#filter id column with value expect 'foo-23' or cost with value greater than 100
print(dataframe[(dataframe['id'] != 'foo-23') | (dataframe['cost'] > 100)] )
print()
#filter name column with value expect 'almonds' and quantity with value greater than 4
print(dataframe[(dataframe['name'] != 'almonds') & ( dataframe['quantity'] > 4)])
print()
#filter name column with value 'flour' or quantity with value greater than 4
print(dataframe[(dataframe['name'] == 'flour') | (dataframe['quantity'] > 4)] )
Output:
id name cost quantity
0 foo-23 ground-nut oil 567.00 1
1 foo-13 almonds 562.56 2
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
Empty DataFrame
Columns: [id, name, cost, quantity]
Index: []
id name cost quantity
2 foo-02 flour 67.0 3
Method 3: Filter by single column value using loc[] function
Here we are going to filter dataframe by single column value by using loc[]
function. This function will take column name as input and filter the data using relational operators.
Syntax:
dataframe[dataframe.loc['column'] operator value]
where,
- dataframe is the input dataframe
- column refers the dataframe column name where value is filtered in this column
- operator is the relational operator
- value is the string/numeric data compared with actual column value in the dataframe
Example 1:
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)
#filter id column with value 'foo-02'
print(dataframe.loc[dataframe['id'] == 'foo-02'] )
print()
#filter id column with value expect 'foo-23'
print(dataframe.loc[dataframe['id'] != 'foo-23'] )
print()
#filter name column with value expect 'almonds'
print(dataframe.loc[dataframe['name'] != 'almonds'])
print()
#filter name column with value 'flour'
print(dataframe.loc[dataframe['name'] == 'flour'] )
Output:
id name cost quantity
2 foo-02 flour 67.0 3
id name cost quantity
1 foo-13 almonds 562.56 2
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
id name cost quantity
0 foo-23 ground-nut oil 567.00 1
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
id name cost quantity
2 foo-02 flour 67.0 3
Example 2:
In this example. we are going to filter the dataframe from cost and quantity column values by comparing with numeric values.
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)
#filter cost column values that are greater than 100
print(dataframe.loc[dataframe['cost'] > 100])
print()
#filter pandas dataframe by column quantity with values that are greater than or equal to 2
print(dataframe.loc[dataframe['quantity'] >= 2] )
print()
#filter pandas dataframe by column cost with values that are less than 100
print(dataframe.loc[dataframe['cost'] < 100])
print()
#filter pandas dataframe by column quantity with values that are less than or equal to 2
print(dataframe.loc[dataframe['quantity'] <= 2] )
Output:
id name cost quantity
0 foo-23 ground-nut oil 567.00 1
1 foo-13 almonds 562.56 2
id name cost quantity
1 foo-13 almonds 562.56 2
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
id name cost quantity
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
id name cost quantity
0 foo-23 ground-nut oil 567.00 1
1 foo-13 almonds 562.56 2
3 foo-31 cereals 76.09 2
Method - 4:Filter by multiple column values using loc[] function
Here we are going to filter the dataframe using value present in single column using loc[]
function with relational operators with multiple conditions.
Relational operators include <,>,<=,>= !=,==.
We can specify the condition using and(& ) , or(|) operators.
Syntax:
dataframe.loc[(dataframe['column'] operator value) and/or operator (dataframe['column'] operator value) and/or operator .................... ]
where,
- dataframe is the input dataframe
- column refers the dataframe column name where value is filtered in this column
- operator is the relational operator
- value is the
string/numeric
data compared with actual column value in the dataframe - and/or operator refers to
&,|
operators for conditions
Example:
In this example, we are going to filter the dataframe by columns values with multiple conditions
#import the module
import pandas
#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)
#filter id column with value expect 'foo-23' or cost with value greater than 100
print(dataframe.loc[(dataframe['id'] != 'foo-23') | (dataframe['cost'] > 100)] )
print()
#filter pandas dataframe by column name with value expect 'almonds' and quantity with value greater than 4
print(dataframe.loc[(dataframe['name'] != 'almonds') & ( dataframe['quantity'] > 4)])
print()
#filter pandas dataframe by column name with value 'flour' or quantity with value greater than 4
print(dataframe.loc[(dataframe['name'] == 'flour') | (dataframe['quantity'] > 4)] )
Output:
id name cost quantity
0 foo-23 ground-nut oil 567.00 1
1 foo-13 almonds 562.56 2
2 foo-02 flour 67.00 3
3 foo-31 cereals 76.09 2
Empty DataFrame
Columns: [id, name, cost, quantity]
Index: []
id name cost quantity
2 foo-02 flour 67.0 3
Summary
In this tutorial , we came to point that we can organize the data in the DataFrame using Pandas module and we discussed how to filter pandas dataframe using column values through Relational operators and loc[]
function. We implemented these functions/methods also by considering multiple conditions using and , or operators.
References