4 ways to filter pandas DataFrame by column value

Different methods to filter pandas DataFrame by column value

In this tutorial we will discuss how to filter pandas DataFrame by column value  using the following methods:

  • Filter by single column value using relational operators
  • Filter by multiple column values using relational operators
  • Filter by single column value using loc[] function
  • Filter by multiple column values using loc[] function

 

Create pandas.DataFrame with example data

DataFrame is a data structure used to store the data in two dimensional format. It is similar to table that stores the data in rows and columns. Rows represents the records/ tuples and columns refers to the attributes.

We can create the DataFrame by using pandas.DataFrame() method.

Syntax:

pandas.DataFrame(input_data,columns,index)

Parameters:

It will take mainly three parameters

  1. input_data is represents a list of data
  2. columns represent the columns names for the data
  3. index represent the row numbers/values

We can also create a DataFrame using dictionary by skipping columns and indices.

Let’s see an example.

Example:

Python Program to create a dataframe for market data from a dictionary of food items

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)

#display the dataframe
print(dataframe)

Output:

       id            name    cost  quantity
0  foo-23  ground-nut oil  567.00         1
1  foo-13         almonds  562.56         2
2  foo-02           flour   67.00         3
3  foo-31         cereals   76.09         2

 

Method-1:Filter by single column value using relational operators

Here we are going to filter the dataframe using value present in single column using relational operators.

Relational operators include <,>,<=,>= !=,==.

We have to specify along with column name, such that we can easily filter the dataframe by comparing with the given values.

Syntax:

dataframe[dataframe['column'] operator value]

where,

  1. dataframe is the input dataframe
  2. column refers the dataframe column name where value is filtered in this column
  3. operator is the relational operator
  4. value is the string/numeric data compared with actual column value in the dataframe

Example 1:

In this example. we are going to filter the dataframe from cost and quantity column values  by comparing with numeric values.

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)

#filter cost column values that are greater than 100
print(dataframe[dataframe['cost'] > 100])

print()

#filter quantity column values that are greater than or equal to  2
print(dataframe[dataframe['quantity'] >= 2] )

print()

#filter cost column values that are less than 100
print(dataframe[dataframe['cost'] < 100])

print()

#filter quantity column values that are less than or equal to  2
print(dataframe[dataframe['quantity'] <= 2] )

Output:

       id            name    cost  quantity
0  foo-23  ground-nut oil  567.00         1
1  foo-13         almonds  562.56         2

       id     name    cost  quantity
1  foo-13  almonds  562.56         2
2  foo-02    flour   67.00         3
3  foo-31  cereals   76.09         2

       id     name   cost  quantity
2  foo-02    flour  67.00         3
3  foo-31  cereals  76.09         2

       id            name    cost  quantity
0  foo-23  ground-nut oil  567.00         1
1  foo-13         almonds  562.56         2
3  foo-31         cereals   76.09         2

 

Example 2:

In this example. we are going to filter the dataframe from id and name column values  by comparing with string values.

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)

#filter id column with value 'foo-02'
print(dataframe[dataframe['id'] == 'foo-02'] )

print()

#filter id column with value expect 'foo-23'
print(dataframe[dataframe['id'] != 'foo-23'] )

print()

#filter name column with value expect 'almonds'
print(dataframe[dataframe['name'] != 'almonds'])

print()

#filter name column with value 'flour'
print(dataframe[dataframe['name'] == 'flour'] )

Output:

       id   name  cost  quantity
2  foo-02  flour  67.0         3

       id     name    cost  quantity
1  foo-13  almonds  562.56         2
2  foo-02    flour   67.00         3
3  foo-31  cereals   76.09         2

       id            name    cost  quantity
0  foo-23  ground-nut oil  567.00         1
2  foo-02           flour   67.00         3
3  foo-31         cereals   76.09         2

       id   name  cost  quantity
2  foo-02  flour  67.0         3

 

Method - 2: Filter by multiple column values using relational operators

Here we are going to filter the dataframe using value present in single column using relational operators with multiple conditions.

Relational operators include <,>,<=,>= !=,==.

We can specify the condition using and(& ) , or(|) operators.

Syntax:

dataframe[(dataframe['column'] operator value) and/or operator (dataframe['column'] operator value) and/or operator .................... ]

where,

  1. dataframe is the input dataframe
  2. column refers the dataframe column name where value is filtered in this column
  3. operator is the relational operator
  4. value is the string/numeric data compared with actual column value in the dataframe
  5. and/or operator refers to &,| operators for conditions

Example:

In this example, we are going to filter the dataframe by columns values with multiple conditions

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)

#filter id column with value expect 'foo-23' or cost with value greater than 100
print(dataframe[(dataframe['id'] != 'foo-23')  | (dataframe['cost'] > 100)] )

print()

#filter name column with value expect 'almonds'  and quantity with value greater than 4
print(dataframe[(dataframe['name'] != 'almonds') & ( dataframe['quantity'] > 4)])

print()

#filter name column with value 'flour' or quantity with value greater than 4
print(dataframe[(dataframe['name'] == 'flour') | (dataframe['quantity'] > 4)] )

Output:

       id            name    cost  quantity
0  foo-23  ground-nut oil  567.00         1
1  foo-13         almonds  562.56         2
2  foo-02           flour   67.00         3
3  foo-31         cereals   76.09         2

Empty DataFrame
Columns: [id, name, cost, quantity]
Index: []

       id   name  cost  quantity
2  foo-02  flour  67.0         3

 

Method 3: Filter by single column value using loc[] function

Here we are going to filter dataframe by single column value by using loc[] function. This function will take column name as input and filter the data using relational operators.

Syntax:

dataframe[dataframe.loc['column'] operator value]

where,

  1. dataframe is the input dataframe
  2. column refers the dataframe column name where value is filtered in this column
  3. operator is the relational operator
  4. value is the string/numeric data compared with actual column value in the dataframe

 

Example 1:

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)

#filter id column with value 'foo-02'
print(dataframe.loc[dataframe['id'] == 'foo-02'] )

print()

#filter id column with value expect 'foo-23'
print(dataframe.loc[dataframe['id'] != 'foo-23'] )

print()

#filter name column with value expect 'almonds'
print(dataframe.loc[dataframe['name'] != 'almonds'])

print()

#filter name column with value 'flour'
print(dataframe.loc[dataframe['name'] == 'flour'] )

Output:

       id   name  cost  quantity
2  foo-02  flour  67.0         3

       id     name    cost  quantity
1  foo-13  almonds  562.56         2
2  foo-02    flour   67.00         3
3  foo-31  cereals   76.09         2

       id            name    cost  quantity
0  foo-23  ground-nut oil  567.00         1
2  foo-02           flour   67.00         3
3  foo-31         cereals   76.09         2

       id   name  cost  quantity
2  foo-02  flour  67.0         3

 

Example 2:

In this example. we are going to filter the dataframe from cost and quantity column values  by comparing with numeric values.

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)

#filter cost column values that are greater than 100
print(dataframe.loc[dataframe['cost'] > 100])

print()

#filter pandas dataframe by column quantity with values that are greater than or equal to  2
print(dataframe.loc[dataframe['quantity'] >= 2] )

print()

#filter pandas dataframe by column cost with values that are less than 100
print(dataframe.loc[dataframe['cost'] < 100])

print()

#filter pandas dataframe by column quantity with values that are less than or equal to  2
print(dataframe.loc[dataframe['quantity'] <= 2] )

Output:

       id            name    cost  quantity
0  foo-23  ground-nut oil  567.00         1
1  foo-13         almonds  562.56         2

       id     name    cost  quantity
1  foo-13  almonds  562.56         2
2  foo-02    flour   67.00         3
3  foo-31  cereals   76.09         2

       id     name   cost  quantity
2  foo-02    flour  67.00         3
3  foo-31  cereals  76.09         2

       id            name    cost  quantity
0  foo-23  ground-nut oil  567.00         1
1  foo-13         almonds  562.56         2
3  foo-31         cereals   76.09         2

 

Method - 4:Filter by multiple column values using loc[] function

Here we are going to filter the dataframe using value present in single column using loc[] function with relational operators with multiple conditions.

Relational operators include <,>,<=,>= !=,==.

We can specify the condition using and(& ) , or(|) operators.

Syntax:

dataframe.loc[(dataframe['column'] operator value) and/or operator (dataframe['column'] operator value) and/or operator .................... ]

where,

  1. dataframe is the input dataframe
  2. column refers the dataframe column name where value is filtered in this column
  3. operator is the relational operator
  4. value is the string/numeric data compared with actual column value in the dataframe
  5. and/or operator refers to &,| operators for conditions

 

Example:

In this example, we are going to filter the dataframe by columns values with multiple conditions

#import the module
import pandas

#consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

#pass this food to the dataframe
dataframe=pandas.DataFrame(food_input)

#filter id column with value expect 'foo-23' or cost with value greater than 100
print(dataframe.loc[(dataframe['id'] != 'foo-23')  | (dataframe['cost'] > 100)] )

print()

#filter pandas dataframe by column name with value expect 'almonds'  and quantity with value greater than 4
print(dataframe.loc[(dataframe['name'] != 'almonds') & ( dataframe['quantity'] > 4)])

print()

#filter pandas dataframe by column name with value 'flour' or quantity with value greater than 4
print(dataframe.loc[(dataframe['name'] == 'flour') | (dataframe['quantity'] > 4)] )

Output:

       id            name    cost  quantity
0  foo-23  ground-nut oil  567.00         1
1  foo-13         almonds  562.56         2
2  foo-02           flour   67.00         3
3  foo-31         cereals   76.09         2

Empty DataFrame
Columns: [id, name, cost, quantity]
Index: []

       id   name  cost  quantity
2  foo-02  flour  67.0         3

 

Summary

In this tutorial , we came to point that we can organize the data in the DataFrame using Pandas module and we discussed how to filter pandas dataframe using column values through Relational operators and loc[] function. We implemented these functions/methods also by considering multiple conditions using and , or operators.

 

References

Python - DataFrame

 

Didn't find what you were looking for? Perform a quick search across GoLinuxCloud

If my articles on GoLinuxCloud has helped you, kindly consider buying me a coffee as a token of appreciation.

Buy GoLinuxCloud a Coffee

For any other feedbacks or questions you can either use the comments section or contact me form.

Thank You for your support!!

Leave a Comment