How to use dropna() function in pandas DataFrame

Different scenarios for dropna() function in DataFrame

In this tutorial we will discuss how to drop missing data from a pandas DataFrame using the dropna() function. dropna() removes the rows or columns of a DataFrame that contain null (NaN or None) values. We will cover the following scenarios:

  • Drop NaN values from a row using dropna()
  • Drop NaN values from a column using dropna()
  • Drop NaN values from a row using dropna() with how parameter
  • Drop NaN values from a column using dropna() with how parameter
  • Drop NaN values from a row using dropna() with no parameters
  • Drop None values using dropna()

 

Create pandas DataFrame with example data

A DataFrame is a data structure that stores data in a two-dimensional format. It is similar to a table that stores data in rows and columns. Rows represent the records (tuples) and columns represent the attributes.

We can create a DataFrame with the pandas.DataFrame() constructor.

Syntax:

pandas.DataFrame(input_data, index=index, columns=columns)

Parameters:

The constructor mainly takes three parameters:

  1. input_data is the data to store, for example a list or a dictionary
  2. index specifies the row labels
  3. columns specifies the column names

We can also create a DataFrame from a dictionary. In that case the dictionary keys become the column names, so columns can be omitted. The index is optional as well.
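As a minimal sketch of the two ways of building a DataFrame described above, the following snippet creates the same small table once from a list of rows and once from a dictionary. The two sample rows are hypothetical values chosen only for illustration.

```python
# import the module
import pandas

# hypothetical sample rows, invented for this illustration
rows = [['foo-23', 'ground-nut oil'],
        ['foo-02', 'flour']]

# from a list of rows: column names and row labels are passed explicitly
from_list = pandas.DataFrame(rows,
                             index=['item-1', 'item-2'],
                             columns=['id', 'name'])

# from a dictionary: the keys become the column names
from_dict = pandas.DataFrame({'id': ['foo-23', 'foo-02'],
                              'name': ['ground-nut oil', 'flour']},
                             index=['item-1', 'item-2'])

# both approaches produce the same table
print(from_list)
print(from_list.equals(from_dict))
```

Passing index and columns as keyword arguments avoids depending on the positional order of the constructor's parameters.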

 

Example: Python program to create a dataframe of market data from a dictionary of food items, specifying the row labels with index.

We include some NaN values in the data so that we can drop them later with the dropna() function.

# import the module
import pandas
import numpy

# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[numpy.nan,562.56,67.00,76.09],
                  'quantity':[numpy.nan,3,numpy.nan,3]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# display the dataframe
print(dataframe)

Output:

            id            name    cost  quantity
item-1  foo-23  ground-nut oil     NaN       NaN
item-2     NaN         almonds  562.56       3.0
item-3  foo-02           flour   67.00       NaN
item-4  foo-31         cereals   76.09       3.0
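Before dropping anything, it can help to see where the missing values are. The sketch below uses isna() together with sum() to count the NaN values per column and per row of the example dataframe. This check is an addition for illustration and is not required by dropna().

```python
# import the module
import pandas
import numpy

# consider the food data
food_input = {'id': ['foo-23', numpy.nan, 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [numpy.nan, 562.56, 67.00, 76.09],
              'quantity': [numpy.nan, 3, numpy.nan, 3]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# number of NaN values in each column
missing_per_column = dataframe.isna().sum()
print(missing_per_column)

# number of NaN values in each row
missing_per_row = dataframe.isna().sum(axis=1)
print(missing_per_row)
```

Only the name column and the item-4 row have a count of zero, which is why they are the ones that survive the stricter dropna() calls later in this tutorial.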

You can learn more at Pandas dataframe explained with simple examples

 

1. Drop NaN values from a row using dropna()

Here we drop the rows that contain NaN values from the above dataframe using the dropna() function. We specify axis=0 to drop rows. By default, a row is dropped if it contains at least one NaN value.

Syntax:

dataframe.dropna(axis=0)

where,

  1. dataframe is the input dataframe
  2. axis=0 specifies that rows are dropped

 

Example: In this example we drop the rows of the dataframe that contain NaN values.

# import the module
import pandas
import numpy

# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[numpy.nan,562.56,67.00,76.09],
                  'quantity':[numpy.nan,3,numpy.nan,3]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# drop the rows that contain nan values
dataframe=dataframe.dropna(axis=0)

# display the dataframe
print(dataframe)

Output:

            id     name   cost  quantity
item-4  foo-31  cereals  76.09       3.0
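The example above assigns the result back to dataframe because dropna() returns a new DataFrame and, unless inplace=True is passed, leaves the original unchanged. A minimal sketch of that behaviour, where the variable name cleaned is chosen only for illustration:

```python
# import the module
import pandas
import numpy

# consider the food data
food_input = {'id': ['foo-23', numpy.nan, 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [numpy.nan, 562.56, 67.00, 76.09],
              'quantity': [numpy.nan, 3, numpy.nan, 3]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# store the result under a new name instead of overwriting dataframe
cleaned = dataframe.dropna(axis=0)

# the original still has all of its rows, the result has only the complete ones
print(len(dataframe))
print(len(cleaned))
```

Keeping the original under its own name is useful when you want to compare the data before and after dropping.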

 

2. Drop NaN values from a column using dropna()

Here we drop the columns that contain NaN values from the above dataframe using the dropna() function. We specify axis=1 to drop columns. By default, a column is dropped if it contains at least one NaN value.

Syntax:

dataframe.dropna(axis=1)

where,

  1. dataframe is the input dataframe
  2. axis=1 specifies that columns are dropped

 

Example: In this example we drop the columns of the dataframe that contain NaN values.

# import the module
import pandas
import numpy

# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[numpy.nan,562.56,67.00,76.09],
                  'quantity':[numpy.nan,3,numpy.nan,3]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# drop the columns that contain nan values
dataframe=dataframe.dropna(axis=1)

# display the  dataframe
print(dataframe)

Output:

                  name
item-1  ground-nut oil
item-2         almonds
item-3           flour
item-4         cereals
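The axis parameter also accepts the names 'index' and 'columns' in place of 0 and 1, which some readers find easier to remember. The sketch below checks that both spellings give the same result on the example data.

```python
# import the module
import pandas
import numpy

# consider the food data
food_input = {'id': ['foo-23', numpy.nan, 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [numpy.nan, 562.56, 67.00, 76.09],
              'quantity': [numpy.nan, 3, numpy.nan, 3]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# axis='columns' is the same as axis=1
by_number = dataframe.dropna(axis=1)
by_name = dataframe.dropna(axis='columns')
print(by_number.equals(by_name))

# axis='index' is the same as axis=0
rows_by_number = dataframe.dropna(axis=0)
rows_by_name = dataframe.dropna(axis='index')
print(rows_by_number.equals(rows_by_name))
```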

 

3. Drop NaN values from a row using dropna() with how parameter

Here we use the how parameter to control when a row is dropped.

This parameter takes two values, any and all:

  1. any drops a row if at least one NaN value is present in it
  2. all drops a row only if all of its values are NaN

 

Example 1: Python program to drop the rows that contain at least one NaN value, using how='any'

# import the module
import pandas
import numpy

# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[numpy.nan,562.56,67.00,76.09],
                  'quantity':[numpy.nan,3,numpy.nan,3]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# drop the rows that contain at least one nan value
dataframe=dataframe.dropna(axis=0,how='any')

# display the dataframe
print(dataframe)

Output:

            id     name   cost  quantity
item-4  foo-31  cereals  76.09       3.0

 

Example 2: Python program to drop the rows in which all values are NaN, using how='all'

# import the module
import pandas
import numpy

# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[numpy.nan,562.56,67.00,76.09],
                  'quantity':[numpy.nan,3,numpy.nan,3]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# drop the rows in which all values are nan
dataframe=dataframe.dropna(axis=0,how='all')

# display the dataframe
print(dataframe)

Output:

            id            name    cost  quantity
item-1  foo-23  ground-nut oil     NaN       NaN
item-2     NaN         almonds  562.56       3.0
item-3  foo-02           flour   67.00       NaN
item-4  foo-31         cereals   76.09       3.0
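In the output above no row is removed, because no row of the example data consists only of NaN values. To see how='all' actually drop something, here is a minimal sketch with a smaller, hypothetical dataset in which item-2 has no values at all. The data is invented for this illustration.

```python
# import the module
import pandas
import numpy

# hypothetical data: every value of item-2 is missing
food_input = {'id': ['foo-23', numpy.nan, 'foo-02'],
              'cost': [numpy.nan, numpy.nan, 67.00],
              'quantity': [numpy.nan, numpy.nan, 3]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3'])

# item-1 has an id, so it is kept; item-2 is entirely nan, so it is dropped
result = dataframe.dropna(axis=0, how='all')
print(result)
```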

 

4. Drop NaN values from a column using dropna() with how parameter

Here we use the how parameter to control when a column is dropped.

This parameter takes two values, any and all:

  1. any drops a column if at least one NaN value is present in it
  2. all drops a column only if all of its values are NaN

 

Example 1: Python program to drop the columns that contain at least one NaN value, using how='any'

# import the module
import pandas
import numpy

# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[numpy.nan,562.56,67.00,76.09],
                  'quantity':[numpy.nan,3,numpy.nan,3]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# drop the columns that contain at least one nan value
dataframe=dataframe.dropna(axis=1,how='any')

# display the dataframe
print(dataframe)

Output:

                  name
item-1  ground-nut oil
item-2         almonds
item-3           flour
item-4         cereals

 

Example 2: Python program to drop the columns in which all values are NaN, using how='all'

# import the module
import pandas
import numpy

# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[numpy.nan,562.56,67.00,76.09],
                  'quantity':[numpy.nan,3,numpy.nan,3]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# drop the columns in which all values are nan
dataframe=dataframe.dropna(axis=1,how='all')

# display the dataframe
print(dataframe)

Output:

            id            name    cost  quantity
item-1  foo-23  ground-nut oil     NaN       NaN
item-2     NaN         almonds  562.56       3.0
item-3  foo-02           flour   67.00       NaN
item-4  foo-31         cereals   76.09       3.0
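Again nothing is removed here, because every column of the example data has at least one real value. The sketch below adds a hypothetical discount column that is entirely NaN, so that how='all' has a column to drop. The column name and values are invented for this illustration.

```python
# import the module
import pandas
import numpy

# hypothetical data: the discount column has no values at all
food_input = {'id': ['foo-23', numpy.nan],
              'cost': [numpy.nan, 562.56],
              'discount': [numpy.nan, numpy.nan]}

dataframe = pandas.DataFrame(food_input, index=['item-1', 'item-2'])

# id and cost each have one real value, so only discount is dropped
result = dataframe.dropna(axis=1, how='all')
print(result)
```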

 

5. Drop NaN values from a row using dropna() with no parameters

Here we do not pass any parameters to the dropna() function. By default it uses axis=0 and how='any', so it drops every row that contains at least one NaN value.

Syntax:

dataframe.dropna()

where, dataframe is the input dataframe

 

Example: In this example we drop the rows of the dataframe that contain NaN values, without passing any parameters.

# import the module
import pandas
import numpy

# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[numpy.nan,562.56,67.00,76.09],
                  'quantity':[numpy.nan,3,numpy.nan,3]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# drop nan values
dataframe=dataframe.dropna()

# display the dataframe
print(dataframe)

Output:

            id     name   cost  quantity
item-4  foo-31  cereals  76.09       3.0
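Since the defaults are axis=0 and how='any', calling dropna() with no arguments should give the same result as spelling those two parameters out. A short sketch that checks this on the example data:

```python
# import the module
import pandas
import numpy

# consider the food data
food_input = {'id': ['foo-23', numpy.nan, 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [numpy.nan, 562.56, 67.00, 76.09],
              'quantity': [numpy.nan, 3, numpy.nan, 3]}

dataframe = pandas.DataFrame(food_input,
                             index=['item-1', 'item-2', 'item-3', 'item-4'])

# no parameters versus the defaults written explicitly
implicit = dataframe.dropna()
explicit = dataframe.dropna(axis=0, how='any')
print(implicit.equals(explicit))
```

Writing the parameters explicitly is a matter of taste, but it can make the intent clearer to readers who do not know the defaults.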

 

6. Drop None values using dropna()

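Example: Here we drop the rows that contain None values using dropna(). pandas treats None as a missing value, just like NaN, so dropna() removes it in the same way.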

# import the module
import pandas
import numpy

# consider the food data
food_input={'id':['foo-23',None,'foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[None,562.56,67.00,76.09],
                  'quantity':[None,3,None,3]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# drop the rows that contain None values
dataframe=dataframe.dropna()

# display the dataframe
print(dataframe)

Output:

            id     name   cost  quantity
item-4  foo-31  cereals  76.09       3.0
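The result matches the NaN examples because pandas counts both None and NaN as missing. As a small sketch, isna() flags both kinds of value in a column that mixes them. The three sample values are chosen only for illustration.

```python
# import the module
import pandas
import numpy

# a column that mixes a real value, None and nan
dataframe = pandas.DataFrame({'id': ['foo-23', None, numpy.nan]},
                             index=['item-1', 'item-2', 'item-3'])

# isna() marks both None and nan as missing
flags = dataframe['id'].isna().tolist()
print(flags)

# so dropna() removes both rows
result = dataframe.dropna()
print(result)
```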

 

Summary

In this tutorial we discussed how to use the dropna() function to drop NaN and None values from a dataframe. We covered the axis parameter, which selects whether rows or columns are dropped, and the how parameter, which selects whether a single missing value (any) or only missing values (all) trigger the drop. Real-world data often contains missing values, and removing them with dropna() is a common preprocessing step before analysis. The same step applies when preparing large datasets for Machine Learning and Deep Learning.

 

References

Pandas dropna()

 
