Different scenarios for dropna() function in DataFrame
In this tutorial we will discuss how to drop missing data from a pandas DataFrame using the dropna() function. dropna() removes the rows or columns of a pandas dataframe that contain null (NaN or None) values.
- Drop NaN values from a row using dropna()
- Drop NaN values from a column using dropna()
- Drop NaN values from a row using dropna() with how parameter
- Drop NaN values from a column using dropna() with how parameter
- Drop NaN values from a row using dropna() with no parameters
- Drop None values using dropna()
Create pandas DataFrame with example data
A DataFrame is a data structure that stores data in a two-dimensional format. It is similar to a table that stores data in rows and columns. Rows represent the records/tuples and columns represent the attributes.
We can create a DataFrame using the pandas.DataFrame() constructor.
Syntax:
pandas.DataFrame(input_data, index=index, columns=columns)
Parameters:
It mainly takes three parameters:
- input_data represents the data, for example a list or a dictionary
- columns represents the column names for the data
- index represents the row labels
We can also create a DataFrame from a dictionary without specifying columns and index. In that case the dictionary keys become the column names.
Example: Python program to create a dataframe for market data from a dictionary of food items by specifying the row labels.
We include some NaN values so that we can drop them later with the dropna() function.
# import the module
import pandas
import numpy
# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[numpy.nan,562.56,67.00,76.09],
'quantity':[numpy.nan,3,numpy.nan,3]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# display the dataframe
print(dataframe)
Output:
            id            name    cost  quantity
item-1  foo-23  ground-nut oil     NaN       NaN
item-2     NaN         almonds  562.56       3.0
item-3  foo-02           flour   67.00       NaN
item-4  foo-31         cereals   76.09       3.0
You can learn more at Pandas dataframe explained with simple examples
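Before dropping anything, it can help to see where the missing values are. The sketch below rebuilds the same food data and counts missing cells per column and per row with isna(). The variable names missing_per_column and missing_per_row are only illustrative and do not appear in the examples of this tutorial.

```python
# import the modules
import pandas
import numpy

# same food data as in the example above
food_input = {'id': ['foo-23', numpy.nan, 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [numpy.nan, 562.56, 67.00, 76.09],
              'quantity': [numpy.nan, 3, numpy.nan, 3]}
dataframe = pandas.DataFrame(food_input, index=['item-1', 'item-2', 'item-3', 'item-4'])

# isna() marks every missing cell as True; sum() counts the True values
missing_per_column = dataframe.isna().sum()       # one count per column
missing_per_row = dataframe.isna().sum(axis=1)    # one count per row

print(missing_per_column)
print(missing_per_row)
```

Only item-4 has no missing values, which is why it is the only row that survives the row-wise drops shown in the following sections.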
1. Drop NaN values from a row using dropna()
Here we are going to drop the rows that contain NaN values from the above dataframe using the dropna() function. We specify axis=0 to drop rows with NaN values.
Syntax:
dataframe.dropna(axis=0)
where,
- dataframe is the input dataframe
- axis = 0 specifies rows
Example: In this example we are going to drop the rows of the dataframe that contain NaN values.
# import the module
import pandas
import numpy
# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[numpy.nan,562.56,67.00,76.09],
'quantity':[numpy.nan,3,numpy.nan,3]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# drop rows that contain NaN values
dataframe=dataframe.dropna(axis=0)
# display the dataframe
print(dataframe)
Output:
            id     name   cost  quantity
item-4  foo-31  cereals  76.09       3.0
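The example above assigns the result back to dataframe. That matters because dropna() returns a new DataFrame by default and leaves the original unchanged. A minimal sketch of this behaviour, using the hypothetical name cleaned for the result:

```python
# import the modules
import pandas
import numpy

# same food data as in the example above
food_input = {'id': ['foo-23', numpy.nan, 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [numpy.nan, 562.56, 67.00, 76.09],
              'quantity': [numpy.nan, 3, numpy.nan, 3]}
dataframe = pandas.DataFrame(food_input, index=['item-1', 'item-2', 'item-3', 'item-4'])

# dropna() returns a new DataFrame; the original is left untouched
cleaned = dataframe.dropna(axis=0)

print(dataframe.shape)  # (4, 4): the original still has all four rows
print(cleaned.shape)    # (1, 4): only item-4 remains
```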
2. Drop NaN values from a column using dropna()
Here we are going to drop the columns that contain NaN values from the above dataframe using the dropna() function. We have to specify axis=1 to drop columns with NaN values.
Syntax:
dataframe.dropna(axis=1)
where,
- dataframe is the input dataframe
- axis = 1 specifies columns.
Example: In this example we are going to drop the columns of the dataframe that contain NaN values.
# import the module
import pandas
import numpy
# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[numpy.nan,562.56,67.00,76.09],
'quantity':[numpy.nan,3,numpy.nan,3]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# drop columns that contain NaN values
dataframe=dataframe.dropna(axis=1)
# display the dataframe
print(dataframe)
Output:
                  name
item-1  ground-nut oil
item-2         almonds
item-3           flour
item-4         cereals
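The axis parameter also accepts the names 'index' and 'columns' in place of 0 and 1, which some readers find easier to remember. The following sketch compares both spellings on the same food data. The names by_number and by_name are illustrative only.

```python
# import the modules
import pandas
import numpy

# same food data as in the example above
food_input = {'id': ['foo-23', numpy.nan, 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [numpy.nan, 562.56, 67.00, 76.09],
              'quantity': [numpy.nan, 3, numpy.nan, 3]}
dataframe = pandas.DataFrame(food_input, index=['item-1', 'item-2', 'item-3', 'item-4'])

# axis=1 and axis='columns' are two spellings of the same option
by_number = dataframe.dropna(axis=1)
by_name = dataframe.dropna(axis='columns')

# both results keep only the 'name' column
print(by_number.equals(by_name))
print(list(by_name.columns))
```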
3. Drop NaN values from a row using dropna() with how parameter
Here we use the how parameter to control when a row is dropped.
This parameter takes two values - any and all
- any drops a row if at least one NaN value is present in it
- all drops a row only if all of its values are NaN.
Example 1: Python program to drop NaN values in a row with how = any parameter
# import the module
import pandas
import numpy
# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[numpy.nan,562.56,67.00,76.09],
'quantity':[numpy.nan,3,numpy.nan,3]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# drop rows that contain at least one NaN value
dataframe=dataframe.dropna(axis=0,how='any')
# display the dataframe
print(dataframe)
Output:
            id     name   cost  quantity
item-4  foo-31  cereals  76.09       3.0
Example 2: Python program to drop NaN values in a row with how = all parameter
# import the module
import pandas
import numpy
# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[numpy.nan,562.56,67.00,76.09],
'quantity':[numpy.nan,3,numpy.nan,3]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# drop rows in which all values are NaN
dataframe=dataframe.dropna(axis=0,how='all')
# display the dataframe
print(dataframe)
Output:
            id            name    cost  quantity
item-1  foo-23  ground-nut oil     NaN       NaN
item-2     NaN         almonds  562.56       3.0
item-3  foo-02           flour   67.00       NaN
item-4  foo-31         cereals   76.09       3.0
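In the output above no row is removed, because no row of the food data consists entirely of NaN values. To see how = all actually drop something, the sketch below uses a smaller, made-up dataframe in which item-3 is empty in every column. This data is hypothetical and is not part of the food data used elsewhere in the tutorial.

```python
# import the modules
import pandas
import numpy

# hypothetical data: item-3 has NaN in every column
food_input = {'id': ['foo-23', numpy.nan, numpy.nan],
              'name': ['ground-nut oil', 'almonds', numpy.nan],
              'cost': [numpy.nan, 562.56, numpy.nan]}
dataframe = pandas.DataFrame(food_input, index=['item-1', 'item-2', 'item-3'])

# how='all' drops a row only when every value in it is NaN
result = dataframe.dropna(axis=0, how='all')

# item-1 and item-2 are kept although each has one NaN; item-3 is dropped
print(result)
```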
4. Drop NaN values from a column using dropna() with how parameter
Here we use the how parameter to control when a column is dropped.
This parameter takes two values - any and all
- any drops a column if at least one NaN value is present in it
- all drops a column only if all of its values are NaN.
Example 1: Python program to drop NaN values in a column with how = any parameter
# import the module
import pandas
import numpy
# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[numpy.nan,562.56,67.00,76.09],
'quantity':[numpy.nan,3,numpy.nan,3]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# drop columns that contain at least one NaN value
dataframe=dataframe.dropna(axis=1,how='any')
# display the dataframe
print(dataframe)
Output:
                  name
item-1  ground-nut oil
item-2         almonds
item-3           flour
item-4         cereals
Example 2: Python program to drop NaN values in a column with how = all parameter
# import the module
import pandas
import numpy
# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[numpy.nan,562.56,67.00,76.09],
'quantity':[numpy.nan,3,numpy.nan,3]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# drop columns in which all values are NaN
dataframe=dataframe.dropna(axis=1,how='all')
# display the dataframe
print(dataframe)
Output:
            id            name    cost  quantity
item-1  foo-23  ground-nut oil     NaN       NaN
item-2     NaN         almonds  562.56       3.0
item-3  foo-02           flour   67.00       NaN
item-4  foo-31         cereals   76.09       3.0
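Again nothing is removed here, because no column of the food data is entirely NaN. The following sketch adds a made-up discount column that contains only NaN values to show the column-wise effect of how = all. The discount column is hypothetical and only serves this illustration.

```python
# import the modules
import pandas
import numpy

# hypothetical data: the 'discount' column has no values at all
food_input = {'name': ['ground-nut oil', 'almonds'],
              'cost': [numpy.nan, 562.56],
              'discount': [numpy.nan, numpy.nan]}
dataframe = pandas.DataFrame(food_input, index=['item-1', 'item-2'])

# how='all' drops a column only when every value in it is NaN
result = dataframe.dropna(axis=1, how='all')

# 'cost' is kept although it has one NaN; 'discount' is dropped
print(result)
```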
5. Drop NaN values from a row using dropna() with no parameters
Here we do not pass any parameters to the dropna() function. By default it behaves like axis=0 with how='any', so it drops every row that contains at least one NaN value.
Syntax:
dataframe.dropna()
where, dataframe is the input dataframe
Example: In this example we are going to drop NaN values present in dataframe.
# import the module
import pandas
import numpy
# consider the food data
food_input={'id':['foo-23',numpy.nan,'foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[numpy.nan,562.56,67.00,76.09],
'quantity':[numpy.nan,3,numpy.nan,3]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# drop nan values
dataframe=dataframe.dropna()
# display the dataframe
print(dataframe)
Output:
            id     name   cost  quantity
item-4  foo-31  cereals  76.09       3.0
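This matches the output of section 1 and of Example 1 in section 3. As a quick check, the sketch below compares the call without parameters against the explicit call. The names default_result and explicit_result are illustrative.

```python
# import the modules
import pandas
import numpy

# same food data as in the example above
food_input = {'id': ['foo-23', numpy.nan, 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [numpy.nan, 562.56, 67.00, 76.09],
              'quantity': [numpy.nan, 3, numpy.nan, 3]}
dataframe = pandas.DataFrame(food_input, index=['item-1', 'item-2', 'item-3', 'item-4'])

# call without parameters versus the explicit row-wise 'any' call
default_result = dataframe.dropna()
explicit_result = dataframe.dropna(axis=0, how='any')

# equals() returns True when both dataframes hold the same data
print(default_result.equals(explicit_result))
```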
6. Drop None values using dropna()
pandas treats None as a missing value, just like NaN, so dropna() removes it in the same way.
Example: Here we are dropping None values using dropna()
# import the module
import pandas
import numpy
# consider the food data
food_input={'id':['foo-23',None,'foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[None,562.56,67.00,76.09],
'quantity':[None,3,None,3]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# drop rows that contain None values
dataframe=dataframe.dropna()
# display the dataframe
print(dataframe)
Output:
            id     name   cost  quantity
item-4  foo-31  cereals  76.09       3.0
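The result is the same as with NaN because isna(), the check that dropna() relies on conceptually, reports both None and NaN as missing. The sketch below mixes the two markers in one dataframe. The name missing_counts is illustrative.

```python
# import the modules
import pandas
import numpy

# food data that mixes None and NaN as missing-value markers
food_input = {'id': ['foo-23', None, 'foo-02', 'foo-31'],
              'name': ['ground-nut oil', 'almonds', 'flour', 'cereals'],
              'cost': [None, 562.56, 67.00, 76.09],
              'quantity': [numpy.nan, 3, None, 3]}
dataframe = pandas.DataFrame(food_input, index=['item-1', 'item-2', 'item-3', 'item-4'])

# isna() counts None and NaN alike
missing_counts = dataframe.isna().sum()
print(missing_counts)

# dropna() therefore removes rows with either marker
result = dataframe.dropna()
print(result)
```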
Summary
In this tutorial we discussed how to use the dropna() function to drop rows and columns that contain NaN/None values from a dataframe. We covered the axis and how parameters for dropping rows and columns. Real-world data often contains missing values, and dropna() lets us remove them before processing so that they do not distort the results. The same function can be used when preparing large datasets for Machine Learning and Deep Learning.