Using Pandas unique() with Series/DataFrame
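The unique() function returns the distinct values of a pandas Series. A DataFrame has no unique() method of its own, so for a DataFrame it is applied to individual columns, each of which is a Series.
The unique() method returns the unique values in order of appearance, as a NumPy ndarray for most dtypes (extension dtypes such as categorical return a pandas array instead).
In this tutorial we will discuss how to use the unique() function with Series and DataFrame columns in pandas.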
Pandas Series.unique() - Syntax:
data.unique()
where data is a Series.
Pandas DataFrame column unique() - Syntax:
data['column'].unique()
where data is the DataFrame and column is the name of the column whose unique values are returned.
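As a minimal sketch of the two syntaxes above, the snippet below checks the return type for an integer Series and confirms that unique() is reached through a column rather than through the DataFrame itself. The names series, frame and the column names a and b are made up for illustration.

```python
import numpy
import pandas

# Hypothetical sample data, used only for this illustration.
series = pandas.Series([1, 2, 2, 3])
frame = pandas.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "y"]})

# For an integer Series, unique() returns a NumPy ndarray.
series_unique = series.unique()
is_ndarray = isinstance(series_unique, numpy.ndarray)

# The DataFrame class itself does not define unique().
frame_has_unique = hasattr(pandas.DataFrame, "unique")

# Selecting a column gives a Series, which does have unique().
column_unique = frame["b"].unique()

print(is_ndarray)           # True
print(frame_has_unique)     # False
print(list(column_unique))  # ['x', 'y']
```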
We will discuss the following scenarios:
- Get unique values from a Series
- Get unique values from particular columns in a DataFrame
- unique() with the '.' operator on a DataFrame column
- pandas.unique()
1. Using Pandas Series.unique()
In this scenario, we get the unique values from a given Series.
Example 1: In this example, we create a Series of integers and return its unique values.
# import pandas
import pandas
# creating the series -1
data1 = pandas.Series([10,20,30,50,60,10,20,0,45,20])
# get the unique values
print(data1.unique())
Output:
[10 20 30 50 60  0 45]
Example 2: In this example, we create a Series of strings and return its unique values.
# import pandas
import pandas
# creating the series-2
data2 = pandas.Series(['Python','java','html','php','R','Python','java','html','php','R'])
# get the unique values
print(data2.unique())
Output:
['Python' 'java' 'html' 'php' 'R']
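The examples above contain no missing values. As a small sketch of how Series.unique() treats them, the snippet below uses a made-up Series named values: NaN is kept as one of the unique values, and the order follows first appearance rather than sorted order. Calling dropna() first is one way to exclude missing values.

```python
import pandas

# Hypothetical sample data with repeated values and missing entries.
values = pandas.Series([3.0, 1.0, float("nan"), 3.0, float("nan"), 2.0])

# NaN appears once in the result, in the position where it first occurred.
uniques = values.unique()
print(len(uniques))  # 4

# Drop missing values first if they should not be reported.
non_missing = values.dropna().unique()
print(non_missing.tolist())  # [3.0, 1.0, 2.0]
```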
2. Using Pandas DataFrame with unique()
In this case, we create a pandas DataFrame and get the unique values from its columns by specifying the column name.
Example 1: In this example, we get the unique values from the Subjects column of the DataFrame. The output is every distinct value in that column.
# import pandas
import pandas
# creating the DataFrame 1
data1 = pandas.DataFrame({'Subjects':['java','java','python','html','html','php'],'marks':[80,98,76,87,90,100]})
# get the unique values
print(data1['Subjects'].unique())
Output:
['java' 'python' 'html' 'php']
Example 2: In this example, we get the unique values from the marks column of the DataFrame. The output is every distinct value in that column; all six marks differ, so all six are returned.
# import pandas
import pandas
# creating the DataFrame 1
data1 = pandas.DataFrame({'Subjects':['java','java','python','html','html','php'],'marks':[80,98,76,87,90,100]})
# get the unique values
print(data1['marks'].unique())
Output:
[ 80  98  76  87  90 100]
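When the unique values of several columns are needed at once, one possible approach (a sketch, not the only way) is to loop over the column names and call unique() on each column. The variable names unique_per_column and distinct_counts are illustrative. DataFrame.nunique() is shown alongside because it reports only the number of distinct values per column.

```python
import pandas

data = pandas.DataFrame({
    'Subjects': ['java', 'java', 'python', 'html', 'html', 'php'],
    'marks': [80, 98, 76, 87, 90, 100],
})

# Collect the unique values of every column into a dictionary.
unique_per_column = {name: list(data[name].unique()) for name in data.columns}

# nunique() counts the distinct values in each column.
distinct_counts = data.nunique()

print(unique_per_column['Subjects'])  # ['java', 'python', 'html', 'php']
print(distinct_counts['Subjects'], distinct_counts['marks'])  # 4 6
```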
3. Using unique() with the '.' operator
Here we create a DataFrame and get the unique values from its columns, using the dot (.) operator instead of square brackets to select each column.
Syntax:
dataframe.column.unique()
where,
- dataframe is the input dataframe
- column is the name of the column whose unique values are returned.
Example 1: In this example, we get the unique values from the Subjects column of the DataFrame, selecting the column with the dot (.) operator.
# import pandas
import pandas
# creating the DataFrame 1
data1 = pandas.DataFrame({'Subjects':['java','java','python','html','html','php'],'marks':[80,98,76,87,90,100]})
# get the unique values
print(data1.Subjects.unique())
Output:
['java' 'python' 'html' 'php']
Example 2: In this example, we get the unique values from the marks column of the DataFrame, selecting the column with the dot (.) operator.
# import pandas
import pandas
# creating the DataFrame 1
data1 = pandas.DataFrame({'Subjects':['java','java','python','html','html','php'],'marks':[80,98,76,87,90,100]})
# get the unique values
print(data1.marks.unique())
Output:
[ 80  98  76  87  90 100]
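Dot access works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. The sketch below uses two made-up column names, count and first name, to show cases where bracket access is needed instead.

```python
import pandas

# Hypothetical column names chosen to show where dot access falls short.
data = pandas.DataFrame({"count": [1, 1, 2], "first name": ["a", "b", "a"]})

# data.count refers to the DataFrame.count method, not to the column.
is_method = callable(data.count)
print(is_method)  # True

# Bracket access always selects the column.
count_values = data["count"].unique()
print(list(count_values))  # [1, 2]

# A name containing a space cannot be written with the dot operator at all.
name_values = data["first name"].unique()
print(list(name_values))  # ['a', 'b']
```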
4. Using pandas.unique()
pandas.unique() is a top-level function that returns unique values using a hash table. The values are returned in order of appearance and are not sorted. It is significantly faster than numpy.unique for long enough sequences, and it includes NA values.
Syntax:
pandas.unique(dataframe['column'])
where,
- dataframe is the input dataframe
- column is the name of the column whose unique values are returned.
Example 1: In this example, we are going to get the unique values from Subjects column in the dataframe.
# import pandas
import pandas
# creating the DataFrame 1
data1 = pandas.DataFrame({'Subjects':['java','java','python','html','html','php'],'marks':[80,98,76,87,90,100]})
# get the unique values
print(pandas.unique(data1['Subjects']))
Output:
['java' 'python' 'html' 'php']
Example 2: In this example, we are going to get the unique values from marks column in the dataframe.
# import pandas
import pandas
# creating the DataFrame 1
data1 = pandas.DataFrame({'Subjects':['java','java','python','html','html','php'],'marks':[80,98,76,87,90,100]})
# get the unique values
print(pandas.unique(data1['marks']))
Output:
[ 80  98  76  87  90 100]
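To illustrate the ordering difference between pandas.unique() and numpy.unique, here is a small sketch using a made-up marks Series with one repeated value. pandas.unique() keeps the order of first appearance, while numpy.unique returns the values sorted.

```python
import numpy
import pandas

# Hypothetical marks with one repeated value (80).
marks = pandas.Series([80, 98, 76, 87, 90, 100, 80])

# pandas.unique() preserves the order in which values first appear.
appearance_order = pandas.unique(marks)

# numpy.unique() returns the unique values in sorted order.
sorted_order = numpy.unique(marks.to_numpy())

print(appearance_order.tolist())  # [80, 98, 76, 87, 90, 100]
print(sorted_order.tolist())      # [76, 80, 87, 90, 98, 100]
```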
Summary
In this tutorial we discussed how to get the unique values from a pandas Series and from DataFrame columns using the unique() function, and we saw several ways to get the unique values from DataFrame columns:
- Series.unique()
- unique() on a column selected with data['column']
- unique() with the '.' operator
- pandas.unique()
The unique() function is often used in machine learning projects to inspect the distinct values in a column and to spot duplicate data, which helps produce more reliable results.
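As a rough sketch of checking a column for duplicate data, the snippet below compares the number of distinct values with the length of the Series, then lists the values that occur more than once. The variable names has_duplicates and repeated are illustrative, and Series.duplicated() is used here alongside unique().

```python
import pandas

subjects = pandas.Series(['java', 'java', 'python', 'html', 'html', 'php'])

# Fewer distinct values than rows means at least one value is repeated.
has_duplicates = subjects.nunique() < len(subjects)
print(has_duplicates)  # True

# duplicated() marks every occurrence after the first; unique() then
# reduces those marked rows to one entry per repeated value.
repeated = list(subjects[subjects.duplicated()].unique())
print(repeated)  # ['java', 'html']
```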