Introduction to Pandas melt() function
Pandas melt()
function is used to unpivot a DataFrame from wide to long format, optionally leaving identifiers set. A pivot table aggregates the values in a data set. In this tutorial, we’ll learn how to do the opposite: break an aggregated collection of data into an unaggregated one.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.
Syntax:
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)
Parameters
- id_vars - tuple, list, or array, optional. Column(s) to use as identifier variables.
- value_vars - tuple, list, or array, optional. Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
- var_name - scalar. Name to use for the ‘variable’ column. If None it uses
frame.columns.name
or ‘variable’. - value_name - scalar, default ‘value’. Name to use for the ‘value’ column.
- col_level - int or str, optional. If columns are a MultiIndex then use this level to melt.
- ignore_index - bool, default True. If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary.
What we will cover?
We will discuss the following:
- Set one column as identifier variable and one column as value variable
- Set one column as identifier variable and two columns as value variable
Create pandas DataFrame with example data
DataFrame is a data structure used to store the data in two dimensional format. It is similar to table that stores the data in rows and columns. Rows represents the records/tuples and columns refers to the attributes.
We can create the DataFrame by using pandas.DataFrame() method.
Syntax:
pandas.DataFrame(input_data,columns,index)
Parameters:
It will take mainly three parameters
- input_data is represents a list of data
columns
represent the columns names for the dataindex
represent the row numbers/values
We can also create a DataFrame using dictionary by skipping columns and indices.
Example: Python Program to create a dataframe for college data from a dictionary.
# import pandas
import pandas
# create dataframe with college data
dataframe = pandas.DataFrame({'Name': {0: 'sravan', 1: 'bobby', 2: 'deepak',3:'prasad'},
'Subjects': {0: 'PHP', 1: 'PHP', 2: 'dbms',3:'java'},
'Marks': {0: 89, 1: 90, 2: 93,3:57}})
# display dataframe
print(dataframe)
Output:
Name Subjects Marks
0 sravan PHP 89
1 bobby PHP 90
2 deepak dbms 93
3 prasad java 57
Scenario-1 : Set one column as identifier variable and one column as value variable
Here we are defining one variable as identifier and also one variable is value variable.
Example 1: Python program to set Name as identifier variable and Subjects as value variable
# import pandas
import pandas
# create dataframe with college data
dataframe = pandas.DataFrame({'Name': {0: 'sravan', 1: 'bobby', 2: 'deepak',3:'prasad'},
'Subjects': {0: 'PHP', 1: 'PHP', 2: 'dbms',3:'java'},
'Marks': {0: 89, 1: 90, 2: 93,3:57}})
# here Name is id_vars and Subjects is value_vars
print(pandas.melt(dataframe, id_vars =['Name'], value_vars =['Subjects']))
Output:
Name variable value
0 sravan Subjects PHP
1 bobby Subjects PHP
2 deepak Subjects dbms
3 prasad Subjects java
Example 2: Python program to set Name as identifier variable and Marks as value variable
# import pandas
import pandas
# create dataframe with college data
dataframe = pandas.DataFrame({'Name': {0: 'sravan', 1: 'bobby', 2: 'deepak',3:'prasad'},
'Subjects': {0: 'PHP', 1: 'PHP', 2: 'dbms',3:'java'},
'Marks': {0: 89, 1: 90, 2: 93,3:57}})
# here Name is id_vars and Marks is value_vars
print(pandas.melt(dataframe, id_vars =['Name'], value_vars =['Marks']))
Output:
Name variable value
0 sravan Marks 89
1 bobby Marks 90
2 deepak Marks 93
3 prasad Marks 57
Scenario-2 : Set one column as identifier variable and two columns as value variable
Here we are defining one variable as identifier and two variables as value variable.
Example : Python program to set Name as identifier variable and Subjects, Marks as value variable
# import pandas
import pandas
# create dataframe with college data
dataframe = pandas.DataFrame({'Name': {0: 'sravan', 1: 'bobby', 2: 'deepak',3:'prasad'},
'Subjects': {0: 'PHP', 1: 'PHP', 2: 'dbms',3:'java'},
'Marks': {0: 89, 1: 90, 2: 93,3:57}})
# here Name is id_vars and subjects , Marks is value_vars
print(pandas.melt(dataframe, id_vars =['Name'], value_vars =['Subjects','Marks']))
Output:
Name variable value
0 sravan Subjects PHP
1 bobby Subjects PHP
2 deepak Subjects dbms
3 prasad Subjects java
4 sravan Marks 89
5 bobby Marks 90
6 deepak Marks 93
7 prasad Marks 57
Summary
In this article we learned about Pandas and how it melts a DataFrame with the melt method. (Melting is the process of converting a wide data set to a narrow one.) The method accepts two primary parameters:
- The
id_vars
parameter sets the identifier column, the column for which the wide data set aggregates data. - The
value_vars
parameter accepts the column(s) whose values pandas will melt and store in a new column.
References