Table of Contents
Getting started with dataframe.to_csv()
We can simply use dataframe.to_csv
to convert pandas dataframe to CSV, but we can further customise this and add additional options to save the CSV file in different format such as:
- Converting without Index
- Converting without header
- Convert in append mode
- Convert by setting index column name
- Convert only specific columns
Different supported options to convert dataframe to CSV
Here are the list of different options which are supported with pandas.dataframe.to_csv
function used to convert a dataframe to CSV format:
Scenario | Argument | Example |
---|---|---|
Change default separator | sep | df.to_csv(..., sep=';') |
Write without index | index | df.to_csv(..., index=False) |
Add index label | index_label | df.to_csv(..., index_label='NAME') |
Write without headers | header | df.to_csv(..., header=False) |
Write subset of columns | columns | df.to_csv(..., columns=['x', 'y']) |
File format encoding | encoding | df.to_csv(..., encoding='utf-8') |
Compress CSV | compression | df.to_csv(..., compression='gzip') |
Specify format for datetime | date_format | df.to_csv(..., date_format='%d/%m%Y') |
Save NaNs as "N/A" | na_rep | df.to_csv(..., na_rep='N/A') |
Some important points:
- The default separator is assumed to be a comma ('
,
'). Don't change this unless you know you need to. - By default, the index of
df
is written as the first column. If your DataFrame does not have an index (IOW, thedf.index
is the defaultRangeIndex
), then you will want to setindex=False
when writing. To explain this in a different way, if your data DOES have an index, you can (and should) useindex=True
or just leave it out completely (as the default is True). - It would be wise to set this parameter if you are writing string data so that other applications know how to read your data. This will also avoid any potential
UnicodeEncodeErrors
you might encounter while saving. - Compression is recommended if you are writing large DataFrames (>100K rows) to disk as it will result in much smaller output files. OTOH, it will mean the write time will increase (and consequently, the read time since the file will need to be decompressed).
Method 1 : Convert Pandas DataFrame to CSV
In this method we are going to convert pandas dataframe to csv using to_csv()
with out specifying any parameters.
Syntax:
dataframe.to_csv('file.csv')
where,
- dataframe is the input dataframe
- file is the file name for the csv created from the dataframe.
Example: Python program to convert dataframe to csv
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# convert pandas dataframe to csv
dataframe.to_csv("converted.csv")
Output:
,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
Method 2 : Convert dataframe to csv without Index
In this method we are going to convert pandas dataframe to csv using to_csv()
by specifying index
and header parameter.
csv will not read index and header from the dataframe , if they set to False
Syntax:
dataframe.to_csv('file.csv', index=False)
where,
- dataframe is the input dataframe
- file is the file name for the csv created from the dataframe.
index
is the parameter used to set or remove the index. By default index is True- header is the parameter used to set or remove the header. By default header is True
Example 1: Python program to convert dataframe to csv without index parameter
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# convert pandas dataframe to csv without index
dataframe.to_csv("converted.csv", index=False)
Output:
~]# cat converted.csv
id,name,cost,quantity
foo-23,ground-nut oil,567.0,1
foo-13,almonds,562.56,2
foo-02,flour,67.0,3
foo-31,cereals,76.09,2
Method 3 : Convert dataframe to csv without header
In this method we are going to convert pandas dataframe to csv using to_csv()
by specifying header
parameter.
csv will not read column names from the dataframe , if it set to False.
Syntax:
dataframe.to_csv('file.csv', header=False)
where,
- dataframe is the input dataframe
- file is the file name for the csv created from the dataframe.
header
is the parameter used to remove the column names . By default header is True.
Example: Python program to convert dataframe to csv without header
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# convert pandas dataframe to csv without header
dataframe.to_csv("converted.csv",header=False)
Output:
~]# cat converted.csv
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
Method 4 : Convert dataframe to csv in append mode
Here we are appending the panads dataframe to csv with mode parameter in append mode - 'a
'
Syntax:
dataframe.to_csv('file.csv',mode = 'a')
where,
- dataframe is the input dataframe
- file is the file name for the csv created from the dataframe.
mode
with'a'
defines the append mode.
Example: Python program to convert dataframe to csv by appending csv data
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# convert pandas dataframe to csv in append mode
dataframe.to_csv("converted.csv",mode='a')
Output:
~]# cat converted.csv
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
Method 5 : Convert dataframe to csv by setting index column name
Here we are setting the column name for index through index_label
parameter along with header parameter. It will take index column name value and set as index column in the csv file and set header to False .
Syntax:
dataframe.to_csv('file.csv',index_label='column')
where,
- dataframe is the input dataframe
- file is the file name for the csv created from the dataframe.
index_label
takes column name for index
Example: Python program to convert dataframe to csv by specifying index label as 'Index col'
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# convert pandas dataframe to csv by specifying name for index column
dataframe.to_csv("converted.csv", index_label='Index col')
Output:
~]# cat converted.csv
Index col,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
Method 6 : Converting only specific columns
Here we will convert only particular columns to csv and do not display headers by using columns and header parameters.
Syntax:
dataframe.to_csv('file.csv',columns=['column',.........])
where,
- dataframe is the input dataframe
- file is the file name for the csv created from the dataframe.
columns
take list of column names
In this example, we are converting only id column without headers
dataframe.to_csv("converted.csv",index=False,columns=['id'])# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# convert specific column of pandas dataframe to csv
dataframe.to_csv("converted.csv", index=False, columns=['id'])
Output:
~]# cat converted.csv
id
foo-23
foo-13
foo-02
foo-31
Method-7: Convert dataframe to CSV with a different separator instead of comma
By default pandas to_csv will convert a dataframe to CSV using comma as the separator which is the most common delimiter in CSV files. But if you wish to select a different separator then you can use following syntax:
dataframe.to_csv('file.csv', sep='<SEPARTOR>')
Replace <SEPARATOR> with the type of separator you wish to use as delimiter in your CSV file.
For Example, I will update my existing code to use TAB as the separator instead of comma(,
):
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# convert pandas dataframe to csv with custom separator
dataframe.to_csv("converted.csv", sep="\t")
Output:
~]# cat converted.csv
id name cost quantity
item-1 foo-23 ground-nut oil 567.0 1
item-2 foo-13 almonds 562.56 2
item-3 foo-02 flour 67.0 3
item-4 foo-31 cereals 76.09 2
Method-8: Convert dataframe to CSV in compressed format
We can also compress the CSV while converting from a pandas dataframe to CSV. FOllow the below syntax to achieve the same:
dataframe.to_csv('file.csv', compression='gzip')
Let's update our existing example to perform the conversion along with compression.
# import the module
import pandas
# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
'name':['ground-nut oil','almonds','flour','cereals'],
'cost':[567.00,562.56,67.00,76.09],
'quantity':[1,2,3,2]}
# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])
# convert dataframe to CSV in compressed format
dataframe.to_csv("converted.csv.gz", compression='gzip')
Output:
[root@centos8-1 ~]# ls -l converted.csv.gz
-rw-r--r-- 1 root root 146 Feb 5 17:01 converted.csv.gz
[root@centos8-1 ~]# gunzip converted.csv.gz
[root@centos8-1 ~]# cat converted.csv
,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
Method-9: Use float_format with while converting dataframe to CSV
When you are working with float numbers, it is possible that the number of floats after decimal point may be very high which you want to limit to a certain number. We can achieve this by using float_format
with pandas.dataframe.csv as shown in the following syntax:
dataframe.to_csv('file.csv', float_format='%FORMAT')
where,
- dataframe is the input dataframe
- file is the file name for the csv created from the dataframe.
float_format
will take float value to round of n value in the format - '%.nf'
Here we are generating a 3 x 4 NumPy array after seeding the random generator in the following code snippet.
# import Numpy and Pandas
import numpy as np
import pandas as pd
# seeding
np.random.seed(42)
# randam number generator
data = np.random.randn(3, 4)
data[2][2] = np.nan
# Convert to pandas dataframe
print(pd.DataFrame(data))
Here we get following dataframe:
0 1 2 3
0 0.496714 -0.138264 0.647689 1.523030
1 -0.234153 -0.234137 1.579213 0.767435
2 -0.469474 0.542560 NaN -0.465730
Now, let's control the number of floating points to upto two numbers using float_format='%.2f' and save the output to a CSV file:
# import Numpy and Pandas
import numpy as np
import pandas as pd
# seeding
np.random.seed(42)
# randam number generator
data = np.random.randn(3, 4)
data[2][2] = np.nan
# Convert to pandas dataframe
df = pd.DataFrame(data)
# Write to CSV
df.to_csv('converted.csv', float_format='%.2f')
Output:
~]# cat converted.csv
,0,1,2,3
0,0.50,-0.14,0.65,1.52
1,-0.23,-0.23,1.58,0.77
2,-0.47,0.54,,-0.47
Method-10. Change NaN to different name when converting a dataframe to CSV
By default, any empty cell will be marked as NaN when we are printing a pandas dataframe. We have an option to replace such empty cell with a custom name when converting to CSV using nan_type
option.
Let me update my last code to replace NaN with NULL text:
# import Numpy and Pandas
import numpy as np
import pandas as pd
# seeding
np.random.seed(42)
# randam number generator
data = np.random.randn(3, 4)
data[2][2] = np.nan
# Convert to pandas dataframe
df = pd.DataFrame(data)
# Write to CSV
df.to_csv('converted.csv', float_format='%.2f', na_rep="NULL")
Output:
~]# cat converted.csv
,0,1,2,3
0,0.50,-0.14,0.65,1.52
1,-0.23,-0.23,1.58,0.77
2,-0.47,0.54,NULL,-0.47
Summary
In this tutorial we discussed how to convert pandas dataframe to csv using to_csv()
method with different options. We can use various options with pandas.dataframe.to_csv
to format our content when writing into CSV file such as
- Choosing a different separator
- Disable index and header
- Using a different Index column name
- Converting a specific column
- Replacing NaN with a different value
Working with CSV is best incase of data analysis and machine learning. It is used to load the data and It has some methods and functions such that we can read the data from csv and load the data into csv, where we have seen converting. loading pandas dataframe to csv.
References
Pandas - to_csv()
Writing a pandas DataFrame to CSV file