How to convert DataFrame to CSV [Practical Examples]

Getting started with dataframe.to_csv()

We can simply use dataframe.to_csv to convert pandas dataframe to CSV, but we can further customise this and add additional options to save the CSV file in different format  such as:

  • Converting without Index
  • Converting without header
  • Convert in append mode
  • Convert by setting index column name
  • Convert only specific columns

 

Different supported options to convert dataframe to CSV

Here are the list of different options which are supported with pandas.dataframe.to_csv function used to convert a dataframe to CSV format:

Advertisement
Scenario Argument Example
Change default separator sep df.to_csv(..., sep=';')
Write without index index df.to_csv(..., index=False)
Add index label index_label df.to_csv(..., index_label='NAME')
Write without headers header df.to_csv(..., header=False)
Write subset of columns columns df.to_csv(..., columns=['x', 'y'])
File format encoding encoding df.to_csv(..., encoding='utf-8')
Compress CSV compression df.to_csv(..., compression='gzip')
Specify format for datetime date_format df.to_csv(..., date_format='%d/%m%Y')
Save NaNs as "N/A" na_rep df.to_csv(..., na_rep='N/A')

Some important points:

  • The default separator is assumed to be a comma (','). Don't change this unless you know you need to.
  • By default, the index of df is written as the first column. If your DataFrame does not have an index (IOW, the df.index is the default RangeIndex), then you will want to set index=False when writing. To explain this in a different way, if your data DOES have an index, you can (and should) use index=True or just leave it out completely (as the default is True).
  • It would be wise to set this parameter if you are writing string data so that other applications know how to read your data. This will also avoid any potential UnicodeEncodeErrors you might encounter while saving.
  • Compression is recommended if you are writing large DataFrames (>100K rows) to disk as it will result in much smaller output files. OTOH, it will mean the write time will increase (and consequently, the read time since the file will need to be decompressed).

 

Method 1 : Convert Pandas DataFrame to CSV

In this method we are going to convert pandas dataframe to csv using to_csv() with out specifying any parameters.

Syntax:

dataframe.to_csv('file.csv')

where,

  1. dataframe is the input dataframe
  2. file is the file name for  the csv created from the dataframe.

 

Example: Python program to convert dataframe to csv

# import the module
import pandas

# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# convert pandas dataframe to csv
dataframe.to_csv("converted.csv")

Output:

Advertisement
,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2

 

Method 2 : Convert dataframe to csv without Index

In this method we are going to convert pandas dataframe to csv using to_csv() by specifying index and header parameter.

csv will not read index and header  from the dataframe , if they set to False

Syntax:

dataframe.to_csv('file.csv', index=False)

where,

  1. dataframe is the input dataframe
  2. file is the file name for  the csv created from the dataframe.
  3. index is the parameter used to set or remove the index. By default index is True
  4. header is the parameter used to set or remove the header. By default header is True

 

Example 1: Python program to convert dataframe to csv without index parameter

# import the module
import pandas

# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# convert pandas dataframe to csv without index
dataframe.to_csv("converted.csv", index=False)

Output:

Advertisement
 ~]# cat converted.csv
id,name,cost,quantity
foo-23,ground-nut oil,567.0,1
foo-13,almonds,562.56,2
foo-02,flour,67.0,3
foo-31,cereals,76.09,2

 

Method 3 : Convert dataframe to csv without header

In this method we are going to convert pandas dataframe to csv using to_csv() by specifying header parameter.

csv will not read column names from the dataframe , if it set to False.

Syntax:

dataframe.to_csv('file.csv', header=False)

where,

  1. dataframe is the input dataframe
  2. file is the file name for  the csv created from the dataframe.
  3. header is the parameter used to  remove the column names . By default header is True.

 

Example: Python program to convert dataframe to csv without header

# import the module
import pandas

# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# convert pandas dataframe to csv without header
dataframe.to_csv("converted.csv",header=False)

Output:

Advertisement
~]# cat converted.csv
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2

 

Method 4 : Convert dataframe to csv in append mode

Here we are appending the panads dataframe to csv with mode parameter in append mode - 'a'

Syntax:

dataframe.to_csv('file.csv',mode = 'a')

where,

  1. dataframe is the input dataframe
  2. file is the file name for  the csv created from the dataframe.
  3. mode with 'a' defines the append mode.

Example: Python program to convert dataframe to csv by appending csv data

# import the module
import pandas

# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# convert pandas dataframe to csv in append mode
dataframe.to_csv("converted.csv",mode='a')

Output:

~]# cat converted.csv
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2

 

Method 5 : Convert dataframe to csv by setting index column name

Here we are setting the column name for index through index_label parameter along  with header parameter. It will take index column name value and set as index column in the csv file and set header to False .

Advertisement

Syntax:

dataframe.to_csv('file.csv',index_label='column') 

where,

  1. dataframe is the input dataframe
  2. file is the file name for  the csv created from the dataframe.
  3. index_label takes column name for index

Example: Python program to convert dataframe to csv by specifying index label as 'Index col'

# import the module
import pandas

# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# convert pandas dataframe to csv by specifying name for index column
dataframe.to_csv("converted.csv", index_label='Index col')

Output:

~]# cat converted.csv
Index col,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2

 

Method 6 : Converting only specific columns

Here we will convert only particular columns to csv and do not display headers by using columns and header parameters.

Syntax:

Advertisement
dataframe.to_csv('file.csv',columns=['column',.........]) 

where,

  1. dataframe is the input dataframe
  2. file is the file name for  the csv created from the dataframe.
  3. columns take list of column names

In this example, we are converting only id column without headers


dataframe.to_csv("converted.csv",index=False,columns=['id'])# import the module
import pandas

# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

# pass this food to the dataframe by specifying rows 
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# convert specific column of pandas dataframe to csv
dataframe.to_csv("converted.csv", index=False, columns=['id'])

Output:

~]# cat converted.csv
id
foo-23
foo-13
foo-02
foo-31

 

Method-7: Convert dataframe to CSV with a different separator instead of comma

By default pandas to_csv will convert a dataframe to CSV using comma as the separator which is the most common delimiter in CSV files. But if you wish to select a different separator then you can use following syntax:

dataframe.to_csv('file.csv', sep='<SEPARTOR>')

Replace <SEPARATOR> with the type of separator you wish to use as delimiter in your CSV file.

For Example, I will update my existing code to use TAB as the separator instead of comma(,):

Advertisement
# import the module
import pandas

# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# convert pandas dataframe to csv with custom separator
dataframe.to_csv("converted.csv", sep="\t")

Output:

~]# cat converted.csv
        id      name    cost    quantity
item-1  foo-23  ground-nut oil  567.0   1
item-2  foo-13  almonds 562.56  2
item-3  foo-02  flour   67.0    3
item-4  foo-31  cereals 76.09   2

 

Method-8: Convert dataframe to CSV in compressed format

We can also compress the CSV while converting from a pandas dataframe to CSV. FOllow the below syntax to achieve the same:

dataframe.to_csv('file.csv', compression='gzip')

Let's update our existing example to perform the conversion along with compression.

# import the module
import pandas

# consider the food data
food_input={'id':['foo-23','foo-13','foo-02','foo-31'],
                  'name':['ground-nut oil','almonds','flour','cereals'],
                  'cost':[567.00,562.56,67.00,76.09],
                  'quantity':[1,2,3,2]}

# pass this food to the dataframe by specifying rows
dataframe=pandas.DataFrame(food_input,index = ['item-1', 'item-2', 'item-3', 'item-4'])

# convert dataframe to CSV in compressed format
dataframe.to_csv("converted.csv.gz", compression='gzip')

Output:

[root@centos8-1 ~]# ls -l converted.csv.gz
-rw-r--r-- 1 root root 146 Feb  5 17:01 converted.csv.gz

[root@centos8-1 ~]# gunzip converted.csv.gz

[root@centos8-1 ~]# cat converted.csv
,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2

 

Method-9: Use float_format with while converting dataframe to CSV

When you are working with float numbers, it is possible that the number of floats after decimal point may be very high which you want to limit to a certain number. We can achieve this by using float_format with pandas.dataframe.csv as shown in the following syntax:

dataframe.to_csv('file.csv', float_format='%FORMAT')

where,

  1. dataframe is the input dataframe
  2. file is the file name for  the csv created from the dataframe.
  3. float_format will take float value to round of n value in the format - '%.nf'

Here we are generating a 3 x 4 NumPy array after seeding the random generator in the following code snippet.

# import Numpy and Pandas
import numpy as np
import pandas as pd

# seeding
np.random.seed(42)

# randam number generator
data = np.random.randn(3, 4)
data[2][2] = np.nan

# Convert to pandas dataframe
print(pd.DataFrame(data))

Here we get following dataframe:

          0         1         2         3
0  0.496714 -0.138264  0.647689  1.523030
1 -0.234153 -0.234137  1.579213  0.767435
2 -0.469474  0.542560       NaN -0.465730

Now, let's control the number of floating points to upto two numbers using float_format='%.2f' and save the output to a CSV file:

# import Numpy and Pandas
import numpy as np
import pandas as pd

# seeding
np.random.seed(42)

# randam number generator
data = np.random.randn(3, 4)
data[2][2] = np.nan

# Convert to pandas dataframe
df = pd.DataFrame(data)

# Write to CSV
df.to_csv('converted.csv', float_format='%.2f')

Output:

 ~]# cat converted.csv
,0,1,2,3
0,0.50,-0.14,0.65,1.52
1,-0.23,-0.23,1.58,0.77
2,-0.47,0.54,,-0.47

 

Method-10. Change NaN to different name when converting a dataframe to CSV

By default, any empty cell will be marked as NaN when we are printing a pandas dataframe. We have an option to replace such empty cell with a custom name when converting to CSV using nan_type option.

Let me update my last code to replace NaN with NULL text:

# import Numpy and Pandas
import numpy as np
import pandas as pd

# seeding
np.random.seed(42)

# randam number generator
data = np.random.randn(3, 4)
data[2][2] = np.nan

# Convert to pandas dataframe
df = pd.DataFrame(data)

# Write to CSV
df.to_csv('converted.csv', float_format='%.2f', na_rep="NULL")

Output:

~]# cat converted.csv
,0,1,2,3
0,0.50,-0.14,0.65,1.52
1,-0.23,-0.23,1.58,0.77
2,-0.47,0.54,NULL,-0.47

 

Summary

In this tutorial we discussed how to convert pandas dataframe to csv using to_csv() method with different options. We can use various options with pandas.dataframe.to_csv to format our content when writing into CSV file such as

  • Choosing a different separator
  • Disable index and header
  • Using a different Index column name
  • Converting a specific column
  • Replacing NaN with a different value

Working with CSV is best incase of data analysis and machine learning. It is used to load the data and It has some methods and functions such that we can read the data from csv and load the data into csv, where we have seen converting. loading pandas dataframe to csv.

 

References

Pandas - to_csv()
Writing a pandas DataFrame to CSV file

 

Didn't find what you were looking for? Perform a quick search across GoLinuxCloud

If my articles on GoLinuxCloud has helped you, kindly consider buying me a coffee as a token of appreciation.

Buy GoLinuxCloud a Coffee

For any other feedbacks or questions you can either use the comments section or contact me form.

Thank You for your support!!

Leave a Comment

X