Different methods to read_csv into pandas DataFrame
In this tutorial we will discuss read_csv(). This function loads the data from a CSV file into a pandas DataFrame. We will learn how to convert a CSV file to a pandas DataFrame using different parameters of the read_csv() function:
- Import csv to pandas DataFrame using read_csv()
- read_csv() with first row as header
- read_csv() with custom index
- read_csv() with new column names
- read_csv() with skip rows
- Read first N rows from csv to pandas DataFrame
- Import specific columns from csv to pandas DataFrame using read_csv()
- read_csv() from absolute path
- read_csv() from relative path
Sample CSV content
Since this article is all about converting CSV to a pandas DataFrame, we will use the sample.csv file below for all the methods and examples in this tutorial. Note that the first field of the header line is empty:
,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
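To follow along, the sample file can be created with a few lines of Python. This is only a minimal sketch: it assumes the examples are run from the current working directory, and the variable names SAMPLE_CSV and sample_path are chosen here for illustration.

```python
from pathlib import Path

# Content of sample.csv exactly as listed above; the first header field is empty.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

# Write the file into the current working directory (an assumed location).
sample_path = Path("sample.csv")
sample_path.write_text(SAMPLE_CSV)

# Show what was written so it can be compared with the listing above.
print(sample_path.read_text())
```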
Scenario-1 : Import csv to pandas DataFrame using read_csv()
Here we read the CSV file shown above into a pandas DataFrame by calling read_csv() with only the file name and no other parameters.
Syntax:
pandas.read_csv("file_name.csv")
where, file_name is the name of the input csv file.
Example: In this example, we are going to import csv to pandas dataframe
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv")
#display the dataframe
print(dataframe)
Output:
Unnamed: 0 id name cost quantity
0 item-1 foo-23 ground-nut oil 567.00 1
1 item-2 foo-13 almonds 562.56 2
2 item-3 foo-02 flour 67.00 3
3 item-4 foo-31 cereals 76.09 2
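Besides the values, read_csv() also infers a type for each column. The sketch below checks the shape and the inferred types; it reads the sample content from an in-memory string through io.StringIO instead of a file so that it runs on its own, which is an assumption made for this illustration.

```python
import io

import pandas

# Same content as sample.csv, kept in memory so the example is self-contained.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

# read_csv() accepts a file-like object as well as a file name.
dataframe = pandas.read_csv(io.StringIO(SAMPLE_CSV))

# Number of rows and columns that were read.
print(dataframe.shape)

# Type inferred for every column (cost is floating point, quantity is integer).
print(dataframe.dtypes)
```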
Scenario-2 : read_csv() with first row as header
Here we read the same CSV file, again without any extra parameters. By default read_csv() treats the first row of the file as the header and uses its values as the column names. Because the first header field in sample.csv is empty, pandas names that column Unnamed: 0.
Syntax:
pandas.read_csv("file_name.csv")
where, file_name is the name of the input csv file.
Example: In this example, we are going to import csv to pandas dataframe
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv")
#display the dataframe
print(dataframe)
Output:
Unnamed: 0 id name cost quantity
0 item-1 foo-23 ground-nut oil 567.00 1
1 item-2 foo-13 almonds 562.56 2
2 item-3 foo-02 flour 67.00 3
3 item-4 foo-31 cereals 76.09 2
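The header behaviour can also be controlled explicitly through the header parameter. The following sketch compares header=0 (first line is the header) with header=None (no header line); it reads the sample content from an in-memory string, which is an assumption made to keep the example self-contained.

```python
import io

import pandas

# Same content as sample.csv, kept in memory so the example is self-contained.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

# header=0: the first line supplies the column names.
with_header = pandas.read_csv(io.StringIO(SAMPLE_CSV), header=0)
print(list(with_header.columns))

# header=None: no line is treated as a header, so the columns are numbered
# 0..4 and the original header line becomes the first data row.
without_header = pandas.read_csv(io.StringIO(SAMPLE_CSV), header=None)
print(list(without_header.columns))
print(without_header.shape)
```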
Scenario-3 : read_csv() with custom index
Here we use one of the existing columns in the CSV file as the index of the DataFrame by passing it to the index_col parameter.
Syntax:
pandas.read_csv("file_name.csv",index_col='column')
where,
- file_name is the name of the input csv file.
- index_col takes the name of the column to use as the index.
Example: In this example, we are going to import csv to pandas dataframe with quantity as custom index
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv",index_col='quantity')
#display the dataframe
print(dataframe)
Output:
Unnamed: 0 id name cost
quantity
1 item-1 foo-23 ground-nut oil 567.00
2 item-2 foo-13 almonds 562.56
3 item-3 foo-02 flour 67.00
2 item-4 foo-31 cereals 76.09
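index_col also accepts a column position instead of a name. As a sketch, index_col=0 turns the unnamed first column (item-1, item-2, ...) into the index, which avoids the Unnamed: 0 column altogether. The sample content is read from an in-memory string here, an assumption made so the example runs on its own.

```python
import io

import pandas

# Same content as sample.csv, kept in memory so the example is self-contained.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

# Use the first column (position 0) as the row index.
dataframe = pandas.read_csv(io.StringIO(SAMPLE_CSV), index_col=0)
print(dataframe)

# Rows can now be looked up by their item label.
print(dataframe.loc["item-2", "name"])
```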
Scenario-4 : read_csv() with new column names
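Here we assign new column names to the DataFrame through the names parameter, which takes a list of column names. Note that names does not replace the existing header by itself: the original header line is read as the first data row, as the output of the example shows. Also, because the file has five columns and only four names are given, pandas uses the first column as the index.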
Syntax:
pandas.read_csv("file_name.csv",names=[columns])
where,
- file_name is the name of the input csv file.
- columns will be the list of columns
Example: In this example, we are going to import csv to pandas dataframe with new columns
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv",names=['ID','NAME','COST','QUANTITY'])
#display the dataframe
print(dataframe)
Output:
ID NAME COST QUANTITY
NaN id name cost quantity
item-1 foo-23 ground-nut oil 567.0 1
item-2 foo-13 almonds 562.56 2
item-3 foo-02 flour 67.0 3
item-4 foo-31 cereals 76.09 2
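To replace the existing header instead of keeping it as a data row, names can be combined with header=0. The sketch below supplies one name per column; ITEM is a hypothetical name picked here for the unnamed first column, and the sample content is read from an in-memory string so the example is self-contained.

```python
import io

import pandas

# Same content as sample.csv, kept in memory so the example is self-contained.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

# One new name per column; "ITEM" is a made-up label for the unnamed column.
new_names = ["ITEM", "ID", "NAME", "COST", "QUANTITY"]

# header=0 tells pandas that line 0 is the old header and should be discarded
# in favour of the names given here.
dataframe = pandas.read_csv(io.StringIO(SAMPLE_CSV), header=0, names=new_names)
print(dataframe)
```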
Scenario-5 : read_csv() with skip rows
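In this method, we use the skiprows parameter to skip the first n lines of the CSV file while reading it into the pandas DataFrame. The header line counts as one of the skipped lines, so the first line that is not skipped is used as the header.
Syntax:
pandas.read_csv("file_name.csv",skiprows=n)
where,
- file_name is the name of the input csv file.
- skiprows takes the number of lines to skip from the top of the file.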
Example 1: In this example, we are going to import csv to pandas dataframe by skipping 2 rows
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv",skiprows=2)
#display the dataframe
print(dataframe)
Output:
item-2 foo-13 almonds 562.56 2
0 item-3 foo-02 flour 67.00 3
1 item-4 foo-31 cereals 76.09 2
Example 2: In this example, we are going to import csv to pandas dataframe by skipping 4 rows
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv",skiprows=4)
#display the dataframe
print(dataframe)
Output:
Empty DataFrame
Columns: [item-4, foo-31, cereals, 76.09, 2]
Index: []
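skiprows also accepts a list of 0-based line numbers, where line 0 is the header line. As a sketch, skipping lines 1 and 2 drops the first two data rows while keeping the original header. The sample content is read from an in-memory string, an assumption made so the example runs on its own.

```python
import io

import pandas

# Same content as sample.csv, kept in memory so the example is self-contained.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

# Skip file lines 1 and 2 (item-1 and item-2); line 0, the header, is kept.
dataframe = pandas.read_csv(io.StringIO(SAMPLE_CSV), skiprows=[1, 2])
print(dataframe)
```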
Scenario-6 : Read first N rows from csv to pandas DataFrame
The nrows parameter limits reading to the first n data rows of the CSV file. The header line is not counted.
Syntax:
pandas.read_csv("file_name.csv",nrows=n)
where,
- file_name is the name of the input csv file.
- nrows takes the number of rows to read from the top of the csv.
Example 1: In this example, we are going to import csv to pandas dataframe by returning top 2 rows
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv",nrows=2)
#display the dataframe
print(dataframe)
Output:
Unnamed: 0 id name cost quantity
0 item-1 foo-23 ground-nut oil 567.00 1
1 item-2 foo-13 almonds 562.56 2
Example 2: In this example, we are going to import csv to pandas dataframe by returning top 4 rows
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv",nrows=4)
#display the dataframe
print(dataframe)
Output:
Unnamed: 0 id name cost quantity
0 item-1 foo-23 ground-nut oil 567.00 1
1 item-2 foo-13 almonds 562.56 2
2 item-3 foo-02 flour 67.00 3
3 item-4 foo-31 cereals 76.09 2
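Two further uses of nrows can be sketched as follows: nrows=0 reads only the header, which is a cheap way to inspect the column names, and nrows combined with a skiprows range reads a window from the middle of the file. The sample content is read from an in-memory string, an assumption made so the example is self-contained.

```python
import io

import pandas

# Same content as sample.csv, kept in memory so the example is self-contained.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

# nrows=0 reads the header only: an empty DataFrame that still has its columns.
header_only = pandas.read_csv(io.StringIO(SAMPLE_CSV), nrows=0)
print(list(header_only.columns))

# Skip file lines 1 and 2 (the first two data rows), then read a single row.
window = pandas.read_csv(io.StringIO(SAMPLE_CSV), skiprows=range(1, 3), nrows=1)
print(window)
```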
Scenario-7 : Import specific columns from csv to pandas DataFrame using read_csv()
Here we first read the whole CSV file and then pass the resulting DataFrame to pandas.DataFrame() with the columns parameter, which keeps only the listed columns. It takes a list of column names.
Syntax:
pandas.DataFrame(dataframe,columns=[columns])
where, columns is the list of column names to keep and dataframe is the DataFrame returned by read_csv().
Example 1: In this example, we are going to import csv to pandas dataframe by taking id and name
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv")
#display the dataframe
print(pandas.DataFrame(dataframe,columns=['id','name']))
Output:
id name
0 foo-23 ground-nut oil
1 foo-13 almonds
2 foo-02 flour
3 foo-31 cereals
Example 2: In this example, we are going to import csv to pandas dataframe by taking only cost.
# import pandas
import pandas
#read the csv
dataframe=pandas.read_csv("sample.csv")
#display the dataframe
print(pandas.DataFrame(dataframe,columns=['cost']))
Output:
cost
0 567.00
1 562.56
2 67.00
3 76.09
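The examples above select the columns after the whole file has been read. A different technique, not the one used above, is the usecols parameter of read_csv(), which restricts the columns while the file is being parsed. The sketch below reads the sample content from an in-memory string so that it is self-contained.

```python
import io

import pandas

# Same content as sample.csv, kept in memory so the example is self-contained.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

# Only the listed columns are parsed; the others are never loaded.
dataframe = pandas.read_csv(io.StringIO(SAMPLE_CSV), usecols=["id", "name"])
print(dataframe)
```

For large files this can save memory, because the unwanted columns are not kept in the DataFrame at any point.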
Scenario-8 : read_csv() using absolute path
We can also pass the full path of the file on our local system in place of just the file name. The path includes the file name, and all the parameters described above work the same way.
Syntax:
pandas.read_csv("path//......file_name.csv")
where, file_name is the name of the input csv file.
Example: In this example, we are going to import the CSV file located at the following path:
# ls -l /root/sample.csv
-rw-r--r-- 1 root root 148 Jan 26 23:42 /root/sample.csv
Our script also lies in the same directory:
# ls -l /root/eg-1.py
-rw-r--r-- 1 root root 136 Jan 27 14:50 /root/eg-1.py
So this will be our script with the absolute path of the CSV file:
# import pandas
import pandas
# read the csv
dataframe=pandas.read_csv("/root/sample.csv")
# display the dataframe
print(dataframe)
Output:
Unnamed: 0 id name cost quantity
0 item-1 foo-23 ground-nut oil 567.00 1
1 item-2 foo-13 almonds 562.56 2
2 item-3 foo-02 flour 67.00 3
3 item-4 foo-31 cereals 76.09 2
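read_csv() also accepts a pathlib.Path object in place of a string. The sketch below builds an absolute path and checks it before reading; because /root/sample.csv may not exist on every machine, it writes the sample content into a temporary directory that stands in for that location, which is an assumption made for this illustration.

```python
import tempfile
from pathlib import Path

import pandas

# Same content as sample.csv.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # The temporary directory stands in for /root in the example above.
    csv_path = Path(tmp_dir) / "sample.csv"
    csv_path.write_text(SAMPLE_CSV)

    # resolve() returns the absolute form of the path.
    absolute_path = csv_path.resolve()
    is_absolute = absolute_path.is_absolute()

    # A Path object can be passed directly to read_csv().
    dataframe = pandas.read_csv(absolute_path)

print(is_absolute)
print(dataframe)
```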
Scenario-9 : read_csv() from relative path
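We can also read the csv from a relative path in place of the file name. The path still ends with the file name, and all the parameters described above work the same way. A relative path is resolved against the current working directory of the Python process, which is the directory the script is run from and not necessarily the directory where the script is stored. The r prefix in the syntax below is not required for relative paths: it marks a raw string literal, which is useful when the path contains backslashes, as Windows paths do.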
Syntax:
pandas.read_csv(r"path//......file_name.csv")
where, file_name is the name of the input csv file.
Example: In this example, we are going to import csv to pandas dataframe using a relative path.
Now we will move our script to /tmp:
# ls -l /tmp/eg-1.py
-rw-r--r-- 1 root root 136 Jan 27 14:50 /tmp/eg-1.py
And try to access /root/sample.csv using a relative path, running the script from /tmp:
# import pandas
import pandas
# read the csv
dataframe=pandas.read_csv("../root/sample.csv")
# display the dataframe
print(dataframe)
Output:
Unnamed: 0 id name cost quantity
0 item-1 foo-23 ground-nut oil 567.00 1
1 item-2 foo-13 almonds 562.56 2
2 item-3 foo-02 flour 67.00 3
3 item-4 foo-31 cereals 76.09 2
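The dependence on the current working directory can be sketched as follows. The example recreates the /root and /tmp layout inside a temporary directory (an assumption, so that it runs on any machine), changes into the stand-in for /tmp, and reads the file through the same relative path as above.

```python
import os
import tempfile
from pathlib import Path

import pandas

# Same content as sample.csv.
SAMPLE_CSV = """,id,name,cost,quantity
item-1,foo-23,ground-nut oil,567.0,1
item-2,foo-13,almonds,562.56,2
item-3,foo-02,flour,67.0,3
item-4,foo-31,cereals,76.09,2
"""

original_cwd = Path.cwd()

with tempfile.TemporaryDirectory() as base:
    # Stand-ins for /root (data) and /tmp (where the script is run from).
    data_dir = Path(base) / "root"
    script_dir = Path(base) / "tmp"
    data_dir.mkdir()
    script_dir.mkdir()
    (data_dir / "sample.csv").write_text(SAMPLE_CSV)

    try:
        # The relative path is resolved against this working directory.
        os.chdir(script_dir)
        dataframe = pandas.read_csv("../root/sample.csv")
    finally:
        # Always restore the previous working directory.
        os.chdir(original_cwd)

print(dataframe.shape)
```

Running the same read_csv() call from a different working directory would raise FileNotFoundError unless that directory also has a sibling root directory containing sample.csv.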
Summary
In this article, we discussed how to read a csv into a pandas DataFrame using the read_csv() function for the following scenarios:
- read_csv() with first row as header
- read_csv() with custom index
- read_csv() with new column names
- read_csv() with skip rows
- Read first N rows from csv to pandas DataFrame
- Import specific columns from csv to pandas DataFrame using read_csv()
- read_csv() from absolute path
- read_csv() from relative path