Introduction to CSV files
CSV stands for comma-separated values. It is one of the most common file formats for storing and transferring data. A CSV file stores tabular data (numbers and text) as plain text: each line of the file is one data record, and each record consists of one or more fields separated by commas. The ability to read, manipulate, and write CSV files with Python is a key skill for any data scientist or business analyst. In this tutorial, we will cover the essentials of working with CSV files in Python: how to create, write, and read them, and how some powerful Python libraries make working with CSV files easier.
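To make the record and field structure concrete, here is a minimal sketch that parses a small CSV text held in memory with Python's built-in csv module. The sample rows, including the quoted name "Khan, Jr.", are made up for illustration, and io.StringIO stands in for a real file. It also shows why a plain str.split(",") is not enough when a field itself contains a comma.

```python
import csv
import io

# Hypothetical sample data; the third record has a quoted field containing a comma.
text = 'No,Name,Gender\n1,Erlan,Male\n2,"Khan, Jr.",Male\n'

# Each line is one record; csv.reader splits every record into its fields.
records = list(csv.reader(io.StringIO(text)))
for record in records:
    print(len(record), record)

# A naive split breaks the quoted field apart and yields four pieces instead of three.
naive = text.splitlines()[2].split(",")
print(naive)
```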
Python open() CSV file
Python has a built-in function, open(), for opening files. It returns a file object, which we then use to read or modify the file. We can use this function to open a CSV file as well.
See the example below:
>>> f = open("myfile.csv")
>>> f = open("myfile.txt")
If the file is not in the current directory, then we have to provide its full path. On Windows, use a raw string (r"...") so that the backslashes are not treated as escape sequences:
>>> f = open(r"C:\Users\Documents\CSVfile\myfile.csv")
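As a small sketch of the open() workflow, the code below writes a file through the file object that open() returns, then opens it again with no mode (so "rt") and reads it back. It assumes nothing about your folders: it builds a full path inside a temporary directory with os.path.join(), which also avoids hand-written backslashes on Windows. The file name is only an example.

```python
import os
import tempfile

# A throwaway directory, so the sketch never touches your real files.
with tempfile.TemporaryDirectory() as folder:
    # Build the full path; os.path.join picks the right separator for the OS.
    path = os.path.join(folder, "myfile.csv")

    f = open(path, "w")          # file object opened for writing
    f.write("No,Name,Gender\n1,Erlan,Male\n")
    f.close()

    f = open(path)               # no mode given, same as "rt"
    content = f.read()
    mode = f.mode
    f.close()

print(mode)
print(content)
```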
Different modes to open CSV file in Python
We can also specify the mode in which the file is opened, that is, whether we want to read ("r"), write ("w"), or append ("a") to the file. Python also lets us specify whether to open a file in text mode or binary mode.
If we do not provide a mode explicitly, the file is opened for reading in text mode. Text mode returns str objects, while binary mode returns bytes and is used for non-text files such as images.
The following table lists the characters used for the different modes:
"r": opens the file for reading (default).
"w": opens the file for writing. Existing content is erased; a new file is created if it does not exist.
"x": opens the file for exclusive creation. If the file already exists, the operation fails.
"t": opens the file in text mode (default).
"b": opens the file in binary mode.
"a": opens the file for appending data at the end without removing existing data. If the file does not exist, a new one is created.
"+": opens the file for updating (reading and writing).
See the examples below:
f = open("myfile.csv")  ## same as "rt" mode: read, text
f = open("myfile.csv", "w")  ## write in text mode
f = open("myimage.png", "r+b")  ## read and write in binary mode
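The sketch below exercises several of these modes on a throwaway file in a temporary directory (the name modes.csv is arbitrary), so that their effects on existing data are visible: "x" refuses to overwrite, "a" keeps old lines, "w" truncates, and "b" yields bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes.csv")

    # "x": exclusive creation succeeds because the file does not exist yet.
    with open(path, "x") as f:
        f.write("a,b\n")

    # A second "x" fails because the file now exists.
    try:
        with open(path, "x") as f:
            pass
        x_failed = False
    except FileExistsError:
        x_failed = True

    # "a": new data goes to the end; existing data is kept.
    with open(path, "a") as f:
        f.write("c,d\n")
    with open(path) as f:
        after_append = f.read()

    # "w": the file is truncated (emptied) before writing.
    with open(path, "w") as f:
        f.write("e,f\n")
    with open(path) as f:
        after_write = f.read()

    # "rb": binary mode returns bytes rather than str.
    with open(path, "rb") as f:
        raw = f.read()

print(x_failed)
print(repr(after_append))
print(repr(after_write))
print(raw)
```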
Python read CSV
The reader object from Python's csv module is used to read CSV files. First, the built-in open() function opens the CSV file as a text file; the file object is then passed to csv.reader(), which reads the file row by row.
The example below shows how reading a CSV file works. Let's say we have a CSV file named myfile.csv that contains the following data:
No, Name, Gender
1, Erlan, Male
2, Alex, Male
3, soro, Female
4, Khan, Male
Now let's look at the Python code to open and read this CSV file.
import csv
## open the csv file; newline='' is recommended by the csv module documentation
with open('myfile.csv', 'r', newline='') as f:
    data = csv.reader(f)  ## create a reader object that yields one list per row
    ## use a for loop to iterate over the rows
    for row in data:
        print(row)
In this example, we used the open() function to open a CSV file. Then csv.reader() reads the file and returns an iterable reader object. A for loop then iterates over that object and prints the content of each row.
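Output:
['No', ' Name', ' Gender']
['1', ' Erlan', ' Male']
['2', ' Alex', ' Male']
['3', ' soro', ' Female']
['4', ' Khan', ' Male']
Notice that each row is returned as a list of strings. Because the file has a space after each comma, every field except the first keeps that leading space.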
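Here is a small sketch of two common follow-ups, assuming the same sample data as above (held in an io.StringIO instead of myfile.csv so that it runs anywhere): skipinitialspace=True removes the space after each comma, next() separates the header row from the data rows, and csv.DictReader maps each row to the column names.

```python
import csv
import io

# Same sample data as myfile.csv above, with a space after each comma.
sample = "No, Name, Gender\n1, Erlan, Male\n2, Alex, Male\n3, soro, Female\n4, Khan, Male\n"

# skipinitialspace=True drops whitespace that directly follows a delimiter.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
header = next(reader)       # the first row holds the column names
rows = list(reader)         # the remaining rows are the data records
print(header)
print(rows)

# DictReader uses the first row as keys, so fields can be accessed by name.
people = list(csv.DictReader(io.StringIO(sample), skipinitialspace=True))
print(people[0]["Name"])
```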
Python Write CSV Examples
Writing a CSV file in Python is pretty easy. First, we open the file in write ("w") mode with the open() function. Then we create a writer object by calling csv.writer() from the csv module. Finally, we write data to the file by calling the writer's writerow() or writerows() method.
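Here is a minimal sketch of the difference between the two methods: writerow() takes one row (a list of fields) and writerows() takes a list of rows. The file is opened with newline="", as the csv module documentation recommends, and lives in a temporary directory; the file name students.csv is just an example.

```python
import csv
import os
import tempfile

header = ["No", "Name", "Gender"]
students = [["1", "Erlan", "Male"], ["2", "Alex", "Male"]]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "students.csv")

    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)     # one row: a list of fields
        writer.writerows(students)  # many rows: a list of lists

    # Read the file back to check what was written.
    with open(path, newline="") as f:
        written = list(csv.reader(f))

print(written)
```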
Example-1: Overwrite CSV File Content
In this example, we will overwrite the content of our myfile.csv. The same code also creates a new CSV file: "w" mode creates the file if it does not exist, so no empty file needs to be present beforehand. Let's write the following row to myfile.csv:
5,dilover,Male
Here is sample Python code that overwrites the content of our CSV file. We open the file in write ("w") mode:
import csv
## define a new row of data
newStudent = ["5", "dilover", "Male"]
## open the csv file in write mode (this erases its current content)
with open("myfile.csv", "w", newline="") as f:
    ## create a writer object for the file
    data = csv.writer(f)
    ## write the row to the csv file
    data.writerow(newStudent)
This writes the new row to our CSV file but at the same time removes all the previous data. When we print the CSV file, we get:
~]# cat myfile.csv
5,dilover,Male
Example-2: Append content into CSV file
Since we used write mode in the previous example, the existing data in the CSV file was overwritten. To keep the previous data, we have to open the file in append ("a") mode instead.
See the example below:
import csv
newStudent = ["5", "dilover", "Male"]
## open csv file in append mode
with open("myfile.csv", "a", newline="") as f:
    ## create a writer object for the file
    data = csv.writer(f)
    ## add the row at the end of the csv file
    data.writerow(newStudent)
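A common variation of appending is to write the header row only when the file is first created. The sketch below shows one way to do that; append_student() is a hypothetical helper, the second student, Sara, is made-up sample data, and the file lives in a temporary directory.

```python
import csv
import os
import tempfile

def append_student(path, student):
    # Hypothetical helper: write the header only if the file does not exist yet.
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["No", "Name", "Gender"])
        writer.writerow(student)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.csv")
    append_student(path, ["5", "dilover", "Male"])
    append_student(path, ["6", "Sara", "Female"])  # header is not repeated
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

print(rows)
```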
Python close() CSV file
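It is always good practice to close a file once we are done with it. File objects have a close() method for this, as shown below. (When a file is opened with a with statement, as in the previous examples, Python closes it automatically at the end of the block, so calling close() there is not necessary.)
import csv
newStudent = ["5", "dilover", "Male"]
## open the csv file in append mode without a with statement
f = open("myfile.csv", "a", newline="")
## create a writer object for the file
data = csv.writer(f)
## add the row to the csv file
data.writerow(newStudent)
## close the opened csv file
f.close()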
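To see the difference between the two styles, the sketch below checks the file object's closed attribute: inside a with block the file is open, and it is closed automatically when the block ends. Without with, a try/finally block is one way to make sure close() runs even if writing raises an error. The temporary file and the extra row (Sara) are only for illustration.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.csv")

    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(["5", "dilover", "Male"])
        closed_inside = f.closed   # still open inside the block
    closed_after = f.closed        # closed automatically when the block ends

    # Without "with", try/finally guarantees that close() runs.
    g = open(path, "a", newline="")
    try:
        csv.writer(g).writerow(["6", "Sara", "Female"])
    finally:
        g.close()

print(closed_inside, closed_after, g.closed)
```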
Pandas and CSV files
Pandas is a powerful and flexible Python package for working with tabular data and series of data. It can also plot data and provides many statistical methods. One of its important features is the ability to read and write Excel and CSV files. In the following examples, we will see how to use pandas to read and work with CSV files.
Example-1: Reading CSV file using pandas
First, we have to import the pandas library. If it is not installed, you can install it with pip. The following example shows how to read a CSV file using pandas.
import pandas as pd
## read myfile.csv which is in the same directory
csvFile = pd.read_csv('myfile.csv')
Pandas looks for myfile.csv in the current working directory. The read_csv() function takes one required argument, the name or full path of the file, plus many optional arguments. We can use the head() method, which returns the first five rows, to check whether everything was imported correctly.
import pandas as pd
## csv file is in the same directory
csvFile = pd.read_csv('myfile.csv')
## print the data in csv file
print(csvFile.head())
Output:
No Name Gender
0 1 Erlan Male
1 2 Alex Male
2 3 soro Female
3 4 Khan Male
If you get ModuleNotFoundError: No module named 'pandas', the pandas module is not installed on your system. Use the following command to install it:
~]$ python3 -m pip install pandas
Replace python3 with your default Python binary.
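The spaces after the commas in this sample file also end up in the column names, so a column may be called " Name" rather than "Name", and csvFile["Name"] would raise a KeyError. The minimal sketch below uses the same data in an io.StringIO in place of myfile.csv and shows how read_csv()'s skipinitialspace option avoids that:

```python
import io
import pandas as pd

# Same sample data as myfile.csv, with a space after each comma.
sample = "No, Name, Gender\n1, Erlan, Male\n2, Alex, Male\n3, soro, Female\n4, Khan, Male\n"

raw = pd.read_csv(io.StringIO(sample))
clean = pd.read_csv(io.StringIO(sample), skipinitialspace=True)

print(list(raw.columns))    # column names keep the leading spaces
print(list(clean.columns))  # spaces after the commas are skipped
print(clean.shape)          # (rows, columns)
print(clean["Name"].tolist())
```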
Example-2: How to read only the first rows of a CSV file in Python
Pandas also lets us import only the first n rows instead of the whole CSV file. This is helpful when we need only a limited amount of data from a huge file.
The read_csv() function takes an optional nrows argument that sets how many rows to read. The example below reads only the first two rows of myfile.csv.
import pandas as pd
## selecting first 2 rows
csvFile = pd.read_csv('myfile.csv', nrows=2)
## printing the selected rows
print(csvFile.head())
Output:
~]# python3 eg-3.py
No Name Gender
0 1 Erlan Male
1 2 Alex Male
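For files that are too large to load at once, read_csv() also has a chunksize parameter, which returns an iterator of smaller DataFrames instead of one big one. Here is a minimal sketch, with the sample data held in an io.StringIO as a stand-in for a large file:

```python
import io
import pandas as pd

sample = "No,Name,Gender\n1,Erlan,Male\n2,Alex,Male\n3,soro,Female\n4,Khan,Male\n"

# nrows: read only the first two data rows.
first_two = pd.read_csv(io.StringIO(sample), nrows=2)

# chunksize: process the data piece by piece, at most 3 rows per chunk.
sizes = []
for chunk in pd.read_csv(io.StringIO(sample), chunksize=3):
    sizes.append(len(chunk))

print(first_two["Name"].tolist())
print(sizes)
```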
Example-3: How to skip selected rows of a CSV file in Python
Besides the first rows, pandas also lets us skip rows in the middle of the data. The skiprows argument of read_csv() takes the line numbers (starting at 0) to skip, and it can be combined with nrows to limit how many rows are read.
The example below skips line 0 (the heading) and line 2 (Alex). Because the original heading is skipped, pandas uses the next remaining line, 1, Erlan, Male, as the header, and nrows=2 then reads the rows for soro and Khan.
import pandas as pd
## skip lines 0 and 2 of the file, then read 2 rows
csvFile = pd.read_csv('myfile.csv',skiprows=(0, 2), nrows=2)
## printing the data
print(csvFile.head())
Output:
~]# python3 eg-3.py
1 Erlan Male
0 3 soro Female
1 4 Khan Male
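If the goal is to drop particular data rows while keeping the original heading, leave line 0 out of skiprows. skiprows also accepts a function that receives each line number and returns True for the lines to skip. Here is a minimal sketch, with the sample data (without the spaces after commas) in an io.StringIO:

```python
import io
import pandas as pd

sample = "No,Name,Gender\n1,Erlan,Male\n2,Alex,Male\n3,soro,Female\n4,Khan,Male\n"

# Skip only file line 2 (Alex); line 0 stays the header.
without_alex = pd.read_csv(io.StringIO(sample), skiprows=[2])

# Skip every even-numbered line except the header (line 0).
odd_lines = pd.read_csv(io.StringIO(sample), skiprows=lambda i: i > 0 and i % 2 == 0)

print(without_alex)
print(odd_lines)
```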
Example-4: How to change header of CSV file in Python
We can replace the column names while reading a CSV file by passing a list of names to the names parameter of read_csv(); these names act as the new header. In the example below, skiprows=(0, 1) skips the original heading and the first data row (Erlan), so the output starts with Alex.
import pandas as pd
## giving custom names to columns
csvFile = pd.read_csv('myfile.csv',skiprows=(0, 1), names=["Number", "FirstName", "G"])
print(csvFile.head())
Output:
Number FirstName G
0 2 Alex Male
1 3 soro Female
2 4 Khan Male
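If the goal is only to rename the columns without losing any data rows, the pandas documentation describes passing header=0 together with names, so that the existing heading line is replaced instead of being read as data. Here is a minimal sketch with the sample data in an io.StringIO, plus renaming after reading as an alternative:

```python
import io
import pandas as pd

sample = "No,Name,Gender\n1,Erlan,Male\n2,Alex,Male\n3,soro,Female\n4,Khan,Male\n"

# header=0 marks line 0 as the old heading; names replaces it.
renamed = pd.read_csv(io.StringIO(sample), header=0, names=["Number", "FirstName", "G"])

# Alternatively, rename selected columns after reading.
renamed_later = pd.read_csv(io.StringIO(sample)).rename(columns={"Name": "FirstName"})

print(renamed)
print(list(renamed_later.columns))
```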
Example-5: How to avoid Python/Pandas creating an index in a CSV File
You may have noticed that pandas adds an index column (0, 1, 2, ...) to every DataFrame. This is the default behaviour when reading a CSV file. The index is part of the DataFrame and cannot simply be turned off, but we can hide it when printing by calling to_string(index=False).
See the example below, which prints the data without the index.
import pandas as pd
## giving custom names to columns
csvFile = pd.read_csv('myfile.csv', skiprows=(0, 1), names=["Number", "FirstName", "G"])
## print the data without the index column
print(csvFile.to_string(index=False))
Output:
Number FirstName G
2 Alex Male
3 soro Female
4 Khan Male
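The same index appears when a DataFrame is written back to a CSV file: by default, to_csv() writes it as an extra, unnamed first column. The minimal sketch below calls to_csv() without a path, so it returns the CSV text as a string, and compares the two settings. When writing to a file, the same index=False argument applies, for example df.to_csv("myfile.csv", index=False).

```python
import io
import pandas as pd

sample = "No,Name,Gender\n1,Erlan,Male\n2,Alex,Male\n"
df = pd.read_csv(io.StringIO(sample))

with_index = df.to_csv()                # first column holds the 0, 1, ... index
without_index = df.to_csv(index=False)  # only the original columns

print(with_index)
print(without_index)
```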
Example-6: How to read CSV file from URL in Python
Another useful feature of pandas is reading CSV files from a URL. We pass the URL to read_csv() instead of a file path. The following example demonstrates reading a CSV file from a URL.
import pandas as pd
## importing data from the given url
df = pd.read_csv("url of csv file goes here")
## print the first five rows
print(df.head())
Example-7: How to read specific column from CSV file in Python
We can access a specific column of a DataFrame with df[column_name]. In the example below, we first name the columns with the names parameter and then use csvFile["Gender"] to access the third column. Because names is passed without header=0, pandas treats the original heading line as data, which is why Gender appears as row 0 in the output.
The example below prints only the third column of the CSV file:
import pandas as pd
## reading csv file from same directory
csvFile = pd.read_csv('myfile.csv', names=["Number", "FirstName", "Gender"])
## this prints the specific column of CSV file
print(csvFile["Gender"])
Output:
~]# python3 eg-6.py
0 Gender
1 Male
2 Male
3 Female
4 Male
Name: Gender, dtype: object
This example will only print the second column of the CSV file:
import pandas as pd
## reading csv file from same directory
csvFile = pd.read_csv('myfile.csv', names=["Number", "FirstName", "Gender"])
## this prints the specific column
print(csvFile["FirstName"])
Output:
~]# python3 eg-6.py
0 Name
1 Erlan
2 Alex
3 soro
4 Khan
Name: FirstName, dtype: object
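Note that df[column_name] selects a column after the whole file has been loaded. To parse only some columns in the first place, read_csv() has a usecols parameter. Here is a minimal sketch comparing both, with the sample data in an io.StringIO:

```python
import io
import pandas as pd

sample = "No,Name,Gender\n1,Erlan,Male\n2,Alex,Male\n3,soro,Female\n4,Khan,Male\n"

# usecols: only the listed columns are parsed from the data.
names_only = pd.read_csv(io.StringIO(sample), usecols=["Name"])

# Selecting from a loaded DataFrame: one name gives a Series, a list gives a DataFrame.
df = pd.read_csv(io.StringIO(sample))
gender = df["Gender"]
subset = df[["No", "Gender"]]

print(list(names_only.columns))
print(type(gender).__name__, subset.shape)
```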
Example-8: How to access specific rows of a CSV file using .loc
In pandas, we can get the data of specific rows with the .loc indexer. We pass a row index label, or a list of labels, in square brackets. A single label returns a Series, while a list of labels returns a DataFrame. Since read_csv() assigns the default index 0, 1, 2, ..., the label 2 refers to the third data row.
The example below prints only one row.
import pandas as pd
## reading csv file from same directory
csvFile = pd.read_csv('myfile.csv')
## read_csv() already returns a DataFrame; wrapping it again is optional
df = pd.DataFrame(csvFile)
## printing the row with index label 2
print(df.loc[[2]])
Output:
~]# python3 eg-5.py
No Name Gender
2 3 soro Female
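Because .loc works with index labels, it has a positional counterpart, .iloc. The minimal sketch below, with the sample data in an io.StringIO, shows the differences: with the default 0, 1, 2, ... index, labels and positions coincide, but .iloc also accepts negative positions, and .loc label slices include their end point.

```python
import io
import pandas as pd

sample = "No,Name,Gender\n1,Erlan,Male\n2,Alex,Male\n3,soro,Female\n4,Khan,Male\n"
df = pd.read_csv(io.StringIO(sample))

row_series = df.loc[2]        # single label: returns a Series
row_frame = df.loc[[2]]       # list of labels: returns a DataFrame
last_row = df.iloc[-1]        # position-based: -1 is the last row
names = df.loc[1:2, "Name"]   # label slice: both end points are included

print(row_series["Name"], type(row_frame).__name__)
print(last_row["Name"], names.tolist())
```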
Numpy and CSV
NumPy is another powerful Python package, widely used by data scientists and machine learning engineers to work with large numerical arrays. NumPy can also load data from CSV files into arrays. In the following sections, we will see how to use NumPy to open and read CSV files.
Example-1: Opening CSV file using Numpy
The numpy.loadtxt() function loads data from text files. We can use it to read data from a CSV file and store it in a NumPy array. See the example below, which shows the basic syntax for opening a CSV file with NumPy.
import numpy as np
## opening csv file using numpy
data = np.loadtxt("myfile.csv", dtype=str)
print(data)
Output:
~]# python3 eg-1.py
[['No,' 'Name,' 'Gender']
['1,' 'Erlan,' 'Male']
['2,' 'Alex,' 'Male']
['3,' 'soro,' 'Female']
['4,' 'Khan,' 'Male']]
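In the above example, we have explicitly set the data type to string because our CSV file contains text. If we do not provide dtype, NumPy uses float by default and raises an error when it meets a value such as Name. Also notice that the commas are still attached to the values: because no delimiter was given, loadtxt() splits each line on whitespace. Passing delimiter="," makes it split on commas instead.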
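Here is a minimal sketch of loadtxt() with delimiter="," set, using the sample data in an io.StringIO instead of myfile.csv. skiprows=1 skips the heading and usecols picks one column, so the numeric No column can be loaded as integers. Because the fields follow a comma and a space, the text column is passed through np.char.strip() to remove any leftover whitespace.

```python
import io
import numpy as np

sample = "No, Name, Gender\n1, Erlan, Male\n2, Alex, Male\n3, soro, Female\n4, Khan, Male\n"

# Split on commas, skip the heading line, and load column 0 as integers.
numbers = np.loadtxt(io.StringIO(sample), delimiter=",", skiprows=1, usecols=0, dtype=int)

# Load column 1 as text and strip whitespace left over from the ", " separators.
names = np.char.strip(np.loadtxt(io.StringIO(sample), delimiter=",", skiprows=1, usecols=1, dtype=str))

print(numbers, numbers.sum())
print(names)
```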
Example-2: Using the csv module to read a CSV file into a NumPy array
We can read a CSV file with Python's csv module, which returns each row as a list. Then we can convert these lists into a NumPy array and use the features that NumPy provides.
See the example below to understand how we can read a CSV file with the csv module and store the data in a NumPy array.
import csv
import numpy as np
## open csv file from same directory
with open('myfile.csv', 'r', newline='') as f:
    ## read the comma-separated rows, dropping the space after each comma
    data = list(csv.reader(f, delimiter=",", skipinitialspace=True))
## convert the list of rows to a numpy array
data = np.array(data)
print(data)
Output:
~]# python3 eg-2.py
[['No' 'Name' 'Gender']
 ['1' 'Erlan' 'Male']
 ['2' 'Alex' 'Male']
 ['3' 'soro' 'Female']
 ['4' 'Khan' 'Male']]
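Once the rows are in a NumPy array, columns can be sliced out and converted. The minimal sketch below, using the sample data in an io.StringIO, takes the heading from the first row and converts the No column from strings to integers with astype(int):

```python
import csv
import io
import numpy as np

sample = "No, Name, Gender\n1, Erlan, Male\n2, Alex, Male\n3, soro, Female\n4, Khan, Male\n"

rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))
header = rows[0]             # ['No', 'Name', 'Gender']
body = np.array(rows[1:])    # 2-D array of strings, one row per record

# Select a column by its heading and convert it from text to integers.
numbers = body[:, header.index("No")].astype(int)
genders = body[:, header.index("Gender")]

print(body.shape, numbers.sum())
print(genders)
```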
How to define delimiter to read CSV file in Python
A delimiter separates the columns of a CSV file from each other to form a tabular data set. The most common delimiter is the comma; semicolons, tabs, and vertical bars are also used. When we create or write a CSV file, we can choose the delimiter with the delimiter parameter of csv.writer().
See the following example, which demonstrates the use of a delimiter in Python.
import csv
## open the csv file for writing
file_object = open("myfile.csv", "w", newline="")
## defining the delimiter ("," is also the default)
writer = csv.writer(file_object, delimiter=",")
## adding data to csv file
writer.writerow(["a", "b"])
## close opened csv file
file_object.close()
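The delimiter matters most when a field itself contains that character. In the minimal sketch below, which writes to an in-memory io.StringIO buffer instead of a file (the Note column is made up), csv.writer with delimiter=";" automatically quotes the field that contains a semicolon, and csv.reader with the same delimiter reads it back intact:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";")
writer.writerow(["No", "Name", "Note"])
writer.writerow(["1", "Erlan", "likes a;b"])  # this field contains the delimiter

text = buffer.getvalue()
print(text)

# Reading with the same delimiter restores the original fields.
rows = list(csv.reader(io.StringIO(text), delimiter=";"))
print(rows)
```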
Pandas and CSV delimiter
The optional sep parameter of read_csv() specifies the delimiter. Its default is a comma, which means that if we do not specify a delimiter explicitly, pandas splits fields on commas. We can use a delimiter other than a comma by passing it with sep.
See the example below, which reads a CSV file that uses semicolons as the delimiter.
import pandas as pd
df = pd.read_csv("myfile.csv", sep=";")
A file delimited by vertical bars can be read with:
df = pd.read_csv("myfile.csv", sep="|")
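Here is a minimal sketch that checks the sep parameter on small in-memory samples (io.StringIO stands in for the files). It also shows that to_csv() accepts the same sep parameter when writing, here with a tab.

```python
import io
import pandas as pd

semicolon_text = "No;Name;Gender\n1;Erlan;Male\n2;Alex;Male\n"
pipe_text = "No|Name|Gender\n1|Erlan|Male\n2|Alex|Male\n"

df_semi = pd.read_csv(io.StringIO(semicolon_text), sep=";")
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")
print(df_semi)
print(df_semi.equals(df_pipe))  # same data, different delimiters

# to_csv() uses the same sep parameter when writing; here a tab.
tab_text = df_semi.to_csv(sep="\t", index=False)
print(tab_text)
```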
Summary
In this article, we learned how to open, read, and write CSV files using Python and its built-in csv module. We also covered two useful packages, pandas and NumPy, and some of the features they provide for working with CSV files. Each topic comes with examples and explanations.
Further Readings
Python CSV
Pandas CSV
Use Numpy to read CSV