Getting started with pandas to_datetime function
This function converts a scalar, array-like, Series, or DataFrame/dict-like object to a pandas datetime object. The return type depends on the input: a scalar becomes a Timestamp, an array-like (such as a Python list, tuple, or Index) becomes a DatetimeIndex, and a Series or DataFrame becomes a Series with a datetime64 dtype.
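As a rough illustration of how the result type follows the input type, the sketch below passes a scalar, a list, a Series, and a DataFrame to to_datetime(). The variable names and sample dates are made up for this illustration.

```python
import pandas as pd

# Scalar string -> a single Timestamp
scalar_result = pd.to_datetime("2021-03-04 11:23")

# List of strings -> DatetimeIndex
list_result = pd.to_datetime(["2021-03-04", "2021-03-05"])

# Series of strings -> Series with a datetime64 dtype
series_result = pd.to_datetime(pd.Series(["2021-03-04", "2021-03-05"]))

# DataFrame with year/month/day columns -> Series assembled row by row
parts = pd.DataFrame({"year": [2021, 2022], "month": [3, 4], "day": [4, 5]})
frame_result = pd.to_datetime(parts)

for name, value in [("scalar", scalar_result), ("list", list_result),
                    ("series", series_result), ("frame", frame_result)]:
    print(name, "->", type(value).__name__)
```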
Syntax:
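pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, unit=None)
Parameters:
- arg - The value to convert: a scalar, list, tuple, array, Series, or DataFrame/dict-like.
- errors - Controls what happens when a value cannot be parsed. 'raise' (the default) raises an exception, and 'coerce' sets the value to NaT.
- dayfirst - A boolean value. When it is True, ambiguous dates are parsed with the day first.
- yearfirst - A boolean value. When it is True, ambiguous dates are parsed with the year first.
- utc - When it is True, the result is timezone-aware and localized to UTC.
- format - The strftime-style format string used to parse the input. For example, %d represents the day of the month, %m represents the month, and %Y represents the four-digit year.
- unit - The unit of a numeric input, such as 's' for seconds or 'ms' for milliseconds, counted from the Unix epoch.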
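To make the effect of these parameters more concrete, here is a minimal sketch that parses one ambiguous string three ways. The sample string and variable names are chosen for illustration only.

```python
import pandas as pd

ambiguous = "04/03/2021 11:23"

# Default (dayfirst=False): the first number is read as the month
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: the first number is read as the day
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# utc=True: the result is timezone-aware and localized to UTC
aware = pd.to_datetime(ambiguous, utc=True)

print(month_first)
print(day_first)
print(aware)
```

The first call should give April 3, 2021, and the second March 4, 2021, which is why it helps to state the intended order explicitly whenever the input is ambiguous.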
Example-1. Convert String to DateTime
A date string can be passed directly to to_datetime(). In this example, I define a date string and then convert it to a datetime:
import pandas as pd
# Define string
date = '04/03/2021 11:23'
# Convert string to datetime format
date1 = pd.to_datetime(date)
# print to_datetime output
print(date1)
# print day, month and year separately from the to_datetime output
print("Day: ", date1.day)
print("Month", date1.month)
print("Year", date1.year)
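Output:
2021-04-03 11:23:00
Day:  3
Month 4
Year 2021
Because dayfirst defaults to False, the ambiguous string '04/03/2021' is read month first, as April 3, 2021. Pass dayfirst=True to read it as March 4, 2021.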
Example-2. Convert Series to DateTime
Here we have a pandas Series which we will convert to datetime format:
import pandas as pd
# Define pandas Series
times = pd.Series(["2021-01-25", "2021/01/08", "2021", "Jan 4th, 2022"])
# Print Series
print("Series: \n", times, "\n")
# Convert Series to datetime
print("datetime: \n", pd.to_datetime(times))
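Output:
As you can see, our Series contains dates in different formats, which are all converted to datetime format. Note that pandas 2.0 and later infer a single format from the first value, so a mixed-format Series like this one needs format="mixed" on those versions.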
Series:
0 2021-01-25
1 2021/01/08
2 2021
3 Jan 4th, 2022
dtype: object
datetime:
0 2021-01-25
1 2021-01-08
2 2021-01-01
3 2022-01-04
dtype: datetime64[ns]
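On pandas 2.0 and later, a Series that mixes several date formats may need format="mixed" so that each value is parsed individually. The sketch below is one possible way to write code that runs on both older and newer releases. The version check and the shortened Series are assumptions made for this illustration.

```python
import pandas as pd

# A shortened, mixed-format version of the Series used above
times = pd.Series(["2021-01-25", "2021/01/08", "Jan 4, 2022"])

# Assumption: format="mixed" is only available in pandas 2.0 and later,
# so it is passed only when the installed major version is at least 2.
major_version = int(pd.__version__.split(".")[0])
kwargs = {"format": "mixed"} if major_version >= 2 else {}

converted = pd.to_datetime(times, **kwargs)
print(converted)
```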
Example-3. Handling exceptions during datetime conversion
But what happens if the Series contains ordinary text instead of a date? In that case, to_datetime() raises an exception. For example, I have updated my Series to pd.Series(["2021-01-25", "2021/01/08", "2021", "Hello World", "Jan 4th, 2022"])
When we try to convert this Series with to_datetime(), we get the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pandas/core/arrays/datetimes.py", line 2192, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))
File "pandas/_libs/tslibs/conversion.pyx", line 359, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
So, to handle this, we pass errors='coerce', which converts every value that to_datetime() fails to parse to NaT, i.e. Not a Time.
Let's update our code:
import pandas as pd
# Define pandas Series
times = pd.Series(["2021-01-25", "2021/01/08", "2021", "Hello World", "Jan 4th, 2022"])
# Print Series
print("Series: \n", times, "\n")
# Convert Series to datetime
print("datetime: \n", pd.to_datetime(times, errors = 'coerce'))
Output:
As you can see, Hello World is now replaced with NaT because to_datetime() was unable to convert that value.
Series:
0 2021-01-25
1 2021/01/08
2 2021
3 Hello World
4 Jan 4th, 2022
dtype: object
datetime:
0 2021-01-25
1 2021-01-08
2 2021-01-01
3 NaT
4 2022-01-04
dtype: datetime64[ns]
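Once the failed values have become NaT, it is often useful to find out which rows failed or to drop them. The following is a minimal sketch of that follow-up step. It uses a small, consistently formatted Series invented for this illustration.

```python
import pandas as pd

# Consistently formatted dates plus one value that is not a date
raw = pd.Series(["2021-01-25", "2021-01-08", "Hello World"])

# Unparseable values become NaT instead of raising an exception
parsed = pd.to_datetime(raw, errors="coerce")

# isna() marks the rows that failed to convert
failed = raw[parsed.isna()]
print("Could not parse:", failed.tolist())

# dropna() keeps only the successfully converted rows
clean = parsed.dropna()
print(clean)
```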
Example-4. Convert Unix times to DateTime
Unix time stores a moment as the number of seconds elapsed since midnight UTC on January 1, 1970 (the Unix epoch). Because the datetime is stored as a plain number, it is easy to convert it to a specific date and time without running into formatting issues with dashes, slashes, and other separators. Pass unit="s" to tell to_datetime() that the values are in seconds.
import pandas as pd
# Define pandas Series
times = pd.Series([1349720105, 1349806505, 1349979305, 1350065705])
# Convert Series to datetime
print("datetime:\n", pd.to_datetime(times, unit = "s"))
Output:
datetime:
0 2012-10-08 18:15:05
1 2012-10-09 18:15:05
2 2012-10-11 18:15:05
3 2012-10-12 18:15:05
dtype: datetime64[ns]
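Timestamps are not always stored in seconds, and sometimes the value has to be turned back into a number. The sketch below shows the same instant given in seconds and in milliseconds, then converts it back to epoch seconds. The variable names are illustrative.

```python
import pandas as pd

seconds = 1349720105

# The same instant expressed in seconds and in milliseconds
from_seconds = pd.to_datetime(seconds, unit="s")
from_millis = pd.to_datetime(seconds * 1000, unit="ms")
print(from_seconds, from_millis)

# Convert back to epoch seconds by subtracting the epoch
# and dividing by a one-second Timedelta
epoch = pd.Timestamp("1970-01-01")
back_to_seconds = (from_seconds - epoch) // pd.Timedelta("1s")
print(back_to_seconds)
```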
Example-5. Using format with to_datetime
to_datetime() identifies the day, month, and year automatically, but there may be situations where the provided input is ambiguous or not in a standard format.
For example, I will define my date string in "%m/%d/%Y %H:%M" format, i.e. month/day/year followed by the hour and minute. If we rely on automatic detection, to_datetime() may not read the day and month the way we intend. So in such cases we use the format parameter to define the layout of the input passed to to_datetime().
import pandas as pd
# Define string
date = '05/03/2021 11:23'
# Convert string to datetime and define the format
date1 = pd.to_datetime(date, format='%m/%d/%Y %H:%M')
# print to_datetime output
print(date1)
# print individual field
print("Day: ", date1.day)
print("Month", date1.month)
print("Year", date1.year)
Output:
2021-05-03 11:23:00
Day: 3
Month 5
Year 2021
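To see why the format matters, the sketch below parses the same string with two different formats and also shows what happens when the format does not match at all. The second and third format strings are assumptions added for this illustration.

```python
import pandas as pd

date = "05/03/2021 11:23"

# Read as month/day/year: May 3, 2021
as_mdy = pd.to_datetime(date, format="%m/%d/%Y %H:%M")

# Read as day/month/year: March 5, 2021
as_dmy = pd.to_datetime(date, format="%d/%m/%Y %H:%M")

print(as_mdy, as_dmy)

# A format that does not match the input raises ValueError
try:
    pd.to_datetime(date, format="%d-%b-%Y")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print("Mismatch raised ValueError:", mismatch_raised)
```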
Example-6. Convert a range of dates to DateTime
Example: In this example, we build a dataframe column from a range of dates and pass it to the to_datetime() function.
# import pandas
import pandas
# create 12 timestamps at hourly frequency
# (pandas 2.2 and later prefer the lowercase alias 'h')
data = pandas.date_range('1/1/2022', periods=12, freq='H')
# store the range in a dataframe column named 'date'
dataframe = pandas.DataFrame(data, columns=['date'])
# convert to datetime
print(pandas.to_datetime(dataframe['date']))
Output:
0 2022-01-01 00:00:00
1 2022-01-01 01:00:00
2 2022-01-01 02:00:00
3 2022-01-01 03:00:00
4 2022-01-01 04:00:00
5 2022-01-01 05:00:00
6 2022-01-01 06:00:00
7 2022-01-01 07:00:00
8 2022-01-01 08:00:00
9 2022-01-01 09:00:00
10 2022-01-01 10:00:00
11 2022-01-01 11:00:00
Name: date, dtype: datetime64[ns]
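The values produced by date_range() are already datetimes, so to_datetime() returns them unchanged. A more typical case is a column of date strings. The sketch below checks the column type before and after conversion. The frame and its values are invented for this illustration.

```python
import pandas as pd

# A column of date strings, as it might arrive from a CSV file
frame = pd.DataFrame({"date": ["2022-01-01 00:00:00", "2022-01-01 01:00:00"]})
was_datetime = pd.api.types.is_datetime64_any_dtype(frame["date"])

# Convert the string column and store the result back in the frame
frame["date"] = pd.to_datetime(frame["date"])
is_datetime = pd.api.types.is_datetime64_any_dtype(frame["date"])

print("before:", was_datetime, "after:", is_datetime)

# Converting an already-converted column leaves the values unchanged
again = pd.to_datetime(frame["date"])
print(again.equals(frame["date"]))
```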
Example-7. Change the format of to_datetime() output
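We can use dt.strftime() to change the output format of the to_datetime() function. Note that dt.strftime() returns strings, not datetime values.
Each format directive starts with the % symbol.
- %d represents the day of the month
- %m represents the month
- %Y represents the four-digit year
Example: In this example, we parse the dates in "%m/%d/%Y %I:%M:%S %p" format and display them in "%m-%d-%Y %H:%M:%S" format.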
import pandas as pd
# Define a dataframe
df = {"Country": ["IND", "CAL", "LON"],
"date": ["04/11/2022 09:13:55 AM", "05/10/2022 11:31:05 PM", "12/08/2022 08:00:00 AM"]}
df = pd.DataFrame(df)
# Define the formats to be used. format1 is the input format parsed by to_datetime, while format2 is the new output format
format1 = "%m/%d/%Y %I:%M:%S %p"
format2 = "%m-%d-%Y %H:%M:%S"
# Convert and store datetime in new format
df['date'] = pd.to_datetime(df['date'], format=format1).dt.strftime(format2)
# Print new datetime format
print(df)
Output:
Country date
0 IND 04-11-2022 09:13:55
1 CAL 05-10-2022 23:31:05
2 LON 12-08-2022 08:00:00
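Because dt.strftime() produces text, it can be convenient to keep the parsed values in a separate variable and format them only for display. The sketch below illustrates this with a different output format. The variable names are made up for this example, and the AM/PM text assumes an English locale.

```python
import pandas as pd

dates = pd.Series(["04/11/2022 09:13:55 AM", "05/10/2022 11:31:05 PM"])
parsed = pd.to_datetime(dates, format="%m/%d/%Y %I:%M:%S %p")

# dt.strftime() produces plain strings for display
formatted = parsed.dt.strftime("%d-%m-%Y %I:%M %p")
print(formatted.tolist())

# The parsed Series still supports datetime operations such as dt.year
print(parsed.dt.year.tolist())
```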
Example-8. Remove time from to_datetime() output (Print only date)
In this scenario, we will discuss how to remove the time from the converted datetime. We use dt.date on the converted values to get only the date without the time.
Syntax:
pandas.to_datetime(dataframe['column']).dt.date
where,
- dataframe is the input dataframe
- column is the name of the column that holds the datetime values
Example:
In this example, we remove the time from the datetime values returned by to_datetime() for the same dataframe used above.
import pandas as pd
# Define a dataframe
df = {"Country": ["IND", "CAL", "LON"],
"date": ["04/11/2022 09:13:55 AM", "05/10/2022 11:31:05 PM", "12/08/2022 08:00:00 AM"]}
df = pd.DataFrame(df)
# Remove time and only store the date
df['date'] = pd.to_datetime(df['date']).dt.date
# Print date without time
print(df)
Output:
Country date
0 IND 2022-04-11
1 CAL 2022-05-10
2 LON 2022-12-08
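One thing to keep in mind is that dt.date returns plain Python date objects. If the column should stay a datetime column, dt.normalize() is a possible alternative that resets the time to midnight. The sketch below compares the two, using sample values from the dataframe above.

```python
import datetime
import pandas as pd

dates = pd.Series(["04/11/2022 09:13:55 AM", "05/10/2022 11:31:05 PM"])
parsed = pd.to_datetime(dates, format="%m/%d/%Y %I:%M:%S %p")

# dt.date returns Python datetime.date objects
only_dates = parsed.dt.date
print(type(only_dates.iloc[0]))

# dt.normalize() resets the time to midnight but keeps the datetime64 dtype
midnight = parsed.dt.normalize()
print(midnight)
```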
Example-9. Parse month name with to_datetime()
Here we convert a date written with a month name to a Timestamp using to_datetime(). The input is the month name followed by the day and year.
Format:
Monthname day, year
Example:
In this example, we parse the following dates with month names using the to_datetime() function.
# import module
import pandas
# convert month name datetime
print(pandas.to_datetime("January 5, 2022"))
# convert month name datetime
print(pandas.to_datetime("January 3, 2022"))
# convert month name datetime
print(pandas.to_datetime("May 5, 2022"))
# convert month name datetime
print(pandas.to_datetime("December 5, 2022"))
# convert month name datetime
print(pandas.to_datetime("July 24, 2022"))
Output:
2022-01-05 00:00:00
2022-01-03 00:00:00
2022-05-05 00:00:00
2022-12-05 00:00:00
2022-07-24 00:00:00
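Month names can also be matched explicitly with the format parameter. As a small sketch, %B matches a full month name and %b an abbreviated one. This assumes an English locale, and the abbreviated input is an example added for this illustration.

```python
import pandas as pd

# %B matches a full month name, %b an abbreviated one
full = pd.to_datetime("January 5, 2022", format="%B %d, %Y")
short = pd.to_datetime("Jan 5, 2022", format="%b %d, %Y")

print(full, short)
```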
Example-10. Add timezone to to_datetime() output
We can attach a timezone using the dt.tz_localize() method after converting the date data to datetime with the to_datetime() function.
Syntax:
dataframe.column.dt.tz_localize('zone_name')
where,
- dataframe is the input dataframe
- column refers to the datetime column
- zone_name is the timezone name, such as Asia/Kolkata or UTC
Example: Add timezone to to_datetime() data
import pandas as pd
# Define a dataframe
df = {"Country": ["IND", "CAL", "LON"],
"date": ["04/11/2022 09:13:55 AM", "05/10/2022 11:31:05 PM", "12/08/2022 08:00:00 AM"]}
df = pd.DataFrame(df)
# Convert to_datetime and add timezone
df['date'] = pd.to_datetime(df['date']).dt.tz_localize('UTC')
# Print dataframe
print(df)
Output:
Country date
0 IND 2022-04-11 09:13:55+00:00
1 CAL 2022-05-10 23:31:05+00:00
2 LON 2022-12-08 08:00:00+00:00
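Localizing only labels the values with a zone. To show the same instants in another zone, dt.tz_convert() can be applied afterwards. The sketch below is one way this might look. The target zone Asia/Kolkata is chosen for illustration.

```python
import pandas as pd

dates = pd.Series(["04/11/2022 09:13:55 AM", "05/10/2022 11:31:05 PM"])
parsed = pd.to_datetime(dates, format="%m/%d/%Y %I:%M:%S %p")

# tz_localize() labels naive values with a zone without shifting the clock time
utc_times = parsed.dt.tz_localize("UTC")

# tz_convert() shifts aware values to another zone (UTC+05:30 here)
kolkata_times = utc_times.dt.tz_convert("Asia/Kolkata")

print(utc_times)
print(kolkata_times)
```

Both Series describe the same instants. Only the displayed clock time and offset differ.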
Summary
In this tutorial, we covered different examples of the to_datetime() function in Python pandas. We covered the following topics:
- Convert strings, Series, and dataframe columns to datetime using to_datetime()
- Modify the output format of to_datetime() using dt.strftime()
- Print only the date from to_datetime() output (remove time)
- Access the month, day, and year fields by using format with to_datetime()