When you start writing code in any programming language, you need to save the file somewhere, and to do that you specify its path. A path name consists of the file's location (its directory), the file name, and the file extension. Python source files, for example, use the .py extension. In this tutorial, we will discuss how to get the file extension of a file in Python.
Different methods to get file extension in python
Here is a list of possible methods to get the file extension of a file in Python:
- Using the os.path Module: The os.path.splitext function can be used to split the file path into the root and the extension.
- Using the pathlib Module: The modern pathlib module provides an object-oriented approach to handling filesystem paths. The suffix property of a Path object can be used to get the file extension in Python.
- String Manipulation: Basic string operations, such as split or rfind, can be used to extract the file extension directly from the file name or path.
- Using the mimetypes Module: While primarily designed to determine the MIME type of a file, the mimetypes.guess_extension function can also suggest a probable extension for a given MIME type.
- Using the rfind() Method: The rfind method can be used directly on strings to find the last occurrence of a specified value, such as the final period.
- Using the rpartition() Method: The rpartition() method splits the string at the last occurrence of a specified separator.
- Using os.path.basename(): This function retrieves the final component (file name) of a path, which can then be combined with other methods to extract the file extension.
- Using the pathlib.Path.stem Property: The stem property of a Path object in the pathlib module gives the file name without the suffix. It can be combined with other methods or properties to extract the file extension.
- Leveraging Third-party Libraries: Libraries like python-magic detect a file's type from its content, which can then be mapped to a file extension. This is especially useful when dealing with ambiguous file types.
1. Using the os.path Module
The os.path module, part of the broader os module, offers various functions for working with file system paths. Among these, the os.path.splitext function is the most direct tool for getting a file extension in Python.
Syntax:
os.path.splitext(filepath)
Where filepath is the path of the file whose extension you want to extract.
Return Value: This function returns a tuple (root, ext). The second element, ext, holds the extension of the final path component, starting from its last period; the first element, root, holds everything before that period. Leading periods in the file name are ignored, so a name like .bashrc has no extension. If there is no extension, the second element is an empty string.
Basic Use Case
import os
filepath = "/path/to/your/file/document.txt"
root, extension = os.path.splitext(filepath)
print("Root:", root)
print("Extension:", extension)
Output:
Root: /path/to/your/file/document
Extension: .txt
File Without Extension
import os
filepath = "/path/to/your/file/document"
root, extension = os.path.splitext(filepath)
print("Root:", root)
print("Extension:", extension)
Output:
Root: /path/to/your/file/document
Extension:
File with Multiple Periods in Name
import os
filepath = "/path/to/your/file/document.v2.0.txt"
root, extension = os.path.splitext(filepath)
print("Root:", root)
print("Extension:", extension)
Output:
Root: /path/to/your/file/document.v2.0
Extension: .txt
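The examples above cover the common cases. The short sketch below, using only the standard library, checks a few edge cases that follow from splitext's rules: a hidden file whose name starts with a period, a period that appears only in a directory name, and a double extension such as .tar.gz. The sample paths are made up for illustration.

```python
import os

# Hypothetical sample paths chosen to exercise splitext's edge cases
samples = [
    "/home/user/.bashrc",      # leading period: treated as part of the name
    "/path/v1.2/README",       # period only in a directory name
    "/data/archive.tar.gz",    # double extension: only the last part counts
]

for path in samples:
    root, extension = os.path.splitext(path)
    print(f"{path!r} -> root={root!r}, extension={extension!r}")
```

For the first two paths the extension is an empty string, and for the archive it is only .gz. If you need the full .tar.gz, os.path.splitext alone is not enough; pathlib's Path.suffixes property returns every suffix.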
2. Using the pathlib Module
The pathlib module, introduced in Python 3.4, offers an object-oriented approach to handling filesystem paths. It's a more modern alternative to the os.path module and comes with several built-in methods and properties that simplify file and directory manipulation tasks, including extracting file extensions.
The pathlib.Path object has a suffix property that returns the file extension of a given path.
Syntax:
Path(filepath).suffix
Where filepath is the path of the file whose extension you want to get.
Basic Use Case
from pathlib import Path
filepath = Path("/path/to/your/file/document.txt")
extension = filepath.suffix
print("Extension:", extension)
Output:
Extension: .txt
File Without Extension
from pathlib import Path
filepath = Path("/path/to/your/file/document")
extension = filepath.suffix
print("Extension:", extension)
Output:
Extension:
File with Multiple Periods in Name
from pathlib import Path
filepath = Path("/path/to/your/file/document.v2.0.txt")
extension = filepath.suffix
print("Extension:", extension)
Output:
Extension: .txt
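Because suffix returns only the last extension, pathlib also offers a suffixes property that lists every suffix of the file name. The sketch below shows both properties side by side. Joining suffixes is one way to recover a double extension such as .tar.gz, with the caveat, visible in the second path, that periods used in version numbers are counted as suffixes too. The archive path is hypothetical.

```python
from pathlib import Path

archive = Path("/data/archive.tar.gz")  # hypothetical path
versioned = Path("/path/to/your/file/document.v2.0.txt")

# suffix returns only the final extension
print(archive.suffix)             # .gz
# suffixes returns every period-separated suffix of the file name
print(archive.suffixes)           # ['.tar', '.gz']
print("".join(archive.suffixes))  # .tar.gz

# Version numbers in the name are treated as suffixes as well
print(versioned.suffixes)         # ['.v2', '.0', '.txt']
```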
3. String Manipulation
For a basic task like this, Python's built-in string methods are often enough, without importing any module. The idea behind using string manipulation to get a file extension in Python is to split the file name on the period (.) character and take the last segment.
Syntax:
extension = filepath.split('.')[-1]
In this approach, filepath is the string containing the path or name of the file.
Basic Scenario
filepath = "/path/to/your/file/document.txt"
extension = filepath.split('.')[-1]
print("Extension:", extension)
Output:
Extension: txt
File Without an Extension
filepath = "/path/to/your/file/document"
extension = filepath.split('.')[-1] if '.' in filepath else ''
print("Extension:", extension)
Output:
Extension:
File with Multiple Periods
filepath = "/path/to/your/file/document.v2.0.txt"
extension = filepath.split('.')[-1]
print("Extension:", extension)
Output:
Extension: txt
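Splitting the whole path has a pitfall: a period in a directory name is mistaken for an extension, and a hidden file such as .bashrc appears to have the extension bashrc. The sketch below contrasts the approach above with a slightly more careful variant that splits only the file name (obtained with os.path.basename) and ignores leading periods. The helper names naive_extension and safer_extension are hypothetical, introduced only for this illustration.

```python
import os


def naive_extension(filepath: str) -> str:
    # The approach above: split the whole path on "." and keep the last piece
    return filepath.split('.')[-1] if '.' in filepath else ''


def safer_extension(filepath: str) -> str:
    # Look only at the final path component, so dots in directory names are ignored
    name = os.path.basename(filepath)
    # Drop leading dots so hidden files such as ".bashrc" count as having no extension
    stripped = name.lstrip('.')
    if '.' not in stripped:
        return ''
    return stripped.split('.')[-1]


path = "/home/user/v1.2/README"  # hypothetical path with a dotted directory name
print(repr(naive_extension(path)))                                     # '2/README'
print(repr(safer_extension(path)))                                     # ''
print(repr(safer_extension("/home/user/.bashrc")))                     # ''
print(repr(safer_extension("/path/to/your/file/document.v2.0.txt")))  # 'txt'
```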
4. Using the mimetypes Module
MIME types (Multipurpose Internet Mail Extensions) describe the nature and format of a file's data. The mimetypes module in Python maps file names to MIME types, and MIME types back to file extensions.
The mimetypes module offers the guess_extension function which, given a MIME type, returns a possible file extension associated with that MIME type, or None if no extension is known. This is an indirect way to get a file extension in Python, useful when the MIME type is known or can be inferred.
Syntax:
mimetypes.guess_extension(mime_type, strict=True)
Here, mime_type is the MIME type for which you want a file extension. With strict=True (the default), only the standard list of MIME types is consulted.
Standard Scenario:
import mimetypes
mime_type = "image/jpeg"
extension = mimetypes.guess_extension(mime_type)
print("Extension for MIME type:", mime_type)
print("Extension:", extension)
Output (on Python 3.8 and later; older versions may return a different extension, such as .jpe):
Extension for MIME type: image/jpeg
Extension: .jpg
Non-standard MIME Type
import mimetypes
mime_type = "application/x-some-custom-type"
extension = mimetypes.guess_extension(mime_type)
print("Extension for MIME type:", mime_type)
print("Extension:", extension)
Output:
Extension for MIME type: application/x-some-custom-type
Extension: None
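The mimetypes module also works in the other direction. As a sketch (the file names are made up), guess_type maps a file name to a (MIME type, encoding) pair, and guess_all_extensions lists every extension known for a MIME type rather than only the one that guess_extension picks. The exact lists can vary with the Python version and the platform's MIME database, so some comments below describe what to look for rather than exact output.

```python
import mimetypes

# File name -> (MIME type, encoding); compression suffixes are reported as the encoding
mime_type, encoding = mimetypes.guess_type("photo.png")
print(mime_type, encoding)  # image/png None

mime_type, encoding = mimetypes.guess_type("backup.tar.gz")
print(mime_type, encoding)  # typically application/x-tar gzip

# MIME type -> every known extension (guess_extension returns just one of these)
jpeg_extensions = mimetypes.guess_all_extensions("image/jpeg")
print(jpeg_extensions)      # a list that includes '.jpg'
```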
5. Using the rfind() Method
The rfind method can be used directly on strings to find the last occurrence of a specified value, which makes it useful for getting a file extension in Python.
string.rfind(value, start, end)
Here, value is the substring to search for; rfind() returns the index of its last occurrence, or -1 if it is not found. The optional start and end arguments limit the search to a slice of the string; by default, start is 0 and end is the length of the string.
Now let us take an example and see how we can use the rfind() method to find the file extension in Python. We call directory.rfind() and pass the dot '.' as the value, saving the index of the last dot in a variable named index. Then we print the slice of directory that starts just after that index, which is the extension. See the Python program below:
# declaring the directory
directory = '/Users/Programs/Directory/file.csv'
# finding the index of the last "." in the path
index = directory.rfind(".")
# printing everything after the last "."
print(directory[index + 1:])
Output:
csv
Here, the first statement declares the path in the directory variable. The second statement uses rfind() to locate the last ".", and the slice directory[index + 1:] then extracts the characters after it. Slicing from index instead would include the dot and print .csv.
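One caveat: rfind() returns -1 when there is no period, and slicing from -1 + 1 then returns the whole path. A period inside a directory name causes a similar problem. The sketch below guards against both by also locating the last path separator. The function name extension_via_rfind is hypothetical, and the separator check assumes paths use / or \ as separators.

```python
def extension_via_rfind(filepath: str) -> str:
    # Index of the last "." (or -1 if there is none)
    dot = filepath.rfind(".")
    # Index of the last path separator, covering both POSIX and Windows styles
    sep = max(filepath.rfind("/"), filepath.rfind("\\"))
    # No extension if the dot is missing, sits in a directory name,
    # or is the first character of the file name (a hidden file like ".bashrc")
    if dot <= sep + 1:
        return ""
    return filepath[dot + 1:]


print(repr(extension_via_rfind("/Users/Programs/Directory/file.csv")))  # 'csv'
print(repr(extension_via_rfind("/Users/Programs/Directory/file")))      # ''
print(repr(extension_via_rfind("/Users/v1.2/file")))                    # ''
print(repr(extension_via_rfind("/Users/Programs/.bashrc")))             # ''
```

A single comparison, dot <= sep + 1, covers all three "no extension" cases because sep is never less than -1.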
6. Using the basename() Function
The os.path.basename() function retrieves the final component (the file name) of a path. We pass it the complete path name, and it returns the base name of the file without the directory part, which can then be combined with other methods to extract the file extension.
Here is the syntax of the basename() function:
os.path.basename(path)
The output of os.path.basename(directory) will be 'file.csv'. We then call split() with the dot character as the separator, which returns the list ['file', 'csv'], and print the last item of that list. Taking the last item with [-1] rather than [1] keeps the result correct for names with several dots, such as archive.tar.gz.
Now let us take an example and see how the basename() function works. See the example below:
# importing os
import os
# declaring directory
directory = '/Users/Programs/Directory/file.csv'
# printing the last dot-separated part of the file name
print(os.path.basename(directory).split('.')[-1])
Output:
csv
Here, the first line imports the os module and the second line assigns the path to the variable directory. The third line uses basename() to get the file name and then split('.') to print its extension. Note that for a file name without any dot, split('.')[-1] returns the whole name, so check for a '.' first if that case matters.
7. Using the pathlib.Path.stem Property
In the pathlib module, the stem property of a Path object gives the file name without its suffix, while the suffix property gives the file extension. Using the stem property alone, we get the file name without its extension.
The following is the simple syntax used to find the extension:
pathlib.Path(filename).suffix
This property returns the file extension.
First, we import the pathlib module and pass the path stored in directory to the pathlib.Path() constructor. The pathlib module has been part of the standard library since Python 3.4, so there is nothing to install; the separate pathlib package on PyPI is an old, unmaintained backport for earlier Python versions.
In the first example below, we use the stem property, which returns the file name without its extension:
# importing pathlib module
import pathlib
# declaring directory
directory = '/Users/Programs/Directory/file.csv'
# creating a Path object with pathlib.Path()
# .stem gives the file name without its extension
filename = pathlib.Path(directory).stem
# printing
print(filename)
Output:
file
Here, the first line imports the pathlib module, and the second line declares the path in the variable directory. The third line creates a Path object with pathlib.Path() and reads its stem property, which gives the file name without the extension. To get the extension itself, we use the suffix property instead, as in the example below.
# importing the module
import pathlib
# using suffix
file_extension = pathlib.Path('file.csv').suffix
# printing
print("File Extension:", file_extension)
Output:
File Extension: .csv
In this example, we use the suffix property instead of stem, so the result is the extension of the path, including the leading dot.
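To see how stem and suffix relate, the sketch below splits one path into its parts; for an ordinary file name like this one, stem followed by suffix gives back the full name. It also uses with_suffix, a Path method that returns a new path with the extension replaced. The .json target extension is an arbitrary example.

```python
from pathlib import Path

path = Path('/Users/Programs/Directory/file.csv')

print(path.name)    # file.csv
print(path.stem)    # file
print(path.suffix)  # .csv

# with_suffix returns a new Path with the extension swapped; no file is renamed on disk
print(path.with_suffix('.json').name)  # file.json
```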
8. Using the rpartition() Method
The rpartition() method splits a given string into three parts at the last occurrence of a separator: the part to the left of the separator, the separator itself, and the part to the right. The syntax of the rpartition() method is given below:
string.rpartition(separator)
Here,
- string: The string you want to partition.
- separator: Specifies the separator to use when splitting the string.
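Return value: a tuple containing three elements:
- The part of the string before the separator.
- The separator itself.
- The part of the string after the separator.
If the separator is not found, the tuple contains two empty strings followed by the whole original string.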
Let us take an example and see how we can use the rpartition() method to get a file extension in Python. See the Python program below:
# declaring directory
directory = '/Users/Programs/Directory/file.csv'
# printing
print(directory.rpartition('.')[2])
Output:
csv
Here, the first line declares the path in the variable directory. The second line splits it at the last "." with rpartition() and prints the third element of the resulting tuple, which is the extension.
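When the separator is missing, rpartition() returns ('', '', string), so taking [2] would hand back the entire string instead of an empty extension. A small sketch that checks the middle element first is shown below. The function name extension_via_rpartition is hypothetical, and applying it to os.path.basename() keeps periods in directory names out of the result.

```python
import os


def extension_via_rpartition(filepath: str) -> str:
    # Work on the file name only, so periods in directory names are ignored
    head, sep, tail = os.path.basename(filepath).rpartition('.')
    # sep is empty when there is no "."; head is empty for hidden files like ".bashrc"
    return tail if sep and head else ''


print(repr('README'.rpartition('.')))                                        # ('', '', 'README')
print(repr(extension_via_rpartition('/Users/Programs/Directory/file.csv')))  # 'csv'
print(repr(extension_via_rpartition('/Users/Programs/Directory/README')))    # ''
print(repr(extension_via_rpartition('/Users/Programs/.bashrc')))             # ''
```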
9. Leveraging Third-party Libraries
Python's ecosystem includes third-party libraries for almost every requirement. When the built-in options are too limited, for example when a file's extension is missing or misleading, these libraries can identify a file's type from its content, and the extension can then be deduced from that type.
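1. file-magic
file-magic provides Python bindings to libmagic, the library behind the Unix file command. After identifying the file type, you can subsequently deduce the file's extension. Note that file-magic and python-magic both provide a module named magic, so install only one of them in a given environment.
Installation:
pip install file-magic
Example:
import magic
file_info = magic.detect_from_filename('sample.jpg')
print(file_info.name)
# Output might be something like: "JPEG image data, JFIF standard 1.01"
print(file_info.mime_type)
# Output: "image/jpeg"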
2. python-magic
This library is a wrapper around the libmagic file type identification library.
Installation:
pip install python-magic
Example:
import magic
mime_type = magic.from_file('sample.jpg', mime=True)
print(mime_type)
# Output: "image/jpeg"
After retrieving the MIME type, you can use the mimetypes module (for example, mimetypes.guess_extension) or similar methods to get the file extension in Python.
3. FileType
This library provides a mechanism to identify file types based on their magic numbers without relying on file extensions.
Installation:
pip install filetype
Example:
import filetype
kind = filetype.guess('sample.jpg')
if kind:
    print('File extension:', kind.extension)
    print('File MIME type:', kind.mime)
else:
    print('Cannot guess file type!')
# Output:
# File extension: jpg
# File MIME type: image/jpeg
FileType makes it straightforward to get a file extension in Python: once the file type is recognized from its content, the library provides the matching extension.
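To illustrate the magic-number idea behind libraries like FileType, here is a minimal standard-library sketch that compares the first bytes of a file against a few well-known signatures. It is not how filetype is implemented internally; the signature table is a small hand-picked assumption, and guess_extension_from_header and guess_extension_from_file are hypothetical names.

```python
from typing import Optional

# A few well-known file signatures ("magic numbers"), mapped to extensions
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container format for .docx, .xlsx, .jar, ...
]


def guess_extension_from_header(header: bytes) -> Optional[str]:
    # Return the extension whose signature matches the start of the data
    for signature, extension in SIGNATURES:
        if header.startswith(signature):
            return extension
    return None


def guess_extension_from_file(path: str) -> Optional[str]:
    # Only the first few bytes are needed to check the signatures above
    with open(path, "rb") as f:
        return guess_extension_from_header(f.read(16))


print(guess_extension_from_header(b"%PDF-1.7\n"))   # pdf
print(guess_extension_from_header(b"plain text"))   # None
```

Like filetype, this looks at content rather than the file name, so a renamed file is still identified; unlike filetype, it knows only a handful of formats.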
Summary
Extracting the file extension is a common task in many data processing and file management applications. In Python, there are various ways to accomplish this, ranging from built-in modules like os.path and pathlib to simple string manipulation. Third-party libraries and MIME type guessers extend these capabilities, accommodating more complex scenarios and ambiguous file types. This article has covered several of these methods to help you choose the most suitable technique for your specific needs.