9 Ways to Get File Extension in Python [SOLVED]



Author: Bashir Alam
Reviewer: Deepak Prasad

Whenever we write code, we save it to a file at some location, which we identify by its path. A path consists of the file's location (its directories), the file name, and the file's extension. Python source files, for example, use the .py extension. In this tutorial, we will discuss different ways to get a file's extension in Python.

 

Different methods to get file extension in python

Here are several methods to get the extension of a file in Python:

  1. Using the os.path Module: The os.path.splitext function splits a file path into the root and the extension.
  2. Using the pathlib Module: The pathlib module provides an object-oriented approach to handling filesystem paths. The suffix property of a Path object returns the file extension.
  3. String Manipulation: Basic string operations, such as split or rfind, can extract the file extension directly from the file name or path.
  4. Using the mimetypes Module: While primarily designed to map file names to MIME types, the mimetypes module's guess_extension function returns a likely extension for a given MIME type.
  5. Using the rfind() Method: The rfind() string method finds the last occurrence of a specified value, such as the last period in a path.
  6. Using os.path.basename(): This function retrieves the final component (the file name) of a path, which can then be combined with other methods to extract the file extension.
  7. Using the pathlib.Path.stem Property: The stem property of a Path object gives the file name without its suffix and is typically used together with the suffix property.
  8. Using the rpartition() Method: The rpartition() method splits a string at the last occurrence of a specified separator.
  9. Leveraging Third-party Libraries: Libraries such as python-magic identify a file's type from its content, which helps determine the right extension when a file name is missing or misleading.

 

1. Using the os.path Module

The os.path module, a submodule of the os module, offers various functions for working with file paths. Among these, os.path.splitext is a common and reliable way to get a file's extension in Python.

Syntax:

os.path.splitext(filepath)

Where filepath is the path of the file for which you want to extract the extension.

Return Value: This function returns a tuple (root, ext) such that root + ext equals the original path. The second element, ext, starts at the last period in the final component of the path and contains the file extension; if there is no extension, it is an empty string. Leading periods in a file name are ignored, so a hidden file such as .bashrc has no extension.

Basic Use Case

import os

filepath = "/path/to/your/file/document.txt"
root, extension = os.path.splitext(filepath)

print("Root:", root)
print("Extension:", extension)

Output:

Root: /path/to/your/file/document
Extension: .txt

File Without Extension

import os

filepath = "/path/to/your/file/document"
root, extension = os.path.splitext(filepath)

print("Root:", root)
print("Extension:", extension)

Output:

Root: /path/to/your/file/document
Extension:

File with Multiple Periods in Name

import os

filepath = "/path/to/your/file/document.v2.0.txt"
root, extension = os.path.splitext(filepath)

print("Root:", root)
print("Extension:", extension)

Output:

Root: /path/to/your/file/document.v2.0
Extension: .txt
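A few edge cases of splitext() are worth a quick look. The sketch below uses made-up paths to show how it treats a hidden file, a period inside a directory name, and a double extension such as .tar.gz:

```python
import os

# A hidden file: the leading period is not treated as an extension separator
print(os.path.splitext("/home/user/.bashrc"))     # ('/home/user/.bashrc', '')

# A period in a directory name is ignored; only the final component is examined
print(os.path.splitext("/home/user.name/notes"))  # ('/home/user.name/notes', '')

# Only the last extension is split off from a multi-extension name
print(os.path.splitext("backup/archive.tar.gz"))  # ('backup/archive.tar', '.gz')
```

Because only the last extension is split off, getting the full .tar.gz takes extra work, for example with pathlib's suffixes property.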

 

2. Using the pathlib Module

The pathlib module, introduced in Python 3.4, offers an object-oriented approach to handling filesystem paths. It is a more modern alternative to the os.path module, with built-in methods and properties that simplify file and directory manipulation, including extracting file extensions.

The pathlib.Path object has a suffix property that returns the file extension of a given path.

Syntax:

Path(filepath).suffix

Where filepath is the path of the file whose extension you want to get.

Basic Use Case

from pathlib import Path

filepath = Path("/path/to/your/file/document.txt")
extension = filepath.suffix

print("Extension:", extension)

Output:

Extension: .txt

File Without Extension

from pathlib import Path

filepath = Path("/path/to/your/file/document")
extension = filepath.suffix

print("Extension:", extension)

Output:

Extension:

File with Multiple Periods in Name

from pathlib import Path

filepath = Path("/path/to/your/file/document.v2.0.txt")
extension = filepath.suffix

print("Extension:", extension)

Output:

Extension: .txt
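Path.suffix only returns the last extension. As a hedged sketch (the file names are made up), the related suffixes property lists every suffix, which is handy for names like archive.tar.gz, although it also picks up periods that are not really extensions:

```python
from pathlib import Path

archive = Path("/backups/archive.tar.gz")

print(archive.suffix)             # .gz  (last suffix only)
print(archive.suffixes)           # ['.tar', '.gz']
print("".join(archive.suffixes))  # .tar.gz

# Caution: suffixes splits on every period in the file name,
# so version numbers are reported as suffixes too
print(Path("document.v2.0.txt").suffixes)  # ['.v2', '.0', '.txt']
```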

 

3. String Manipulation

For simple cases, you don't need a module at all: Python's built-in string methods are enough to extract a file extension.

The primary idea behind using string manipulation to get file extension in Python is to split the filename based on the period (.) character and retrieve the last segment.

Syntax:

extension = filepath.split('.')[-1]

In this approach, filepath is the string containing the path or name of the file. Note that, unlike os.path.splitext and Path.suffix, this returns the extension without the leading period.

Basic Scenario

filepath = "/path/to/your/file/document.txt"
extension = filepath.split('.')[-1]

print("Extension:", extension)

Output:

Extension: txt

File Without an Extension

filepath = "/path/to/your/file/document"
extension = filepath.split('.')[-1] if '.' in filepath else ''

print("Extension:", extension)

Output:

Extension:

File with Multiple Periods

filepath = "/path/to/your/file/document.v2.0.txt"
extension = filepath.split('.')[-1]

print("Extension:", extension)

Output:

Extension: txt
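Splitting the whole path has a pitfall: a period in a directory name is mistaken for the extension. The sketch below compares the approach above with a variant that first keeps only the last path component; naive_extension and safer_extension are hypothetical helper names, and the variant assumes / as the path separator:

```python
def naive_extension(filepath: str) -> str:
    # Splits the whole path on ".", as in the examples above
    return filepath.split('.')[-1] if '.' in filepath else ''

def safer_extension(filepath: str) -> str:
    # Assumes "/" as the path separator (POSIX-style paths)
    filename = filepath.rsplit('/', 1)[-1]  # keep only the last path component
    return filename.split('.')[-1] if '.' in filename else ''

path = "/home/user.name/notes"
print(naive_extension(path))  # name/notes  (wrong: the period belongs to a directory)
print(safer_extension(path))  # prints an empty line: no extension
print(safer_extension("/path/to/your/file/document.v2.0.txt"))  # txt
```

Even the safer variant treats a hidden file such as .bashrc as having the extension bashrc, unlike os.path.splitext, so the standard-library functions are preferable when such names can occur.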

 

4. Using the mimetypes Module

In the world of files and media, MIME types (Multipurpose Internet Mail Extensions) are pivotal in specifying the nature of files and their associated data types. The mimetypes module in Python is a handy tool that can be leveraged to map filenames to MIME types.

The mimetypes module offers the guess_extension method which, given a MIME type, returns a possible filename extension associated with that MIME type. This can be an indirect way to get file extension in Python, especially when the MIME type is known or can be inferred.

Syntax:

mimetypes.guess_extension(mime_type, strict=True)

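Here, mime_type is the MIME type whose extension you want to get. When strict is True (the default), only the official MIME types registered with IANA are considered; setting it to False also recognizes some common non-standard types.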

Standard Scenario:

import mimetypes

mime_type = "image/jpeg"
extension = mimetypes.guess_extension(mime_type)

print("Extension for MIME type:", mime_type)
print("Extension:", extension)

Output:

Extension for MIME type: image/jpeg
Extension: .jpg

Non-standard MIME Type

import mimetypes

mime_type = "application/x-some-custom-type"
extension = mimetypes.guess_extension(mime_type)

print("Extension for MIME type:", mime_type)
print("Extension:", extension)

Output:

Extension for MIME type: application/x-some-custom-type
Extension: None
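The mimetypes module also works in the other direction. As a minimal sketch (the file names are made up, and results can vary with your Python version and the system's MIME database), guess_type() maps a file name to a MIME type and encoding based on its extension, and guess_all_extensions() lists every known extension for a MIME type:

```python
import mimetypes

# guess_type() looks at the extension and returns (type, encoding)
print(mimetypes.guess_type("report.pdf"))      # ('application/pdf', None)
print(mimetypes.guess_type("archive.tar.gz"))  # ('application/x-tar', 'gzip')

# guess_all_extensions() lists every extension registered for a MIME type;
# the exact list depends on the Python version and system MIME files
print(mimetypes.guess_all_extensions("image/jpeg"))
```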

 

5. Using the rfind() Method

The rfind method can be used directly on strings to find the last occurrence of a specified value, which can be useful to get file extension in Python.

string.rfind(value, start, end)

Here, value is the substring whose last occurrence you want to find, and start and end optionally limit the search to a slice of the string (by default, the whole string is searched). rfind() returns the highest index at which value is found, or -1 if it is not found.

Now let us take an example and see how we can use the rfind() method to find the file extension in Python. We call directory.rfind('.') to get the index of the last period in the path and save it in a variable named index. Slicing the string from index to the end then gives the extension, including the period. See the Python program below:

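# declaring the path
directory = '/Users/Programs/Directory/file.csv'
# finding the index of the last "." in the path
index = directory.rfind(".")
# printing from the "." to the end (an empty string if rfind() returned -1)
print(directory[index:] if index != -1 else '')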

Output:

.csv

Here, in the first line, we declare the path using the directory variable. In the second line, rfind() returns the index of the last ".", and in the third line, we slice the string from that index to the end to print the extension.
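A plain rfind() over the whole path shares the directory-name pitfall of split(). One way around it, sketched below, is to use rfind()'s start argument so that only the file name is searched; rfind_extension is a hypothetical helper name, and / is assumed to be the path separator:

```python
def rfind_extension(path: str) -> str:
    # Assumes "/" as the path separator (POSIX-style paths)
    name_start = path.rfind('/') + 1  # 0 when the path has no "/"
    # Use rfind()'s start argument so only the file name is searched
    dot = path.rfind('.', name_start)
    # -1 means no period; dot == name_start means a hidden file such as ".bashrc"
    if dot <= name_start:
        return ''
    return path[dot:]

print(rfind_extension('/Users/Programs/Directory/file.csv'))  # .csv
print(rfind_extension('/home/user.name/notes'))  # prints an empty line: no extension
print(rfind_extension('/home/user/.bashrc'))     # prints an empty line: hidden file
```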

 

6. Using basename() function

The os.path.basename() function retrieves the final component (the file name) of a path. We pass it the complete path, and it returns the base name of the file, which can then be combined with other methods to extract the file extension.

Here is the syntax of the basename() function:

os.path.basename(path)

For the path used below, os.path.basename(directory) returns 'file.csv'. We then call split() with the dot character as the separator, which returns the list ['file', 'csv'], and print the part after the dot, 'csv'.

Now let us take an example and see how basename() function works. See the example below:

# importing os
import os
# declaring directory
directory = '/Users/Programs/Directory/file.csv'
# getting the file name, splitting it on "." and printing the last part
print(os.path.basename(directory).split('.')[-1])

Output:

csv

Here, in the first line, we import the os module, and in the second line, we assign the path to the variable directory. In the third line, we use basename() to get the file name, split it on the dot, and print the part after the dot, which is the extension.
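basename() pairs naturally with os.path.splitext(), which also copes with names containing several periods. A minimal sketch, with split_filename as a hypothetical helper name and made-up paths:

```python
import os

def split_filename(path: str) -> tuple[str, str]:
    # basename() drops the directories; splitext() then separates name and extension
    return os.path.splitext(os.path.basename(path))

print(split_filename('/Users/Programs/Directory/file.csv'))    # ('file', '.csv')
print(split_filename('/path/to/your/file/document.v2.0.txt'))  # ('document.v2.0', '.txt')

# Caution: a trailing slash makes basename() return an empty string
print(os.path.basename('/Users/Programs/Directory/'))  # prints an empty line
```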

 

7. Using the pathlib.Path.stem Property

In the pathlib module, the stem property of a Path object returns the file name without its suffix, while the suffix property returns the extension itself. Using the two together, we can separate a file's name from its extension.

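The following is the syntax for getting the extension with pathlib:

pathlib.Path(filename).suffix

suffix is a property rather than a method, so it is accessed without parentheses and returns the file extension. Similarly, pathlib.Path(filename).stem returns the file name without the extension.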

First, we import the pathlib module and pass the directory string to pathlib.Path(). The pathlib module has been part of Python's standard library since Python 3.4, so there is nothing to install; the pathlib package on PyPI is an outdated backport and is not needed. The stem property of the resulting Path object returns the file name without its extension, as shown below:

# importing pathlib module
import pathlib 
# declaring directory
directory = '/Users/Programs/Directory/file.csv' 
# creating a Path object with pathlib.Path()
# and reading its stem (the file name without the extension)
filename = pathlib.Path(directory).stem 
# printing
print(filename)

Output:

file

Here, in the first line, we import the pathlib module, and then we declare the path using the variable directory. Next, we create a Path object with pathlib.Path() and read its stem property, which gives the file name without the extension. To get the extension itself, we use the suffix property instead. Given below is an example.

# importing the module
import pathlib
# using suffix
file_extension = pathlib.Path('file.csv').suffix 
# printing
print("File Extension:", file_extension)

Output:

File Extension: .csv

In this example, instead of stem, we use the suffix property to extract the extension from the path.
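The sketch below shows how name, stem and suffix relate on a Path object, and how with_suffix() swaps an extension. The example paths are made up, and the printed path assumes a POSIX system:

```python
from pathlib import Path

p = Path('/Users/Programs/Directory/file.csv')

print(p.name)    # file.csv
print(p.stem)    # file
print(p.suffix)  # .csv

# name is the stem followed by the suffix
assert p.name == p.stem + p.suffix

# with_suffix() returns a new path with the extension replaced
print(p.with_suffix('.json'))  # /Users/Programs/Directory/file.json

# stem only removes the last suffix
print(Path('archive.tar.gz').stem)  # archive.tar
```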

 

8. Using rpartition() method

The rpartition() method splits a string into three parts at the last occurrence of a separator: the part before the separator, the separator itself, and the part after it. The syntax of rpartition() is given below:

string.rpartition(separator)

Here,

  • string: The string you want to partition.
  • separator: Specifies the separator to use when splitting the string.

Return value: a tuple containing three elements:

  1. The part of the string before the separator.
  2. The separator itself.
  3. The part of the string after the separator.

Let us take an example and see how we can use the rpartition() method to get a file extension in Python. See the Python program below:

# declaring directory
directory = '/Users/Programs/Directory/file.csv'
# printing
print(directory.rpartition('.')[2])

Output:

csv

Here, in the first line, we declare the path using the variable directory. In the second line, we call rpartition('.') and print the third element of the returned tuple (index 2), which is the extension.
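One caveat: when the separator is not found, rpartition() returns two empty strings followed by the original string, so indexing [2] returns the whole name instead of an empty extension. A minimal sketch that checks for this (rpartition_extension is a hypothetical helper name, and it expects a bare file name, for example from os.path.basename(), since periods in directory names would otherwise be picked up):

```python
def rpartition_extension(filename: str) -> str:
    # rpartition() returns ('', '', filename) when there is no "."
    head, sep, tail = filename.rpartition('.')
    # No separator, or a hidden file such as ".bashrc" (empty head): no extension
    if not sep or not head:
        return ''
    return tail

print('README'.rpartition('.'))          # ('', '', 'README')
print(rpartition_extension('README'))    # prints an empty line: no extension
print(rpartition_extension('file.csv'))  # csv
```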

 

9. Leveraging Third-party Libraries

Python's ecosystem includes third-party libraries for cases where the standard library falls short. The libraries below identify a file's type by inspecting its content rather than its name, which is useful when a file's extension is missing or misleading; once the type is known, you can derive the appropriate extension.

1. file-magic

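file-magic provides the official Python bindings for libmagic, the library behind the Unix file command. After identifying the file type, you can deduce the file's extension. Note that both file-magic and python-magic install a module named magic, so only one of them should be installed in the same environment.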

Installation:

pip install file-magic

Example:

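import magic

# detect_from_filename() returns a named tuple with name, mime_type and encoding
file_info = magic.detect_from_filename('sample.jpg')
print(file_info.name)
# Output might be something like: "JPEG image data, JFIF standard 1.01"
print(file_info.mime_type)
# Output: "image/jpeg"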

 

2. python-magic

This is a library that's a wrapper around the libmagic file type identification library.

Installation:

pip install python-magic

Example

import magic

mime_type = magic.from_file('sample.jpg', mime=True)
print(mime_type)
# Output: "image/jpeg"

After retrieving the MIME type, you can utilize the mimetypes module or similar methods to get file extension in Python.

 

3. FileType

This library provides a mechanism to identify file types based on their magic numbers without relying on file extensions.

Installation:

pip install filetype

Example:

import filetype

kind = filetype.guess('sample.jpg')
if kind:
    print('File extension:', kind.extension)
    print('File MIME type:', kind.mime)
else:
    print('Cannot guess file type!')

# Output:
# File extension: jpg
# File MIME type: image/jpeg

FileType makes it pretty straightforward to get file extension in Python. Once the file type is recognized, the library readily provides the extension.
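To illustrate the idea of magic numbers behind these libraries, here is a standard-library-only sketch that checks a file's first bytes against a handful of well-known signatures. It is not how filetype or libmagic are implemented; the SIGNATURES table and the function names are hypothetical and cover only a few formats:

```python
import os
import tempfile

# A few well-known file signatures ("magic numbers") mapped to extensions.
# Illustrative only: real libraries recognize far more formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the start of .docx, .xlsx, .jar, ...
    (b"\x1f\x8b", "gz"),
]

def extension_from_header(header: bytes):
    """Return an extension for the given leading bytes, or None if unknown."""
    for magic, extension in SIGNATURES:
        if header.startswith(magic):
            return extension
    return None

def extension_from_file(path: str):
    """Read the first bytes of a file and match them against SIGNATURES."""
    with open(path, "rb") as f:
        return extension_from_header(f.read(16))  # 16 bytes cover every signature above

# Demo: PNG bytes saved under a misleading ".txt" name
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
    demo_path = tmp.name

print(extension_from_file(demo_path))  # png
os.remove(demo_path)
```

Because several formats share the ZIP signature, a table this small cannot tell a .docx from a .zip.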

 

Summary

Extracting the file extension is a common task in many data processing and file management applications. In Python, there are various ways to accomplish this. Methods range from utilizing built-in modules like os.path and pathlib to simple string manipulations. Third-party libraries and MIME type guessers further enhance the capability to get file extension in Python, accommodating more complex scenarios and ambiguous file types. This article covers several methods, guiding the reader on choosing the most suitable technique for their specific needs.

 


 

 

Bashir Alam


He is a Computer Science graduate from the University of Central Asia, currently employed as a full-time Machine Learning Engineer at uExel. His expertise lies in Python, Java, Machine Learning, OCR, text extraction, data preprocessing, and predictive models. You can connect with him on his LinkedIn profile.


If my articles on GoLinuxCloud have helped you, kindly consider buying me a coffee as a token of appreciation.


For any other feedback or questions, you can send mail to admin@golinuxcloud.com
