How to extract specific file, link & folder from tar.gz



In this tutorial, we'll explore how to extract specific files, folders, and symbolic links from a tar.gz archive, using straightforward command-line techniques suitable for beginners.

Before diving into the commands, it's important to understand what a tar.gz file is:

  • Tar: Short for "tape archive", it's a file format and a command-line utility used for collecting multiple files into a single archive file (.tar). It's often used in Unix and Linux environments.
  • Gzip: A compression method to reduce the size of files. When a tar file is compressed with gzip, it gets a .tar.gz or .tgz extension.
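To follow along without a real archive, you can build a throwaway one with the same layout as the demo-archive.tar.gz used in this tutorial. This is a minimal sketch that assumes a POSIX shell and GNU tar. The member names are copied from the listing in step 2, the files are empty, and the working directory comes from mktemp. In tar -czf, c creates the archive and z compresses it with gzip in the same step:

```shell
# Work in a throwaway directory so nothing else is touched
work=$(mktemp -d)
cd "$work"

# Recreate the layout shown in the listing in step 2 (all files are empty)
mkdir -p demo-archive/reports
touch demo-archive/file1.txt demo-archive/file2.txt
touch demo-archive/reports/report1.txt demo-archive/reports/report2.txt demo-archive/reports/report3.txt
# Relative symlink: the stored target is the literal text "file1.txt"
ln -s file1.txt demo-archive/link_to_file1.txt

# c = create, z = compress with gzip, f = write to this archive file
tar -czf demo-archive.tar.gz demo-archive

# Remove the source tree so later extractions start from a clean slate
rm -rf demo-archive
ls -l demo-archive.tar.gz
```

The commands in the following steps can then be run from the same directory.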

 

Steps to extract specific files, links and directories from an archive

You'll need a Unix-like environment (Linux, macOS, FreeBSD, etc.) or a Windows system with tools like Git Bash, Cygwin, or WSL (Windows Subsystem for Linux) installed. The tar utility should be pre-installed in most Unix-like environments.

 

1. Open Terminal or Command Line Interface

  • Open your command-line interface (CLI).
  • Navigate to the directory containing your tar.gz file using the cd command. For example, if your file is in the Downloads folder, you'd type cd Downloads.

 

2. Listing Contents of the Archive

Before extracting, list what's inside the archive: you must give the complete path of each file or directory you want to extract, exactly as it is stored in the archive.

Use the following command, where filename.tar.gz is your archive:

tar -tzvf filename.tar.gz

  • t tells tar to list contents.
  • z is for gzip compressed files.
  • v is for verbose mode, showing details.
  • f tells tar that the next argument is the archive file name, so it must be the last letter in the option group.

For example:

# tar -tzvf demo-archive.tar.gz
drwx------ root/root         0 2024-01-22 12:10 demo-archive/
lrwxrwxrwx root/root         0 2024-01-22 12:10 demo-archive/link_to_file1.txt -> file1.txt
-rw------- root/root         0 2024-01-22 12:10 demo-archive/file2.txt
-rw------- root/root         0 2024-01-22 12:10 demo-archive/file1.txt
drwx------ root/root         0 2024-01-22 12:10 demo-archive/reports/
-rw------- root/root         0 2024-01-22 12:10 demo-archive/reports/report3.txt
-rw------- root/root         0 2024-01-22 12:10 demo-archive/reports/report2.txt
-rw------- root/root         0 2024-01-22 12:10 demo-archive/reports/report1.txt
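The listing above is verbose. Dropping v prints only the member paths, one per line, which is exactly the form tar expects when you name members to extract (symlinks appear without the "-> target" part). A minimal sketch, assuming a POSIX shell with GNU tar and a small throwaway archive shaped like demo-archive.tar.gz (names taken from the listing, files empty); members.txt is a hypothetical scratch file:

```shell
# Throwaway archive shaped like demo-archive.tar.gz (files are empty)
work=$(mktemp -d) && cd "$work"
mkdir -p demo-archive/reports
touch demo-archive/file1.txt demo-archive/file2.txt demo-archive/reports/report1.txt
ln -s file1.txt demo-archive/link_to_file1.txt
tar -czf demo-archive.tar.gz demo-archive && rm -rf demo-archive

# Verbose listing: permissions, owner, size, date, time, then the path
tar -tzvf demo-archive.tar.gz

# Names only: each line is a path you can pass straight back to tar -x
tar -tzf demo-archive.tar.gz > members.txt
cat members.txt
```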

You can also pipe the output to grep to search for a specific file or directory, and grep accepts regular expressions:

tar -tzvf demo-archive.tar.gz | grep -E 'file[0-9]+\.txt'

Here's a breakdown of the regex pattern:

  • file: Matches the literal text "file".
  • [0-9]+: Matches one or more digits. This part will match 1, 2, etc.
  • \.txt: The dot . is a special character in regex, so it's escaped with a backslash to match a literal dot. Then txt matches the file extension.

The -E flag in grep is used for extended regular expressions, which allow a broader range of regex patterns than basic regex.
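One caveat with the pattern above: it is not anchored, so it also matches demo-archive/link_to_file1.txt, whose name contains file1.txt. A minimal sketch of a tighter match, assuming a POSIX shell with GNU tar and the same kind of throwaway archive: list names only, require a "/" directly before "file", and use "$" to anchor the match at the end of the line. matches.txt is a hypothetical scratch file:

```shell
# Throwaway archive shaped like demo-archive.tar.gz (files are empty)
work=$(mktemp -d) && cd "$work"
mkdir -p demo-archive/reports
touch demo-archive/file1.txt demo-archive/file2.txt demo-archive/reports/report1.txt
ln -s file1.txt demo-archive/link_to_file1.txt
tar -czf demo-archive.tar.gz demo-archive && rm -rf demo-archive

# Unanchored: also matches demo-archive/link_to_file1.txt
tar -tzf demo-archive.tar.gz | grep -E 'file[0-9]+\.txt'

# Anchored: "/" before "file" and "$" at the end of the line
tar -tzf demo-archive.tar.gz | grep -E '/file[0-9]+\.txt$' > matches.txt
cat matches.txt
```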

 

3. Extracting Specific File or Multiple Files

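The general command to extract specific files is shown below. Here x tells tar to extract instead of list, and every argument after the archive name is a member path:

tar -xzvf filename.tar.gz path/to/file1 path/to/file2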

You must give each path exactly as it appeared in the listing from the previous step. For example, to extract demo-archive/file2.txt and demo-archive/reports/report3.txt, use:

# tar -xzvf demo-archive.tar.gz demo-archive/file2.txt demo-archive/reports/report3.txt
demo-archive/file2.txt
demo-archive/reports/report3.txt
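To check that only the named members are restored, here is a minimal sketch against a throwaway archive shaped like demo-archive.tar.gz, assuming a POSIX shell with GNU tar. demo-archive/missing.txt is a hypothetical path used to show what happens when a name does not match any member exactly:

```shell
# Throwaway archive shaped like demo-archive.tar.gz (files are empty)
work=$(mktemp -d) && cd "$work"
mkdir -p demo-archive/reports
touch demo-archive/file1.txt demo-archive/file2.txt demo-archive/reports/report3.txt
tar -czf demo-archive.tar.gz demo-archive && rm -rf demo-archive

# Name each member exactly as tar -tzf prints it
tar -xzvf demo-archive.tar.gz demo-archive/file2.txt demo-archive/reports/report3.txt

# Only the named files (and their parent directories) were created
find demo-archive -type f

# A path that is not in the archive makes tar report an error and exit non-zero
if ! tar -xzf demo-archive.tar.gz demo-archive/missing.txt; then
  echo "demo-archive/missing.txt was not found in the archive"
fi
```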

 

4. Extracting Symbolic Links

Extracting symbolic links from a tar.gz archive is similar to extracting regular files or directories, but there are a few key points to understand about how symbolic links are handled in archives.

  • tar stores a symlink's target as the literal text it had when the link was archived and does not rewrite it on extraction. A relative target such as file1.txt resolves relative to the directory the link is extracted into, while an absolute target points to the same absolute path on the system you extract on.
  • If the target does not exist at that location, the extracted symlink is broken (it points to a non-existent path). So extract the file the link points to together with the link itself, for example:

# tar -xzvf demo-archive.tar.gz demo-archive/file1.txt demo-archive/link_to_file1.txt
demo-archive/link_to_file1.txt
demo-archive/file1.txt

# ls -l demo-archive
total 0
-rw-------. 1 root root 0 Jan 22 12:10 file1.txt
lrwxrwxrwx. 1 root root 9 Jan 22 12:10 link_to_file1.txt -> file1.txt
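The difference between extracting the link alone and extracting it with its target can be seen in a minimal sketch like the following, assuming a POSIX shell with GNU tar and a throwaway archive that holds only file1.txt and the relative link to it:

```shell
# Throwaway archive with one file and a relative symlink to it
work=$(mktemp -d) && cd "$work"
mkdir demo-archive
touch demo-archive/file1.txt
ln -s file1.txt demo-archive/link_to_file1.txt
tar -czf demo-archive.tar.gz demo-archive && rm -rf demo-archive

# 1) Extract only the link: it is restored, but its target is missing
tar -xzf demo-archive.tar.gz demo-archive/link_to_file1.txt
if [ -L demo-archive/link_to_file1.txt ] && [ ! -e demo-archive/link_to_file1.txt ]; then
  echo "link_to_file1.txt is dangling"
fi

# 2) Extract the target as well: the same relative link now resolves
tar -xzf demo-archive.tar.gz demo-archive/file1.txt
readlink demo-archive/link_to_file1.txt   # prints the stored target text
```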

 

5. Extracting all files from a specific directory

You can also extract a specific directory together with everything inside it:

tar -xzvf demo-archive.tar.gz demo-archive/reports/

Sample Output:

demo-archive/reports/
demo-archive/reports/report3.txt
demo-archive/reports/report2.txt
demo-archive/reports/report1.txt
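To confirm that naming a directory member extracts only that subtree, here is a minimal sketch against a throwaway archive shaped like demo-archive.tar.gz, assuming a POSIX shell with GNU tar:

```shell
# Throwaway archive shaped like demo-archive.tar.gz (files are empty)
work=$(mktemp -d) && cd "$work"
mkdir -p demo-archive/reports
touch demo-archive/file1.txt
touch demo-archive/reports/report1.txt demo-archive/reports/report2.txt demo-archive/reports/report3.txt
tar -czf demo-archive.tar.gz demo-archive && rm -rf demo-archive

# Naming a directory member extracts it recursively
tar -xzvf demo-archive.tar.gz demo-archive/reports/

ls demo-archive            # only "reports": file1.txt was not extracted
ls demo-archive/reports
```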

 

6. Handling Wildcards during extraction

Wildcards are useful for extracting files that match a pattern, especially from archives with many files. Note that GNU tar does not treat member names as wildcard patterns during extraction unless you pass the --wildcards option, while bsdtar (the default tar on macOS and FreeBSD) accepts shell-style patterns directly. A portable alternative is to combine tar with grep for listing and xargs for extracting. Here's how you can do it:

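First, list the contents of the archive and filter them with grep. Note that grep takes a regular expression, not a shell wildcard, so "any characters" is written .* rather than *. Use -t without v so that each line is a bare path:

tar -tzf archive_name.tar.gz | grep 'path/to/directory/.*pattern'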

After you have the list of files, you can use xargs to pass these file names to tar for extraction.

tar -tzf archive_name.tar.gz | grep 'path/to/directory/.*pattern' | xargs -I '{}' tar -xzvf archive_name.tar.gz '{}'

For example:

Let us list the files that match a pattern:

# tar -tzvf demo-archive.tar.gz | grep 'demo-archive/reports/.*\.txt'
-rw------- root/root         0 2024-01-22 12:10 demo-archive/reports/report3.txt
-rw------- root/root         0 2024-01-22 12:10 demo-archive/reports/report2.txt
-rw------- root/root         0 2024-01-22 12:10 demo-archive/reports/report1.txt

Next, we combine this command with awk and xargs to extract the files in a single line:

tar -tzvf demo-archive.tar.gz | grep 'demo-archive/reports/.*\.txt' | awk '{print $6}' | xargs -I '{}' tar -xzvf demo-archive.tar.gz '{}'

Here:

  • We get the list of matching entries using tar -tzvf demo-archive.tar.gz | grep 'demo-archive/reports/.*\.txt'.
  • awk '{print $6}' prints the sixth whitespace-separated field of each verbose line, which is the path, and passes the list to xargs. This relies on the verbose layout shown above and breaks for paths that contain spaces.
  • xargs -I '{}' runs tar -xzvf demo-archive.tar.gz '{}' once per input line, replacing {} with that path, so each matching file is extracted from the archive.
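Because xargs -I '{}' starts one tar process per matching file, the archive is decompressed once per file, which adds up on large archives. Below is a sketch of two single-pass alternatives, assuming a POSIX shell and GNU tar: the --wildcards option, which makes tar match the quoted pattern against member names, and -T -, which makes one tar process read the names to extract from standard input. The out1 and out2 directories are hypothetical and only keep the two results apart:

```shell
# Throwaway archive shaped like demo-archive.tar.gz (files are empty)
work=$(mktemp -d) && cd "$work"
mkdir -p demo-archive/reports
touch demo-archive/file1.txt
touch demo-archive/reports/report1.txt demo-archive/reports/report2.txt demo-archive/reports/report3.txt
tar -czf demo-archive.tar.gz demo-archive && rm -rf demo-archive

# Option 1 (GNU tar): --wildcards, placed before the quoted pattern it applies to
mkdir out1
(cd out1 && tar -xzvf ../demo-archive.tar.gz --wildcards 'demo-archive/reports/*.txt')

# Option 2: list bare names, filter with grep, and pass the list to one tar via -T -
mkdir out2
tar -tzf demo-archive.tar.gz \
  | grep 'demo-archive/reports/.*\.txt$' \
  | (cd out2 && tar -xzvf ../demo-archive.tar.gz -T -)
```

Quoting the pattern in option 1 keeps your shell from expanding it before tar sees it. Option 2 also avoids the awk step, since -t without v already prints bare paths.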

 

Extracting specific files from a tar.gz archive is a useful skill, particularly for managing large archives or when storage space is limited. By following these steps and understanding the commands used, you can efficiently manage .tar.gz files in a Unix-like environment. Remember, the key is to know the exact path of each file you want to extract and to use the correct options with the tar command. You can read more in the tar manual page (man tar).

 

Deepak Prasad


He is the founder of GoLinuxCloud and brings over a decade of expertise in Linux, Python, Go, Laravel, DevOps, Kubernetes, Git, Shell scripting, OpenShift, AWS, Networking, and Security. With extensive experience, he excels in various domains, from development to DevOps, Networking, and Security, ensuring robust and efficient solutions for diverse projects. You can connect with him on his LinkedIn profile.
