In this tutorial, we'll explore how to extract specific files, folders, and symbolic links from a tar.gz
archive, using straightforward command-line techniques suitable for beginners.
Before diving into the commands, it's important to understand what a tar.gz file is:
- Tar: Short for "tape archive", it's a file format and a command-line utility used for collecting multiple files into a single archive file (.tar). It's often used in Unix and Linux environments.
- Gzip: A compression method to reduce the size of files. When a tar file is compressed with gzip, it gets a .tar.gz or .tgz extension.
Steps to extract specific file(s), link(s) and directories from archive
You'll need a Unix-like environment (Linux, macOS, FreeBSD, etc.) or a Windows system with tools like Git Bash, Cygwin, or WSL (Windows Subsystem for Linux) installed. The tar utility should be pre-installed in most Unix-like environments.
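To follow along, it may help to have an archive shaped like the demo-archive.tar.gz used in the examples of this tutorial. The following is a minimal sketch that builds one in a scratch directory. The variable name work and the use of mktemp are assumptions made for illustration, and the files are created empty.

```shell
# Work in a scratch directory so nothing existing is touched.
work=$(mktemp -d)
cd "$work"

# Recreate a layout like the one in the tutorial's listings (all files empty).
mkdir -p demo-archive/reports
touch demo-archive/file1.txt demo-archive/file2.txt
touch demo-archive/reports/report1.txt \
      demo-archive/reports/report2.txt \
      demo-archive/reports/report3.txt
ln -s file1.txt demo-archive/link_to_file1.txt

# c = create, z = compress with gzip, f = write to the named archive file.
tar -czf demo-archive.tar.gz demo-archive

# Remove the source tree so later extractions start from a clean slate.
rm -rf demo-archive

# Confirm the archive was written by listing its member names.
tar -tzf demo-archive.tar.gz
```

The order in which tar lists the members depends on the file system, so it may differ from the listings shown in this tutorial.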
1. Open Terminal or Command Line Interface
- Open your command-line interface (CLI).
- Navigate to the directory containing your tar.gz file using the cd command. For example, if your file is in the Downloads folder, you'd type cd Downloads.
2. Listing Contents of the Archive
Before extracting, you should check what's inside the archive, because you must give the complete path of each file or directory you intend to extract, exactly as it is stored in the archive.
Use the command: tar -tzvf filename.tar.gz
- t tells tar to list contents.
- z is for gzip compressed files.
- v is for verbose mode, showing details.
- f specifies the archive file to work with; the file name must follow it.
For example:
# tar -tzvf demo-archive.tar.gz
drwx------ root/root 0 2024-01-22 12:10 demo-archive/
lrwxrwxrwx root/root 0 2024-01-22 12:10 demo-archive/link_to_file1.txt -> file1.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/file2.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/file1.txt
drwx------ root/root 0 2024-01-22 12:10 demo-archive/reports/
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report3.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report2.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report1.txt
You can also pipe the output to grep to search for a specific file or directory, using a regular expression if needed:
tar -tzvf demo-archive.tar.gz | grep -E 'file[0-9]+\.txt'
Here's a breakdown of the regex pattern:
- file: Matches the literal text "file".
- [0-9]+: Matches one or more digits. This part will match 1, 2, etc.
- \.txt: The dot . is a special character in regex, so it's escaped with a backslash to match a literal dot. Then txt matches the file extension.
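The -E flag in grep enables extended regular expressions, in which operators such as + work without a backslash. Because the pattern is not anchored, it matches any line that contains it, so the link_to_file1.txt entry is matched as well.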
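The verbose listing mixes each path with permissions, owner, and date. When the goal is to collect paths to hand to tar later, listing names only (dropping the v option) and anchoring the regex can be easier to work with. The following is a small sketch under that assumption. It builds its own scratch copy of the demo archive, and the variable names work and matches are illustrative.

```shell
# Setup: build a small scratch archive with the same layout as the demo.
work=$(mktemp -d)
cd "$work"
mkdir -p demo-archive/reports
touch demo-archive/file1.txt demo-archive/file2.txt demo-archive/reports/report1.txt
ln -s file1.txt demo-archive/link_to_file1.txt
tar -czf demo-archive.tar.gz demo-archive
rm -rf demo-archive

# Without v, tar prints one member path per line and nothing else.
tar -tzf demo-archive.tar.gz

# "/file" requires a slash right before "file", and "$" anchors the end of
# the line, so link_to_file1.txt is no longer matched.
matches=$(tar -tzf demo-archive.tar.gz | grep -E '/file[0-9]+\.txt$' | sort)
printf '%s\n' "$matches"
```

The last command prints demo-archive/file1.txt and demo-archive/file2.txt, which are exactly the paths tar expects when extracting those members.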
3. Extracting Specific File or Multiple Files
The general command to extract specific files is:
tar -xzvf filename.tar.gz path/to/file1 path/to/file2
Here we must specify the complete path exactly as it appeared in the output of the previous step. For example, to extract demo-archive/file2.txt and demo-archive/reports/report3.txt, we will use:
# tar -xzvf demo-archive.tar.gz demo-archive/file2.txt demo-archive/reports/report3.txt
demo-archive/file2.txt
demo-archive/reports/report3.txt
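To see why the complete path matters, the sketch below extracts two members by their full paths and then tries the bare file name, which tar cannot find. It builds its own scratch archive, and the variable names work and partial are assumptions for illustration.

```shell
# Setup: build a small scratch archive with the same layout as the demo.
work=$(mktemp -d)
cd "$work"
mkdir -p demo-archive/reports
touch demo-archive/file1.txt demo-archive/file2.txt demo-archive/reports/report3.txt
tar -czf demo-archive.tar.gz demo-archive
rm -rf demo-archive

# Extract two members, each named by its complete path inside the archive.
tar -xzvf demo-archive.tar.gz demo-archive/file2.txt demo-archive/reports/report3.txt

# Only the requested files (and their parent directories) now exist on disk.
find demo-archive -type f | sort

# A bare file name is not a member of the archive, so tar reports an error
# and returns a non-zero exit status.
if tar -xzf demo-archive.tar.gz file2.txt 2>/dev/null; then
    partial=found
else
    partial=missing
fi
echo "file2.txt without its directory prefix: $partial"
```

Note that tar recreates the parent directories demo-archive/ and demo-archive/reports/ as needed, even though they were not named on the command line.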
4. Extracting Symbolic Links
Extracting symbolic links from a tar.gz archive is similar to extracting regular files or directories, but there are a few key points to understand about how symbolic links are handled in archives.
- When you extract a symlink from a tar.gz archive, the extracted symlink still points to the same target path that was recorded when it was archived.
- If the target of the symlink does not exist at the expected location, the symlink will be broken (it will point to a non-existent location), so make sure to extract the file that the symbolic link points to along with the link itself.
# tar -xzvf demo-archive.tar.gz demo-archive/file1.txt demo-archive/link_to_file1.txt
demo-archive/link_to_file1.txt
demo-archive/file1.txt
# ls -l demo-archive
total 0
-rw-------. 1 root root 0 Jan 22 12:10 file1.txt
lrwxrwxrwx. 1 root root 9 Jan 22 12:10 link_to_file1.txt -> file1.txt
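The broken-link case can be checked directly in the shell. The sketch below extracts only the link, tests whether it resolves, and then extracts the target as well. It builds its own scratch archive, and the variable names work, link, and broken_before are illustrative.

```shell
# Setup: a scratch archive holding one file and a symlink pointing to it.
work=$(mktemp -d)
cd "$work"
mkdir demo-archive
touch demo-archive/file1.txt
ln -s file1.txt demo-archive/link_to_file1.txt
tar -czf demo-archive.tar.gz demo-archive
rm -rf demo-archive

link=demo-archive/link_to_file1.txt

# Extract only the symlink, without the file it points to.
tar -xzf demo-archive.tar.gz "$link"

# -L is true for a symlink; -e follows the link and is false if the target
# is missing. Together they identify a broken link.
if [ -L "$link" ] && [ ! -e "$link" ]; then
    broken_before=yes
    echo "broken: $link -> $(readlink "$link")"
else
    broken_before=no
fi

# Extract the target too; the same link now resolves.
tar -xzf demo-archive.tar.gz demo-archive/file1.txt
if [ -e "$link" ]; then
    echo "resolved: $link -> $(readlink "$link")"
fi
```

The link itself is unchanged between the two checks; only the presence of its target differs.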
5. Extracting all files from a specific directory
You can also choose to extract a specific directory and all the content inside this directory:
tar -xzvf demo-archive.tar.gz demo-archive/reports/
Sample Output:
demo-archive/reports/
demo-archive/reports/report3.txt
demo-archive/reports/report2.txt
demo-archive/reports/report1.txt
6. Handling Wildcards during extraction
Wildcards are useful for extracting files that match a pattern from a tar.gz archive, especially when dealing with large numbers of files. Note that GNU tar does not treat wildcard characters in member names as patterns during extraction unless the --wildcards option is given, and an unquoted wildcard is expanded by the shell against the local file system rather than against the archive. An alternative that does not rely on that option is to combine tar with other commands like grep for filtering and xargs for extracting. Here's how you can do it:
First, list the member names in the archive and use grep with a regular expression to filter the files you're interested in. Leave out the v option so that each output line contains only the path. grep matches regular expressions rather than shell wildcards, so use .* where a shell pattern would use *.
tar -tzf archive_name.tar.gz | grep 'path/to/directory/.*pattern'
After you have the list of files, you can use xargs to pass these file names to tar for extraction.
tar -tzf archive_name.tar.gz | grep 'path/to/directory/.*pattern' | xargs -I '{}' tar -xzvf archive_name.tar.gz '{}'
For example:
Let us list the files with some wildcard pattern:
# tar -tzvf demo-archive.tar.gz | grep 'demo-archive/reports/.*\.txt'
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report3.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report2.txt
-rw------- root/root 0 2024-01-22 12:10 demo-archive/reports/report1.txt
Next, we will combine this command with awk and xargs to also extract the files in a single line:
tar -tzvf demo-archive.tar.gz | grep 'demo-archive/reports/.*\.txt' | awk '{print $6}' | xargs -I '{}' tar -xzvf demo-archive.tar.gz '{}'
Here:
- We get the verbose list of matching files using tar -tzvf demo-archive.tar.gz | grep 'demo-archive/reports/.*\.txt'
- Next, we print only the path, which is the sixth field of the verbose listing, using awk '{print $6}' and pass this list to xargs. This assumes the paths contain no spaces.
- Lastly, xargs runs tar -xzvf demo-archive.tar.gz once for each path, substituting the path for {}, to extract it from the archive.
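Two variations on the pipeline above may be worth knowing; both are sketched here assuming GNU tar (other implementations can behave differently). The first uses the --wildcards option so tar matches the pattern itself. The second feeds the filtered name list to a single tar process through -T - (read member names from standard input) instead of starting tar once per file. The scratch archive and the names work, out1, and out2 are hypothetical.

```shell
# Setup: build a small scratch archive with the same layout as the demo.
work=$(mktemp -d)
cd "$work"
mkdir -p demo-archive/reports
touch demo-archive/file1.txt
touch demo-archive/reports/report1.txt \
      demo-archive/reports/report2.txt \
      demo-archive/reports/report3.txt
tar -czf demo-archive.tar.gz demo-archive
rm -rf demo-archive
mkdir out1 out2

# Variation 1 (GNU tar): let tar match the pattern against member names.
# The quotes stop the shell from expanding * against local files.
(cd out1 && tar -xzf ../demo-archive.tar.gz --wildcards 'demo-archive/reports/*.txt')

# Variation 2 (GNU tar): list names only, filter with grep, then pass the
# whole list to one tar invocation via -T - (names read from stdin).
(cd out2 && tar -tzf ../demo-archive.tar.gz \
    | grep -E '^demo-archive/reports/.*\.txt$' \
    | tar -xzf ../demo-archive.tar.gz -T -)

# Both output directories should now hold the three reports and nothing else.
find out1 out2 -type f | sort
```

Because the second variation reads the compressed archive only once, it tends to be faster than the xargs form on large archives, where each tar invocation has to decompress the archive again.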
Extracting specific files from a tar.gz archive is a useful skill, particularly for managing large archives or when dealing with limited storage space. By following these steps and understanding the commands used, you can efficiently manage .tar.gz files in a Unix-like environment. Remember, the key is to know the exact path of the files you want to extract and to use the correct options with the tar command. You can read more by running man tar.