In this article we will learn about tar incremental backup in Linux, with examples. In IT, as in any other industry, taking backups is standard practice. We cannot survive without a backup.
Now imagine the amount of space consumed by storing such backup data. It is therefore a good idea to use incremental backups, which save space by backing up only the data that has changed.
In this article I will share some examples to perform tar incremental backup. You can integrate this solution with a script to automate the incremental backup.
tar is a well-known archive tool which can be used for multiple purposes, but we will concentrate on the tar incremental backup functionality here.
I will split the tasks and verification into the following steps:
- Create some data
- Take level 0 incremental backup
- Make some data changes
- Take level 1 incremental backup
- Remove the data source
- Recover the data from the tar incremental backup
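The steps listed above can be sketched end to end as a single script. This is only a minimal sketch, assuming GNU tar is installed; the scratch directory from mktemp and the snapshot file name data.snar are hypothetical stand-ins for the paths used in this article, and the one-second pause is an assumption added so that the changed files get a timestamp clearly later than the level 0 run.

```shell
#!/bin/sh
# End-to-end sketch of the six steps (assumes GNU tar).
set -eu

work=$(mktemp -d)      # hypothetical scratch directory standing in for /tmp
cd "$work"

# Step 1: create some data
mkdir data
for i in 1 2 3 4 5; do
    echo "content $i" > "data/file$i"
done

# Step 2: level 0 backup; the snapshot file does not exist yet
tar --create --gzip --listed-incremental="$work/data.snar" --file=data.tgz data

# Step 3: make some changes (pause so the new timestamps are later than the level 0 run)
sleep 1
rm -f data/file5
echo "content 6" > data/file6

# Step 4: level 1 backup; same snapshot file, new archive
tar --create --gzip --listed-incremental="$work/data.snar" --file=data.1.tgz data

# Step 5: remove the data source
rm -rf data

# Step 6: recover level 0 first, then level 1
tar --extract --listed-incremental=/dev/null --file=data.tgz
tar --extract --listed-incremental=/dev/null --file=data.1.tgz

# Collect the restored file names for inspection
restored=$(cd data && echo *)
echo "$restored"
```

After the last step the restored directory should hold file1 to file4 plus file6, with file5 removed.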
How tar incremental backup is performed
- tar uses a metadata file to store the information about the source directory
- You can specify this metadata file using --listed-incremental=<file>, where file is the metadata file
- This file is also referred to as the snapshot file
- The purpose of this file is to help determine which files have been changed, added or deleted since the last backup, so that the next incremental backup will contain only modified files.
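To make the role of the snapshot file concrete, here is a small sketch, assuming GNU tar. It runs the same tar command twice against one snapshot file and counts how often the data file ends up in each archive. The scratch directory, the file a.txt and the snapshot name data.snar are all hypothetical.

```shell
#!/bin/sh
# Sketch: the snapshot file decides what the next run archives (assumes GNU tar).
set -eu

work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"
mkdir data
echo "hello" > data/a.txt

snap="$work/data.snar"       # hypothetical snapshot file name

# First run: the snapshot file is missing, so tar creates it and archives everything.
tar --create --listed-incremental="$snap" --file=first.tar data

snapshot_created=no
if [ -s "$snap" ]; then
    snapshot_created=yes
fi

# Second run with the same snapshot file and no changes: a.txt is not archived again.
tar --create --listed-incremental="$snap" --file=second.tar data

# Count the a.txt members in each archive (grep -c exits non-zero on 0 matches).
first_count=$(tar --list --file=first.tar | grep -c 'a.txt' || true)
second_count=$(tar --list --file=second.tar | grep -c 'a.txt' || true)
echo "snapshot created: $snapshot_created"
echo "a.txt entries: first=$first_count second=$second_count"
```

The second archive still contains the directory entry itself, but no unchanged files.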
Let us learn about tar incremental backup with some examples:
Step 1: Create some data
I have a directory /tmp/data under which I have kept some files, as seen below:
[root@server1 tmp]# ls -l data/
total 60
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file1
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file2
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file3
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file4
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file5
Step 2: Take level 0 incremental backup
You may be wondering what a level 0 tar incremental backup is. Level 0 is the first, full copy of the source directory, whose state is recorded in our snapshot file.
If the snapshot file passed to --listed-incremental does not exist, the backup is treated as "Level 0". Otherwise, you can also request this explicitly using --level=0.
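The effect of --level=0 on an existing snapshot file can be sketched as follows. This assumes a GNU tar version that supports --level. The scratch directory, the file names and the helper count_files are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch: forcing a fresh level 0 with --level=0 (assumes GNU tar with --level support).
set -eu

work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"
mkdir data
echo "one" > data/file1
echo "two" > data/file2
snap="$work/data.snar"       # hypothetical snapshot file name

# count_files ARCHIVE: print how many members named data/file* ARCHIVE contains
count_files() {
    tar --list --file="$1" | grep -c '^data/file' || true
}

# Snapshot file missing: implicit level 0
tar --create --listed-incremental="$snap" --file=level0.tar data
# Snapshot file present, nothing changed: level 1 without any files
tar --create --listed-incremental="$snap" --file=level1.tar data
# Snapshot file present, but --level=0 requests a full backup again
tar --create --level=0 --listed-incremental="$snap" --file=full-again.tar data

level0_files=$(count_files level0.tar)
level1_files=$(count_files level1.tar)
full_again_files=$(count_files full-again.tar)
echo "level0=$level0_files level1=$level1_files full-again=$full_again_files"
```

This is handy when you want to start a new backup cycle without deleting the snapshot file by hand.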
In the below example:
- We will use /root/metadata/data.sngz as our snapshot file. It does not exist at the moment, so this will be our level 0 tar incremental backup
- We will take a backup of the /tmp/data directory
- The archive data.tgz will be stored under /tmp and is compressed using gzip
- We have added verbose twice to get more details of the activity on the screen
Other compression formats can also be used, such as bzip2 (--bzip2), xz (--xz) etc. You can check the man page of tar for the full list of supported options.
[root@server1 tmp]# tar --verbose --verbose --create --gzip --listed-incremental=/root/metadata/data.sngz --file=data.tgz data
tar: data: Directory is new
drwxr-xr-x root/root 0 2020-02-08 11:23 data/
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file1
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file2
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file3
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file4
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file5
Here,
-z | --gzip                      filter the archive through gzip
-g | --listed-incremental=FILE   handle new GNU-format incremental backup
-c | --create                    create a new archive
-f | --file=ARCHIVE              use archive file or device ARCHIVE
-v | --verbose                   verbosely list files processed
As you can see, the directory is reported as new and is recorded in the snapshot file along with the files from our source directory.
You can view the content of the incremental data in our archive using the below command:
[root@server1 tmp]# tar --list --incremental --verbose --verbose --file data.tgz
drwxr-xr-x root/root 36 2020-02-08 11:23 data/
Y file1
Y file2
Y file3
Y file4
Y file5
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file1
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file2
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file3
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file4
-rw-r--r-- root/root 10240 2020-02-08 11:23 data/file5
This command prints, for each directory in the archive, the list of files in that directory at the time the archive was created.
This information is printed in a format which is both human-readable and unambiguous for a program. Each file name is printed as "x file", where x is a letter describing the status of the file: 'Y' if the file is present in the archive, 'N' if the file is not included in the archive, or 'D' if the file is a directory (and is included in the archive).
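Because the status letters are easy to parse, a script can count how many files an incremental archive stores and how many it only references. The following is a rough sketch, assuming GNU tar and the "x file" line format described above. The scratch directory and the file names keep.txt and new.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: counting 'Y' and 'N' entries in an incremental listing (assumes GNU tar).
set -eu

work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"
mkdir data
echo "old" > data/keep.txt
snap="$work/data.snar"       # hypothetical snapshot file name

# Level 0, then add one file and take a level 1 backup
tar --create --listed-incremental="$snap" --file=level0.tar data
sleep 1                      # keep the new file's timestamp later than the level 0 run
echo "new" > data/new.txt
tar --create --listed-incremental="$snap" --file=level1.tar data

# List the level 1 archive including the per-directory status lines
listing=$(tar --list --incremental --verbose --verbose --file=level1.tar)

# Lines starting with "Y " are stored in the archive, "N " are listed but not stored
stored=$(printf '%s\n' "$listing" | grep -c '^Y ' || true)
skipped=$(printf '%s\n' "$listing" | grep -c '^N ' || true)
echo "stored=$stored skipped=$skipped"
```

Here new.txt should be counted as stored and keep.txt as skipped.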
Step 3: Make some data changes
Now, for the sake of demonstrating tar incremental backup, we will make some changes in our source directory:
[root@server1 tmp]# rm -f data/file5
[root@server1 tmp]# fallocate -l 10K data/file6
So we have deleted file5 and added a new file6 under /tmp/data
Step 4: Take level 1 incremental backup
Now it is time to take the next level incremental backup. As you see, I have used the same snapshot file /root/metadata/data.sngz, but I am creating a new archive "data.1.tgz":
[root@server1 tmp]# tar --verbose --verbose --create --gzip --listed-incremental=/root/metadata/data.sngz --file=data.1.tgz data
drwxr-xr-x root/root 0 2020-02-08 11:27 data/
-rw-r--r-- root/root 10240 2020-02-08 11:27 data/file6
This new archive will only contain the recent changes after creating the level 0 incremental backup.
[root@server1 tmp]# tar --list --incremental --verbose --verbose --file data.1.tgz
drwxr-xr-x root/root 36 2020-02-08 11:27 data/
N file1
N file2
N file3
N file4
Y file6
-rw-r--r-- root/root 10240 2020-02-08 11:27 data/file6
As you see, file5 is missing from the listing and file6 is added as "Y". The unchanged files are marked "N" because they are not stored in this archive.
Step 5: Remove the data source
Now to verify our tar incremental backup we will manually delete our source data
[root@server1 tmp]# rm -rf data
Step 6: Recover data using the tar incremental backup
Now we will try to recover our deleted data using the tar incremental backup. In the below example we extract the "level 0" data first, as we need the base of our source data:
[root@server1 tmp]# tar --extract --listed-incremental=/dev/null --file data.tgz
So now we have the source data with us:
[root@server1 tmp]# ls -l data/
total 60
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file1
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file2
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file3
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file4
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file5
Next we will extract our "level 1" tar incremental backup, which will apply the remaining changes to our source data directory:
[root@server1 tmp]# tar --extract --listed-incremental=/dev/null --file data.1.tgz
Verify the changes:
[root@server1 tmp]# ls -l data/
total 60
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file1
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file2
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file3
-rw-r--r--. 1 root root 10240 Feb 8 11:23 file4
-rw-r--r--. 1 root root 10240 Feb 8 11:27 file6
So, as expected, file5 is deleted and file6 is added from our level 1 tar incremental backup.
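The restore order matters: the level 0 archive has to be extracted before any later level. A restore loop could look like the sketch below, which assumes GNU tar and also covers the case where an existing file was modified between the two backups. The scratch directory, the snapshot name data.snar and the file contents are hypothetical.

```shell
#!/bin/sh
# Sketch: restoring a chain of incremental archives, oldest first (assumes GNU tar).
set -eu

work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"
mkdir data
echo "v1" > data/file1
echo "keep" > data/file2
snap="$work/data.snar"       # hypothetical snapshot file name

# Level 0 backup
tar --create --gzip --listed-incremental="$snap" --file=data.tgz data

# Modify one file and delete another, then take the level 1 backup
sleep 1                      # keep the change's timestamp later than the level 0 run
echo "v2" > data/file1
rm -f data/file2
tar --create --gzip --listed-incremental="$snap" --file=data.1.tgz data

# Simulate losing the source
rm -rf data

# Restore in the order the archives were created
for archive in data.tgz data.1.tgz; do
    tar --extract --listed-incremental=/dev/null --file="$archive"
done

file1_content=$(cat data/file1)
if [ -e data/file2 ]; then
    file2_state=present
else
    file2_state=removed
fi
echo "file1=$file1_content file2=$file2_state"
```

After the loop, file1 should carry its latest content and file2 should be gone, matching the state at the time of the level 1 backup.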
Lastly, I hope the steps from this article to perform tar incremental backup on Linux were helpful. So, let me know your suggestions and feedback using the comment section.
Hi! Thanks a lot for this tutorial, this helped a lot. I have a question though,
What happens if I make changes to one of the files (say file1)? Will it backup the changed file? Considering that we make the change after level 0 backup
Thank you
Hi Mohnish, yes any such changes will be captured in the new incremental backup