How to perform tar incremental backup with examples in Linux

In this article we will learn all about tar incremental backup with examples in Linux. In the IT industry, or in any industry for that matter, taking backups is a practice followed everywhere. We cannot survive without a backup.

Now imagine the amount of space consumed by storing such backup data. It is therefore a good idea to use incremental backups, which save space by backing up only the data that has changed.

In this article I will share some examples to perform tar incremental backup. You can integrate this solution with some script to automate the incremental backup.

tar is a well-known archiving tool which can be used for multiple purposes, but we will concentrate on its incremental backup functionality here.

I will split the tasks and verification into the following steps:

  • Create some data
  • Take level 0 incremental backup
  • Make some data changes
  • Take level 1 incremental backup
  • Remove the data source
  • Recover the data from the tar incremental backup

 

How tar incremental backup is performed

  • tar uses a metadata file to store the information about the source directory
  • You specify this metadata file using --listed-incremental=<file>, where <file> is the metadata file
  • This file is also referred to as the snapshot file
  • The purpose of this file is to help determine which files have been changed, added or deleted since the last backup, so that the next incremental backup will contain only modified files.
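
To see the snapshot file appear, here is a minimal sketch that works in a throwaway directory. The names src, demo.snar and demo.tgz are placeholders I picked for illustration and are not part of the setup used in the rest of this article. On the GNU tar versions I am aware of, the first line of the snapshot file names the tar version and the snapshot format, but treat that detail as an assumption about your tar build.

```shell
#!/bin/sh
set -e

# Scratch directory so nothing outside it is touched
workdir=$(mktemp -d)
cd "$workdir"
mkdir src
echo "hello" > src/a.txt

# demo.snar does not exist yet, so tar creates it during this run
tar --create --gzip --listed-incremental=demo.snar --file=demo.tgz src

# The snapshot is a small metadata file kept separately from the archive
ls -l demo.snar demo.tgz
head -n 1 demo.snar
```

The archive holds the file contents, while the snapshot file only holds the bookkeeping tar needs for the next run, so both must be kept if you want to continue the incremental chain.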

Let us learn about tar incremental backup with some examples:

 

Step 1: Create some data

I have a directory /tmp/data under which I have kept some files, as seen below:

[root@server1 tmp]# ls -l data/
total 60
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file1
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file2
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file3
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file4
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file5
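
The article does not show how these files were created, so the following is only one possible way to produce similar test data. It is a sketch that assumes a scratch directory from mktemp instead of /tmp, so it can be run without touching real data. fallocate -l 10K, which is used later in this article, would be an alternative to head on filesystems that support it.

```shell
#!/bin/sh
set -e

# Use a scratch directory rather than /tmp/data (assumption for safe testing)
workdir=$(mktemp -d)
cd "$workdir"
mkdir data

for i in 1 2 3 4 5; do
    # 10240 bytes of zeros, the same size as the files in the listing above
    head -c 10240 /dev/zero > "data/file$i"
done

ls -l data/
```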

 

Step 2: Take level 0 incremental backup

I am sure you must be wondering about level 0 tar incremental backup. Level 0 is the first, full copy of the source directory. The archive holds the data, while the state of the directory is recorded in our snapshot file.
By default, if the snapshot file used with --listed-incremental does not exist, the backup is treated as "Level 0". Otherwise you can also request a level 0 backup explicitly using --level=0.
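
To illustrate the difference, here is a small sketch run in a scratch directory with placeholder names (data.snar, level0.tgz, level1.tgz and full.tgz are my own choices). It assumes a GNU tar release that supports --level=0. The second run reuses an existing snapshot, so no file contents are stored; the third run adds --level=0, which makes tar truncate the snapshot and take a full backup again.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"
mkdir data
echo one > data/file1
echo two > data/file2

# Snapshot file does not exist yet: this is a level 0 backup
tar --create --gzip --listed-incremental=data.snar --file=level0.tgz data

# Snapshot exists and nothing changed: only the directory entry is stored
tar --create --gzip --listed-incremental=data.snar --file=level1.tgz data

# --level=0 truncates the existing snapshot and forces a full backup
tar --create --gzip --level=0 --listed-incremental=data.snar --file=full.tgz data

echo "level1.tgz:"; tar --list --gzip --file=level1.tgz
echo "full.tgz:";   tar --list --gzip --file=full.tgz
```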

In the below example:

  • We will use /root/metadata/data.sngz as our snapshot file. It does not exist at the moment, so this will be our level 0 tar incremental backup. The directory /root/metadata itself must already exist, as tar will not create it
  • We will take a backup of the /tmp/data directory
  • The archive data.tgz will be stored under /tmp and is compressed using gzip
  • We have added --verbose twice to get more details of the activity on the screen

You can check the man page of tar for the full list of supported options.

[root@server1 tmp]# tar --verbose --verbose --create --gzip --listed-incremental=/root/metadata/data.sngz --file=data.tgz data
tar: data: Directory is new
drwxr-xr-x root/root         0 2020-02-08 11:23 data/
-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file1
-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file2
-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file3
-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file4
-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file5

Here,

-z| --gzip                         filter the archive through gzip
-g| --listed-incremental=FILE      handle new GNU-format incremental backup
-c| --create                       create a new archive
-f| --file=ARCHIVE                 use archive file or device ARCHIVE
-v| --verbose                      verbosely list files processed

As you can see, tar reports the data directory as new, records it in the snapshot file and archives all the files from our source directory.

You can view the content of the incremental data in our archive using the below command:
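
The option table above also gives the short form of every option, so the same backup can be written more compactly. The sketch below runs both forms in a scratch directory; the file names long.snar, long.tgz, short.snar and short.tgz are placeholders of mine so that the two runs do not share a snapshot file.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"
mkdir data
echo sample > data/file1

# Long options, as used in this article
tar --verbose --verbose --create --gzip \
    --listed-incremental=long.snar --file=long.tgz data

# Short options: -g takes the snapshot file, -f takes the archive name
tar -c -z -v -v -g short.snar -f short.tgz data
```

Both commands should produce archives with the same members. I prefer the long form in scripts because it is easier to read months later.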

[root@server1 tmp]# tar --list --incremental --verbose --verbose --file data.tgz
drwxr-xr-x root/root        36 2020-02-08 11:23 data/
Y file1
Y file2
Y file3
Y file4
Y file5

-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file1
-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file2
-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file3
-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file4
-rw-r--r-- root/root     10240 2020-02-08 11:23 data/file5

This command will print, for each directory in the archive, the list of files in that directory at the time the archive was created.
This information is put out in a format which is both human-readable and unambiguous for a program: each file name is printed as

x file

where x is a letter describing the status of the file: 'Y' if the file is present in the archive, 'N' if the file is not included in the archive, or 'D' if the file is a directory (and is included in the archive).
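
Because each status line starts with the letter followed by a space, the listing is easy to filter. The following sketch, using placeholder names in a scratch directory, keeps only the "Y" entries, that is, the files whose contents are stored in the archive. It relies on the listing format shown above, so treat the grep pattern as an assumption if your tar version prints the listing differently.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"
mkdir data
for name in file1 file2 file3; do
    echo "$name" > "data/$name"
done

tar --create --gzip --listed-incremental=data.snar --file=data.tgz data

# Keep only the lines marked "Y": files whose contents are in this archive
stored=$(tar --list --incremental --verbose --verbose --file data.tgz | grep '^Y ')
echo "$stored"
```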

 

Step 3: Make some data changes

Now for the sake of demonstrating tar incremental backup we will make some change in our source directory

[root@server1 tmp]# rm -f data/file5
[root@server1 tmp]# fallocate -l 10K data/file6

So we have deleted file5 and added a new file6 under /tmp/data.

 

Step 4: Take level 1 incremental backup

Now it is time to take the next level incremental backup. As you can see, I have used the same snapshot file /root/metadata/data.sngz but I am creating a new archive, data.1.tgz:

[root@server1 tmp]# tar --verbose --verbose --create --gzip --listed-incremental=/root/metadata/data.sngz --file=data.1.tgz data
drwxr-xr-x root/root         0 2020-02-08 11:27 data/
-rw-r--r-- root/root     10240 2020-02-08 11:27 data/file6

This new archive will only contain the changes made after the level 0 incremental backup was created:

[root@server1 tmp]# tar --list --incremental --verbose --verbose --file data.1.tgz
drwxr-xr-x root/root        36 2020-02-08 11:27 data/
N file1
N file2
N file3
N file4
Y file6

-rw-r--r-- root/root     10240 2020-02-08 11:27 data/file6

As you can see, file5 is no longer listed, file6 is marked "Y" because it is stored in this archive, and the unchanged files are marked "N".
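
A file that is modified after the level 0 backup is picked up in the same way as a new file. This case is not part of the walkthrough above, so the following is a separate minimal sketch in a scratch directory with placeholder names. The sleep is my own precaution so that the modification time is clearly later than the level 0 run.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"
mkdir data
echo "version 1" > data/file1
echo "untouched" > data/file2

# Level 0 backup: creates the snapshot file
tar --create --gzip --listed-incremental=data.snar --file=level0.tgz data

sleep 1                          # make sure the new mtime is later than the level 0 run
echo "version 2" > data/file1    # modify an existing file

# Level 1 backup: same snapshot file, new archive
tar --create --gzip --listed-incremental=data.snar --file=level1.tgz data

# The level 1 archive should hold the directory entry and the modified file only
tar --list --gzip --file=level1.tgz
```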

 

Step 5: Remove the data source

Now to verify our tar incremental backup we will manually delete our source data

[root@server1 tmp]# rm -rf data

 

Step 6: Recover data using the tar incremental backup

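Now we will try to recover our deleted data using the tar incremental backup. In the below example we extract the "level 0" archive first, as it provides the base of our source data. tar does not use the snapshot file during extraction, so we pass /dev/null to --listed-incremental: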

[root@server1 tmp]# tar --extract  --listed-incremental=/dev/null --file data.tgz

So now we have the source data with us:

[root@server1 tmp]# ls -l data/
total 60
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file1
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file2
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file3
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file4
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file5

Next we will extract our "level 1" tar incremental backup which will make the other changes in our source data directory:

[root@server1 tmp]# tar --extract  --listed-incremental=/dev/null --file data.1.tgz

Verify the changes:

[root@server1 tmp]# ls -l data/
total 60
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file1
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file2
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file3
-rw-r--r--. 1 root root 10240 Feb  8 11:23 file4
-rw-r--r--. 1 root root 10240 Feb  8 11:27 file6

So as expected file5 is deleted and file6 is added from our level 1 tar incremental backup.
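
The whole cycle can also be rehearsed safely in a scratch directory before trusting it with real data. The sketch below is not the exact setup from this article: it uses small text files and relative paths of my own choosing, and a sleep as a precaution so the changes are clearly newer than the level 0 run. The important part is the loop, which restores the archives oldest first.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"
mkdir data
for i in 1 2 3 4 5; do
    echo "content $i" > "data/file$i"
done

# Level 0, then some changes, then level 1 (same snapshot file throughout)
tar --create --gzip --listed-incremental=data.snar --file=data.tgz data
sleep 1
rm -f data/file5
echo "content 6" > data/file6
tar --create --gzip --listed-incremental=data.snar --file=data.1.tgz data

# Simulate losing the source, then restore the archives in order
rm -rf data
for archive in data.tgz data.1.tgz; do
    tar --extract --listed-incremental=/dev/null --file "$archive"
done

ls data/
```

Restoring in any other order would not reproduce the final state, because each incremental archive only describes the changes relative to the one before it.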

 

Lastly, I hope the steps in this article to perform tar incremental backup on Linux were helpful. Let me know your suggestions and feedback using the comment section.

