The tar(1) command is commonly used by administrators and users to back up their system and/or personal files. Besides archiving a multitude of files and folders into a single file, tar can also create a gzip'ed, and thus compressed, archive. This is done via the following command:
tar -czf archive.tar.gz folder/
So let's create a few directories and take a look at how this is usually done:
[om@oliver ~]$ mkdir -p tarproject/folder
[om@oliver ~]$ cd tarproject/folder/
[om@oliver folder]$ echo "hello" > file_a.txt
[om@oliver folder]$ echo "world" > file_b.txt
[om@oliver folder]$ echo '!' > file_c.txt
[om@oliver folder]$ ll
total 20
drwxrwxr-x. 2 om om 4096 Jan 7 14:52 ./
drwxrwxr-x. 3 om om 4096 Jan 7 14:51 ../
-rw-rw-r--. 1 om om 6 Jan 7 14:51 file_a.txt
-rw-rw-r--. 1 om om 6 Jan 7 14:51 file_b.txt
-rw-rw-r--. 1 om om 2 Jan 7 14:52 file_c.txt
Now we can create our archive according to the above command:
[om@oliver folder]$ cd ..
[om@oliver tarproject]$ tar -czf archive_outside.tar.gz folder
[om@oliver tarproject]$ echo $?
0
[om@oliver tarproject]$ ll
total 16
drwxrwxr-x. 3 om om 4096 Jan 7 15:39 ./
drwxr-xr-x. 64 om om 4096 Jan 7 14:51 ../
-rw-rw-r--. 1 om om 210 Jan 7 15:39 archive_outside.tar.gz
drwxrwxr-x. 2 om om 4096 Jan 7 15:38 folder/
As can be seen, the command's exit status is zero, therefore indicating success.
To extract the archive, we would first cd into the directory where we want the archived files to be placed and then run:
tar -xvf archive.tar.gz
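Before extracting, it can be handy to look at what an archive contains, and to extract into a chosen directory without cd'ing there first. The following is a minimal sketch of both steps, assuming a tar that auto-detects nothing and is therefore given -z explicitly; the temporary directory and the name "restore" are made up for illustration.

```shell
#!/bin/sh
# Sketch: list an archive, then extract it into a target directory with -C.
# All paths live in a throwaway directory and are purely illustrative.
work=$(mktemp -d)
mkdir -p "$work/folder" "$work/restore"
echo "hello" > "$work/folder/file_a.txt"

# Create the archive from the parent directory, as in the text above.
(cd "$work" && tar -czf archive.tar.gz folder)

# -t lists the members without extracting anything.
listing=$(tar -tzf "$work/archive.tar.gz")
echo "$listing"

# -C switches to the target directory before extracting.
tar -xzf "$work/archive.tar.gz" -C "$work/restore"
restored=$(cat "$work/restore/folder/file_a.txt")
echo "$restored"
```

The -C switch is the same one we use further down when unpacking the test archives into separate directories.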
So much for the standard usage of the tar command. One thing to note here: before creating the tar archive, we moved into the corresponding parent directory. Therefore, if "folder" is a subfolder of our home directory ${HOME}, we can easily create the archive by cd'ing to ~/ and running tar as indicated above. However, things get a little more interesting when we execute the command inside the directory we'd like to archive, thus creating a new file - the tar.gz archive - in the very directory that is being processed. Let's examine this situation in the following passages.
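As a side note, the cd to the parent directory can also be expressed on the tar command line itself via -C. The sketch below assumes that "$base" stands in for the parent directory (for example ${HOME}) and that the archive is written to a separate directory "$out"; both names are illustrative only.

```shell
#!/bin/sh
# Sketch: archive a directory without cd'ing to its parent first.
# "$base" stands in for the parent directory, "$out" for the archive location.
base=$(mktemp -d)
out=$(mktemp -d)
mkdir -p "$base/folder"
echo "world" > "$base/folder/file_b.txt"

# -C makes tar change to "$base" before it looks up "folder",
# so member names are stored relative to "$base".
tar -czf "$out/archive.tar.gz" -C "$base" folder
status=$?
echo "tar exit status: $status"

members=$(tar -tzf "$out/archive.tar.gz")
echo "$members"
```

Since the archive is written outside of the directory being read, this variant behaves like the "outside" case described above.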
Let's step into the folder where our files sit, create the archive again and see what happens...
[om@oliver tarproject]$ cd folder/
[om@oliver folder]$ ll
total 20
drwxrwxr-x. 2 om om 4096 Jan 7 14:52 ./
drwxrwxr-x. 3 om om 4096 Jan 7 15:34 ../
-rw-rw-r--. 1 om om 6 Jan 7 14:51 file_a.txt
-rw-rw-r--. 1 om om 6 Jan 7 14:51 file_b.txt
-rw-rw-r--. 1 om om 2 Jan 7 14:52 file_c.txt
[om@oliver folder]$ tar -czf archive_inside.tar.gz ./
tar: .: file changed as we read it
[om@oliver folder]$ echo $?
1
[om@oliver folder]$ ll archive_inside.tar.gz
-rw-rw-r--. 1 om om 204 Jan 7 15:38 archive_inside.tar.gz
While tar is running, we get the following message: tar: .: file changed as we read it.
Moreover, when checking the command's exit value via echo $?, we see a value of 1 instead of 0, so tar is not entirely happy with the course of events.
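According to the GNU tar documentation, an exit status of 0 means success, 1 means that some files differ (when creating an archive: some files changed while being archived), and 2 indicates a fatal error. A script could make that distinction explicit along the following lines. This is only a sketch: describe_tar_status is a hypothetical helper name, and the mapping assumes GNU tar.

```shell
#!/bin/sh
# Sketch: interpret tar's exit status following the GNU tar documentation.
# describe_tar_status is a hypothetical helper, not part of tar.
describe_tar_status() {
    case "$1" in
        0) echo "success" ;;
        1) echo "some files changed while being archived" ;;
        2) echo "fatal error" ;;
        *) echo "unexpected status $1" ;;
    esac
}

work=$(mktemp -d)
mkdir -p "$work/folder"
echo "hello" > "$work/folder/file_a.txt"

# Archiving from outside the directory: expected to succeed.
tar -czf "$work/ok.tar.gz" -C "$work" folder
ok_status=$?
echo "outside: $(describe_tar_status "$ok_status")"

# Archiving a path that does not exist: GNU tar treats this as a fatal error.
tar -czf "$work/bad.tar.gz" -C "$work" no_such_folder 2>/dev/null
bad_status=$?
echo "missing: $(describe_tar_status "$bad_status")"
```

In a backup script, a status of 1 could then be logged as a warning, while a status of 2 aborts the run.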
It is worthwhile to check the integrity of our archive via gzip --test. While we are at it, let's do the same for our original - outside - archive.
[om@oliver folder]$ gzip --test archive_inside.tar.gz
[om@oliver folder]$ echo $?
0
[om@oliver folder]$ gzip --test ../archive_outside.tar.gz
[om@oliver folder]$ echo $?
0
[om@oliver folder]$
archive_outside.tar.gz is ok as expected. At the same time, the integrity of archive_inside.tar.gz seems to be ok too, even though tar complained while creating the archive. Let's get back to this a little later.
Taking a look at:
[om@oliver folder]$ man tar
We can see that tar has a nice switch called --ignore-failed-read, which may alleviate the problem. Let's duplicate our backup directory, as our current one now holds our archive, and try the --ignore-failed-read switch.
[om@oliver folder]$ cd ..
[om@oliver tarproject]$ mkdir folder2
[om@oliver tarproject]$ cp -v folder/*.txt folder2
`folder/file_a.txt' -> `folder2/file_a.txt'
`folder/file_b.txt' -> `folder2/file_b.txt'
`folder/file_c.txt' -> `folder2/file_c.txt'
[om@oliver tarproject]$ ll
total 20
drwxrwxr-x. 4 om om 4096 Jan 7 16:32 ./
drwxr-xr-x. 64 om om 4096 Jan 7 14:51 ../
-rw-rw-r--. 1 om om 210 Jan 7 16:05 archive_outside.tar.gz
drwxrwxr-x. 2 om om 4096 Jan 7 16:06 folder/
drwxrwxr-x. 2 om om 4096 Jan 7 16:32 folder2/
[om@oliver tarproject]$ cd folder2/
[om@oliver folder2]$ ll
total 20
drwxrwxr-x. 2 om om 4096 Jan 7 16:32 ./
drwxrwxr-x. 4 om om 4096 Jan 7 16:32 ../
-rw-rw-r--. 1 om om 6 Jan 7 16:32 file_a.txt
-rw-rw-r--. 1 om om 6 Jan 7 16:32 file_b.txt
-rw-rw-r--. 1 om om 2 Jan 7 16:32 file_c.txt
[om@oliver folder2]$ tar --ignore-failed-read -czf archive_inside_ignore.tar.gz ./
tar: .: file changed as we read it
[om@oliver folder2]$ echo $?
1
[om@oliver folder2]$ ll
total 24
drwxrwxr-x. 2 om om 4096 Jan 7 16:38 ./
drwxrwxr-x. 4 om om 4096 Jan 7 16:32 ../
-rw-rw-r--. 1 om om 189 Jan 7 16:38 archive_inside_ignore.tar.gz
-rw-rw-r--. 1 om om 6 Jan 7 16:32 file_a.txt
-rw-rw-r--. 1 om om 6 Jan 7 16:32 file_b.txt
-rw-rw-r--. 1 om om 2 Jan 7 16:32 file_c.txt
[om@oliver folder2]$ gzip --test archive_inside_ignore.tar.gz
[om@oliver folder2]$ echo $?
0
[om@oliver folder2]$
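It appears that not a whole lot has changed. We still get a nonzero exit value, yet testing the archive via gzip --test reports no problem. This is the same result as if we hadn't used the switch. That is not too surprising: according to the man page, --ignore-failed-read concerns unreadable files, whereas our directory was readable but changed while tar was reading it.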
Taking yet another look at:
[om@oliver folder2]$ man tar
We find that there is an --exclude switch that we can use to exclude files from being backed up. We'll duplicate the original set of files yet again and try this:
[om@oliver folder2]$ cd ..
[om@oliver tarproject]$ mkdir folder3
[om@oliver tarproject]$ cp -v folder/*.txt folder3
`folder/file_a.txt' -> `folder3/file_a.txt'
`folder/file_b.txt' -> `folder3/file_b.txt'
`folder/file_c.txt' -> `folder3/file_c.txt'
[om@oliver tarproject]$ cd folder3
[om@oliver folder3]$ touch archive_inside_exclude.tar.gz
[om@oliver folder3]$ ll
total 20
drwxrwxr-x. 2 om om 4096 Jan 7 17:08 ./
drwxrwxr-x. 5 om om 4096 Jan 7 17:03 ../
-rw-rw-r--. 1 om om 0 Jan 7 17:08 archive_inside_exclude.tar.gz
-rw-rw-r--. 1 om om 6 Jan 7 17:03 file_a.txt
-rw-rw-r--. 1 om om 6 Jan 7 17:03 file_b.txt
-rw-rw-r--. 1 om om 2 Jan 7 17:03 file_c.txt
[om@oliver folder3]$ tar --exclude="archive_inside_exclude.tar.gz" -czf archive_inside_exclude.tar.gz ./
[om@oliver folder3]$ echo $?
0
[om@oliver folder3]$ ll
total 24
drwxrwxr-x. 2 om om 4096 Jan 7 17:08 ./
drwxrwxr-x. 5 om om 4096 Jan 7 17:03 ../
-rw-rw-r--. 1 om om 193 Jan 7 17:08 archive_inside_exclude.tar.gz
-rw-rw-r--. 1 om om 6 Jan 7 17:03 file_a.txt
-rw-rw-r--. 1 om om 6 Jan 7 17:03 file_b.txt
-rw-rw-r--. 1 om om 2 Jan 7 17:03 file_c.txt
[om@oliver folder3]$ gzip --test archive_inside_exclude.tar.gz
[om@oliver folder3]$ echo $?
0
[om@oliver folder3]$
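This appears to solve our problem. tar exits with a return value of 0, and the test via gzip is satisfying as well. Note that we created the empty archive file via touch before running tar, so presumably the directory itself is no longer modified while tar is reading it, and --exclude keeps the archive file out of the archive.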
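The two steps, touch followed by tar --exclude, can be wrapped into a small function. The following is a sketch under the assumptions of this tutorial (GNU tar, archive placed in the directory being archived); archive_in_place is a hypothetical name, and the archive name is passed to --exclude as a pattern, so names containing wildcard characters would need extra care.

```shell
#!/bin/sh
# Sketch: create an archive inside the directory being archived.
# archive_in_place is a hypothetical helper name.
archive_in_place() {
    dir=$1
    name=$2
    (
        cd "$dir" || exit 2
        # Pre-create the file so the directory is not modified
        # while tar is reading it.
        touch "$name"
        # Keep the archive itself out of the archive.
        tar --exclude="$name" -czf "$name" ./
    )
}

work=$(mktemp -d)
echo "hello" > "$work/file_a.txt"
echo "world" > "$work/file_b.txt"

archive_in_place "$work" backup.tar.gz
status=$?
echo "tar exit status: $status"

members=$(tar -tzf "$work/backup.tar.gz")
echo "$members"
```

The subshell keeps the cd local to the function, so the caller's working directory is left untouched.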
To test all the "inside" archives created, let's untar them in another set of directories:
[om@oliver folder3]$ cd ..
[om@oliver tarproject]$ mkdir untar
[om@oliver tarproject]$ mkdir untar/outside untar/inside untar/inside_ignore untar/inside_exclude
[om@oliver tarproject]$ ls
./ ../ archive_outside.tar.gz folder/ folder2/ folder3/ untar/
[om@oliver tarproject]$ ls untar/
./ ../ inside/ inside_exclude/ inside_ignore/ outside/
[om@oliver tarproject]$ tar -xf archive_outside.tar.gz -C untar/outside/
[om@oliver tarproject]$ tar -xf folder/archive_inside.tar.gz -C untar/inside/
[om@oliver tarproject]$ tar -xf folder2/archive_inside_ignore.tar.gz -C untar/inside_ignore/
[om@oliver tarproject]$ tar -xf folder3/archive_inside_exclude.tar.gz -C untar/inside_exclude/
Next, we run the mapdir utility, which we created in another tutorial, on each of the directories, followed by diff on the resulting files, to see whether there are any differences and, if so, which. Our reference here is the archive that we created first, archive_outside.tar.gz.
[om@oliver tarproject]$ mapdir -f untar/outside/folder/
Mapping structure of: /home/om/tarproject/untar/outside/folder
file_a.txt - regular file - Size: 6 bytes - MD5: b1946ac92492d2347c6235b4d2611184
file_b.txt - regular file - Size: 6 bytes - MD5: 591785b794601e212b260e25925636fd
file_c.txt - regular file - Size: 2 bytes - MD5: 8183aa57a23658efe7ba7aebe60816bc
########## Statistics for /home/om/tarproject/untar/outside/folder ##########
Number of directories: 1
Number of regular files: 3
#############################################################################
Done!
[om@oliver tarproject]$ mapdir -f untar/inside
Mapping structure of: /home/om/tarproject/untar/inside
file_a.txt - regular file - Size: 6 bytes - MD5: b1946ac92492d2347c6235b4d2611184
file_b.txt - regular file - Size: 6 bytes - MD5: 591785b794601e212b260e25925636fd
file_c.txt - regular file - Size: 2 bytes - MD5: 8183aa57a23658efe7ba7aebe60816bc
########## Statistics for /home/om/tarproject/untar/inside ##########
Number of directories: 1
Number of regular files: 3
#####################################################################
Done!
[om@oliver tarproject]$ mapdir -f untar/inside_ignore/
Mapping structure of: /home/om/tarproject/untar/inside_ignore
file_a.txt - regular file - Size: 6 bytes - MD5: b1946ac92492d2347c6235b4d2611184
file_b.txt - regular file - Size: 6 bytes - MD5: 591785b794601e212b260e25925636fd
file_c.txt - regular file - Size: 2 bytes - MD5: 8183aa57a23658efe7ba7aebe60816bc
########## Statistics for /home/om/tarproject/untar/inside_ignore ##########
Number of directories: 1
Number of regular files: 3
############################################################################
Done!
[om@oliver tarproject]$ mapdir -f untar/inside_exclude/
Mapping structure of: /home/om/tarproject/untar/inside_exclude
file_a.txt - regular file - Size: 6 bytes - MD5: b1946ac92492d2347c6235b4d2611184
file_b.txt - regular file - Size: 6 bytes - MD5: 591785b794601e212b260e25925636fd
file_c.txt - regular file - Size: 2 bytes - MD5: 8183aa57a23658efe7ba7aebe60816bc
########## Statistics for /home/om/tarproject/untar/inside_exclude ##########
Number of directories: 1
Number of regular files: 3
#############################################################################
Done!
[om@oliver tarproject]$ cd
[om@oliver ~]$ ll mapdir_home_om_tarproject_untar_*
-rw-rw-r--. 1 om om 371 Jan 7 19:36 mapdir_home_om_tarproject_untar_inside_01072017.txt
-rw-rw-r--. 1 om om 371 Jan 7 19:36 mapdir_home_om_tarproject_untar_inside_exclude_01072017.txt
-rw-rw-r--. 1 om om 371 Jan 7 19:36 mapdir_home_om_tarproject_untar_inside_ignore_01072017.txt
-rw-rw-r--. 1 om om 371 Jan 7 19:36 mapdir_home_om_tarproject_untar_outside_folder_01072017.txt
[om@oliver ~]$ diff mapdir_home_om_tarproject_untar_outside_folder_01072017.txt mapdir_home_om_tarproject_untar_inside_01072017.txt
[om@oliver ~]$ echo $?
0
[om@oliver ~]$ diff mapdir_home_om_tarproject_untar_outside_folder_01072017.txt mapdir_home_om_tarproject_untar_inside_exclude_01072017.txt
[om@oliver ~]$ echo $?
0
[om@oliver ~]$ diff mapdir_home_om_tarproject_untar_outside_folder_01072017.txt mapdir_home_om_tarproject_untar_inside_ignore_01072017.txt
[om@oliver ~]$ echo $?
0
[om@oliver ~]$
As can be seen, there are no differences between the archives, even though tar complained while creating two of them. To avoid the problem altogether when running tar inside a folder that is being backed up, we can simply make use of the --exclude feature introduced above.
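Readers who do not have the mapdir utility at hand could compare the extracted trees directly with diff -r. The sketch below recreates a reduced version of the experiment in a temporary directory; the directory names are illustrative, and the "inside" archive is written outside of the source directory to keep the example free of warnings.

```shell
#!/bin/sh
# Sketch: compare two extracted trees without the mapdir utility.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/outside" "$work/inside"
echo "hello" > "$work/src/file_a.txt"
echo '!' > "$work/src/file_c.txt"

# Reference archive, created from the parent directory.
tar -czf "$work/outside.tar.gz" -C "$work" src
# Second archive, created from inside the directory but written elsewhere.
(cd "$work/src" && tar -czf "$work/inside.tar.gz" ./)

tar -xzf "$work/outside.tar.gz" -C "$work/outside"
tar -xzf "$work/inside.tar.gz" -C "$work/inside"

# diff -r exits with 0 when both trees have identical content.
diff -r "$work/outside/src" "$work/inside"
diff_status=$?
echo "diff exit status: $diff_status"
```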
CONCLUSIONS
In an earlier tutorial, we developed a few scripts through which users can back up their home folder. The idea rested on the assumption that the backups are placed in another part of the filesystem: /usr/local/backups/user. If you are not the only user, this may be inconvenient in some situations, such as on a shared server where you have not been granted access to other parts of the filesystem. Furthermore, you may (or may not) find that having your backups in your home folder is a better way of organizing your personal files, but that may be a matter of personal taste. The backup scripts can easily be amended to meet the requirements discussed.
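One possible way of amending such a script is to keep the archives in a subdirectory of the home folder and to exclude that subdirectory as a whole. The following is only a sketch and not the script from the earlier tutorial: backup_home, the "backups" directory name and the date-stamped file name are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: keep backups in a subdirectory of the directory being backed up.
# backup_home and the "backups" directory name are illustrative assumptions.
backup_home() {
    home_dir=$1
    stamp=$(date +%m%d%Y)
    mkdir -p "$home_dir/backups"
    # Exclude the backup directory so earlier archives and the one
    # being written are not packed into the new archive.
    tar --exclude="./backups" -czf "$home_dir/backups/home_$stamp.tar.gz" \
        -C "$home_dir" ./
}

# Try it on a throwaway directory instead of a real home folder.
fake_home=$(mktemp -d)
echo "hello" > "$fake_home/notes.txt"

backup_home "$fake_home"
status=$?
echo "tar exit status: $status"

archive=$(ls "$fake_home/backups"/home_*.tar.gz)
members=$(tar -tzf "$archive")
echo "$members"
```

On a real home folder, other programs may modify files while tar is running, so a status of 1 can still occur for reasons unrelated to the archive itself.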