tar
is an archiving utility; it creates and extracts (compressed) archives, aka tarballs.
tar -czpf foo.tar.gz sourceFiles file1 file2 # creates a compressed archive
tar -xpf foo.tar.gz                          # extracts the archive
tar -xpf foo.tar.gz -C dest/                 # extracts the archive into the dest/ directory
c or --create
    creates the archive
x or --extract
    extracts the archive
z or --gzip/--gunzip
    compresses or uncompresses the archive with gzip
p or --preserve-permissions
    preserves file and directory permissions
f
    provide the file name (foo.tar.gz in the above example)
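The short flags above are just abbreviations of the long options. Spelled out, the first example reads as follows (file names are throwaway examples created on the spot):

```shell
# set up two throwaway files to archive
touch file1 file2

# long-option equivalent of `tar -czpf foo.tar.gz file1 file2`
tar --create --gzip --preserve-permissions --file foo.tar.gz file1 file2

# confirm the archive contains both files
tar --list --file foo.tar.gz
```

The long forms are easier to read in scripts that others will maintain.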
Compress your backups for faster transfers, less bandwidth usage, and less disk space usage (you will get charged for the disk space and bandwidth if you're transferring backups off-site, to a service like Amazon S3). Since backups are usually automated, you can skip -v, the verbosity flag.
You can optionally preserve and restore file ownerships as well, with the --same-owner and --preserve flags. From tar --help:
     --no-same-permissions apply the user's umask when extracting permissions
                           from the archive (default for ordinary users)
     --numeric-owner       always use numbers for user/group names
     --owner=NAME          force NAME as owner for added files
 -p, --preserve-permissions, --same-permissions
                           extract information about file permissions
                           (default for superuser)
     --preserve            same as both -p and -s
     --same-owner          try extracting files with the same ownership as
                           exists in the archive (default for superuser)
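As a quick sketch (the archive and directory names here are made up): an ordinary user extracts with --no-same-owner behavior by default, while root would use --same-owner to restore the owners recorded in the archive:

```shell
# build a throwaway archive to restore from
mkdir -p src && touch src/data.txt
tar -czpf backup.tar.gz src/

# ordinary user: extracted files end up owned by you
mkdir -p dest
tar -xpf backup.tar.gz --no-same-owner -C dest/

# root would keep the owners recorded in the archive (the superuser default):
# sudo tar -xpf backup.tar.gz --same-owner -C dest/
```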
You can timestamp archive names by embedding a date command (the % signs need to be escaped as \% if the command runs from cron):

tar -czpf foo.`/bin/date +\%Y\%m\%d`.tar.gz sourceFiles
bzip2 gives the best compression ratio, but is very CPU and RAM intensive. gzip has a decent compression ratio and decent resource usage.

You might want to list the contents of a tarball for different reasons. Let's say you want to find out what date the files inside a tarball were backed up/created:
tar -tf foo.tar.gz       # list the files in the tar archive
tar -tvf foo.tar         # list all files in foo.tar verbosely (permissions, ownerships, file size, time)
tar --list -f foo.tar.gz # -t and --list are the same thing (equivalent of `tar -tf foo.tar.gz`)
# tar -tf foo.tar.gz
foo/
foo/file2.txt
foo/file3.txt
foo/file9.txt
foo/file4.txt
foo/file1.txt
foo/file5.txt
foo/file8.txt
foo/file7.txt
foo/file6.txt
# tar -tvf foo.tar.gz
drwxr-xr-x root/root 0 2017-08-17 06:48 foo/
-rw-r--r-- root/root 0 2017-08-17 06:45 foo/file2.txt
-rw-r--r-- root/root 0 2017-08-17 06:45 foo/file3.txt
-rw-r--r-- root/root 0 2017-08-17 06:45 foo/file9.txt
-rw-r--r-- root/root 0 2017-08-17 06:45 foo/file4.txt
-rw-r--r-- root/root 0 2017-08-17 06:48 foo/file1.txt
-rw-r--r-- root/root 0 2017-08-17 06:45 foo/file5.txt
-rw-r--r-- root/root 0 2017-08-17 06:45 foo/file8.txt
-rw-r--r-- root/root 0 2017-08-17 06:45 foo/file7.txt
-rw-r--r-- root/root 0 2017-08-17 06:45 foo/file6.txt
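The listing also gives you the exact member paths, so once you've found what you need you can extract a single file instead of the whole archive. A small sketch (the archive is rebuilt here so the example is self-contained, with names matching the listing above):

```shell
# throwaway archive matching the example listing
mkdir -p foo && touch foo/file1.txt foo/file2.txt
tar -czpf foo.tar.gz foo/

# filter the verbose listing to find the entry you want
tar -tvf foo.tar.gz | grep file1.txt

# extract only that member, leaving the rest of the archive alone
tar -xpf foo.tar.gz foo/file1.txt
```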
Here's a script that I have used on one of my sites. It creates a file backup of a website in /var/www and saves it in a backups directory on the server. It also deletes backups older than 5 days, and can optionally sync backups to S3.
#!/bin/bash

DIR='/backups'
TIMESTAMP=`date +%Y%b%d`
YEAR=`date +%Y`

# Create & Compress
echo "Backing up: foo.com"
tar -czpf ${DIR}/${TIMESTAMP}.foo.com.tar.gz /var/www/foo.com/public_html/

echo "Success: backup created"

# Delete old backups (older than 5 days)
echo "Deleting old backups.."
find ${DIR}/${YEAR}*.*.tar.gz -type f -mtime +5 -delete
# -delete might not work on all systems
#find ${DIR}/${YEAR}*.*.tar.gz -type f -mtime +5 -exec rm -f {} \;

# Sync to S3
# s3cmd sync /backups/ s3://s3.foo.com/
# echo "Success: backup synced with S3"
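Since the script is meant to run unattended, a crontab entry can schedule it; the script path below is hypothetical. Remember that % is special inside a crontab, so if you ever inline a date command there, escape it as \% as shown earlier:

```shell
# m h dom mon dow  command -- run the backup nightly at 02:00
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1
```

Logging stdout and stderr to a file makes failed runs easy to diagnose after the fact.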