Backups with tar


tar

tar is an archiving utility: it creates and extracts (compressed) archives, also known as tarballs.

tar -czpf foo.tar.gz sourceFiles file1 file2 # creates a compressed archive
tar -xpf foo.tar.gz # extracts the archive
tar -xpf foo.tar.gz -C dest/ # extracts the archive into the `dest/` directory

  • c or --create creates the archive

  • x or --extract extracts the archive

  • z or --gzip/--gunzip compresses or decompresses the archive with gzip

  • p or --preserve-permissions preserves file and directory permissions

  • f or --file provides the archive file name (foo.tar.gz in the example above)

  • Compress your backups for faster transfers, less bandwidth usage, and less disk space usage (you will get charged for disk space and bandwidth if you transfer backups off-site, to a service like Amazon S3); a quick size check follows this list.

  • Since backups are usually automated, you can skip the -v (verbose) flag.
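
To see what compression buys you, compare an uncompressed and a compressed archive of the same files (a minimal sketch reusing the example file names from above; actual savings depend on your data):

tar -cpf foo.tar sourceFiles file1 file2     # plain, uncompressed archive
tar -czpf foo.tar.gz sourceFiles file1 file2 # the same files, gzip-compressed
du -h foo.tar foo.tar.gz                     # compare the sizes of the two archives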

ownership

You can optionally preserve and restore file ownership as well with the --same-owner flag (the default when extracting as the superuser). The relevant excerpt from tar --help:

      --no-same-permissions  apply the user's umask when extracting permissions
                             from the archive (default for ordinary users)
      --numeric-owner        always use numbers for user/group names
      --owner=NAME           force NAME as owner for added files
  -p, --preserve-permissions, --same-permissions
                             extract information about file permissions
                             (default for superuser)
      --preserve             same as both -p and -s
      --same-owner           try extracting files with the same ownership as
                             exists in the archive (default for superuser)
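
For example, restoring a backup with ownership intact (a minimal sketch; it assumes dest/ exists, and tar only applies archived ownership when you extract as root):

sudo tar -xpf foo.tar.gz --same-owner -C dest/ # keep the owners/groups recorded in the archive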

add backup dates

tar -czpf foo.$(date +%Y%m%d).tar.gz sourceFiles # e.g. foo.20170817.tar.gz
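
If you schedule this from cron, remember that % is special in crontabs and must be escaped with a backslash. A hypothetical nightly entry (paths borrowed from the script further below):

0 2 * * * tar -czpf /backups/foo.$(date +\%Y\%m\%d).tar.gz /var/www/foo.com/public_html/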

compression

  • bzip2 has the best compression ratio, but is very CPU and RAM intensive
  • gzip has a decent compression ratio with modest resource usage (a quick sketch of both follows)
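
tar picks the compressor via a flag: -z for gzip, -j for bzip2 (a minimal sketch, reusing the example file names from above):

tar -czpf foo.tar.gz sourceFiles  # gzip: decent ratio, light on resources
tar -cjpf foo.tar.bz2 sourceFiles # bzip2: better ratio, heavier on CPU and RAM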

See what’s inside a backup

You might want to do this for different reasons. Let's say you want to find out what date the files inside a tarball were backed up or created.

tar -tf foo.tar.gz # list the files in the tar archive
tar -tvf foo.tar # list all files in foo.tar verbosely (permissions, ownership, file size, time)
tar --list -f foo.tar.gz # -t and --list are the same thing (equivalent of `tar -tf foo.tar.gz`)
# tar -tf foo.tar.gz
foo/
foo/file2.txt
foo/file3.txt
foo/file9.txt
foo/file4.txt
foo/file1.txt
foo/file5.txt
foo/file8.txt
foo/file7.txt
foo/file6.txt
# tar -tvf foo.tar.gz
drwxr-xr-x root/root         0 2017-08-17 06:48 foo/
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file2.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file3.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file9.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file4.txt
-rw-r--r-- root/root         0 2017-08-17 06:48 foo/file1.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file5.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file8.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file7.txt
-rw-r--r-- root/root         0 2017-08-17 06:45 foo/file6.txt
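
To check when a single file was backed up without scrolling the whole listing, pipe the verbose listing through grep (file name taken from the listing above):

tar -tvf foo.tar.gz | grep file1.txt # permissions, owner, size and date for just that file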

A bash script to automate the whole thing

Here's a script that I have used on one of my sites. It creates a file backup of a website in /var/www and saves it in a backups directory on the server. It also deletes backups older than 5 days, and can optionally sync backups to S3.

#!/bin/bash

DIR='/backups'
TIMESTAMP=$(date +%Y%b%d)
YEAR=$(date +%Y)

# Create & Compress
echo "Backing up: foo.com"
tar -czpf ${DIR}/${TIMESTAMP}.foo.com.tar.gz /var/www/foo.com/public_html/

echo "Success: backup created"

# Delete old backups (older than 5 days)
echo "Deleting old backups.."
find ${DIR}/${YEAR}*.*.tar.gz -type f -mtime +5 -delete
# -delete might not work on all systems
#find ${DIR}/${YEAR}*.*.tar.gz -type f -mtime +5 -exec rm -f {} \;

# Sync to S3
# s3cmd sync /backups/ s3://s3.foo.com/
# echo "Success: backup synced with S3"