Rsync for Linux

This guide is aimed at those who wish to establish a centralised server to store back-up data from remote servers/workstations running GNU/Linux. The steps outlined below are tried and tested, and are run automatically via cron 365 days a year to maintain back-up data for Gaztronics.com clients. This guide does not require, or set up, Rsync as an xinetd daemon.

I recommend you utilise SSH to encrypt all of your Rsync traffic, regardless of whether it is travelling over the local network or the Internet. If it is passing over the Internet, I would strongly recommend you use SSH!


1: SSH

In order to use SSH to encrypt our Rsync traffic (and manage your servers/workstations) we need to create keys to allow secure access without the need for a password. Modern versions of ssh-keygen generate RSA keys of at least 2048 bits by default, which is sufficient for this purpose.

ssh-keygen

The first step is to use ssh-keygen on our master server to create a set of public and private keys. You should be root (or Super-User) for this step as we will be connecting root-to-root in order to run the back-up scripts with sufficient privileges.

Enter the command: ssh-keygen and press return. You will be prompted for a location to save the key. This should be /root/.ssh/id_rsa. Press return to accept the default. Press return again twice more to not set a passphrase for the key. If you enter a passphrase, you will be prompted each time you run the back-up job - which will make automatic job running by cron very difficult!
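If you prefer to script the key generation, the prompts above can all be answered on the command line. This sketch writes into a scratch directory rather than /root/.ssh so it is safe to try anywhere; drop the -f argument (and the scratch directory) to use the default location.

```shell
# Non-interactive key generation (sketch): 2048-bit RSA key with an
# empty passphrase (-N ''), written to a scratch directory.
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -b 2048 -N '' -f "$KEYDIR/id_rsa"
ls "$KEYDIR"
```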

In the .ssh directory, you should now see two files: id_rsa and id_rsa.pub. id_rsa is the private key for this server and id_rsa.pub is the public key which we will place on the remote systems.

Remote host

Login to the remote host, either as root, or as yourself [then become super-user]. In the /root directory, create a .ssh directory with 700 permissions (chmod 700 .ssh/) and root:root ownership (chown root:root .ssh/). Inside that directory, copy the contents of your id_rsa.pub file into a file called authorized_keys (note the Americanised spelling!). This file should be owned by root and have 644 permissions. You can add extra keys to this file if you wish to access the remote host from other systems: simply add the keys into the file, one per line, and remember to delete old keys when they are no longer valid!
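The steps above can be sketched as follows. To keep it safe to run anywhere, this dry-run uses a scratch directory standing in for /root on the remote host, and PUBKEY stands in for the contents of your id_rsa.pub; the real commands are shown in the comments.

```shell
# Dry-run of the remote-host set-up in a scratch directory.
REMOTE_ROOT=$(mktemp -d)                       # stands in for /root
PUBKEY='ssh-rsa AAAAB3...example root@master'  # stands in for id_rsa.pub

mkdir -p "$REMOTE_ROOT/.ssh"                   # mkdir /root/.ssh
chmod 700 "$REMOTE_ROOT/.ssh"                  # chmod 700 /root/.ssh
                                               # chown root:root /root/.ssh (as root)
printf '%s\n' "$PUBKEY" >> "$REMOTE_ROOT/.ssh/authorized_keys"
chmod 644 "$REMOTE_ROOT/.ssh/authorized_keys"
```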

Modify the sshd_config file (usually found in /etc/ssh) to allow root login by removing the # against PermitRootLogin yes. Restart the SSH server-daemon and try logging into the remote system from your master server as the root user. If the SSH keys are correct, your connection should not prompt for a password. Please note: The first time you connect, you will be prompted to confirm the key-fingerprint of the remote host. Once accepted, you will only be prompted again if the key at the remote end is changed.

Bolt-down the SSH config by editing sshd_config again, and this time, set PasswordAuthentication no and restart the SSH server-daemon. This will prevent anyone from accessing the SSH connection via a challenge password. Please note: If you have other users accessing the remote system, ensure they are all using keys before disabling this!
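After both edits, the two relevant lines in sshd_config end up looking like this (the surrounding file varies by distribution; only these two directives are changed):

```
# /etc/ssh/sshd_config (extract)
PermitRootLogin yes
PasswordAuthentication no
```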


2: Directories

We now need to decide where to store the back-up data. I recommend creating a directory called /backup and this should be on a volume with sufficient space for the data. Inside that directory, I recommend creating two sub-directories called /daily and /snapshot. All will be revealed further down.

As the rsync scripts will be run as root, I recommend creating a directory in /root to keep them all safe; e.g. /root/rsync-scripts.


3: Job control script

We are now ready to create a script to back-up the remote host. I prefer to create a script per-host to enable easy tracking and easy management from the cron daemon. I will leave the naming convention up to you. The script detailed below utilises a trick known as hard-linking - a feature of the Unix cp command that creates pointers in the file-system. This neat trick allows us to store 30, 60 or 90 days-worth of back-up data without actually storing multiple copies of the same file.
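To see the trick in isolation: cp -al creates new directory entries that point at the existing inodes, so the "copy" costs almost nothing in disk space. This sketch can be run anywhere:

```shell
# Demonstrate hard-linking with cp -al in a scratch directory.
WORK=$(mktemp -d)
mkdir "$WORK/snapshot" "$WORK/day1"
echo "hello" > "$WORK/snapshot/file"

cp -al "$WORK/snapshot/." "$WORK/day1/"   # link, do not copy

# Both names now share one inode; the link count is 2.
stat -c '%h %i %n' "$WORK/snapshot/file" "$WORK/day1/file"
```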

Start with a header describing the script and the last time it was edited. If you are working in a team, ask team-members to add comments if they change the file. Then you know who to blame when it all goes wrong!

#!/bin/bash
#
# Rsync job control file for <name of host>.
#
# Last updated: 13th December 2012

We now define paths and variables and set the Rsync options. If you are testing, set OPT=vanP to perform a verbose dry-run (-n) with progress shown (-P). Set hostname_rsync to the name of the remote host to make it easier to see from the lock-files which jobs for which hosts are still running. For the sub-directories in SNAPSHOT and DAILY, set hostname to match the remote host's name, or something meaningful to yourself.

#---------------------------------------------------------------
# Paths, variables and system checks.

# Path to lock file.
#
LOCK_FILE=/var/lock/subsys/hostname_rsync

# Options
#
OPT=va

# Paths to back-up directories
#
SNAPSHOT=/backup/snapshot/hostname/
DAILY=/backup/daily/hostname/

# Check directories exist, if not create them.

if [ ! -d $SNAPSHOT ]; then
     mkdir -p $SNAPSHOT
fi

if [ ! -d $DAILY ]; then
     mkdir -p $DAILY
fi

We now define the route to the remote host. This can take the form of a hostname, IP address, or fully-qualified domain name. Modern versions of rsync use SSH as the remote shell by default, so there is no need to specify -e ssh.

# Route
#
ROUTE=root@hostname

Now we start the back-up process for real.

#---------------------------------------------------------------

# Start the back-up process.
#
if [ -f $LOCK_FILE ]; then
     echo "Rsync job already running!"
     exit 1
fi

echo "Rsync job started: `date`"

# Create lock file.
#
touch $LOCK_FILE


# /etc directory
#
echo -e "\n*/etc*\n"
rsync -$OPT --delete $ROUTE:/etc $SNAPSHOT

# /home directory
#
echo -e "\n*/home*\n"
rsync -$OPT --delete $ROUTE:/home $SNAPSHOT

# /root directory
#
echo -e "\n*/root*\n"
rsync -$OPT --delete $ROUTE:/root $SNAPSHOT

# /var directory
#
echo -e "\n*/var*\n"
rsync -$OPT --delete --exclude-from=/root/rsync-scripts/var_exclude $ROUTE:/var $SNAPSHOT

In the above examples, we are separately backing up a selected set of directories, and excluding some sub-directories from the /var directory. You could back-up the root level of the remote host, but you will find a lot of errors are thrown up when rsync traverses dynamic areas, such as /proc and /dev; and you may not want/need the whole system. Once this part of the script has run, you will have a snapshot of the remote host from the time you ran the job. If you are running this daily, it will pick up any changes and apply them to the snapshot.

The file var_exclude has been created for all of my scripts to use and contains the following:

# Rsync exclude file for /var
#
# Last updated: 7th June 2011

/var/account
/var/cache
/var/crash
/var/db
/var/empty
/var/games
/var/gdm
/var/lib/clamav
/var/lib/mlocate/
/var/local
/var/lock
/var/log
/var/nis
/var/opt
/var/preserve
/var/run
/var/spool
/var/tmp

These are directories that are not needed from a system recovery point of view. It is entirely your choice as to whether you back-up the whole of /var, or leave some directories off. The file has 644 permissions as we only need to read it in.

The next section is magic! It creates a new daily dated set of hard-links to the snapshot, and in the process, it provides a rolling back-up. It is a bit like using daily tapes, only you do not have to change them, or have a tape drive, or any tapes! For the purposes of testing, I recommend leaving these lines commented-out so you can test if the snapshot is created correctly first.

# Create hard-link from overnight snapshot.
#
DATE=`date +%F-%A`

echo -e "\nCreating daily directory $DATE"

mkdir $DAILY/$DATE

echo -e "\nCreating hard linked data tree...\n"

cp -al $SNAPSHOT/* $DAILY/$DATE/

The job is now finished, so we remove the lock file and un-set our variables.

echo "Rsync job finished: `date`"


# Job finished, remove lock file.
#
rm -f $LOCK_FILE

# Clean-up.
#
unset DATE
unset DAILY
unset SNAPSHOT
unset ROUTE
unset LOCK_FILE
unset OPT

echo "Done!"

exit 0

If you are wondering why there are lots of echo lines, this is to make it easier to trace errors when testing, and it provides a handy guide when used via cron as the results are emailed to you (assuming your server is configured to route email!).

Putting it all together:

#!/bin/bash
#
# Rsync job control file for <name of host>.
#
# Last updated: 13th December 2012

#---------------------------------------------------------------
# Paths, variables and system checks.

# Path to lock file.
#
LOCK_FILE=/var/lock/subsys/hostname_rsync

# Options
#
OPT=va

# Paths to back-up directories
#
SNAPSHOT=/backup/snapshot/hostname/
DAILY=/backup/daily/hostname/

# Check directories exist, if not create them.

if [ ! -d $SNAPSHOT ]; then
     mkdir -p $SNAPSHOT
fi

if [ ! -d $DAILY ]; then
     mkdir -p $DAILY
fi

# Route
#
ROUTE=root@hostname

#---------------------------------------------------------------

# Start the back-up process.
#
if [ -f $LOCK_FILE ]; then
     echo "Rsync job already running!"
     exit 1
fi

echo "Rsync job started: `date`"

# Create lock file.
#
touch $LOCK_FILE


# /etc directory
#
echo -e "\n*/etc*\n"
rsync -$OPT --delete $ROUTE:/etc $SNAPSHOT

# /home directory
#
echo -e "\n*/home*\n"
rsync -$OPT --delete $ROUTE:/home $SNAPSHOT

# /root directory
#
echo -e "\n*/root*\n"
rsync -$OPT --delete $ROUTE:/root $SNAPSHOT

# /var directory
#
echo -e "\n*/var*\n"
rsync -$OPT --delete --exclude-from=/root/rsync-scripts/var_exclude $ROUTE:/var $SNAPSHOT


# Create hard-link from overnight snapshot.
#
DATE=`date +%F-%A`

echo -e "\nCreating daily directory $DATE"

mkdir $DAILY/$DATE

echo -e "\nCreating hard linked data tree...\n"

cp -al $SNAPSHOT/* $DAILY/$DATE/

echo "Rsync job finished: `date`"


# Job finished, remove lock file.
#
rm -f $LOCK_FILE

# Clean-up.
#
unset DATE
unset DAILY
unset SNAPSHOT
unset ROUTE
unset LOCK_FILE
unset OPT

echo "Done!"

exit 0

Remember to give your script 700 or 755 permissions, or it will not run.


4: Cron

It is extremely handy to control when the Rsync job is run, so we use cron to schedule the jobs at suitable times in the night. Create a file in /etc/cron.d called rsync-jobs (or whatever suits your convention) and set the times you would like your jobs to run. If your server is configured to send email, you might like to tell cron where to send the job output so you can monitor each run.

Here is an example cron file:

# Cron job to run the Rsync jobs
#
# Last updated: 13th December 2012

MAILTO=root


# Back-up host1 at 00:30
#
30 00 * * * root /root/rsync-scripts/host1_rsync


# Back-up host2 at 01:00
#
00 01 * * * root /root/rsync-scripts/host2_rsync


# Back-up host3 at 02:00
#
00 02 * * * root /root/rsync-scripts/host3_rsync


# Back-up host4 at 03:00
#
00 03 * * * root /root/rsync-scripts/host4_rsync

Each job runs every day of every month, as root, at the set time. You can of course adjust the settings to only run a job on a specific day, every other day, etc. Search the web for more information on cron settings if you wish to try this for yourself. Remember to restart the cron daemon if you make changes to this file (and when you first create it).


5: Results

You should now be the proud owner of a centralised Rsync back-up server. Each time the jobs are run, the snapshot directory will be updated and a new daily copy will be created; ready to be searched when someone realises they deleted a file a week ago and they might like it back!

Tip: You may wish to exclude the /backup directory from being indexed by the mlocate (slocate on some systems) database to avoid creating huge databases that hog I/O when indexing/searching. This will mean you will have to search for files yourself, which can be tedious from the command line, so I recommend sharing the /backup directory via NFS and searching through with a graphical file browser.
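On most Linux systems the mlocate indexer is configured via /etc/updatedb.conf; adding /backup to PRUNEPATHS keeps it out of the database. A typical extract (your existing PRUNEPATHS entries will differ):

```
# /etc/updatedb.conf (extract)
PRUNEPATHS = "/tmp /var/spool /media /backup"
```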


6: 30-days later

You may wish to retain only 30 days of the daily back-up data. Here is an example script to delete unwanted daily directories once they pass the 30-day limit:

#!/bin/bash
#
# Script to delete old daily back-ups
#
# Last updated: 13th December 2012

echo -e "*Deleting back-up directories older than 30 days*\n"

cd /backup/daily/

find . -mindepth 1 -maxdepth 2 -type d -mtime +30 -ls -exec rm -fr '{}' +

echo -e "\n*Deletion complete*\n"

exit 0

The find command searches up to two directory levels below /backup/daily/, so the dated directories under /backup/daily/hostname/ are checked for their modification time and deleted if they are older than 30 days. I call my file delete-backups and store it in /root. I call it daily from /etc/cron.daily via a symlink (ln -s /root/delete-backups /etc/cron.daily/delete-backups). If you wanted to store 60 days' worth of daily back-ups, simply set -mtime +60.

External Links

Rsync website

Rsync FAQ

Rsync README

Rsync man page

Rsync over SSH

Rsync and Stunnel


Page updated: 10th May 2017