Rsync for Linux
This guide is aimed at those who wish to establish a centralised server to store back-up data from remote servers/workstations running GNU/Linux. The steps outlined below are tried and tested, and are run automatically via cron 365 days a year to maintain back-up data for Gaztronics.com clients. This guide does not require or set up Rsync as an xinetd daemon.
I recommend you utilise SSH to encrypt all of your Rsync traffic, whether it travels over the local network or the Internet. If it passes over the Internet, I would strongly recommend SSH!
1: SSH
In order to use SSH to encrypt our Rsync traffic (and manage your servers/workstations), we need to create keys that allow secure access without the need for a password. The default key length of 2048 bits is sufficient.
ssh-keygen
The first step is to use ssh-keygen on our master server to create a pair of public and private keys. You should be root (or Super-User) for this step as we will be connecting root-to-root in order to run the back-up scripts with sufficient privileges.
Enter the command: ssh-keygen and press return. You will be prompted for a location to save the key; this should be /root/.ssh/id_rsa. Press return to accept the default, then press return twice more to leave the passphrase empty. If you set a passphrase, you will be prompted for it each time the back-up job runs - which makes automatic job running by cron very difficult!
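The interactive dialogue above can also be run as a single non-interactive command; a sketch, using a scratch directory so it is safe to try out (in practice the path is /root/.ssh/id_rsa):

```shell
# Non-interactive equivalent of the ssh-keygen dialogue: -N "" sets an
# empty passphrase and -f sets the output path. KEYDIR is a scratch
# stand-in for /root/.ssh used purely for this demonstration.
KEYDIR=/tmp/ssh-keygen-demo
mkdir -p "$KEYDIR"
rm -f "$KEYDIR/id_rsa" "$KEYDIR/id_rsa.pub"
ssh-keygen -q -t rsa -b 2048 -N "" -f "$KEYDIR/id_rsa"
ls "$KEYDIR"   # id_rsa  id_rsa.pub
```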
In the .ssh directory, you should now see two files: id_rsa and id_rsa.pub. id_rsa is the private key for this server and id_rsa.pub is the public key which we will place on the remote systems.
Remote host
Login to the remote host, either as root, or as yourself [then become super-user]. In the /root directory, create a .ssh directory with 700 permissions (chmod 700 .ssh/) and root:root ownership (chown root:root .ssh/). Inside that directory, copy the contents of your id_rsa.pub file into a file called authorized_keys (Note: the Americanized spelling!). This file should be owned by root and have 644 permissions. You can add extra keys to this file if you wish to access the remote host from other systems. Simply add the keys into the file, one per line; and remember to delete old keys when they are no longer valid!
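The remote-host steps above can be sketched as shell commands. A scratch directory stands in for /root here so the example is safe to run; the key string is a placeholder, not a real key:

```shell
# REMOTE_ROOT stands in for /root on the remote host.
REMOTE_ROOT=/tmp/remote-root-demo
mkdir -p "$REMOTE_ROOT/.ssh"
chmod 700 "$REMOTE_ROOT/.ssh"
# (On the real host: chown root:root "$REMOTE_ROOT/.ssh" as well.)
# Install the master server's public key, one key per line.
echo "ssh-rsa AAAA...placeholder... root@master" >> "$REMOTE_ROOT/.ssh/authorized_keys"
chmod 644 "$REMOTE_ROOT/.ssh/authorized_keys"
```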
Modify the sshd_config file (usually found in /etc/ssh) to allow root login by removing the # against PermitRootLogin yes. Restart the SSH server-daemon and try logging into the remote system from your master server as the root user. If the SSH keys are correct, your connection should not prompt for a password. Please note: The first time you connect, you will be prompted to confirm the key-fingerprint of the remote host. Once accepted, you will only be prompted again if the key at the remote end is changed.
Bolt-down the SSH config by editing sshd_config again, and this time, set PasswordAuthentication no and restart the SSH server-daemon. This will prevent anyone from accessing the SSH connection via a challenge password. Please note: If you have other users accessing the remote system, ensure they are all using keys before disabling this!
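After both edits, the relevant lines of sshd_config look something like this (an excerpt only; leave the rest of the file as it is):

```
# /etc/ssh/sshd_config (excerpt)
PermitRootLogin yes
PasswordAuthentication no
```

Note: recent versions of OpenSSH also accept PermitRootLogin prohibit-password, which allows root login by key only - a slightly tighter option if your sshd supports it.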
2: Directories
We now need to decide where to store the back-up data. I recommend creating a directory called /backup and this should be on a volume with sufficient space for the data. Inside that directory, I recommend creating two sub-directories called /daily and /snapshot. All will be revealed further down.
As the rsync scripts will be run as root, I recommend creating a directory in /root to keep them all safe; e.g. /root/rsync-scripts.
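Creating the directory tree described above takes only a couple of commands. A scratch prefix is used here so the example is safe to run; drop PREFIX in practice:

```shell
# PREFIX stands in for / on the back-up server.
PREFIX=/tmp/backup-demo
# The back-up tree: /backup/daily and /backup/snapshot.
mkdir -p "$PREFIX/backup/daily" "$PREFIX/backup/snapshot"
# A safe home for the scripts under /root.
mkdir -p "$PREFIX/root/rsync-scripts"
```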
3: Job control script
We are now ready to create a script to back-up the remote host. I prefer to create a script per-host to enable easy tracking and easy management from the cron daemon. I will leave the naming convention up to you. The script detailed below utilises a trick known as hard-linking - a feature of the Unix cp command that creates pointers in the file-system. This neat trick allows us to store 30, 60 or 90 days-worth of back-up data without actually storing multiple copies of the same file.
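The hard-linking trick is easy to demonstrate in isolation. In this sketch (using a scratch directory), cp -al creates a second directory tree whose files are hard links to the originals, so the data exists on disk only once; stat -c %h prints a file's link count:

```shell
# A miniature snapshot/daily pair: -a preserves attributes,
# -l makes hard links instead of copying the data.
DEMO=/tmp/hardlink-demo
rm -rf "$DEMO"
mkdir -p "$DEMO/snapshot" "$DEMO/daily"
echo "hello" > "$DEMO/snapshot/file.txt"
cp -al "$DEMO/snapshot" "$DEMO/daily/2012-12-13-Thursday"
# Both directory entries now point at the same inode, so the
# link count is 2 but the file is stored only once.
stat -c %h "$DEMO/daily/2012-12-13-Thursday/file.txt"   # prints 2
```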
Start with a header describing the script and the last time it was edited. If you are working in a team, ask team-members to add comments if they change the file. Then you know who to blame when it all goes wrong!
#!/bin/bash
#
# Rsync job control file for <name of host>.
#
# Last updated: 13th December 2012
We now define paths and variables and set the Rsync options. If you are testing, set OPT=vanP to see the progress of a dry-run. Set hostname_rsync to the name of the remote host to make it easier to see from the lock-files which jobs for which hosts are still running. For the sub-directories in SNAPSHOT and DAILY, set hostname to match the remote host's name, or something meaningful to yourself.
#---------------------------------------------------------------
# Paths, variables and system checks.
# Path to lock file.
#
LOCK_FILE=/var/lock/subsys/hostname_rsync
# Options
#
OPT=va
# Paths to back-up directories
#
SNAPSHOT=/backup/snapshot/hostname/
DAILY=/backup/daily/hostname/
# Check directories exist, if not create them.
#
if [ ! -d $SNAPSHOT ]; then
mkdir -p $SNAPSHOT
fi
if [ ! -d $DAILY ]; then
mkdir -p $DAILY
fi
We now define the route to the remote host. This can take the form of a hostname, IP address, or fully-qualified domain name. The @ sign specifies the remote user, and modern versions of rsync use SSH as the remote shell by default, so there is no need to define -e ssh.
# Route
#
ROUTE=root@hostname
Now we start the back-up process for real.
#---------------------------------------------------------------
# Start the back-up process.
#
if [ -f $LOCK_FILE ]; then
echo "Rsync job already running!"
exit 1
fi
echo "Rsync job started: `date`"
# Create lock file.
#
touch $LOCK_FILE
# /etc directory
#
echo -e "\n*/etc*\n"
rsync -$OPT --delete $ROUTE:/etc $SNAPSHOT
# /home directory
#
echo -e "\n*/home*\n"
rsync -$OPT --delete $ROUTE:/home $SNAPSHOT
# /root directory
#
echo -e "\n*/root*\n"
rsync -$OPT --delete $ROUTE:/root $SNAPSHOT
# /var directory
#
echo -e "\n*/var*\n"
rsync -$OPT --delete --exclude-from=/root/rsync-scripts/var_exclude $ROUTE:/var $SNAPSHOT
In the above examples, we are separately backing up a selected set of directories, and excluding some sub-directories from /var. You could back up the root level of the remote host, but a lot of errors will be thrown when traversing dynamic areas such as /proc and /dev; and you may not want or need the whole system. Once this part of the script has run, you will have a snapshot of the remote host from the time you ran the job. If you run it daily, it will pick up any changes and apply them to the snapshot.
The file var_exclude has been created for all of my scripts to use and contains the following:
# Rsync exclude file for /var
#
# Last updated: 7th June 2011
/var/account
/var/cache
/var/crash
/var/db
/var/empty
/var/games
/var/gdm
/var/lib/clamav
/var/lib/mlocate/
/var/local
/var/lock
/var/log
/var/nis
/var/opt
/var/preserve
/var/run
/var/spool
/var/tmp
These are directories that are not needed from a system recovery point of view. It is entirely your choice as to whether you back-up the whole of /var, or leave some directories off. The file has 644 permissions as we only need to read it in.
The next section is magic! It creates a new daily dated set of hard-links to the snapshot, and in the process, it provides a rolling back-up. It is a bit like using daily tapes, only you do not have to change them, or have a tape drive, or any tapes! For the purposes of testing, I recommend leaving these lines commented-out so you can test if the snapshot is created correctly first.
# Create hard-link from overnight snapshot.
#
DATE=`date +%F-%A`
echo -e "\nCreating daily directory $DATE"
mkdir $DAILY/$DATE
echo -e "\nCreating hard linked data tree...\n"
cp -al $SNAPSHOT/* $DAILY/$DATE/
The job is now finished, so we remove the lock file and un-set our variables.
echo "Rsync job finished: `date`"
# Job finished, remove lock file.
#
rm -f $LOCK_FILE
# Clean-up.
#
unset DATE
unset DAILY
unset SNAPSHOT
unset ROUTE
unset LOCK_FILE
unset OPT
echo "Done!"
exit 0
If you are wondering why there are lots of echo lines, this is to make it easier to trace errors when testing, and it provides a handy guide when used via cron as the results are emailed to you (assuming your server is configured to route email!).
Putting it all together:
#!/bin/bash
#
# Rsync job control file for <name of host>.
#
# Last updated: 13th December 2012
#---------------------------------------------------------------
# Paths, variables and system checks.
# Path to lock file.
#
LOCK_FILE=/var/lock/subsys/hostname_rsync
# Options
#
OPT=va
# Paths to back-up directories
#
SNAPSHOT=/backup/snapshot/hostname/
DAILY=/backup/daily/hostname/
# Check directories exist, if not create them.
#
if [ ! -d $SNAPSHOT ]; then
mkdir -p $SNAPSHOT
fi
if [ ! -d $DAILY ]; then
mkdir -p $DAILY
fi
# Route
#
ROUTE=root@hostname
#---------------------------------------------------------------
# Start the back-up process.
#
if [ -f $LOCK_FILE ]; then
echo "Rsync job already running!"
exit 1
fi
echo "Rsync job started: `date`"
# Create lock file.
#
touch $LOCK_FILE
# /etc directory
#
echo -e "\n*/etc*\n"
rsync -$OPT --delete $ROUTE:/etc $SNAPSHOT
# /home directory
#
echo -e "\n*/home*\n"
rsync -$OPT --delete $ROUTE:/home $SNAPSHOT
# /root directory
#
echo -e "\n*/root*\n"
rsync -$OPT --delete $ROUTE:/root $SNAPSHOT
# /var directory
#
echo -e "\n*/var*\n"
rsync -$OPT --delete --exclude-from=/root/rsync-scripts/var_exclude $ROUTE:/var $SNAPSHOT
# Create hard-link from overnight snapshot.
#
DATE=`date +%F-%A`
echo -e "\nCreating daily directory $DATE"
mkdir $DAILY/$DATE
echo -e "\nCreating hard linked data tree...\n"
cp -al $SNAPSHOT/* $DAILY/$DATE/
echo "Rsync job finished: `date`"
# Job finished, remove lock file.
#
rm -f $LOCK_FILE
# Clean-up.
#
unset DATE
unset DAILY
unset SNAPSHOT
unset ROUTE
unset LOCK_FILE
unset OPT
echo "Done!"
exit 0
Remember to give your script 700 or 755 permissions, or it will not run.
4: Cron
It is extremely handy to control when the Rsync jobs run, so we use cron to schedule them at suitable times in the night. Create a file in /etc/cron.d called rsync-jobs (or whatever suits your convention) and set the times you would like your jobs to run. If your server is configured to send email, you might like to tell cron where to send the job output so you can monitor each run.
Here is an example cron file:
# Cron job to run the Rsync jobs
#
# Last updated: 13th December 2012
MAILTO=root
# Back-up host1 at 00:30
#
30 00 * * * root /root/rsync-scripts/host1_rsync
# Back-up host2 at 01:00
#
00 01 * * * root /root/rsync-scripts/host2_rsync
# Back-up host3 at 02:00
#
00 02 * * * root /root/rsync-scripts/host3_rsync
# Back-up host4 at 03:00
#
00 03 * * * root /root/rsync-scripts/host4_rsync
Each job runs as root at the set time, every day of every month. You can of course adjust the settings to only run a job on a specific day, every other day, and so on; search the web for more information on cron settings if you wish to try this for yourself. Remember to restart the cron daemon when you first create this file and whenever you change it.
5: Results
You should now be the proud owner of a centralised Rsync back-up server. Each time the jobs are run, the snapshot directory will be updated and a new daily copy will be created; ready to be searched when someone realises they deleted a file a week ago and they might like it back!
Tip: You may wish to exclude the /backup directory from being indexed by the mlocate (slocate on some systems) database to avoid creating huge databases that hog I/O when indexing/searching. This will mean you will have to search for files yourself, which can be tedious from the command line, so I recommend sharing the /backup directory via NFS and searching through with a graphical file browser.
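On most systems this is done in /etc/updatedb.conf by adding /backup to the PRUNEPATHS list. A sketch; the existing entries on your system will differ:

```
# /etc/updatedb.conf (excerpt)
PRUNEPATHS="/tmp /var/spool /media /backup"
```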
6: 30-days later
You may wish to only retain 30-days of the daily back-up data. Here is an example script to delete unwanted daily directories once they reach the 30 day limit:
#!/bin/bash
#
# Script to delete old daily back-ups
#
# Last updated: 13th December 2012
echo -e "*Deleting back-up directories older than 30 days*\n"
cd /backup/daily/
find . -mindepth 1 -maxdepth 2 -type d -mtime +30 -ls -exec rm -fr '{}' +
echo -e "\n*Deletion complete*\n"
exit 0
The find command searches two directories below /backup/daily/, so any directories below /backup/daily/hostname/... will be checked for their modification time and deleted if they are older than 30 days. I call my file delete-backups and store it in /root. I call it daily from /etc/cron.daily via a symlink (ln -s /root/delete-backups /etc/cron.daily/delete-backups). If you wanted to store 60 days' worth of daily back-ups, simply set -mtime +60.
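You can check the -mtime +30 behaviour safely with a dry run: the same find without the -exec simply lists what would be deleted. A sketch using a scratch tree with one back-dated directory:

```shell
# Build a fake /backup/daily tree: one recent and one old directory.
DEMO=/tmp/find-demo/daily
rm -rf /tmp/find-demo
mkdir -p "$DEMO/host1/2012-11-01-Thursday" "$DEMO/host1/2012-12-13-Thursday"
# Back-date one directory's modification time by 40 days.
touch -d "40 days ago" "$DEMO/host1/2012-11-01-Thursday"
cd "$DEMO"
# List (without deleting) directories older than 30 days.
find . -mindepth 1 -maxdepth 2 -type d -mtime +30   # prints ./host1/2012-11-01-Thursday
```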
Page updated: 10th May 2017