Backup strategies

Revision as of 20:37, 18 May 2020

Here I'm going to try and document the main parts of my backup strategy. This should not be taken as advice; I'm sure it can all be done better. Feel free to e-mail me if you see any fatal flaws or have suggestions. I prefer to keep things simple, and my data doesn't change frequently enough that I feel any need to automate the process.

Backing up the home server to the remote server

I currently use a script like the following to back up my home server (which is considered the original copy of all my files) to my remote server. This script lives on the remote server and is run there under my regular login (not root).

#!/bin/sh
 
# Rearrange existing rsync directories
rm -rf /srv/backups/[user]/old.4
mv /srv/backups/[user]/old.3 /srv/backups/[user]/old.4
mv /srv/backups/[user]/old.2 /srv/backups/[user]/old.3
mv /srv/backups/[user]/old.1 /srv/backups/[user]/old.2
mv /srv/backups/[user]/current.0 /srv/backups/[user]/old.1
 
# Perform backup
rsync --progress -aH --delete \
--link-dest=/srv/backups/[user]/old.1 \
[user]@[home server]:~/ \
/srv/backups/[user]/current.0/

First, I rotate the existing backup directories, deleting the oldest copy. Then, I use rsync to copy any changes from the home server to the remote server, linking to the previous copy of the backup wherever files remained the same. This way, only new or changed files take up additional space; all five backup directories share most of their data.
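As a quick sanity check that the hard linking is working, I can compare an unchanged file across two snapshots; seeing the same inode number (and a link count greater than 1) means the file is only stored once on disk. The file name here is just a placeholder:

ls -li /srv/backups/[user]/old.1/[some file] /srv/backups/[user]/current.0/[some file]

Similarly, du only counts hard-linked files once per invocation, so running it across all the snapshot directories at once shows the real disk usage rather than five times the data size:

du -shc /srv/backups/[user]/old.* /srv/backups/[user]/current.0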

Verifying backups

Occasionally, I want to verify that nothing fishy has happened between backups. To verify that files that shouldn't have changed are truly the same, I use rsync's checksum option and perform a dry run to compare two locations:

rsync -ricn [user]@[home server]:~/ /srv/backups/[user]/current.0/

This takes longer, but it compares every single file's checksum (the "c") to verify the data has remained the same even if the file size and modified date haven't changed. The itemize option (the "i") produces a detailed list of what's changed, and the "n" tells it to do a dry run (so it's not actually updating the backup).
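For an extra spot check on a single file, I can also compute its checksum on both machines and compare the results by hand (the file name here is just a placeholder):

ssh [user]@[home server] sha256sum "~/[some file]"
sha256sum "/srv/backups/[user]/current.0/[some file]"

If the two hashes match, that copy of the file is byte-for-byte identical on both ends.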

Periodic backups to BD-R

Both my home server and remote server are always connected to the Internet and electricity, and I realized I wanted a completely offline copy of my data that would be resistant to electromagnetic and power surge events. Since the cost of BD-R discs has come down considerably (less than $25 for a 50-pack), they seem like the perfect option for a write once read many (WORM) offline backup solution. When stored in a cool, dry place, a BD-R should outlast me; and if I create a new set of discs every few months, it doesn't even matter how long the oldest copies last. A stack of discs should also be easy to mail to a friend or family member for safekeeping, making this a great off-site option that can't be tampered with (without damaging the disc in some way).

While the kind of multi-location disaster that would be required for this to be necessary seems very unlikely, I figured it was worth the <$25 investment (I already own a Blu-ray recorder) to try it out. Here's what I have so far.

Generate a giant archive and split it into BD-sized files

First, I take my current.0 folder and compress it into one big archive, splitting it into 22GiB chunks along the way. Why 22GiB, you may ask? I just felt like that was a good place to start, since the actual usable space on a single-layer BD-R is not 25GiB but rather somewhere under 25GB, which is a little under 23GiB as far as I can tell. If I ran into a situation where the last disc would end up with only a GiB of data on it or something, I might adjust the split size so it could all fit on n-1 discs.
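For anyone checking that figure, the rough arithmetic: a 25GB disc is 25,000,000,000 bytes, and dividing by 1024^3 gives about 23.3GiB, so 22GiB chunks leave a bit over a gigabyte of headroom per disc for filesystem overhead. The same number from the shell:

echo "scale=2; 25*10^9/1024^3" | bc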

I run this in the main backups folder on the remote server (note: this requires the remote server's drive to be less than half full, since the split archive is essentially a second, slightly compressed copy of the data):

tar cvzf - [user]/current.0 | split -d -b 22G - "[user]-YYYY-MM-DD.tar.gz."

This should create as many 22GiB files as are needed (the last one will usually be smaller), compressing all the data using gzip so it takes up a bit less space than the originals, and those files will end in .00, .01, etc. For my most recent attempt at this, my original backup took up ~385GiB and the resulting split archive files took up ~352GiB. I have a lot of pictures and music files that are already compressed, so gzip mainly helps with documents and such. I could use "J" instead of "z" in the tar command to get more compression using xz; but in a small test where gzip took ~15 minutes, xz took over an hour and a half. Not worth it for me.
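I haven't actually had to restore from one of these archives yet, but since split just slices the stream into numbered pieces, reassembly should only require concatenating them back in order and feeding the result to tar (run from wherever the pieces have been copied off the discs):

cat [user]-YYYY-MM-DD.tar.gz.* | tar xvzf -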

Once these split archive files have been generated, it's time to generate some checksums that we can later use to verify these files have remained unscathed. For that, I use sha256sum:

sha256sum [user]-YYYY-MM-DD.tar.gz.* > [user]-YYYY-MM-DD.tar.gz-SHA256SUM

We now have files that should fit on a series of BD-Rs, and a checksum file we can use to verify those files if we ever need to restore this backup.
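Later, after copying the pieces back off the discs into one directory, checking them against that file is a single command; sha256sum prints OK or FAILED for each piece:

sha256sum -c [user]-YYYY-MM-DD.tar.gz-SHA256SUM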