Backup strategies

Here I'm going to try to document the main parts of my backup strategy. This should not be taken as advice; I'm sure it can all be done better. Feel free to e-mail me if you see any fatal flaws or have suggestions. I prefer to keep things simple, and my data doesn't change frequently enough that I feel any need to automate the process.

Backing up the home server to the remote server

I currently use a script like the following to back up my home server (which is considered the original copy of all my files) to my remote server. The script lives on the remote server and runs there under my regular login (not root).

#!/bin/sh
 
# Rearrange existing rsync directories
rm -rf /srv/backups/[user]/old.4
mv /srv/backups/[user]/old.3 /srv/backups/[user]/old.4
mv /srv/backups/[user]/old.2 /srv/backups/[user]/old.3
mv /srv/backups/[user]/old.1 /srv/backups/[user]/old.2
mv /srv/backups/[user]/current.0 /srv/backups/[user]/old.1
 
# Perform backup: -a preserves permissions, timestamps, and ownership,
# -H preserves hard links, --delete drops files that no longer exist on the
# home server, and --link-dest hard-links unchanged files to the previous
# backup so they take up no extra space
rsync --progress -aH --delete \
    --link-dest=/srv/backups/[user]/old.1 [user]@[home server]:~/ \
    /srv/backups/[user]/current.0/

First, I rotate the existing backup directories, deleting the oldest copy. Then, I use rsync to copy any changes from the home server to the remote server, linking to the previous copy of the backup wherever files remain the same. This way, only new or changed files take up additional space; all five backup directories share most of their data via hard links.
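
If I want to confirm the hard linking is actually doing its job, a couple of quick checks like these (just a sanity check, not part of the backup itself) show that the five directories together take far less space than five full copies would:

# Size of one backup vs. the deduplicated total for all five copies; the
# second number should be much smaller than five times the first.
du -sh /srv/backups/[user]/current.0
du -sh /srv/backups/[user]

# Regular files with a link count greater than 1 are shared with at least
# one other backup directory.
find /srv/backups/[user]/current.0 -type f -links +1 | head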

Verifying backups

Occasionally, I want to verify that nothing fishy has happened between backups. To verify that files which shouldn't have changed are truly the same, I do a dry run with rsync's checksum option to compare the two locations:

rsync -ricn [user]@[home server]:~/ /srv/backups/[user]/current.0/

This takes longer than a normal sync, but it compares every single file's checksum (the "c") to verify the data has remained the same even if the file size and modified date haven't changed. The itemize option (the "i") produces a detailed list of what's changed, and the "n" tells it to do a dry run (so it's not actually updating the backup).
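
When I want to scan that output for anything whose contents actually differ, a small wrapper like the following does the trick (a rough sketch I haven't automated; the /tmp path is just a throwaway location):

#!/bin/sh
# Rough sketch: dry-run checksum comparison, keeping only lines that start
# with ">f" (regular files rsync would transfer, i.e. files whose data differs).
rsync -ricn [user]@[home server]:~/ /srv/backups/[user]/current.0/ \
    | grep '^>f' > /tmp/changed-files.txt

if [ -s /tmp/changed-files.txt ]; then
    echo "Files differ between the home server and current.0:"
    cat /tmp/changed-files.txt
else
    echo "No differences found."
fi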

Periodic backups to BD-R

Both my home server and remote server are always connected to the Internet and electricity, and I realized I wanted a completely offline copy of my data that would be resistant to electromagnetic and power surge events. Since the cost of BD-R media has come down considerably (less than $25 for a 50-pack of single-layer discs), it seems like the perfect option for a write once read many (WORM) offline backup solution. When stored in a cool, dry place, a BD-R should outlast me; and if I create a new set of discs every few months, it doesn't even matter how long the oldest copies last. A stack of discs should also be easy to mail to a friend or family member for safekeeping, making this a great off-site option that can't be tampered with (without damaging the disc in some way).

While the kind of multi-location disaster that would make these discs necessary seems very unlikely, I figured it was worth the <$25 investment (I already own a Blu-ray recorder) to try it out. Here's what I have so far.

Generate a giant archive and split it into BD-sized files

First, I take my current.0 folder and compress it into one big archive, splitting it into 22GiB chunks along the way. "Why 22GiB?" you may ask. I just felt like that was a good place to start since the actual space on a single-layer BD-R is not 25GiB but rather 25GB, which leaves you with a little under 23GiB after formatting and defect management are taken into account. If I ran into a situation where I would end up with the last disc only having a GiB of data on it or something, I might adjust the split size so it could all fit on n-1 discs.
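
For reference, the rough numbers behind that choice (using the nominal 25GB figure; the exact usable capacity depends on the disc):

echo '25 * 10^9 / 2^30' | bc -l    # nominal 25GB is about 23.28GiB
echo '22 * 2^30' | bc              # each 22GiB chunk is 23622320128 bytes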

I run this in the main backups folder on the remote server (note: this will require your remote server to be less than half full):

tar cvzf - [user]/current.0 | split -d -b 22G - "[user]-YYYY-MM-DD.tar.gz."

This should create as many 22GiB files as are needed, compressing all the data using gzip so it will take up a bit less space than the original, and those files will end in .00, .01, etc. For my first attempt at this, my original backup took up ~382GiB and the resulting split archive files took up ~351GiB. I have a lot of pictures and music files that are already compressed, so gzip just helps with documents and such. I could use "J" instead of "z" in the tar command to get more compression using xz; but in a smaller test where gzip took ~15 minutes, xz took over an hour and a half. Not worth it for me; the chance it would save an entire BD-R's worth of space is slim, and the time saved is worth the cost of an extra disc.
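
For completeness, the xz variant I decided against would look like this (the same command with "J" swapped in and the extension adjusted to match; I wouldn't expect it to gain much on mostly pre-compressed data):

tar cvJf - [user]/current.0 | split -d -b 22G - "[user]-YYYY-MM-DD.tar.xz."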

Once these split archive files have been generated, it's time to generate some checksums that we can later use to verify these files have remained unscathed. For that, I use sha256sum:

sha256sum [user]-YYYY-MM-DD.tar.gz.* > [user]-YYYY-MM-DD.tar.gz-SHA256SUM

We now have files that should fit on a series of BD-Rs, and a checksum file we can use to verify those files if we ever need to restore the backup.

Create disc images and burn

This part is as yet untested, but it makes sense to me and I've seen these steps in a couple of places on the Internet (like https://wiki.gentoo.org/wiki/CD/DVD/BD_writing and https://irishjesus.wordpress.com/2010/10/17/blu-ray-movie-authoring-in-linux/).

First, we create an empty file the size of the writable space on a single-layer BD-R and format that image with mkudffs:

mkudffs -l "YYYY-MM-DD_BACKUP-00" -b 2048 backup-00.udf 11826176
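
As a sanity check on that block count (my back-of-the-envelope math, not an exact capacity figure), 11826176 blocks of 2048 bytes works out to roughly 22.6GiB, which leaves room for one 22GiB chunk plus the checksum file while staying under the disc's capacity:

echo '11826176 * 2048' | bc             # 24220008448 bytes
echo '11826176 * 2048 / 2^30' | bc -l   # about 22.56GiB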

Second, we mount it to a temporary location and copy the first part of the archive into the image along with our checksum file (every disc will have a copy of that):

mount backup-00.udf /mnt/tmp/

cp [user]-YYYY-MM-DD.tar.gz.00 /mnt/tmp/

cp [user]-YYYY-MM-DD.tar.gz-SHA256SUM /mnt/tmp/

Lastly, we unmount the image and burn it to the disc:

umount /mnt/tmp/

growisofs -dvd-compat -Z /dev/sr0=backup-00.udf
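
Since those same steps get repeated for every chunk, here's a rough loop version I've sketched out but haven't run end-to-end (it assumes the file names used above and leaves the actual burning as a manual step per disc):

#!/bin/sh
# Untested sketch: build one UDF image per split archive part.
for part in [user]-YYYY-MM-DD.tar.gz.[0-9]*; do
    n="${part##*.}"                                # two-digit suffix, e.g. 00
    mkudffs -l "YYYY-MM-DD_BACKUP-${n}" -b 2048 "backup-${n}.udf" 11826176
    mount "backup-${n}.udf" /mnt/tmp/
    cp "$part" [user]-YYYY-MM-DD.tar.gz-SHA256SUM /mnt/tmp/
    umount /mnt/tmp/
    # Burn each image when ready:
    # growisofs -dvd-compat -Z /dev/sr0="backup-${n}.udf"
done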

Verifying, combining, and extracting the archive parts

If I should ever need to restore the backup from the BD-Rs, I'll copy the files to a central location and verify their integrity using sha256sum:

sha256sum -c [user]-YYYY-MM-DD.tar.gz-SHA256SUM

Then, I can combine the individual pieces and extract the archive:

cat [user]-YYYY-MM-DD.tar.gz.* | tar xzf - -C /srv/backups/restored/