Backup strategies

Here I'm going to try and document the main parts of my backup strategy. This should not be taken as advice; I'm sure it can all be done better. Feel free to e-mail me if you see any fatal flaws or have suggestions. I prefer to keep things simple, and my data doesn't change frequently enough that I feel any need to automate the process.

==Backing up the home server to the remote server==

I currently use a script like the following to back up my home server (which holds the original copy of all my files) to my remote server. The script lives on the remote server and runs there under my regular login (not root).

<pre>
#!/bin/sh

# Rearrange existing rsync directories
rm -rf /srv/backups/[user]/old.4
mv /srv/backups/[user]/old.3 /srv/backups/[user]/old.4
mv /srv/backups/[user]/old.2 /srv/backups/[user]/old.3
mv /srv/backups/[user]/old.1 /srv/backups/[user]/old.2
mv /srv/backups/[user]/current.0 /srv/backups/[user]/old.1

# Perform backup
rsync --progress -aH --delete \
    --link-dest=/srv/backups/[user]/old.1 [user]@[home server]:~/ \
    /srv/backups/[user]/current.0/
</pre>

First, I rotate the existing backup directories, deleting the oldest copy. Then, I use rsync to copy any changes from the home server to the remote server, linking to the previous copy of the backup wherever files remain the same. This way, only new or changed files take up additional space; all five backup directories share most of their data using hardlinks.
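
To see the hardlinking at work, a couple of quick checks can be run on the remote server (a sketch; the file path below is just a placeholder):

<pre>
# Identical files in consecutive snapshots should share an inode, so the
# hardlink count reported by stat is greater than 1 (the path is a placeholder).
stat -c '%i %h %n' /srv/backups/[user]/current.0/some-file /srv/backups/[user]/old.1/some-file

# du counts each hardlinked inode only once when the snapshots are listed in
# the same invocation, so the grand total should be far less than five times one snapshot.
du -shc /srv/backups/[user]/current.0 /srv/backups/[user]/old.*
</pre>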

==Verifying backups==

Occasionally, I want to verify that nothing fishy has happened between backups. To confirm that files which shouldn't have changed are truly the same, I use rsync's checksum option to compare the two locations:

<code>rsync -ricn [user]@[home server]:~/ /srv/backups/[user]/current.0/</code>

This takes longer than a normal sync, but it compares every single file's checksum (the "c") to verify the data has remained the same even if the file size and modified date haven't changed. The itemize option (the "i") produces a detailed list of what's changed, and the "n" tells it to do a dry run (so it's not actually updating the backup).

==Periodic backups to BD-R==

Both my home server and remote server are always connected to the Internet and electricity, and I realized I wanted a completely offline copy of my data that would be resistant to electromagnetic and power surge events. Since the cost of BD-R media has come down considerably (less than $25 for a 50-pack of single-layer discs), it seems like the perfect option for a write once read many (WORM) offline backup solution. When stored in a cool, dry place, a BD-R should outlast me; and if I create a new set of discs every few months, it doesn't even matter how long the oldest copies last. A stack of discs should also be easy to mail to a friend or family member for safekeeping, making this a great off-site option that can't be tampered with (without damaging the disc in some way).

While the kind of multi-location disaster that would be required for this to be necessary seems very unlikely to occur, I figured it was worth the <$25 investment (I already own a Blu-ray recorder) to try it out. Here's what I have so far.

===Generate a giant archive and split it into BD-sized files===

First, I take my current.0 folder and compress it into one big archive, splitting it into 22GiB chunks along the way. "Why 22GiB?" you may ask. I just felt like that was a good place to start since the actual space on a single-layer BD-R is not 25GiB but rather 25GB, which leaves you with a little under 23GiB after formatting and defect management are taken into account. If I ran into a situation where I would end up with the last disc only having a GiB of data on it or something, I might adjust the split size so it could all fit on n-1 discs.
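
To see whether that adjustment is worth doing, a rough estimate can be made before splitting (a sketch, not part of my actual workflow; it measures the uncompressed source, so the numbers err on the high side):

<pre>
# Rough estimate: how many discs a 22GiB split needs, how full the last part
# would be, and what size parts would have to be to drop one disc. This
# ignores whatever gzip saves, so treat it as an upper bound.
TOTAL=$(du -sb [user]/current.0 | cut -f1)        # total size in bytes
SPLIT=$((22 * 1024 * 1024 * 1024))                # current 22GiB split size
DISCS=$(( (TOTAL + SPLIT - 1) / SPLIT ))          # discs needed, rounded up
echo "At 22GiB per part: $DISCS discs, last part ~$(( (TOTAL - (DISCS - 1) * SPLIT) / 1048576 ))MiB"
[ "$DISCS" -gt 1 ] && echo "To fit on $((DISCS - 1)) discs instead, parts would need to be ~$(( TOTAL / (DISCS - 1) / 1048576 ))MiB each"
</pre>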

I run this in the main backups folder on the remote server (note: this will require your remote server to be less than half full):

<code>tar cvzf - [user]/current.0 | split -d -b 22G - "[user]-YYYY-MM-DD.tar.gz."</code>

This should create as many 22GiB files as are needed, compressing all the data using gzip so it takes up a bit less space than the original; those files will end in .00, .01, etc. For my first attempt at this, my original backup took up ~382GiB and the resulting split archive files took up ~351GiB. I have a lot of pictures and music files that are already compressed, so gzip just helps with documents and such. I could use "J" instead of "z" in the tar command to get more compression using xz, but in a smaller test where gzip took ~15 minutes, xz took over an hour and a half. Not worth it for me; the chance it would save an entire BD-R's worth of space is slim, and the cost of an extra disc is worth the time savings.
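
For anyone weighing the same tradeoff, timing both compressors on a representative subset before committing to the full run is an easy test (a sketch; the Documents subdirectory is just an assumed example):

<pre>
# Compare gzip and xz on a sample of the data; the subdirectory is only an
# assumed example of a representative subset.
time tar czf /tmp/sample.tar.gz [user]/current.0/Documents
time tar cJf /tmp/sample.tar.xz [user]/current.0/Documents
ls -lh /tmp/sample.tar.*
</pre>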

Once these split archive files have been generated, it's time to generate some checksums that I can later use to verify these files have remained unscathed. For that, I use sha256sum:

<code>sha256sum [user]-YYYY-MM-DD.tar.gz.* > [user]-YYYY-MM-DD.tar.gz-SHA256SUM</code>

I now have files that should fit on a series of BD-Rs, and a checksum file I can use to verify those files if I ever need to restore the backup.

===Create disc images and burn===

This part is as yet untested, but it makes sense to me and I've seen these steps in a couple of places on the Internet (like [https://wiki.gentoo.org/wiki/CD/DVD/BD_writing here] and [https://irishjesus.wordpress.com/2010/10/17/blu-ray-movie-authoring-in-linux/ here]).

First, I create an empty file the size of the writable space on a single-layer BD-R and format that image with mkudffs:

<code>mkudffs -l "YYYY-MM-DD_BACKUP-00" -b 2048 backup-00.udf 11826176</code>
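
A quick sanity check on that block count: 11826176 blocks of 2048 bytes comes out to roughly 22.56GiB, which leaves room for one 22GiB archive part plus the tiny checksum file while still fitting comfortably on a single-layer disc.

<code>echo $((11826176 * 2048))  # prints 24220008448 bytes, i.e. ~22.56GiB</code>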

Second, I mount it to a temporary location and copy the first part of the archive into the image along with the checksum file (every disc will have a copy of that):

<code>mount backup-00.udf /mnt/tmp/</code>

<code>cp [user]-YYYY-MM-DD.tar.gz.00 /mnt/tmp/</code>

<code>cp [user]-YYYY-MM-DD.tar.gz-SHA256SUM /mnt/tmp/</code>

Lastly, I unmount the image and burn it to the disc:

<code>umount /mnt/tmp/</code>

<code>growisofs -dvd-compat -Z /dev/sr0=backup-00.udf</code>

===Verifying, combining, and extracting the archive parts===

If I should ever need to restore the backup from the BD-Rs, I'll copy the files to a central location and verify their integrity using sha256sum:

<code>sha256sum -c [user]-YYYY-MM-DD.tar.gz-SHA256SUM</code>

Then, I can combine the individual pieces and extract the archive:

<code>cat [user]-YYYY-MM-DD.tar.gz.* | tar xf - -C /srv/backups/restored/</code>
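
If I just want to see what's in the archive without extracting everything, the same pipe works with tar's list mode (a minor variation on the command above):

<code>cat [user]-YYYY-MM-DD.tar.gz.* | tar tzf - | less</code>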

==Additional ideas==

===Creating UDF images and mounting them before downloading backup files from remote server===

The first time I tried backing up my data to BD-Rs (May 2020), I created the multi-part archive on the remote server and then downloaded those parts to an external drive at home. I then proceeded to create one UDF image at a time as I burned each BD-R. However, since the external drive I have is currently only USB 2.0 (I know, I know), copying the archive parts into the UDF images basically doubles how long it takes before I can start the burn process. The next time I attempt this, I think I'll try the following:

# Create as many UDF images as I need before downloading any backup files
# Mount those images to various mount points (/mnt/backup-disc01, etc.)
# Download the archive parts directly into those UDF mount points
# Unmount the images
#* This can be done as each part finishes downloading
# Burn the images
#* This can be done as each image is unmounted

I probably won't do this for another six months, so we'll see how it goes in November. I may try to script some of it, as it sounds very repetitive and tedious.
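
If I do script it, the image creation and mounting half might look something like this (an untested sketch; the disc count is a guess based on the ~351GiB first run, and the mounts need root):

<pre>
#!/bin/sh
# Untested sketch: pre-create and mount one UDF image per disc.
# The date label and disc count are placeholders to adjust per backup.
DATE=YYYY-MM-DD
DISCS=16
i=0
while [ "$i" -lt "$DISCS" ]; do
    NUM=$(printf '%02d' "$i")
    truncate -s $((11826176 * 2048)) "backup-${NUM}.udf"   # pre-create the image file (possibly redundant with newer mkudffs)
    mkudffs -l "${DATE}_BACKUP-${NUM}" -b 2048 "backup-${NUM}.udf" 11826176
    mkdir -p "/mnt/backup-disc${NUM}"
    mount "backup-${NUM}.udf" "/mnt/backup-disc${NUM}"
    i=$((i + 1))
done
</pre>

The unmount-and-burn half could be a similar loop over umount and growisofs once each download finishes.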

===Keep old SHA256SUM files===

This may be obvious to some folks, but I hadn't thought of it earlier on. I plan to retain the "*-SHA256SUM" files for old backups, even though I'll be deleting server-side copies of the archive parts (because they now live on BD-Rs and take up a lot more space than my current incremental backup). This way I not only have the checksums on the discs to verify the archive files, but I also have a copy of them on the server to verify the checksums themselves have not been corrupted on the discs. Since it's just a few tiny text files, there's no reason not to keep them around indefinitely.
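
As a concrete example of that cross-check (the disc mount point and the server-side directory here are just assumed locations), comparing the two copies of the checksum file is a one-liner:

<code>diff /mnt/tmp/[user]-YYYY-MM-DD.tar.gz-SHA256SUM /srv/backups/checksums/[user]-YYYY-MM-DD.tar.gz-SHA256SUM && echo "checksum file on the disc matches the server-side copy"</code>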