
Backups: Am I doing this right?

I'm in the process of setting up backups for my home server, and I feel like I'm swimming upstream. It makes me think I'm just taking the wrong approach.

I'm on a shoestring budget at the moment, so I won't really be able to implement a 3-2-1 strategy just yet. I figure the most bang for my buck right now is to set up off-site backups to a cloud provider. I first decided to do a full-system backup in the hopes I could just restore it and immediately be up and running again. I've seen a lot of comments saying this is the wrong approach, although I haven't seen anyone outline exactly why.

I then decided I would cherry-pick my backup locations instead. Then I started reading about backing up databases, and it seems you can't just back up the data directory (or the file, in the case of SQLite) and call it good. You need to dump the databases first and back up the dumps.
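
From what I can tell, that means something like this (my own rough sketch; the container names and paths are made up):

    # PostgreSQL in a container: dump to a plain SQL file on the host
    docker exec my-postgres pg_dump -U myuser mydb > /backups/mydb.sql

    # SQLite: use the online backup command rather than copying the live file
    sqlite3 /srv/app/data.db ".backup '/backups/data.db'"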

So, now I'm configuring a docker-db-backup container to back each of them up: finding every database container and SQLite database and configuring a backup job for each one. Then, I hope to drop all of those dumps into a single location and back that up to the cloud. This means that, if I need to rebuild, I'll have to restore the containers' volumes, pull down the dumps, bring up new containers, and then restore each dump into its new database. It's pretty far from my initial hope of being able to restore all the files and start using the newly restored system.

Am I going down the wrong path here, or is this just the best way to do it?

23 comments
  • I figure the most bang for my buck right now is to set up off-site backups to a cloud provider.

    Check out Borgbase. It's very cheap, and it's an actual backup solution, so it offers features you won't get from Google Drive or whatever else you were considering: deduplication, recovering data at different points in time, and encryption so there's no way for them to access your data.
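
    The whole workflow is only a few commands with borg (a sketch; the repo URL is a placeholder you'd get from Borgbase):

        REPO='ssh://xxxx@xxxx.repo.borgbase.com/./repo'    # placeholder from Borgbase
        borg init --encryption=repokey "$REPO"             # one-time: create an encrypted repo
        borg create --stats "$REPO::{hostname}-{now}" /srv/data   # deduplicated, encrypted snapshot
        borg prune --keep-daily 7 --keep-weekly 4 "$REPO"  # keep several points in time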

    I first decided to do a full-system backup in the hopes I could just restore it and immediately be up and running again. I've seen a lot of comments saying this is the wrong approach, although I haven't seen anyone outline exactly why.

    The vast majority of your system is the same as it would be on a fresh install, so you're wasting backup space storing data you can easily recover in other ways. You only need to store the changes you made to the system, e.g. which packages are installed (just save the list of packages, then run an install on them, as sketched below; no need to back up the binaries) and which config changes you made.

    Plus, if you're using docker for services (which you really should), the services too are very easy to recover. So if you back up the compose file and config folders for those services (and obviously the data itself), you can get back up in almost no time.

    Also, even if you do a full system backup, you would need to chroot into that system to install a bootloader, so it's not as straightforward as you think (unless your backup is a dd of the disk, which is a bad idea for many other reasons).
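
    For example, on a Debian-based system (a sketch; other distros have their own equivalents):

        # save the list of manually installed packages
        apt-mark showmanual > packages.txt

        # on the fresh install, put them all back
        xargs -a packages.txt sudo apt-get install -y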

    I then decided I would cherry-pick my backup locations instead. Then I started reading about backing up databases, and it seems you can't just back up the data directory (or the file, in the case of SQLite) and call it good. You need to dump the databases first and back up the dumps.

    Yes and no. You can back up the files directly, but it's not considered good practice. The reasoning is that if the file gets corrupted you lose all the data, whereas a dump of the database contents is much less likely to end up corrupted. But in actuality there's no reason why backing up the files themselves shouldn't work, as long as nothing is writing to them mid-copy (in fact, when you launch a docker container it's always an entirely new database process pointed at the same data folder).
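
    If you do go the file-copy route, just make sure nothing is writing during the copy, e.g. (sketch only; service name and paths are made up):

        docker compose stop myapp                  # quiesce writes
        cp /srv/myapp/data.db /backups/data.db     # file is now safe to copy
        docker compose start myapp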

    So, now I'm configuring a docker-db-backup container to back each of them up: finding every database container and SQLite database and configuring a backup job for each one. Then, I hope to drop all of those dumps into a single location and back that up to the cloud. This means that, if I need to rebuild, I'll have to restore the containers' volumes, pull down the dumps, bring up new containers, and then restore each dump into its new database. It's pretty far from my initial hope of being able to restore all the files and start using the newly restored system.

    Am I going down the wrong path here, or is this just the best way to do it?

    That seems like the safest approach. If you're concerned about it being too much work, I recommend writing a script to automate the process (see the sketch below), or, even better, an Ansible playbook.
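
    A minimal sketch of such a script, assuming Postgres containers (every name and path here is made up):

        #!/bin/sh
        # dump every database into the one folder that gets shipped off-site
        BACKUP_DIR=/backups/dumps
        mkdir -p "$BACKUP_DIR"

        for c in nextcloud-db wiki-db; do              # your database containers
            docker exec "$c" pg_dump -U postgres postgres \
                > "$BACKUP_DIR/$c-$(date +%F).sql"
        done

        # SQLite files get the online-backup treatment
        sqlite3 /srv/app/data.db ".backup '$BACKUP_DIR/app-$(date +%F).db'"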

  • Just remember: any backup is better than nothing. Even if the backup is done wrong (and this includes untested backups!), odds are you can read it and extract at least some data; it just may take a lot of time. Backups that are done right just mean that when (not if!) your computers break, you are quickly back up and running.

    There are several reasons to back up data only and not the full system. First, you may be unable to find a computer exactly like (or close enough to) the one that broke, and so the old system backup won't even run. Second, even if you can find an identical-enough system, do you want to? Maybe it is time to upgrade anyway: there are pros and cons to ARM (Raspberry Pi) vs x86 servers (there are other, more obscure options, but those are the main ones), and a forced rebuild is a chance to switch. Third, odds are some of the services need upgrading anyway, so you may as well use this forced downtime to apply the upgrades. Last, you may want to change how many servers you have: should you split services across different computers, or consolidate the services from the system that died onto some other server you already have?

    The only advantage of a full system backup is when they work they are the fastest way to get going again.

  • I first decided to do a full-system backup in the hopes I could just restore it and immediately be up and running again. I’ve seen a lot of comments saying this is the wrong approach, although I haven’t seen anyone outline exactly why.

    The main downside is the size of the backup, since you're backing up the entire OS with cache files, log files, other junk, and so on. Otherwise it's fine.

    Then I started reading about backing up databases, and it seems you can't just back up the data directory (or the file, in the case of SQLite) and call it good. You need to dump the databases first and back up the dumps.

    You can back up the data directory, that works fine for selfhosted stuff generally because we don't have tons of users writing to the database constantly.

    If you back up /var/lib/docker/volumes, your docker-compose.yaml files for each service, and any other bind mount directories you use in the compose files, then restoring is as easy as pulling all the data back to the new system and running docker compose up -d on each service.
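
    With restic, that could look like this (a sketch; the repo and paths are placeholders, and it assumes RESTIC_PASSWORD and your B2 credentials are set in the environment):

        # back up volumes, compose files, and bind mounts in one go
        restic -r b2:my-bucket:server backup \
            /var/lib/docker/volumes /opt/compose

        # disaster recovery: pull everything back, then relaunch each stack
        restic -r b2:my-bucket:server restore latest --target /
        cd /opt/compose/myservice && docker compose up -d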

    I highly recommend Backrest, which uses Restic for backups. It's very easy to configure and supports Healthchecks integration for easy notifications if backups fail for some reason.
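
    For reference, the Healthchecks integration boils down to a ping after a successful run; something like this (the URL is a placeholder for your check's UUID):

        restic -r b2:my-bucket:server backup /srv/data \
            && curl -fsS -m 10 https://hc-ping.com/your-uuid-here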

    • Seconding Restic and Backrest!

    • If that's the main downside to a full-system backup, I might go ahead and try it. I'll check out Backrest too. Looks great!

      • Yeah, there are plenty of advantages to a full-system backup, like not having to worry about whether you're backing up all the specific directories you need, and super-easy restores, since the whole bootable system is saved.

        Personally I do both: a full system backup to local storage using Proxmox Backup Server, and then a backup of just the really important stuff to Backblaze B2 using Restic.

  • Some things you should determine first:

    1. Total amount of data you will be backing up
    2. Frequency of backups
    3. Number of copies to keep

    Plug these numbers into cost calculators for whatever cloud service you're hoping to use, because this is honestly not going to be the cheapest way to store data off-site if there are egress charges like with S3.
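
    To give a rough worked example (the prices are assumptions; check current rates): 500 GB stored at something like $6/TB/month comes to about $3/month, but a single full restore of those 500 GB at $0.01/GB of egress would add another $5 on top.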

    I know Cloudflare's R2 service doesn't charge for ingress or egress (for now), but you might be able to find something even cheaper if you're only backing up certain types of data that can be easily compressed.

    I'd also investigate cheap ways to just store an off-site drive with your data: the office, a family member's house, a friend's house, etc. Storage devices are way cheaper than monthly cloud costs.

    • Had considered a device with some storage at a family member's house, but then I'd have to maintain that, fix it if it goes down, replace it if it breaks, etc. I think I'd prefer a small monthly fee for now, even if it may work out more expensive in the long run.

      Good call on the cost calculation. I'll take another look at those factors...

      • There's also the option of just leaving an offline disk at someone's place and visiting them regularly to update the backup.

        Having an entirely offline copy also protects against (or at least mitigates) a few additional hazards, like ransomware or a compromised server wiping every backup it can reach.
