
# babackup

Babackup is a tool providing secure, time-traveling backup via ssh and rsync.
It has the following goals:

- No mysteries: It uses standard tools (rsync, ssh), 
    and storage formats (the file system)
    so it's easy to inspect what's going on,
    verify it's happy, and recover from the unexpected.

    It also shows you what changes it makes to system settings.
    
- High security:  Remote access is limited, so compromise of either
    a client or the server has a limited blast radius.
    
    Compromise of a client can at most cost the current and
    immediately prior backup, but cannot spread to remote access of
    the server or the entire backup history.
    
    Compromise of the server cannot spread to remote access of
    clients.  However, the server has a clear-text backup of the data
    from the client, so the data may be disclosed.
    (Server-side encryption is a non-goal.)

    Both the client and server sides can run with or without
    privilege.  With privilege (as root), file ownership and permissions
    are preserved.
    
    For these reasons, both the client and the server can be
    considered "high security" machines, since compromise of either
    will not spread to the other.
    
- History:  One can walk back history, time-machine like.  History is
    adaptively aged so we remember a lot about the near past and some
    about the far past.

- Storage efficiency:  Backups are de-duplicated, so if nothing
    changes from day to day, the incremental storage cost is minimal.
    
    Old backups are kept, with several recent ones and fewer old ones.
    
- Relatively simple: It tries hard to automate all configuration,
    avoiding steps that might get skipped in manual setup.
    
    In automatic mode (the default), the only manual step is
    copying the `babackup_server` command to run on the server.

- Recoverability:  Everything fails, including backup systems.  It
  should be possible to recover from the failure of any one component
  (the client, the backup server's system disk, or its backup disk)
  from the other surviving components.

    
We expand on how we accomplish these goals in
["Technical Details"](#technical-details) below.


## Why use babackup?

It provides a secure, turn-key backup approach
that smooths over many rough edges.
    
## Alternatives

There are many, many backup alternatives.

Duplicity (https://duplicity.us) adds server-side encryption, but
gives up on simplicity, integrating directly with librsync.


# Using Babackup

## Normal Operation

On the client, run `babackup` periodically
(say, from cron).  It checks when the last backup ran, and runs a new
one if it's been more than a day.  The client pushes a new, incremental
backup to the server.

Cron entries should run as each user that has a babackup.
A typical client-side cron entry would be:

    5 * * * * /usr/bin/babackup
    
(This crontab entry will be installed for you when you run `babackup`
on the client in automatic mode.)
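
The "run a new backup only if it's been more than a day" check can be
sketched as follows.  This is a minimal illustration, not babackup's
actual code: the function name `backup_is_due` and the constant
`MIN_INTERVAL` are hypothetical, and how babackup records the time of
the last run is not shown here.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold; the text above says "more than a day".
MIN_INTERVAL = timedelta(days=1)


def backup_is_due(last_run: Optional[datetime], now: datetime,
                  min_interval: timedelta = MIN_INTERVAL) -> bool:
    """Return True if no backup has run yet, or the last one is too old."""
    if last_run is None:
        return True
    return now - last_run > min_interval


now = datetime(2024, 1, 20, 12, 5)
print(backup_is_due(None, now))                          # True
print(backup_is_due(datetime(2024, 1, 20, 3, 5), now))   # False
print(backup_is_due(datetime(2024, 1, 19, 11, 5), now))  # True
```

Because the check is cheap, running it hourly from cron (as in the
entry above) costs little, and a laptop that was asleep at the usual
time catches up at the next hourly run.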

Babackup reads its default client configuration from 
`~/.config/babackup/client.yaml` or `/etc/babackup/client.yaml`
(when run as root).

On the server, each user that manages backups should run `babackup_server`
frequently (say, every 10 or 15 minutes).  It will detect when a
new backup has completed and prepare for the next one.
It will also periodically (daily) look over old backups and age them
out.  (Server-side configurations are in `server.yaml`.)

Cron entries must run on the server as each user that manages backups.
A typical server-side cron entry is:

    1,16,31,46 * * * * /usr/sbin/babackup_server

(This crontab entry will be installed for you when you run
`babackup_server` on the server in automatic mode.)

As a help, `babackup` and `babackup_server` 
remind one about these when a new backup is created.


## New Backups

### New Backups: On the Client

To **start backing up a new partition**, run

    babackup --name=my_source --new-path=/client/backup/source/ --new-server=serveruser@serverhost:server-side/storage/location --new-mode=rrsync
    
on the client. 
This command will add the new backup path to the
client configuration, and print out a command (described below) 
to run on the server-side. 
It gives the backup a name (here `my_source`)
to track this pairing.

Client-side setup updates the configuration
files so future runs know what to do.
It also generates a new, passwordless ssh key
to use for this backup.

Paths follow rsync conventions, so a source path often ends in "/" to back up the
directory contents.  Alternatively, if you need to back up several
sources, use several `--new-path` entries (without trailing slashes),
and perhaps `--new-relative` to preserve the full path.  Server-side
paths are relative to the serveruser's home directory.  Both client
and server paths may be absolute.

Another useful option is `--new-filter="merge /path/to/list.filter"`
which reads ("merges in") filter rules from the given file.
One can also use exclusion lists with
`--new-exclude-from=/path/to/list.exclusion`,
which gives an rsync-style exclusion file of filenames to *not* copy.

Finally, remember that *you must run `babackup` frequently*
on the client, as described above in "Normal Operation".
Usually it runs via the automatically-installed cron job.


### New Backups: On the Server

After generating a new backup on the client side,
the client will show a command to run on the server to set it up there.
Cut-and-paste what it tells you to run on the server side
and run it there as serveruser.

On the server side, that command installs the ssh key with an rsync
limitation to the backup directory,
and updates server-side configuration files so it knows
to watch the new backup,
and it initializes the storage location.

(`babackup_server` must run for all modes, even when the data is local,
so that it can do aging of old backups.)

Finally, remember that *you must run `babackup_server` periodically*
on the server, as described above in "Normal Operation".
Usually it runs via the automatically-installed cron job.


### Modes of Use

There are several different "modes" to use babackup:

**Local**: copies between local disks.

**Rrsync**: copies to a remote machine over rsync and ssh, using
rrsync to limit remote access to the current and possibly prior
backup.  It generates a unique, per-backup ssh key and manages its
password.

**Ssh**: copies to a remote machine over rsync and ssh.  Unlike
rrsync mode, it does not use rrsync or manage ssh keys, so it places
security in the hands of the user.


## Rsync Options

Rsync has many options and subtleties in how it's used.
Fortunately it has a good manual page, and I encourage you to read it carefully.

Some things to watch out for:

- use of trailing / on paths (it gives you the contents of the path)

- use of --relative when you have multiple paths.

- use of --exclude-from and --filter.  Exclude-from takes a filename,
    but filter is a command.  Set filter to the command "merge path/to/filter" to
    read from a `path/to/filter` as a file.
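
To make these conventions concrete, here is a small sketch that
assembles an rsync command line.  The helper `build_rsync_args` is
hypothetical and only illustrates the rsync options named above
(`--relative`, `--exclude-from`, `--filter`); it does not reproduce
the exact command that babackup runs.

```python
from typing import List, Optional


def build_rsync_args(sources: List[str], dest: str, relative: bool = False,
                     exclude_from: Optional[str] = None,
                     filter_file: Optional[str] = None) -> List[str]:
    """Assemble an rsync argument list (illustration only)."""
    args = ["rsync", "--archive"]
    if relative:
        # Keep the full source path (for example etc/ and root/) under dest.
        args.append("--relative")
    if exclude_from:
        # --exclude-from takes a file name directly.
        args.append(f"--exclude-from={exclude_from}")
    if filter_file:
        # --filter takes a rule; the "merge" rule reads more rules from a file.
        args.append(f"--filter=merge {filter_file}")
    return args + sources + [dest]


# Trailing slash: copy the *contents* of /home/me into the destination.
print(build_rsync_args(["/home/me/"], "/mnt/externaldisk/backupplace"))

# Several sources, no trailing slashes, with --relative to keep path prefixes.
print(build_rsync_args(["/etc", "/root"], "/mnt/externaldisk/backupplace",
                       relative=True, filter_file="/path/to/list.filter"))
```

Passing the arguments as a list (rather than one shell string) avoids
quoting problems with the space inside the `merge` rule.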


## Examples

To back up *my home directory* to an external disk on the same machine:

    babackup --new=home_local --new-mode=local --new-path=/home/me/ --new-server=/mnt/externaldisk/backupplace

(Note the trailing slash on the home directory to copy its contents.)


To back up my home directory to a *remote server*:

    babackup --new=home_remote --new-mode=rrsync --new-path=/home/me/ --new-server=remote.example.com:/var/spool/backup/me

(Babackup sets up keys to do this securely.)


To back up some critical *system files from multiple places* (to a local disk):

    babackup --new=system_local --new-mode=local --new-relative --new-path=/etc --new-path=/usr/lib/mysql --new-path=/root --new-server=/mnt/externaldisk/backupplace

(Note use of "relative" mode, so the prefix of the paths appears in the backup,
and absence of a trailing slash on the source paths.)

# Recovery

The whole point of babackup is to recover from failures.

Primarily, we assume user files will be lost.  That's why you have
backups.

But *everything* fails, including your backup system,
so there are also provisions to handle loss of the backup server.

Keep in mind, the whole point of babackup is to run for years, so
we expect *every* part to fail.  Recovering from failure is why
babackup uses a regular file system and rsync: one can always go look
around in the file system and get things back with cp, if necessary.

Here we review how to recover from the failure of any component.
We consider three components:

- files on the client's computer

- the backup server's "system" disk, with `/etc` and `/home`

- the backup server's data disks, with the actual backup data

## Failure of the Client's Files

This case is the *expected* case--something happened to files we were
backing up (maybe an accidental rm) or their disk.
My condolences that you have to use it to recover from your loss,
but well done to be doing backups!

Currently there is *no* explicit support for file recovery.  The data
is sitting on the server, and the interested user must have access to
the backup server, log in to it, then go search the archives.
Running `babackup_server --status` will show paths to the archives
and which dates are present in the snapshot.

(Perhaps in the future there will be support for a separate key that
allows remote-rsync-over-ssh for file recovery.)
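
Because the archive is just directories on the server, searching it
for old versions of a file needs only a few lines of script.  The
sketch below assumes one subdirectory per snapshot directly under an
archive root; that layout and the helper name `find_versions` are
assumptions for illustration, so check the paths reported by
`babackup_server --status` before relying on it.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def find_versions(archive_root: Path, relative_path: str) -> List[Tuple[str, Path]]:
    """List (snapshot_name, full_path) for each snapshot holding relative_path."""
    found = []
    for snapshot in sorted(p for p in archive_root.iterdir() if p.is_dir()):
        candidate = snapshot / relative_path
        if candidate.exists():
            found.append((snapshot.name, candidate))
    return found


# Demonstration on a throwaway directory tree (made-up snapshot names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for day in ("2024-01-14", "2024-01-15", "2024-01-16"):
        (root / day).mkdir()
    (root / "2024-01-14" / "notes.txt").write_text("first draft")
    (root / "2024-01-16" / "notes.txt").write_text("final")
    for name, path in find_versions(root, "notes.txt"):
        print(name, path.read_text())
```

Once the right snapshot is found, the file can be copied back with
cp or rsync.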

## Failure of the Server's System Disk

If your backup server loses its system disk (with `/etc` for backups
owned by root, or with `/home` for backups owned by users),
but not the disk with backup data,
then you need to recreate the server metadata (where are backups kept)
on the server,
but you should be happy that your backup data still exists.

Fortunately, a copy of all the metadata began at the client, and was
saved there.  So please run, on each client computer and as each
client user:

    babackup --reconfigure-old
    
It will print out the commands to run on the backup server
to recreate the metadata, just like you did when you first started the
backup.

If you make sure the backup data is in the same place on the server as
it was before, then the server should reattach your precious historical
data to the newly recreated metadata.

## Failure of the Server's Data Disk

If your backup server loses a *data* disk,
but still has its system disk (with `/etc` for backups
owned by root, or with `/home` for backups owned by users),
then my condolences.  You are currently without any backups.

The best way to fix this problem is: start afresh.
Spin up a new data disk and put it in the same place as the old data
disk.
The data disk needs a bit of scaffolding in place to work correctly.
To recover this scaffolding from the metadata already on the server,
run, on the server as each server-side user:

    babackup_server --reconfigure-old
    
Expect that your first backup to the new disk will take a while,
but you should be back in business.

## Protecting Your Server With RAID

One option to make your server more robust is to use RAID disk
mirroring or level-5 or -6 to duplicate the file system's data.
(We cover another option below.)

If you know about RAID, you're all set.  RAID does a great job,
provided you watch for disk failures and swap new disks into the RAID
if one of the old ones fails.  (Although I have lost data on a RAID-5
array due to a double disk failure, thus far I have never lost data on
a RAID-6 array, knock on wood.)

## Protecting Your Server With a Babackup "Secondary"

An alternative to RAID is to use a secondary data disk on the server.
(Secondary disks have been available since babackup-1.38 in 2025-07.)

(Although RAID is great, it often forces all RAID disks to be the
same size, which can be inconvenient.  And I'm nervous about RAIDing
between external USB disks since it's easy for them to accidentally go
offline.)

A secondary data disk follows the babackup philosophy of "simple,
user-level" approaches.  Data is kept not on *one* disk on the server,
but on *two* disks (or more).  The client backs up to the server just
like before, but the server duplicates each backup from the primary
backup to each secondary backup when the backup is "committed".
(Specifically: when the server polls the backup, typically every 10 or 15
minutes.)  Since this primary-to-secondary backup is between disks on
the same server, it should run fairly quickly.

As an added bonus, because babackup knows about the primary and
secondaries, it attempts to make the archives cover diverse dates.
All disks will keep the same daily backups, but when backups start
thinning out, it tries to keep different dates on different backups.

Assuming your secondary is on a different physical disk, it has all
the same structure as the primary backup.  Should you need to recover
files, you may do so from either the primary or any secondary.

If your primary backup disk fails, you can replace it and start a new
backup stream on the fresh new disk, but your secondary copy is still
there allowing one to do time travel.

Alternatively, if the primary fails, you can point the primary to the
secondary.  Currently this requires manual editing of `server.yaml`
(in `~/.config/babackup` or `/etc/babackup` for data backed up to
root).  Move the line from `secondary` to `server_path` (note that
`secondary` is an array and `server_path` is not).
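
The edit amounts to moving one element out of the `secondary` list
and into the scalar `server_path`.  The sketch below shows that
transformation on a plain Python dictionary.  Only the two key names
come from the text above; the helper `promote_secondary`, the example
paths, and the idea that one backup's settings form a flat mapping are
assumptions, so treat this as an illustration of the hand edit rather
than a tool to run against a real `server.yaml`.

```python
from typing import Any, Dict


def promote_secondary(entry: Dict[str, Any], index: int = 0) -> Dict[str, Any]:
    """Return a copy of entry with one secondary promoted to server_path."""
    secondaries = list(entry.get("secondary", []))
    if not secondaries:
        raise ValueError("no secondary to promote")
    updated = dict(entry)
    # server_path is a single string; secondary stays a (shorter) list.
    updated["server_path"] = secondaries.pop(index)
    updated["secondary"] = secondaries
    return updated


# Hypothetical settings for one backup.
before = {"server_path": "/mnt/primary/me",
          "secondary": ["/mnt/second/me", "/mnt/third/me"]}
print(promote_secondary(before))
```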


# Technical Details

Some technical details about how we meet our goals.

## Security

For security, all client-to-server remote access is done over ssh, 
protecting from eavesdropping and limiting remote access.

The server-side uses rrsync to limit remote access to writing to
the backup directory.  Therefore the client has no remote shell access
to the server, and the server has no access to the client.
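
As an illustration of this kind of restriction, the sketch below
formats an `authorized_keys` line that forces every login with a given
key through rrsync in write-only mode.  The helper name
`rrsync_authorized_key`, the rrsync location `/usr/bin/rrsync`, and
the example key and directory are assumptions; the line that babackup
actually installs may use different options.

```python
import shlex


def rrsync_authorized_key(public_key: str, backup_dir: str,
                          rrsync_path: str = "/usr/bin/rrsync") -> str:
    """Build an authorized_keys line limiting a key to write-only rrsync."""
    if '"' in backup_dir or '"' in rrsync_path:
        raise ValueError("double quotes are not supported in this sketch")
    # -wo: rrsync's write-only mode, confined to backup_dir.
    forced = f"{rrsync_path} -wo {shlex.quote(backup_dir)}"
    # "restrict" (OpenSSH 7.2 and later) disables ptys, port forwarding,
    # agent forwarding, and X11 forwarding for this key.
    return f'command="{forced}",restrict {public_key.strip()}'


# Placeholder key material, not a real key.
print(rrsync_authorized_key("ssh-ed25519 AAAAexample backup-key",
                            "/var/spool/backup/me"))
```

With a line like this, the key cannot start a shell on the server: any
connection made with it runs rrsync and nothing else.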

This model means the client can use passwordless-ssh keys without
concern that an attacker can gain lateral movement between hosts.

However, as an additional layer, we generate and save passwords
for our keys.  This protects the keys on-disk from a naive attacker,
since it requires knowledge of babackup to extract a key's password.
While this layer protects against script kiddies,
rrsync provides stronger technical protection.

Specifically: an attacker who compromises a client machine can corrupt
the current and a recent backup.  They can run a compromised backup
and fill up the server's disk.  They CANNOT eliminate
archived backups (those older than the current backup).  However, if they run
compromised backups daily, they will slowly age out old good backups.
(But slowly, because of our aging policy.)

Data restoration (that is: server-to-client data movement) is handled
out-of-band.  The client and the server users may be different people,
and we assume they can communicate out of band to do data recovery.

(Alternatively, a second ssh key could allow client-to-server
read-only access to data archive, to allow data recovery without
intervention from the server operator.)

## Data storage: simplicity, history, and efficiency

For simplicity and history of data storage, backups just go to the
file system.  Each day is in a separate date-stamped directory tree
(like Plan 9 backups, or Apple Time Machine).

For history, we age out old backups.  We age by day, week, month, and
year.  Typically we keep the N most recent backups for each tier, with
backups at each tier at least that duration apart.  Thus a fully
populated archive of daily backups will have up to 40 backups:  the 10
most recent days, then one per week for 10 weeks, then one per month
for 10 months, and finally one per year for 10 years.
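
One way to read this policy is sketched below.  It is a simplified
interpretation, not babackup's aging algorithm: it picks which of a
set of existing backup dates to keep in a single pass, and the names
`TIERS` and `select_backups_to_keep` and the 30-day month and 365-day
year are assumptions.

```python
from datetime import date, timedelta
from typing import Iterable, List, Sequence

# Minimum spacing per tier: day, week, month (approx.), year (approx.).
TIERS = (timedelta(days=1), timedelta(days=7),
         timedelta(days=30), timedelta(days=365))


def select_backups_to_keep(backup_dates: Iterable[date], per_tier: int = 10,
                           tiers: Sequence[timedelta] = TIERS) -> List[date]:
    """Walk from newest to oldest, keeping per_tier backups in each tier."""
    kept: List[date] = []
    tier_index = 0
    kept_in_tier = 0
    for d in sorted(set(backup_dates), reverse=True):
        if tier_index >= len(tiers):
            break  # every tier is full; anything older is aged out
        # Keep d only if it is far enough from the last backup we kept.
        if not kept or kept[-1] - d >= tiers[tier_index]:
            kept.append(d)
            kept_in_tier += 1
            if kept_in_tier == per_tier:
                tier_index += 1
                kept_in_tier = 0
    return kept


today = date(2024, 3, 1)
daily = [today - timedelta(days=n) for n in range(4100)]  # ~11 years of dailies
kept = select_backups_to_keep(daily)
print(len(kept))  # 40
```

With a short history the function simply keeps everything; the tiers
only start discarding backups once there are more than ten days of
them.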

For efficiency, backups use hardlinks for unchanged files, so they are
de-duplicated (an unchanged file takes nearly no extra space from day
to day).  (Hard links are managed by the rsync --link-dest option.)
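
The effect of hard-linked snapshots can be seen with a few lines of
Python.  This sketch imitates what `--link-dest` achieves, using
`os.link` directly on a flat directory; it is not how rsync or
babackup implement it, and the helper `make_snapshot` and the file
names are made up for the demonstration.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def make_snapshot(previous: Path, current: Path, changed: Dict[str, bytes]) -> None:
    """Create current from previous: link unchanged files, write changed ones."""
    current.mkdir()
    for old in previous.iterdir():
        if old.is_file() and old.name not in changed:
            # A hard link is a second name for the same inode: no new data blocks.
            os.link(old, current / old.name)
    for name, data in changed.items():
        (current / name).write_bytes(data)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    day1 = root / "2024-01-14"
    day1.mkdir()
    (day1 / "notes.txt").write_bytes(b"unchanged")
    (day1 / "todo.txt").write_bytes(b"v1")

    day2 = root / "2024-01-15"
    make_snapshot(day1, day2, {"todo.txt": b"v2"})

    # The unchanged file is shared; the changed file is a separate copy.
    print((day1 / "notes.txt").samefile(day2 / "notes.txt"))
    print((day1 / "todo.txt").samefile(day2 / "todo.txt"))
    print((day2 / "notes.txt").stat().st_nlink)
```

Each snapshot directory still looks like a complete copy, which is
what makes recovery with plain cp possible, and deleting an old
snapshot frees only the data that no other snapshot links to.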

## Alternatives

In a slightly better world we would encrypt backups before leaving the
client, allowing the server to be untrusted.  This approach was not
chosen because it requires a lot more client-side infrastructure, making it
impossible to use unmodified rsync.


# History

## Authorship

Babackup was written by John Heidemann <johnh@isi.edu>.

## Release History

- 1.0 2024-01-14 First public release.

- 1.1 2024-01-15 Improve handling of --new-mode={rrsync,ssh}

- 1.2 2024-01-16 Correct handling of archive cleanup and shake out bugs.

- 1.3 2024-01-17 Ssh key now has a passphrase, and archive is now daily/etc.

- 1.4 2024-01-17 Giving ssh a passphrase requires sshpass.

- 1.5 2024-01-19 One can now sync a list of paths with multiple --new-path.

- 1.6 2024-01-20 Add --new-mode=local and check for crontab entries.

- 1.7 2024-01-20 Fix some critical typos in babackup_server.

- 1.8 2024-01-22 Add --new-conditional-v4router and --new-conditional-v6router to constrain when backups run.

- 1.9 2024-01-23 Fix bugs in conditionals.

- 1.10 2024-01-23 More robust handling of starting new users.

- 1.11 2024-01-24 Fix typo.

- 1.12 2024-01-24 rmtree cleanup.

- 1.13 2024-01-24 Fix bug in moving to archive (incorrect whitespace!).

- 1.14 2024-01-24 A test suite to avoid typo releases, and check for host key on startup.

- 1.15 2024-01-24 The test suite now handles paths to the tools.

- 1.16 2024-01-24 Ironing out packaging interactions with the test suite.

- 1.17 2024-01-24 Backwards compatibility to python-3.6 for RHEL8.

- 1.18 2024-01-24 Finally fix deleting files from dirs that are a-w, and more python-3.6 support.

- 1.19 2024-01-25 Verify python-3.6 support.

- 1.20 2024-01-26 Fix the aging algorithm.

- 1.21 2024-01-28 Add --automatic to fully automate setup (crontab, ssh known-host).

- 1.22 2024-01-30 --new-filter is now an option (in addition to --new-exclude-from); removed --delete from rsync (to work around an rsync bug).

- 1.23 2024-01-31 `babackup_server` now avoids running concurrently on the same backup; babackup has --new-relative option to preserve full paths (rsync "relative" mode).

- 1.24 2024-02-08 --force now overrides router conditionals in `babackup`, and fix bug about cannot create "/data" on multiple backups

- 1.25 2024-02-15 `babackup_server --status` now summarizes the archive.

- 1.26 2024-02-29 Concurrency checking is now more robust, and `babackup_server --reconfigure-old` will reinstall the server-side-config from what's in server.yaml

- 1.27 2024-11-10 A critical bugfix to handle long-running backups.

    - New backups now default to relative if multiple paths are given.
    - NEW: `babackup --reconfigure-old` will display server-side config from what's in client.yaml (on the client).
    - BUG FIX: `babackup_server --new-mode` now throws an error if the name already exists.
    - BUG FIX: stricter checking for dead sentinel file now handles backups longer than 24h
    
- 1.28 2024-11-10 Release engineering bugfix from 1.27.

- 1.29 2024-11-12 Critical typo bugfix from 1.27.

- 1.30 2024-11-13 Fix where root files go.

    - BUG FIX: root transient files now go in /var, not /etc.

- 1.31 2025-03-10 Fix missing import causing fails with stale lock.

    - BUG FIX: babackup would fail on "psutil" when run with a stale lock file.
      Thanks to Nathaniel Heidemann for the bug report and fix.
      
- 1.32 2025-03-10 Fix release engineering for F42's unified bin and sbin.

- 1.33 2025-04-25 Even more engineering for F42's unified bin and sbin.

- 1.34 2025-06-25 `babackup_server --reconfigure-old` now recovers the backup directories.

- 1.35 2025-06-26 `babackup_server --reconfigure-old` improvements.

- 1.36 2025-06-27 Fix a bug in the test suite when run against beta versions of python.

- 1.37 2025-06-28 Version testing was in many places.

- 1.38 2025-07-05 Servers can now have secondary backups that are
  cloned off the primary, and added documentation about recovery.

- 1.39 2025-07-10 Bug fixes, mostly to secondary backups:
    - BUG FIX: new authorized_keys are now created with correct permissions.
    - BUG FIX: secondary backups now detect and avoid concurrent attempts.
    - BUG FIX: secondary backups now do hard linking correctly (critical!).

## Ancient History

A first version of these ideas (rsync and a link-dest tree, inspired
by a post on the Internet pointing out the power of link-dest) was
deployed in 1999 by John Heidemann to back up his laptop.  This
version was mostly shell scripts (with more shell scripts for
configuration files), but most configuration was done by hand.
(Inspired by Ken Thompson, "the experienced user will usually know
what to configure".)

Around 2015 Yuri adapted these scripts for ANT lab use.

In 2024 I rewrote the shell scripts in python to take over
configuration, and to enable an rrsync-protected mode.  Babackup is a
response to an increased security threat of lateral movement via
passwordless ssh keys, the need to backup a semi-trusted laptop, and
the need to have high security clients and backup servers that are
still outside the "compromise" blast radius.


# INSTALLATION

Sigh, it doesn't currently use any of the standard python tools.

To install:

    make install
    
or

    make install DESTDIR=/usr/local
    
Also 

    make test
    
# KNOWN BUGS

It doesn't work with python-3.6 due to missing fromisoformat.
It uses a work-around for python from 3.7 to 3.10
that doesn't handle all possible ISO-like dates,
although this limitation is only a problem with old, non-babackup archives.
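
For readers curious what such a work-around can look like, here is a
generic sketch.  It is not babackup's code: the helper
`parse_iso_like` and the list `FALLBACK_FORMATS` are hypothetical.  It
tries `datetime.fromisoformat` first (added in python-3.7 and made
more permissive in python-3.11), then falls back to a few explicit
`strptime` formats.

```python
from datetime import datetime

# Hypothetical fallback formats for ISO-like timestamps.
FALLBACK_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%Y%m%dT%H%M%S",
)


def parse_iso_like(text: str) -> datetime:
    """Parse an ISO-like timestamp, tolerating older Python versions."""
    try:
        return datetime.fromisoformat(text)
    except (AttributeError, ValueError):
        # AttributeError: python-3.6 has no fromisoformat at all.
        # ValueError: python-3.7 to 3.10 reject some valid ISO forms.
        pass
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


print(parse_iso_like("2024-01-14"))
print(parse_iso_like("20240114T030500"))
```

Any format missing from the fallback list is still rejected, which
matches the limitation described above.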


# COPYRIGHT

Babackup is
copyright (C) 2024-2025 by John Heidemann <johnh@isi.edu>
under terms of the GPLv2.
Full license terms are in the source code.

    
    
