Verify ZFS Backups Using zxfer and mtree

Even if you trust your backup process, the only way to make sure the archive contains what you expect is to perform an occasional test restore using a recent backup and compare it to the original source data. Here’s one way to do that on FreeBSD using zxfer to restore and mtree to compare.

Both zxfer and mtree can target the local machine, making it possible to restore the backup data and then verify it by moving a spare hard drive between the backup server and the data server.

However, it is also possible to perform both operations over the network using an intermediate host. To do this, add a temporary Raspberry Pi to the network with a spare hard drive attached, restore the data from the backup server to the Pi using zxfer over ssh, then run mtree on the Pi and on the original data server to compare the restored data to the original data. Afterwards, remove the Pi from the network, wipe the spare drive and everything goes back to the way it was.

Create a destination zpool to hold the restored data

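To create a temporary location to hold the restored data, attach a spare empty drive to the system, create a destination zpool named restore on it, then export the pool for the next step. On a drive with no existing partition table, gpart destroy reports Invalid argument, as it does below, and the error can be ignored.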

C:\Users\ccammack
λ ssh backup.ccammack.com
[...]

$ su
Password:

root@backup:/usr/home/ccammack # dmesg
[...]
da0 at umass-sim0 bus 0 scbus2 target 0 lun 0
da0: <JMicron Tech 0508> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number 000000000004
da0: 40.000MB/s transfers
da0: 3815447MB (7814037168 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>

root@backup:/usr/home/ccammack # gpart destroy -F da0
gpart: arg0 'da0': Invalid argument

root@backup:/usr/home/ccammack # gpart create -s gpt da0
da0 created

root@backup:/usr/home/ccammack # gpart add -a 1m -l restore -t freebsd-zfs "da0"
da0p1 added

root@backup:/usr/home/ccammack # gpart show -l da0
=>        40  7814037088  da0  GPT  (3.6T)
          40        2008       - free -  (1.0M)
        2048  7814033408    1  restore  (3.6T)
  7814035456        1672       - free -  (836K)

root@backup:/usr/home/ccammack # zpool create restore da0p1

root@backup:/usr/home/ccammack # zpool list restore
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
restore  3.62T   316K  3.62T        -         -     0%     0%  1.00x  ONLINE  -

root@backup:/usr/home/ccammack # zpool export restore
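
Because these commands destroy whatever is on the target disk, it can help to print them for review before running anything. The following is a minimal sketch and not part of the procedure above: plan_restore_pool is a hypothetical helper that only echoes the same commands for a given disk and pool name, and it assumes the single partition will show up as the disk name followed by p1.

```shell
#!/bin/sh
# plan_restore_pool DISK POOL (hypothetical helper)
# Print the commands that would turn DISK into a one-partition pool named POOL.
# Nothing is executed; review the output, then run the lines by hand as root.
plan_restore_pool() {
  disk=$1   # for example da0, as reported by dmesg
  pool=$2   # for example restore
  if [ -z "$disk" ] || [ -z "$pool" ]; then
    echo "usage: plan_restore_pool <disk> <pool>" >&2
    return 1
  fi
  cat <<EOF
gpart destroy -F ${disk}
gpart create -s gpt ${disk}
gpart add -a 1m -l ${pool} -t freebsd-zfs ${disk}
zpool create ${pool} ${disk}p1
zpool export ${pool}
EOF
}

# Print the plan for the drive used in this article.
plan_restore_pool da0 restore
```

Check the disk name against dmesg before running the printed lines, since gpart destroy -F does not ask for confirmation.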

Restore the data from the backup pool to the restore pool

There are a couple of ways to restore the backup data. If the restore pool can be mounted on the same machine as the backup pool, use zxfer to restore the data directly from drive to drive. Alternatively, the backup data can also be restored from one host to another over the LAN using zxfer and ssh.

Option 1: Restore the data from drive to drive on a single machine

Import the restore pool on the same machine as the backup pool and run zxfer as root to restore the data. Plug the destination drive directly into a SATA port if possible; external USB drive docks may be less reliable, although that is what I use here.

If the backup pool contains data from multiple hosts, use zfs list | grep to find the right dataset. In this case, I’m restoring the zroot pool from the data server, which is stored on the backup drive as the dataset backup/data/zroot and mounted at /backup/data/zroot/.

Include a trailing / on the source dataset when restoring with zxfer to make the output file hierarchy under /restore/ parallel the one under /backup/data/zroot/.

Export the restore pool when finished to prepare for the mtree step.

$ hostname
backup.ccammack.com

$ su
Password:

root@backup:/usr/home/ccammack # zpool import restore

root@backup:/usr/home/ccammack # zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
backup   4.16T  2.12T  2.04T        -         -     0%    50%  1.00x  ONLINE  -
restore  3.62T   736K  3.62T        -         -     0%     0%  1.00x  ONLINE  -

root@backup:/usr/home/ccammack # zfs list | grep 'backup/.*zroot[^/]'
backup/data/zroot                                    11.0G  1.67T   156K  /backup/data/zroot
[...]

root@backup:/usr/home/ccammack # sh -x /usr/local/sbin/zxfer -deFPv -R backup/data/zroot/ restore
[...]
+ exit 0

root@backup:/usr/home/ccammack # ls /backup/data/zroot/
ROOT            reserved        usr
iocage          tmp             var

root@backup:/usr/home/ccammack # ls /restore/
ROOT            reserved        usr
iocage          tmp             var

root@backup:/usr/home/ccammack # zpool export restore
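
Comparing the two ls listings by eye works for the top level, but the same check can be made for every dataset before the pool is exported. This is a rough sketch under the assumption that the pools are named as in this article; relative_names is a hypothetical helper that strips the leading pool or dataset prefix so the two lists can be compared directly.

```shell
#!/bin/sh
# relative_names PREFIX (hypothetical helper): read dataset names on stdin,
# strip the leading PREFIX, and print the sorted remainder. The dataset
# named exactly PREFIX is printed as "."; unrelated names are dropped.
relative_names() {
  prefix=$1
  while IFS= read -r name; do
    case "$name" in
      "$prefix")   printf '.\n' ;;
      "$prefix"/*) rel=${name#"$prefix"/}; printf '%s\n' "$rel" ;;
    esac
  done | LC_ALL=C sort
}

# Only query ZFS where it is installed, so the helper can be tried anywhere.
if command -v zfs >/dev/null 2>&1; then
  src=$(zfs list -rH -o name backup/data/zroot | relative_names backup/data/zroot)
  dst=$(zfs list -rH -o name restore | relative_names restore)
  if [ "$src" = "$dst" ]; then
    echo "dataset hierarchies match"
  else
    echo "dataset hierarchies differ" >&2
  fi
fi
```

This only confirms that the dataset names line up; the file contents are still checked with mtree later.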

While the restore runs, open a second terminal on the same machine and keep an eye on the progress by watching the restore pool’s ALLOC size grow as the data is restored. If using an external USB drive, also keep an eye on the restore pool’s HEALTH to make sure it remains ONLINE.

$ zpool list restore
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
restore  3.62T   736K  3.62T        -         -     0%     0%  1.00x  ONLINE  -

[...]

$ zpool list restore
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
restore  3.62T  1.77T  1.85T        -         -     0%    48%  1.00x  ONLINE  -

Option 2: Restore the data over the LAN

The data can also be restored over the LAN using zxfer and ssh.

It’s generally better to perform backups in pull mode, allowing the backup server to log into each client and collect the data, rather than granting the clients any access to the backup server. Similarly, when restoring the data, follow the same security policy; the backup server should log into the client and push the restored data to it rather than the reverse.

This example shows how to push the restored data from the backup server to a temporary Raspberry Pi 4 running FreeBSD with an attached USB drive dock. The FreeBSD 13.1 image for RPI includes default credentials, so it requires very little additional setup. The only change required on the backup server will be to create a new temporary key pair to access the Pi, which can then be deleted afterwards.

After installing FreeBSD on the Pi and plugging it into the LAN, log into the backup server, generate new temporary keys and transfer the public key to the Pi using the default user account freebsd and password freebsd.

C:\Users\ccammack
λ ssh backup.ccammack.com
[...]

$ ssh-keygen -f /tmp/zxfer_key -N ""
Generating public/private rsa key pair.
Your identification has been saved in /tmp/zxfer_key.
Your public key has been saved in /tmp/zxfer_key.pub.
[...]

$ ssh-copy-id -i /tmp/zxfer_key.pub freebsd@192.168.1.137
[...]
Password for freebsd@generic:

$ hostname
backup.ccammack.com

Log into the Pi using the temporary key as the freebsd user, which should now work without asking for a password. Switch to the root user using the default password root and append the authorized_keys from the freebsd account to the root account.

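Change the sshd configuration to allow the root user to log in using keys only, then restart sshd and exit to the backup server. The without-password value used below is the older spelling of prohibit-password, which current OpenSSH releases still accept.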

$ ssh -i /tmp/zxfer_key freebsd@192.168.1.137
[...]

freebsd@generic:~ % su
Password:

root@generic:/home/freebsd # mkdir -p /root/.ssh

root@generic:/home/freebsd # cat /home/freebsd/.ssh/authorized_keys >> /root/.ssh/authorized_keys

root@generic:/home/freebsd # grep PermitRootLogin /etc/ssh/sshd_config
#PermitRootLogin no
[...]

root@generic:/home/freebsd # sed -i .tmp 's/^#PermitRootLogin no.*$/PermitRootLogin without-password/g' /etc/ssh/sshd_config

root@generic:/home/freebsd # grep PermitRootLogin /etc/ssh/sshd_config
PermitRootLogin without-password
[...]

root@generic:/home/freebsd # service sshd restart
[...]
Starting sshd.

root@generic:/home/freebsd # exit
exit
freebsd@generic:~ % exit
logout
Connection to 192.168.1.137 closed.

$ hostname
backup.ccammack.com
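
The sed one-liner above only matches the exact commented-out default line. As an alternative, here is a small sketch of a more forgiving edit; set_root_login is a hypothetical helper, it writes the newer prohibit-password spelling, and it keeps a .bak copy of the file. It avoids sed -i so that it behaves the same with BSD and GNU sed.

```shell
#!/bin/sh
# set_root_login FILE (hypothetical helper): rewrite every PermitRootLogin
# line in FILE, commented out or not, to allow key-only root logins.
# The setting is appended if no such line exists. FILE.bak keeps the original.
set_root_login() {
  conf=$1
  tmp=$(mktemp /tmp/sshd_config.XXXXXX) || return 1
  sed 's/^#*[[:space:]]*PermitRootLogin[[:space:]].*$/PermitRootLogin prohibit-password/' \
    "$conf" > "$tmp" || return 1
  grep -q '^PermitRootLogin prohibit-password$' "$tmp" ||
    echo 'PermitRootLogin prohibit-password' >> "$tmp"
  # overwrite in place so the file keeps its owner and mode
  cp "$conf" "${conf}.bak" && cat "$tmp" > "$conf"
  rm -f "$tmp"
}

# Demonstrate on a scratch copy rather than the live /etc/ssh/sshd_config.
demo=$(mktemp /tmp/sshd_demo.XXXXXX)
printf '#PermitRootLogin no\n#PasswordAuthentication no\n' > "$demo"
set_root_login "$demo"
cat "$demo"
rm -f "$demo" "${demo}.bak"
```

On the Pi this would be called as set_root_login /etc/ssh/sshd_config, followed by sshd -t to check the syntax and service sshd restart.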

Log into the Pi as root to make sure it works without asking for a password. Install zxfer on the Pi, then import the destination restore pool and exit to the backup server.

$ hostname
backup.ccammack.com

$ ssh -i /tmp/zxfer_key root@192.168.1.137
[...]

root@generic:~ # pkg install -y zxfer
[...]

root@generic:~ # zpool import restore

root@generic:~ # zpool list restore
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
restore  3.62T   576K  3.62T        -         -     0%     0%  1.00x    ONLINE  -

root@generic:~ # exit
logout
Connection to 192.168.1.137 closed.

$ hostname
backup.ccammack.com
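
Before starting a transfer that may run for hours, it can be worth confirming from the backup server that the key-based root login, the zxfer install and the imported pool are all in place. The sketch below is one way to do that; preflight is a hypothetical helper name, and BatchMode=yes makes ssh fail instead of falling back to a password prompt.

```shell
#!/bin/sh
# preflight HOST KEY POOL (hypothetical helper): succeed only if root can log
# in to HOST with KEY without any prompt, zxfer is installed there, and POOL
# reports ONLINE.
preflight() {
  host=$1
  key=$2
  pool=$3
  health=$(ssh -o BatchMode=yes -i "$key" "root@${host}" \
    "command -v zxfer >/dev/null && zpool list -H -o health ${pool}") || {
      echo "preflight: ssh, zxfer or zpool check failed on ${host}" >&2
      return 1
    }
  if [ "$health" != "ONLINE" ]; then
    echo "preflight: pool ${pool} is ${health}, expected ONLINE" >&2
    return 1
  fi
  echo "ok: ${pool} on ${host} is ONLINE"
}

# Example, using the address and key from this article:
#   preflight 192.168.1.137 /tmp/zxfer_key restore
```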

On the backup server, su to root and use zxfer to push the backup data to the restore pool imported on the Pi.

Make sure the correct source dataset is chosen with zfs list | grep.

Include a trailing / on the source dataset when restoring with zxfer to make the output file hierarchy under /restore/ parallel the one under /backup/data/zroot/.

$ hostname
backup.ccammack.com

$ su
Password:

root@backup:/usr/home/ccammack # zfs list | grep 'backup/.*zroot[^/]'
backup/data/zroot                                    11.0G  1.67T   156K  /backup/data/zroot
[...]

root@backup:/usr/home/ccammack # sh -x /usr/local/sbin/zxfer -deFPv -T 'root@192.168.1.137 -i /tmp/zxfer_key' -R backup/data/zroot/ restore
[...]
+ exit 0

While the restore runs, open a console on the Pi and keep an eye on the progress by watching the restore pool’s ALLOC size grow as the data is restored. Also keep an eye on the restore pool’s HEALTH to make sure it remains ONLINE.

After the restore finishes, su to root on the Pi and export the restore pool to prepare for the mtree step.

C:\Users\ccammack
λ ssh freebsd@192.168.1.137
(freebsd@192.168.1.137) Password for freebsd@generic:
[...]

freebsd@generic:~ % hostname
generic

freebsd@generic:~ % zpool list restore
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
restore  3.62T   736K  3.62T        -         -     0%     0%  1.00x    ONLINE  -

[...]

freebsd@generic:~ % zpool list restore
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
restore  3.62T  1.77T  1.85T        -         -     0%    48%  1.00x    ONLINE  -

freebsd@generic:~ % su
Password:

root@generic:/home/freebsd # zpool export restore
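
Checking ALLOC and HEALTH by hand, as in both restore options, can also be left to a small loop on whichever machine holds the restore pool. This is a sketch only: report_pool and watch_pool are hypothetical helper names, and the 60 second default interval is arbitrary.

```shell
#!/bin/sh
# report_pool (hypothetical helper): read one tab-separated line of
# "zpool list -H -o name,allocated,health" output on stdin, print a short
# summary, and return non-zero unless the health column is ONLINE.
report_pool() {
  awk -F '\t' '
    { printf "%s alloc=%s health=%s\n", $1, $2, $3; ok = ($3 == "ONLINE") }
    END { exit(ok ? 0 : 1) }'
}

# watch_pool POOL [SECONDS]: print a summary every SECONDS (default 60)
# until the pool stops reporting ONLINE or can no longer be listed.
watch_pool() {
  pool=$1
  interval=${2:-60}
  while zpool list -H -o name,allocated,health "$pool" | report_pool; do
    sleep "$interval"
  done
  echo "pool ${pool} is not ONLINE or could not be listed" >&2
  return 1
}

# Example: watch_pool restore 30   (stop it with Ctrl-C once zxfer exits)
```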

Compare the restored data to the original data using mtree

The mtree utility dates back to 4.3BSD-Reno, and the version in current FreeBSD releases is derived from NetBSD's. Because it checks a file hierarchy against a recorded specification, it can also serve as a simple intrusion detection tool. It has since been ported to Linux, although it doesn’t seem to be very well known.

Unlike diff -qr, mtree reports detailed information about the differences between two file systems, and because its specification is plain text it can be piped over ssh, making it a good option for comparing file systems on separate hosts.

Option A: Compare file systems on the same machine

To compare the original and restored data on a single machine, attach the spare drive to the host containing the original data and import the restore pool. In this example, I have imported the restore pool on the data server and will use mtree to compare a specific dataset from both pools.

List the datasets in the restore pool and select one of them (restore/usr/home) for testing.

C:\Users\ccammack
λ ssh data.ccammack.com
[...]

$ su
Password:

root@data:/usr/home/ccammack # zpool import restore

root@data:/usr/home/ccammack # zfs list -r restore
NAME                                        USED  AVAIL  REFER  MOUNTPOINT
[...]
restore/usr/home                            312K  1.56T   240K  /restore/usr/home
[...]

Select the corresponding dataset on the data server’s zroot pool (zroot/usr/home) and note their MOUNTPOINT values (/restore/usr/home and /usr/home).

root@data:/usr/home/ccammack # zfs list -r restore/usr/home
NAME               USED  AVAIL  REFER  MOUNTPOINT
restore/usr/home   312K  1.56T   240K  /restore/usr/home

root@data:/usr/home/ccammack # zfs list -r zroot/usr/home
NAME             USED  AVAIL  REFER  MOUNTPOINT
zroot/usr/home  1.58M  1.42T   800K  /usr/home

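Make a small change to a local file in that dataset on the server and use mtree to compare the two file systems. The first mtree (-c) writes a specification of the restored tree to standard output, and the second mtree reads that specification from standard input and checks /usr/home against it. In this case, I added a single byte to the end of my .cshrc file and mtree reports the difference.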

root@data:/usr/home/ccammack # echo "" >> .cshrc

root@data:/usr/home/ccammack # mtree -c -p /restore/usr/home | mtree -p /usr/home
[...]
ccammack/.cshrc:
        size (1054, 1055)
        modification time (Fri Nov  1 01:32:34 2019, Mon May  9 13:56:28 2022)
[...]

Option B: Compare file systems over ssh

The mtree utility can also operate over ssh. To allow the data server to reach the intermediate Pi without a password, generate new temporary keys and copy the public key to the Pi for use by the freebsd user.

C:\Users\ccammack
λ ssh data.ccammack.com
[...]

$ ssh-keygen -f /tmp/mtree_key -N ""
Generating public/private rsa key pair.
Your identification has been saved in /tmp/mtree_key.
Your public key has been saved in /tmp/mtree_key.pub.
[...]

$ ssh-copy-id -i /tmp/mtree_key.pub freebsd@192.168.1.137
[...]
Password for freebsd@generic:

$ hostname
data.ccammack.com

Log into the Pi as the freebsd user using the temporary key, which should now work without asking for a password. Switch to the root user using the password root and append the authorized_keys from the freebsd account to the root account.

Change the sshd configuration to allow the root user to log in using keys only, then restart sshd and exit back to the data server.

$ hostname
data.ccammack.com

$ ssh -i /tmp/mtree_key freebsd@192.168.1.137
[...]

freebsd@generic:~ % su
Password:

root@generic:/home/freebsd # mkdir -p /root/.ssh

root@generic:/home/freebsd # cat /home/freebsd/.ssh/authorized_keys >> /root/.ssh/authorized_keys

root@generic:/home/freebsd # grep PermitRootLogin /etc/ssh/sshd_config
#PermitRootLogin no
[...]

root@generic:/home/freebsd # sed -i .tmp 's/^#PermitRootLogin no.*$/PermitRootLogin without-password/g' /etc/ssh/sshd_config

root@generic:/home/freebsd # grep PermitRootLogin /etc/ssh/sshd_config
PermitRootLogin without-password
[...]

root@generic:/home/freebsd # service sshd restart
[...]
Starting sshd.

root@generic:/home/freebsd # exit
exit
freebsd@generic:~ % exit
logout
Connection to 192.168.1.137 closed.

$ hostname
data.ccammack.com

Now that root can connect to the Pi using keys, list the datasets in the restore pool on the Pi, select one of them for testing (restore/usr/home) and note its MOUNTPOINT (/restore/usr/home).

$ hostname
data.ccammack.com

$ su
Password:

root@data:/usr/home/ccammack # ssh -i /tmp/mtree_key root@192.168.1.137 zfs list -r restore
NAME                                        USED  AVAIL  REFER  MOUNTPOINT
[...]
restore/usr/home                            312K  1.56T   240K  /restore/usr/home
[...]

Select the corresponding dataset on the data server (zroot/usr/home) and find its MOUNTPOINT (/usr/home).

root@data:/usr/home/ccammack # zfs list -r zroot/usr/home
NAME             USED  AVAIL  REFER  MOUNTPOINT
zroot/usr/home  1.58M  1.42T   800K  /usr/home

Make a small change to a local file in that dataset on the server. In this case, I added a single byte to the end of my .cshrc.

To compare the file systems between hosts, run mtree on the remote host over ssh and pipe its output to mtree running on the local host.

root@data:/usr/home/ccammack # echo "" >> .cshrc

root@data:/usr/home/ccammack # ssh -i /tmp/mtree_key root@192.168.1.137 mtree -c -p /restore/usr/home | mtree -p /usr/home
[...]
ccammack/.cshrc:
        size (1054, 1055)
        modification time (Fri Nov  1 01:32:34 2019, Mon May  9 13:56:28 2022)
[...]
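
On a large dataset the mtree report can run to many screens. As a rough aid, the sketch below reduces it to a list of the paths that changed. changed_paths is a hypothetical helper, and it assumes the report format shown above: an unindented path ending in a colon, followed by indented detail lines. Any other kind of line is ignored.

```shell
#!/bin/sh
# changed_paths (hypothetical helper): read an mtree comparison report on
# stdin and print only the paths that have differences, one per line.
changed_paths() {
  awk '{
    first = substr($0, 1, 1)
    # a changed entry starts in column one and ends with a colon
    if (first != " " && first != "\t" && $0 ~ /:$/) {
      sub(/:$/, "")
      print
    }
  }'
}

# Example, using the comparison from this section:
#   ssh -i /tmp/mtree_key root@192.168.1.137 mtree -c -p /restore/usr/home |
#     mtree -p /usr/home | changed_paths
```

Piping the result through wc -l gives a quick count of changed files.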

Automate the mtree comparisons

To verify the entire backup, write a script to iterate over the datasets in the restore pool and compare each of them to the corresponding dataset in the original zroot pool using mtree.

#!/bin/sh

# use mtree to generate a spec from the "restore" pool and
# pipe it to mtree running against the "zroot" pool to compare them

# uncomment the next line if the "restore" pool is on a remote system
#sshcmd="ssh -i /tmp/mtree_key root@192.168.1.137"
sshcmd=${sshcmd-""} # set sshcmd="" if not already assigned

# exit immediately if you're not root
if [ "$(id -u)" -ne 0 ]; then
  echo "Error: this must run as root" >&2
  exit 1
fi

# create a temp file to hold the "excluded" paths (log, spool)
exclude=$(${sshcmd} mktemp /tmp/verify-backups.XXXXXX)
${sshcmd} /bin/sh << EOF
  echo log   >> "${exclude}"
  echo spool >> "${exclude}"
EOF

# read datasets and mountpoints from the "restore" pool
lines=$(${sshcmd} zfs list -rpH restore | \
  awk 'BEGIN { ORS = "|"; OFS = ":" } { print $1, $5 }'
)

# generate mtree spec from "restore" pool and compare it to "zroot" pool
echo "${lines}" |
awk -v sshcmd="${sshcmd}" -v exclude="${exclude}" 'BEGIN { RS = "|"; FS = ":" } {
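  if ($1 != "" && $2 != "") {
    d1 = $1
    # map the dataset name, anchoring the match to the pool name
    sub(/^restore/, "zroot", d1)
    # look up the mountpoint of the original dataset; reset m1 first so a
    # failed lookup cannot reuse the value from the previous dataset
    m1 = ""
    cmd = "zfs get -pH -o value mountpoint \"" d1 "\""
    cmd | getline m1
    close(cmd)
    sub(/\n/, "", m1)
    # skip datasets that are missing or have no directory to compare
    if (substr(m1, 1, 1) != "/" || substr($2, 1, 1) != "/") next
    cmd = sshcmd
    cmd = cmd " mtree -c -j    -p \"" $2 "\" -X \"" exclude "\" |"
    cmd = cmd " mtree       -e -p \"" m1 "\""
    # test the return value so a read error (-1) cannot loop forever
    while ((cmd | getline line) > 0) print line
    close(cmd)
  }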
}'

# delete the temp "excluded" file
${sshcmd} rm "${exclude}"
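
Once the verification is done, the temporary key pairs created earlier can be deleted from the backup server and the data server. A minimal sketch follows; remove_temp_key is a hypothetical helper, and the /tmp guard is only there to avoid deleting a long-lived key by mistake.

```shell
#!/bin/sh
# remove_temp_key KEYFILE (hypothetical helper): delete a temporary private
# key and its .pub companion, refusing anything that is not under /tmp.
remove_temp_key() {
  key=$1
  case "$key" in
    /tmp/*) ;;
    *) echo "refusing to remove key outside /tmp: ${key}" >&2
       return 1 ;;
  esac
  rm -f -- "$key" "${key}.pub"
}

# On the backup server:  remove_temp_key /tmp/zxfer_key
# On the data server:    remove_temp_key /tmp/mtree_key
# The Pi's host key can also be dropped from known_hosts with:
#   ssh-keygen -R 192.168.1.137
```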