lxd-snapper

LXD snapshots, automated.

lxd-snapper automates creating & removing LXD snapshots - just prepare a snapshotting policy, set up a cronjob, and enjoy your containers.

tl;dr: it's a fancy wrapper for lxc snapshot & lxc delete - like LXD's built-in snapshots.schedule, but more powerful.

Requirements

  • LXD 4 / 5 / 6
  • Linux (x86_64)

Plus, if you plan on building lxd-snapper locally:

  • Cargo & Rust 1.78.0
  • Nix (for running integration tests)

Getting started

Downloading

You can either download a pre-built binary:

# x86_64
$ wget https://github.com/Patryk27/lxd-snapper/releases/download/v1.4.0/lxd-snapper-linux64 -O lxd-snapper
$ chmod u+x lxd-snapper

... or build lxd-snapper on your own:

$ git clone https://github.com/Patryk27/lxd-snapper
$ cd lxd-snapper

# Using Cargo
$ cargo build --release
$ ./target/release/lxd-snapper

# (or) Using Nix
$ nix build
$ ./result/bin/lxd-snapper

Configuring

Setting up lxd-snapper is easy: you just need to prepare a configuration file that describes which LXD instances (containers and/or virtual machines) you want snapshotted and how long those snapshots should be kept around.

We can start with the most basic configuration:

policies:
  my-first-policy:
    keep-last: 2

... which defines a single policy called my-first-policy that will snapshot all of your instances, keeping the latest two snapshots of each.

To check how it works, let's go ahead and create some containers:

$ lxc launch ubuntu: hello
$ lxc launch ubuntu: world
# (the container's OS doesn't matter - Ubuntu is just an example)

$ lxc ls
+-------+---------+------+------+-----------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------+---------+------+------+-----------+-----------+
| hello | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+
| world | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+

Now, to snapshot those containers, we first have to save the configuration above into a file - say, config.yaml - and then run lxd-snapper backup:

$ lxd-snapper --dry-run -c config.yaml backup
(--dry-run is active, no changes will be applied)

hello
  - creating snapshot: auto-20221105-130019 [ OK ]

world
  - creating snapshot: auto-20221105-130019 [ OK ]

Summary
-------
  processed instances: 2
  created snapshots: 2

As you can see, there's a detailed output of everything that happened - or rather, of everything that would happen: the --dry-run switch tells lxd-snapper to only preview the changes, without actually creating or removing any snapshots.

We can confirm that nothing's changed by re-running lxc ls and seeing that we've still got zero snapshots:

$ lxc ls
+-------+---------+------+------+-----------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------+---------+------+------+-----------+-----------+
| hello | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+
| world | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+

--dry-run is useful after you've made some changes to the configuration and want to confirm that everything works as intended - since that's the case here, we can now re-run lxd-snapper backup without --dry-run:

$ lxd-snapper -c config.yaml backup

/* ... */

Summary
-------
  processed instances: 2
  created snapshots: 2

... and voilà:

$ lxc ls
+-------+---------+------+------+-----------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------+---------+------+------+-----------+-----------+
| hello | RUNNING | ...  | ...  | CONTAINER | 1         |
+-------+---------+------+------+-----------+-----------+
| world | RUNNING | ...  | ...  | CONTAINER | 1         |
+-------+---------+------+------+-----------+-----------+

Our policy says keep-last: 2, so let's go ahead and run lxd-snapper backup twice more, to go over that limit:

$ lxd-snapper -c config.yaml backup
$ lxd-snapper -c config.yaml backup

$ lxc ls
+-------+---------+------+------+-----------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------+---------+------+------+-----------+-----------+
| hello | RUNNING | ...  | ...  | CONTAINER | 3         |
+-------+---------+------+------+-----------+-----------+
| world | RUNNING | ...  | ...  | CONTAINER | 3         |
+-------+---------+------+------+-----------+-----------+

Now we've got three snapshots per container - why not two? Because, as a safety measure, the backup command only ever creates snapshots - it never deletes them.

To remove stale snapshots, we have to run prune:

$ lxd-snapper --dry-run -c config.yaml prune
(--dry-run is active, no changes will be applied)

hello
  - keeping snapshot: auto-20221105-130214
  - keeping snapshot: auto-20221105-130213
  - deleting snapshot: auto-20221105-130157 [ OK ]

world
  - keeping snapshot: auto-20221105-130214
  - keeping snapshot: auto-20221105-130213
  - deleting snapshot: auto-20221105-130157 [ OK ]

Summary
-------
  processed instances: 2
  deleted snapshots: 2
  kept snapshots: 4

As before, we started with --dry-run to see if everything looks alright - and since it does, it's time to kick those stale snapshots out of our filesystem for good:

$ lxd-snapper -c config.yaml prune

/* ... */

Summary
-------
  processed instances: 2
  deleted snapshots: 2
  kept snapshots: 4

$ lxc ls
+-------+---------+------+------+-----------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------+---------+------+------+-----------+-----------+
| hello | RUNNING | ...  | ...  | CONTAINER | 2         |
+-------+---------+------+------+-----------+-----------+
| world | RUNNING | ...  | ...  | CONTAINER | 2         |
+-------+---------+------+------+-----------+-----------+

Re-running prune will now do nothing, since all of the containers have the correct number of snapshots:

$ lxd-snapper -c config.yaml prune
hello
  - keeping snapshot: auto-20221105-130214
  - keeping snapshot: auto-20221105-130213

world
  - keeping snapshot: auto-20221105-130214
  - keeping snapshot: auto-20221105-130213

Summary
-------
  processed instances: 2
  deleted snapshots: 0
  kept snapshots: 4

(there's also a command called backup-and-prune that runs backup and prune one after another, which is what you'll usually want to do.)
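
For instance, a typical everyday invocation boils down to a single command (using the same config.yaml as above):

$ lxd-snapper -c config.yaml backup-and-prune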

And that's basically it - that's how lxd-snapper works; now let's see what makes it unique!

Filtering instances

By default, lxd-snapper snapshots all of the instances it can find on the local machine - you can change that with the various included-* and excluded-* options:

policies:
  # Matches all instances inside the `important-client` project and keeps the
  # last 20 snapshots for each of them:
  a:
    included-projects: ['important-client']
    keep-last: 20
    
  # Matches all instances _outside_ the `important-client` project and keeps the
  # last 5 snapshots for each of them:
  b:
    excluded-projects: ['important-client']
    keep-last: 5
    
  # Matches all instances named `important-container` (across all projects) and
  # keeps the last 20 snapshots for each of them:
  c:
    included-instances: ['important-container']
    keep-last: 20
    
  # Matches all instances _not_ named `important-container` (across all
  # projects) and keeps the last 5 snapshots for each of them:
  d:
    excluded-instances: ['important-container']
    keep-last: 5
    
  # Matches all instances that are running at the time of performing `backup` /
  # `prune`.
  #
  # Possible values: Aborting, Running, Ready, Starting, Stopped, and Stopping.
  e:
    included-statuses: ['Running']
    
  # Matches all instances that are _not_ running at the time of performing
  # `backup` / `prune`.
  f:
    excluded-statuses: ['Running']
    
  # Matches all instances named `php` or `nginx` that belong to project
  # `client-a` or `client-b`.
  #
  # For an instance to match this policy, it has to match all `included-*`
  # rules, so e.g.:
  #
  # - an instance named `php` for `client-c` will be skipped, since `client-c`
  #   doesn't match `included-projects`,
  #
  # - an instance named `nextcloud` for `client-a` will be skipped, since
  #   `nextcloud` doesn't match `included-instances`.
  #
  # In SQL, this would be:
  #
  # SELECT *
  #   FROM instances
  #  WHERE (project = "client-a" OR project = "client-b")
  #    AND (name = "php" OR name = "nginx")
  #    AND (status = "Running")
  g:
    included-projects: ['client-a', 'client-b']
    included-instances: ['php', 'nginx']
    included-statuses: ['Running']
 
  # Similarly as above (notice the reversed operator for `excluded-*`):
  #
  # SELECT *
  #   FROM instances
  #  WHERE (project = "client-a" OR project = "client-b")
  #    AND (name != "php" AND name != "nginx")
  h:
    included-projects: ['client-a', 'client-b']
    excluded-instances: ['php', 'nginx']

Retention strategies

lxd-snapper supports Borg-style retention strategies; each policy must specify at least one keep-* option that says how long its snapshots should be kept around.

The most straightforward setting is keep-last - e.g.:

policies:
  my-policy:
    keep-last: 5

... would keep the five newest snapshots for each container.

(i.e. if you ran backup-and-prune once a day, that would effectively keep five days' worth of snapshots around)

Being versatile, lxd-snapper also supports keep-hourly, keep-daily etc., allowing you to create fancier policies such as:

policies:
  my-policy:
    keep-hourly: 6
    keep-daily: 5
    keep-weekly: 4
    keep-monthly: 3
    keep-yearly: 2

This would keep snapshots from the 6 latest hours + the 5 latest days + the 4 latest weeks + the 3 latest months + the 2 latest years = 20 snapshots per instance.

Or, rephrasing:

  • we'd have a snapshot per each past hour, up to 6 of them (e.g. 15:00, 14:00, 13:00, 12:00, 11:00 & 10:00),
  • we'd have a snapshot per each past day, up to 5 of them (e.g. today, yesterday, the day before yesterday, 3 days ago & 4 days ago),
  • we'd have a snapshot per each past week, up to 4 of them (e.g. this week, the past week, two weeks ago & three weeks ago),
  • et cetera, et cetera.

This system takes a while to get used to, but it's also extremely versatile; you can find more examples inside the docs/example-configs directory and inside Borg's documentation.

Of course, you don't have to get fancy - keep-last should get the job done most of the time.

Cascading

Say, you're using LXD projects and you've got a few containers:

$ lxc ls --project client-a
+-------+---------+------+------+-----------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------+---------+------+------+-----------+-----------+
| mysql | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+
| php   | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+

$ lxc ls --project client-b
+-------+---------+------+------+-----------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------+---------+------+------+-----------+-----------+
| mysql | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+
| php   | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+

$ lxc ls --project client-c
+-------+---------+------+------+-----------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------+---------+------+------+-----------+-----------+
| mysql | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+
| php   | RUNNING | ...  | ...  | CONTAINER | 0         |
+-------+---------+------+------+-----------+-----------+

And, for the sake of argument, let's say that you want to create the following configuration:

  • all mysql containers should keep their 5 latest snapshots,
  • all php containers should keep their 2 latest snapshots,
  • ... except for client-c, which is important and should keep its 10 latest snapshots.

That's what cascading is for - when multiple policies match a single container:

policies:
  # Matches: client-a/mysql, client-b/mysql, client-c/mysql
  all-mysqls:
    included-instances: ['mysql']
    keep-last: 5
    
  # Matches: client-a/php, client-b/php, client-c/php
  all-phps:
    included-instances: ['php']
    keep-last: 2
    
  # Matches: client-c/mysql, client-c/php
  important-clients:
    included-projects: ['client-c']
    keep-last: 10

... lxd-snapper will combine them, top to bottom, into a single policy - separately for each container.

In practice, this means that when several policies match the same instance, the ones defined lower in the file take priority over the ones above them: important-clients is below all-mysqls and all-phps, so its keep-last wins for client-c/mysql and client-c/php.

This merging happens on a per-retention-strategy basis, so if we had:

policies:
  # Matches: client-a/mysql, client-b/mysql, client-c/mysql
  all-mysqls:
    included-instances: ['mysql']
    keep-daily: 2
    
  # Matches: client-a/php, client-b/php, client-c/php
  all-phps:
    included-instances: ['php']
    keep-hourly: 8
    
  # Matches: client-c/mysql, client-c/php
  important-clients:
    included-projects: ['client-c']
    keep-last: 20

... then our effective configuration would be:

client-a/mysql + client-b/mysql
  keep-daily = 2
  
client-a/php + client-b/php
  keep-hourly = 8

client-c/mysql
  keep-daily = 2
  keep-last = 20
  (= 22 snapshots)

client-c/php
  keep-hourly = 8
  keep-last = 20
  (= 28 snapshots)

Another possible use case for this feature is creating a global "catch all" policy and then defining exceptions to it:

policies:
  all:
    keep-last: 10
    
  storages:
    included-instances: ['nextcloud', 'minio']
    keep-last: 20

This would keep 10 snapshots for all of the containers, with the exception of nextcloud and minio, which would keep 20 snapshots each.

Hooks

Hooks are small shell commands executed when lxd-snapper performs a certain action; you can configure them by creating a hooks: section inside the configuration:

hooks:
  on-backup-started: 'echo "on-backup-started" >> /tmp/log.txt'
  on-snapshot-created: 'echo "on-snapshot-created: {{ remoteName }}, {{ projectName }}, {{ instanceName }}, {{snapshotName}}" >> /tmp/log.txt'
  on-instance-backed-up: 'echo "on-instance-backed-up: {{ remoteName }}, {{ projectName }}, {{ instanceName }}" >> /tmp/log.txt'
  on-backup-completed: 'echo "on-backup-completed" >> /tmp/log.txt'

  on-prune-started: 'echo "on-prune-started" >> /tmp/log.txt'
  on-snapshot-deleted: 'echo "on-snapshot-deleted: {{ remoteName }}, {{ projectName }}, {{ instanceName }}, {{ snapshotName }}" >> /tmp/log.txt'
  on-instance-pruned: 'echo "on-instance-pruned: {{ remoteName }}, {{ projectName }}, {{ instanceName }}" >> /tmp/log.txt'
  on-prune-completed: 'echo "on-prune-completed" >> /tmp/log.txt'

policies:
  # ...

They come in handy e.g. for synchronizing snapshots to external storage:

hooks:
  on-snapshot-created: 'zfs send ... | ssh zfs recv ...'
  on-snapshot-deleted: 'zfs send ... | ssh zfs recv ...'

policies:
  # ...

Most of the hooks support variable interpolation - placeholders that lxd-snapper replaces with concrete values before the hook is run:

  • on-snapshot-created has {{ remoteName }}, {{ projectName }}, {{ instanceName }} and {{ snapshotName }},
  • on-instance-backed-up has {{ remoteName }}, {{ projectName }} and {{ instanceName }},
  • on-snapshot-deleted has {{ remoteName }}, {{ projectName }}, {{ instanceName }} and {{ snapshotName }},
  • on-instance-pruned has {{ remoteName }}, {{ projectName }} and {{ instanceName }}.

... where:

  • {{ remoteName }} corresponds to NAME as visible in lxc remote ls (local by default),
  • {{ projectName }} corresponds to NAME as visible in lxc project ls (default by default),
  • {{ instanceName }} corresponds to NAME as visible in lxc ls,
  • {{ snapshotName }} corresponds to NAME as visible in lxc info instance-name.
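
For instance, combining the list above with the hello container from the walkthrough (and assuming the default local remote and the default project), a hook like:

hooks:
  on-snapshot-created: 'echo "{{ remoteName }}/{{ projectName }}/{{ instanceName }}/{{ snapshotName }}" >> /tmp/log.txt'

... would end up appending a line such as local/default/hello/auto-20221105-130019 to /tmp/log.txt.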

Caveats & Tips:

  • hooks are skipped during --dry-run,

  • you can provide at most one script per hook (e.g. you can't have on-backup-started defined twice),

  • you don't have to provide scripts for hooks you're not interested in (e.g. specifying just on-backup-started is alright),

  • hooks are run only from inside lxd-snapper (e.g. on-snapshot-created will not be run for a manual lxc snapshot performed from the command line),

  • hooks are launched as soon as the event happens and block lxd-snapper until the hook completes - e.g.

    hooks:
      on-snapshot-created: 'sleep 10'

    ... will delay creating each snapshot by 10 seconds; if that's problematic for your use case, you might want to buffer the changes like so (a sketch of such a sync script follows after this list):

    hooks:
      on-backup-started: 'rm /tmp/created-snapshots.txt'
      on-snapshot-created: 'echo "{{ instanceName }},{{ snapshotName }}" >> /tmp/created-snapshots.txt'
      on-backup-completed: './sync-snapshots.sh /tmp/created-snapshots.txt'
  • when a hook returns a non-zero exit code, it will be treated as an error,

  • a hook's stdout and stderr are not displayed unless the hook returns a non-zero exit code (they will then be visible in the error message),

  • variables can be written {{likeThat}} or {{ likeThat }}, whichever way you prefer.
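
As for the buffering example above: sync-snapshots.sh is not something lxd-snapper ships - it stands for your own transfer logic. A minimal sketch of what such a script could look like (the actual transfer command is left as a comment, since it depends on your storage setup):

#!/usr/bin/env bash
# Hypothetical sync-snapshots.sh - reads the "instance,snapshot" pairs buffered
# by the on-snapshot-created hook and processes them in one go.
set -euo pipefail

list="${1:-/tmp/created-snapshots.txt}"

while IFS=',' read -r instance snapshot; do
  echo "syncing ${instance}/${snapshot}"
  # ... your transfer command goes here (e.g. `zfs send | ssh ... zfs recv`) ...
done < "${list}"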

Remotes

By default, lxd-snapper sees containers & virtual machines only from the local LXD instance (i.e. the ones you see when you run lxc ls).

If you're using LXD remotes and you'd like lxd-snapper to snapshot them too, you have to provide their names in the configuration file:

remotes:
  - server-a
  - server-b
  - server-c

If you'd like to snapshot both the local LXD and the remote ones, include a remote called local:

remotes:
  - local
  - server-a
  - server-b
  - server-c

(those labels correspond to NAME as visible in lxc remote ls)
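
The remotes themselves are managed by LXD, not by lxd-snapper - if a server isn't registered yet, you'd first add it through LXD; a hypothetical example (the name and address are made up):

$ lxc remote add server-a 192.168.1.10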

By default, each policy will match all of the specified remotes - if you want to narrow that down, you can use included-remotes and excluded-remotes:

remotes:
  - unimportant-server-A
  - unimportant-server-B
  - important-server-A

policies:
  all-servers:
    keep-last: 10
  
  important-servers:
    included-remotes: ['important-server-A']
    keep-last: 25 

If you're going for a centralized backup solution, you can pair this feature with hooks to pull the newly-created snapshots onto your coordinator machine:

hooks:
  on-instance-backed-up: 'lxc copy --refresh {{ remoteName }}:{{ instanceName }} {{ instanceName }}'
  on-instance-pruned: 'lxc copy --refresh {{ remoteName }}:{{ instanceName }} {{ instanceName }}'

remotes:
  - server-A
  - server-B
  - server-C

policies:
  all-servers:
    keep-last: 10

Scheduling

Finally, lxd-snapper is a fire-and-forget application - it doesn't daemonize itself; to keep instances backed up & pruned on schedule, you will want to create a systemd timer or a cronjob for it:

5 * * * * /usr/bin/lxd-snapper -c /etc/lxd-snapper.yaml backup-and-prune
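
If you prefer systemd over cron, an equivalent oneshot service + timer pair could look roughly like this (a sketch - the unit names are just examples, and the binary & config paths mirror the cron line above):

# /etc/systemd/system/lxd-snapper.service
[Unit]
Description=Create and prune LXD snapshots

[Service]
Type=oneshot
ExecStart=/usr/bin/lxd-snapper -c /etc/lxd-snapper.yaml backup-and-prune

# /etc/systemd/system/lxd-snapper.timer
[Unit]
Description=Run lxd-snapper every hour

[Timer]
OnCalendar=*-*-* *:05:00
Persistent=true

[Install]
WantedBy=timers.target

... which you'd then activate with systemctl enable --now lxd-snapper.timer.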

Configuration syntax reference

# (optional, defaults to 'auto-')
#
# Prefix used to distinguish between snapshots created by lxd-snapper and 
# everything else (e.g. a manual `lxc snapshot`).
#
# `lxd-snapper backup` will create snapshots with this prefix and
# `lxd-snapper prune` will only ever remove snapshots that match this prefix.
snapshot-name-prefix: '...'

# (optional, defaults to '%Y%m%d-%H%M%S')
#
# Formatting string used to build the rest of the snapshot name.
# 
# Format:
# https://docs.rs/chrono/0.4.22/chrono/format/strftime/index.html
snapshot-name-format: '...'
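#
# For example, with the two defaults above ('auto-' + '%Y%m%d-%H%M%S'), a
# snapshot created on 2022-11-05 at 13:00:19 gets named `auto-20221105-130019`,
# matching the names seen in the walkthrough earlier.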

# (optional, defaults to '10m')
#
# Timeout for each call to lxc; prevents lxd-snapper from running forever if lxc
# happens to hang.
#
# If you've got (very) slow storage, you might want to increase this limit,
# but the default should be enough for a typical setup.
#
# Format:
# https://docs.rs/humantime/latest/humantime/
# (e.g. '30s', '5m', '1h' etc.)
lxc-timeout: '...'

# (optional)
hooks:
  on-backup-started: '...'
  on-instance-backed-up: '...'
  on-snapshot-created: '...'
  on-backup-completed: '...'
  
  on-prune-started: '...'
  on-snapshot-deleted: '...'
  on-instance-pruned: '...'
  on-prune-completed: '...'

# (optional, defaults to `local`)
remotes:
  - local
  - server-A
  - server-B

# (at least one required)
policies:
  policy-name:
    included-remotes: ['...', '...']
    excluded-remotes: ['...', '...']
    included-projects: ['...', '...']
    excluded-projects: ['...', '...']
    included-instances: ['...', '...']
    excluded-instances: ['...', '...']
    included-statuses: ['...', '...']
    excluded-statuses: ['...', '...']

    keep-hourly: 1
    keep-daily: 1
    keep-weekly: 1
    keep-monthly: 1
    keep-yearly: 1
    keep-last: 1
    keep-limit: 1

Contributing

Merge requests are very much welcome! :-)

lxd-snapper is a pretty standard Rust project, so cargo & rustc should be enough to get you going.

There are also end-to-end tests written using the NixOS Testing Framework; you can run them with nix flake check -j4.

Disclaimer

Snapshots are not a replacement for backups - to keep your data safe, use snapshots and backups together, wisely.

License

Copyright (c) 2019 Patryk Wychowaniec pwychowaniec@pm.me.
Licensed under the MIT license.