Salvage

Salvage distributes sensitive data to multiple people such that it can only be recovered by several people working together. This is useful for storing information with both a low risk of losing access to it and a low risk of accidental disclosure. A classic application is to create a “recovery kit” for a server or infrastructure, which can be used in the event that conventionally stored keys and credentials become lost or unavailable.

Salvage works by encrypting a file or directory with a random master key and then applying a simple key-splitting scheme to distribute the key across multiple shares. You can create a kit for any number of participants with any threshold required to recover the information. For example, you might create a kit for five people, any three of whom may combine their shares to recover the data.

Salvage runs under Python 2.7 or Python 3.2 and later. The only external dependency is gpg, for the cryptography. For maximum utility, it is packaged as a single flat Python script that can be run with no installation. The algorithms and file formats are simple and carefully documented to ensure that recovery is always possible even if this software is unavailable for some reason.

Installation

$ pip install salvage

This package installs only the salvage executable. It does not depend on any other Python packages.

Quick Start

To create a new salvage kit for five participants with a recovery threshold of three:

$ salvage new 5 3 path/to/source/dir

This will create five shares, each containing an encrypted archive and some metadata. To decrypt and unpack the archive:

$ salvage recover path/to/share1 path/to/share2 path/to/share3

The three paths must be three of the shares generated in the first step. The master key will be reconstructed and the data will be decrypted and unpacked.

See salvage -h for additional options.

Choosing Parameters

Salvage is designed to accomplish two somewhat competing goals: minimizing the risk of disclosing the data and minimizing the risk of losing it. Given:

  • \(n \equiv\) the total number of participants or shares.
  • \(t \equiv\) the number of shares required to recover the data (the threshold).
  • \(t' = n - t + 1 \equiv\) the number of shares that must be lost to lose the data.
  • \(p_d \equiv\) the chance of disclosing any given share.
  • \(p_l \equiv\) the chance of losing any given share.

We can calculate the chances of disclosure or loss of the original data as:

  • \(p_{disc} = 1 - (1 - p_d^{t})^{\binom{n}{t}}\)
  • \(p_{loss} = 1 - (1 - p_l^{t'})^{\binom{n}{t'}}\)

High values of \(t\) will give you a very low \(p_{disc}\), but \(p_{loss}\) could easily exceed \(p_l\) itself. Very low values of \(t\) will do the reverse. Unless you’re far more concerned with one risk than the other, \(t\) should typically be 40-60% of \(n\).
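The formulas above are easy to evaluate for a candidate pair of \(n\) and \(t\). The following is a minimal sketch and not part of salvage itself: the function names are hypothetical, and it needs Python 3.8 or later for math.comb. It also includes the binomial tail for comparison, which is the exact probability if shares are disclosed or lost independently; the formulas above treat every subset as independent and so are best read as estimates.

```python
from math import comb


def subset_estimate(p_share, size, n):
    """Estimate from the formulas above: 1 - (1 - p^size)^C(n, size)."""
    return 1 - (1 - p_share ** size) ** comb(n, size)


def exact_tail(p_share, size, n):
    """Probability that at least `size` of `n` independent shares are hit."""
    return sum(
        comb(n, k) * p_share ** k * (1 - p_share) ** (n - k)
        for k in range(size, n + 1)
    )


def kit_risks(n, t, p_d, p_l):
    """Return (p_disc, p_loss) for n shares with threshold t."""
    if not 1 <= t <= n:
        raise ValueError("threshold must be between 1 and n")
    t_loss = n - t + 1  # shares that must be lost to lose the data
    return subset_estimate(p_d, t, n), subset_estimate(p_l, t_loss, n)


if __name__ == "__main__":
    # Compare every threshold for a five-person kit, assuming a 10%
    # chance of disclosing or losing any single share.
    for t in range(1, 6):
        p_disc, p_loss = kit_risks(5, t, 0.10, 0.10)
        print("t=%d  p_disc=%.5f  p_loss=%.5f" % (t, p_disc, p_loss))
```

Running the loop for your own \(n\), \(p_d\), and \(p_l\) shows how the two risks trade off as \(t\) moves from 1 to \(n\).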

Practical Considerations

The risks of disclosure and loss can never be entirely eliminated, but there are several things that can be done to further reduce them.

Avoiding Disclosure

This is the easier one, as all of the usual rules apply. Each share of a salvage kit should be handled as if it were the raw data. Ideally, it will only exist on physical media and be stored like any other valuable and sensitive document. You can always apply extra protection to each share, such as encrypting it with the public key of the intended recipient.
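As one way to apply that extra protection, the sketch below wraps a share in gpg public-key encryption. It is an illustration and not something salvage does for you. It assumes the share directory has already been packed into a single file (share1.tar is a placeholder name), that the recipient's public key is already in your gpg keyring, and that gpg is on the PATH. The helper names are hypothetical.

```python
import subprocess


def gpg_encrypt_command(archive_path, recipient, output_path=None):
    """Build the gpg command line that encrypts one file to one recipient."""
    if output_path is None:
        output_path = archive_path + ".gpg"
    return [
        "gpg",
        "--output", output_path,
        "--encrypt",
        "--recipient", recipient,
        archive_path,
    ]


def encrypt_share(archive_path, recipient):
    """Run gpg; raises CalledProcessError if gpg exits with an error."""
    subprocess.check_call(gpg_encrypt_command(archive_path, recipient))


if __name__ == "__main__":
    # Placeholder file name and recipient; substitute your own.
    print(" ".join(gpg_encrypt_command("share1.tar", "alice@example.com")))
```

Keep in mind that a share wrapped this way can only be used if the recipient still has their private key, which adds one more thing that can be lost.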

Depending on your level of paranoia, you might also give some thought to how you prepare the kit. In order to create it, you need to have the original information assembled in the clear. If you’re doing this on a normal internet-connected machine, the data may be compromised before you’ve even protected it.

Consider using a clean air-gapped machine or booting from a read-only operating system such as Tails. You might also assemble the sensitive data in a RAM disk to avoid committing it to any persistent storage. Similarly, when a new kit is created, all of the pieces are stored together. Consider where these are being written and try to separate them as soon as possible.

Avoiding Loss

Salvage itself takes several steps to minimize the risk that a kit will become unrecoverable:

  • Every share in a salvage kit includes a full copy of the program that created it. It is not necessary to download or install any Python packages in order to run the main script.
  • Every share also includes a README with detailed instructions for using salvage.py. This includes instructions for OS X and Windows users who are not accustomed to running Python scripts.
  • The README in each share also includes detailed instructions for manually reconstructing the master key and decrypting the data, in case the Python script cannot be run for any reason.

Here are a few additional recommendations for minimizing the risk of ultimate data loss:

  • Store the data well. No digital media lasts forever, but do some research on the current state of the art. If burning to optical media, buy high-quality media designed for archiving. It’s also a good idea to print everything out on acid-free paper. Ink on paper lasts a long time and OCR scanners are easy to come by.
  • Refresh salvage kits periodically. Consider how long your storage media is expected to last and regenerate the kit well before that. This is also a good way to audit the previous kit and make sure none of the shares have gone missing.
  • Test the recovery process. You don’t necessarily need to do this with the real data. Create a sample recovery kit containing throwaway data and give it to the same people who hold the real thing. Make sure they can successfully reassemble the test kit without assistance. Add your own documentation if the standard README is not sufficient for your needs. (This mainly applies when your audience is not especially technical.)

Technical Details

This section gives a quick technical description of how salvage works. The cryptography involved is pretty trivial, so the bulk of the code is concerned with packaging and logistics. The following is the general procedure used to create a new salvage kit with \(n\) participants and a threshold of \(t\).

  1. The source data is archived, compressed, and encrypted with a random 128-bit key (rendered to a string for gpg). The same key is also used to compute an HMAC-SHA-256 of the unencrypted archive.

  2. For every unique set of \(t\) participants (of which there are \(\binom{n}{t}\)), \(t - 1\) random keys are generated. These are xored, byte by byte, with the master key to produce the final, \(t\)-th partial key, so that the \(t\) partial keys xor to the master key. This can be visualized as a partially filled table of key material, with one row for each \(t\)-sized subset of the \(n\) participants and one column for each participant \([0,n)\). The values in each row xor to the same master key.

  3. \(n\) directories are created, each representing one share. Each share gets its own identical copy of the encrypted archive, plus some metadata in a JSON file. The metadata includes:

    • A version.
    • A common UUID identifying the kit as a whole.
    • The index of that particular share.
    • The HMAC value.
    • The values of \(n\) and \(t\).
    • A table of key material.

    The key material is essentially one column of the full key table: all of the partial keys that belong to this share, each associated with the subgroup it is used in. In other words, it says “to combine shares 0, 1, and 2, use k1; else to combine shares 0, 1, and 3, use k2; ...”.

When \(t\) shares are brought together, one row of the key table can be fully reassembled, which means the master key can be recovered and the archive decrypted.
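The key-table construction and recovery described above can be sketched in a few lines. This is an illustration of the scheme as described here, not salvage's actual code: the function names and the in-memory table layout are assumptions, and it is written for Python 3.

```python
import itertools
import os

KEY_BYTES = 16  # 128-bit master key, as in step 1


def xor_bytes(a, b):
    """Xor two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(master, n, t):
    """Build the key table: {subset: {participant: partial_key}}."""
    table = {}
    for subset in itertools.combinations(range(n), t):
        # t - 1 random partial keys per row...
        partials = [os.urandom(len(master)) for _ in range(t - 1)]
        # ...and a final one chosen so that the whole row xors to master.
        last = master
        for partial in partials:
            last = xor_bytes(last, partial)
        partials.append(last)
        table[subset] = dict(zip(subset, partials))
    return table


def share_column(table, index):
    """The key material stored in one share: its column of the table."""
    return {
        subset: row[index] for subset, row in table.items() if index in subset
    }


def recover_key(columns):
    """Rebuild the master key from {share_index: column} for t shares."""
    subset = tuple(sorted(columns))
    partials = [columns[i][subset] for i in subset]
    key = partials[0]
    for partial in partials[1:]:
        key = xor_bytes(key, partial)
    return key


if __name__ == "__main__":
    master = os.urandom(KEY_BYTES)
    table = split_key(master, n=5, t=3)
    columns = {i: share_column(table, i) for i in (0, 2, 4)}
    print(recover_key(columns) == master)
```

The full table has \(\binom{n}{t}\) rows, and each share carries \(\binom{n-1}{t-1}\) partial keys, so the metadata grows quickly for large kits. Any group of fewer than \(t\) shares is missing at least one partial key in every row, which leaves the master key undetermined.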

Changes

v0.1.3 - 2015-10-18 - Documentation

  • Small documentation fix.

v0.1.1 - 2015-01-31 - recover -a

  • Added the -a argument to the recover command.

v0.1.0 - 2015-01-22 - Initial release

  • Initial release.

LICENSE

Copyright (c) 2015, Peter Sagerson. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.