Salvage distributes sensitive data to multiple people such that it can only be recovered by several people working together. This is useful for storing information with both a low risk of losing access to it and a low risk of accidental disclosure. A classic application is to create a “recovery kit” for a server or infrastructure, which can be used in the event that conventionally stored keys and credentials become lost or unavailable.
Salvage works by encrypting a file or directory with a random master key and then applying a simple key-splitting scheme to distribute the key across multiple shares. You can create a kit for any number of participants with any threshold required to recover the information. For example, you might create a kit for five people, any three of whom may combine their shares to recover the data.
Salvage runs under Python 2.7 or Python 3.2 and later. The only external dependency is gpg, for the cryptography. For maximum utility, it is packaged as a single flat Python script that can be run with no installation. The algorithms and file formats are simple and carefully documented to ensure that recovery is always possible even if this software is unavailable for some reason.
$ pip install salvage
This package will only install the salvage executable. It does not depend on any Python packages.
To create a new salvage kit for five participants with a recovery threshold of three:
% salvage new 5 3 path/to/source/dir
This will create five shares, each containing an encrypted archive and some metadata. To decrypt and unpack the archive:
% salvage recover path/to/share1 path/to/share2 path/to/share3
The three paths must be three of the shares generated in the first step. The master key will be reconstructed and the data will be decrypted and unpacked.
See salvage -h for additional options.
Salvage is designed to accomplish two somewhat competing goals: minimizing both the risk of disclosing some data and the risk of losing it. Given:
- \(n \equiv\) the total number of participants or shares.
- \(t \equiv\) the number of shares required to recover the data (the threshold).
- \(t' = n - t + 1 \equiv\) the number of shares that must be lost to lose the data.
- \(p_d \equiv\) the chance of disclosing any given share.
- \(p_l \equiv\) the chance of losing any given share.
We can calculate the chances of disclosure or loss of the original data as:
- \(p_{disc} = 1 - (1 - p_d^t)^{\binom{n}{t}}\)
- \(p_{loss} = 1 - (1 - p_l^{t'})^{\binom{n}{t'}}\)
High values of \(t\) will give you a very low \(p_{disc}\), but \(p_{loss}\) could easily exceed \(p_l\) itself. Very low values of \(t\) will do the reverse. Unless you’re far more concerned with one over the other, \(t\) should typically be 40-60% of \(n\).
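To get a feel for the trade-off, the two formulas above can be evaluated directly. The following is a minimal sketch, not part of salvage itself; the function name `kit_risks` is made up for illustration, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb


def kit_risks(n, t, p_d, p_l):
    """Return (p_disc, p_loss) for n shares with threshold t.

    p_d and p_l are the per-share chances of disclosure and loss.
    This simply evaluates the two formulas given in the text.
    """
    t_loss = n - t + 1  # t' in the text: shares that must be lost
    p_disc = 1 - (1 - p_d ** t) ** comb(n, t)
    p_loss = 1 - (1 - p_l ** t_loss) ** comb(n, t_loss)
    return p_disc, p_loss


if __name__ == "__main__":
    # Compare a mid-range threshold with the extreme t = n.
    for t in (3, 5):
        p_disc, p_loss = kit_risks(5, t, 0.1, 0.1)
        print("t=%d  p_disc=%.6f  p_loss=%.6f" % (t, p_disc, p_loss))
```

With \(n = 5\), \(t = 5\) and \(p_l = 0.1\), the computed \(p_{loss}\) is about 0.41, which illustrates the point above that a high threshold can push the chance of loss well beyond \(p_l\) itself.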
The risks of disclosure and loss can never be entirely eliminated, but there are several things that can be done to further reduce them.
Reducing the risk of disclosure is the easier of the two, as all of the usual rules apply. Each share of a salvage kit should be handled as if it were the raw data. Ideally, it will only exist on physical media and be stored like any other valuable and sensitive document. You can always apply extra protection to each share, such as encrypting it with the public key of the intended recipient.
Depending on your level of paranoia, you might also give some thought to how you prepare the kit. In order to create it, you need to have the original information assembled in the clear. If you’re doing this on a normal internet-connected machine, the data may be compromised before you’ve even protected it.
Consider using a clean air-gapped machine or booting from a read-only operating system such as Tails. You might also assemble the sensitive data in a RAM disk to avoid committing it to any persistent storage. Similarly, when a new kit is created, all of the pieces are stored together. Consider where these are being written and try to separate them as soon as possible.
Salvage itself takes several steps to minimize the risk that a kit will become unrecoverable:
Here are a few additional recommendations for minimizing the risk of ultimate data loss:
This section has a quick technical description of how salvage works. The cryptography involved is pretty trivial, so the bulk of the code is concerned with packaging and logistics. Following is the general procedure used to create a new salvage kit with \(n\) participants and a threshold of \(t\).
The source data is archived, compressed, and encrypted with a random 128-bit key (rendered to a string for gpg). We also use the key to generate an HMAC-SHA-256 of the unencrypted archive.
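The key generation and integrity check in this step might look roughly like the sketch below. It is an illustration under assumptions, not salvage's actual code: the hex rendering of the key, the use of that string as the HMAC key, and the function names are all hypothetical, and the gpg encryption itself is left out.

```python
import binascii
import hashlib
import hmac
import os


def new_master_key():
    """Generate a random 128-bit key and render it as a string.

    Hex encoding is an assumption made for this sketch; the text only
    says the key is rendered to a string for gpg.
    """
    raw = os.urandom(16)  # 16 bytes = 128 bits
    return binascii.hexlify(raw).decode("ascii")


def archive_hmac(key_string, archive_bytes):
    """Return the hex HMAC-SHA-256 of the unencrypted archive."""
    mac = hmac.new(key_string.encode("ascii"), archive_bytes, hashlib.sha256)
    return mac.hexdigest()


if __name__ == "__main__":
    key = new_master_key()
    print(archive_hmac(key, b"example archive contents"))
```

Storing such a digest alongside the archive lets a recovery tool confirm that the reconstructed key and the decrypted archive are the right ones.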
For every unique set of \(t\) participants (of which there are \(\binom{n}{t}\)), \(t - 1\) random keys are generated. These are XORed byte by byte with the master key to produce one final key, so that the group holds \(t\) partial keys that XOR to the master key. This can be visualized as a partially filled table of key material, with one row for each \(t\)-sized subset of the \(n\) participants and one column for each participant \([0,n)\). The values in each row XOR to the same master key.
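The key table described above can be sketched in a few lines of Python 3. This is an illustrative reconstruction of the scheme as described, not the code salvage ships; the names `xor_all` and `build_key_table` and the dictionary layout are assumptions.

```python
import os
from itertools import combinations


def xor_all(keys):
    """XOR a non-empty list of equal-length byte strings together."""
    result = bytes(len(keys[0]))  # all-zero starting value
    for key in keys:
        result = bytes(a ^ b for a, b in zip(result, key))
    return result


def build_key_table(master_key, n, t):
    """Map each t-sized subset of range(n) to {participant: partial_key}.

    Each entry is one row of the table; the partial keys in a row
    XOR to master_key.
    """
    table = {}
    for group in combinations(range(n), t):
        # t - 1 random keys ...
        partials = [os.urandom(len(master_key)) for _ in range(t - 1)]
        # ... plus a final key chosen so the whole row XORs to the master key.
        partials.append(xor_all([master_key] + partials))
        table[group] = dict(zip(group, partials))
    return table


if __name__ == "__main__":
    master = os.urandom(16)
    table = build_key_table(master, 5, 3)
    print(len(table))  # number of rows: one per 3-sized subset of 5
    row = table[(0, 1, 2)]
    print(xor_all(list(row.values())) == master)
```

Because every row uses fresh random keys, holding fewer than \(t\) partial keys from a row reveals nothing about the master key. The cost is that the table grows with \(\binom{n}{t}\), which is acceptable for the small groups this tool targets.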
\(n\) directories are created, each representing one share. Each share gets its own identical copy of the encrypted archive, plus some metadata in a json file. The metadata includes:
The key material is essentially one column of the full key table: all of the partial keys that belong to this share, associated with a subgroup. In other words, it says “to combine shares 0, 1, and 2, use k1; else to combine shares 0, 1, and 3, use k2; ...”.
When \(t\) shares are brought together, one row of the key table can be fully reassembled, which means the master key can be recovered and the archive decrypted.
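Recovery is the reverse lookup: find the row for the group of shares present and XOR its partial keys. The sketch below assumes each share's key material has been loaded into a dictionary keyed by the sorted tuple of share indexes; that in-memory layout and the name `recover_master_key` are hypothetical and do not describe the actual json format.

```python
def recover_master_key(shares):
    """Rebuild the master key from exactly t shares.

    shares maps a share index to that share's key material, itself a
    mapping from a group (sorted tuple of share indexes) to the partial
    key this share contributes to that group.
    """
    group = tuple(sorted(shares))
    # Pick, from each share, the partial key belonging to this group.
    parts = [shares[index][group] for index in group]
    key = bytes(len(parts[0]))  # all-zero starting value
    for part in parts:
        key = bytes(a ^ b for a, b in zip(key, part))
    return key


if __name__ == "__main__":
    # Hand-built row for the group (0, 1, 3) with a 2-byte toy key.
    master = b"\x5a\xa5"
    k0, k1 = b"\x12\x34", b"\xab\xcd"
    k3 = bytes(m ^ a ^ b for m, a, b in zip(master, k0, k1))
    group = (0, 1, 3)
    shares = {0: {group: k0}, 1: {group: k1}, 3: {group: k3}}
    print(recover_master_key(shares) == master)
```

If the shares supplied do not form a group of exactly \(t\) members, the lookup raises a `KeyError` in this sketch; a real tool would report that case more helpfully.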
Initial release.
Copyright (c) 2015, Peter Sagerson
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.