pyFileSec File-oriented privacy and integrity management tools (beta)


File-oriented security in python

pyFileSec provides a class SecFile that is intended to make it easier to protect computer files from casual inspection or accidental disclosure. By design, privacy assurance, ease-of-use, and a stable, cross-platform API are important security goals. Integrity assurance is useful but not a top priority. The speed of code execution is relatively unimportant. Truly sensitive information should be protected through multiple means, including procedural, physical, and legal methods.

pyFileSec is less about encryption (which it does handily, as do many excellent packages), and more about managing the immediate security issues that arise when working with files. Anyone doing system administration in a research lab might find it useful (including advanced users, not just IT staff), as might developers of software presentation programs for human subjects research. Anyone needing file management with compatible security goals could potentially benefit.

From a security perspective, the goal is to better protect files in the local environment, and reduce the chances of their accidental disclosure. It is beyond pyFileSec’s scope to try to defend against all possible adversarial attacks. pyFileSec is only concerned with file-oriented aspects of information security.

Example use-case: A research team might wish to collect data on illegal drug-use (or other HIPAA-covered information). To keep the window of accidental disclosure as small as possible, such sensitive information is best protected as early as possible in the data stream – ideally from within the data collection program. It is also desirable to be able to encrypt it without needing to be able to decrypt on the same computer, and without needing to store a password for decryption where the password might be copied, disclosed, or exposed to key-loggers (any of which could make encryption irrelevant). Being able to secure-delete the original file(s) to avoid leaving sensitive information on the disk is useful. And at times it can be desirable to obscure file sizes, e.g., so that a larger file cannot indicate a more extensive history of drug use.

Despite excellent tools for encryption being widely available, security is hard to achieve. Even good and trustworthy people can make mistakes that compromise security. Tools to help manage file security can reduce the chances of mistakes and help people be more confident and more productive.

pyFileSec is intended to be adequate for the purpose of securing data files within a typical research lab. Even so, the effective security will be higher if the data have low economic value (which is typically the case in psychology and neuroscience labs). The effective security will be much higher if the lab has reasonable physical and network security, with only trained, trusted people working there (also typically the case).

Cautions: Using encryption in a research context requires some consideration. Perhaps the most important thing to keep in mind is that, depending on your circumstances, the use of encryption (or particular forms of encryption) can conflict with policies of your boss, institution, or even government. You are responsible for knowing your situation, and for the consequences of your decisions about whether and how to use encryption. In addition, the encryption is definitely strong enough to cause trouble. Consider an example: Although you can lock yourself out of your own car or house, you could also hire someone with training and tools to break in on your behalf. With encryption, however, it would likely be prohibitively expensive to hire someone to “break in on your behalf”; hopefully that is not possible, even for a well-funded adversary. So it is possible to lose data by trying to secure it.

Development status: The development status is beta, meaning that minor API changes and bugs are possible. The development emphasis is currently on refactoring the code for Python 3. Documentation is a work in progress. A few extensions are planned, notably alternative encryption backends (likely gpg support) and using zip for archives. File permissions on Windows need work.

Comments, bug reports or fixes, and code contributions are welcome. Feedback can be posted on github (see issues at https://github.com/jeremygray/pyfilesec/). Contact by private email is preferred for anything sensitive.

Principles and Approach

Using public-key (specifically RSA) encryption allows a non-secret “password” (the public key) to be distributed and used for encryption, with no need for the non-shared private key to be involved in the encryption process. This logically separates encryption from decryption, which in turn allows their physical separation. This separability gives considerable flexibility (and security). The idea is that anyone anywhere can encrypt information that only a trusted process (i.e., with access to the private key) can decrypt. For example, multiple testing-room computers could have the public key, and use it to encrypt the data from each subject so that it can be transferred to a main computer for de-identification, analysis, and archiving. The private key (for decryption) does not need to be shared beyond the main trusted computer. Keep it as private as possible.

pyFileSec does not, of itself, implement cryptographic code; by design it relies on external implementations. In particular, cryptographic operations use OpenSSL (see openssl.org), using its implementation of RSA and AES. These ciphers are industry standards and can be very secure when used correctly. The effective weak link is almost certainly not cryptographic but rather in how the encryption key(s) are handled, which depends mostly on you (the user), including what happens during key generation, storage, and backup. If your keys are bad or compromised, the encryption strength is basically irrelevant. The strength of the lock on your front door is irrelevant if you make a habit of leaving the key under the doormat.

Some considerations:

  • A test-suite is included as part of the library.
  • OpenSSL is not distributed as part of the library (see Installation).
  • By design, the computer used for encryption can be different from the computer used for decryption; it can be a different device, operating system, and version of OpenSSL. The only known incompatibility is that signatures (obtained from sign()) can fail to verify() if the version of OpenSSL used is too different (i.e., if one is pre version 1.0 and the other is 1.0 or higher).
  • You should both encrypt and decrypt only on machines that are physically secure, with access limited to trusted people. Although encryption can be done anywhere, using a public key, if someone used a different public key to encrypt data intended for you, you would not be able to access “your” data.
  • Ideally, do not move your private key from the machine on which it was generated; certainly never ever email it. It is typically fine to share the public key, certainly within a small group of trusted people, such as a research lab. The more widely it is distributed, the sooner it should be retired (and the encryption rotated on files encrypted with that key).
  • Some good advice from GnuPG: “If your system allows for encrypted swap partitions, please make use of that feature.”

Design goals:

  • Rely exclusively on standard, widely available and supported tools and algorithms. OpenSSL and the basic approach (RSA + AES 256) are well-understood and recommended (e.g., by Ferguson, Schneier, & Kohno (2010) Cryptography engineering. Indianapolis, Indiana: Wiley).
  • Allow for the relatively easy adoption of another encryption cipher suite, in the event that a change is necessary for cryptographic reasons.
  • For clarity, use and return full paths to files, not relative paths.
  • Avoid obfuscation. It does not enhance security, yet can make data recovery more difficult or expensive. So transparency is preferred. For this reason, meta-data are generated by default to make things less obscure; meta-data can be suppressed if desired.
  • Require OpenSSL version 0.9.8 or higher.
  • Require a public key >= 1024 bits; you should only use 2048 or higher.
  • For the AES encryption, a random 256-bit session key (AES password) is generated for each encryption event.
  • Use standard formats as much as possible.
  • Managing the RSA keys is up to the user to do.

Installation

pyFileSec

Install things in the usual way for a python package:

% pip install pyFileSec

Dependencies

pyFileSec requires (but does not itself package) a copy of OpenSSL and a secure file-removal tool. Both are typically present on Mac and Linux; if so, installation is complete.

It is also possible to use a non-default (e.g., compiled) version of OpenSSL. You can specify the path with the --openssl path option (command-line use), or using pyfilesec.set_openssl(path) (python).

On a Mac, if you get the same output all is well:

% which openssl
/usr/bin/openssl
% which srm
/usr/bin/srm

On Linux, it is typically very similar:

% which openssl
/usr/bin/openssl
% which shred
/usr/bin/shred

On Windows, it is also free but not as easy.

1. Download and install OpenSSL from http://slproweb.com/products/Win32OpenSSL.html. First install the “Visual C++ 2008 Redistributables” (from the same page). Then install OpenSSL (Light is fine) and run through the installer pages. It should install to C:\OpenSSL-Win32 by default. pyFileSec should now be able to detect and use OpenSSL.

2. Download and install sdelete (free, from Microsoft) http://technet.microsoft.com/en-us/sysinternals/bb897443.aspx. pyFileSec should be able to detect sdelete.exe.

You will likely need to run these programs once manually and accept the terms before being able to use them from pyFileSec.

Getting started

Generally, you do not need administrative privileges to work with pyFileSec once it is installed. (The only exception is that, on Windows, you need to be an admin to check whether files have other hard links to them.)

Command line usage is likely to be easier with an alias. To find out what path and syntax to use in an alias, start python interactively (type python at a terminal or command prompt) and then:

>>> import pyfilesec as pfs
>>> pfs.command_alias()

This will print aliases for bash, csh / tcsh, and DOS. Copy and paste into your shell as appropriate (or paste elsewhere, like a ~/.bash_profile).

A demos/ directory is in the same directory as pyfilesec.py, and has usage examples for python scripting (py_example.py) and for command-line / shell scripting (sh_example.sh).

A guide readme.txt has basic instructions on how to generate an RSA key-pair using pyFileSec. Ideally, any valid .pem format key-pair should work; to date this has only been tested with keys generated using OpenSSL.

API

The API describes how to work with a SecFile object from within python. An understanding of the parameters will be useful for command-line / shell-script usage. Details about command-line syntax can be obtained using the usual --help option:

% python pyfilesec.py --help

Note

Any references to ‘clear text’ or ‘plain text’ simply mean an unencrypted file. It could be a binary file, or an encrypted file that is to be encrypted a second time. There is no requirement that it must be text.

The main class of interest is SecFile, described next. Three other classes are used internally, and so are also described here for completeness. There should be no need to understand anything except a SecFile in order to use it.

class SecFile()

class pyfilesec.SecFile(infile=None, pub=None, priv=None, pphr=None, codec=None, openssl=None)

Class for working with a file as a more-secure object.

A SecFile instance tracks a specific file, and regards it as being “the same” object despite differences to the underlying file on the disk file system (e.g., being encrypted).

Example

A SecFile object is created to track a file (here the file is named “The Larch.txt”, which happens to have a space in it). Typically the file name is given at initialization, but it can be given later as well:

>>> sf = SecFile('The Larch.txt')
>>> sf.file
'/Users/.../data/The Larch.txt'

The file can now be encrypted using a public key (stored in the file named pub.pem):

>>> sf.encrypt('pub.pem')
>>> sf.file
'/Users/.../data/The Larch.enc'

The SecFile instance remains the same, but the underlying file has been renamed with extension .enc. The original file has been securely deleted.

SecFile objects have various properties that can be queried (continuing on from the above example):

>>> sf.is_encrypted
True
>>> sf.basename
'The Larch.enc'
>>> sf.snippet
'(encrypted)'

Decryption is done in a similar way, using a private key (here, as read from a file named priv.pem):

>>> sf.decrypt('priv.pem', 'pphr.txt')
>>> sf.basename
'The Larch.txt'

Note that the original file’s basename is restored; the full path is not.

encrypt(pub=None, meta=True, date=True, keep=False, enc_method='_encrypt_rsa_aes256cbc', hmac_key=None, note=None)

Encrypt a file using a public key.

By default, the original plaintext is secure-deleted after encryption (default keep=False). This is time-consuming, but important.

The idea is that you can have and share a public key, which anyone can use to encrypt things that only you can decrypt. Generating good keys and managing them is non-trivial (see genrsa() and documentation).

Files larger than 8G before encryption will raise an error.

To mask small file sizes, pad() them to a desired minimum size before calling encrypt().

Parameters :
pub:

The public key to use, specified as the path to a .pem file. The minimum recommended key length is 2048 bits; 1024 is allowed but strongly discouraged.

meta:

If True or a dict, include the meta-data (plaintext) in the archive. If given a dict, the dict will be updated with new meta-data. This allows all meta-data to be retained from the initial encryption through multiple rotations of encryption. If False, will indicate that the meta-data were suppressed.

See load_metadata() and log_metadata().

date:

True : save the date in the clear-text meta-data. False : suppress date from being saved in the meta-data.

Note

File time-stamps on the underlying file-system are NOT obscured, even if date=False.

keep:

False = remove original (unencrypted) file; True = leave original file.

enc_method:

name of the function / method to use (currently only one option, the default)

hmac_key:

optional key to use for a message authentication (HMAC-SHA256, post-encryption); if a key is provided, the HMAC will be generated and stored with the meta-data. (This is encrypt-then-MAC.) For stronger integrity assurance, use sign().

note :

Allows a short, single-line string to be included in the meta-data. The string is trimmed to ensure that it is < 120 characters (mainly so that the text of a private key cannot become embedded in the meta-data, which are not encrypted).
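The encrypt-then-MAC idea behind the hmac_key parameter can be illustrated with Python's standard hmac module. This is only a minimal sketch, not pyFileSec's own code: the function names, the key, and the placeholder bytes standing in for an encrypted file are all assumptions made for illustration.

```python
import hashlib
import hmac


def hmac_sha256_hex(key, cipher_text):
    """Return the HMAC-SHA256 of already-encrypted bytes, as a hex string."""
    return hmac.new(key, cipher_text, hashlib.sha256).hexdigest()


def hmac_matches(key, cipher_text, expected_hex):
    """Recompute the HMAC and compare it in constant time."""
    return hmac.compare_digest(hmac_sha256_hex(key, cipher_text), expected_hex)


# Placeholder bytes standing in for the contents of an encrypted file.
cipher_text = b"\x8f\x12 placeholder cipher text \x00\x7f"
key = b"a shared secret key"   # hypothetical key, for illustration only

tag = hmac_sha256_hex(key, cipher_text)
print(hmac_matches(key, cipher_text, tag))          # unmodified data
print(hmac_matches(key, cipher_text + b"x", tag))   # altered data
```

Because the HMAC is computed over the encrypted bytes, integrity can be checked without the private key; it does require that the HMAC key itself be kept secret.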

decrypt(priv=None, pphr=None, keep_meta=False, keep_enc=False, dec_method=None)

Decrypt a file that was encoded using encrypt().

To get the data back, you need two files: data.enc and privkey.pem. If the private key has a passphrase, you’ll need to provide that too. pphr can be the passphrase itself (a string), or a file name. The private key must be the one paired with the public key used for encryption.

Works on a copy of data.enc, tries to decrypt it. The original data.enc is removed (unless keep_enc=True).

Tries to detect whether the decrypted file would end up inside a Dropbox folder; if so, refuse to proceed.

Parameters :
priv :

path to the private key that is paired with the pub key used at encryption; in .pem format

pphr :

passphrase for the private key (as a string, or filename)

keep_meta :

if False, unlink the meta file after decrypt

keep_enc :

if False, unlink the encrypted file after decryption

dec_method : (not implemented yet, only one choice).

name of a decryption method that has been registered in the current codec (see PFSCodecRegistry). None will try to use information in the file’s meta-data, and will fall through to the default method.

rotate(pub=None, priv=None, pphr=None, hmac_key=None, pad=None)

Swap old encryption for new: decrypt-then-re-encrypt.

Conceptually there are four separate steps: decrypt with priv (this is the “old” private key), re-encrypt (with the “new” public key), confirm that the rotation worked, and destroy the old (insecure) file. rotate() will only do the first two of these.

If pad is given, the padding will be updated to the new length prior to re-encryption.

New meta-data are added alongside the original meta-data. rotate() will preserve meta-data across encryption sessions, if available, adding to it rather than saving just the last one. (keep_meta=False will suppress all meta-data; typically rotation events are not sensitive.) Handling the meta-data is the principal motivation for having a rotate method; otherwise sf.decrypt(old).encrypt(new) would suffice.

Parameters :
priv :

path to the old private key that is paired with the pub key that was used for the existing encryption

pphr :

passphrase for the private key (as a string, or filename)

pub :

path to the new public key to be used for the new encryption.

hmac_key :

key (string) to use for an HMAC to be saved in the meta-data

sign(priv=None, pphr=None, out=None)

Sign a given file with a private key.

Get a digest of the file, sign the digest, return base64-encoded signature (or save it in file out).

verify(pub=None, sig=None)

Verify signature of filename using pubkey pub.

sig should be a base64-encoded signature, or a path to a sig file.

destroy()

Try to secure-delete a file.

Calls an OS-specific secure-delete utility, defaulting to:

Mac:     srm -f -z --medium  filename
Linux:   shred -f -u -n 7 filename
Windows: sdelete.exe -q -p 7 filename

To secure-delete a file, use this syntax:

SecFile('a.enc').destroy().result

Ideally avoid the need to destroy files as much as possible. Keep sensitive data in RAM. Files are much trickier to secure-delete on file systems that are journaled, use RAID, are mirrored, or are otherwise backed up.

destroy() may fail to remove all traces of a file if multiple hard-links exist for the file. For this reason, the original link count is returned. In the case of multiple hardlinks, Linux (shred) and Windows (sdelete) do appear to destroy the data (the inode), whereas Mac (srm) does not.

If destroy() succeeds, the SecFile object is reset(). The .result attribute contains the details. If destroy() fails, .result is not reset.
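The hard-link issue can be checked ahead of time with the standard library. The sketch below is independent of pyFileSec (the helper name link_count and the demo file names are assumptions); it shows how a second hard link raises the link count that os.stat() reports, which is the situation in which destroy() may leave data reachable under another name.

```python
import os
import tempfile


def link_count(path):
    """Number of hard links to path, counting path itself as one."""
    return os.stat(path).st_nlink


tmp_dir = tempfile.mkdtemp()
original = os.path.join(tmp_dir, "data.txt")
with open(original, "w") as f:
    f.write("not actually sensitive\n")

count_before = link_count(original)     # a freshly created file

extra = os.path.join(tmp_dir, "alias.txt")
os.link(original, extra)                # a second name for the same data
count_after = link_count(original)

print(count_before, count_after)

# Clean up the demo files (ordinary delete; nothing sensitive here).
os.remove(extra)
os.remove(original)
os.rmdir(tmp_dir)
```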

pad(size=16384)

Append null bytes to filename until it has length size.

pad() only changes the effective length of the file. The padding itself is easy to see, so the fact that the size was changed is only obscured if the padded file is then encrypted.

Files shorter than size will be padded out to size (see details below). The minimum resulting file size is 128 bytes. Files that are already padded will first have any padding removed, and then be padded out to the new target size.

Padded files include a few bytes for padding-descriptor tags, not just null bytes. Thus files that are close to size already would not have their sizes obscured AND also be marked as being padded (in the last ~36 bytes), raising a PaddingError. To avoid this, you can check using the convenience function _ok_to_pad() before calling pad().

Internal padding format:

file + n bytes + padding descriptors + final byte

The padding descriptors consist of 10-digits + one byte + PFS_PAD, where byte is b'\0' (the null byte). The process does not depend on the value of the byte. The 10 digits give the length of the padding as an integer, in bytes. n is selected to make the new file size equal the requested size.

To make unpadding easier and more robust (and enable human inspection), the end bytes provide the number of padding bytes that were added, plus an identifier. The number of digits is not hard-coded as 10, but is the length of str(max_file_size), where the max_file_size constant is 8G by default. This means that any change to the max file size constant can cause pad / unpad failures across versions.
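The layout described above can be sketched on in-memory bytes. This is an illustrative approximation, not pyFileSec's implementation: the exact descriptor bytes, the choice of a null byte as the final byte, the convention that the digits count only the null bytes, and the names pad_bytes / unpad_bytes are all assumptions.

```python
PAD_TAG = b"PFS_PAD"                    # identifier named in the text
PAD_BYTE = b"\0"                        # the null byte
MAX_FILE_SIZE = 8 * 2 ** 30             # 8G, the default maximum
LEN_DIGITS = len(str(MAX_FILE_SIZE))    # number of digits in the descriptor

# digits + one byte + tag + final byte (final byte assumed to be null)
DESCRIPTOR_LEN = LEN_DIGITS + 1 + len(PAD_TAG) + 1


def _descriptor(n):
    """Build the trailing descriptor recording n bytes of padding."""
    digits = str(n).zfill(LEN_DIGITS).encode("ascii")
    return digits + PAD_BYTE + PAD_TAG + PAD_BYTE


def pad_bytes(data, size):
    """Return data padded with null bytes and a descriptor to exactly size."""
    n = size - len(data) - DESCRIPTOR_LEN
    if n < 1:
        raise ValueError("data is too close to the target size to pad")
    return data + PAD_BYTE * n + _descriptor(n)


def unpad_bytes(padded):
    """Strip padding added by pad_bytes(); raise ValueError if none found."""
    tail = padded[-DESCRIPTOR_LEN:]
    if len(tail) < DESCRIPTOR_LEN or tail[LEN_DIGITS + 1:-1] != PAD_TAG:
        raise ValueError("no padding found")
    n = int(tail[:LEN_DIGITS].decode("ascii"))
    return padded[:-(n + DESCRIPTOR_LEN)]


record = b"short record\n"
padded = pad_bytes(record, 128)
print(len(padded), unpad_bytes(padded) == record)
```

Reading the padding length from the tail is what lets unpadding work without any external record of the original size; it is also why a change in the number of digits would break compatibility.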

Special size values:

0 : unpad = remove any existing padding, no error if not present

-1 : strict unpad = remove padding if present, error if not present

unpad()

Removes PFS padding from the file. Raises PaddingError if the file is not padded.

Truncates the file to remove padding; does not destroy the padding.

Other available SecFile methods include:

set_file() : change the file to work with, and set the .file property.

rename(new_name) : changes the name of the existing file on the file system.

read(n) : read n lines from the file, return as a single string.

SecFile objects have properties that can be accessed with the usual dot notation (i.e., as sf.property where sf is a SecFile object). Most cannot be set (exceptions noted).

file : the full path to the underlying file on the file system

basename : same as os.path.basename(sf.file), or None if no file.

size : (long int)
size in bytes on the disk as reported by os.path.getsize(sf.file).
metadata : (dict)
returns {} for an unencrypted file.
metadataf : (string)
human-friendly version of metadata, e.g., for log files. returns ‘{}’ for an unencrypted file.
snippet : (string)
up to 60 characters of the first line of the file; or will return ‘(encrypted)’, or None if no file
is_encrypted : (boolean)
True if encrypted by pyFileSec.SecFile.encrypt(); does not detect any-encryption-in-general.
is_in_dropbox : (boolean)
True if inside the user’s Dropbox folder
is_in_writeable_dir : (boolean)
True if the user has write permission to the file’s directory
is_tracked : by version control (boolean)
only git, svn, and mercurial (hg) are detected.
permissions : POSIX-style file permissions (int; -1 on Windows)

if sf.permissions is 384 (int), then oct(sf.permissions) will be ‘0600’.

Note

Can be assigned.
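The relationship between the integer 384 and the familiar octal permission string can be checked with the standard library alone (no pyFileSec calls are used here). Note that Python 3 formats octal numbers with an 0o prefix, whereas Python 2 printed '0600'.

```python
import stat

perm = 384                      # the integer value from the example above

print(perm == 0o600)            # 384 decimal is 600 octal
print(oct(perm))                # '0o600' on Python 3
print(stat.filemode(stat.S_IFREG | perm))   # '-rw-------'
```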

openssl : path

contains the path to the OpenSSL executable file to use.

Note

Can be assigned.

openssl_version : (string)
version of sf.openssl.
hardlinks : count of all hardlinks to the file (int)
the count includes sf.file as one link. requires Admin privileges on Windows.

Class SecFileArchive

A SecFileArchive object manages the encrypted (.enc) version of the file. In particular, an encrypted “file” has three pieces:

  • an encrypted version of the plain_text file (currently encrypted using AES-256-CBC)
  • an encrypted version of the AES password (sometimes called a session key) as encrypted using an RSA public key
  • a file containing meta-data about the encryption event (or a placeholder saying that meta-data were suppressed)

A SecFileArchive takes care of packing and unpacking the three pieces into a single underlying file on the file system. Currently this is an ordinary .tar.gz file:

% echo f > file
% python pyfilesec.py --encrypt file --pub pub.pem
% ls file.enc
file.enc
% tar xzvf file.enc
x file.aes256
x file.aes256.pwdrsa
x file.meta

The meta-data (file.meta) is always clear-text. This is to facilitate human inspection in archival uses.
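Because the archive is currently an ordinary .tar.gz file, its members can also be listed from python using the standard tarfile module. The sketch below builds a stand-in archive with placeholder contents so that it runs without pyFileSec or any keys; the helper name list_archive is an assumption, and the member names simply mirror the listing above.

```python
import io
import os
import tarfile
import tempfile


def list_archive(enc_path):
    """Return the member names of a .enc archive (an ordinary .tar.gz)."""
    with tarfile.open(enc_path, "r:gz") as tar:
        return tar.getnames()


# Build a stand-in archive with placeholder contents, for demonstration only.
tmp_dir = tempfile.mkdtemp()
demo_enc = os.path.join(tmp_dir, "file.enc")
with tarfile.open(demo_enc, "w:gz") as tar:
    for name in ("file.aes256", "file.aes256.pwdrsa", "file.meta"):
        payload = b"placeholder"
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

names = list_archive(demo_enc)
print(names)

# Clean up the demo files.
os.remove(demo_enc)
os.rmdir(tmp_dir)
```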

class pyfilesec.SecFileArchive(name='', files=None, arc=None, keep=True)

Class for working with a cipher_text archive file (= *.enc).

Used transparently by SecFile as needed; typically there’s no need to work directly with a SecFileArchive.

  • Provide a name to create an empty archive, or infer a name from paths in files, or from archive arc name.
  • Providing files will also pack() them into the archive.
  • Providing an existing archive arc will also unpack it into a tmp directory and return full paths to the file names. (This can result in stray tmp files if they are not removed by the user, but everything sensitive is encrypted.)

get_dec_method(codec)

Return a valid decryption method from meta-data or default.

Cross-validate requested dec_method against meta-data.

pack(files, keep=True)

Make a tgz file from a list of paths, set permissions.

Eventually might take an arg to decide whether to use tar or zip. Just a tarfile wrapper with extension, permissions, unlink options. unlink is whether to unlink the original files after making a cipher_text archive, not a secure-delete option.

unpack()

Extract files from cipher_text archive, return paths to files.

Files are unpacked into a tmp directory; the process calling unpack() should take care to clean up those files appropriately. There is no sensitive information revealed by unpacking files.

Parameters :
keep :

False will unlink the data_enc file after unpacking, but only if there were no errors during unpacking

Class RsaKeys

class pyfilesec.RsaKeys(pub=None, priv=None, pphr=None)

Class to manage and test RSA key-pairs.

update(pub=None, priv=None, pphr=None, req=0)

Accept new value, use existing val if no new one, or fail.

require(req)

Raise error if key requirement(s) req are not met; assert-like.

Used by SecFile methods: rsakeys.require(req=NEED_PUBK | NEED_PRIV) reads as assert rsakeys.pub and rsakeys.priv or raise a tailored error, including a missing passphrase if the private key is encrypted.

sniff(key)

Inspects the file key, returns information.

Example return values:

('pub', 2048) = public key with length (RSA modulus) 2048 bits

('priv', True) = encrypted private key (will require a passphrase to use)

(None, None) = not a detectable key format

test()

Tests whether the key pair is suitable for use with pyFileSec.

Keys should be tested in matched pairs. Includes an actual test of encrypt-then-decrypt using the keys with the default codec.

An RsaKeys object has three properties:

pub : path
contains the path to the public key file.
priv : path
contains the path to the private key file.
pphr : (string)
contains the actual passphrase. If the passphrase was given initially as a path, it is read from the file.

Class GenRSA

This class can be used to generate key-pairs that are appropriate for use with pyFileSec.

class pyfilesec.GenRSA

A class to generate RSA key-pairs

dialog(interactive=True, args=None)

Command line dialog to generate an RSA key pair, PEM format.

To launch from the command line:

% python pyfilesec.py genrsa

The following will do the same thing, but save the passphrase into a file named ‘pphr’ [or save onto the clipboard]:

% python pyfilesec.py genrsa [--passfile | --clipboard]

And it can be done from a python interpreter shell:

>>> import pyfilesec as pfs
>>> pfs.genrsa()

The passphrase will not be printed if it was entered manually. If it is auto-generated, it will be displayed or saved to a file if option --passfile is given, or saved to the clipboard if option --clipboard is given. This is the only copy of the passphrase; the key-pair is useless without it. Actually, it's far worse than useless. It's dangerous: you could still encrypt something that you could not decrypt.

Choose from 2048, 4096, or 8192 bits. 1024 is not secure medium-term, and 16384 bits is not needed (nor is 8192). A passphrase is required, or one will be auto generated. Ideally, generate a strong passphrase in a password manager (e.g., KeePassX), save there, paste it into the dialog.

You may want to generate keys for testing purposes, and then generate different keys for actual use.

Class Codec Registry

Currently there is only one option for a codec.

class pyfilesec.PFSCodecRegistry(defaults={}, test_keys=None)

Class to explicitly manage the encrypt and decrypt functions (= codec).

A PFSCodecRegistry is used to return the actual encrypt and decrypt functions to use when a SecFile object calls its .encrypt() or .decrypt() methods. The functions are vetted to conform to a minimal expected format, and can optionally be required to pass an encrypt-then-decrypt self-test before being registered (and hence available to a SecFile to use).

Typically, there is no need for anything other than the default registry that is set-up automatically. Each instance of a SecFile keeps its own copy of the registry. In part, having a registry is to help ensure longer-term API stability even in the event that a change in underlying cryptographic protocol is necessitated. It is also desirable to be able to support a “read only” mode, i.e., to access and use all decryption methods, while preventing encryption with that same codec.

The checks are designed to protect against archival ambiguity and operator errors, and not against adversarial manipulation of the registry.

To register a new function, the idea is to be able to do:

codec = PFSCodecRegistry()
new = {'_encrypt_xyz': _encrypt_xyz,
       '_decrypt_xyz': _decrypt_xyz}
codec.register(new)

register(new_functions, test_keys=None)

Validate and add new codec functions to the registry.

Typically one registers encrypt and decrypt functions in pairs. It is possible to register only a decrypt function, to support “read only” (decrypt) use of a codec.

If test_keys is provided, an encrypt-decrypt self-test validation must pass before registration can proceed. test_keys should be a tuple of (enc_kwargs, dec_kwargs) that will be passed to the respective functions being registered.

unregister(function_list)

Remove codec pairs from the registry based on keys.

is_registered(fxn_name)

Returns True if fxn_name is registered; validated at registration.

get_function(fxn_name)

Return a validated function based on its registry key fxn_name.

Tests and performance

The built-in tests can be run from the command line:

$ py.test pyfilesec.py

or from within the main directory just:

$ py.test -k-slow

To see log messages and gc-debug during tests:

$ python pyfilesec.py debug

If you try the ‘debug’ option, note that some of the tests are designed to check error situations; i.e., what is being tested is that situations that should fail, do fail, and are recognized as failure situations. This means that in the verbose output you should see some things that look exactly like error messages (e.g., “RSA operation error”) because these are logged.

For details of the specific tests, consult the code directly.

Performance

Files encrypted on one machine can be decrypted on a different platform. (Not tested yet with machines known to be of different endian-ness, however.)

With one exception, the specific version of OpenSSL does not matter. The known exception is that there are incompatibilities between v0.9.x and v1.0.x when using sign / verify. Tested with 9 versions of openssl, running on Mac OS X (10.8), 3 Linux distributions, and Windows 7:

OpenSSL 0.9.8r  8 Feb 2011     Mac 10.8.3, python 2.7.3
OpenSSL 0.9.8x 10 May 2012     Mac 10.8.4, python 2.7.3
OpenSSL 1.0.1e 11 Feb 2013     same Mac, openssl via macports
OpenSSL 1.1.0-dev xx XXX xx    same Mac, clone OpenSSL from github & compile
OpenSSL 1.0.0-fips 29 Mar 2010 CentOS 6.4, python 2.6.6
OpenSSL 1.0.1  14 Mar 2012     Ubuntu 12.04.2 LTS, python 2.7.3
OpenSSL 0.9.8o 01 June 2010    Debian (squeeze), python 2.6.6
OpenSSL 1.0.1e Light           Windows 7, python 2.7.3
OpenSSL 1.0.1e                 same Windows

Encryption is basically linear in time and disk space (file size; times will vary with CPU, disk speed, etc). Example values from a laptop:

1K takes ~0.2s to encrypt, ~0.1s decrypt
1M takes ~10s to encrypt,  ~5s decrypt
1G takes ~90s to encrypt,  ~60s decrypt
8G takes ~13m to encrypt

If backup software is running, that can greatly reduce a SecFile object’s apparent speed. Presumably, other concurrent and intensive disk usage would also do this.

Large files are fine (max tested is 8G). File-size inflation is consistently 3%:

1G:  1073741824 plain text --> 1106221296 encrypted
2G:  2147483648 plain text --> 2212437647 encrypted
8G:  8589934592 plain text --> 8849744181 encrypted
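The inflation figure follows directly from the byte counts listed above, as this short calculation shows (the variable names are arbitrary):

```python
# (plain text bytes, encrypted bytes), copied from the listing above
sizes = {
    "1G": (1073741824, 1106221296),
    "2G": (2147483648, 2212437647),
    "8G": (8589934592, 8849744181),
}

inflation = {}
for label, (plain, encrypted) in sorted(sizes.items()):
    inflation[label] = 100.0 * (encrypted - plain) / plain
    print("%s: %.2f%% larger after encryption" % (label, inflation[label]))
```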

A fair amount of disk space is used for intermediate files during encryption. Encrypting an 8G plaintext file will temporarily require up to 28G disk space (total):

-rw-------  1 jgray     4357464064 8gig.enc         # grows to 8849744181
-rw-------  1 jgray     8589934592 8gig.zeros
-rw-------  1 jgray    11632203140 8gig.zeros.aes256            # deleted
-rw-------  1 jgray            512 8gig.zeros.aes256pwd.rsa
-rw-------  1 jgray            667 8gig.zeros.meta

The reason for such space requirements is that, currently, the original file is only deleted after all the other steps have been carried out (and carried out successfully). The idea is to ensure as complete a check as possible that everything was indeed successful. Presumably, and with a slightly higher risk of losing data in theory, one could delete the original file after the AES encryption and before archiving it. Only the encrypted (.aes256) file goes in the .enc archive, not the original.

The larger .aes256 files get removed, leaving:

-rw-------  1 jgray    8849744181  8gig.enc
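The peak disk usage can be worked out from the byte counts in the listing above, and turned into a rough pre-check before encrypting a large file. This is only a sketch: the helper probably_enough_space is hypothetical (pyFileSec is not stated to perform such a check), the ratio comes from this single 8G example, and shutil.disk_usage requires Python 3.3 or later.

```python
import os
import shutil

# Byte counts copied from the listing above (8gig.enc at its final size).
peak_files = {
    "8gig.enc": 8849744181,
    "8gig.zeros": 8589934592,
    "8gig.zeros.aes256": 11632203140,
    "8gig.zeros.aes256pwd.rsa": 512,
    "8gig.zeros.meta": 667,
}
plain_size = peak_files["8gig.zeros"]
peak_total = sum(peak_files.values())
peak_ratio = float(peak_total) / plain_size


def probably_enough_space(plain_path, ratio=peak_ratio):
    """Rough pre-check: is there room for the intermediate files?

    The plaintext is already on disk, so the extra space needed is
    about (ratio - 1) times its size.
    """
    size = os.path.getsize(plain_path)
    directory = os.path.dirname(os.path.abspath(plain_path))
    free = shutil.disk_usage(directory).free
    return free >= (ratio - 1) * size


print("peak: %.1f GiB, about %.1fx the plaintext size"
      % (peak_total / 2.0 ** 30, peak_ratio))
```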

FAQ / Questions

Q: Will encryption make my data safe?

A: Think of it as adding another layer of security, of itself not being a complete solution. There are many issues involved in securing your data, and encryption alone does not magically solve all of them. Security needs to be considered at all stages in the process. The encryption provided is genuinely strong encryption (and as such could cause problems). Key management is the hard part. And don’t skip physical, legal, and procedural aspects of security.

Q: What if my RSA private key is no longer private?

A: Obviously, try to avoid this situation. Fix: 1) Generate a new RSA key-pair, and then 2) rotate() the encryption on all files that were encrypted using the public key associated with the compromised private key.

The meta-data includes information about what public key was used for encryption, to make it easier to identify the relevant files. But even without that information, you could just try calling rotate() on all files, and it would only succeed for those with the right key pair. The meta-data are not required for key rotation. By design, pyFileSec is not needed for rotation (or decryption). It is basically just a wrapper to make it easier to work with standard, strong encryption tools, and document what was done and how.

Q: What should I do if my RSA private key is reaching its expected end-of-life (see http://www.keylength.com)?

A: You should expect to do this. The rotate() function helps make this transition as easy as possible. Just generate a new RSA key-pair, and rotate() the encryption. It would be trivial to write a rotate_all() function to find all encrypted files in a directory, and rotate the encryption on those files.
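A rotate_all() along those lines might look like the following sketch. It is not part of pyFileSec. The names find_encrypted and rotate_all are hypothetical, and the code assumes that the class passed in behaves like SecFile: it is constructed from a file path, has a rotate(pub=..., priv=..., pphr=...) method, and raises an exception when rotation fails.

```python
import os


def find_encrypted(directory, ext=".enc"):
    """Return sorted full paths of files ending in ext under directory."""
    found = []
    for root, _dirs, files in os.walk(os.path.abspath(directory)):
        for name in files:
            if name.endswith(ext):
                found.append(os.path.join(root, name))
    return sorted(found)


def rotate_all(directory, secfile_class, pub, priv, pphr=None):
    """Call .rotate() on every .enc file; return (rotated, failed) lists.

    secfile_class is expected to behave like pyfilesec.SecFile (assumed:
    constructed from a path, and rotate() raises on failure).
    """
    rotated, failed = [], []
    for path in find_encrypted(directory):
        try:
            secfile_class(path).rotate(pub=pub, priv=priv, pphr=pphr)
            rotated.append(path)
        except Exception as err:   # e.g., encrypted with a different key
            failed.append((path, err))
    return rotated, failed
```

With pyFileSec installed, a call would presumably be along the lines of rotate_all('data', pyfilesec.SecFile, 'new_pub.pem', 'old_priv.pem', 'pphr.txt'). Confirming that each rotation worked, and retiring the old key, remain separate steps.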

Q: What if the internal (AES) password was disclosed (i.e., not the RSA private key but the one-time password that is used for the AES encryption)?

A: This is extremely unlikely during normal operation. If it should occur (e.g., maybe a power-failure or other crash at precisely the wrong time?) it would affect at most one file. Fix: Just rotate() the encryption for that file, using the same public key to re-encrypt. A new internal one-time password will be generated during the re-encryption step. (The internal AES password is never re-used, which is a crucial difference between the AES password and the RSA key pair.)

Q: What if I lose my private key?

A: Oops. Fix: None. The whole idea is that, if you don’t have the private key, data recovery should be prohibitively expensive, if it is even possible (and it is intended to not be possible). You should design your procedures under the assumption that data recovery is not going to happen if you lose the private key, even by hiring someone. (In fact, if someone can do so, please send me a private email with details; I’ll want to fix it!)