Welcome to binpack’s documentation!

Binpack implements a protocol for binary packing of data. Binpack classes can be used to easily create user-defined file types.

Files packed using the binpack protocol will have a small header section that holds the file type and version information. The body part is a sequence of named data streams. These data streams can be converted transparently into/from Python objects and are accessed using a dictionary interface. The keys in this dictionary are sorted in a predictable order.

Usage

The easiest way to create a new packing type is to use the new_type() class method of BinPack:

>>> myfile = BinPack.new_type('myfile', header='myfile', version='0.1')

Any number of fields of different types can be added:

>>> myfile.add_field('foo')
>>> myfile.add_field('bar', Compressed(Bytes('bar has a default value!')))

The order in which fields are created is important because data is stored in the file in that same order. It is good practice to place the fields that hold large amounts of data at the end. Otherwise, these large chunks of data must be scanned just to access a few bytes from a field saved after them in the data stream.
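To make this cost concrete, here is a rough Python 3 sketch. The helpers prefix_len and field_offsets are hypothetical (not part of binpack); they assume the size-prefix encoding described in the Data format section and compute where each field starts in the body. A field can only be reached after skipping over everything stored before it.

```python
def prefix_len(size):
    """Number of bytes used by the length prefix of a `size`-byte payload:
    one byte per 7-bit group of the size, and always at least one byte."""
    n = 1
    while size >= 128:
        size >>= 7
        n += 1
    return n


def field_offsets(sizes):
    """Offset (relative to the start of the body) at which each field's
    length prefix begins, given payload sizes in storage order."""
    offsets, pos = [], 0
    for size in sizes:
        offsets.append(pos)
        pos += prefix_len(size) + size  # skip prefix and payload
    return offsets


# Sizes from the 'myfile' example: 'foo' holds 12 bytes, compressed 'bar' 32.
print(field_offsets([12, 32]))          # [0, 13]
# Hypothetical: a 10 MB field stored first pushes a small field 10 MB back.
print(field_offsets([10_000_000, 12]))  # [0, 10000004]
```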

After all fields are configured, the new class must be locked. No new fields can be added to a locked class, but once it is locked, the class is ready to create instances. Locked classes cannot be unlocked and should not be modified in any way.

>>> myfile.lock()

A BinPack object opened in ‘writing’ mode works like a dictionary. When you assign some value to a key, it is automatically serialized and saved to the file.

>>> F = StringIO()
>>> with myfile(F, mode='w') as data:
...     data['foo'] = 'data for foo'
...     F.getvalue()             # data is written to F as soon as possible
'\xffmyfile\xff0.1\xff\x0cdata for foo'

When the file is closed (automatically with the with statement, manually otherwise), any unwritten data is written to the file. If a field without a default value was never assigned by the user, a ValueError is raised.

The data for the ‘bar’ field was automatically compressed in the data stream. Its default value should read ‘bar has a default value!’, but instead we see a run of seemingly random bytes:

>>> F.getvalue()
'\xffmyfile\xff0.1\xff\x0cdata for foo x\x9cKJ,R\xc8H,VHTHIMK,\xcd)Q(K\xcc)MU\x04\x00h\xf1\x08v'
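As a quick check of the output above, the Python 3 sketch below assumes ‘bar’ was compressed with zlib (the default method of Compressed). The space right after ‘data for foo’ is the byte 0x20 = 32, the length prefix of the 32-byte zlib stream that follows it:

```python
import zlib

# Bytes that follow the 'foo' field in the doctest output above,
# rewritten as a Python 3 bytes literal.
tail = b' x\x9cKJ,R\xc8H,VHTHIMK,\xcd)Q(K\xcc)MU\x04\x00h\xf1\x08v'

size = tail[0]                  # b' ' is 0x20, i.e. a 32-byte payload
payload = tail[1:1 + size]
assert len(payload) == size     # the prefix covers the rest of the stream
print(size)                     # 32
print(zlib.decompress(payload))  # b'bar has a default value!'
```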

Reading is similar to writing, and data is automatically converted to its desired type:

>>> F2 = StringIO(F.getvalue())
>>> with myfile(F2, close=True) as data:
...     data['foo'], data['bar']
('data for foo', 'bar has a default value!')

The close=True parameter used above tells the BinPack instance to automatically close the input file. This avoids the nested with statement that would otherwise be needed to handle files safely.

>>> F2.closed, F.closed
(True, False)

Converters

(Not ready yet)

Versioning

(Not ready yet)

Data format

binpack stores data in a format that is portable across different languages. Of course, if portability is a concern, Python-specific fields such as ‘Pickle’ must be avoided.

Data is stored by binpack as a predictable sequence of bytes.

The header part:

  • The first byte of the file is always 0xFF.
  • The following bytes are the ASCII characters of the user-defined header, which is specific to each BinPack subclass.
  • A second 0xFF marks the end of the header string.
  • The version string follows, terminated by a third 0xFF byte, which signals the end of the header section.
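A minimal Python 3 sketch of this layout follows. It assumes the header and version are plain ASCII strings, so they can never contain the 0xFF delimiter. encode_header and decode_header are hypothetical helpers, not part of binpack's API. The encoded result matches the first 12 bytes of the ‘myfile’ example in the Usage section.

```python
MARK = b'\xff'  # delimiter byte used three times in the header section


def encode_header(header, version):
    """Build the header section: 0xFF, header, 0xFF, version, 0xFF."""
    return MARK + header.encode('ascii') + MARK + version.encode('ascii') + MARK


def decode_header(data):
    """Split a byte string into (header, version, body_bytes)."""
    if not data.startswith(MARK):
        raise ValueError('missing leading 0xFF byte')
    end_header = data.index(MARK, 1)                 # second 0xFF
    end_version = data.index(MARK, end_header + 1)   # third 0xFF
    header = data[1:end_header].decode('ascii')
    version = data[end_header + 1:end_version].decode('ascii')
    return header, version, data[end_version + 1:]


raw = encode_header('myfile', '0.1') + b'\x0cdata for foo'
print(raw[:12])            # b'\xffmyfile\xff0.1\xff'
print(decode_header(raw))  # ('myfile', '0.1', b'\x0cdata for foo')
```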

The body part:

Each field is stored in the file as a sequence of bytes, preceded by the number of bytes it contains. This size is encoded using the following protocol:

  • The first byte is read. If it is less than 2^7, it is interpreted as the size of the byte stream. This number of bytes is read and assigned to the field.

  • If the byte is greater than or equal to 2^7, bytes are read until one smaller than 2^7 is found, and 2^7 is subtracted from every byte greater than or equal to 2^7. The resulting sequence is interpreted as a single number written as 7-bit digits in little-endian order.

    That is, if the bytes are b1, b2, ..., bn, the result is given by

    N = (b1 - 128) + 2^7 * (b2 - 128) + 2^14 * (b3 - 128) + ... + 2^(7*(n-1)) * bn.

    128 (2^7) is not subtracted from the last byte because it is already smaller than 128. N is then the size of the following byte stream, and that number of bytes is read and assigned to the current field.
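The size rule above can be sketched in Python 3 as a pair of helpers; encode_size and decode_size are hypothetical names, not binpack functions. As a worked example, 300 does not fit in 7 bits, so it is written as the two bytes 172 and 2, and decoding gives N = (172 - 128) + 2^7 * 2 = 44 + 256 = 300.

```python
def encode_size(n):
    """Encode a non-negative size as 7-bit little-endian groups.
    Every byte except the last has 128 added to it."""
    out = bytearray()
    while n >= 128:
        out.append(128 + (n & 127))  # low 7 bits, flagged as "more follows"
        n >>= 7
    out.append(n)                    # final byte is < 128
    return bytes(out)


def decode_size(data, pos=0):
    """Decode a size starting at data[pos]; return (size, next_pos)."""
    n, shift = 0, 0
    while data[pos] >= 128:
        n += (data[pos] - 128) << shift
        shift += 7
        pos += 1
    n += data[pos] << shift          # last byte is used as-is
    return n, pos + 1


print(encode_size(12))            # b'\x0c'
print(encode_size(300))           # b'\xac\x02'
print(decode_size(b'\xac\x02'))   # (300, 2)
```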

The header and body nomenclature used in binpack refers only to how data is packed as a binary stream. Of course, user formats may consider many fields in the body (in the binpack sense) to be part of their file's header, and only the few fields that hold the bulk of the data to be part of the file body.

API Documentation

Class methods in BinPack

class BinPack(F, mode='r', close=False, atomic=False, keep_items=True)

Packs binary data in a file.

Parameters:

F : file object

File to be read or written.

mode : ‘r’ or ‘w’

Initialize binary pack in (r)ead or (w)rite mode.

close : bool

If True, F is also closed when the .close() method is invoked.

atomic : bool

In normal operation, data is written as soon as possible and reading is delayed for as long as possible. If atomic is True, these operations are all performed at once.

keep_items : bool

If True, a copy of the objects inserted into or written to the file is kept. For large files, it may be better to discard these objects once they are no longer needed. If keep_items is False, objects are deleted as soon as possible, which means that most keys can be accessed (read or written) only once.

Methods

classmethod set_header(header)

Define an ASCII string that serves as the file header.

classmethod set_version(version)

Register an ASCII string to represent the file version.

Versioning may be needed if the data format changes over time.

classmethod add_field(field, converter=None, force=False)

Register a field, associating it with a data type (the converter) and a default value.

classmethod insert_field(field, key, converter=None)

Register a field before the given key or index.

classmethod del_field(field)

Remove a previously registered field.

classmethod lock()

Prevent the class from changing and allow the creation of instances.

classmethod new_type(name, header=None, version=None, binary=True)

Return a new BinPack subtype.

Instance methods in BinPack

close()

Close the BinPack object, writing any pending data. If close=True was given, the underlying file F is closed as well.

Field types

class Field(default=None)

Base class for all field types.

Methods

encoder(obj)

Convert an arbitrary object to bytes.

Must be overridden in child classes.

decoder(data)

Convert a data stream back to an object.

Must be overridden in child classes.

default()

Return the default value for the field.

default_enc()

Return a string with the serialized default value for the field.
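The sketch below illustrates what a child class is expected to provide. Since binpack may not be installed, it uses a stand-in Field that only mirrors the four documented methods; its internals, and the assumption that default_enc() equals encoder(default()), are guesses rather than binpack's actual code. JSONField is a hypothetical field type.

```python
import json


class Field:
    """Stand-in for the documented Field interface (assumed, not binpack's code)."""

    def __init__(self, default=None):
        self._default = default

    def encoder(self, obj):
        raise NotImplementedError  # must be overridden in child classes

    def decoder(self, data):
        raise NotImplementedError  # must be overridden in child classes

    def default(self):
        return self._default

    def default_enc(self):
        # Assumed behaviour: serialize the default with this field's encoder.
        return self.encoder(self.default())


class JSONField(Field):
    """Hypothetical field type that stores objects as UTF-8 encoded JSON."""

    def encoder(self, obj):
        return json.dumps(obj).encode('utf8')

    def decoder(self, data):
        return json.loads(data.decode('utf8'))


field = JSONField(default={'answer': 42})
print(field.default_enc())                       # b'{"answer": 42}'
print(field.decoder(field.encoder([1, 'two'])))  # [1, 'two']
```

Only encoder() and decoder() need overriding; default handling comes from the base class.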

class Bytes(default=None)

Class for byte string data.

This is the default data type for any field.

class String(default=None, encoding='utf8')

Data is a Unicode string saved using the specified encoding.

class Pickle(default=None, protocol=2)

Serialize Python objects using the pickle protocol.

Examples

>>> pickle_file = BinPack.new_type('pickle_file', header='pickle', version='0.1')
>>> pickle_file.add_field('data', Pickle())
>>> pickle_file.lock()
>>> F = StringIO()
>>> with pickle_file(F, 'w') as data:
...     data['data'] = [None, 1, 'two']
>>> F.seek(0)
>>> with pickle_file(F) as data:
...     print(data['data'])
[None, 1, 'two']
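Assuming the Pickle field simply wraps pickle.dumps and pickle.loads with the given protocol (an assumption based on the class description and its protocol=2 parameter), its encoding step looks roughly like this in Python 3. pickle_encoder and pickle_decoder are hypothetical names.

```python
import pickle


def pickle_encoder(obj, protocol=2):
    """Serialize obj with the given pickle protocol (assumed Pickle behaviour)."""
    return pickle.dumps(obj, protocol=protocol)


def pickle_decoder(data):
    """Rebuild the object from its pickled bytes."""
    return pickle.loads(data)


data = pickle_encoder([None, 1, 'two'])
print(data[:2])              # b'\x80\x02'  (PROTO opcode + protocol number)
print(pickle_decoder(data))  # [None, 1, 'two']
```

The leading PROTO opcode is one reason pickled fields are Python-specific: other languages need a pickle-aware reader to decode them.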

class Compressed(filter, method='zlib', **kwds)

Compressed can be applied to any field in order to compress an arbitrary data stream.

Parameters:

filter : Filter instance

The original filter. Its data will be compressed in the file.

method : str

A string describing the compression method. Currently, only ‘bz2’ and ‘zlib’ are supported.
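A minimal sketch of a compressing wrapper of this kind, assuming it compresses the inner field's encoded bytes and decompresses them before decoding. CompressedSketch and BytesField are hypothetical names. The inner argument plays the role of the filter parameter, and extra **kwds are not modeled.

```python
import bz2
import zlib


class CompressedSketch:
    """Wrap another field: encode then compress; decompress then decode."""

    _methods = {
        'zlib': (zlib.compress, zlib.decompress),
        'bz2': (bz2.compress, bz2.decompress),
    }

    def __init__(self, inner, method='zlib'):
        if method not in self._methods:
            raise ValueError('unsupported compression method: %r' % method)
        self.inner = inner
        self._compress, self._decompress = self._methods[method]

    def encoder(self, obj):
        return self._compress(self.inner.encoder(obj))

    def decoder(self, data):
        return self.inner.decoder(self._decompress(data))


class BytesField:
    """Hypothetical inner field that stores bytes unchanged."""

    def encoder(self, obj):
        return obj

    def decoder(self, data):
        return data


for method in ('zlib', 'bz2'):
    field = CompressedSketch(BytesField(), method=method)
    blob = field.encoder(b'bar has a default value!')
    print(method, field.decoder(blob))
```

Wrapping rather than subclassing lets any field type gain compression without changing its own encoder and decoder.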

Utility functions
