Binpack implements a protocol for binary packing of data. Binpack classes can be used to easily create user-defined file types.
Files packed using the binpack protocol will have a small header section that holds the file type and version information. The body part is a sequence of named data streams. These data streams can be converted transparently into/from Python objects and are accessed using a dictionary interface. The keys in this dictionary are sorted in a predictable order.
The easiest way to create a new packing type is to use the new_type() class method of BinPack:
>>> myfile = BinPack.new_type('myfile', header='myfile', version='0.1')
Any number of fields of different types can be added:
>>> myfile.add_field('foo')
>>> myfile.add_field('bar', Compressed(Bytes('bar has a default value!')))
The order in which these fields are created is important because it is the same order in which data is stored in the file. It is good practice to leave the fields that hold large amounts of data for the end; otherwise, these large chunks of data must be scanned just to access a few bytes from a field stored later in the stream.
After all fields are configured, the new class must be locked. New fields cannot be added to a locked class, but once locked, the class is ready to create instances. Locked classes cannot be unlocked and should not be modified in any way.
>>> myfile.lock()
A BinPack object opened in ‘writing’ mode works like a dictionary. When you assign some value to a key, it is automatically serialized and saved to the file.
>>> F = StringIO()
>>> with myfile(F, mode='w') as data:
... data['foo'] = 'data for foo'
... F.getvalue() # data is written to F as soon as possible
'\xffmyfile\xff0.1\xff\x0cdata for foo'
When the file is closed (automatically when using the with statement, manually otherwise), any unwritten data is flushed to the file. If a field that has no default value was never assigned by the user, a ValueError is raised.
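For example (a hypothetical sketch: the precise point at which the error is raised and its message are assumptions, not taken from the library), closing a writer without assigning the mandatory ‘foo’ field should fail:

>>> F3 = StringIO()
>>> incomplete = myfile(F3, mode='w')
>>> incomplete.close()  # 'foo' has no default value and was never assigned
Traceback (most recent call last):
    ...
ValueError: ...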
The data for the ‘bar’ field was automatically compressed in the data stream. Its default value should read ‘bar has a default value!’. Instead, we see a bunch of seemingly random bytes:
>>> F.getvalue()
'\xffmyfile\xff0.1\xff\x0cdata for foo x\x9cKJ,R\xc8H,VHTHIMK,\xcd)Q(K\xcc)MU\x04\x00h\xf1\x08v'
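The ‘x\x9c’ prefix is the standard zlib header, and the space just before it is the size prefix (0x20 = 32, the length of the compressed block; see the description of the body format below). Assuming the compression method is plain zlib (an inference from the stream, not something stated by this documentation), the default value can be recovered by hand:

>>> import zlib
>>> zlib.decompress('x\x9cKJ,R\xc8H,VHTHIMK,\xcd)Q(K\xcc)MU\x04\x00h\xf1\x08v')
'bar has a default value!'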
Reading is similar to writing, and data is automatically converted to its desired type:
>>> F2 = StringIO(F.getvalue())
>>> with myfile(F2, close=True) as data:
... data['foo'], data['bar']
('data for foo', 'bar has a default value!')
The close=True parameter used above tells the BinPack instance to automatically close the input file. This avoids the nested with statement that would otherwise be necessary for safe manipulation of files.
>>> F2.closed, F.closed
(True, False)
binpack stores data in a way that can be portable across different languages. Of course, if portability is a concern, Python-specific fields such as ‘Pickle’ must be avoided.
Data is stored by binpack as a predictable sequence of bytes.
The header part:
- The first byte of the file is always 0xFF.
- The following bytes are the ASCII characters of the user-defined header string, which is specific to each BinPack subclass.
- Another 0xFF byte marks the end of the header string.
- The version string comes next and is followed by a third 0xFF byte, which signals the end of the header.
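A quick check against the output shown earlier confirms the layout (plain string slicing, nothing from the binpack API):

>>> raw = '\xffmyfile\xff0.1\xff\x0cdata for foo'
>>> end_of_name = raw.index('\xff', 1)
>>> end_of_version = raw.index('\xff', end_of_name + 1)
>>> raw[1:end_of_name], raw[end_of_name + 1:end_of_version]
('myfile', '0.1')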
The body part:
Each field is associated in the file with a sequence of bytes. Before this sequence starts, however, we must tell how many bytes are present. This is done using the following protocol:
The first byte is read. If it is less than 2^7, it is interpreted as the size of the byte stream. This number of bytes is read and assigned to the field.
If the byte is greater than or equal to 2^7, bytes are read until a byte smaller than 2^7 is found. 2^7 is subtracted from each byte that is greater than or equal to 2^7, and the resulting sequence is interpreted as a single number written as 7-bit integers in little-endian order.
That is, if the bytes are b1, b2, ..., bn, the result is given by
N = (b1 - 128) + 2^7 * (b2 - 128) + 2^14 * (b3 - 128) + ... + 2^(7*(n-1)) * bn.
128 is not subtracted from the last byte, bn, because it is already smaller than 128. N is then the size of the following byte stream; this number of bytes is read and assigned to the current field.
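This size prefix is easy to reproduce outside the library. The sketch below is illustrative only (the helper names are not part of the binpack API) and follows the rules above:

def encode_size(n):
    # Emit n as little-endian 7-bit groups; every group except the last
    # has 128 added to it to signal that more bytes follow.
    out = []
    while n >= 128:
        out.append(128 + (n & 0x7f))
        n >>= 7
    out.append(n)  # final byte, already smaller than 128
    return ''.join(chr(b) for b in out)

def decode_size(stream):
    # Accumulate 7-bit groups until a byte smaller than 128 ends the number.
    n, shift = 0, 0
    while True:
        b = ord(stream.read(1))
        if b < 128:
            return n + (b << shift)
        n += (b - 128) << shift
        shift += 7

A 12-byte field such as ‘data for foo’ is prefixed with a single '\x0c' byte, while a 300-byte field would be prefixed with '\xac\x02':

>>> encode_size(12), encode_size(300)
('\x0c', '\xac\x02')
>>> decode_size(StringIO('\xac\x02'))
300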
The header and body nomenclature used in binpack refers only to how data is packed in the binary stream. User formats may, of course, consider many fields in the body (in the binpack sense) to be part of their file's header, and only a few fields that hold the bulk of the data to be the actual file body.
Packs binary data in a file.

Parameters:
- F : file object
- mode : ‘r’ or ‘w’
- close : bool
- atomic : bool
- keep_items : True
Methods
Register an ASCII string to represent the file version.
Versioning may be needed if the data format changes over time.
Register a field associating a data type and a default value.
Class for byte string data.
This is the default data type for any field.
Data is a unicode string saved using the specified encoding.
Serialize Python objects using the pickle protocol.
Examples
>>> pickle_file = BinPack.new_type('pickle_file', 'pickle', '0.1')
>>> pickle_file.add_field('data', Pickle())
>>> pickle_file.lock()
>>> F = StringIO()
>>> with pickle_file(F, 'w') as data:
... data['data'] = [None, 1, 'two']
>>> F.seek(0)
>>> with pickle_file(F) as data:
... print(data['data'])
[None, 1, 'two']
Compressed can be applied to any field in order to compress an arbitrary data stream.
Parameters:
- filter : Filter instance
- method : str
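For illustration (the ‘zlib’ value for method is an assumption based on the compressed stream seen earlier; the accepted method names are not listed in this documentation), a compressed field with an explicit method could be declared much like the ‘bar’ field above:

>>> archive = BinPack.new_type('archive', header='archive', version='0.1')
>>> archive.add_field('blob', Compressed(Bytes('no data yet'), method='zlib'))
>>> archive.lock()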