A stream-friendly, simple compression library, built around iterators. See compress and decompress for the easiest way to get started.
Modeled after the TIFF implementation of LZW, as described at http://www.fileformat.info/format/tiff/corion-lzw.htm
In an even-nuttier-shell, LZW compresses input bytes with integer codes. Starting with codes 0-255 that code to themselves, and two control codes, we work our way through a stream of bytes. When we encounter a pair of codes c1,c2 we add another entry to our code table, using the lowest available code, whose value is value(c1) + value(c2)[0].
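The table-building rule above can be sketched in a few lines. This is a toy illustration, not this module's implementation: the names `lzw_codes_sketch` and `lzw_decode_sketch` are hypothetical, the sketch assumes codes 256 and 257 are the two reserved control codes (as in TIFF), and it ignores table size limits, control codes in the stream, and bit packing.

```python
def lzw_codes_sketch(data):
    """Toy LZW encoder: turns a bytes object into a list of integer codes."""
    # Codes 0-255 code to themselves.
    table = {bytes([i]): i for i in range(256)}
    next_code = 258  # assumption: 256 and 257 are reserved as control codes
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate  # keep extending the current match
        else:
            codes.append(table[current])
            table[candidate] = next_code  # lowest available code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode_sketch(codes):
    """Toy LZW decoder: rebuilds the same table from the code stream alone."""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 258  # same assumption as the encoder
    pieces = []
    previous = None
    for code in codes:
        if code in table:
            entry = table[code]
        else:
            # The code being defined right now: value(c1) + value(c1)[0].
            entry = previous + previous[:1]
        pieces.append(entry)
        if previous is not None:
            # New entry: value(c1) + value(c2)[0]
            table[next_code] = previous + entry[:1]
            next_code += 1
        previous = entry
    return b"".join(pieces)


codes = lzw_codes_sketch(b"ABABABA")
print(codes)                      # [65, 66, 258, 260]
print(lzw_decode_sketch(codes))   # b'ABABABA'
```

Note that the decoder never receives the table; it reconstructs each entry from consecutive pairs of codes, which is why the rule is stated in terms of c1 and c2.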
Of course, there are details :)
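Our control codes are a clear code, which tells the decoder to reset its code table, and an end-of-information code, which marks the end of the data. In the TIFF scheme these are codes 256 and 257 respectively.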
When dealing with bytes, codes are emitted as variable-length bit strings packed into the stream of bytes.
Codepoints are written with varying length.
Codepoints are stored with their MSB in the most significant bit available in the output character.
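To make the MSB-first packing concrete, here is a minimal sketch. It is an assumption-laden simplification: `pack_codes_msb_first` is a hypothetical name, and the sketch uses a single fixed width of 9 bits, whereas the packing described above changes width as the code table grows.

```python
def pack_codes_msb_first(codes, width=9):
    """Pack integer codes into bytes, MSB first, zero-padding the last byte.

    Simplification: every code uses the same width (9 bits by default).
    """
    pending = 0   # bits not yet written, held as an integer
    nbits = 0     # number of pending bits
    out = bytearray()
    for code in codes:
        pending = (pending << width) | code
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((pending >> nbits) & 0xFF)  # emit the top 8 bits
        pending &= (1 << nbits) - 1                # keep only the leftovers
    if nbits:
        out.append((pending << (8 - nbits)) & 0xFF)  # pad LSBs with zeros
    return bytes(out)


print(pack_codes_msb_first([256, 65]))  # b'\x80\x10@'
```

The two 9-bit codes occupy 18 bits, so the third byte carries only two meaningful bits followed by six bits of zero padding.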
>>> import lzw
>>>
>>> mybytes = lzw.readbytes("README.txt")
>>> lessbytes = lzw.compress(mybytes)
>>> newbytes = b"".join(lzw.decompress(lessbytes))
>>> oldbytes = b"".join(lzw.readbytes("README.txt"))
>>> oldbytes == newbytes
True
Version: 0.01
Author: Joe Bowers
License: MIT License
Classes

Class | Summary
---|---
ByteEncoder | Takes a stream of uncompressed bytes and produces a stream of compressed bytes, usable by ByteDecoder.
ByteDecoder | Decodes a stream of compressed bytes by combining bit-unpacking with interpretation of the resulting codepoint stream; suitable for use with bytes generated by ByteEncoder.
BitPacker | Translates a stream of lzw codepoints into a variable width packed stream of bytes, for use by BitUnpacker.
BitUnpacker | An adaptive-width bit unpacker, intended to decode streams written by BitPacker into integer codepoints.
Decoder | Uncompresses a stream of lzw code points, as created by Encoder.
Encoder | Given an iterator of bytes, returns an iterator of integer codepoints, suitable for use by Decoder.
PagingEncoder | UNTESTED.
PagingDecoder | UNTESTED.
Function Details
Given an iterable of bytes, returns a (hopefully shorter) iterable of bytes that you can store in a file or pass over the network or what-have-you, and later use to get back your original bytes with decompress. This is the best place to start using this module.
Given a one-byte long byte string, returns an integer. Equivalent to struct.unpack("B", b)[0] (struct.unpack itself returns a one-element tuple).
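As a quick check of the struct equivalence, the following sketch uses only the standard library. The helper name `unpack_one_byte` is hypothetical and is not part of this module.

```python
import struct


def unpack_one_byte(b):
    """Return the integer value of a one-byte byte string."""
    (value,) = struct.unpack("B", b)  # unpack always returns a tuple
    return value


print(unpack_one_byte(b"\x41"))  # 65
```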
Convenience for iterating over the bytes in a file. Given a file-like object (with a read(int) method), returns an iterator over the bytes of that file.
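One way such an iterator could be written is sketched below. The name `iter_file_bytes` and the `buffersize` parameter are assumptions made for illustration; the sketch yields one-byte byte strings, which matches how the usage example joins the results with b"".join.

```python
import io


def iter_file_bytes(fileobj, buffersize=1024):
    """Yield one-byte byte strings from any object with a read(int) method."""
    while True:
        chunk = fileobj.read(buffersize)
        if not chunk:          # an empty read means end of file
            break
        for i in range(len(chunk)):
            yield chunk[i:i + 1]   # slicing keeps each item a byte string


print(list(iter_file_bytes(io.BytesIO(b"abc"), buffersize=2)))
# [b'a', b'b', b'c']
```

Reading in chunks and yielding one byte at a time keeps memory use bounded without making a system call per byte.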
Opens a file named by filename and iterates over the filebytes found therein. Will close the file when the bytes run out.
Convenience for emitting the bytes we generate to a file. Given a filename, opens and truncates the file, dumps the bytes from bytesource into it, and closes it.
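A minimal sketch of the open, truncate, dump, close sequence follows. The name `write_bytes_to` is hypothetical, and the sketch assumes bytesource yields byte strings.

```python
import os
import tempfile


def write_bytes_to(filename, bytesource):
    """Open and truncate filename, write each byte string, then close."""
    # Mode "wb" truncates an existing file; the with-block closes it,
    # even if bytesource raises partway through.
    with open(filename, "wb") as outfile:
        for piece in bytesource:
            outfile.write(piece)


# Usage: write an iterable of byte strings to a scratch file and read it back.
path = os.path.join(tempfile.mkdtemp(), "sketch.bin")
write_bytes_to(path, [b"a", b"b", b"c"])
with open(path, "rb") as infile:
    print(infile.read())  # b'abc'
```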
Produces an array of booleans representing the given argument as an unsigned integer, MSB first. If width is given, will pad the MSBs to the given width (but will NOT truncate overflowing results).
>>> import lzw
>>> lzw.inttobits(304, width=16)
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]
Given a list of boolean values, interprets them as a binary encoded, MSB-first unsigned integer (with True == 1 and False == 0) and returns the result.
>>> import lzw
>>> lzw.intfrombits([ 1, 0, 0, 1, 1, 0, 0, 0, 0 ])
304
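The two integer/bit conversions can be sketched in plain Python as below. The names `int_to_bits` and `int_from_bits` are hypothetical stand-ins, and edge cases (for example, what zero looks like with no width given) are choices of this sketch rather than documented behavior of the module.

```python
def int_to_bits(value, width=None):
    """Return the bits of a non-negative integer, MSB first."""
    bits = []
    while value:
        bits.insert(0, value & 1)
        value >>= 1
    if width is not None:
        # Pad on the MSB side only; a negative count yields no padding,
        # so values wider than `width` are never truncated.
        bits = [0] * (width - len(bits)) + bits
    return bits


def int_from_bits(bits):
    """Interpret truthy/falsy values as an MSB-first unsigned integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | (1 if bit else 0)
    return value


print(int_to_bits(304, width=16))
# [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]
print(int_to_bits(304, width=4))   # [1, 0, 0, 1, 1, 0, 0, 0, 0]
print(int_from_bits([1, 0, 0, 1, 1, 0, 0, 0, 0]))  # 304
```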
Breaks a given iterable of bytes into an iterable of boolean values representing those bytes as unsigned integers.
>>> import lzw
>>> [ x for x in lzw.bytestobits(b"\x01\x30") ]
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]
Interprets an indexable list of booleans as bits, MSB first, to be packed into a list of integers from 0 to 255, MSB first, with LSBs zero-padded. Note this padding behavior means that round-trips of bytestobits(bitstobytes(x, width=W)) may not yield what you expect them to if W % 8 != 0. Does *NOT* pack the returned values into a bytearray or the like.
>>> import lzw
>>> lzw.bitstobytes([0, 0, 0, 0, 0, 0, 0, 0, "Yes, I'm True"]) == [ 0x00, 0x80 ]
True
>>> lzw.bitstobytes([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]) == [ 0x01, 0x30 ]
True
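The padding caveat is easier to see with a small stand-alone sketch. The names `bytes_to_bits` and `bits_to_bytes` are hypothetical, and the sketch assumes a Python 3 bytes object as input (iterating it yields integers), which may differ from how this module represents bytes.

```python
def bytes_to_bits(data):
    """Yield the bits of each byte in data, MSB first."""
    for byte in data:                 # Python 3: each item is an int 0-255
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def bits_to_bytes(bits):
    """Pack an indexable list of bits into a list of ints (0-255)."""
    result = []
    for start in range(0, len(bits), 8):
        chunk = list(bits[start:start + 8])
        chunk += [0] * (8 - len(chunk))   # zero-pad the LSBs of the last byte
        value = 0
        for bit in chunk:
            value = (value << 1) | (1 if bit else 0)
        result.append(value)
    return result


print(bits_to_bytes([0, 0, 0, 0, 0, 0, 0, 0, "Yes, I'm True"]))  # [0, 128]

# Three bits in, eight bits back: the padding does not round-trip.
print(list(bytes_to_bits(bytes(bits_to_bytes([1, 0, 1])))))
# [1, 0, 1, 0, 0, 0, 0, 0]
```

A caller who needs exact round-trips has to track the original bit count separately, since the padded zeros are indistinguishable from real data.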
Generated by Epydoc 3.0.1 on Tue Apr 20 15:25:48 2010 | http://epydoc.sourceforge.net |