data

data is a small Python module that lets you handle input in a uniform way, leaving it up to the caller to supply a byte string, a unicode object, a file-like object or a filename.

>>> open('helloworld.txt', 'w').write('hello, world from a file')

>>> from data import Data as I
>>> a = I(u'hello, world')
>>> b = I(file='helloworld.txt')
>>> c = I(open('helloworld.txt'))

>>> print unicode(a)
hello, world
>>> print unicode(b)
hello, world from a file
>>> print unicode(c)
hello, world from a file
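The core idea is to turn several kinds of input into one kind of value, and it can be sketched in a few lines of plain Python 3. This is not data's implementation. The to_text function is hypothetical, and, like the example above, it assumes a filename is passed through a separate file keyword rather than guessed from a string.

```python
import io
import os
import tempfile


def to_text(source=None, file=None, encoding="utf8"):
    """Hypothetical helper: return *source* or the file named *file* as text.

    Accepts a byte string, a text string, a file-like object with read(),
    or (through the ``file`` keyword) a filename.
    """
    if file is not None:
        # A filename was given explicitly: open it and decode its bytes.
        with open(file, "rb") as fh:
            return fh.read().decode(encoding)
    if isinstance(source, bytes):
        return source.decode(encoding)
    if isinstance(source, str):
        return source
    if hasattr(source, "read"):
        # File-like object: read everything, decoding only if it yields bytes.
        content = source.read()
        return content.decode(encoding) if isinstance(content, bytes) else content
    raise TypeError("unsupported input type: %r" % type(source))


path = os.path.join(tempfile.mkdtemp(), "helloworld.txt")
with open(path, "w") as fh:
    fh.write("hello, world from a file")

print(to_text(b"hello, world"))                  # hello, world
print(to_text(file=path))                        # hello, world from a file
print(to_text(io.BytesIO(b"hello from bytes")))  # hello from bytes
```

Keeping filenames behind an explicit keyword avoids ambiguity, because a plain string could be either literal text or a path.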

This can be made even more convenient with the data decorator, which converts the named argument into a Data instance before the function is called:

>>> from data.decorators import data

>>> @data('buf')
... def parse_buffer(buf, magic_mode=False):
...   return 'buf passed in as ' + repr(buf)
...

>>> parse_buffer('hello')
"buf passed in as Data(data='hello', encoding='utf8')"

>>> rv = parse_buffer(open('helloworld.txt'))
>>> assert 'file=' in rv
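A decorator like this binds the call's arguments, swaps the named argument for a wrapped version, and then calls the original function. The following hypothetical Python 3 sketch shows that mechanism using inspect.signature. Wrapped and wrap_arg are made-up names. data's real decorator produces Data objects and may work differently.

```python
import functools
import inspect
import io


class Wrapped:
    """Hypothetical stand-in for a Data object: it only records its text."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return "Wrapped(%r)" % self.text


def wrap_arg(name):
    """Hypothetical decorator: convert argument *name* into a Wrapped object."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments onto parameter names.
            bound = sig.bind(*args, **kwargs)
            value = bound.arguments[name]
            if hasattr(value, "read"):
                value = value.read()  # file-like: read its contents
            if isinstance(value, bytes):
                value = value.decode("utf8")
            bound.arguments[name] = Wrapped(value)
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@wrap_arg("buf")
def parse_buffer(buf, magic_mode=False):
    return "buf passed in as " + repr(buf)


print(parse_buffer("hello"))                                # buf passed in as Wrapped('hello')
print(parse_buffer(io.BytesIO(b"hello"), magic_mode=True))  # buf passed in as Wrapped('hello')
```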

Fitting in

All instances support methods such as read and __str__, which make it easy to fit them into existing APIs:

>>> d = I('some data')
>>> d.read(4)
u'some'
>>> d.read(4)
u' dat'
>>> d.read(4)
u'a'
>>> e = I(u'more data')
>>> str(e)
'more data'
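An object only needs a handful of methods to pass as a readable file. Here is a minimal hypothetical Python 3 class, not part of data, that delegates position tracking to io.StringIO. The name TextSource is made up for illustration.

```python
import io


class TextSource:
    """Hypothetical minimal file-like wrapper around a piece of text."""

    def __init__(self, text):
        self._text = text
        self._buf = io.StringIO(text)  # tracks the read position for us

    def read(self, size=-1):
        # Like file.read(): up to *size* characters, or all remaining text.
        return self._buf.read(size)

    def __str__(self):
        # In this sketch, always the full text, however much has been read.
        return self._text


d = TextSource("some data")
print(repr(d.read(4)), repr(d.read(4)), repr(d.read(4)))  # 'some' ' dat' 'a'
print(str(d))  # some data
```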

Note that read returns unicode. To get bytes instead, use readb:

>>> f = I(u'I am \xdcnicode.')
>>> f.readb()
'I am \xc3\x9cnicode.'

Every data object has an encoding attribute that is used when converting to and from unicode. It defaults to utf8 and can be set with the encoding keyword:

>>> g = I(u'I am \xdcnicode.', encoding='latin1')
>>> g.readb()
'I am \xdcnicode.'
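The byte strings in these two examples are the same text encoded with each instance's encoding. In plain Python 3, the equivalent conversions use only the standard str.encode and bytes.decode, not data itself:

```python
text = "I am \xdcnicode."

# UTF-8 (the default shown in the reprs above) needs two bytes for the Ü ...
utf8_bytes = text.encode("utf8")
# ... while Latin-1 stores it in a single byte.
latin1_bytes = text.encode("latin1")

print(utf8_bytes)    # b'I am \xc3\x9cnicode.'
print(latin1_bytes)  # b'I am \xdcnicode.'

# Decoding with the matching encoding restores the original text.
assert utf8_bytes.decode("utf8") == text
assert latin1_bytes.decode("latin1") == text
```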

Iteration and line reading are also supported:

>>> h = I('I am\nof many\nlines')
>>> h.readline()
u'I am\n'
>>> h.readlines()
[u'of many\n', u'lines']

>>> i = I('line one\nline two\n')
>>> list(iter(i))
[u'line one\n', u'line two\n']
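These results follow the usual file-object conventions. Line endings are kept, and the last line has no trailing newline if the input lacks one. The standard library's io.StringIO gives the same results on these inputs, which makes it handy for comparison. This is plain Python 3, not data:

```python
import io

h = io.StringIO("I am\nof many\nlines")
first = h.readline()   # 'I am\n'
rest = h.readlines()   # ['of many\n', 'lines']

i = io.StringIO("line one\nline two\n")
lines = list(i)        # iteration yields one line at a time

print(repr(first), rest, lines)
```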

Extras

save_to

Some useful convenience methods are available:

>>> j = I('example')
>>> j.save_to('example.txt')

save_to writes the data to a file using the most efficient method available (shutil.copyfileobj or a plain write()). Instead of a filename, it also accepts a file-like object:

>>> k = I('example2')
>>> with open('example2.txt', 'wb') as out:
...     k.save_to(out)
...
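The choice between the two strategies can be illustrated with a hypothetical Python 3 helper. This is not data's save_to. It assumes that in-memory content is plain bytes and that file-like sources are opened in binary mode.

```python
import io
import os
import shutil
import tempfile


def save_to(source, dest):
    """Hypothetical sketch: write *source* (bytes or a binary file-like) to *dest*.

    *dest* may be a filename or an already-open binary file-like object.
    """
    if isinstance(dest, (str, bytes, os.PathLike)):
        # A filename: open it ourselves and recurse with the file object.
        with open(dest, "wb") as out:
            save_to(source, out)
        return
    if hasattr(source, "read"):
        # Stream in chunks instead of loading everything into memory.
        shutil.copyfileobj(source, dest)
    else:
        # Already in memory: a single write() is enough.
        dest.write(source)


path = os.path.join(tempfile.mkdtemp(), "example.txt")
save_to(b"example", path)

buf = io.BytesIO()
save_to(io.BytesIO(b"example2"), buf)
print(buf.getvalue())  # b'example2'
```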

temp_saved

If you need the data in a secure temporary file, use temp_saved:

>>> l = I('goes into tmp')
>>> with l.temp_saved() as tmp:
...     print tmp.name.startswith('/tmp/tmp')
...     print l.read()
...
True
goes into tmp

temp_saved works almost identically to tempfile.NamedTemporaryFile, with one difference: there is no delete argument, and the file is removed only when the context manager exits.
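One way to get these semantics is to create the file with delete=False and remove it yourself when the block ends. The following hypothetical Python 3 sketch uses only the standard tempfile and contextlib modules. data's own temp_saved may differ, for example in what it yields or how it writes the data.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_saved(content):
    """Hypothetical sketch: yield a named temporary file holding *content* (bytes).

    The file is closed and deleted when the with-block exits, even on error.
    """
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(content)
        tmp.flush()  # make the data visible to anyone opening tmp.name
        tmp.seek(0)
        yield tmp
    finally:
        tmp.close()
        os.remove(tmp.name)


with temp_saved(b"goes into tmp") as tmp:
    name = tmp.name
    print(os.path.exists(name))  # True
    print(tmp.read())            # b'goes into tmp'
print(os.path.exists(name))      # False
```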

Where it is useful

data can be used on both sides of an API, either when passing values in:

>>> import json
>>> from data import Data as I

>>> m = I('{"this": "json"}')
>>> json.load(m)
{u'this': u'json'}

or when accepting them (see the data decorator example above). If needed, you can also support APIs that let users pass in filenames:

>>> class Parser(object):
...   @data('input')
...   def parse(self, input, parser_opt=False):
...     return input
...   def parse_file(self, input_file, *args, **kwargs):
...     return self.parse(I(file=input_file), *args, **kwargs)
...

>>> p = Parser()
>>> p.parse_file('/dev/urandom')
Data(file='/dev/urandom', encoding='utf8')
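The json.load example works because json.load only needs an object with a read() method. It calls read() and parses the string it gets back. Any minimal object with that method will do, as the following plain Python 3 sketch shows. The ReadOnce class is made up for illustration.

```python
import json


class ReadOnce:
    """Hypothetical minimal file-like object: all json.load needs is read()."""

    def __init__(self, text):
        self.text = text

    def read(self):
        return self.text


result = json.load(ReadOnce('{"this": "json"}'))
print(result)  # {'this': 'json'}
```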

See the documentation at http://pythonhosted.org/data for an API reference.

Python 2 and 3

data works the same on Python 2 and 3 thanks to six, a few compatibility functions and a test suite.

Python 3 is supported from 3.3 onwards, Python 2 from 2.6.
