multifileiter

An iterator over all lines of the given files. Like module fileinput in the standard library, but faster, and written in C.

Unlike fileinput, it does not read input files completely into memory, so it can handle files of any size.

In addition, a replacement for the standard fileinput.FileInput legacy class is provided.

This package has no external dependencies. It has been tested in Python 2.6; support for Python 3.1 is still experimental.

Installation

Extract the source distribution into a temporary directory and run:

python setup.py build
python setup.py install

The package uses the very same test suite as the standard fileinput module. To run the tests:

cd test
python test_multifileiter.py

Usage

This package implements a basic multiple file iterator written in C, a thin Python wrapper on top of it, and a replacement for the standard fileinput.FileInput class.

Example:

from multifileiter.fileinput import FileInput

fi = FileInput(list_of_file_names)
# iterate over every line in every file
for line in fi:
    process_line(line)

If you want to rewrite the input files with new content:

fi = FileInput(list_of_file_names, inplace=True)
for line in fi:
    new_line = process_line(line)
    fi.output.write(new_line)

The output attribute points to the file currently being written. If you want the legacy FileInput behavior (anything printed or written to sys.stdout goes to the output file), pass replace_stdout=True.
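
For comparison, the legacy behavior mentioned above can be observed with the standard library's own FileInput. This sketch uses only the stdlib module, not multifileiter; the temporary file and its contents are made up for illustration.

```python
import fileinput
import os
import sys
import tempfile

# Create a throwaway file to rewrite (illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\n")

# Standard library FileInput: with inplace=True, anything written to
# sys.stdout during the iteration goes to the file being rewritten.
fi = fileinput.FileInput([path], inplace=True)
for line in fi:
    sys.stdout.write(line.upper())
fi.close()  # restores the real sys.stdout

with open(path) as f:
    rewritten = f.read()
```

With multifileiter's FileInput, the equivalent explicit form is the fi.output.write(...) call shown earlier.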

Replacing the fileinput standard module

Class LegacyFileInput implements the same interface as the standard library's FileInput. You may monkey-patch the standard fileinput module to gain the speed of the new module without modifying any legacy code. Just execute this at the start of your program:

# monkey-patch stdlib's fileinput
import multifileiter.fileinput
import fileinput
fileinput.FileInput = multifileiter.fileinput.LegacyFileInput
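
If the program must also run where multifileiter is not installed, the patch can be guarded. This is only a sketch of one possible pattern; the names fast_fileinput and patched are hypothetical and not part of the package.

```python
import fileinput

try:
    import multifileiter.fileinput as fast_fileinput
except ImportError:
    # Package not installed: keep the standard library class.
    fast_fileinput = None

if fast_fileinput is not None:
    # Same assignment as in the snippet above, applied only when possible.
    fileinput.FileInput = fast_fileinput.LegacyFileInput

# True when the replacement class is in use, False otherwise.
patched = fast_fileinput is not None
```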

.. note::

    In addition to the :class:`FileInput` class, the :mod:`fileinput` standard module
    exposes many global functions. Using those global functions with
    this version of :class:`FileInput` may or may not work. I don't like their
    global nature, they were never tested with this module, and using
    them is not supported. Use at your own risk.

Reference

class multifileiter.fileinput.MultiFileIter(files=None, mode="r")

files is any iterable yielding either strings or file-like objects. The iterable is consumed lazily. Strings are considered file names and the corresponding file is opened using the mode parameter. The string '-' is special-cased and represents sys.stdin. Other objects are assumed to be file-like objects; only their next(), name() and close() methods are called (the latter two being optional).
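
The rules above can be illustrated with a small pure-Python generator. This is not the package's C implementation, only a sketch of the described behavior; iter_lines and the sample sources are hypothetical names chosen for the example.

```python
import io
import os
import sys
import tempfile


def iter_lines(files, mode="r"):
    """Yield every line from each item in files, one source at a time."""
    for item in files:  # the iterable is consumed lazily, item by item
        if isinstance(item, str):
            if item == "-":
                # '-' stands for standard input; it is not closed here.
                for line in sys.stdin:
                    yield line
                continue
            # Any other string is treated as a file name.
            with open(item, mode) as f:
                for line in f:
                    yield line
        else:
            # Anything else is assumed to be a file-like object.
            try:
                for line in item:
                    yield line
            finally:
                close = getattr(item, "close", None)  # close() is optional
                if close is not None:
                    close()


# Mix a file name and a file-like object, produced by a generator.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "first.txt")
with open(path, "w") as f:
    f.write("one\ntwo\n")

sources = (s for s in [path, io.StringIO("three\n")])
lines = list(iter_lines(sources))
```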

MultiFileIter implements the iterator protocol. Each call to next() returns the next line from its input files.

MultiFileIter objects have these methods:

lineno()
Cumulative line number of the line that has just been read, or 0 before reading the first line.
filelineno()
Line number of the line that has just been read, in the current file.
isfirstline()
True if the line that has just been read is the first line in the file, false otherwise.
filename()
Name of the file currently being read, or None.
isstdin()
True if the line just read came from stdin, false otherwise.
fileno()
File descriptor of the current file, or -1 when no file is opened.
nextfile()
Closes the current file. The next iteration will read the first line from the next file (if any).
close()
Closes the input file and the output file (if any), and ends the whole iteration.
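
The standard library's FileInput documents methods with the same names, so the counters can be illustrated with it. The sketch below runs against the stdlib class only, on the assumption that MultiFileIter follows the same semantics as described above; the file names and contents are invented.

```python
import fileinput
import os
import tempfile

# Two small throwaway files.
tmp_dir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "a1\na2\n"), ("b.txt", "b1\n")]:
    p = os.path.join(tmp_dir, name)
    with open(p, "w") as f:
        f.write(text)
    paths.append(p)

fi = fileinput.FileInput(paths)
before_first = fi.lineno()  # nothing has been read yet

seen = []
for line in fi:
    seen.append((
        os.path.basename(fi.filename()),  # file the line came from
        fi.lineno(),                      # cumulative line number
        fi.filelineno(),                  # line number within that file
        fi.isfirstline(),                 # first line of its file?
    ))
fi.close()
```

Note how lineno() keeps counting across files while filelineno() restarts at 1 for b.txt.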

class multifileiter.fileinput.FileInput(files=None, inplace=0, backup="", mode="r", openhook=None, replace_stdout=False)

FileInput extends MultiFileIter, adding support for writing files.

files and mode are passed to the base class.

When inplace is true, each input file is renamed, and a new file of the same name is created for writing (it may be accessed through the output property). In addition, if replace_stdout is true, standard output (sys.stdout) is redirected to that file too.

backup is the extension added to the original file names; '.bak' is used if not specified.
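
The rename-then-rewrite mechanism can be sketched in plain Python. This is an illustration of the idea, not the package's code: rewrite_in_place is a hypothetical helper, and keeping the backup file afterwards is an assumption made for the example.

```python
import os
import tempfile


def rewrite_in_place(path, transform, backup=".bak"):
    """Rename path to path + backup, then write transformed lines to path."""
    backup_path = path + backup
    if os.path.exists(backup_path):
        os.remove(backup_path)  # os.rename does not overwrite on Windows
    os.rename(path, backup_path)
    # Read from the renamed original, write to a new file of the same name.
    with open(backup_path, "r") as src, open(path, "w") as dst:
        for line in src:
            dst.write(transform(line))
    return backup_path


# Try it on a throwaway file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.txt")
with open(path, "w") as f:
    f.write("hello\nworld\n")

backup_path = rewrite_in_place(path, lambda line: line.upper())

with open(path) as f:
    new_text = f.read()
with open(backup_path) as f:
    old_text = f.read()
```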

openhook is a function used instead of the builtin open function to open the files; it must take two positional arguments, filename and mode.
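
As a sketch, a hook with that two-argument signature might force a specific text encoding. The hook name utf8_hook is hypothetical, and the example runs it through the standard library's FileInput, which accepts a hook of the same shape; whether multifileiter calls it identically is assumed from the description above.

```python
import fileinput
import io
import os
import tempfile


def utf8_hook(filename, mode):
    """Open filename in the requested mode, always decoding as UTF-8."""
    return io.open(filename, mode, encoding="utf-8")


# Write a UTF-8 file containing a non-ASCII character.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "utf8.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9\n")

# The hook replaces the builtin open() for every input file.
fi = fileinput.FileInput(files=[path], openhook=utf8_hook)
lines = list(fi)
fi.close()
```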

FileInput objects have these attributes:

output

The file currently being written (or None when inplace is false).

Bugs, comments, etc.

The issue tracker is located at http://code.google.com/p/multifileiter/issues

Alternatively you may contact the author: Gabriel A. Genellina <ggenellina@yahoo.com.ar>

License

This package is Copyright 2010 Gabriel A. Genellina, and licensed under the MIT license: http://opensource.org/licenses/mit-license.php

See license.txt for details.