utils API Documentation

A collection of common utilities and convenience functions.

class convutils.utils.SimpleTsvDialect

Bases: csv.excel_tab

A simple tab-separated values dialect.

This dialect is similar to csv.excel_tab, but uses '\n' as the line terminator and does no special quoting.

lineterminator = '\n'

quoting = 3 (i.e., csv.QUOTE_NONE)
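A minimal sketch of an equivalent dialect class, assuming the two attributes above are its only overrides; the sample rows are made up. Besides the line terminator, the practical effect of `csv.QUOTE_NONE` shows up when reading: double quotes are kept as literal characters instead of being treated as field quoting. Note that with `QUOTE_NONE` and no `escapechar`, Python's `csv` writer raises `csv.Error` for a field that contains a tab.

```python
import csv
import io

class SimpleTsvDialect(csv.excel_tab):
    # Override only the two attributes documented above
    lineterminator = '\n'
    quoting = csv.QUOTE_NONE  # == 3

# Writing: rows end in '\n' rather than excel_tab's '\r\n'
out = io.StringIO()
csv.writer(out, dialect=SimpleTsvDialect).writerow(['spam', 'eggs'])
print(repr(out.getvalue()))  # 'spam\teggs\n'

# Reading: double quotes stay in the data instead of being stripped
line = '"spam"\teggs\n'
print(next(csv.reader(io.StringIO(line), dialect=SimpleTsvDialect)))  # ['"spam"', 'eggs']
print(next(csv.reader(io.StringIO(line), dialect=csv.excel_tab)))     # ['spam', 'eggs']
```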

convutils.utils.append_to_file_base_name(path, addition)

Extends a file’s base name (the portion prior to the extension) with the addition.

For example, with a path of /foo/bar/spam.txt and an addition of -eggs, the returned path will be /foo/bar/spam-eggs.txt.

Parameters:	path – a file path (the file does not have to exist)
	addition – text to append to the base name
Returns:	the new file path
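A minimal sketch of the behavior, assuming the "extension" is whatever `os.path.splitext` reports. Under that assumption only the last suffix counts, so `archive.tar.gz` becomes `archive.tar-eggs.gz`; the actual function may treat multi-dot names differently.

```python
import os.path

def append_to_file_base_name(path, addition):
    # splitext splits at the last dot: '/foo/bar/spam.txt' -> ('/foo/bar/spam', '.txt')
    base, extension = os.path.splitext(path)
    return base + addition + extension

print(append_to_file_base_name('/foo/bar/spam.txt', '-eggs'))  # /foo/bar/spam-eggs.txt
print(append_to_file_base_name('archive.tar.gz', '-eggs'))     # archive.tar-eggs.gz
```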

convutils.utils.column_args_to_indices(col_str)

Converts a string representing columns to actual indices.

Note that the text indices should be 1-indexed, and the returned indices and slices will be 0-indexed.

Parameters:	col_str – a string of column designations (e.g., `'1-4,6,8'`)
Returns:	a list of 0-based indices and slices
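A sketch of one way to do the conversion, assuming a range such as `1-4` is inclusive on both ends (columns 1 through 4, hence `slice(0, 4)`). The docs do not say whether open-ended ranges like `3-` are accepted, so this sketch does not handle them. The `select_columns` helper is hypothetical and only shows how a mixed list of indices and slices can be applied to a row.

```python
def column_args_to_indices(col_str):
    indices = []
    for part in col_str.split(','):
        part = part.strip()
        if '-' in part:
            # Assumed inclusive 1-based range: '1-4' -> columns 1..4 -> slice(0, 4)
            start, end = part.split('-')
            indices.append(slice(int(start) - 1, int(end)))
        else:
            # Single 1-based column: '6' -> index 5
            indices.append(int(part) - 1)
    return indices

def select_columns(row, indices):
    # Hypothetical helper: flatten the indices and slices into one list of values
    selected = []
    for index in indices:
        if isinstance(index, slice):
            selected.extend(row[index])
        else:
            selected.append(row[index])
    return selected

indices = column_args_to_indices('1-4,6,8')
print(indices)  # [slice(0, 4, None), 5, 7]
print(select_columns(list('abcdefgh'), indices))  # ['a', 'b', 'c', 'd', 'f', 'h']
```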

convutils.utils.count_lines(fileh)

Determines the number of lines in a text file.

Parameters:	fileh – a file handle
Returns:	a non-negative integer.
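A minimal sketch that counts by iterating over the handle. Restoring the original position afterwards is an assumption (the docs do not say whether the real function rewinds), and it requires a seekable handle. A final line without a trailing newline still counts as a line.

```python
import io

def count_lines(fileh):
    # Remember where the caller was, count by iterating, then restore the position
    start = fileh.tell()
    num_lines = sum(1 for _ in fileh)
    fileh.seek(start)
    return num_lines

fileh = io.StringIO('spam\neggs\nham')
print(count_lines(fileh))      # 3
print(repr(fileh.readline()))  # 'spam\n' (the handle was rewound)
```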

convutils.utils.cumsum(iterable)

Calculates the cumulative sum at each index of an iterable.

Taken from http://stackoverflow.com/a/4844870/38140

Parameters:	iterable – an iterable of numbers
Yields:	the cumulative sum up to and including the given index
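A generator sketch of the running-sum technique; it is not necessarily the linked answer verbatim. On Python 3.2 and later, the standard library's `itertools.accumulate` produces the same sequence.

```python
import itertools

def cumsum(iterable):
    # Keep a running total and yield it after adding each element
    total = 0
    for value in iterable:
        total += value
        yield total

print(list(cumsum([1, 2, 3, 4])))                # [1, 3, 6, 10]
print(list(itertools.accumulate([1, 2, 3, 4])))  # [1, 3, 6, 10]
```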

convutils.utils.make_csv_dict_writer(csvfile, fieldnames, *args, **kwargs)

Creates a csv.DictWriter instance and also writes the header line to the file.

Parameters:	csvfile – a file handle to a CSV file opened in write mode
	fieldnames – a list of field names for the columns
	dialect – a `csv.Dialect` instance
	*args – passed on to `csv.DictWriter()`
	**kwargs – passed on to `csv.DictWriter()`
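A minimal sketch, assuming the function simply forwards its arguments to `csv.DictWriter`, calls `writeheader()`, and returns the writer. With the default `excel` dialect, rows end in `'\r\n'`.

```python
import csv
import io

def make_csv_dict_writer(csvfile, fieldnames, *args, **kwargs):
    # Build the DictWriter, forwarding any extra options (e.g. dialect=..., delimiter=...)
    writer = csv.DictWriter(csvfile, fieldnames, *args, **kwargs)
    # Emit the header row immediately so callers only write data rows
    writer.writeheader()
    return writer

buf = io.StringIO()
writer = make_csv_dict_writer(buf, ['name', 'count'])
writer.writerow({'name': 'spam', 'count': 3})
print(repr(buf.getvalue()))  # 'name,count\r\nspam,3\r\n'
```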

convutils.utils.make_csv_reader(csvfile, header=True, dialect=None, *args, **kwargs)

Creates a CSV reader given a CSV file.

Parameters:	csvfile – a file handle to a CSV file
	header – whether or not the file has a header line
	dialect – a `csv.Dialect` instance
	*args – passed on to the reader
	**kwargs – passed on to the reader
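A sketch under two assumptions the docs do not confirm: that `dialect=None` means "detect the dialect with `csv.Sniffer`, falling back to `csv.excel`", and that `header=True` yields `csv.DictReader` rows keyed by column name. Sniffing reads a sample and rewinds, so this sketch needs a seekable handle.

```python
import csv
import io

def make_csv_reader(csvfile, header=True, dialect=None, *args, **kwargs):
    if dialect is None:
        # Assumption: sniff a sample to detect the dialect, then rewind
        try:
            dialect = csv.Sniffer().sniff(csvfile.read(1024))
        except csv.Error:
            dialect = csv.excel
        csvfile.seek(0)
    if header:
        # Assumption: with a header, rows come back as dicts keyed by column name
        return csv.DictReader(csvfile, *args, dialect=dialect, **kwargs)
    return csv.reader(csvfile, *args, dialect=dialect, **kwargs)

data = 'name,count\nspam,3\neggs,5\n'
for row in make_csv_reader(io.StringIO(data)):
    print(row['name'], row['count'])
# spam 3
# eggs 5
```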

convutils.utils.make_simple_tsv_dict_writer(tsvfileh, fieldnames, *args, **kwargs)

Similar to make_csv_dict_writer(), but uses SimpleTsvDialect as the dialect.

Parameters:	tsvfileh – a file handle to a TSV file opened in write mode
	fieldnames – a list of field names for the columns
	*args – passed on to `csv.DictWriter()`
	**kwargs – passed on to `csv.DictWriter()`
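A minimal sketch, assuming the function builds a `csv.DictWriter` with the dialect fixed to `SimpleTsvDialect` (redefined here so the block is self-contained), writes the header, and returns the writer. The output uses tabs and bare `'\n'` line endings.

```python
import csv
import io

class SimpleTsvDialect(csv.excel_tab):
    # Same settings as documented for SimpleTsvDialect: '\n' line endings, no quoting
    lineterminator = '\n'
    quoting = csv.QUOTE_NONE

def make_simple_tsv_dict_writer(tsvfileh, fieldnames, *args, **kwargs):
    # Like make_csv_dict_writer(), but with the dialect fixed to SimpleTsvDialect
    writer = csv.DictWriter(tsvfileh, fieldnames, *args,
                            dialect=SimpleTsvDialect, **kwargs)
    writer.writeheader()
    return writer

buf = io.StringIO()
writer = make_simple_tsv_dict_writer(buf, ['name', 'count'])
writer.writerows([{'name': 'spam', 'count': 3}, {'name': 'eggs', 'count': 5}])
print(repr(buf.getvalue()))  # 'name\tcount\nspam\t3\neggs\t5\n'
```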

convutils.utils.make_simple_tsv_reader(tsvfile, header=True, *args, **kwargs)

Creates a TSV reader given a TSV file.

Parameters:	tsvfile – a file handle to a TSV file
	header – whether or not the file has a header line
	*args – passed on to the reader
	**kwargs – passed on to the reader
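A sketch assuming this function mirrors make_csv_reader() but fixes the dialect to `SimpleTsvDialect` instead of sniffing it; that pairing is inferred from the function's name and from make_simple_tsv_dict_writer(), not stated in the docs. Because the dialect uses `QUOTE_NONE`, double quotes in a field are returned as-is.

```python
import csv
import io

class SimpleTsvDialect(csv.excel_tab):
    lineterminator = '\n'
    quoting = csv.QUOTE_NONE

def make_simple_tsv_reader(tsvfile, header=True, *args, **kwargs):
    # Assumption: same shape as make_csv_reader(), but no dialect detection
    if header:
        return csv.DictReader(tsvfile, *args, dialect=SimpleTsvDialect, **kwargs)
    return csv.reader(tsvfile, *args, dialect=SimpleTsvDialect, **kwargs)

data = 'name\tnote\nspam\t"quoted"\n'
for row in make_simple_tsv_reader(io.StringIO(data)):
    print(row['name'], row['note'])  # spam "quoted"
```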

convutils.utils.split_file_by_num_lines(infile, lines_per_part, header=False, pad_file_names=False, num_lines_total=None)

Divides a file into multiple files, each containing at most the designated number of lines (plus the header line, if present).

The new files will be of the form <basename>-<num>.<extension>, where <basename> and <extension> are derived from the original file, and <num> is the iteration of the split during which the new file was created.

Parameters:	infile – a file handle
	lines_per_part – number of lines per new file (excluding the header line, if present)
	header – whether the original file has a header line; if `True`, the header will be replicated in all new files
	pad_file_names – whether to zero-pad the `<num>` in the output file names; requires counting all the lines in `infile` unless `num_lines_total` is provided (default: `False`)
	num_lines_total – the total number of lines in `infile`; useful only in conjunction with `pad_file_names`
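A sketch of the splitting loop under several assumptions the docs leave open: `<num>` starts at 1, output files are written next to the input (using `infile.name`), `num_lines_total` includes the header line, and the function returns the list of paths it created (no return value is documented). Counting lines for padding rewinds the file, so `infile` must be seekable.

```python
import itertools
import os.path
import tempfile

def split_file_by_num_lines(infile, lines_per_part, header=False,
                            pad_file_names=False, num_lines_total=None):
    base, extension = os.path.splitext(infile.name)
    width = 1
    if pad_file_names:
        if num_lines_total is None:
            # Count every line (header included), then rewind
            infile.seek(0)
            num_lines_total = sum(1 for _ in infile)
            infile.seek(0)
        num_data_lines = num_lines_total - (1 if header else 0)
        num_parts = -(-num_data_lines // lines_per_part)  # ceiling division
        width = len(str(num_parts))
    header_line = infile.readline() if header else None
    out_paths = []
    for part_num in itertools.count(1):
        # Take the next lines_per_part data lines; an empty chunk means we are done
        chunk = list(itertools.islice(infile, lines_per_part))
        if not chunk:
            break
        out_path = '{}-{}{}'.format(base, str(part_num).zfill(width), extension)
        with open(out_path, 'w') as outfile:
            if header_line is not None:
                outfile.write(header_line)
            outfile.writelines(chunk)
        out_paths.append(out_path)
    return out_paths

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'spam.txt')
        with open(path, 'w') as f:
            f.write('header\n' + ''.join('line{}\n'.format(i) for i in range(25)))
        with open(path) as infile:
            parts = split_file_by_num_lines(infile, 2, header=True, pad_file_names=True)
        print([os.path.basename(p) for p in parts[:2]], len(parts))
        # ['spam-01.txt', 'spam-02.txt'] 13
```

With 25 data lines and 2 lines per part there are 13 parts, so the sketch pads `<num>` to two digits.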

convutils.utils.split_file_by_parts(infile, max_num_parts, header=False, pad_file_names=False, num_lines_total=None)

Divides a file into at most the given number of parts.

The new files will be of the form <basename>-<num>.<extension>, where <basename> and <extension> are derived from the original file, and <num> is the iteration of the split during which the new file was created.

If the number of lines in the original file (excluding the header line) is not evenly divisible by the number of parts, fewer parts may be produced, and the final part may have fewer lines than the others. For example, if the file has 10 lines and 6 parts are requested, only 5 parts will be produced; if the file has 156 lines and 5 parts are requested, the first 4 parts will have 32 lines each and the fifth part will have 28.
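Both examples above are consistent with choosing the lines per part as the ceiling of the line count divided by the requested number of parts, then splitting the way split_file_by_num_lines() does. That rule is inferred from the examples, not confirmed by the docs. The `part_sizes` helper below is hypothetical (not part of convutils) and only reproduces the arithmetic.

```python
def part_sizes(num_lines, max_num_parts):
    # Hypothetical helper: lines per part = ceil(num_lines / max_num_parts)
    if num_lines == 0:
        return []
    lines_per_part = -(-num_lines // max_num_parts)
    num_full_parts, remainder = divmod(num_lines, lines_per_part)
    # Full parts first, then one shorter part holding any leftover lines
    return [lines_per_part] * num_full_parts + ([remainder] if remainder else [])

print(part_sizes(156, 5))  # [32, 32, 32, 32, 28]
print(part_sizes(10, 6))   # [2, 2, 2, 2, 2]
```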

Parameters:	infile – a file handle
	max_num_parts – the maximum number of parts to divide the file into
	header – whether the original file has a header line; if `True`, the header will be replicated in all new files
	pad_file_names – whether to zero-pad the `<num>` in the output file names; requires counting all the lines in `infile` unless `num_lines_total` is provided (default: `False`)
	num_lines_total – the total number of lines in `infile`; useful only in conjunction with `pad_file_names`
