utils API Documentation

A collection of common utilities and convenience functions.

class convutils.utils.SimpleTsvDialect

Bases: csv.excel_tab

A simple tab-separated values dialect.

This dialect is similar to csv.excel_tab, but uses '\n' as the line terminator and performs no quoting.

lineterminator = '\n'
quoting = 3
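
The two attributes above can be reproduced with the standard csv module. The following is a minimal sketch, not the library's source: SketchTsvDialect is a hypothetical stand-in for SimpleTsvDialect, and it relies on csv.QUOTE_NONE having the integer value 3.

```python
import csv
import io


class SketchTsvDialect(csv.excel_tab):
    """Hypothetical stand-in for SimpleTsvDialect (not the library class)."""
    lineterminator = '\n'      # excel_tab uses '\r\n' by default
    quoting = csv.QUOTE_NONE   # integer value 3: fields are never quoted


# Write one row to an in-memory buffer using the sketch dialect.
buf = io.StringIO()
writer = csv.writer(buf, dialect=SketchTsvDialect)
writer.writerow(['spam', 'eggs', '42'])
tsv_text = buf.getvalue()
print(repr(tsv_text))
```

Because no quoting is applied and no escapechar is set, the standard csv writer raises csv.Error for a field that itself contains a tab or a newline.
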
convutils.utils.append_to_file_base_name(path, addition)

Extends a file’s base name (the portion prior to the extension) with the addition.

For example, with a path of /foo/bar/spam.txt, and an addition of -eggs, the returned path will be /foo/bar/spam-eggs.txt.

Parameters:
  • path – a file path (does not have to actually exist)
  • addition – text to append to the base name
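
The described behavior can be approximated with os.path.splitext. This is a sketch under that assumption, not the library's implementation; append_to_base_name_sketch is a hypothetical name used only for illustration.

```python
import os.path


def append_to_base_name_sketch(path, addition):
    """Insert `addition` between the base name and the extension of `path`."""
    base, ext = os.path.splitext(path)  # ('/foo/bar/spam', '.txt')
    return base + addition + ext


# The path does not need to exist; only the string is manipulated.
new_path = append_to_base_name_sketch('/foo/bar/spam.txt', '-eggs')
print(new_path)
```
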
convutils.utils.column_args_to_indices(col_str)

Converts a string representing columns to actual indices.

Note that the text indices should be 1-indexed, and the returned indices and slices will be 0-indexed.

Parameters: col_str – a string of column designations (e.g., '1-4,6,8')
Returns: a list of 0-based indices and slices
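
To make the 1-indexed to 0-indexed conversion concrete, here is a minimal sketch. It assumes that a range such as '1-4' is inclusive on both ends and maps to slice(0, 4); the library's exact return values may differ, and column_args_sketch is a hypothetical name.

```python
def column_args_sketch(col_str):
    """Convert 1-indexed column designations to 0-indexed ints and slices."""
    result = []
    for part in col_str.split(','):
        part = part.strip()
        if '-' in part:
            # Assumed: '1-4' means columns 1 through 4 inclusive.
            start, end = part.split('-')
            result.append(slice(int(start) - 1, int(end)))
        else:
            result.append(int(part) - 1)
    return result


indices = column_args_sketch('1-4,6,8')
row = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print(indices)
print(row[indices[0]], row[indices[1]], row[indices[2]])
```
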
convutils.utils.count_lines(fileh)

Determines the number of lines in a text file.

Parameters: fileh – a file handle
Returns: a non-negative integer
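
One common way to count lines is to iterate over the handle. The sketch below assumes that approach; it does not rewind the handle afterward, and whether the library function does so is not stated here. count_lines_sketch is a hypothetical name.

```python
import io


def count_lines_sketch(fileh):
    """Count lines by consuming the file handle from its current position."""
    return sum(1 for _ in fileh)


# io.StringIO stands in for an open text file.
num_lines = count_lines_sketch(io.StringIO('first\nsecond\nthird\n'))
print(num_lines)
```
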
convutils.utils.cumsum(iterable)

Calculates the cumulative sum at each index of an iterable.

Taken from http://stackoverflow.com/a/4844870/38140

Parameters: iterable – an iterable object
Yields: the cumulative sum up to the given index
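
A generator with this behavior can be sketched as follows. This is an illustration of the idea, not a copy of the referenced answer or of the library code; cumsum_sketch is a hypothetical name.

```python
def cumsum_sketch(iterable):
    """Yield the running total after each element of `iterable`."""
    total = 0
    for value in iterable:
        total += value
        yield total


running_totals = list(cumsum_sketch([1, 2, 3, 4]))
print(running_totals)
```

On Python 3.2 and later, itertools.accumulate from the standard library produces the same running totals for numeric input.
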
convutils.utils.make_csv_dict_writer(csvfile, fieldnames, *args, **kwargs)

Creates a csv.DictWriter instance and also writes the header line to the file.

Parameters:
  • csvfile – a file handle to a CSV file opened in write mode
  • fieldnames – a list of field names for the columns
  • dialect – a csv.Dialect instance
  • *args – passed on to csv.DictWriter()
  • **kwargs – passed on to csv.DictWriter()
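
The combination of creating the writer and emitting the header can be sketched with the standard csv module. This assumes the header is written with DictWriter.writeheader(); make_dict_writer_sketch is a hypothetical name, not the library function.

```python
import csv
import io


def make_dict_writer_sketch(csvfile, fieldnames, *args, **kwargs):
    """Create a csv.DictWriter and immediately write the header row."""
    writer = csv.DictWriter(csvfile, fieldnames, *args, **kwargs)
    writer.writeheader()
    return writer


# io.StringIO stands in for a file opened in write mode.
buf = io.StringIO()
writer = make_dict_writer_sketch(buf, ['name', 'count'], lineterminator='\n')
writer.writerow({'name': 'spam', 'count': 3})
csv_text = buf.getvalue()
print(csv_text)
```
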

convutils.utils.make_csv_reader(csvfile, header=True, dialect=None, *args, **kwargs)

Creates a CSV reader given a CSV file.

Parameters:
  • csvfile – a file handle to a CSV file
  • header – whether or not the file has a header line
  • dialect – a csv.Dialect instance
  • *args – passed on to the reader
  • **kwargs – passed on to the reader
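
The text above does not say what kind of reader is returned. The sketch below makes one plausible choice and labels it as an assumption: with header=True it returns a csv.DictReader keyed by the header row, otherwise a plain csv.reader. The library's actual behavior may differ, and make_reader_sketch is a hypothetical name.

```python
import csv
import io


def make_reader_sketch(csvfile, header=True, dialect=None, **kwargs):
    """Return a DictReader if the file has a header, else a plain reader.

    Assumption for illustration only; not the library's implementation.
    """
    if dialect is None:
        dialect = 'excel'  # the csv module's default dialect
    if header:
        return csv.DictReader(csvfile, dialect=dialect, **kwargs)
    return csv.reader(csvfile, dialect, **kwargs)


# Read a small tab-separated table that starts with a header line.
data = io.StringIO('name\tcount\nspam\t3\n')
rows = list(make_reader_sketch(data, header=True, dialect=csv.excel_tab))
print(rows[0]['name'], rows[0]['count'])
```
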

convutils.utils.make_simple_tsv_dict_writer(tsvfileh, fieldnames, *args, **kwargs)

Similar to make_csv_dict_writer(), but uses the SimpleTsvDialect as the dialect.

Parameters:
  • tsvfileh – a file handle to a TSV file opened in write mode
  • fieldnames – a list of field names for the columns
  • *args – passed on to csv.DictWriter()
  • **kwargs – passed on to csv.DictWriter()

convutils.utils.make_simple_tsv_reader(tsvfile, header=True, *args, **kwargs)

Creates a TSV reader given a TSV file.

Parameters:
  • tsvfile – a file handle to a TSV file
  • header – whether or not the file has a header line
  • *args – passed on to the reader
  • **kwargs – passed on to the reader

convutils.utils.split_file_by_num_lines(infile, lines_per_part, header=False, pad_file_names=False, num_lines_total=None)

Divides a file into multiple files of the designated number of lines.

The new files will be of the form <basename>-<num>.<extension>, where <basename> and <extension> are derived from the original file, and <num> is the iteration of the split during which the new file was created.

Parameters:
  • infile – a file handle
  • lines_per_part – number of lines per new file (excluding header line, if present)
  • header – whether the original file has a header line; if True, header will be replicated in all new files
  • pad_file_names – provide zero-padding for the <num> in the output file names; requires counting all the lines in infile unless num_lines_total is provided (default: False)
  • num_lines_total – the total number of lines in infile; useful only in conjunction with pad_file_names
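
The naming scheme, header replication, and zero-padding can be illustrated in memory, without touching the file system. The sketch below is an assumption-laden illustration, not the library's code: it works on a list of lines rather than a file handle, it assumes <num> starts at 1, and split_lines_sketch is a hypothetical name.

```python
import math
import os.path


def split_lines_sketch(path, lines, lines_per_part, header=False,
                       pad_file_names=False):
    """Map output file names to their lines (nothing is written to disk)."""
    base, ext = os.path.splitext(path)
    header_line = lines[0] if header else None
    body = lines[1:] if header else lines
    num_parts = math.ceil(len(body) / lines_per_part)
    # Padding needs the total part count, hence the total line count.
    width = len(str(num_parts)) if pad_file_names else 0
    parts = {}
    for i in range(num_parts):
        chunk = body[i * lines_per_part:(i + 1) * lines_per_part]
        if header:
            chunk = [header_line] + chunk  # replicate header in every part
        num = str(i + 1).zfill(width)      # assumed: numbering starts at 1
        parts[base + '-' + num + ext] = chunk
    return parts


parts = split_lines_sketch('data.tsv', ['h\n', 'a\n', 'b\n', 'c\n'], 2,
                           header=True)
for name in sorted(parts):
    print(name, parts[name])
```

The padding width depends on the number of parts, which is why the total line count must be known (or counted) before the first output file can be named.
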

convutils.utils.split_file_by_parts(infile, max_num_parts, header=False, pad_file_names=False, num_lines_total=None)

Divides a file into the given number of parts.

The new files will be of the form <basename>-<num>.<extension>, where <basename> and <extension> are derived from the original file, and <num> is the iteration of the split during which the new file was created.

If the number of lines in the original file (excluding the header line) is not evenly divisible by the number of parts, fewer parts may be produced (e.g., if the file has 10 lines and 6 parts are requested, only 5 are produced), and the final file may have fewer lines than the others (e.g., if the original file has 156 lines and 5 parts are requested, the first 4 parts will have 32 lines each, and the fifth part will have 28).

Parameters:
  • infile – a file handle
  • max_num_parts – the maximum number of parts to divide the file into
  • header – whether the original file has a header line; if True, header will be replicated in all new files
  • pad_file_names – provide zero-padding for the <num> in the output file names; requires counting all the lines in infile unless num_lines_total is provided (default: False)
  • num_lines_total – the total number of lines in infile; useful only in conjunction with pad_file_names
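
The two examples given for uneven division are consistent with rounding the lines per part up and then recomputing how many parts are actually needed. The sketch below encodes that arithmetic as an assumption about the behavior, not as the library's implementation; plan_parts_sketch is a hypothetical name, and num_lines excludes any header line.

```python
import math


def plan_parts_sketch(num_lines, max_num_parts):
    """Return (lines_per_part, num_parts, last_part_lines) for a split."""
    lines_per_part = math.ceil(num_lines / max_num_parts)
    num_parts = math.ceil(num_lines / lines_per_part)
    last_part_lines = num_lines - lines_per_part * (num_parts - 1)
    return lines_per_part, num_parts, last_part_lines


small_plan = plan_parts_sketch(10, 6)   # 10 lines, 6 parts requested
large_plan = plan_parts_sketch(156, 5)  # 156 lines, 5 parts requested
print(small_plan)
print(large_plan)
```
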
