A collection of common utilities and convenient functions.
Bases: csv.excel_tab
A simple tab-separated values dialect.
This dialect is similar to csv.excel_tab, but uses '\n' as the line terminator and performs no special quoting.
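A minimal sketch of how such a dialect could be declared. It assumes that "no special quoting" corresponds to csv.QUOTE_NONE; the helper rows_to_tsv is hypothetical and exists only to show the dialect in use.

```python
import csv
import io


class SimpleTsvDialect(csv.excel_tab):
    """Tab-separated dialect with newline line endings and no quoting."""
    lineterminator = "\n"
    # Assumption: "no special quoting" is modelled here as QUOTE_NONE.
    quoting = csv.QUOTE_NONE


def rows_to_tsv(rows):
    """Hypothetical helper: serialize rows to a TSV string using the dialect."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, dialect=SimpleTsvDialect)
    writer.writerows(rows)
    return buffer.getvalue()
```

With QUOTE_NONE and no escape character set, the csv module raises csv.Error for a field that itself contains a tab, so this sketch suits data already known to be tab-free.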
Extends a file’s base name (the portion before the extension) with the given addition.
For example, with a path of /foo/bar/spam.txt, and an addition of -eggs, the returned path will be /foo/bar/spam-eggs.txt.
Parameters: |
|
---|
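The behaviour described above can be sketched with os.path.splitext. The function name and parameter names below are assumptions, since the original signature is not shown here.

```python
import os


def extend_file_name(path, addition):
    """Insert `addition` between the base name and the extension.

    Hypothetical name and signature, for illustration only.
    """
    root, ext = os.path.splitext(path)
    return root + addition + ext
```

Calling extend_file_name("/foo/bar/spam.txt", "-eggs") reproduces the example path /foo/bar/spam-eggs.txt.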
Converts a string representing columns to actual indices.
Note that the text indices should be 1-indexed, and the returned indices and slices will be 0-indexed.
Parameters: | col_str – a string of column designations (e.g., '1-4,6,8') |
---|---|
Returns: | a list of 0-based indices and slices |
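One way the conversion might look, as a sketch. It assumes that ranges such as '1-4' are inclusive at both ends and that designations are separated by commas; the function name parse_columns is hypothetical.

```python
def parse_columns(col_str):
    """Convert 1-indexed column designations to 0-based indices and slices.

    Assumption: a range like '1-4' includes both endpoints.
    """
    indices = []
    for part in col_str.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            # Columns start..end (1-indexed, inclusive) map to
            # slice(start - 1, end) in 0-indexed, end-exclusive terms.
            indices.append(slice(int(start) - 1, int(end)))
        else:
            indices.append(int(part) - 1)
    return indices
```

Under these assumptions, '1-4,6,8' becomes a slice covering indices 0 through 3, followed by the indices 5 and 7.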
Determines the number of lines in a text file.
Parameters: | fileh – a file handle |
---|---|
Returns: | a non-negative integer. |
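A simple way to count lines is to iterate over the handle, as sketched below. The name count_lines is an assumption, and this version leaves the handle positioned at the end of the file; whether the original rewinds it is not stated.

```python
import io


def count_lines(fileh):
    """Count lines by iterating over the handle (hypothetical name)."""
    return sum(1 for _ in fileh)


# Usage with an in-memory file standing in for a real file handle.
sample = io.StringIO("a\nb\nc\n")
num_lines = count_lines(sample)
```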
Calculates the cumulative sum at each index of an iterable.
Taken from http://stackoverflow.com/a/4844870/38140
Parameters: | iterable – an iterable object |
---|---|
Yields: | the cumulative sum up to the given index |
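A generator of this kind can be sketched as follows. The name cumsum is an assumption, and the body is one plausible shape rather than a copy of the referenced answer.

```python
def cumsum(iterable):
    """Yield the running total after each element (hypothetical name)."""
    total = 0
    for value in iterable:
        total += value
        yield total
```

For numeric input, the standard library's itertools.accumulate produces the same running totals.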
Creates a csv.DictWriter instance and also writes the header line to the file.
Parameters: |
|
---|
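A sketch of the idea, assuming the function takes a file handle and a list of field names; the actual parameters are not shown here, so the signature below is hypothetical.

```python
import csv
import io


def make_csv_dict_writer(fileh, fieldnames, **kwargs):
    """Create a csv.DictWriter and write the header row immediately.

    Assumed signature; extra keyword arguments go to csv.DictWriter.
    """
    writer = csv.DictWriter(fileh, fieldnames, **kwargs)
    writer.writeheader()
    return writer


# Usage with an in-memory buffer standing in for an open file.
buffer = io.StringIO()
writer = make_csv_dict_writer(buffer, ["name", "count"], lineterminator="\n")
writer.writerow({"name": "spam", "count": 3})
```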
Creates a CSV reader given a CSV file.
Parameters: |
|
---|
Similar to make_csv_dict_writer(), but uses the SimpleTsvDialect as the dialect.
Parameters: |
|
---|
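The TSV variant might differ only in the dialect it passes along, as in this sketch. The function name, its signature, and the QUOTE_NONE setting in the dialect are assumptions.

```python
import csv
import io


class SimpleTsvDialect(csv.excel_tab):
    """Assumed shape of the dialect: newline terminator, no quoting."""
    lineterminator = "\n"
    quoting = csv.QUOTE_NONE


def make_tsv_dict_writer(fileh, fieldnames):
    """Create a DictWriter using SimpleTsvDialect and write the header."""
    writer = csv.DictWriter(fileh, fieldnames, dialect=SimpleTsvDialect)
    writer.writeheader()
    return writer


# Usage with an in-memory buffer standing in for an open file.
tsv_buffer = io.StringIO()
tsv_writer = make_tsv_dict_writer(tsv_buffer, ["gene", "score"])
tsv_writer.writerow({"gene": "abc", "score": 3})
```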
Creates a CSV reader given a CSV file.
Parameters: |
|
---|
Divides a file into multiple files of the designated number of lines.
The new files will be of the form <basename>-<num>.<extension>, where <basename> and <extension> are derived from the original file, and <num> is the iteration of the split during which the new file was created.
Parameters: |
|
---|---|
pad_file_names: | provide zero-padding for the <num> in the output file names; requires counting all the lines in infile unless num_lines_total is provided (default: False) |
num_lines_total: | the total number of lines in infile; useful only in conjunction with pad_file_names |
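The splitting and zero-padding logic could be sketched as below. The function name, the infile and num_lines parameters, numbering from 1, and the absence of any header handling are all assumptions made for illustration.

```python
import os


def split_file_by_num_lines(infile, num_lines, pad_file_names=False,
                            num_lines_total=None):
    """Write consecutive chunks of `num_lines` lines to numbered files.

    Hypothetical sketch: numbering starts at 1 and no header is repeated.
    """
    root, ext = os.path.splitext(infile)
    width = 0
    if pad_file_names:
        if num_lines_total is None:
            # Padding needs the file count, hence the extra counting pass.
            with open(infile) as fileh:
                num_lines_total = sum(1 for _ in fileh)
        num_files = max(1, -(-num_lines_total // num_lines))  # ceiling
        width = len(str(num_files))

    out_paths = []
    out = None
    try:
        with open(infile) as fileh:
            for i, line in enumerate(fileh):
                if i % num_lines == 0:
                    # Start a new output file every `num_lines` lines.
                    if out is not None:
                        out.close()
                    num = str(len(out_paths) + 1).zfill(width)
                    path = "{}-{}{}".format(root, num, ext)
                    out = open(path, "w")
                    out_paths.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return out_paths
```

Passing num_lines_total skips the counting pass, which is the reason the parameter is useful only together with pad_file_names.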
Divides a file into the given number of parts.
The new files will be of the form <basename>-<num>.<extension>, where <basename> and <extension> are derived from the original file, and <num> is the iteration of the split during which the new file was created.
If the number of lines in the original file (minus the header line) is not evenly divisible by the number of parts, fewer parts may be produced (e.g., if the file has 10 lines and 6 parts are requested, only 5 will be produced), and the final file may have fewer lines than the others (e.g., if the original file has 156 lines and 5 parts are requested, the first 4 parts will each have 32 lines, and the fifth will have 28).
Parameters: |
|
---|---|
pad_file_names: | provide zero-padding for the <num> in the output file names; requires counting all the lines in infile unless num_lines_total is provided (default: False) |
num_lines_total: | the total number of lines in infile; useful only in conjunction with pad_file_names |
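Both numeric examples in the description are consistent with giving each part the ceiling of lines divided by parts. The sketch below works through that arithmetic; the rounding rule is an inference from the examples, not confirmed behaviour, and plan_parts is a hypothetical helper that only computes sizes without touching any files.

```python
import math


def plan_parts(num_lines, num_parts):
    """Return the number of lines each output part would receive.

    Assumption: every part gets ceil(num_lines / num_parts) lines
    until the input is exhausted.
    """
    lines_per_part = math.ceil(num_lines / num_parts)
    sizes = []
    remaining = num_lines
    while remaining > 0:
        size = min(lines_per_part, remaining)
        sizes.append(size)
        remaining -= size
    return sizes
```

With 10 lines and 6 parts the ceiling is 2, so only 5 parts are needed; with 156 lines and 5 parts the ceiling is 32, leaving 28 lines for the fifth part.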