bioplus.tabfile

tools for dealing with files that have tabular data

class bioplus.tabfile.BedFile(f, additional_comments=[], **kwargs)[source]

A BED file is a type of TabFile, but also defines a method for working with rows. Rows are given as instances of BedRow instead of lists. BedRow instances inherit all list methods and are therefore compatible with write_row. BedRow has additional methods for chrom, chromStart, chromEnd, etc. For more info, see BedRow.

Track and browser lines are treated as comments.

Assumes the track row is a comment. Use get_track_line to see the track info.

DEFAULT_BED_COMMENTS = ['(?i)track', '(?i)browser']
get_track_line()[source]

returns the current track line, if any

class bioplus.tabfile.BedRow[source]

BedRow instances are lists, but you can also access their fields through methods such as chromStart() and chromEnd(); use help for a full list. Uses the same conventions as http://genome.ucsc.edu/FAQ/FAQformat#format1. Note that only the first three entries (chrom, chromStart, chromEnd) are required, so the others may not be defined.
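
The idea of a row that is still a list but also has named accessors can be sketched as below. This is a minimal illustration in Python 3, not the bioplus implementation: the class name SketchBedRow is hypothetical, and the choices to return integers for positions and None for a missing optional field are assumptions.

```python
class SketchBedRow(list):
    """Hypothetical stand-in for BedRow: a list with named accessors."""

    def chrom(self):
        return self[0]

    def chrom_start(self):
        # Assumption: positions are converted to int on access.
        return int(self[1])

    def chrom_end(self):
        return int(self[2])

    def name(self):
        # Optional field: only present when the line has a 4th column.
        return self[3] if len(self) > 3 else None


row = SketchBedRow("chr3\t100\t250\tpeak_1".split("\t"))
print(row.chrom(), row.chrom_start(), row.chrom_end(), row.name())
print("\t".join(row))  # still behaves like a plain list of strings
```

Because the row subclasses list, anything that accepts a list of fields can consume it unchanged.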

blockCount()[source]

returns the number of blocks (exons) in the BED line

blockSizes()[source]

returns a comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.

blockStart()[source]

returns a comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
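
Since block starts are relative to chromStart, turning the two comma-separated strings into absolute coordinates takes one small step. The helper below is a hypothetical sketch (block_intervals is not part of bioplus); it assumes the lists may carry a trailing comma, as UCSC-style BED lines often do.

```python
def block_intervals(chrom_start, block_sizes, block_starts):
    """Return absolute (start, end) pairs, one per block."""
    # Drop empty items so a trailing comma such as "567,488," is tolerated.
    sizes = [int(s) for s in block_sizes.split(",") if s]
    starts = [int(s) for s in block_starts.split(",") if s]
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and blockStarts differ in length")
    # Each block start is an offset from chrom_start.
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


intervals = block_intervals(1000, "567,488,", "0,3512")
print(intervals)  # [(1000, 1567), (4512, 5000)]
```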

chrom()[source]

returns the name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671)

chromEnd()[source]
chromStart()[source]
chrom_end()[source]

returns the ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99

chrom_start()[source]

returns the starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0
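
The 0-based, end-exclusive convention means a feature's length is simply chromEnd minus chromStart. A short sketch with hypothetical helper names (bed_length and bed_positions are not bioplus functions):

```python
def bed_length(chrom_start, chrom_end):
    """Number of bases covered by a 0-based, end-exclusive interval."""
    return chrom_end - chrom_start


def bed_positions(chrom_start, chrom_end):
    """The 0-based base positions covered by the interval."""
    return range(chrom_start, chrom_end)


positions = bed_positions(0, 100)
print(bed_length(0, 100), positions[0], positions[-1])  # 100 0 99
```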

itemRgb()[source]

returns itemRgb, an RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to “On”, this RGB value will determine the display color of the data contained in this BED line. NOTE: It is recommended that a simple color scheme (eight colors or less) be used with this attribute to avoid overwhelming the color resources of the Genome Browser and your Internet browser.
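
If the value is needed as numbers rather than text, the R,G,B string can be split and validated. The helper below is a hypothetical sketch (parse_item_rgb is not a bioplus function) and handles only the R,G,B form described above.

```python
def parse_item_rgb(value):
    """Convert 'R,G,B' text into a tuple of three ints in 0-255."""
    parts = tuple(int(p) for p in value.split(","))
    if len(parts) != 3 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected R,G,B with each value in 0-255: %r" % value)
    return parts


red = parse_item_rgb("255,0,0")
print(red)  # (255, 0, 0)
```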

name()[source]

returns the name of the BED line. This label is displayed to the left of the BED line in the Genome Browser window when the track is open to full display mode or directly to the left of the item in pack mode.

score()[source]

returns the score, a number between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray)

strand()[source]

returns the strand, either '+' or '-'

thickEnd()[source]

returns the ending position at which the feature is drawn thickly (for example, the stop codon in gene displays).

thickStart()[source]

returns the starting position at which the feature is drawn thickly (for example, the start codon in gene displays)

class bioplus.tabfile.Bzip2TabFile(*args, **kwargs)[source]

For bzip2-compressed tab-delimited files

See TabFile for usage info

exception bioplus.tabfile.DetectCommentsError(*args)[source]
class bioplus.tabfile.GzipTabFile(*args, **kwargs)[source]

For gzip-compressed tab-delimited files

See TabFile for usage info

class bioplus.tabfile.Macs2Row[source]
FDR()[source]
name()[source]
qvalue()[source]
class bioplus.tabfile.MacsFile(f, convert_spaces=True, **kwargs)[source]

A MACS file is a type of TabFile, but also defines a method for working with rows. Rows are given as instances of MacsRow instead of lists. MacsRow instances inherit all list methods and are therefore compatible with write_row. MacsRow has additional methods for chrom, chromStart, chromEnd, etc. For more info, see MacsRow.

class bioplus.tabfile.MacsRow[source]

MacsRow instances are lists, but you can access their features as follows:

chr() or chrom(): chromosome name
start() or chromStart(): start position; start() is 1-based, chromStart() is 0-based (BED)
end() or chromEnd(): end position; equivalent, but chromEnd() (BED) is defined as 0-based, exclusive
length(): length
summit(): position of summit
tags(): number of unique tags in the peak region
pvalue(): returns the -10*log10(pvalue)
fold_enrichment(): returns the fold enrichment
FDR(): returns the FDR in %

FDR(type_=<type 'str'>)[source]

returns the FDR (%). Preserves the str to eliminate rounding error. Use type_=float to get a decimal value.
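
The pattern of returning the raw text by default and converting only on request can be sketched as follows. The helper get_field is hypothetical and only mirrors the type_ argument shown in the signature; it is not the bioplus implementation.

```python
def get_field(row, index, type_=str):
    """Return row[index], converted with the callable type_ (str by default)."""
    return type_(row[index])


row = ["chr1", "100", "250", "3.50"]
as_text = get_field(row, 3)                  # text exactly as written in the file
as_number = get_field(row, 3, type_=float)   # numeric value for arithmetic
print(repr(as_text), as_number)  # '3.50' 3.5
```

Keeping the string means the value can be written back out exactly as it appeared in the input.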

chr()[source]

returns the name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671)

chrom()[source]

returns the name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671)

chromEnd()[source]
chromStart()[source]
chrom_end()[source]

returns the ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99

chrom_start()[source]

returns the starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0

end()[source]

returns the end position, 1-based, inclusive

fold_enrichment(type=<type 'str'>)[source]

returns the fold_enrichment vs control. preserves the str to eliminate rounding error. use type=float to get a decimal value

length()[source]

returns the length

pvalue(type_=<type 'str'>)[source]

returns the -10*log10(pvalue). Preserves the str to eliminate rounding error. Use type_=float to get a decimal value.
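
Because the column stores -10*log10(pvalue) rather than the p-value itself, recovering the p-value means inverting that transform. A small sketch with hypothetical helper names (neither function is part of bioplus):

```python
import math


def pvalue_to_score(p):
    """Forward transform: -10 * log10(p)."""
    return -10 * math.log10(p)


def score_to_pvalue(score):
    """Inverse transform; accepts the text form stored in the file."""
    return 10 ** (-float(score) / 10)


p = score_to_pvalue("100.00")
print(p)
print(pvalue_to_score(1e-5))
```

A score of 100 therefore corresponds to a p-value of about 1e-10, and larger scores mean smaller p-values.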

start()[source]

returns the starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 1
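
The relationship between the 1-based, inclusive pair (start, end) and the BED-style 0-based, end-exclusive pair (chromStart, chromEnd) comes down to shifting the start by one. A sketch with hypothetical helper names:

```python
def one_based_to_bed(start, end):
    """1-based inclusive (start, end) -> 0-based, end-exclusive pair."""
    return start - 1, end


def bed_to_one_based(chrom_start, chrom_end):
    """0-based, end-exclusive pair -> 1-based inclusive (start, end)."""
    return chrom_start + 1, chrom_end


print(one_based_to_bed(1, 100))   # (0, 100)
print(bed_to_one_based(0, 100))   # (1, 100)
```

The end value is numerically unchanged in both directions; only its interpretation (inclusive versus exclusive) differs.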

summit()[source]

returns the position of the summit

tags(type_=<type 'int'>)[source]

returns the number of unique tags in the peak region

tagsv1()[source]
tagsv2()[source]
class bioplus.tabfile.TabFile(filename, mode='r', convert_spaces=True, compression=None, comments=[], column_names=False)[source]
Usage: f = TabFile('filename', convert_spaces=True, comments=[], column_names=False)

TabFile is a class for handling tab-delimited files.

Use convert_spaces=False if your file is tab-delimited and you wish to preserve other whitespace.

TabFile supports commented lines. Commented lines are not recognized as part of the table.

By default, only lines beginning with '#' will be recognized as comments (not part of the table). You may specify a list of additional keywords using comments=['keyword1', 'keyword2', etc.]. All lines containing that keyword will be recognized as a comment. Keywords may be regular expressions.

For example, BedFile uses ['(?i)track', '(?i)browser'].

If column_names=True, the first properly formatted row will be treated as column names (i.e. skipped like a comment rather than read as data).
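
The comment rules described above can be sketched in a few lines. This is an illustration only: is_comment is a hypothetical helper, and the use of re.search to test whether a line "contains" a keyword is an assumption about how the matching is done.

```python
import re


def is_comment(line, comments=()):
    """True for '#' lines and for lines matching any extra keyword pattern."""
    if line.startswith("#"):
        return True
    # Assumption: a keyword matches if it occurs anywhere in the line.
    return any(re.search(pattern, line) for pattern in comments)


bed_comments = ["(?i)track", "(?i)browser"]
lines = ["# produced by a pipeline", 'Track name="peaks"', "chr1\t0\t100"]
flags = [is_comment(line, bed_comments) for line in lines]
print(flags)  # [True, True, False]
```

The (?i) prefix makes each pattern case-insensitive, so both "track" and "Track" lines are caught.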
close()[source]

works just like the built-in close method in the file class

column_dict()[source]

returns a dictionary which gives the index corresponding to a particular column name
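
A name-to-index mapping of this kind lets you address fields by column name instead of position. A minimal sketch, with a hypothetical helper and made-up column names:

```python
def column_index(column_names):
    """Map each column name to its 0-based position."""
    return {name: i for i, name in enumerate(column_names)}


names = ["chrom", "start", "end", "score"]  # illustrative column names
index = column_index(names)
row = ["chr1", "100", "250", "37"]
print(row[index["score"]])  # 37
```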

comment_line_contents()[source]

returns the list of lines that are comments

comment_line_numbers()[source]

returns the list of line numbers of the lines that are comments

get_column_names()[source]

returns the column names

mergesort(f, n, numerical=False)[source]
open(mode=None)[source]

mode can be overridden here but defaults to TabFile.mode

acts just like the built-in open method in the file class. use write=True to write to a file, otherwise it will be opened in read-only mode

previous_line()[source]

returns the line number of the last line read

process_table(output_filename, fnc, column_names=None)[source]

process_table writes a new file (its name is specified with output_filename) by applying a user-defined function fnc to each row of data in the original file. fnc should yield a row (i.e. a list, array, or something else finitely iterable).

process_table preserves all commented lines and also the line which contains column names, if applicable. The user may specify new column names using column_names (a list or other finite iterable); otherwise the old column names are used, which might not preserve the column labels if columns were inserted in the middle of the table.
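
The row-by-row transformation can be illustrated without touching the file system. The sketch below is not the bioplus code: process_lines and add_length are hypothetical, only '#' comments are handled, and fnc is assumed to return one row per input row.

```python
def process_lines(lines, fnc):
    """Yield output lines: comments unchanged, data rows passed through fnc."""
    for line in lines:
        if line.startswith("#"):
            yield line  # comment lines are preserved as-is
            continue
        row = line.rstrip("\n").split("\t")
        yield "\t".join(str(item) for item in fnc(row)) + "\n"


def add_length(row):
    """Example fnc: append end - start as a new last column."""
    return row + [int(row[2]) - int(row[1])]


source = ["# peaks\n", "chr1\t100\t250\n", "chr2\t5\t20\n"]
result = list(process_lines(source, add_length))
print("".join(result), end="")
```

Appending the new column at the end keeps the existing column labels aligned, which avoids the mislabeling that can occur when columns are inserted in the middle.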

read_col(n)[source]

returns the items in column n (numbering starts at 0) as a flat list. Unlike read_cols and read_table, the elements of the returned list are strings, not lists.

read_cols(L)[source]

read_cols behaves like read_col but instead of taking a single column number (numbering starts at 0), it takes a list of column numbers and returns a list of partial rows, where each partial row is a list with entries from the appropriate columns IN THE ORDER SPECIFIED.

tip: use range() to create lists of ordered integers. e.g., range(2,6)=[2,3,4,5]
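
The column selection described above, including the point that the requested order is kept, can be sketched as follows. select_cols is a hypothetical helper working on rows already in memory, not the bioplus method.

```python
def select_cols(rows, columns):
    """Return partial rows containing the given columns, in the given order."""
    return [[row[i] for i in columns] for row in rows]


table = [
    ["chr1", "100", "250", "peak_1"],
    ["chr2", "5", "20", "peak_2"],
]
picked = select_cols(table, [3, 0])
print(picked)  # [['peak_1', 'chr1'], ['peak_2', 'chr2']]
print(list(range(2, 6)))  # [2, 3, 4, 5]
```

In Python 3, range() returns a lazy range object, so wrap it in list() when an actual list is needed.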

read_first_col()[source]

returns a list of items in the first column

read_last_col()[source]

returns a list of items in the last column

read_row()[source]

returns the next (or first) line that is not a comment, parsed. uses __iter__ as a generator, and simply returns the next value

read_row is deprecated. Use x = self.__iter__() and x.next()

read_table(override=False)[source]

returns the contents of a file as a list of rows (with each row as a list). will ignore any lines that begin with a “#” symbol and truncate any lines that contain a “#” symbol

readline()[source]

reads one line and returns it. uses a generator, and will raise StopIteration if it reaches the EOF.

readline is deprecated. use x = self.__rawiter__() and x.next()

set_column_names(L)[source]
write(s)[source]

writes a string directly to a file, without modification (user must supply \n if desired)

write_column_names()[source]

writes the stored column names

write_row(row, separator='\t')[source]

Writes a list to the file as a line (tab-delimited). A different separator may also be specified with separator='x'. (Note: uses file writelines method)
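
Turning a row into one delimited line can be sketched as below. format_row is a hypothetical helper; converting items with str() and appending a newline are assumptions about what a written row looks like.

```python
def format_row(row, separator="\t"):
    """Join the items of a row into one delimited, newline-terminated line."""
    return separator.join(str(item) for item in row) + "\n"


tab_line = format_row(["chr1", 0, 100])
csv_line = format_row(["chr1", 0, 100], separator=",")
print(repr(tab_line), repr(csv_line))  # 'chr1\t0\t100\n' 'chr1,0,100\n'
```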

write_rows(iterable)[source]
write_table(table, separator='\t', override=False, column_names=True)[source]

Writes a table to a file (tab-delimited). An alternative separator may be specified with separator='x'. If column_names=True, the column names will be included as the first line unless they do not exist.

zap()[source]

forces status to not open. use with caution. this may destroy data

exception bioplus.tabfile.TabFileError(*args)[source]
bioplus.tabfile.merge_files(left, right, output, comments='left')[source]

merge_files merges the tab-delimited files named left and right, which may have commented lines. The output is directed to the file named output.

There are a few modes. If comments='left', comments in left are preserved. If comments='right', comments in right are preserved. If comments='none', no comments are preserved. If comments='all', all comments in left and right are appended to the beginning of output, although they may previously have been contained within the data in left or right.

Use merge_tab_files if you need to pass custom parameters to TabFile.

bioplus.tabfile.merge_tab_files(file1, file2, output_filename, comments='left')[source]

merge_tab_files merges the tab-delimited files represented by the TabFile objects file1 (left) and file2 (right), which may have commented lines. The output is directed to the file named output_filename.

There are a few modes. If comments='left', comments in left are preserved.

If comments='right', comments in right are preserved.

If comments='none', no comments are preserved.

If comments='all', all comments in left and right are appended to the beginning of output, although they may previously have been contained within the data in left or right.

See also merge_files

bioplus.tabfile.shift_peaks(f, peak_lengths=2)[source]

shift_peaks takes a file f (foo.bed) and produces a new file (foo_shifted.bed) with all the sequences shifted left by peak_lengths times their length. If peak_lengths is negative, they are shifted to the right. Comments are stripped.
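
The arithmetic behind the shift can be shown for a single interval. shift_interval is a hypothetical helper, not the bioplus function, and it does not address what happens when a shifted interval would start before position 0.

```python
def shift_interval(start, end, peak_lengths=2):
    """Shift an interval left by peak_lengths times its own length."""
    offset = peak_lengths * (end - start)
    # A negative peak_lengths gives a negative offset, i.e. a shift to the right.
    return start - offset, end - offset


print(shift_interval(1000, 1100))                    # (800, 900)
print(shift_interval(1000, 1100, peak_lengths=-1))   # (1100, 1200)
```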
