tools for dealing with files that have tabular data
A BED file is a type of TabFile, but also defines a method for working with rows. Rows are given as instances of BedRow instead of lists. BedRow instances inherit all list methods and are therefore compatible with write_row. BedRow has additional methods for chrom, chromStart, chromEnd, etc. For more info, see BedRow.
Track and browser lines are treated as comments.
Assumes the track row is a comment. Use getTrackLine to see the track info.
BedRow instances are lists, but you can also access their fields through chromStart(), chromEnd(), etc. Use help for a full list. Uses the same conventions as http://genome.ucsc.edu/FAQ/FAQformat#format1. Note that only the first three entries (chrom, chromStart, chromEnd) are required, so the others may not be defined.
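The idea of a row that is still a list but also exposes named accessors can be sketched as follows. This is not the module's BedRow: the class name SketchBedRow is hypothetical, and returning integers from chromStart() and chromEnd() is an assumption, since the return types are not stated here.

```python
class SketchBedRow(list):
    """Hypothetical stand-in for BedRow: a plain list plus named accessors."""

    def chrom(self):
        return self[0]

    def chromStart(self):
        # Assumption: positions are converted to int; the real class may not do this.
        return int(self[1])

    def chromEnd(self):
        return int(self[2])

    def name(self):
        # Only the first three columns are required, so this field may be absent.
        return self[3] if len(self) > 3 else None


row = SketchBedRow("chr3\t100\t250".split("\t"))
line = "\t".join(row)  # still behaves like a list of strings
```

Because the row remains a list, anything that accepts a list (such as a row writer) can take it unchanged.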
returns a comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
returns a comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
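As an illustration of how the two comma-separated lists fit together, the sketch below turns them into absolute block intervals. The helper block_intervals is hypothetical and not part of this module; it assumes the lists may carry a trailing comma, which some BED files do.

```python
def block_intervals(chrom_start, block_sizes, block_starts):
    """Return absolute (start, end) pairs; block starts are relative to chrom_start."""
    sizes = [int(s) for s in block_sizes.split(",") if s]
    starts = [int(s) for s in block_starts.split(",") if s]
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and blockStarts must have the same number of items")
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


intervals = block_intervals(1000, "567,488,", "0,3512,")
# intervals == [(1000, 1567), (4512, 5000)]
```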
returns the name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671)
returns the ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99
returns the starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0
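The 0-based, end-exclusive convention described for chromStart and chromEnd means that a feature's length is simply the difference of the two. A minimal sketch, using hypothetical helper names that are not part of this module:

```python
def feature_length(chrom_start, chrom_end):
    """Number of bases covered by a 0-based, end-exclusive interval."""
    return chrom_end - chrom_start


def covered_bases(chrom_start, chrom_end):
    """The 0-based base numbers covered by the interval."""
    return range(chrom_start, chrom_end)


bases = covered_bases(0, 100)
# feature_length(0, 100) == 100; bases runs from 0 through 99
```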
returns itemRgb, an RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to “On”, this RGB value will determine the display color of the data contained in this BED line. NOTE: It is recommended that a simple color scheme (eight colors or fewer) be used with this attribute to avoid overwhelming the color resources of the Genome Browser and your Internet browser.
returns the name of the BED line. This label is displayed to the left of the BED line in the Genome Browser window when the track is open to full display mode or directly to the left of the item in pack mode.
returns the score, a number between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray)
For bzip2-compressed tab-delimited files
See TabFile for usage info
For gzip-compressed tab-delimited files
See TabFile for usage info
A MACS file is a type of TabFile, but also defines a method for working with rows. Rows are given as instances of MacsRow instead of lists. MacsRow instances inherit all list methods and are therefore compatible with write_row. MacsRow has additional methods for chrom, chromStart, chromEnd, etc. For more info, see MacsRow.
MacsRow instances are lists, but you can access their features as follows:
chr() or chrom() – chromosome name
start() or chromStart() – start position; start() is 1-based, chromStart() is 0-based (BED)
end() or chromEnd() – end position; the two are equivalent, but chromEnd() (BED) is defined as 0-based, exclusive
length() – length
summit() – position of summit
tags() – number of unique tags in the peak region
pvalue() – returns the -10*log10(pvalue)
fold_enrichment() – returns the fold enrichment
FDR() – returns the FDR in %
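The relationship between the 1-based MACS coordinates and the 0-based, end-exclusive BED coordinates can be sketched as below. The function names are hypothetical, and the sketch assumes the MACS end position is inclusive, which is why the end value is the same number in both systems.

```python
def macs_to_bed(start, end):
    """Convert 1-based, end-inclusive coordinates to 0-based, end-exclusive ones."""
    return start - 1, end


def bed_to_macs(chrom_start, chrom_end):
    """Inverse conversion: 0-based, end-exclusive back to 1-based, end-inclusive."""
    return chrom_start + 1, chrom_end


chrom_start, chrom_end = macs_to_bed(1, 100)
# (chrom_start, chrom_end) == (0, 100): the first 100 bases in both systems
```

Under this assumption the feature length is end - start + 1 in MACS terms and chromEnd - chromStart in BED terms, and the two agree.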
returns the FDR (%). Preserves the str to eliminate rounding error. Use type=float to get a decimal value
returns the name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671)
returns the name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671)
returns the ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99
returns the starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0
returns the fold_enrichment vs control. Preserves the str to eliminate rounding error. Use type=float to get a decimal value
returns the -10*log10(pvalue). Preserves the str to eliminate rounding error. Use type=float to get a decimal value
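Since the stored value is -10*log10(pvalue), the raw p-value can be recovered by inverting that transform. The helper score_to_pvalue below is a hypothetical illustration and not a method of this module; it accepts the value as a string or a number.

```python
def score_to_pvalue(score):
    """Invert score = -10 * log10(pvalue); accepts a str or a number."""
    return 10 ** (-float(score) / 10)


p = score_to_pvalue("100.00")  # a score of 100 corresponds to a p-value of about 1e-10
```

Keeping the original string and converting only when a number is needed avoids writing back a value that differs from the input in its last digits.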
returns the starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 1
returns the number of unique tags in the peak region
comments=[], column_names = False)
TabFile is a class for handling tab-delimited files.
Use convert_spaces=False if your file is tab-delimited and you wish to preserve other whitespace.
TabFile supports commented lines. Commented lines are not recognized as part of the table.
By default, only lines beginning with '#' will be recognized as comments (not part of the table). You may specify a list of additional keywords using comments=['keyword1', 'keyword2', etc.]. Any line containing one of those keywords will be recognized as a comment. Keywords may be regular expressions.
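The comment rule described above can be sketched as a small predicate. This is an illustration only: the function is_comment is hypothetical, and matching keywords with re.search (anywhere in the line) is an assumption based on the phrase "lines containing that keyword".

```python
import re


def is_comment(line, keywords=()):
    """True if the line starts with '#' or matches any keyword pattern."""
    if line.startswith("#"):
        return True
    # Assumption: keywords are treated as regular expressions searched anywhere in the line.
    return any(re.search(keyword, line) for keyword in keywords)


lines = ["# header note", "track name=peaks", "chr1\t0\t100"]
flags = [is_comment(line, keywords=["^track", "^browser"]) for line in lines]
# flags == [True, True, False]
```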
returns a dictionary which gives the index corresponding to a particular column name
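A name-to-index dictionary of this kind can be built in one line. The sketch below uses a hypothetical helper and made-up column names; it is not the module's implementation.

```python
def column_index(column_names):
    """Map each column name to its 0-based position."""
    return {name: index for index, name in enumerate(column_names)}


index = column_index(["chrom", "chromStart", "chromEnd"])
# index["chromEnd"] == 2
```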
mode can be overridden here but defaults to TabFile.mode
acts just like the built-in open method in the file class. Use write=True to write to a file; otherwise it will be opened in read-only mode
process_tables2 writes a new file (name is specified with new_file), which applies a user-defined function fnc to each row of data in the original file. fnc should yield a row (i.e. a list, array, or something else finitely iterable).
process_tables2 preserves all commented lines and also the line which contains the column names, if applicable. The user may specify new column names using column_names (a list or other finite iterable); otherwise the old column_names are used, which might not preserve the column labels if columns were inserted in the middle of the table.
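The row-by-row transformation can be sketched on an in-memory list of lines. This is not process_tables2 itself: the function process_lines is hypothetical, it works on strings rather than files, and it leaves out the column-name handling described above.

```python
def process_lines(lines, fnc):
    """Apply fnc to every data row; pass comment lines through unchanged."""
    output = []
    for line in lines:
        if line.startswith("#"):
            output.append(line)  # comments are preserved as-is
            continue
        row = line.rstrip("\n").split("\t")
        new_row = fnc(row)  # fnc returns any finite iterable of fields
        output.append("\t".join(str(item) for item in new_row) + "\n")
    return output


def add_upper(row):
    # Example user function: append an upper-cased copy of the first column.
    return row + [row[0].upper()]


result = process_lines(["# note\n", "a\t1\n", "b\t2\n"], add_upper)
# result == ["# note\n", "a\t1\tA\n", "b\t2\tB\n"]
```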
returns a list of items in column n (numbering starts at 0) as items rather than lists. Unlike read_cols and read_table, elements of the read_col list are not lists, but strings
read_cols behaves like read_col but instead of taking a single column number (numbering starts at 0), it takes a list of column numbers and returns a list of partial rows, where each partial row is a list with entries from the appropriate columns IN THE ORDER SPECIFIED.
tip: use range() to create lists of ordered integers. e.g., range(2,6)=[2,3,4,5]
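The difference between selecting one column and selecting several can be sketched on rows already held in memory. The helpers below are hypothetical and only mirror the described return shapes: strings for a single column, partial rows for a list of columns, in the order requested.

```python
def select_column(rows, n):
    """Items from column n (0-based), one string per row."""
    return [row[n] for row in rows]


def select_columns(rows, indices):
    """Partial rows with entries taken in the order given by indices."""
    return [[row[i] for i in indices] for row in rows]


rows = [["chr1", "0", "100", "a"], ["chr2", "5", "50", "b"]]
names = select_column(rows, 3)              # ["a", "b"]
swapped = select_columns(rows, [2, 0])      # [["100", "chr1"], ["50", "chr2"]]
middle = select_columns(rows, range(1, 3))  # [["0", "100"], ["5", "50"]]
```

In Python 3, range() returns a lazy sequence rather than a list, but it can still be passed directly as the list of indices, as in the last line.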
returns the next (or first) line that is not a comment, parsed. uses __iter__ as a generator, and simply returns the next value
read_row is deprecated. Use x = self.__iter__() and x.next()
returns the contents of a file as a list of rows (with each row as a list). will ignore any lines that begin with a “#” symbol and truncate any lines that contain a “#” symbol
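The two comment rules stated for read_table (skip lines that begin with "#", cut other lines at the first "#") can be sketched as follows. The function parse_table is a hypothetical illustration on a list of strings; how the module treats blank lines or whitespace before the "#" is not stated, so the sketch does nothing special for them.

```python
def parse_table(lines):
    """Return rows as lists, skipping '#' lines and truncating at an inline '#'."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # whole-line comment
        data = line.split("#", 1)[0]  # drop everything from the first '#'
        rows.append(data.split("\t"))
    return rows


table = parse_table(["# comment\n", "a\tb#inline note\n", "c\td\n"])
# table == [["a", "b"], ["c", "d"]]
```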
reads one line and returns it. uses a generator, and will raise StopIteration if it reaches the EOF.
readline is deprecated. use x = self.__rawiter__() and x.next()
writes a string directly to a file, without modification (user must supply \n if desired)
Writes a list to the file as a line (tab-delimited). A different separator may also be specified with separator='x'. (Note: uses file writelines method)
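Writing a list as one delimited line can be sketched with any file-like object. The function write_row_sketch is hypothetical; appending a trailing newline and converting items with str() are assumptions, not documented behavior of write_row.

```python
import io


def write_row_sketch(handle, row, separator="\t"):
    """Join the items with the separator and write them as one line."""
    # Assumption: a newline is appended so that each row ends up on its own line.
    handle.writelines([separator.join(str(item) for item in row), "\n"])


buffer = io.StringIO()
write_row_sketch(buffer, ["chr1", 0, 100])
write_row_sketch(buffer, ["chr2", 5, 50], separator=",")
text = buffer.getvalue()
# text == "chr1\t0\t100\nchr2,5,50\n"
```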
merge_files merges the tab-delimited files named left and right, which may have commented lines. The output is directed to the file named output.
There are a few modes. If comments='left', comments in left are preserved. If comments='right', comments in right are preserved. If comments='none', no comments are preserved. If comments='all', all comments in left and right are appended to the beginning of output, although they may previously have been contained within the data in left or right.
Use merge_tab_files if you need to pass custom parameters to TabFile.
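The four comment modes amount to a choice of which comment lines are carried into the output. The sketch below covers only that choice; the function select_comments is hypothetical, and how the data rows themselves are combined is not described here, so it is left out.

```python
def select_comments(left_comments, right_comments, mode="all"):
    """Pick the comment lines to keep for a given comments mode."""
    if mode == "left":
        return list(left_comments)
    if mode == "right":
        return list(right_comments)
    if mode == "none":
        return []
    if mode == "all":
        # Comments from both inputs are gathered at the start of the output.
        return list(left_comments) + list(right_comments)
    raise ValueError("unknown comments mode: %r" % (mode,))


kept = select_comments(["# from left"], ["# from right"], mode="all")
# kept == ["# from left", "# from right"]
```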
merge_tab_files merges the tab-delimited files represented by TabFile objects left and right, which may have commented lines. The output is directed to the file named output.
There are a few modes. If comments='left', comments in left are preserved.
If comments='right', comments in right are preserved.
If comments='none', no comments are preserved.
If comments='all', all comments in left and right are appended to the beginning of output, although they may previously have been contained within the data in left or right.
See also merge_files