twobitreader: a fast python package for reading .2bit files

twobitreader

Licensed under Perl Artistic License 2.0. No warranty is provided, express or implied.

class twobitreader.TwoBitFile(foo)

Python-level reader for .2bit files (i.e., from the UCSC genome browser). Note: there is no writing support.

TwoBitFile inherits from dict. You may access sequences by name, e.g.

    >>> genome = TwoBitFile('hg18.2bit')
    >>> chr20 = genome['chr20']

Sequences are returned as TwoBitSequence objects. You may access intervals by slicing, or use str() to dump the entire entry, e.g.

    >>> chr20[100100:100200]
    'ttttcctctaagataatttttgccttaaatactattttgttcaatactaagaagtaagataacttccttttgttggtatttgcatgttaagtttttttcc'
    >>> whole_chr20 = str(chr20)

Fair warning: dumping the entire chromosome requires a lot of memory.

See TwoBitSequence for more info.

sequence_sizes()

Returns a dictionary with the size of each sequence.

exception twobitreader.TwoBitFileError(msg)

Base exception for TwoBit module

class twobitreader.TwoBitSequence(file_handle, offset, file_size, byteswapped=False)

A TwoBitSequence object refers to an entry in a TwoBitFile.

You may access intervals by slicing, or use str() to dump the entire entry, e.g.

    >>> genome = TwoBitFile('hg18.2bit')
    >>> chr20 = genome['chr20']
    >>> chr20[100100:100200]  # slicing returns a string
    'ttttcctctaagataatttttgccttaaatactattttgttcaatactaagaagtaagataacttccttttgttggtatttgcatgttaagtttttttcc'
    >>> whole_chr20 = str(chr20)  # get whole chr as string

Fair warning: dumping the entire chromosome requires a lot of memory.

Note that we follow Python/UCSC conventions: coordinates are 0-based, end-open. (Note: the UCSC web-based genome browser uses 1-based closed coordinates.) If you attempt to access a slice past the end of the sequence, it will be truncated at the end.
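The coordinate convention can be tried out without a .2bit file. The following is a minimal sketch that uses a plain Python string in place of a sequence; the helper name ucsc_to_slice is hypothetical and not part of twobitreader. It converts a 1-based closed interval, as shown in the UCSC web browser, into the 0-based end-open bounds used for slicing.

```python
def ucsc_to_slice(start_1based, end_1based):
    """Convert a 1-based closed interval to 0-based, end-open bounds.

    Hypothetical helper for illustration; not part of twobitreader.
    """
    return start_1based - 1, end_1based


# A plain string stands in for a sequence here.
seq = "ACGTACGTAC"

# Browser position 3-6 (1-based, closed) covers the 3rd through 6th bases.
start, end = ucsc_to_slice(3, 6)
fragment = seq[start:end]

# Ordinary Python strings also truncate a slice that runs past the end.
tail = seq[8:100]
```

Only the start coordinate shifts by one; the end stays the same because a closed 1-based end and an open 0-based end name the same boundary.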

Your computer probably doesn't have enough memory to load a whole genome, but if you want to string-ize your TwoBitFile, here's a recipe:

    x = TwoBitFile('my.2bit')
    d = dict(x)  # plain-dict copy; TwoBitFile inherits from dict
    for k, v in d.items():
        d[k] = str(v)  # replace each TwoBitSequence with its full string

get_slice(min_, max_=None)

get_slice returns only a sub-sequence

twobitreader.base_to_bin(x)

Provided for user convenience. Converts a nucleotide to its bit representation.

twobitreader.bits_to_base(x)

convert integer representation of two bits to correct base

twobitreader.byte_to_bases(x)

convert one byte to the four bases it encodes

twobitreader.cmdline_reader()

cmdline_reader allows the twobitreader module to be executed as a script. It accepts only one argument, the .2bit filename; reads input (BED format) from stdin; writes output (FASTA format) to stdout; and writes errors/warnings to stderr.

Regions should be given in BED format on stdin:

    chrom start(0-based) end(0-based, not included)

To use a BED file of regions, do

    python -m twobitreader example.2bit < example.bed

Non-regions will be skipped and warnings will be issued to logging (logging output goes to stderr by default).

twobitreader.create_byte_table()

create BYTE_TABLE

twobitreader.create_twobyte_table()

create TWOBYTE_TABLE

twobitreader.print_specification()

Prints the twoBit file format specification I got from the Internet. This is only here for reference.
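For orientation, the sketch below reads the fixed part of a .2bit header, assuming the layout given in the UCSC format specification: four 32-bit fields (signature 0x1A412743, version, sequence count, reserved). A signature that only matches in the other byte order indicates a byte-swapped file, which is presumably what the byteswapped argument of TwoBitSequence refers to. The function parse_header is hypothetical and is not twobitreader's API.

```python
import struct

# Signature value from the UCSC .2bit format specification.
TWOBIT_SIGNATURE = 0x1A412743


def parse_header(header_bytes):
    """Return (byte_order, version, sequence_count) from a 16-byte header.

    Hypothetical helper for illustration. byte_order is "<" (little-endian)
    or ">" (big-endian), whichever makes the signature match.
    """
    for byte_order in ("<", ">"):
        signature, version, count, _reserved = struct.unpack(
            byte_order + "4I", header_bytes[:16]
        )
        if signature == TWOBIT_SIGNATURE:
            return byte_order, version, count
    raise ValueError("not a .2bit header: signature mismatch")


# Build a fake little-endian header with 3 sequences and parse it back.
fake_header = struct.pack("<4I", TWOBIT_SIGNATURE, 0, 3, 0)
parsed = parse_header(fake_header)
```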

twobitreader.split16(x)

Split a 16-bit number into integer representations of its coarse and fine parts in binary representation.
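One plausible reading of this description is that a 16-bit value is split into its high byte (the coarse part) and its low byte (the fine part). The sketch below shows that split; it is an assumption about the intent, and split_high_low is a hypothetical name rather than twobitreader's implementation.

```python
def split_high_low(value):
    """Split a 16-bit unsigned integer into (high_byte, low_byte).

    Hypothetical helper for illustration; not part of twobitreader.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned integer")
    return value >> 8, value & 0xFF


# 0x1B2C has high byte 0x1B and low byte 0x2C.
coarse, fine = split_high_low(0x1B2C)
```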

twobitreader.true_long_type()

OS X uses an 8-byte long, so make sure L (long) is the right size and switch to I (int) if needed.
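The check described here can be sketched with the standard array module, assuming that L and I refer to array type codes and that the goal is a 4-byte unsigned integer (the field width used by the .2bit format). The function name four_byte_unsigned_typecode is hypothetical.

```python
from array import array


def four_byte_unsigned_typecode():
    """Return the first of "L", "I" whose items are 4 bytes wide here.

    Hypothetical helper for illustration; not part of twobitreader.
    """
    for typecode in ("L", "I"):
        if array(typecode).itemsize == 4:
            return typecode
    raise RuntimeError("no 4-byte unsigned integer typecode found")


code = four_byte_unsigned_typecode()
```

On platforms where a C long is 8 bytes, the loop falls through to "I"; where a long is 4 bytes, "L" is returned directly.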

twobitreader.twobit_reader(twobit_file, input_stream=None, write=None)

twobit_reader takes a twobit_file (of class TwoBitFile) and an "input_stream", which can be any iterable (incl. file-like objects). It writes output (FASTA format) using write (print if write=None) and logs errors/warnings to stderr.

Regions should be given in BED format on stdin:

    chrom start(0-based) end(0-based, not included)

To use a BED file of regions, do

    python -m twobitreader example.2bit < example.bed

Non-regions will be skipped and warnings will be issued to logging (logging output goes to stderr by default).
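The BED-to-FASTA flow described here can be sketched without a .2bit file. In the example below a plain dict of strings stands in for a TwoBitFile, and the function name bed_to_fasta and the ">chrom:start-end" header format are assumptions made for illustration, not twobitreader's documented behavior.

```python
import io
import logging


def bed_to_fasta(genome, bed_lines, write):
    """Write a FASTA record for each valid BED line.

    genome is any mapping of sequence name to a sliceable sequence.
    Hypothetical helper for illustration; not part of twobitreader.
    """
    for line in bed_lines:
        fields = line.split()
        try:
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        except (IndexError, ValueError):
            logging.warning("skipping non-region line: %r", line)
            continue
        if chrom not in genome:
            logging.warning("skipping unknown sequence: %s", chrom)
            continue
        # BED coordinates are 0-based, end-open, so they map onto a slice.
        write(">%s:%d-%d\n" % (chrom, start, end))
        write(str(genome[chrom][start:end]) + "\n")


genome = {"chr1": "ACGTACGTAC"}
bed = ["chr1\t2\t6", "track name=example", "chrX\t0\t4"]
out = io.StringIO()
bed_to_fasta(genome, bed, out.write)
result = out.getvalue()
```

Passing the writer as a callable mirrors the write argument described above: any function that accepts a string will do, such as a file's write method or sys.stdout.write.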
