This module mirrors the content of the R/Bioconductor package ShortRead. It defines Python-level classes for the R S4 classes, and otherwise gives access to R-level functions in the usual rpy2.robjects way.
>>> import bioc.shortread as shortread
>>> from rpy2.robjects.packages import importr
>>> base = importr('base')
Get the path to the example data bundled with the R package. It includes a small FASTQ file, convenient to use as an example.
>>> fp = base.system_file('extdata', package='ShortRead')
The data can now be read by calling the Bioconductor functions directly.
>>> sr_pack = shortread.__rpackage__
>>> spath = sr_pack.SolexaPath(fp)
>>> rfq = sr_pack.readFastq(sr_pack.analysisPath(spath), pattern = "s_1_sequence.txt")
>>> rfq
<ShortReadQ - Python:0xca8376c / R:0xcd01cf8>
The class ShortReadQ has methods (mirroring the S4 methods in the R package) and can be used like a regular Python class.
For example, looking at the widths of nine reads (the Python slice 1:10 covers indices 1 through 9):
>>> tuple(rfq.width[1:10])
(36, 36, 36, 36, 36, 36, 36, 36, 36)
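To make the notion of read width concrete without an R installation, here is a minimal pure-Python sketch. It assumes the simple four-line FASTQ record layout (header, sequence, "+" separator, qualities). The function name `read_widths` and the sample reads are hypothetical illustrations, not part of ShortRead or rpy2. It also shows that the Python slice `[1:10]` selects nine items.

```python
import io


def read_widths(handle):
    """Return the sequence length of each record in a four-line FASTQ stream.

    Hypothetical helper for illustration; not part of ShortRead or rpy2.
    """
    lines = [line.rstrip("\n") for line in handle]
    widths = []
    # Each record spans exactly four lines in this simplified layout.
    for i in range(0, len(lines) - 3, 4):
        header, seq, sep, qual = lines[i:i + 4]
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError("malformed record starting at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch at line %d" % (i + 2))
        widths.append(len(seq))
    return widths


# Twelve made-up reads of 36 bases each (illustrative data only).
fastq_text = "".join(
    "@read_%d\n%s\n+\n%s\n" % (n, "ACGT" * 9, "I" * 36) for n in range(12)
)

widths = read_widths(io.StringIO(fastq_text))
# Python slices are zero-based and exclude the end index: items 1 through 9.
subset = tuple(widths[1:10])
print(len(widths), len(subset))
```

A slice starting at 0, such as `widths[0:10]`, would be needed to obtain the first ten reads.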
Now on to reading aligned reads:
>>> aln = shortread.__rpackage__.readAligned(spath, "s_2_export.txt")
>>> aln
<AlignedRead - Python:0xce66a0c / R:0xd0c7f98>
Note
The source of the alignment data can be specified when calling shortread.__rpackage__.readAligned().
Reads can be accessed:
>>> aln.do_slot('sread')
<DNAStringSet - Python:0xce7a0ac / R:0xd0c9484>
Note
In Bioconductor, sread() is not formally declared as a method of the class AlignedRead, so we access it as a slot.
Filters can be used to extract subsets of the reads:
>>> # chromosome 5
>>> chromfilter = shortread.__rpackage__.chromosomeFilter('chr5.fa')
>>> aln_chr5 = aln.rx(chromfilter(aln))
>>> # position interval 6E6-10E6
>>> posfilter = shortread.__rpackage__.positionFilter(min=6E6, max=10E6)
>>> print(aln_chr5.rx(posfilter(aln_chr5)))
class: AlignedRead
length: 1 reads; width: 35 cycles
chromosome: 1
position: 6774915
strand: 1
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
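The two-step subsetting above follows a common pattern: a filter is a callable that returns one boolean per read, and the collection is then indexed with that mask. The sketch below mimics the pattern in plain Python. It is an analogy only: `Read`, `chromosome_filter`, `position_filter` and `subset` are hypothetical names and the reads are made-up data, not the ShortRead or rpy2 API.

```python
from collections import namedtuple

# Hypothetical stand-in for an aligned read; real AlignedRead objects hold more.
Read = namedtuple("Read", ["chromosome", "position"])


def chromosome_filter(name):
    """Build a filter returning one boolean per read (True if on `name`)."""
    return lambda reads: [r.chromosome == name for r in reads]


def position_filter(min_pos, max_pos):
    """Build a filter keeping reads whose position lies in [min_pos, max_pos]."""
    return lambda reads: [min_pos <= r.position <= max_pos for r in reads]


def subset(reads, mask):
    """Keep the reads whose mask entry is True (plays the role of .rx())."""
    return [r for r, keep in zip(reads, mask) if keep]


# Made-up reads for illustration only.
reads = [
    Read("chr5.fa", 6774915),
    Read("chr5.fa", 120000),
    Read("chr1.fa", 7000000),
]

# Step 1: restrict to one chromosome; step 2: restrict to a position interval.
chr5 = subset(reads, chromosome_filter("chr5.fa")(reads))
in_range = subset(chr5, position_filter(6e6, 10e6)(chr5))
print(in_range)
```

Note that the second mask is computed on the already-subsetted object, so that the mask and the object it indexes have the same length.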
The class inheritance diagram presents the parent-child relationships between the elements.
A module to model the ShortRead library in Bioconductor
Copyright 2009 - Laurent Gautier