The module mirrors the content of the R/Bioconductor package Biostrings. It defines Python-level classes for the R/S4 classes, and otherwise gives access to R-level commands in the usual rpy2.robjects way.
The variable biostrings_env in the module is an rpy2.robjects.REnvironment for the module's namespace. Explicitly accessing an object in that namespace is then straightforward. Example:
>>> biostrings.biostrings_env['RNA_ALPHABET']
The class inheritance diagram gives an overview of how (biological) strings are modelled.
A module to model the Biostrings library in Bioconductor
Copyright 2009-2010 - Laurent Gautier
Parameter: x – a string of amino-acids
Biological string
Parameter: x – a (biological) string
DNA string
Parameter: x – a DNA string
“Masked” arbitrary string
Dictionary of probes, that is, a dictionary of rather short strings.
Create a preprocessed dictionary of genomic patterns.
Parameter: x – a string vector, a DNAStringSet, or an XStringViews with a DNAString subject
RNA string
Parameter: x – an RNA string
‘Trusted-band’ (TB) probe dictionary
Arbitrary string
View on an arbitrary string
>>> import bioc.bsgenome
>>> genomes = bioc.bsgenome.__rpackage__.available_genomes()
>>> tuple(genomes)
('BSgenome.Amellifera.BeeBase.assembly4',
'BSgenome.Amellifera.UCSC.apiMel2',
'BSgenome.Athaliana.TAIR.01222004',
'BSgenome.Athaliana.TAIR.04232008',
'BSgenome.Btaurus.UCSC.bosTau3',
'BSgenome.Btaurus.UCSC.bosTau4',
'BSgenome.Celegans.UCSC.ce2',
'BSgenome.Cfamiliaris.UCSC.canFam2',
'BSgenome.Dmelanogaster.UCSC.dm2',
'BSgenome.Dmelanogaster.UCSC.dm3',
'BSgenome.Drerio.UCSC.danRer5',
'BSgenome.Ecoli.NCBI.20080805',
'BSgenome.Ggallus.UCSC.galGal3',
'BSgenome.Hsapiens.UCSC.hg17',
'BSgenome.Hsapiens.UCSC.hg18',
'BSgenome.Hsapiens.UCSC.hg19',
'BSgenome.Mmusculus.UCSC.mm8',
'BSgenome.Mmusculus.UCSC.mm9',
'BSgenome.Ptroglodytes.UCSC.panTro2',
'BSgenome.Rnorvegicus.UCSC.rn4',
'BSgenome.Scerevisiae.UCSC.sacCer1',
'BSgenome.Scerevisiae.UCSC.sacCer2')
The genome names can be passed to biocLite (see the introduction) to download and install the corresponding genome package automatically.
>>> tuple(bioc.bsgenome.__rpackage__.installed_genomes())
('BSgenome.Celegans.UCSC.ce2',
'BSgenome.Hsapiens.UCSC.hg18',
'BSgenome.Hsapiens.UCSC.hg19')
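The package names listed above appear to follow a four-part pattern, BSgenome.<organism>.<provider>.<version>. As a minimal sketch that relies only on that pattern, and not on rpy2 or R, the names can be split and grouped in plain Python. The helpers parse_genome_name and group_by_organism are hypothetical and are not part of bioc.bsgenome.

```python
from collections import defaultdict


def parse_genome_name(name):
    """Split 'BSgenome.<organism>.<provider>.<version>' into its three fields.

    Hypothetical helper: it assumes the four-part naming pattern seen in
    the listings above and raises ValueError for anything else.
    """
    parts = name.split(".", 3)
    if len(parts) != 4 or parts[0] != "BSgenome":
        raise ValueError("not a BSgenome package name: %r" % (name,))
    _, organism, provider, version = parts
    return organism, provider, version


def group_by_organism(names):
    """Map each organism to the list of genome versions found in `names`."""
    groups = defaultdict(list)
    for name in names:
        organism, _provider, version = parse_genome_name(name)
        groups[organism].append(version)
    return dict(groups)


# The installed genomes shown in the session above.
installed = (
    "BSgenome.Celegans.UCSC.ce2",
    "BSgenome.Hsapiens.UCSC.hg18",
    "BSgenome.Hsapiens.UCSC.hg19",
)
by_organism = group_by_organism(installed)
print(by_organism)
```

The same helpers accept the tuple built from available_genomes(), since it contains plain strings of the same shape.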
Installed genomes can be imported, since they are R packages.
>>> from rpy2.robjects.packages import importr
>>> ce2_genome = importr('BSgenome.Celegans.UCSC.ce2')
>>> ce2_genome.Celegans
<BSgenome - Python:0x2a80058 / R:0x4cbdf10>
>>> print(ce2_genome.Celegans.seqlengths)
    chrI    chrII   chrIII    chrIV     chrV     chrX     chrM
15080483 15279308 13783313 17493791 20922231 17718849    13794
>>> ce2_genome.Celegans['chrI']
<DNAString - Python:0x2a80878 / R:0x53ac7e4>
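The chromosome lengths printed above can also be handled as ordinary Python data. The sketch below assumes that seqlengths behaves like an rpy2 named vector, that is, an iterable of values with a names attribute. NamedVector is a hypothetical stand-in carrying the values shown above so that the example runs without R, and named_vector_to_dict is an illustrative helper rather than part of the module.

```python
class NamedVector(list):
    """Hypothetical stand-in for an R named vector: a list plus `names`."""

    def __init__(self, values, names):
        super().__init__(values)
        self.names = list(names)


def named_vector_to_dict(vec):
    """Pair each name with its value; assumes `vec.names` matches `vec` in length."""
    return dict(zip(vec.names, vec))


# Values copied from the seqlengths output of the C. elegans ce2 genome above.
seqlengths = NamedVector(
    [15080483, 15279308, 13783313, 17493791, 20922231, 17718849, 13794],
    ["chrI", "chrII", "chrIII", "chrIV", "chrV", "chrX", "chrM"],
)

lengths = named_vector_to_dict(seqlengths)
total = sum(lengths.values())             # total number of bases
longest = max(lengths, key=lengths.get)   # name of the longest sequence

print(total, longest)
```

With a live R session, the stand-in would be replaced by ce2_genome.Celegans.seqlengths, provided that object exposes names in the same way.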
The class inheritance diagram gives an overview of how the representation of genomes is organized.
A module to model the BSgenome library in Bioconductor
Copyright 2009 - Laurent Gautier
Arbitrary string