bioplus.motif

class bioplus.motif.CharDict(dict={})[source]

CharDict is a generalized dictionary that inherits from defaultdict. Default values are 0. Accepts any key.

uncertainty()[source]

calculates the uncertainty H at this position (specified by CharDict). Treats 0*log(0) as 0.
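As an illustration of the 0*log(0) convention, here is a minimal standalone sketch of a per-position uncertainty calculation. It does not use bioplus. The function name position_uncertainty and the choice of log base 2 (bits) are assumptions made for this example.

```python
import math
from collections import Counter

def position_uncertainty(counts):
    """Shannon uncertainty H in bits for one position, given symbol counts.

    Hypothetical helper for illustration; not part of bioplus.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:  # zero counts are skipped, i.e. 0*log(0) is taken as 0
            p = c / total
            h -= p * math.log2(p)
    return h

uniform = Counter("ACGT")    # each base seen once
conserved = Counter("AAAA")  # only one base ever seen
print(position_uncertainty(uniform), position_uncertainty(conserved))
```

A position where all four bases are equally frequent has the maximum uncertainty for DNA, and a fully conserved position has none.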

bioplus.motif.KL(p, q)[source]

returns a list of the KL divergence (relative entropy) at each position from position matrix p to position matrix q. Use sum() for the total.
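A per-position KL divergence can be sketched as follows. This is not the bioplus implementation: it assumes each matrix is a plain list of dicts mapping base to probability, uses log base 2, and assumes q is nonzero wherever p is nonzero. The name kl_per_position is hypothetical.

```python
import math

def kl_per_position(p, q):
    """KL divergence D(p || q) at each position, in bits.

    p, q: lists of dicts mapping base -> probability (assumed layout).
    """
    if len(p) != len(q):
        raise ValueError("matrices must have the same number of positions")
    result = []
    for p_col, q_col in zip(p, q):
        d = 0.0
        for base, p_x in p_col.items():
            if p_x > 0:  # terms with p(x) = 0 contribute nothing
                d += p_x * math.log2(p_x / q_col[base])
        result.append(d)
    return result

background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p = [{"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0}, dict(background)]
q = [dict(background), dict(background)]
per_position = kl_per_position(p, q)
total = sum(per_position)
```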

class bioplus.motif.PositionWeightMatrix(n=None)[source]

Stores counts of nucleotide bases at each position. Objects are immutable: sequences may be added to the counts, but the object may not otherwise be modified in situ.

Rs()[source]

returns the Schneider Rs value, which is the expectation of Ri over all possible sequences, calculated as the sum over positions of (2 - uncertainty).
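The sum of 2 - uncertainty can be sketched outside the library as below. The sketch assumes the matrix is a list of dicts of base probabilities and applies no small-sample correction. The name rs_from_probs is hypothetical.

```python
import math

def rs_from_probs(matrix):
    """Sum over positions of (2 - H(l)), in bits, for a 4-letter alphabet.

    matrix: list of dicts mapping base -> probability (assumed layout).
    """
    total = 0.0
    for col in matrix:
        # H(l) with 0*log(0) treated as 0
        h = -sum(p * math.log2(p) for p in col.values() if p > 0)
        total += 2.0 - h
    return total

matrix = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative position
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved position
]
rs = rs_from_probs(matrix)
```

In this example the uninformative position contributes 0 bits and the conserved position contributes 2 bits.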

count_file(seqsFile, n=0)[source]

reads a tabFile with a list of sequences in column n (by default n=0, the first column) and extracts counts

count_seqs(L, debug=False)[source]

adds a list of sequences to the counts
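The counting step can be pictured with a standalone sketch that tallies bases per position using defaultdict(int), in the spirit of CharDict. The function name count_sequences and the requirement that all sequences have equal length are assumptions for this example.

```python
from collections import defaultdict

def count_sequences(seqs):
    """Count symbols at each position of a list of equal-length sequences.

    Returns a list with one defaultdict(int) per position; unseen symbols read as 0.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    counts = [defaultdict(int) for _ in range(length)]
    for s in seqs:
        if len(s) != length:
            raise ValueError("all sequences must have the same length")
        for i, base in enumerate(s):
            counts[i][base] += 1
    return counts

counts = count_sequences(["ACG", "ACT", "AGT"])
```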

import_from_MEME(filename, n=1, mode='biotools')[source]

imports a motif from the output of MEME (meme.txt)

if there are multiple motifs in the output, we will use motif n (the first is n=1, which is also the default)

make_Ri()[source]

changes from counts or probabilities to Ri, information content

make_probs(trueProbs=False)[source]

normalizes everything to 1

rc()[source]

returns the reverse complement of this object
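For a position matrix, taking the reverse complement means reversing the order of positions and swapping each base with its complement. A minimal sketch, assuming the matrix is a list of dicts keyed by A, C, G, T (the name reverse_complement_matrix is hypothetical):

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement_matrix(matrix):
    """Return a new matrix: positions reversed, bases complemented.

    matrix: list of dicts mapping base -> count or probability (assumed layout).
    The input is left unchanged.
    """
    return [
        {COMPLEMENT[base]: value for base, value in col.items()}
        for col in reversed(matrix)
    ]

matrix = [
    {"A": 3, "C": 1, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 4, "T": 0},
]
rc = reverse_complement_matrix(matrix)
```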

seq_Ri(s)[source]

seq_Ri returns the information content Ri, in bits, of a sequence s, as measured with the given position matrix
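One common way to score a sequence is Schneider's individual information, where each position contributes 2 + log2 f(b, l) bits for the base b observed at position l. The sketch below assumes that formula without a small-sample correction, and assumes a list of dicts of base frequencies as input. Whether bioplus applies a correction is not stated here, and the name seq_ri_sketch is hypothetical.

```python
import math

def seq_ri_sketch(seq, probs):
    """Individual information of seq in bits: sum over l of 2 + log2 f(seq[l], l).

    probs: list of dicts mapping base -> frequency (assumed layout).
    Returns -inf if the sequence contains a base with frequency 0.
    """
    if len(seq) != len(probs):
        raise ValueError("sequence and matrix must have the same length")
    total = 0.0
    for l, base in enumerate(seq):
        f = probs[l].get(base, 0.0)
        if f == 0:
            return float("-inf")  # base never observed at this position
        total += 2.0 + math.log2(f)
    return total

probs = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
score = seq_ri_sketch("AC", probs)
```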

uncertainty()[source]

returns the uncertainty H(l) of the matrix as a list. Use sum() for the total uncertainty.

Note: this function calls uncertainty() from the baseDict instance, and as such it can be overridden implicitly. baseDict.uncertainty() treats 0*log(0) as 0.

bioplus.motif.center_region(f, max_dist=75, motif_length=17)[source]

returns a function that specifies whether a given motif is within +/- max_dist bp of the peak_center

requires the tabFile object f to determine the indices properly
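The "function that returns a function" pattern can be sketched with a closure. This example leaves out the tabFile lookup entirely and takes positions directly. The argument names motif_start and peak_center, and the choice to measure distance from the motif midpoint, are assumptions rather than the library's behaviour.

```python
def make_center_region(max_dist=75, motif_length=17):
    """Build a predicate that tests whether a motif lies near a peak center.

    Hypothetical sketch: positions are passed in directly instead of being
    looked up by column index in a tabFile.
    """
    def in_center(motif_start, peak_center):
        # distance is measured from the midpoint of the motif (an assumption)
        motif_center = motif_start + motif_length / 2
        return abs(motif_center - peak_center) <= max_dist
    return in_center

in_center = make_center_region(max_dist=75, motif_length=17)
near = in_center(100, 150)
far = in_center(100, 300)
```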

bioplus.motif.count_letters(L)[source]

bioplus.motif.count_spacers_from_info(foo, cutoff=None, region_rule=None, region_width=None, spacer_offset=8, spacer_length=3, output_file=None)[source]

count spacers from a .sites.info or .peaks.info file

optionally you may supply: cutoff, a minimum cutoff (float or int); region_rule, a function that selects the column

bioplus.motif.joint_matrix(sites)[source]

takes as input a filename and returns the joint Rate matrix for the list of sequences contained in that file

Joint rates R(X;Y) are defined as

R(X;Y) = - sum over X,Y p(x,y) * I(X;Y)

I(X;Y) = - sum over X,Y p(x,y) * log2[p(x,y)/(p(x)p(y))]
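For orientation, the log2[p(x,y)/(p(x)p(y))] term can be seen in a standalone sketch of the mutual information between two positions of a set of aligned sequences. The sketch computes the standard, non-negative mutual information. The docstring above carries a leading minus sign, so the library's sign convention may differ. The name mutual_information is hypothetical.

```python
import math
from collections import Counter

def mutual_information(seqs, i, j):
    """Standard mutual information (bits) between positions i and j.

    Probabilities are estimated as plain frequencies over the sequences.
    """
    n = len(seqs)
    joint = Counter((s[i], s[j]) for s in seqs)
    px = Counter(s[i] for s in seqs)
    py = Counter(s[j] for s in seqs)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

coupled = ["AA", "CC", "GG", "TT"]      # position 1 fully determined by position 0
independent = ["AA", "AC", "CA", "CC"]  # positions vary independently
```

Computing this value for every pair of positions would give a matrix of pairwise dependencies.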

bioplus.motif.spacerGC(L, spacerOffset=6, spacerLength=3)[source]

spacerGC takes as input a list of [15 bp GBSs (strings)] and returns the number of sequences that have 0,1,2,3 G/Cs in the 3 bp spacer as an array in that order
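The counting described above can be sketched as follows. This is an illustration rather than the bioplus source: it assumes a 0-based spacer offset and case-insensitive matching, and the name spacer_gc_counts is hypothetical.

```python
def spacer_gc_counts(seqs, spacer_offset=6, spacer_length=3):
    """Count sequences by the number of G/C bases in their spacer.

    Returns a list where index k holds the number of sequences whose
    spacer contains exactly k G or C bases.
    """
    counts = [0] * (spacer_length + 1)
    for s in seqs:
        spacer = s[spacer_offset:spacer_offset + spacer_length].upper()
        gc = sum(1 for base in spacer if base in "GC")
        counts[gc] += 1
    return counts

sites = [
    "AGAACAAAATGTTCT",  # spacer AAA: 0 G/C
    "AGAACAGATTGTTCT",  # spacer GAT: 1 G/C
    "AGAACAGCGTGTTCT",  # spacer GCG: 3 G/C
]
counts = spacer_gc_counts(sites)
```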
