bioplus.sitefinder

Tools for dealing with binding sites (instances of sequence motifs).

An equivalent to Motif.search_pwm().

bioplus.sitefinder.find_sites(peaks_file, fasta_file, motif, bed=True, xls=False, output_dir=None, motif_type='MEME', src_fnc='find_sites', bysummit=False, **kwargs)

find_sites(peaks_file, fasta_file, motif) takes the NAME_peaks.xls file output by MACS, together with a FASTA file, and finds instances of the motif specified by motif (a Bio.Motif object). It writes two new files for peaks and sites, NAME.peaks.info and NAME.sites.info, and also creates NAME.peaks.bed and NAME.sites.bed, which are proper BED files (scored by tag density and information content, respectively). All files use 0-based, half-open coordinates in line with the BED convention; MACS coordinates are corrected accordingly.
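The coordinate correction can be sketched as follows, assuming MACS reports peaks as 1-based, fully closed intervals (check this against the MACS version you use). macs_to_bed is a hypothetical helper for illustration, not part of bioplus.

```python
def macs_to_bed(start, end):
    """Convert a 1-based, closed interval to a 0-based, half-open BED interval.

    Assumption: MACS 'start' is 1-based and 'end' is inclusive.
    """
    # Shift the start down by one; an inclusive 1-based end equals
    # the exclusive 0-based end, so it stays the same.
    return start - 1, end

# A peak reported as 101..200 covers 100 bases in either convention.
bed_start, bed_end = macs_to_bed(101, 200)
print(bed_start, bed_end)   # 100 200
print(bed_end - bed_start)  # 100
```

A side benefit of the half-open form is that interval length is simply end - start, with no +1 adjustment.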

NAME.peaks.info contains, for each peak:
(1) chr
(2) start
(3) end
(4) Peak ID
(5) Relative summit
(6) Number of unique tags in peak region
(7) -10*log10(pvalue)
(8) fold_enrichment
(9) FDR
(10) Number of motif instances found
(11) Total Ri for discovered motif instances
(12) Greatest Ri of any motif instance in the peak region
(13) Sequence of that motif instance
(14) Position (offset) of that motif instance (left end)
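A minimal sketch of reading one row of NAME.peaks.info, assuming the file is tab-delimited with exactly the 14 columns listed above. The column names, parse_peaks_info_line, and the example row values are all hypothetical, chosen only for illustration.

```python
# Hypothetical names for the 14 documented columns, in order.
PEAKS_INFO_COLUMNS = [
    "chr", "start", "end", "peak_id", "relative_summit", "tags",
    "neg10log10_pvalue", "fold_enrichment", "fdr", "n_sites",
    "total_ri", "best_ri", "best_site_seq", "best_site_offset",
]

def parse_peaks_info_line(line):
    """Split one tab-delimited NAME.peaks.info row into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(PEAKS_INFO_COLUMNS):
        raise ValueError(
            f"expected {len(PEAKS_INFO_COLUMNS)} columns, got {len(fields)}")
    row = dict(zip(PEAKS_INFO_COLUMNS, fields))
    # Coordinates are 0-based, half-open, so length is just end - start.
    row["start"], row["end"] = int(row["start"]), int(row["end"])
    return row

# Made-up example row, not real output.
example = "\t".join(["chr1", "100", "200", "peak_1", "50", "42", "85.3",
                     "6.1", "0.5", "2", "21.4", "11.2", "ACAACA", "37"])
row = parse_peaks_info_line(example)
print(row["peak_id"], row["end"] - row["start"])  # peak_1 100
```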

NAME.peaks.bed contains, for each peak:
(1) chr
(2) start
(3) end
(4) Peak ID
(5) Number of unique tags in peak region
(6) Strand (".")
(7) Summit position (absolute)
(8) Summit position + 1
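One NAME.peaks.bed line could be assembled like this, assuming the relative summit from NAME.peaks.info is an offset from the 0-based peak start. peak_bed_row is a hypothetical helper. Columns 7 and 8 fall in BED's thickStart/thickEnd slots, so a genome browser draws the summit as a one-base thick block.

```python
def peak_bed_row(chrom, start, end, peak_id, tags, relative_summit):
    """Format one 8-column, tab-separated NAME.peaks.bed line."""
    # Assumption: the summit is stored relative to the peak start.
    summit = start + relative_summit
    # Strand is "."; the summit fills thickStart/thickEnd as [summit, summit + 1).
    fields = [chrom, start, end, peak_id, tags, ".", summit, summit + 1]
    return "\t".join(str(f) for f in fields)

print(peak_bed_row("chr1", 100, 200, "peak_1", 42, 50))
```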

NAME.sites.info contains, for each site:
(1) chr
(2) start
(3) end
(4) Unique site ID (internally generated)
(5) Motif information content Ri, in bits
(6) Motif orientation giving the best score, + or -
Columns 1-6 also make up NAME.sites.bed; the remaining columns appear only in NAME.sites.info:
(7) Motif sequence (e.g., ACAACA)
(8) Position (offset) of that motif (left end)
(9) Peak ID, fetched from MACS
(10) Used peak length
(11) True peak length
(12) Peak summit offset
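Columns 5 and 6 report a site's individual information Ri and the strand on which it scores best. A minimal sketch of that scoring, assuming an equiprobable background (each position contributes 2 + log2 f(b, l) bits), no small-sample correction, and a frequency matrix with no zeros. bioplus's own scoring may differ, and site_ri and best_orientation are hypothetical names.

```python
import math

# Complement table for upper-case DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def site_ri(seq, freqs):
    """Ri in bits: sum over positions of 2 + log2 f(base, position).

    freqs holds one {base: frequency} dict per motif position. Assumes a
    uniform background, no small-sample correction, and no zero
    frequencies (math.log2(0) raises ValueError; add pseudocounts first).
    """
    return sum(2 + math.log2(col[base]) for base, col in zip(seq, freqs))

def best_orientation(seq, freqs):
    """Score the site on both strands and return (Ri, '+' or '-')."""
    plus = site_ri(seq, freqs)
    minus = site_ri(revcomp(seq), freqs)
    return (plus, "+") if plus >= minus else (minus, "-")

col = {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}
print(best_orientation("T", [col]))  # (1.0, '-')
```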

bioplus.sitefinder.search_peak(peak_ID, peak, peakseq, motif, bysummit=False)

Provide information about matches to a motif in a peak region, and about the region itself.

peak MUST provide EITHER
(1) the following public methods:
chrom = reference (e.g. chr1, chrX)
chromStart = start coordinate, 0-based
chromEnd = end coordinate, open
OR
(2) the following public method:
coordinates = a tuple containing (chrom, chromStart, chromEnd)
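The two accepted interfaces can be resolved with a small helper like the one below. This sketch treats the members as plain attributes (if your peak class exposes them as methods, call them instead) and prefers option (1) when both are present. peak_coordinates is a hypothetical name.

```python
from types import SimpleNamespace

def peak_coordinates(peak):
    """Return (chrom, chromStart, chromEnd) from either supported interface."""
    names = ("chrom", "chromStart", "chromEnd")
    # Option (1): three separate members.
    if all(hasattr(peak, name) for name in names):
        return tuple(getattr(peak, name) for name in names)
    # Option (2): a single (chrom, chromStart, chromEnd) tuple.
    if hasattr(peak, "coordinates"):
        chrom, start, end = peak.coordinates
        return chrom, start, end
    raise AttributeError("peak needs chrom/chromStart/chromEnd or coordinates")

a = SimpleNamespace(chrom="chr1", chromStart=100, chromEnd=200)
b = SimpleNamespace(coordinates=("chrX", 5, 25))
print(peak_coordinates(a), peak_coordinates(b))
```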

peak may optionally provide the following methods:
tags (if not found, 'NA' is used instead)
summit (if not found, the peak center is used)
misc (a list of anything else)
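The optional members and their fallbacks might be gathered like this. The sketch assumes the summit is an absolute position, takes the integer midpoint as the peak center, and defaults misc to an empty list (the text above does not say what happens when misc is missing). peak_extras is a hypothetical name.

```python
from types import SimpleNamespace

def peak_extras(peak, chrom_start, chrom_end):
    """Return (tags, summit, misc), applying the documented fallbacks."""
    tags = getattr(peak, "tags", "NA")  # missing tag count becomes 'NA'
    # Missing summit falls back to the peak center (integer midpoint, an assumption).
    summit = getattr(peak, "summit", (chrom_start + chrom_end) // 2)
    misc = getattr(peak, "misc", [])  # assumed default; not specified above
    return tags, summit, misc

print(peak_extras(SimpleNamespace(), 100, 201))  # ('NA', 150, [])
print(peak_extras(SimpleNamespace(tags=42, summit=173), 100, 201))  # (42, 173, [])
```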

For each peak, the best motif hit is returned, where "best" means the hit with the highest information content; ties are broken by choosing the hit closest to the center.
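The selection rule fits in a single sort key. In this sketch each hit is a hypothetical (ri, position) pair, with position measured in the same coordinates as center; the function's actual internal representation is not documented here.

```python
def best_hit(hits, center):
    """Return the (ri, position) hit with the highest Ri, breaking ties by distance to center."""
    if not hits:
        return None
    # Negate Ri so min() prefers higher information first, then nearer positions.
    return min(hits, key=lambda hit: (-hit[0], abs(hit[1] - center)))

hits = [(5.0, 10), (7.5, 90), (7.5, 52)]
print(best_hit(hits, center=50))  # (7.5, 52)
```

If two hits tie on both Ri and distance, min() keeps the one that appears first in the list.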

Note: site positions are 0-based, in contrast with earlier versions of biotools.

Returns a tuple of four items:
(1) peak info
(2) peak BED row
(3) a list of info about sites (motif matches)
(4) a list of BED rows for sites
