Bases: metaseq._genomic_signal.BaseSignal
Abstract class for bed, BAM and bigBed files.
Methods
__init__(fn) | Abstract class for bed, BAM and bigBed files. |
array(features[, processes, chunksize, ragged]) | Creates an MxN NumPy array of genomic signal for the region defined by each feature in features. |
local_count(*args, **kwargs) | The count of genomic signal (typically BED features) found within an interval. |
local_coverage(features, *args, **kwargs) | Returns a binned vector of coverage. |
Creates an MxN NumPy array of genomic signal for the region defined by each feature in features, where M=len(features) and N=(bins or feature length)
Parameters:
features : iterable of interval-like objects
processes : int or None
chunksize : int
ragged : bool
Notes
Additional keyword args are passed to local_coverage() which performs the work for each feature; see that method for more details.
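To make the MxN shape concrete, here is a minimal sketch of the stacking idea: one binned vector per feature, collected row by row. It does not call metaseq. `toy_local_coverage` and `toy_array` are hypothetical stand-ins, features are reduced to `(start, stop)` tuples, and nested lists stand in for the NumPy array.

```python
def toy_local_coverage(feature, bins):
    """Hypothetical stand-in for local_coverage(): returns one value per bin."""
    start, stop = feature
    width = (stop - start) / bins
    # Fake "signal": the midpoint coordinate of each bin.
    return [start + (i + 0.5) * width for i in range(bins)]


def toy_array(features, bins):
    """Build an M x N nested list: one row per feature, one column per bin."""
    return [toy_local_coverage(f, bins) for f in features]


# Three features of different lengths, all binned to N = 10 columns.
features = [(0, 100), (500, 700), (1000, 1050)]
rows = toy_array(features, bins=10)
shape = (len(rows), len(rows[0]))
print(shape)  # (3, 10)
```

Because every feature is reduced to the same number of bins, features of different lengths still line up as rows of equal width.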
The count of genomic signal (typically BED features) found within an interval.
Usually this only makes sense for BED or BAM (not bigWig) files.
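As a rough illustration of what counting features within an interval can mean, the sketch below counts BED-like records that overlap a query region. This is an assumption-laden toy, not metaseq's implementation: `count_overlaps` is a hypothetical helper, coordinates are treated as half-open, and "found within" is interpreted as any overlap, which the text above does not specify.

```python
def count_overlaps(features, chrom, start, stop):
    """Count (chrom, start, stop) records overlapping the half-open [start, stop)."""
    n = 0
    for f_chrom, f_start, f_stop in features:
        # Two half-open intervals overlap when each starts before the other ends.
        if f_chrom == chrom and f_start < stop and f_stop > start:
            n += 1
    return n


bed = [
    ("chr1", 10, 20),
    ("chr1", 15, 30),
    ("chr1", 40, 50),
    ("chr2", 10, 20),
]

count_wide = count_overlaps(bed, "chr1", 18, 45)
count_narrow = count_overlaps(bed, "chr1", 20, 40)
print(count_wide, count_narrow)  # 3 1
```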
Returns a binned vector of coverage.
Computes a 1D vector of coverage at the coordinates for each feature in features, extending each read by fragment_size bp.
Some arguments cannot be used for bigWig files due to the structure of these files. The parameters docstring below indicates whether or not an argument can be used with bigWig files.
Depending on the arguments provided, this method can return a vector containing values from a single feature or from concatenated features.
An example of the flexibility afforded by the latter case:
features can be a 3-tuple of pybedtools.Intervals representing (TSS + 1kb upstream, gene, TTS + 1kb downstream) and bins can be [100, 1000, 100]. This will return a vector of length 1200 containing the three genomic intervals binned into 100, 1000, and 100 bins respectively. Note that it is up to the caller to construct the right axis labels in the final plot!
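The arithmetic behind the length-1200 vector, and the axis bookkeeping left to the caller, can be sketched without metaseq. Everything here is illustrative: `bin_means` is a hypothetical helper that averages a per-base signal into equal-width bins, the region lengths (1000, 5000, 1000 bp) are made up, and the constant signals exist only to make the segments easy to tell apart.

```python
from itertools import accumulate


def bin_means(values, nbins):
    """Average a per-base signal into nbins roughly equal-width bins."""
    n = len(values)
    out = []
    for i in range(nbins):
        lo = i * n // nbins
        hi = max((i + 1) * n // nbins, lo + 1)  # never take an empty slice
        chunk = values[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


# Toy per-base signal for (upstream, gene, downstream).
regions = [[1.0] * 1000, [2.0] * 5000, [3.0] * 1000]
bins = [100, 1000, 100]

# Bin each region separately, then concatenate into a single vector.
vector = []
for signal, nbins in zip(regions, bins):
    vector.extend(bin_means(signal, nbins))

# Segment boundaries in bin units, useful for placing tick marks on a plot.
boundaries = list(accumulate([0] + bins))
print(len(vector), boundaries)  # 1200 [0, 100, 1100, 1200]
```

The x axis of the concatenated vector is in bin units rather than base pairs, and each segment has its own bp-per-bin scale, which is why the caller has to supply the labels.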
Parameters:
features : str, interval-like object, or list
bins : None, int, list
fragment_size : None or int
shift_width : int
read_strand : None or str
stranded : bool
use_score : bool
accumulate : bool
preserve_total : bool
method : str; one of [ “summarize” | “get_as_array” | “ucsc_summarize” ]
processes : int or None
Returns: 1-d NumPy array
Notes
If a feature has a “-” strand attribute, then the resulting profile will be relative to a minus-strand feature. That is, the resulting profile will be reversed.
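A minimal sketch of the reversal described above, assuming the intent is that index 0 of the profile always corresponds to the feature's own start (its 5' end). `orient` is a hypothetical helper, not a metaseq function, and a plain list stands in for the NumPy array.

```python
def orient(profile, strand):
    """Return the profile in feature orientation: reversed for "-" features."""
    return profile[::-1] if strand == "-" else list(profile)


genomic_order = [1, 2, 3, 4]  # left-to-right along the chromosome
plus_profile = orient(genomic_order, "+")
minus_profile = orient(genomic_order, "-")
print(plus_profile, minus_profile)  # [1, 2, 3, 4] [4, 3, 2, 1]
```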
Returns arrays x and y. x is in genomic coordinates, and y is the coverage at each of those coordinates after extending fragments.
The total number of reads is guaranteed to be the same no matter how it’s binned.
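The sketch below illustrates the two ideas in these notes under simplifying assumptions: per-base coverage `y` over coordinates `x` after extending each read to a fixed fragment length, and a binning scheme (summing rather than averaging) under which the total does not depend on the number of bins. The read format `(five_prime_position, strand)` and both helper functions are hypothetical, and this is not a description of how metaseq computes or normalizes coverage.

```python
def extended_coverage(reads, start, stop, fragment_size):
    """Per-base coverage over [start, stop) after extending each read.

    reads: list of (five_prime_position, strand) tuples (toy format).
    """
    x = list(range(start, stop))
    y = [0] * (stop - start)
    for pos, strand in reads:
        if strand == "+":
            frag_start, frag_stop = pos, pos + fragment_size
        else:
            # Minus-strand reads extend toward lower coordinates.
            frag_start, frag_stop = pos - fragment_size + 1, pos + 1
        # Clip the fragment to the window before counting.
        for p in range(max(frag_start, start), min(frag_stop, stop)):
            y[p - start] += 1
    return x, y


def bin_sums(values, nbins):
    """Sum values into nbins contiguous bins that together cover every element."""
    n = len(values)
    return [sum(values[i * n // nbins:(i + 1) * n // nbins]) for i in range(nbins)]


reads = [(105, "+"), (110, "+"), (130, "-")]
x, y = extended_coverage(reads, 100, 140, fragment_size=10)

four_bins = bin_sums(y, 4)
eight_bins = bin_sums(y, 8)
print(four_bins, sum(four_bins), sum(eight_bins))  # [5, 15, 9, 1] 30 30
```

Averaging within bins would instead change the total whenever the bin count changes, so any guarantee of this kind depends on how bins are aggregated or rescaled.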
(with ideas from http://www-huber.embl.de/users/anders/HTSeq/doc/tss.html)