Compares two genomic signal objects and outputs results as a bedGraph file. Can be used for entire genome-wide comparisons due to its parallel nature.
Typical usage would be to create genome-wide windows of equal size to provide as features:
windowsize = 10000
features = pybedtools.BedTool().window_maker(
genome='hg19', w=windowsize)
You will usually want to choose bins for the array based on the final resolution you would like. Say you would like 10-bp bins in the final bedGraph; using the example above you would use array_kwargs={‘bins’: windowsize/10}. Or, for single-bp resolution (beware: file will be large), use {‘bins’: windowsize}.
Here’s how it works. This function:
- Takes batchsize features at a time from features
- Constructs normalized (RPMMR) arrays in parallel for each input genomic signal object for those batchsize features
- Applies comparefunc (np.subtract by default) to the arrays to get a “compared” (e.g., difference matrix by default) for the batchsize features.
- For each row in this matrix, it outputs each nonzero column as a bedGraph format line in outfn
comparefunc is a function with the signature:
def f(x, y):
return z
where x and y will be arrays for signal1 and signal2 (normalized to RPMMR) and z is a new array. By default this is np.subtract, but another common comparefunc might be a log2-fold-change function:
def lfc(x, y):
return np.log2(x / y)
Parameters: |
|
---|