API: Getting DNase-seq cut data from BAM files
At the heart of the pyDNase package is the BAMHandler class, which provides an interface to the cut data in a BAM file from a DNase-seq dataset. The interface is extremely simple:
>>> import pyDNase
>>> reads = pyDNase.BAMHandler("pyDNase/test/data/example.bam")
>>> reads["chr6,170863500,170863532,+"]
{'+': [0,0,0,0,1,0,0,1,1,2,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0],
'-': [0,10,1,0,1,0,4,9,0,1,0,2,1,0,0,0,0,0,3,0,6,3,0,0,0,1,1,1,3,0,3,6]}
As you can see, querying the BAMHandler object returns a dictionary containing arrays of cut counts on the positive reference strand (+) and on the negative reference strand (-). If you want to look at the cuts relative to a feature on the opposite strand, you can rotate the data 180 degrees by passing a “-” flag:
>>> reads["chr6,170863500,170863532,-"]
{'+': [6,3,0,3,1,1,1,0,0,0,3,6,0,3,0,0,0,0,0,1,2,0,1,0,9,4,0,1,0,1,10,0],
'-': [0,0,0,1,1,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,2,1,1,0,0,1,0,0,0,0]}
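Comparing the two lookups above, the “-” result is the “+” result with the strands swapped and each array reversed. The helper below is a hypothetical sketch (rotate_cuts is not part of pyDNase) that reproduces that transformation on the example data, which can be handy when orienting cut profiles yourself:

```python
def rotate_cuts(cuts):
    """Rotate a {'+': [...], '-': [...]} cut dictionary by 180 degrees.

    The two strands swap places and each array is reversed, so the first
    base of the window becomes the last.
    """
    return {"+": list(reversed(cuts["-"])), "-": list(reversed(cuts["+"]))}


# Cut counts copied from the "chr6,170863500,170863532,+" lookup above.
forward = {
    "+": [0,0,0,0,1,0,0,1,1,2,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0],
    "-": [0,10,1,0,1,0,4,9,0,1,0,2,1,0,0,0,0,0,3,0,6,3,0,0,0,1,1,1,3,0,3,6],
}
rotated = rotate_cuts(forward)
print(rotated["+"][:4])  # [6, 3, 0, 3]
```

Rotating twice returns the original dictionary, so the operation is its own inverse.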
By default, the BAMHandler caches lookups in 1000 bp chunks; you can change this behaviour at instantiation with the caching and chunkSize arguments. The BAMHandler also provides an interface to the Footprint Occupancy Score (FOS).
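To picture what chunked caching means, the sketch below lists which fixed-size windows a query would touch. It assumes chunks are aligned to multiples of chunkSize; this is an illustration, not a description of pyDNase's internals, and chunks_for_interval is a hypothetical name.

```python
def chunks_for_interval(start, end, chunk_size=1000):
    """List the (chunk_start, chunk_end) windows touched by a half-open [start, end) query.

    Assumes (hypothetically) that chunk boundaries fall on multiples of chunk_size.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    first = start // chunk_size
    last = (end - 1) // chunk_size  # end is exclusive
    return [(i * chunk_size, (i + 1) * chunk_size) for i in range(first, last + 1)]


print(chunks_for_interval(170863500, 170863532))  # [(170863000, 170864000)]
print(chunks_for_interval(900, 1100))             # [(0, 1000), (1000, 2000)]
```

Under this assumption, the example query is served by a single 1000 bp chunk, while a query straddling a boundary needs two. Since caching and chunkSize are part of the constructor signature, a larger window can be requested with, for example, pyDNase.BAMHandler("pyDNase/test/data/example.bam", chunkSize=5000), or caching disabled with caching=False.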
class pyDNase.BAMHandler(filePath, caching=True, chunkSize=1000, ATAC=False)
The object that provides the interface to DNase-seq data held in a BAM file.
FOS(interval, bgsize=35)
Calculates the Footprint Occupancy Score (FOS) for a GenomicInterval. See Neph et al. 2012 (Nature) for full details.
- Args:
- interval (GenomicInterval): The interval that you want the FOS for
- Kwargs:
- bgsize (int): The size of the flanking region to use when calculating the FOS (default: 35)
- Returns:
- A float containing the FOS, or -1 if it cannot be calculated
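Neph et al. (2012) define the FOS as (C + 1)/L + (C + 1)/R, where C, L and R are the mean cut counts in the central footprint and in the left and right flanks; lower scores indicate a deeper footprint. The sketch below applies that formula to a plain list of per-base cut counts (both strands summed) laid out as left flank, footprint, right flank. It is not pyDNase's implementation: the input layout, the use of means and the conditions under which it returns -1 are all assumptions.

```python
def fos(cuts, bgsize=35):
    """Footprint Occupancy Score, (C + 1)/L + (C + 1)/R, after Neph et al. 2012.

    cuts: per-base cut counts covering bgsize bases of left flank, the
    footprint, then bgsize bases of right flank (an assumed layout).
    Returns -1 when the score cannot be computed; here that is assumed to
    mean an empty flank or centre, or a flank with no cuts.
    """
    if bgsize <= 0:
        return -1
    centre = cuts[bgsize:len(cuts) - bgsize]
    if not centre:
        return -1  # window too short to hold both flanks and a footprint
    left, right = cuts[:bgsize], cuts[len(cuts) - bgsize:]
    centre_mean = sum(centre) / len(centre)
    left_mean = sum(left) / len(left)
    right_mean = sum(right) / len(right)
    if left_mean == 0 or right_mean == 0:
        return -1  # avoid dividing by an empty flank
    return (centre_mean + 1) / left_mean + (centre_mean + 1) / right_mean


print(fos([4, 4, 4, 0, 0, 2, 2, 2], bgsize=3))  # 0.75
```

In the usage line the centre mean is 0 and the flank means are 4 and 2, giving (0 + 1)/4 + (0 + 1)/2 = 0.75.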
__getitem__(vals)
Wrapper for get_cut_values
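The string keys used in the examples follow a "chromosome,start,end,strand" layout. The sketch below shows one way to decompose such a key into typed fields; parse_query and CutQuery are hypothetical names, and the validation rules are assumptions rather than a copy of what get_cut_values does.

```python
from typing import NamedTuple


class CutQuery(NamedTuple):
    chromosome: str
    start: int
    end: int
    strand: str


def parse_query(vals: str) -> CutQuery:
    """Split a "chrom,start,end,strand" key into typed fields."""
    chromosome, start, end, strand = vals.split(",")
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    start_bp, end_bp = int(start), int(end)
    if end_bp <= start_bp:
        raise ValueError("end must be greater than start")
    return CutQuery(chromosome, start_bp, end_bp, strand)


q = parse_query("chr6,170863500,170863532,+")
print(q.end - q.start)  # 32
```

The example key spans 32 bases, which matches the 32 values in each strand array returned above.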
__init__(filePath, caching=True, chunkSize=1000, ATAC=False)
Initializes the BAMHandler with a BAM file
- Args:
- filePath (str): the path of a sorted, indexed BAM file from a DNase-seq experiment
- Kwargs:
- chunkSize (int): the size of the regions to load when caching (default: 1000)
- caching (bool): enables or disables read caching (default: True)
- Raises:
- IOError
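Since the constructor expects a sorted, indexed BAM file, it can help to check for an index before instantiating. This is a hypothetical pre-flight helper, not part of pyDNase; it assumes the index follows one of the two common naming conventions, reads.bam.bai (the samtools index default) or reads.bai.

```python
import os


def index_candidates(bam_path: str) -> list:
    """Return the index filenames to look for next to a BAM file."""
    stem, _ = os.path.splitext(bam_path)
    # "reads.bam.bai" (samtools default) and "reads.bai" (alternative convention)
    return [bam_path + ".bai", stem + ".bai"]


def has_bam_index(bam_path: str) -> bool:
    """True if an index file matching either naming convention exists."""
    return any(os.path.isfile(p) for p in index_candidates(bam_path))


print(index_candidates("reads.bam"))  # ['reads.bam.bai', 'reads.bai']
```

A check like this only confirms that an index file is present; it does not verify that the BAM is sorted or that the index is up to date.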