pyDNase - a library for analyzing DNase-seq data

Introduction

pyDNase is a suite of tools for analysing DNase-seq data. It comes with several analysis scripts covering common use cases of DNase-seq analysis, as well as implementations of the Wellington, Wellington 1D and Wellington-bootstrap footprinting algorithms.

An easy-to-understand DNase-seq footprinting tutorial is also available, and full documentation can be accessed at http://pythonhosted.org/pyDNase/

API

Many people analyzing DNase-seq data use tools designed for ChIP-seq work. These may be inappropriate for DNase-seq data, where one is less interested in the overlap of sequenced fragments than in the site at which the cut occurs (the 5’-most end of the aligned sequence fragment).
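To make the distinction concrete, here is a minimal, self-contained sketch (not pyDNase's own code) that reduces aligned reads to their 5’ cut sites and tallies them per strand over a window, producing the same {'+': [...], '-': [...]} shape that BAMHandler returns. It assumes reads are given as hypothetical (start, end, is_reverse) tuples in 0-based, half-open reference coordinates, and that the 5’ end of a reverse-strand read is its last aligned base (end - 1); pyDNase's exact coordinate conventions may differ.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical read representation: (start, end, is_reverse), 0-based half-open.
Read = Tuple[int, int, bool]


def cut_site(read: Read) -> int:
    """Return the reference position of the read's 5'-most end (the cut site)."""
    start, end, is_reverse = read
    # Forward reads begin at their 5' end; reverse reads finish at it.
    return end - 1 if is_reverse else start


def count_cuts(reads: Iterable[Read], win_start: int, win_end: int) -> Dict[str, List[int]]:
    """Tally cut sites per strand for each base in [win_start, win_end)."""
    width = win_end - win_start
    counts = {"+": [0] * width, "-": [0] * width}
    for read in reads:
        pos = cut_site(read)
        # Only the single 5' base is counted, not the whole fragment.
        if win_start <= pos < win_end:
            strand = "-" if read[2] else "+"
            counts[strand][pos - win_start] += 1
    return counts


if __name__ == "__main__":
    # Two forward reads sharing a cut site, plus one reverse read.
    toy_reads = [(102, 138, False), (102, 140, False), (70, 105, True)]
    print(count_cuts(toy_reads, 100, 108))
    # {'+': [0, 0, 2, 0, 0, 0, 0, 0], '-': [0, 0, 0, 0, 1, 0, 0, 0]}
```

A coverage-style (ChIP-seq) count would credit the reverse read to every base from 100 to 104, whereas here only its 5’ end at position 104 is counted.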

pyDNase has an underlying API to interface with a sorted and indexed BAM file from a DNase-seq experiment, allowing efficient and easy random access to DNase-seq cut data from any genomic location, e.g.

>>> import pyDNase
>>> reads = pyDNase.BAMHandler(pyDNase.example_reads())
>>> reads["chr6,170863500,170863532,+"]
{'+': [0,0,0,1,0,0,1,1,2,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,1],
 '-': [0,10,1,0,1,0,4,9,0,1,0,2,1,0,0,0,0,0,3,0,6,3,0,0,0,1,1,1,3,0,3,6]}

Querying the BAMHandler object returns a dictionary of two lists: per-base DNase cut counts on the positive reference strand (+) and on the negative reference strand (-). pyDNase caches the queried cut data, so repeated requests for the same genomic locations do not require repeated lookups from the BAM file (caching can be disabled). See the full documentation for details.
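The caching behaviour described above can be illustrated with a minimal memoisation sketch. This is not pyDNase's implementation: CachedCutReader, its caching flag and the fetch callable are hypothetical names, and the stand-in fetch function returns zero counts instead of reading a BAM file. Queries use the same "chrom,start,stop,strand" string format as the example above.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[str, int, int, str]
Cuts = Dict[str, List[int]]


class CachedCutReader:
    """Hypothetical wrapper that memoises cut-count lookups by interval."""

    def __init__(self, fetch: Callable[[Interval], Cuts], caching: bool = True):
        self.fetch = fetch      # performs the expensive lookup (e.g. BAM access)
        self.caching = caching  # set to False to call fetch on every query
        self._cache: Dict[Interval, Cuts] = {}

    def __getitem__(self, query: str) -> Cuts:
        chrom, start, stop, strand = query.split(",")
        key = (chrom, int(start), int(stop), strand)
        if not self.caching:
            return self.fetch(key)
        if key not in self._cache:
            self._cache[key] = self.fetch(key)
        return self._cache[key]


if __name__ == "__main__":
    calls = []

    def fake_fetch(interval: Interval) -> Cuts:
        # Stand-in for a BAM lookup: records the call and returns empty counts.
        calls.append(interval)
        width = interval[2] - interval[1]
        return {"+": [0] * width, "-": [0] * width}

    reader = CachedCutReader(fake_fetch)
    reader["chr6,170863500,170863532,+"]
    reader["chr6,170863500,170863532,+"]
    print(len(calls))  # prints 1: the second query is served from the cache
```

Returning the cached dictionary directly avoids a copy on every hit, at the cost that a caller who mutates the returned lists also mutates the cache.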

Installation

To install pyDNase, run:

$ pip install pyDNase

For full documentation, go to: http://pythonhosted.org/pyDNase/

Support

If you’re having any trouble, please send an email to j.piper@me.com and I’ll do my best to help you out. If you notice any bugs, please raise an issue on the GitHub repository. If you require more formal training in the analysis of DNase-seq or ATAC-seq data, I am available for consultancy. Likewise, if you are a commercial entity looking for a support contract, please get in touch.

Contributions

I highly encourage contributions! This is my first software development project, so please send any pull requests my way. I’m particularly interested in cool analysis scripts that anyone has written.

Reference

Note

If you use pyDNase or the Wellington algorithm in your work, please cite the following papers.

Piper et al. 2013. Wellington: A novel method for the accurate identification of digital genomic footprints from DNase-seq data. Nucleic Acids Research; doi:10.1093/nar/gkt850

Piper et al. 2015. Wellington-bootstrap: differential DNase-seq footprinting identifies cell-type determining transcription factors. BMC Genomics; doi:10.1186/s12864-015-2081-4

License

Copyright (C) 2015 Jason Piper. This work is licensed under the GNU GPLv3 license; see LICENCE.TXT for details. If you require the use of this software under a different license (e.g. for commercial purposes), please email me at j.piper@me.com.

Changelog

0.2.4 - 2016-05-30

  • BUG: Update author’s contact details

0.2.3 - 2016-01-17

  • BUG: Fix error in wellington_bootstrap.py which caused no output to be written to disk (thanks to Duy Pham for reporting)

0.2.2 - 2015-12-26

  • BUG: Fix error in wellington_bootstrap.py

0.2.1 - 2015-12-25

  • ENHANCEMENT: ATAC-seq mode for wellington_bootstrap.py

0.2.0 - 2015-12-18

  • FEATURE: Differential footprinting between treatment and control (wellington_bootstrap.py)
  • FEATURE: Estimate 6-mer cleavage bias (dnase_bias_estimator.py)
  • FEATURE: ATAC-seq compatibility! (pass -A to most of the scripts to enable ATAC-seq read shifting for the transposase's staggered integration)
  • FEATURE: Preliminary DNase I cleavage bias correction in dnase_average_profile.py and dnase_to_javatreeview.py (see the full documentation for details)
  • FEATURE: Calculate ∆DHS (He et al. 2012) scores between two DNase-seq runs (dnase_ddhs_scorer.py)
  • FEATURE: Annotate a BED file with the number of cuts in each region (dnase_cut_counter.py)
  • ENHANCEMENT: DNase-seq footprinting tutorial :)
  • ENHANCEMENT: In the interests of performance, BAMHandler no longer returns NumPy arrays, just lists.
  • ENHANCEMENT: wellington_footprints.py is now multithreaded (woo) - performance roughly scales linearly with number of CPUs.
  • ENHANCEMENT: The Footprinting module has been refactored to allow for multithreading - those using the Wellington API directly take note of the changes.
  • ENHANCEMENT: Everything is faster! More cythonised code.
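The ATAC-seq read shifting mentioned in the 0.2.0 changes can be sketched as a fixed per-strand offset applied to each 5’ cut position. This sketch assumes the widely used convention of shifting + strand read ends by +4 bp and - strand read ends by -5 bp to account for the 9 bp staggered Tn5 insertion; the exact offsets pyDNase applies when -A is passed may differ, and shift_atac_cut is a hypothetical helper.

```python
# Commonly used Tn5 offsets (an assumption here; pyDNase's own values may differ).
PLUS_OFFSET = 4
MINUS_OFFSET = -5


def shift_atac_cut(cut_pos: int, is_reverse: bool) -> int:
    """Apply the conventional Tn5 strand-specific offset to a 5' read end."""
    return cut_pos + (MINUS_OFFSET if is_reverse else PLUS_OFFSET)


if __name__ == "__main__":
    print(shift_atac_cut(1000, False))  # forward read: 1004
    print(shift_atac_cut(1008, True))   # reverse read: 1003
```

Because the offset depends only on strand, it can be applied to each cut position before tallying, leaving the rest of a cut-counting pipeline unchanged.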

0.1.7 - 2014-09-03

  • BUG: Fixed bug in dnase_to_javatreeview.py that prevented caching from being disabled

0.1.6 - 2014-03-07

  • BUG: Fixed FDR calculation that was only performing 1 randomisation! Many thanks to @arjanvandervelde

0.1.5 - 2014-01-31

  • ENHANCEMENT: Removed SciPy requirement
  • ENHANCEMENT: Added Travis CI automated builds for Python 2.6, 2.7 and 3.3, plus Coveralls coverage reporting
  • BUG: Fixed WIG output for UCSC compatibility

0.1.4 - 2014-01-05

  • BUG: Fix dodgy deployment

0.1.3 - 2014-01-05

  • BUG: Fixed Python 2.6 Compatibility (again!)

0.1.2 - 2013-12-09

  • BUG: Fix issue where BED intervals with chromosome names not starting with “c” were silently being ignored (reported by Aaron Hardin)
  • BUG: Fix clint dependency issue (no longer requires custom version of clint)
  • BUG: Fix spelling error in CHANGES

0.1.1 - 2013-12-05

  • BUG: Misc. small bug fixes
  • ENHANCEMENT: Fixed Python 2.6 Compatibility
  • ENHANCEMENT: Added JSON export script

0.1.0 - 2013-09-01

  • Initial Release