Working with 1D intervals
=========================

.. :py:module:: ngs_plumbing.intervals

In genomics, DNA is most often thought of as a one-dimensional structure:
a sequence of DNA bases (with the extra twist from microbiologists that
the sequence can be circular). Features on DNA are identified by their
"geographical" location, that is, a pair of coordinates: a beginning and an end.

The idea of the module is to handle anything that follows the interval protocol,
that is, anything with a :py:attr:`begin` and an :py:attr:`end`.
The :py:class:`Interval` is a minimal implementation of such an object.

The second important concept in the module is that there are iterables
of intervals, preferably ordered on their :py:attr:`begin`.
The :py:class:`IntervalList` is an implementation of such a structure
as a Python :py:class:`list` of intervals.

In-place sorting happens simply with:

>>> from ngs_plumbing.intervals import Interval, IntervalList
>>> itl = IntervalList(Interval(x, y) for x, y in ((3,10),(1,7),(12,16)))
>>> itl.sort()

.. note::

   The fuss about sorting has a reason: for several operations,
   working on intervals sorted on their :py:attr:`begin` coordinate
   reduces the complexity to O(n) (to which the complexity of sorting
   should be added).
   Readers familiar with `samtools` will remember that there is a `sort` command.

Union of intervals in a list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Collapsing the intervals in an :py:class:`IntervalList` means reducing each
group of overlapping intervals to its outer coordinates (think of it as
the union of the regions defined in the list).

>>> from ngs_plumbing.intervals import Interval, IntervalList
>>> itl = IntervalList(Interval(x, y) for x, y in ((3,10),(1,7),(12,16)))
>>> itl.sort() # collapse assumes sorting
>>> cf = IntervalList.collapse_iter(itl)
>>> cf = IntervalList(cf)

Depth: how many times a base is covered by an interval
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`samtools` (exposed to Python through the :py:mod:`pysam` package) has `pileup`.
Again, we have here a more generic interface (a "protocol" in Python lingo)
that will take anything that has a :py:attr:`begin` and an :py:attr:`end`.

>>> from ngs_plumbing.intervals import Interval, IntervalList
>>> itl = IntervalList(Interval(x, y) for x, y in ((3,10),(1,7),(12,16)))
>>> itl.sort() # depthfilter assumes sorting
>>> df = IntervalList.depthfilter_iter(itl, 2)
>>> itl_f = IntervalList(df)
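
To make the notion of depth concrete, here is a minimal sketch in plain Python
of a sweep over interval boundaries. It is an illustration only, not the
implementation behind :py:meth:`depthfilter_iter`, whose exact output may differ.
The names ``Span`` and ``covered_regions`` are hypothetical and not part of
:py:mod:`ngs_plumbing`, and coordinates are assumed to be half-open
(``begin`` included, ``end`` excluded).

```python
from collections import namedtuple

# Hypothetical stand-in for any object following the interval protocol
# (a `begin` and an `end`); not a class from ngs_plumbing.
Span = namedtuple("Span", ["begin", "end"])


def covered_regions(intervals, min_depth):
    """Yield the regions covered by at least `min_depth` intervals.

    Assumes half-open coordinates: `begin` is included, `end` is not.
    """
    # Each interval contributes +1 where it opens and -1 where it closes.
    events = []
    for itv in intervals:
        events.append((itv.begin, 1))
        events.append((itv.end, -1))
    # Tuple ordering puts a close (-1) before an open (+1) at the same
    # position, so intervals that merely touch do not add up in depth.
    events.sort()

    depth = 0
    start = None  # begin of the region currently being reported, if any
    for pos, delta in events:
        depth += delta
        if start is None and depth >= min_depth:
            start = pos
        elif start is not None and depth < min_depth:
            yield Span(start, pos)
            start = None


spans = [Span(3, 10), Span(1, 7), Span(12, 16)]
print(list(covered_regions(spans, 2)))  # [Span(begin=3, end=7)]
```

With ``min_depth=1`` the same sweep returns the regions ``(1, 10)`` and
``(12, 16)``, which for this input is the union described in the previous section.
Because the sketch sorts the boundary events itself, it runs in O(n log n);
the sweep alone is the O(n) part.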