dsegmenter.bparseg.align

Module providing methods for string alignment.

All of these methods take two iterables (can be either lists or strings) as arguments. The first iterable (L1) represents string or list to which alignment should be done, another iterable (L2) represents string or list which should be aligned. Optionally, penalties for different types of modifications (insertion, deletion, substitution) can be specified either as positional or as keyword arguments. These arguments should be functions which accept one (for insertion or deletion) or two (for substitution) characters (or any list elements) as arguments and return corresponding penalty scores for modifications. The output is another list which has length equal to the length of L1, where each element is a list of indices of L2 corresponding to given L1 elements.

Example

nw_align(“AGTACGCA”, “TCGC”)
=> [[], [], [0], [], [1], [2], [3], []]

this corresponds to the alignment

AGTACGCA
T CGC

Please also note that different algorithms may give different alignments for tie cases.

dsegmenter.bparseg.align.hb_align(s1, s2, insert=<function <lambda>>, delete=<function <lambda>>, substitute=<function <lambda>>, offset=0, keep_deleted=False)[source]

Align two iterables using Hirschberg alignment algorithm.

Complexity: \(O(nm)\) time; \(O(min(n, m))\) space

Parameters:
  • s1 (iterable) – iterable for alignment
  • s2 (iterable) – iterable which should be aligned
  • insert (lambda) – function returning penalty for insertion (default -2)
  • delete (lambda) – function returning penalty for deletion (default -2)
  • substitute (lambda) – function returning penalty for substitution (default -1)
  • offset (int) – add offset to each aligned index of second iterable (this option is only needed for recursive alignmnet of substrings or sublists)
  • keep_deleted (bool) – return indices of deleted words too
Returns:

indices of s2 corresponding to given positions of s1

Return type:

list

dsegmenter.bparseg.align.nw_align(s1, s2, insert=<function <lambda>>, delete=<function <lambda>>, substitute=<function <lambda>>, offset=0, keep_deleted=False)[source]

Align two iterables using Needleman-Wunsch algorithm.

Complexity: \(O(nm)\) time; \(O(nm)\) space

Parameters:
  • s1 (iterable) – iterable for alignment
  • s2 (iterable) – iterable which should be aligned
  • insert (lambda) – penalty for insertion (default -2)
  • delete (lambda) – penalty for deletion (default -2)
  • substitute (lambda) – penalty for substitution (default -1)
  • offset (int) – add offset to each aligned index of second iterable (this option is only needed for recursive alignmnet of substrings or sublists)
  • keep_deleted (bool) – return indices of deleted words too
Returns:

indices of s2 corresponding to given positions in s1

Return type:

list