tacl align

usage: tacl align [-h] [-v] [-m MINIMUM] [-t {cbeta,pagel}]
                  CORPUS OUTPUT RESULTS

Generates an HTML report giving tables showing aligned sequences of text
between each witness within each label and all of the witnesses in the other
labels, within a set of results. This functionality is only appropriate for
intersect results.
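To show roughly what an "aligned sequence" between two witnesses looks like, here is a minimal sketch that aligns two token lists and marks gaps with "-". It uses the standard library's difflib.SequenceMatcher as a stand-in. tacl itself relies on Biopython (see the note at the end of this page), and the helper name align_tokens is hypothetical.

```python
from difflib import SequenceMatcher


def align_tokens(a, b):
    """Align two token lists into equal-length rows (hypothetical helper).

    Stand-in for a real sequence aligner: difflib finds matching runs, and
    any unmatched stretch is padded with '-' so the two rows line up.
    """
    top, bottom = [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = a[i1:i2], b[j1:j2]
        width = max(len(left), len(right))
        # Pad the shorter side with gap markers so the columns stay aligned.
        top.extend(left + ['-'] * (width - len(left)))
        bottom.extend(right + ['-'] * (width - len(right)))
    return top, bottom
```

For example, aligning the tokens of "ABCDEF" with those of "ABDEF" yields a second row of A, B, -, D, E, F, marking the token missing from the second witness.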

positional arguments:
  CORPUS                Path to corpus.
  OUTPUT                Directory to output alignment files to.
  RESULTS               Path to CSV results; use - for stdin.
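The RESULTS argument follows the common convention that "-" means standard input. The sketch below shows that convention together with a size filter corresponding to the -m/--minimum option. Both helper names are hypothetical. The column names "ngram", "size" and "label" are assumptions about the results CSV layout, and treating -m as a plain size filter is likewise an assumption.

```python
import csv
import sys


def filter_rows(lines, minimum=20):
    """Keep result rows whose n-gram size is at least `minimum`.

    `lines` may be an open file, sys.stdin, or any iterable of CSV lines.
    Column names ('ngram', 'size', 'label') are assumed, not taken from tacl.
    """
    return [row for row in csv.DictReader(lines) if int(row['size']) >= minimum]


def load_results(path, minimum=20):
    """Read results from `path`, or from stdin when `path` is '-'."""
    if path == '-':
        return filter_rows(sys.stdin, minimum)
    with open(path, newline='', encoding='utf-8') as fh:
        return filter_rows(fh, minimum)
```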

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -m MINIMUM, --minimum MINIMUM
                        Minimum size of n-gram to base sequences around.
                        (default: 20)
  -t {cbeta,pagel}, --tokenizer {cbeta,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)
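The two tokenizer descriptions can be approximated with regular expressions. The patterns below are assumptions written to match the descriptions above, not patterns copied from tacl. "cbeta" takes either a bracketed workaround cluster or a single word character. "pagel" takes runs of word characters plus a few transliteration marks, and the exact punctuation set shown is a guess.

```python
import re

# Assumed "cbeta" rule: a bracketed workaround cluster such as "[金*本]",
# otherwise a single word character (CJK ideographs count as \w).
CBETA_PATTERN = re.compile(r'\[[^\]]*\]|\w')

# Assumed "pagel" rule: runs of word characters plus some punctuation
# used in transliteration (apostrophe, hyphen, plus, question mark, tilde).
PAGEL_PATTERN = re.compile(r"[\w'\-+?~]+")

PATTERNS = {'cbeta': CBETA_PATTERN, 'pagel': PAGEL_PATTERN}


def tokenize(text, tokenizer='cbeta'):
    """Split text into tokens; raises KeyError for an unknown tokenizer name."""
    return PATTERNS[tokenizer].findall(text)
```

With these patterns, a bracketed cluster counts as one token, so it cannot be split when n-grams are built from the token stream.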

Due to encoding issues, you may need to set the environment variable
PYTHONIOENCODING to "utf-8".

This command requires the Biopython suite of software to be installed. It is
extremely slow and resource-hungry when the overlap between two witnesses is
very large.