tacl stats

usage: tacl stats [-h] [-v] [-t {cbeta,pagel}] CORPUS RESULTS

Generate summary statistics for a set of results. For each witness, this gives
the total number of tokens, the number of tokens covered by matches, and,
derived from these, the percentage of the witness that the matches encompass.
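The calculation described above can be sketched in Python. This is a minimal sketch, not tacl's implementation. The function name `coverage`, the token pattern, and the rule that a token covered by several overlapping matches is counted once are all assumptions made for illustration.

```python
import re

# Assumed CBETA-style token pattern: a workaround cluster in square brackets,
# or a single word character. Hypothetical; tacl's own pattern may differ.
TOKEN_PATTERN = re.compile(r'\[[^\]]*\]|\w')


def coverage(text, ngrams):
    """Return (total tokens, matching tokens, percentage) for one witness.

    Hypothetical helper: a token counts as matching if it lies inside at
    least one occurrence of any given n-gram; overlaps are counted once.
    """
    tokens = TOKEN_PATTERN.findall(text)
    covered = [False] * len(tokens)
    for ngram in ngrams:
        gram = TOKEN_PATTERN.findall(ngram)
        size = len(gram)
        if size == 0:
            continue
        # Slide over the witness and mark every token inside a match.
        for start in range(len(tokens) - size + 1):
            if tokens[start:start + size] == gram:
                for i in range(start, start + size):
                    covered[i] = True
    total = len(tokens)
    matching = sum(covered)
    # Multiply before dividing to keep simple ratios exact.
    percentage = matching * 100 / total if total else 0.0
    return total, matching, percentage
```

For example, `coverage('甲乙丙丁戊', ['乙丙', '丙丁'])` returns `(5, 3, 60.0)`: the two matches overlap on `丙`, so only three distinct tokens are covered.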

positional arguments:
  CORPUS                Path to corpus.
  RESULTS               Path to CSV results.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -t {cbeta,pagel}, --tokenizer {cbeta,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)
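The difference between the two tokenizers can be illustrated with regular expressions. This is a sketch based only on the descriptions above. The `PATTERNS` dictionary, the `tokenize` function, and the exact punctuation set allowed in Pagel tokens are assumptions, not tacl's own definitions.

```python
import re

# Assumed token patterns modelled on the tokenizer descriptions;
# tacl's actual regular expressions may differ.
PATTERNS = {
    # cbeta: a workaround cluster in square brackets, or one word character.
    'cbeta': r'\[[^\]]*\]|\w',
    # pagel: a run of word characters plus some transliteration punctuation
    # (the punctuation set here is a hypothetical choice).
    'pagel': r"[\w'\-+?~]+",
}


def tokenize(text, tokenizer='cbeta'):
    """Split text into tokens with the named (hypothetical) pattern."""
    return re.findall(PATTERNS[tokenizer], text)
```

With these patterns, `tokenize('如是[金*本]我聞')` yields `['如', '是', '[金*本]', '我', '聞']`, treating the bracketed workaround as one token, while `tokenize("bka' 'gyur", 'pagel')` yields `["bka'", "'gyur"]`, keeping the apostrophes that form part of the transliteration.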