usage: tacl stats [-h] [-v] [-t {cbeta,pagel}] CORPUS RESULTS

Generate summary statistics for a set of results. This gives, for each
witness, the total number of tokens and the count of matching tokens, and
derived from these the percentage of the witness that is encompassed by the
matches.

positional arguments:
  CORPUS                Path to corpus.
  RESULTS               Path to CSV results.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -t {cbeta,pagel}, --tokenizer {cbeta,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)
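
The two tokenizer descriptions and the percentage statistic can be
illustrated with a minimal Python sketch. Everything named here is an
assumption made for illustration: the regular expressions CBETA_LIKE and
PAGEL_LIKE only approximate the token definitions given in the help text and
are not taken from tacl's source, and tokenize() and percentage() are
hypothetical helpers, not part of tacl's API.

```python
import re

# Assumed pattern: a bracketed workaround cluster, or one word character.
CBETA_LIKE = re.compile(r"\[[^\]]*\]|\w")

# Assumed pattern: runs of word characters plus some transliteration
# punctuation. The exact punctuation set is a guess.
PAGEL_LIKE = re.compile(r"[\w'\-+?~]+")


def tokenize(text, pattern):
    """Return the list of tokens that `pattern` finds in `text`."""
    return pattern.findall(text)


def percentage(matching, total):
    """Share of a witness covered by matches, as a percentage."""
    return 100.0 * matching / total if total else 0.0


cbeta_tokens = tokenize("諸[口*兮]法", CBETA_LIKE)
pagel_tokens = tokenize("bcom ldan 'das", PAGEL_LIKE)
print(cbeta_tokens)       # the bracketed cluster stays a single token
print(pagel_tokens)       # the apostrophe stays attached to its word
print(percentage(3, 12))  # 3 matching tokens out of 12 in the witness
```

Under these assumptions the bracketed cluster counts as one token toward the
witness total, which is why the choice of tokenizer affects both the total
and the resulting percentage.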