usage: tacl-helper validate-catalogue [-h] [-v] [-t {cbeta,pagel}]
                                      CORPUS CATALOGUE

Report any errors in the specified catalogue file. Errors that can be detected
are references to works that do not exist in the specified corpus and the same
work being listed more than once.

positional arguments:
  CORPUS                Path to corpus.
  CATALOGUE             Path to catalogue file.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -t {cbeta,pagel}, --tokenizer {cbeta,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)
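
The two tokenizer descriptions above can be illustrated with a short Python sketch. This is not taken from the tacl source: the regular expressions, the `tokenize` function, and in particular the punctuation set used for "pagel" are assumptions inferred from the wording of the help text, and the real patterns may differ.

```python
import re

# Hypothetical pattern for "cbeta": a workaround cluster in square
# brackets counts as one token; otherwise each word character is a token.
CBETA_TOKEN = re.compile(r"\[[^\]]*\]|\w")

# Hypothetical pattern for "pagel": runs of word characters plus some
# transliteration punctuation. The punctuation set here is a guess.
PAGEL_TOKEN = re.compile(r"[\w'\-+?~]+")

PATTERNS = {"cbeta": CBETA_TOKEN, "pagel": PAGEL_TOKEN}


def tokenize(text, tokenizer="cbeta"):
    """Return the list of tokens in text for the named tokenizer."""
    return PATTERNS[tokenizer].findall(text)


print(tokenize("如是[口*我]聞"))
print(tokenize("bcom ldan 'das", tokenizer="pagel"))
```

Under these assumed patterns, the first call yields `['如', '是', '[口*我]', '聞']` (the bracketed cluster stays together as one token) and the second yields `['bcom', 'ldan', "'das"]`.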