usage: tacl-helper work-against-corpus [-h] [-v] [-m] [-r RAM]
                                       [-t {cbeta,pagel}]
                                       DATABASE CORPUS FILES_LIST
                                       CORPUS_FILES_LIST OUTPUT_DIR

Generate a script to compare each work in a corpus against all the works in
another corpus.

positional arguments:
  DATABASE              Path to database file.
  CORPUS                Path to corpus.
  FILES_LIST            File containing corpus work names to compare (one per
                        line).
  CORPUS_FILES_LIST     File containing corpus work names to be compared
                        against (one per line).
  OUTPUT_DIR            Output directory for script and catalogue files.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -m, --memory          Use RAM for temporary database storage. This may cause
                        an out of memory error, in which case run the command
                        without this switch. (default: False)
  -r RAM, --ram RAM     Number of gigabytes of RAM to use. (default: 3)
  -t {cbeta,pagel}, --tokenizer {cbeta,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)
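
The two tokenizer descriptions above can be illustrated with a minimal Python sketch. The regular expressions are assumptions written from the help text alone; they are not confirmed to be the patterns tacl uses internally, and the punctuation set in the "pagel" pattern in particular is a guess at "some punctuation used to transliterate characters". The function name tokenize is hypothetical.

```python
import re

# Assumed pattern for "cbeta": a workaround cluster inside square brackets
# counts as one token; otherwise each single word character is a token.
CBETA_PATTERN = re.compile(r"\[[^\]]*\]|\w")

# Assumed pattern for "pagel": runs of word characters plus a guessed set of
# transliteration punctuation (apostrophe, hyphen, plus, question mark, tilde).
PAGEL_PATTERN = re.compile(r"[\w'\-+?~]+")

PATTERNS = {"cbeta": CBETA_PATTERN, "pagel": PAGEL_PATTERN}


def tokenize(text, tokenizer="cbeta"):
    """Split text into tokens using the named (assumed) tokenizer pattern."""
    return PATTERNS[tokenizer].findall(text)


if __name__ == "__main__":
    # The bracketed cluster stays together; the full stop is not a token.
    print(tokenize("如是[口*我]聞。"))
    # Whitespace separates tokens; the leading apostrophe is kept.
    print(tokenize("bcom ldan 'das", "pagel"))
```

Under these assumed patterns, the first call yields four tokens with "[口*我]" kept whole, and the second yields "bcom", "ldan" and "'das".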