usage: tacl search [-h] [-v] [-m] [-r RAM] [-t {cbeta,pagel}] [-c CATALOGUE]
                   DATABASE CORPUS NGRAMS

List the witnesses that contain at least one of the supplied n-grams. For
each witness, the results give the total number of occurrences of those
n-grams and the number of supplied n-grams that match.

Specifying a catalogue file does not restrict the search to the labelled
works; it only adds the labels to the corresponding witnesses in the results.

positional arguments:
  DATABASE              Path to database file.
  CORPUS                Path to corpus.
  NGRAMS                Path to file containing list of n-grams to search for,
                        with one n-gram per line.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -m, --memory          Use RAM for temporary database storage.
                        This may cause an out of memory error, in which case
                        run the command without this switch. (default: False)
  -r RAM, --ram RAM     Number of gigabytes of RAM to use. (default: 3)
  -t {cbeta,pagel}, --tokenizer {cbeta,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)
  -c CATALOGUE, --catalogue CATALOGUE
                        Path to catalogue file. (default: None)
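
example:
  The options above can be combined as in the following minimal Python
  sketch. It writes an NGRAMS file with one n-gram per line and assembles the
  matching "tacl search" argument list. The helper names (write_ngrams_file,
  build_search_command), the file paths, the sample n-grams, and the UTF-8
  encoding are all assumptions made for illustration; they are not part of
  tacl itself. Only the flags listed in this help text are used.

```python
import tempfile
from pathlib import Path


def write_ngrams_file(ngrams, path):
    """Write one n-gram per line, the layout the NGRAMS argument describes.

    UTF-8 is an assumption here; the help text does not state an encoding.
    """
    path = Path(path)
    path.write_text("".join(f"{ngram}\n" for ngram in ngrams), encoding="utf-8")
    return path


def build_search_command(database, corpus, ngrams_path, *, tokenizer="cbeta",
                         ram=3, memory=False, catalogue=None, verbosity=0):
    """Assemble an argv list for `tacl search` from the documented options."""
    if tokenizer not in ("cbeta", "pagel"):
        raise ValueError(f"unknown tokenizer: {tokenizer!r}")
    argv = ["tacl", "search"]
    argv += ["-v"] * verbosity          # each -v increases the verbosity
    if memory:
        argv.append("-m")               # RAM-backed temporary storage
    argv += ["-r", str(ram), "-t", tokenizer]
    if catalogue is not None:
        argv += ["-c", str(catalogue)]  # labels witnesses, does not filter
    argv += [str(database), str(corpus), str(ngrams_path)]
    return argv


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical n-grams and paths, for illustration only.
        ngrams_file = write_ngrams_file(["如是我聞", "一時佛在"],
                                        Path(tmp) / "ngrams.txt")
        command = build_search_command("tacl.db", "corpus/", ngrams_file,
                                       catalogue="catalogue.txt")
        print(" ".join(command))
        # To run it for real (assumes tacl is installed and the paths exist):
        # import subprocess; subprocess.run(command, check=True)
```

  Building the command as a list rather than a single string avoids shell
  quoting problems when paths contain spaces.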