usage: tacl sdiff [-h] [-v] [-t {cbeta,pagel}] [-m] [-r RAM] -d DATABASE -l
                  LABELS [LABELS ...] -s RESULTS [RESULTS ...]

List n-grams unique to each set of results (as defined by the specified
results files).

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -t {cbeta,pagel}, --tokenizer {cbeta,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)
  -m, --memory          Use RAM for temporary database storage.
                        This may cause an out of memory error, in which case
                        run the command without this switch. (default: False)
  -r RAM, --ram RAM     Number of gigabytes of RAM to use. (default: 3)
  -d DATABASE, --db DATABASE
                        Path to database file. (default: None)
  -l LABELS [LABELS ...], --labels LABELS [LABELS ...]
                        Labels to be assigned in order to the supplied
                        results. (default: None)
  -s RESULTS [RESULTS ...], --supplied RESULTS [RESULTS ...]
                        Paths to results files to be used in the query.
                        (default: None)

The number of labels supplied must match the number of results files. The
first label is assigned to all results in the first results file, the second
label to all results in the second results file, etc. The labels specified in
the results files are replaced with the supplied labels in the output.
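
The label assignment rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not tacl's own code: the `assign_labels` function, the `Row` shape, and the "ngram" and "label" keys are hypothetical names chosen for the example.

```python
from typing import Dict, List, Sequence

# Hypothetical row shape: one dict per line of a results file.
Row = Dict[str, str]


def assign_labels(labels: Sequence[str],
                  results_files: Sequence[List[Row]]) -> List[Row]:
    """Give every row of the i-th results file the i-th label."""
    if len(labels) != len(results_files):
        raise ValueError(
            f"got {len(labels)} labels for {len(results_files)} results files"
        )
    relabelled: List[Row] = []
    for label, rows in zip(labels, results_files):
        for row in rows:
            # Copy the row and overwrite whatever label it carried before.
            relabelled.append({**row, "label": label})
    return relabelled


# Illustrative data only; real results files hold more fields per row.
results1 = [{"ngram": "ab", "label": "old-x"}, {"ngram": "bc", "label": "old-y"}]
results2 = [{"ngram": "bc", "label": "old-z"}]
combined = assign_labels(["A", "B"], [results1, results2])
```

Here both rows from the first file end up labelled A and the row from the second file is labelled B, regardless of the labels they carried originally. Passing one label for two files raises an error, mirroring the requirement that the counts match.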

examples:
  tacl sdiff -d cbeta2-10.db -l A B -s results1.csv results2.csv > output.csv
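
To make "n-grams unique to each set of results" concrete, here is a minimal Python sketch of that idea, assuming it means an n-gram is kept only when it occurs under exactly one label. It works on in-memory sets rather than on the database that tacl uses, and `unique_ngrams` is a hypothetical helper, not part of tacl.

```python
from collections import Counter
from typing import Dict, Set


def unique_ngrams(ngrams_by_label: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep, for each label, only the n-grams found under no other label."""
    # Count how many labelled sets contain each n-gram.
    occurrences = Counter(
        ngram for ngrams in ngrams_by_label.values() for ngram in ngrams
    )
    return {
        label: {ngram for ngram in ngrams if occurrences[ngram] == 1}
        for label, ngrams in ngrams_by_label.items()
    }


# Illustrative input: the n-grams found in each labelled results file.
diff = unique_ngrams({"A": {"ab", "bc", "cd"}, "B": {"bc", "de"}})
```

With this input, "bc" is dropped because it appears under both labels, leaving "ab" and "cd" for A and "de" for B.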