tacl sdiff

usage: tacl sdiff [-h] [-v] [-t {cbeta,pagel}] [-m] [-r RAM] -d DATABASE -l
                  LABELS [LABELS ...] -s RESULTS [RESULTS ...]

List n-grams unique to each set of results (as defined by the specified
results files).

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -t {cbeta,pagel}, --tokenizer {cbeta,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)
  -m, --memory          Use RAM for temporary database storage.
                        
                        This may cause an out of memory error, in which case
                        run the command without this switch. (default: False)
  -r RAM, --ram RAM     Number of gigabytes of RAM to use. (default: 3)
  -d DATABASE, --db DATABASE
                        Path to database file. (default: None)
  -l LABELS [LABELS ...], --labels LABELS [LABELS ...]
                        Labels to be assigned in order to the supplied
                        results. (default: None)
  -s RESULTS [RESULTS ...], --supplied RESULTS [RESULTS ...]
                        Paths to results files to be used in the query.
                        (default: None)

The number of labels supplied must match the number of results files. The
first label is assigned to all results in the first results file, the second
label to all results in the second results file, etc. The labels specified in
the results files are replaced with the supplied labels in the output.
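
The labelling rule above, together with the "unique to each set" selection, can be illustrated with a small Python sketch. This is not tacl's implementation (the real command works through the database given with -d); it only mirrors the documented behaviour. The helper names `relabel` and `unique_to_each_label`, the sample data, and the assumption that results files are CSV with `ngram` and `label` columns are all illustrative.

```python
import csv
import io


def relabel(labels, results_texts):
    """Assign one label per results file, replacing any existing labels."""
    if len(labels) != len(results_texts):
        raise ValueError("number of labels must match number of results files")
    rows = []
    for label, text in zip(labels, results_texts):
        for row in csv.DictReader(io.StringIO(text)):
            row["label"] = label  # the label stored in the file is discarded
            rows.append(row)
    return rows


def unique_to_each_label(rows):
    """Keep only rows whose n-gram occurs under exactly one label."""
    labels_by_ngram = {}
    for row in rows:
        labels_by_ngram.setdefault(row["ngram"], set()).add(row["label"])
    return [row for row in rows if len(labels_by_ngram[row["ngram"]]) == 1]


# Hypothetical contents of two results files (column layout is an assumption).
results1 = "ngram,count,label\n天地,3,old\n日月,1,old\n"
results2 = "ngram,count,label\n天地,2,other\n山水,5,other\n"

rows = relabel(["A", "B"], [results1, results2])
diff = unique_to_each_label(rows)
print([(row["ngram"], row["label"]) for row in diff])
```

In this sketch the n-gram shared by both files is dropped, and the remaining rows carry the supplied labels A and B rather than the labels stored in the files.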

examples:

    tacl sdiff -d cbeta2-10.db -l A B -s results1.csv results2.csv > output.csv