tacl-helper work-against-corpus

usage: tacl-helper work-against-corpus [-h] [-v] [-m] [-r RAM]
                                       [-t {cbeta,pagel}]
                                       DATABASE CORPUS FILES_LIST
                                       CORPUS_FILES_LIST OUTPUT_DIR

Generate a script to compare each work in a corpus against all the works in
another corpus.

positional arguments:
  DATABASE              Path to database file.
  CORPUS                Path to corpus.
  FILES_LIST            File containing corpus work names to compare (one per
                        line).
  CORPUS_FILES_LIST     File containing corpus work names to be compared
                        against (one per line).
  OUTPUT_DIR            Output directory for script and catalogue files.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -m, --memory          Use RAM for temporary database storage. This may cause
                        an out of memory error, in which case run the command
                        without this switch. (default: False)
  -r RAM, --ram RAM     Number of gigabytes of RAM to use. (default: 3)
  -t {cbeta,pagel}, --tokenizer {cbeta,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)
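
As an illustration of the arguments above, here is a minimal sketch of preparing the two list files and invoking the command. The work names (`T0001` and so on), the file names, and the `corpus.db`, `corpus/`, and `output/` paths are all hypothetical placeholders, not values from the TACL documentation; substitute your own. The `-t cbeta` and `-r 3` options restate the documented defaults explicitly.

```shell
#!/bin/sh
# Sketch only: all paths and work names below are hypothetical placeholders.

# FILES_LIST: names of the works to compare, one per line.
printf '%s\n' T0001 T0002 > works_to_compare.txt

# CORPUS_FILES_LIST: names of the works to compare against, one per line.
printf '%s\n' T0003 T0004 T0005 > works_to_compare_against.txt

# Generate the script and catalogue files, but only if tacl-helper is installed.
# Argument order: DATABASE CORPUS FILES_LIST CORPUS_FILES_LIST OUTPUT_DIR
if command -v tacl-helper >/dev/null 2>&1; then
    tacl-helper work-against-corpus -t cbeta -r 3 \
        corpus.db corpus/ \
        works_to_compare.txt works_to_compare_against.txt \
        output/
else
    echo "tacl-helper not found; skipping invocation" >&2
fi
```

For a transliterated Tibetan corpus, `-t pagel` would replace `-t cbeta`. The `-m` switch can be added to use RAM for temporary database storage; if that causes an out of memory error, run the command again without it.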