Guide to TACL¶
Works and witnesses¶
TACL operates on named works, each of which consists of one or more plain text files. These files are stored in subdirectories (named after the work) of the corpus directory. The work name is what is used in catalogue files, and referenced in the “work” field in query results.
Every work consists of one or more witnesses, each a file in the work’s subdirectory. The filename of each witness (minus the .txt extension) is referenced in query results in the “siglum” field.
Each witness consists of the full textual content of that witness. In the case of the CBETA corpus, this full text is derived from the marked up variant readings in the source TEI XML.
All witnesses are automatically included in a query when a work is labelled in a catalogue.
Results¶
TACL outputs query results in comma-separated values (CSV) format. Each record (line) represents the occurrence of an n-gram in a witness. The fields in the results are:
ngram
The n-gram that is present in the witness
size
The size (or degree) of the n-gram
work
The name of the work in which the n-gram was found
siglum
The identifier of the particular witness of the work that bears
the n-gram
count
The number of times the n-gram occurs in the witness
label
The label that was assigned to the work in the catalogue file
used in making the query