
Package reading


Provides functions based on the readings of Chinese characters. This includes ReadingOperators, which handle basic operations such as decomposing strings written in a reading into their basic entities (e.g. syllables) and, for some languages, retrieving tonal information, syllable onset and rhyme, and other features. Furthermore, it includes ReadingConverters, which convert strings from one reading to another.

All basic functionality can be accessed through the ReadingFactory, which provides factory methods for creating instances of the supplied classes and also acts as a façade for the methods these classes define.
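The factory-plus-façade arrangement can be sketched in plain Python. This is not cjklib's implementation: ToyReadingFactory, SpaceSeparatedOperator and the register method are hypothetical names, and the toy operator simply splits on whitespace. Operator instances are cached per reading name and option set, so option values are assumed to be hashable.

```python
class SpaceSeparatedOperator:
    """Hypothetical toy operator: entities are separated by whitespace."""

    READING_NAME = "SpaceSeparated"

    def __init__(self, **options):
        self.options = options

    def decompose(self, string):
        # Break the string into its entities.
        return string.split()

    def compose(self, entities):
        # Reverse of decompose: join the entities back into one string.
        return " ".join(entities)


class ToyReadingFactory:
    """Hypothetical factory that doubles as a facade to its operators."""

    def __init__(self):
        self._operatorClasses = {}
        self._cache = {}

    def register(self, operatorClass):
        self._operatorClasses[operatorClass.READING_NAME] = operatorClass

    def createReadingOperator(self, readingName, **options):
        # Factory method: build a fresh operator for the named reading.
        if readingName not in self._operatorClasses:
            raise ValueError("unsupported reading: %r" % readingName)
        return self._operatorClasses[readingName](**options)

    def _operator(self, readingName, options):
        # Reuse one instance per (reading, options) pair.
        key = (readingName, tuple(sorted(options.items())))
        if key not in self._cache:
            self._cache[key] = self.createReadingOperator(readingName, **options)
        return self._cache[key]

    # Facade methods: callers never handle operator instances directly.
    def decompose(self, string, readingName, **options):
        return self._operator(readingName, options).decompose(string)

    def compose(self, entities, readingName, **options):
        return self._operator(readingName, options).compose(entities)


factory = ToyReadingFactory()
factory.register(SpaceSeparatedOperator)
print(factory.decompose("ni hao", "SpaceSeparated"))  # ['ni', 'hao']
```

Routing every façade call through one cached operator per configuration keeps repeated calls cheap while still letting callers ask for the same reading with different options.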

Examples

The following examples should give a quick overview of how to use this package.

Readings

Han characters give only few visual hints about their pronunciation. The large number of homophones further complicates deriving a character's actual pronunciation from its glyph. This module implements a framework and functionality for dealing with the characteristics of character readings.

From a programmatic point of view, readings in languages that use Chinese characters differ in many ways. Some use the Roman alphabet, some carry tonal information, some can be mapped character by character, some map one Chinese character to a sequence of characters in the target system, while others map it to only one character.

One major group of readings are romanisations, which are transcriptions into the Roman alphabet (or, respectively, the Cyrillic alphabet). Romanisations of tonal languages are a subgroup that requires even more detailed functionality. The interface implemented here tries to capture common factors at different levels of abstraction while maintaining flexibility.

In the context of this library the term reading refers to two things: on the one hand the system used to express pronunciation (e.g. a specific romanisation), and on the other hand the specific reading of a given character.

Technical Implementation

While module characterlookup includes the functions for mapping a character to its potential readings, module reading specialises in all functionality that is primarily concerned with the readings of characters.

The main functions implemented here provide ways of handling text written in a reading and converting between different readings.

Handling text written in a reading

Text written in a character reading differs from other text in that it consists of entities that map to corresponding Chinese characters. These entities can be deduced from the text by breaking the whole string down into a sequence of single entities. All operators on readings provide this functionality by implementing the interface ReadingOperator. The process of breaking input down (called decomposition) can be reversed by composing the single entities into a string.
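As an illustration of decomposition and composition, the following sketch handles toneless Pinyin using a tiny, hand-picked syllable set (an assumption; a real operator would need the full syllable inventory). It segments by a backtracking search that prefers longer syllables, treats the apostrophe as a syllable boundary and drops it, and re-inserts it on composition before non-initial syllables that begin with a, o or e. The function names are hypothetical and do not reproduce cjklib's API.

```python
# Toy subset of toneless Pinyin syllables (illustrative only).
SYLLABLES = {"tian", "an", "men", "xi", "xian", "bei", "jing", "zhong", "guo"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def _segment(chunk):
    """Return a list of syllables spelling chunk, or None if impossible."""
    if not chunk:
        return []
    # Try the longest candidate first and backtrack if the rest fails.
    for length in range(min(MAX_LEN, len(chunk)), 0, -1):
        head = chunk[:length]
        if head in SYLLABLES:
            rest = _segment(chunk[length:])
            if rest is not None:
                return [head] + rest
    return None


def decompose(text):
    """Break a Pinyin string into syllables; apostrophes mark boundaries."""
    entities = []
    for chunk in text.lower().split("'"):
        parts = _segment(chunk)
        if parts is None:
            raise ValueError("cannot decompose %r" % chunk)
        entities.extend(parts)
    return entities


def compose(entities):
    """Join syllables, adding an apostrophe before non-initial a/o/e syllables."""
    out = []
    for i, syllable in enumerate(entities):
        if i > 0 and syllable[0] in "aoe":
            out.append("'")
        out.append(syllable)
    return "".join(out)


print(decompose("tiananmen"))  # ['tian', 'an', 'men']
print(decompose("xi'an"))      # ['xi', 'an']
print(compose(["xi", "an"]))   # xi'an
```

Note that decompose("xian") yields ['xian'] rather than ['xi', 'an']; this ambiguity is exactly why compose has to insert the apostrophe so that the round trip is lossless.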

Many ReadingOperators provide additional functions, each depending on the characteristics of the implemented reading. For readings of tonal languages, for example, they may allow querying the tone of a given reading of a character.
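For Pinyin written with tone marks, the tone can be read off the combining diacritic once the syllable is put into Unicode canonical decomposition. The sketch below uses only the standard unicodedata module and is one possible approach, not cjklib's method; the function name splitTone is hypothetical, and representing the neutral tone as 5 is an assumption of this sketch.

```python
import unicodedata

# Combining diacritics used as Pinyin tone marks, mapped to tone numbers.
TONE_MARKS = {
    "\u0304": 1,  # macron, e.g. ā
    "\u0301": 2,  # acute accent, e.g. á
    "\u030c": 3,  # caron, e.g. ǎ
    "\u0300": 4,  # grave accent, e.g. à
}
NEUTRAL_TONE = 5  # assumption: unmarked syllables get tone 5


def splitTone(syllable):
    """Return (toneless syllable, tone number) for a Pinyin syllable."""
    tone = NEUTRAL_TONE
    plain = []
    # NFD separates base letters from combining marks: "ǎ" -> "a" + caron.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            plain.append(ch)  # keeps the diaeresis belonging to ü
    # NFC recombines u + diaeresis into ü.
    return unicodedata.normalize("NFC", "".join(plain)), tone


print(splitTone("hǎo"))  # ('hao', 3)
print(splitTone("lǜ"))   # ('lü', 4)
print(splitTone("ma"))   # ('ma', 5)
```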

Class Hierarchy for operator.ReadingOperator

Converting between readings

The second part of this package provides support for converting between different readings.

What all CJK languages seem to have in common is that the mapping from a character to its reading is irreversible, as these languages are rich in homophones. Thus, aside from its meaning, a text carries the most information when it is given as pairs of characters and their readings.

Given a text written in reading A, it is not feasible to obtain the same text in reading B by deriving the new reading from the corresponding characters, even if they are present, as many characters have several pronunciations. Instead, the text should be converted directly from reading A to reading B.
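The information loss described above can be made concrete with a few well-known Mandarin characters. The dictionary below is a tiny hand-picked sample, not cjklib data, and the helper names are hypothetical: one reading maps back to several characters, and one character can have several readings, so neither characters nor readings alone determine the other.

```python
# Tiny sample of character -> readings (Mandarin, Pinyin with tone marks).
CHARACTER_READINGS = {
    "行": ["xíng", "háng"],  # one character, two readings
    "是": ["shì"],
    "事": ["shì"],
    "市": ["shì"],
}


def readingsFor(character):
    """All readings listed for a character in the sample."""
    return CHARACTER_READINGS.get(character, [])


def charactersFor(reading):
    """All characters in the sample that share the given reading."""
    return {c for c, readings in CHARACTER_READINGS.items() if reading in readings}


print(readingsFor("行"))             # ['xíng', 'háng']
print(sorted(charactersFor("shì")))  # ['事', '市', '是']
```

A detour from reading A through the characters to reading B would have to guess between candidates such as the two readings of 行, which is why a direct conversion from A to B is preferable.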

Classes implementing ReadingConverter provide simple means to convert between readings. Such a conversion may be neither surjective nor injective, and several exceptions can occur.
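A table-driven, entity-by-entity conversion shows both properties and the need for exceptions. The table is a toy Pinyin to Wade-Giles subset with the aspiration apostrophe dropped, chosen only for illustration; ConversionError and the helper functions are hypothetical names for this sketch, not cjklib's classes.

```python
class ConversionError(Exception):
    """Hypothetical error raised when an entity has no counterpart."""


# Toy Pinyin -> Wade-Giles table with the aspiration apostrophe dropped.
# Illustrative subset only; not cjklib data.
PINYIN_TO_WG = {
    "zhong": "chung",
    "chong": "chung",  # Wade-Giles proper: ch'ung
    "guo": "kuo",
    "bei": "pei",
    "jing": "ching",
}


def convertEntities(entities, table):
    """Convert entity by entity, failing loudly on unknown entities."""
    result = []
    for entity in entities:
        if entity not in table:
            raise ConversionError("no mapping for %r" % entity)
        result.append(table[entity])
    return result


def isInjective(table):
    """True if no two source entities share a target entity."""
    return len(set(table.values())) == len(table)


print(convertEntities(["zhong", "guo"], PINYIN_TO_WG))  # ['chung', 'kuo']
print(isInjective(PINYIN_TO_WG))  # False: zhong and chong both give chung
```

The same table is not surjective either: a proper Wade-Giles spelling such as ch'ung is never produced, so converting back cannot recover every original entity.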

Class Hierarchy for converter.ReadingConverter

Configurable Reading Dialects

Many readings come in different representations, even when standardised. The differences range from simple variations in typesetting (e.g. punctuation) to special entities and derivatives.

Instead of selecting one form as a global standard, cjklib lets the user choose the preferred dialect while still trying to offer good default values. It does so by offering a wide range of options for handling and converting readings. These options can optionally be given in many places and are handed down by the system to the component that knows about the specific option. Furthermore, each class implements a method that states which options it uses by default.
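A minimal sketch of per-class default options, and of handing a combined option set down so that each component receives only the options it declares. The classes, the method name getDefaultOptions and the option names toneMarkType, umlautSpelling and keepPunctuation are all hypothetical and only illustrate the idea.

```python
class Configurable:
    """Hypothetical base: merges user options over class defaults."""

    @classmethod
    def getDefaultOptions(cls):
        # States which options this class uses and their default values.
        return {}

    def __init__(self, **options):
        defaults = self.getDefaultOptions()
        unknown = set(options) - set(defaults)
        if unknown:
            raise ValueError("unknown options: %s" % sorted(unknown))
        self.options = dict(defaults)
        self.options.update(options)  # user choices override defaults


class ToyPinyinOperator(Configurable):
    @classmethod
    def getDefaultOptions(cls):
        return {"toneMarkType": "diacritics", "umlautSpelling": "ü"}


class ToyConverter(Configurable):
    @classmethod
    def getDefaultOptions(cls):
        return {"keepPunctuation": True}


def optionsFor(componentClass, options):
    """Pick out only the options the given component declares."""
    known = componentClass.getDefaultOptions()
    return {key: value for key, value in options.items() if key in known}


userOptions = {"toneMarkType": "numbers", "keepPunctuation": False}
operator = ToyPinyinOperator(**optionsFor(ToyPinyinOperator, userOptions))
converter = ToyConverter(**optionsFor(ToyConverter, userOptions))
print(operator.options)   # {'toneMarkType': 'numbers', 'umlautSpelling': 'ü'}
print(converter.options)  # {'keepPunctuation': False}
```

Rejecting undeclared options at construction time catches typos early, while the filtering step lets one option set be passed through the whole system.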

ReadingConverters that convert between two different representations of the same reading are treated specially as dialect converters. These allow flexible switching between reading dialects.
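A dialect converter between two Pinyin representations, tone numbers and tone marks, might look like the following sketch. It applies the common placement rule (mark a or e if present, the o of ou, otherwise the last vowel) and composes the result with Unicode NFC. The function name is hypothetical, v is accepted as a stand-in for ü, and syllables without a vowel (such as the interjection ng) are rejected for simplicity.

```python
import re
import unicodedata

# Combining tone marks for tones 1-4; tone 5 (neutral) is left unmarked.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"


def numbersToDiacritics(syllable):
    """Convert e.g. 'hao3' to 'hǎo' (one syllable, tone number last)."""
    normalized = syllable.lower().replace("v", "ü")
    match = re.fullmatch(r"([a-zü]+)([1-5])", normalized)
    if match is None:
        raise ValueError("not a numbered Pinyin syllable: %r" % syllable)
    plain, tone = match.group(1), int(match.group(2))
    if tone == 5:
        return plain
    if "a" in plain:
        index = plain.index("a")
    elif "e" in plain:
        index = plain.index("e")
    elif "ou" in plain:
        index = plain.index("o")
    else:
        positions = [i for i, ch in enumerate(plain) if ch in VOWELS]
        if not positions:
            raise ValueError("no vowel to carry the tone mark: %r" % syllable)
        index = positions[-1]
    # Insert the combining mark after the chosen vowel, then compose.
    marked = plain[:index + 1] + TONE_MARKS[tone] + plain[index + 1:]
    return unicodedata.normalize("NFC", marked)


print([numbersToDiacritics(s) for s in ["hao3", "xue2", "liu2", "lv4"]])
# ['hǎo', 'xué', 'liú', 'lǜ']
```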


To Do (Fix): Be independent of the chosen locale; see http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats.

Submodules

Classes
  ReadingFactory
Provides an abstract factory for creating ReadingOperators and ReadingConverters and a façade to directly access the methods offered by these classes.