Script cjknife :: Class CharacterInfo

Class CharacterInfo

Provides lookup method services.

Instance Methods
 
__init__(self, locale=None, readingN=None, dictionary=None)
Initialises the CharacterInfo object.
str
guessCharacterLocale(self)
Guesses the best suited character locale using the user's locale settings.
str
guessReading(self)
Guesses the best suited reading using the user's locale settings.
list of str
getAvailableDictionaries(self)
Gets a list of the supported dictionaries that are available.
 
hasDictionary(self)
dict
getReadingOptions(self, string, readingN)
Guesses the reading options using the given string to support reading dialects.
list of list of str
getReadingEntities(self, string, readingN=None)
Gets all possible decompositions for the given string.
list of str
getSearchReading(self, entities)
Prepares the given reading entities for a database search.
list of tuple
convertDictionaryResult(self, result)
Converts the readings of the given dictionary result to the default reading.
list of list of str
getEquivalentCharTable(self, componentList, includeEquivalentRadicalForms=True)
Gets a list structure of equivalent chars for the given list of characters.
bool
isSemanticVariant(self, char, variants)
Checks if the character is a semantic variant form of the given characters.
str
convertReading(self, readingString, fromReading, toReading=None)
Converts a string in the source reading to the given target reading.
list of str
getCharactersForKangxiRadicalIndex(self, radicalIndex)
Gets all characters for the given Kangxi radical index grouped by their residual stroke count.
list of str
getCharactersForReading(self, readingString, readingN=None)
Gets all known characters for the given reading.
list of list of str
getReadingForCharacters(self, charString)
Gets a list of readings for a given character string.
list of list of str
getSimplified(self, charString)
Gets the Chinese simplified character representation for the given character string.
list of list of str
getTraditional(self, charString)
Gets the traditional character representation for the given character string.
 
searchDictionaryExact(self, searchString, readingN=None, limit=None)
Searches the dictionary for exact matches to the given string.
 
searchDictionaryContaining(self, searchString, readingN=None, position='c', limit=None)
Searches the dictionary for matches containing the given string.
list of tuple
getCharactersForComponents(self, componentList, includeEquivalentRadicalForms=True)
Gets all characters that contain the given components.
dict
getCharacterInformation(self, char)
Gets the basic information for the given character.
Class Variables
  LANGUAGE_CHAR_LOCALE_MAPPING = {'ja': 'J', 'ko': 'K', 'vi': 'V...
Mapping table for locale to default character locale.
  CHAR_LOCALE_NAME = {'C': 'Chinese simplified', 'J': 'Japanese'...
Character locale names.
  CHAR_LOCALE_DEFAULT_READING = {'ja': 'Kana', 'ko': 'Hangul', '...
Character locale's default character reading.
  DICTIONARY_INFO = {'CEDICT': {'defaultLocale': 'C', 'options':...
Dictionaries with type (EDICT, CEDICT), reading and reading options.
  READING_DEFAULT_DICTIONARY = {'Pinyin': 'CEDICT'}
Dictionary to use by default for a given reading.
  VARIANT_TYPE_NAMES = {'C': 'Compatible variant', 'M': 'Semanti...
List of character variants and their names.
Method Details

__init__(self, locale=None, readingN=None, dictionary=None)
(Constructor)

 

Initialises the CharacterInfo object.

Parameters:
  • locale (str) - character locale (one out of TCJKV)

guessCharacterLocale(self)

 

Guesses the best suited character locale using the user's locale settings.

Returns: str
locale
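
As a rough illustration of how a lookup against LANGUAGE_CHAR_LOCALE_MAPPING might work, the sketch below maps a POSIX-style locale name to a character locale. It is not the cjknife implementation: the helper name guess_char_locale, the fallback order (full name first, then the bare language code) and the default of 'T' are all assumptions, and the table holds only the entries visible in this documentation, whose published value is truncated.

```python
# Only the entries visible in the (truncated) documentation are listed here.
LANGUAGE_CHAR_LOCALE_MAPPING = {
    'ja': 'J', 'ko': 'K', 'vi': 'V', 'zh': 'C',
    'zh_CN': 'C', 'zh_HK': 'T', 'zh_MO': 'T', 'zh_SG': 'C',
}


def guess_char_locale(locale_name, default='T'):
    """Map a locale name such as 'zh_HK.UTF-8' to a character locale.

    Hypothetical helper; the default value is an assumption.
    """
    if not locale_name:
        return default
    # Drop an encoding suffix ('.UTF-8') and a modifier ('@...'), if any.
    name = locale_name.split('.')[0].split('@')[0]
    # Try the full name first ('zh_HK'), then the bare language ('zh').
    for key in (name, name.split('_')[0]):
        if key in LANGUAGE_CHAR_LOCALE_MAPPING:
            return LANGUAGE_CHAR_LOCALE_MAPPING[key]
    return default
```

Trying the full name before the language code lets region-specific entries such as 'zh_HK' take precedence over the generic 'zh' entry.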

guessReading(self)

 

Guesses the best suited reading using the user's locale settings.

Returns: str
reading name

getAvailableDictionaries(self)

 

Gets a list of the supported dictionaries that are available.

Returns: list of str
names of available dictionaries

getReadingOptions(self, string, readingN)

 

Guesses the reading options using the given string to support reading dialects.

Parameters:
  • string (str) - reading string
  • readingN (str) - reading name
Returns: dict
reading options

getReadingEntities(self, string, readingN=None)

 

Gets all possible decompositions for the given string.

Parameters:
  • string (str) - reading string
  • readingN (str) - reading name
Returns: list of list of str
decomposition into reading entities.
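
The return type (list of list of str) suggests one inner list per possible decomposition. The sketch below shows what an exhaustive decomposition could look like for an unspaced romanised string. It is only an illustration under stated assumptions: the function decompose and the tiny SYLLABLES inventory are hypothetical, and the library's own reading support is not used.

```python
# Tiny illustrative syllable inventory; a real reading has several hundred.
SYLLABLES = {'xi', 'an', 'xian', 'fang', 'fan', 'gan'}


def decompose(string, syllables=SYLLABLES):
    """Return every way to split `string` into known syllables."""
    if not string:
        return [[]]  # exactly one decomposition: the empty one
    results = []
    for end in range(1, len(string) + 1):
        head = string[:end]
        if head in syllables:
            # Combine this syllable with every decomposition of the rest.
            for tail in decompose(string[end:], syllables):
                results.append([head] + tail)
    return results
```

With this inventory, decompose('xian') yields both the two-syllable split ['xi', 'an'] and the single syllable ['xian'], which is the kind of ambiguity a caller has to resolve.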

getSearchReading(self, entities)

 

Prepares the given reading entities for a database search. This is needed when doing fuzzy searches.

Parameters:
  • entities (list of str) - reading entities
Returns: list of str
prepared entities

convertDictionaryResult(self, result)

 

Converts the readings of the given dictionary result to the default reading.

Parameters:
  • result (list of tuple) - database search result
Returns: list of tuple
converted input

getEquivalentCharTable(self, componentList, includeEquivalentRadicalForms=True)

 

Gets a list structure of equivalent chars for the given list of characters.

If option includeEquivalentRadicalForms is set, all equivalent forms will be searched for when a Kangxi radical is given.

Parameters:
  • componentList (list of str) - list of character components
  • includeEquivalentRadicalForms (bool) - if True then characters in the given component list are interpreted as representatives for their radical and all radical forms are included in the search. E.g. 肉 will include ⺼ as a possible component.
Returns: list of list of str
list structure of equivalent characters

To Do (Impl): Once a mapping of similar radical forms exists (e.g. 言 and 訁), include it here.
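
To make the "list structure of equivalent characters" more concrete, here is a minimal sketch of the shape such a table could take: one inner list per input component, holding the component followed by its equivalent forms. The function name and the EQUIVALENT_FORMS table are hypothetical; only the 肉/⺼ pair is taken from the description above.

```python
# Hypothetical equivalence data; only the 肉/⺼ pair comes from the text above.
EQUIVALENT_FORMS = {'肉': ['⺼']}


def equivalent_char_table(component_list, include_equivalent_radical_forms=True):
    """Return one list of interchangeable forms per given component."""
    table = []
    for component in component_list:
        forms = [component]
        if include_equivalent_radical_forms:
            # Add any known equivalent radical forms after the component.
            forms.extend(EQUIVALENT_FORMS.get(component, []))
        table.append(forms)
    return table
```

For the input ['肉', '口'] this sketch produces [['肉', '⺼'], ['口']]; with the flag set to False, each inner list contains only the original component.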

isSemanticVariant(self, char, variants)

 

Checks if the character is a semantic variant form of the given characters.

Parameters:
  • char (str) - Chinese character
  • variants (list of str) - Chinese characters
Returns: bool
True if the character is a semantic variant form of the given characters, False otherwise.

convertReading(self, readingString, fromReading, toReading=None)

 

Converts a string in the source reading to the given target reading.

Parameters:
  • readingString (str) - string written in the source reading
  • fromReading (str) - name of the source reading
  • toReading (str) - name of the target reading
Returns: str
the input string converted to the toReading
Raises:
  • DecompositionError - if the string cannot be decomposed into basic entities with regard to the source reading or the given information is insufficient.
  • ConversionError - on operations specific to the conversion between the two readings (e.g. error on converting entities).
  • UnsupportedError - if source or target reading is not supported for conversion.

To Do (Fix): Conversion without tones will mostly break as the target reading doesn't support missing tone information.

getCharactersForKangxiRadicalIndex(self, radicalIndex)

 

Gets all characters for the given Kangxi radical index grouped by their residual stroke count.

Parameters:
  • radicalIndex (int) - Kangxi radical index
Returns: list of str
list of matching Chinese characters
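
The grouping by residual stroke count (strokes remaining after the radical is removed) can be pictured with the sketch below. It only illustrates the grouping idea: the helper name, the (count, characters) output shape and the small sample for radical 9 (人) are assumptions, and the documented return type of the method itself is list of str.

```python
from collections import defaultdict

# Illustrative sample for Kangxi radical 9 (人): (character, residual strokes).
SAMPLE = [('他', 3), ('人', 0), ('仁', 2), ('代', 3), ('什', 2)]


def group_by_residual_strokes(entries):
    """Group characters by residual stroke count, in ascending order."""
    groups = defaultdict(list)
    for char, residual in entries:
        groups[residual].append(char)
    # Emit (count, characters) pairs sorted by residual stroke count.
    return [(count, groups[count]) for count in sorted(groups)]
```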

getCharactersForReading(self, readingString, readingN=None)

 

Gets all known characters for the given reading.

Parameters:
  • readingString (str) - reading entity for lookup
  • readingN (str) - name of reading
Returns: list of str
list of characters for the given reading
Raises:
  • UnsupportedError - if no mapping between characters and target reading exists.
  • ConversionError - if conversion from the internal source reading to the given target reading fails.

getReadingForCharacters(self, charString)

 

Gets a list of readings for a given character string.

Parameters:
  • charString (str) - string of Chinese characters
Returns: list of list of str
a list of readings per character

getSimplified(self, charString)

 

Gets the Chinese simplified character representation for the given character string.

Parameters:
  • charString (str) - string of Chinese characters
Returns: list of list of str
list of simplified Chinese characters

getTraditional(self, charString)

 

Gets the traditional character representation for the given character string.

Parameters:
  • charString (str) - string of Chinese characters
Returns: list of list of str
list of traditional Chinese characters

To Do (Lang): Implementation is too simple to cover all aspects.

searchDictionaryExact(self, searchString, readingN=None, limit=None)

 

Searches the dictionary for exact matches to the given string.

Parameters:
  • searchString (str) - search string
  • readingN (str) - reading name
  • limit (int) - maximum number of entries

searchDictionaryContaining(self, searchString, readingN=None, position='c', limit=None)

 

Searches the dictionary for matches containing the given string.

A position can be specified to narrow matches for character or reading input. 'c', the most general, will allow the string anywhere in a match, 'b' only at the beginning, 'e' only at the end.

Parameters:
  • searchString (str) - search string
  • readingN (str) - reading name
  • position (str) - position of the string in a match (one out of c, b, e)
  • limit (int) - maximum number of entries
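
The three position codes can be read as simple string predicates. The sketch below applies them to an in-memory list of headwords. It is an illustration only: the real method queries a dictionary database, and the names filter_entries and entries, as well as the way limit is applied, are assumptions.

```python
def filter_entries(entries, search_string, position='c', limit=None):
    """Filter headwords by where `search_string` occurs in them.

    'c' matches anywhere, 'b' only at the beginning, 'e' only at the end.
    """
    if position == 'c':
        matches = lambda entry: search_string in entry
    elif position == 'b':
        matches = lambda entry: entry.startswith(search_string)
    elif position == 'e':
        matches = lambda entry: entry.endswith(search_string)
    else:
        raise ValueError("position must be one of 'c', 'b', 'e'")
    result = [entry for entry in entries if matches(entry)]
    # A limit of None means "no limit" in this sketch.
    return result if limit is None else result[:limit]
```

Because every 'b' or 'e' match is also a 'c' match, 'c' always returns a superset of the other two for the same input.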

getCharactersForComponents(self, componentList, includeEquivalentRadicalForms=True)

 

Gets all characters that contain the given components.

If option includeEquivalentRadicalForms is set, all equivalent forms will be searched for when a Kangxi radical is given.

Parameters:
  • componentList (list of str) - list of character components
  • includeEquivalentRadicalForms (bool) - if True then characters in the given component list are interpreted as representatives for their radical and all radical forms are included in the search. E.g. 肉 will include ⺼ as a possible component.
Returns: list of tuple
list of pairs of matching characters and their Z-variants
Raises:
  • ValueError - if an invalid character locale is specified

To Do (Impl): Once a mapping of similar radical forms exists (e.g. 言 and 訁), include it here.

getCharacterInformation(self, char)

 

Gets the basic information for the given character.

The following data is collected and returned in a dict:

  • char
  • locale
  • locale name
  • code point hex
  • code point dec
  • type
  • equivalent form (if type is 'radical')
  • radical index
  • radical form (if available)
  • radical variants (if available)
  • stroke count (if available)
  • readings (if type is 'character')
  • variants (if type is 'character')
  • default glyph
  • glyphs
Parameters:
  • char (str) - Chinese character
Returns: dict
character information as keyword value pairs
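
Two of the listed fields, the hexadecimal and decimal code points, can be derived with Python built-ins alone. The sketch below shows one way to compute them. The function name, the dict keys (which simply mirror the labels in the list above) and the hex formatting are assumptions; the actual keys and formatting used by the method may differ.

```python
def code_point_fields(char):
    """Return the code point of a single character in hex and decimal."""
    if len(char) != 1:
        raise ValueError('expected exactly one character')
    code_point = ord(char)
    return {
        'code point hex': '%X' % code_point,  # upper-case hex, no prefix
        'code point dec': code_point,
    }
```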

Class Variable Details

LANGUAGE_CHAR_LOCALE_MAPPING

Mapping table for locale to default character locale.

Value:
{'ja': 'J',
 'ko': 'K',
 'vi': 'V',
 'zh': 'C',
 'zh_CN': 'C',
 'zh_HK': 'T',
 'zh_MO': 'T',
 'zh_SG': 'C',
...

CHAR_LOCALE_NAME

Character locale names.

Value:
{'C': 'Chinese simplified',
 'J': 'Japanese',
 'K': 'Korean',
 'T': 'traditional',
 'V': 'Vietnamese'}

CHAR_LOCALE_DEFAULT_READING

Character locale's default character reading.

Value:
{'ja': 'Kana',
 'ko': 'Hangul',
 'zh': 'Pinyin',
 'zh_CN': 'Pinyin',
 'zh_HK': 'CantoneseYale',
 'zh_MO': 'Jyutping',
 'zh_SG': 'Pinyin',
 'zh_TW': 'WadeGiles'}

DICTIONARY_INFO

Dictionaries with type (EDICT, CEDICT), reading and reading options.

Value:
{'CEDICT': {'defaultLocale': 'C',
            'options': {'toneMarkType': 'Numbers'},
            'reading': 'Pinyin',
            'readingFunc': <function <lambda> at 0x944b144>,
            'type': 'CEDICT'},
 'CEDICTGR': {'defaultLocale': 'T',
              'options': {},
              'reading': 'GR',
...

VARIANT_TYPE_NAMES

List of character variants and their names.

Value:
{'C': 'Compatible variant',
 'M': 'Semantic variants',
 'P': 'Specialised semantic variants',
 'S': 'Simplified variants',
 'T': 'Traditional variants',
 'Z': 'Z-Variants'}