
Class TonalIPAOperator

Defines an operator on strings of a tonal language written in the International Phonetic Alphabet (IPA).

Unlike other ReadingOperators, TonalIPAOperator does not supply a closed set of syllables, because IPA offers several ways to represent the same pronunciation. As a result, a user-defined IPA syllable does not map easily to another transcription system, and only basic support is provided for conversion in that direction.

Tones

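Tones in IPA can be expressed using different schemes. The following schemes are implemented here, named as in the toneMarkType option and the keys of TONE_MARK_REGEX:

  • Numbers - a single digit at the end of the entity
  • ChaoDigits - a sequence of the digits 1 to 5 at the end of the entity
  • IPAToneBar - IPA tone bar characters at the end of the entity
  • Diacritics - combining diacritical marks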

To Do (Lang): Shed more light on representations of tones in IPA.

To Do (Fix): Get all diacritics used in IPA as tones for TONE_MARK_REGEX.

Instance Methods

  Return type   Method
                __init__(self, **options)
                Creates an instance of the TonalIPAOperator.
  list          getTones(self)
                Returns a set of tones supported by the reading.
  list of str   decompose(self, string)
                Decomposes the given string into basic entities that can be mapped to one Chinese character each (exceptions possible).
  str           compose(self, readingEntities)
                Composes the given list of basic entities to a string.
  str           getTonalEntity(self, plainEntity, tone)
                Gets the entity with tone mark for the given plain entity and tone.
  tuple         splitEntityTone(self, entity)
                Splits the entity into an entity without tone mark and the name of the entity's tone.
  str           getToneForToneMark(self, toneMark)
                Gets the tone for the given tone mark.

Inherited from TonalFixedEntityOperator: getPlainReadingEntities, getReadingEntities, isPlainReadingEntity, isReadingEntity

Inherited from ReadingOperator: getOption

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Methods

  Return type   Method
  dict          getDefaultOptions(cls)
                Returns the reading operator's default options.

Class Variables

  TONE_MARK_REGEX = {'ChaoDigits': re.compile(r'([12345]+)$'), 'Di...
  DEFAULT_TONE_MARK_TYPE = 'IPAToneBar'
      Tone mark type to select by default.
  TONES = []
      List of tone names.
  TONE_MARK_PREFER = {'ChaoDigits': {}, 'Diacritics': {}, 'IPATo...
      Mapping of tone marks to tone name which will be preferred on ambiguous mappings.
  TONE_MARK_MAPPING = {'ChaoDigits': {}, 'Diacritics': {}, 'IPAT...
      Mapping of tone names to tone mark for each tone mark type.

Inherited from ReadingOperator: READING_NAME

Properties

Inherited from object: __class__

Method Details

__init__(self, **options)
(Constructor)

Creates an instance of the TonalIPAOperator.

By default no tone marks will be shown.

Parameters:
  • options - extra options
  • dbConnectInst - instance of a DatabaseConnector; if none is given, default settings are assumed.
  • toneMarkType - type of tone marks, one of 'Numbers', 'ChaoDigits', 'IPAToneBar', 'Diacritics', 'None'
  • missingToneMark - if set to 'noinfo', no tone information is deduced when no tone mark is found (the tone takes on the value None); if set to 'ignore', such an entity is not valid.
Overrides: object.__init__

getDefaultOptions(cls)
Class Method

Returns the reading operator's default options.

The default implementation returns an empty dictionary. The keyword 'dbConnectInst' is not regarded as a configuration option of the operator and is thus not included in the returned dict.

Returns: dict
the reading operator's default options.
Overrides: ReadingOperator.getDefaultOptions
(inherited documentation)

getTones(self)

Returns a set of tones supported by the reading. These tones don't necessarily reflect the tones of the underlying language but may differ to reflect notational or other features.

The default implementation will raise a NotImplementedError.

Returns: list
list of supported tone marks.
Overrides: TonalFixedEntityOperator.getTones
(inherited documentation)

decompose(self, string)

Decomposes the given string into basic entities that can be mapped to one Chinese character each (exceptions possible).

The returned list contains a mix of basic reading entities and other characters e.g. spaces and punctuation marks.

Single syllables can only be found if they are separated by a period or whitespace, as in the output of compose().

Parameters:
  • string (str) - reading string
Returns: list of str
a list of basic entities of the input string
Raises:
Overrides: ReadingOperator.decompose
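
The splitting rule described above can be pictured with a small stand-alone sketch. It does not use cjklib and is not the library's implementation; the name decompose_sketch and the exact separator pattern are assumptions made for illustration, written for Python 3.

```python
import re

# Assumption: syllables are delimited by a single period or a run of whitespace.
# The capturing group makes re.split() keep the separators in the result.
_SEPARATOR = re.compile(r'(\.|\s+)')


def decompose_sketch(string):
    """Split a reading string into syllables and separators (hypothetical helper)."""
    # re.split() yields empty strings around leading or adjacent separators;
    # those carry no information and are dropped.
    return [part for part in _SEPARATOR.split(string) if part]


print(decompose_sketch('tʰa˥.mən˧˥ xau'))
```

Here the period and the space come back as separate list items next to the three syllables, which matches the statement that the returned list mixes reading entities with other characters.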

compose(self, readingEntities)

Composes the given list of basic entities to a string. IPA syllables are separated by a period.

Parameters:
  • readingEntities (list of str) - list of basic entities or other content
Returns: str
composed entities
Overrides: ReadingOperator.compose
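
A minimal sketch of period-separated composition, independent of cjklib. The name compose_sketch is hypothetical, and so is the rule that anything other than whitespace or a period counts as a syllable; the library may decide this differently.

```python
def compose_sketch(reading_entities):
    """Join entities, placing a period between adjacent syllables (hypothetical helper)."""
    result = []
    previous_was_syllable = False
    for item in reading_entities:
        # Assumption: whitespace and periods are separators, everything else is a syllable.
        is_syllable = bool(item) and not item.isspace() and item != '.'
        if is_syllable and previous_was_syllable:
            # Two syllables in a row need an explicit boundary.
            result.append('.')
        result.append(item)
        previous_was_syllable = is_syllable
    return ''.join(result)


print(compose_sketch(['tʰa˥', 'mən˧˥']))       # tʰa˥.mən˧˥
print(compose_sketch(['tʰa˥', ' ', 'mən˧˥']))  # tʰa˥ mən˧˥
```

Inserting the period only between adjacent syllables keeps existing whitespace intact, so a string produced this way can be split again at the same boundaries.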

getTonalEntity(self, plainEntity, tone)

Gets the entity with tone mark for the given plain entity and tone.

The entity returned will always be in Unicode's Normalization Form C (NFC, see http://www.unicode.org/reports/tr15/).

Parameters:
  • plainEntity (str) - entity without tonal information
  • tone (str) - tone
Returns: str
entity with appropriate tone
Raises:
Overrides: TonalFixedEntityOperator.getTonalEntity

To Do (Impl): Place diacritics on main vowel, derive from IPA representation.
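
The role of NFC here can be illustrated with the standard library alone. The sketch below simply appends the tone mark to the plain entity and normalizes the result; the name tonal_entity_sketch is hypothetical, and appending at the end (rather than on the main vowel) is a simplification in line with the To Do above.

```python
import unicodedata


def tonal_entity_sketch(plain_entity, tone_mark):
    """Attach a tone mark and return the result in NFC (hypothetical helper)."""
    # Simplification: the mark is appended at the end of the entity.
    # NFC then merges a base letter and a combining diacritic where
    # Unicode defines a precomposed character.
    return unicodedata.normalize('NFC', plain_entity + tone_mark)


# A tone bar sequence is left as separate characters.
print(tonal_entity_sketch('ma', '\u02e5\u02e9'))
# A combining acute accent after "a" is composed into U+00E1.
print(tonal_entity_sketch('ma', '\u0301') == 'm\u00e1')
```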

splitEntityTone(self, entity)

Splits the entity into an entity without tone mark and the name of the entity's tone.

The plain entity returned will always be in Unicode's Normalization Form C (NFC, see http://www.unicode.org/reports/tr15/).

Parameters:
  • entity (str) - entity with tonal information
Returns: tuple
plain entity without tone mark and additionally the tone
Raises:
Overrides: TonalFixedEntityOperator.splitEntityTone
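
As a rough picture of the split, the following stand-alone sketch separates a trailing tone mark from an entity with a regular expression and returns the plain part in NFC. It is not the cjklib code: split_entity_tone_sketch is a hypothetical name, the default pattern is the 'Numbers' style, and the second element is the raw tone mark rather than the tone name the real method returns.

```python
import re
import unicodedata


def split_entity_tone_sketch(entity, tone_regex=re.compile(r'(\d)$')):
    """Return (plain entity in NFC, tone mark or None) (hypothetical helper)."""
    entity = unicodedata.normalize('NFC', entity)
    match = tone_regex.search(entity)
    if match is None:
        # No tone mark found: nothing is known about the tone.
        return entity, None
    # Everything before the match is the plain entity.
    return entity[:match.start()], match.group(1)


print(split_entity_tone_sketch('ma3'))  # ('ma', '3')
print(split_entity_tone_sketch('ma'))   # ('ma', None)
```

Mapping the extracted mark to a tone name is a separate lookup, which depends on the tone marks a child class defines.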

getToneForToneMark(self, toneMark)

Gets the tone for the given tone mark.

Parameters:
  • toneMark (str) - tone mark representation of the tone
Returns: str
tone
Raises:

Class Variable Details

TONE_MARK_REGEX

Value:
{'ChaoDigits': re.compile(r'([12345]+)$'),
 'Diacritics': re.compile(r'([\u0300\u0301\u0302\u0303\u030c]+)'),
 'IPAToneBar': re.compile(r'([\u02e5\u02e6\u02e7\u02e8\u02e9\ua708\ua709\ua70a\ua70b\ua70c]+)$'),
 'Numbers': re.compile(r'(\d)$')}
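
To see what each pattern picks out, the sketch below applies one pattern per tone mark type to sample entities. It is written for Python 3 and does not import cjklib. Two points are assumptions: the ChaoDigits pattern is taken to be the character class [12345], since a literal "12345+" would not match a contour such as "214", and the helper find_tone_mark with its NFD step for diacritics is hypothetical.

```python
import re
import unicodedata

# Patterns modelled on TONE_MARK_REGEX (see the assumptions stated above).
TONE_MARK_REGEX_SKETCH = {
    'Numbers': re.compile(r'(\d)$'),
    'ChaoDigits': re.compile(r'([12345]+)$'),
    'IPAToneBar': re.compile(
        '([\u02e5\u02e6\u02e7\u02e8\u02e9\ua708\ua709\ua70a\ua70b\ua70c]+)$'),
    # Not anchored at the end: a diacritic may sit on any letter.
    'Diacritics': re.compile('([\u0300\u0301\u0302\u0303\u030c]+)'),
}


def find_tone_mark(entity, tone_mark_type):
    """Return the tone mark found in entity, or None (hypothetical helper)."""
    if tone_mark_type == 'Diacritics':
        # Decompose first so precomposed letters expose their combining marks.
        entity = unicodedata.normalize('NFD', entity)
    match = TONE_MARK_REGEX_SKETCH[tone_mark_type].search(entity)
    return match.group(1) if match else None


print(find_tone_mark('ma3', 'Numbers'))       # 3
print(find_tone_mark('ma214', 'ChaoDigits'))  # 214
```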

TONES

List of tone names. Needs to be implemented in child class.

Value:
[]

TONE_MARK_PREFER

Mapping of tone marks to tone name which will be preferred on ambiguous mappings. Needs to be implemented in child classes.

Value:
{'ChaoDigits': {}, 'Diacritics': {}, 'IPAToneBar': {}, 'Numbers': {}}

TONE_MARK_MAPPING

Mapping of tone names to tone mark for each tone mark type. Needs to be implemented in child classes.

Value:
{'ChaoDigits': {}, 'Diacritics': {}, 'IPAToneBar': {}, 'Numbers': {}}
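
How a child class might fill these structures can be sketched as follows. Everything in the block is illustrative: the tone names, the marks assigned to them, and the helper names tone_mark_for_tone and tone_for_tone_mark are assumptions, not values taken from cjklib. The point is the role of TONE_MARK_PREFER when two tones share one mark.

```python
# Hypothetical tone names and marks for illustration only; a real child class
# defines its own TONES, TONE_MARK_MAPPING and TONE_MARK_PREFER.
TONES = ['1stTone', '2ndTone', '3rdToneRegular', '3rdToneLow', '4thTone']

TONE_MARK_MAPPING = {
    'Numbers': {'1stTone': '1', '2ndTone': '2', '3rdToneRegular': '3',
                '3rdToneLow': '3', '4thTone': '4'},
    'ChaoDigits': {'1stTone': '55', '2ndTone': '35', '3rdToneRegular': '214',
                   '3rdToneLow': '21', '4thTone': '51'},
}

# With 'Numbers', the mark '3' is ambiguous; this entry resolves it.
TONE_MARK_PREFER = {
    'Numbers': {'3': '3rdToneRegular'},
    'ChaoDigits': {},
}


def tone_mark_for_tone(tone, tone_mark_type):
    """Look up the mark for a tone name (hypothetical helper)."""
    return TONE_MARK_MAPPING[tone_mark_type][tone]


def tone_for_tone_mark(tone_mark, tone_mark_type):
    """Look up the tone name for a mark, honouring preferences (hypothetical helper)."""
    preferred = TONE_MARK_PREFER[tone_mark_type].get(tone_mark)
    if preferred is not None:
        return preferred
    candidates = [tone for tone, mark in TONE_MARK_MAPPING[tone_mark_type].items()
                  if mark == tone_mark]
    if len(candidates) != 1:
        raise ValueError('no unique tone for mark %r' % tone_mark)
    return candidates[0]


print(tone_for_tone_mark('3', 'Numbers'))      # 3rdToneRegular
print(tone_for_tone_mark('21', 'ChaoDigits'))  # 3rdToneLow
```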