Package cjklib :: Package reading :: Module operator :: Class CantoneseYaleOperator

Class CantoneseYaleOperator


Provides an operator for the Cantonese Yale romanisation.

Features:

High Level vs. High Falling Tone

Yale distinguishes two tones often subsumed under one: the high level tone with tone contour 55, as given in the commonly used pitch model by Yuen Ren Chao, and the high falling tone given as pitch 53 (as by Chao), 52 or 51 (Bauer and Benedikt, chapter 2.1.1, p. 115). Many sources state that these two tones are no longer distinguishable in modern Hong Kong Cantonese, and some romanisation systems for Cantonese therefore treat them as a single tone.

In the abbreviated form of the Yale romanisation that uses numbers to represent tones, this distinction is not made. The user can specify whether tone number 1 maps to the high level or the high falling tone; this choice matters whenever a conversion involves the abbreviated form. By default the high level tone is used, as the given sources indicate this as the primary use.
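
The effect of this mapping can be sketched as follows. This is a minimal illustration and not cjklib's own code: the helper name tone_for_number is hypothetical, while the tone names and the two permitted values for the first tone are the ones documented for this class.

```python
# Tone names as listed in the class variable TONES.
TONES = ['1stToneLevel', '1stToneFalling', '2ndTone', '3rdTone',
         '4thTone', '5thTone', '6thTone']


def tone_for_number(number, yale_first_tone='1stToneLevel'):
    """Map a tone number (1-6) to a tone name (hypothetical helper)."""
    if yale_first_tone not in ('1stToneLevel', '1stToneFalling'):
        raise ValueError('unsupported first tone: %r' % (yale_first_tone,))
    if number not in (1, 2, 3, 4, 5, 6):
        raise ValueError('tone number must be between 1 and 6')
    if number == 1:
        # The number form cannot tell the two high tones apart,
        # so the caller's choice decides.
        return yale_first_tone
    # Numbers 2-6 sit at list positions 2-6 because tone 1 takes two slots.
    return TONES[number]


print(tone_for_number(1))                    # high level by default
print(tone_for_number(1, '1stToneFalling'))  # explicit high falling
```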

Sources


See Also:

Instance Methods

__init__(self, **options)
Creates an instance of the CantoneseYaleOperator.

list
getTones(self)
Returns a set of tones supported by the reading.

str
compose(self, readingEntities)
Composes the given list of basic entities to a string.

str
getTonalEntity(self, plainEntity, tone)
Gets the entity with tone mark for the given plain entity and tone.

tuple
splitEntityTone(self, entity)
Splits the entity into an entity without tone mark and the entity's tone index.

set of str
getPlainReadingEntities(self)
Gets the list of plain entities supported by this reading.

tuple of str
getOnsetRhyme(self, plainSyllable)
Splits the given plain syllable into onset (initial) and rhyme (final).

tuple of str
getOnsetNucleusCoda(self, plainSyllable)
Splits the given plain syllable into onset (initial), nucleus and coda, the latter building the rhyme (final).

Inherited from TonalRomanisationOperator: getReadingEntities, isPlainReadingEntity, isReadingEntity

Inherited from RomanisationOperator: decompose, getDecompositionTree, getDecompositions, isStrictDecomposition, segment

Inherited from ReadingOperator: getOption

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Methods

dict
getDefaultOptions(cls)
Returns the reading operator's default options.

dict
guessReadingDialect(cls, string, includeToneless=False)
Takes a string written in Cantonese Yale and guesses the reading dialect.

Static Methods

list of str
_getDiacriticVowels()
Gets a list of Cantonese Yale vowels with diacritical marks for tones.

Inherited from RomanisationOperator (private): _crossProduct, _treeToList

Class Variables
  READING_NAME = 'CantoneseYale'
Unique name of reading
  TONES = ['1stToneLevel', '1stToneFalling', '2ndTone', '3rdTone...
Names of tones used in the romanisation.
  TONE_MARK_MAPPING = {'Diacritics': {'1stToneFalling': (u'̀', '...
Mapping of tone name to representation per tone mark type.
  syllableRegex = re.compile(r'((?:m|ng|h|(?:[bcdfghjklmnpqrstvw...
Regex to split a string in NFD into several syllables in a crude way.

Inherited from RomanisationOperator: readingEntityRegex

Properties

Inherited from object: __class__

Method Details

__init__(self, **options)
(Constructor)

Creates an instance of the CantoneseYaleOperator.

Parameters:
  • options - extra options
  • dbConnectInst - instance of a DatabaseConnector; if none is given, default settings will be assumed.
  • strictSegmentation - if True, segmentation (using segment()) and thus decomposition (using decompose()) will raise an exception if an alphabetic string is parsed which cannot be segmented into single reading entities. If False, the aforesaid string will be returned unsegmented.
  • toneMarkType - if set to 'Diacritics', tones will be marked using diacritic marks and the character h for low tones; if set to 'Numbers', appended numbers from 1 to 6 will be used to mark tones; if set to 'None', no tone marks will be used and no tonal information will be supplied at all.
  • missingToneMark - if set to 'noinfo', no tone information will be deduced when no tone mark is found (takes on value None); if set to 'ignore', this entity will not be valid and for segmentation the behaviour defined by 'strictSegmentation' will take effect. This option is only valid if the value 'Numbers' is given for the option toneMarkType.
  • YaleFirstTone - tone in Yale which the first tone for tone marks with numbers should be mapped to. Value can be '1stToneLevel' to map to the level tone with contour 55 or '1stToneFalling' to map to the falling tone with contour 53.
Overrides: object.__init__
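
The documented option values can be summarised in a small checking routine. This is only a sketch under stated assumptions: check_options and ALLOWED_VALUES are hypothetical names, and whether the library itself rejects invalid combinations in this way is not stated above. Only the option names and their permitted values are taken from the parameter list.

```python
# Permitted values as given in the parameter list above.
ALLOWED_VALUES = {
    'toneMarkType': ('Diacritics', 'Numbers', 'None'),
    'missingToneMark': ('noinfo', 'ignore'),
    'YaleFirstTone': ('1stToneLevel', '1stToneFalling'),
}


def check_options(**options):
    """Validate keyword options against the documented values (sketch)."""
    for name, value in options.items():
        if name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]:
            raise ValueError('invalid value %r for %s' % (value, name))
    # missingToneMark is documented as valid only with number tone marks.
    # Assumption: the sketch requires toneMarkType to be given explicitly.
    if ('missingToneMark' in options
            and options.get('toneMarkType') != 'Numbers'):
        raise ValueError("missingToneMark needs toneMarkType='Numbers'")
    return options


options = check_options(toneMarkType='Numbers',
                        missingToneMark='noinfo',
                        YaleFirstTone='1stToneFalling')
print(sorted(options))
```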

getDefaultOptions(cls)
Class Method

Returns the reading operator's default options.

The default implementation returns an empty dictionary. The keyword 'dbConnectInst' is not regarded a configuration option of the operator and is thus not included in the dict returned.

Returns: dict
the reading operator's default options.
Overrides: ReadingOperator.getDefaultOptions
(inherited documentation)

_getDiacriticVowels()
Static Method

Gets a list of Cantonese Yale vowels with diacritical marks for tones.

The list includes characters m, n and h for nasal forms.

Returns: list of str
list of Cantonese Yale vowels with diacritical marks

guessReadingDialect(cls, string, includeToneless=False)
Class Method

Takes a string written in Cantonese Yale and guesses the reading dialect.

Currently only the option 'toneMarkType' is guessed. Unless 'includeToneless' is set to True, only the tone mark types 'Diacritics' and 'Numbers' are considered, as the latter can also represent the state of missing tones.

Parameters:
  • string (str) - Cantonese Yale string
Returns: dict
dictionary of basic keyword settings
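
One way such a guess could work is to count tone diacritics against tone digits. The sketch below is an assumption about the general approach, not the library's implementation: guess_tone_mark_type is a hypothetical name, and the tie-breaking rule (fall back to 'Numbers') is chosen here only because the text above says that type can also stand for missing tones.

```python
import unicodedata

# Combining macron, acute and grave, the marks used by Yale diacritics.
TONE_DIACRITICS = '\u0304\u0301\u0300'


def guess_tone_mark_type(text, include_toneless=False):
    """Guess 'toneMarkType' for a Cantonese Yale string (sketch)."""
    # Decompose so that each tone diacritic is a separate code point.
    nfd = unicodedata.normalize('NFD', text.lower())
    diacritic_count = sum(1 for char in nfd if char in TONE_DIACRITICS)
    number_count = sum(1 for char in nfd if char in '123456')
    if diacritic_count > number_count:
        mark_type = 'Diacritics'
    elif include_toneless and diacritic_count == 0 and number_count == 0:
        mark_type = 'None'
    else:
        # Assumed fallback: numbers can also represent missing tones.
        mark_type = 'Numbers'
    return {'toneMarkType': mark_type}


print(guess_tone_mark_type('gwong2jau1wa2'))
```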

getTones(self)

Returns a set of tones supported by the reading. These tones don't necessarily reflect the tones of the underlying language but may differ to reflect notational or other features.

The default implementation will raise a NotImplementedError.

Returns: list
list of supported tone marks.
Overrides: TonalFixedEntityOperator.getTones
(inherited documentation)

compose(self, readingEntities)

Composes the given list of basic entities to a string.

The default implementation will raise a NotImplementedError.

Parameters:
  • readingEntities - list of basic entities or other content
Returns: str
composed entities
Overrides: ReadingOperator.compose
(inherited documentation)

getTonalEntity(self, plainEntity, tone)

Gets the entity with tone mark for the given plain entity and tone.

The default implementation will raise a NotImplementedError.

Parameters:
  • plainEntity - entity without tonal information
  • tone - tone
Returns: str
entity with appropriate tone
Raises:
Overrides: TonalFixedEntityOperator.getTonalEntity

To Do (Lang): Place the tone mark on the first character of the nucleus?
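
A minimal sketch of adding diacritic tone marks, using the 'Diacritics' values of TONE_MARK_MAPPING. The placement rule is an assumption based on common Yale orthography (mark on the first vowel letter, h after the last vowel letter); as the To Do note shows, placement is an open question here, so the library may behave differently. The name add_tone_mark and the choice to return NFC are hypothetical.

```python
import re
import unicodedata

# 'Diacritics' entry of TONE_MARK_MAPPING: (combining mark, low-tone letter).
DIACRITICS = {
    '1stToneLevel': ('\u0304', ''),    # macron
    '1stToneFalling': ('\u0300', ''),  # grave
    '2ndTone': ('\u0301', ''),         # acute
    '3rdTone': ('', ''),
    '4thTone': ('\u0300', 'h'),
    '5thTone': ('\u0301', 'h'),
    '6thTone': ('', 'h'),
}


def add_tone_mark(plain_entity, tone):
    """Return plain_entity carrying the marks for tone (sketch)."""
    if not plain_entity:
        raise ValueError('empty entity')
    mark, low = DIACRITICS[tone]
    match = re.fullmatch(r'([^aeiou]*)([aeiou]+)(.*)', plain_entity)
    if match:
        onset, vowels, coda = match.groups()
        # Assumed placement: mark on the first vowel, 'h' after the last.
        marked = onset + vowels[0] + mark + vowels[1:] + low + coda
    else:
        # Syllabic nasal such as 'm' or 'ng': mark the first letter.
        marked = plain_entity[0] + mark + plain_entity[1:] + low
    return unicodedata.normalize('NFC', marked)


print(add_tone_mark('yau', '4thTone'))
print(add_tone_mark('sik', '1stToneLevel'))
```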

splitEntityTone(self, entity)

Splits the entity into an entity without tone mark and the entity's tone index.

The plain entity returned will always be in Unicode's Normalization Form C (NFC, see http://www.unicode.org/reports/tr15/).

Parameters:
  • entity (str) - entity with tonal information
Returns: tuple
plain entity without tone mark and entity's tone index (starting with 1)
Raises:
Overrides: TonalFixedEntityOperator.splitEntityTone
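
For the number form, the split can be sketched as below. This illustrates the idea only and is not the library's code: split_numbered_entity is a hypothetical helper, the default for the missing tone mark setting is chosen for the sketch, and the two modes follow the description of the missingToneMark option ('noinfo' yields None, 'ignore' treats the entity as invalid).

```python
import re


def split_numbered_entity(entity, missing_tone_mark='noinfo'):
    """Split e.g. 'gwong2' into ('gwong', 2) (sketch, number form only)."""
    match = re.fullmatch(r'([a-z]+)([1-6])?', entity.lower())
    if match is None:
        raise ValueError('not a numbered Yale entity: %r' % (entity,))
    plain, digit = match.groups()
    if digit is None:
        if missing_tone_mark == 'ignore':
            # Without a tone mark the entity is not considered valid.
            raise ValueError('missing tone mark: %r' % (entity,))
        # 'noinfo': no tone information is deduced.
        return plain, None
    return plain, int(digit)


print(split_numbered_entity('gwong2'))
print(split_numbered_entity('gwong'))
```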

getPlainReadingEntities(self)

Gets the list of plain entities supported by this reading. Unlike those returned by getReadingEntities(), the entities will carry no tone mark.

The default implementation will raise a NotImplementedError.

Returns: set of str
set of supported syllables
Overrides: TonalFixedEntityOperator.getPlainReadingEntities
(inherited documentation)

getOnsetRhyme(self, plainSyllable)

Splits the given plain syllable into onset (initial) and rhyme (final).

The syllabic nasals m, ng will be returned as final. Syllables yu, yun, yut will fall into (y, yu, ), (y, yu, n) and (y, yu, t).

Parameters:
  • plainSyllable (str) - syllable without tone marks
Returns: tuple of str
tuple of entity onset and rhyme
Raises:
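
A simplified sketch of an onset/rhyme split follows. It is not the library's implementation, which may rely on its own syllable data. The list of initials is the usual Yale inventory, split_onset_rhyme is a hypothetical name, and the note on yu, yun, yut is read here as onset y with the whole syllable as rhyme, which is an assumption.

```python
# Yale initials, two-letter ones first so that 'gw' wins over 'g' etc.
INITIALS = ('gw', 'kw', 'ng', 'ch', 'b', 'p', 'm', 'f', 'd', 't', 'n',
            'l', 'g', 'k', 'h', 'w', 'j', 's', 'y')
SYLLABIC_NASALS = ('m', 'ng')
YU_SYLLABLES = ('yu', 'yun', 'yut')
VOWELS = ('a', 'e', 'i', 'o', 'u')


def split_onset_rhyme(plain_syllable):
    """Split a plain Yale syllable into (onset, rhyme) (sketch)."""
    if plain_syllable in SYLLABIC_NASALS:
        # Syllabic nasals are returned as the final.
        return ('', plain_syllable)
    if plain_syllable in YU_SYLLABLES:
        # Assumed reading of the special case described above.
        return ('y', plain_syllable)
    for initial in INITIALS:
        if not plain_syllable.startswith(initial):
            continue
        rest = plain_syllable[len(initial):]
        if (rest in SYLLABIC_NASALS or rest[:1] in VOWELS
                or rest.startswith('yu')):
            return (initial, rest)
    if plain_syllable[:1] in VOWELS:
        # No initial consonant at all, e.g. 'oi'.
        return ('', plain_syllable)
    raise ValueError('cannot split %r' % (plain_syllable,))


print(split_onset_rhyme('gwong'))
print(split_onset_rhyme('ng'))
```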

getOnsetNucleusCoda(self, plainSyllable)

Splits the given plain syllable into onset (initial), nucleus and coda, the latter building the rhyme (final).

The syllabic nasals m, ng will be returned as coda. Syllables yu, yun, yut will fall into (y, yu, ), (y, yu, n) and (y, yu, t).

Parameters:
  • plainSyllable (str) - syllable in the Yale romanisation system without tone marks
Returns: tuple of str
tuple of syllable onset, nucleus and coda
Raises:

To Do (Impl): Finals ing, ik, ung, uk, eun, eut, a differ from other finals with same vowels. What semantics/view do we want to provide on the syllable parts?


Class Variable Details

TONES

Names of tones used in the romanisation.

Value:
['1stToneLevel',
 '1stToneFalling',
 '2ndTone',
 '3rdTone',
 '4thTone',
 '5thTone',
 '6thTone']

TONE_MARK_MAPPING

Mapping of tone name to representation per tone mark type. A representation consists of a diacritic mark and, optionally, the letter 'h' marking a low tone.

The 'Internal' dialect is used for conversion between different forms of Cantonese Yale. Conversion to the other dialects can lose information (Diacritics: missing tone; Numbers: distinction between high level and high falling; None: no tones at all), whereas conversion to this dialect retains all information, so it can be used as a standard target reading.

Value:
{'Diacritics': {'1stToneFalling': (u'̀', ''),
                '1stToneLevel': (u'̄', ''),
                '2ndTone': (u'́', ''),
                '3rdTone': (u'', ''),
                '4thTone': (u'̀', 'h'),
                '5thTone': (u'́', 'h'),
                '6thTone': (u'', 'h')},
 'Internal': {None: ('', ''), '1stToneFalling': ('1', ''), '1stToneLev\
...

syllableRegex

Regex to split a string in NFD into several syllables in a crude way. The regular expression works for both diacritical and number tone marks. It consists of:

  • nasal syllables,
  • initial consonants,
  • vowels including diacritics,
  • tone mark h,
  • final consonants,
  • tone numbers.
Value:
re.compile(r'((?:m|ng|h|(?:[bcdfghjklmnpqrstvwxyz]*(?:(?:[aeiou]|[\u0304\u0301\u0300])+|yu[\u0304\u0301\u0300]?)))(?:h(?!(?:[aeiou]|yu)))?(?:[mnptk]|ng)?[0123456]?)')
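
The pattern can be tried out directly. The sketch below joins the wrapped lines of the value shown above and applies the pattern with re.findall to NFD-normalised input. The helper name crude_split is hypothetical, and the way the library itself applies the pattern is not shown in this documentation. The second call shows how crude the split is: a leading h (like m or ng) is matched on its own.

```python
import re
import unicodedata

# Pattern as shown in the Value listing, with the wrapped lines joined.
SYLLABLE_REGEX = re.compile(
    r'((?:m|ng|h|(?:[bcdfghjklmnpqrstvwxyz]*'
    r'(?:(?:[aeiou]|[\u0304\u0301\u0300])+|yu[\u0304\u0301\u0300]?)))'
    r'(?:h(?!(?:[aeiou]|yu)))?(?:[mnptk]|ng)?[0123456]?)')


def crude_split(text):
    """Apply the pattern to text in NFD and return all matches (sketch)."""
    nfd = unicodedata.normalize('NFD', text.lower())
    return SYLLABLE_REGEX.findall(nfd)


print(crude_split('yut6 yu5 sik1'))  # ['yut6', 'yu5', 'sik1']
print(crude_split('hou2'))           # ['h', 'ou2']
```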