Package cjklib :: Package reading :: Module operator :: Class CantoneseYaleOperator

Class CantoneseYaleOperator

Provides an operator for the Cantonese Yale romanisation.

Features:

tones marked by either diacritics or numbers,
choice between high level and high falling tone for number marks,
guessing of input form (reading dialect) and
splitting of syllables into onset, nucleus and coda.

High Level vs. High Falling Tone

Yale distinguishes two tones often subsumed under one: the high level tone with tone contour 55 as given in the commonly used pitch model by Yuen Ren Chao and the high falling tone given as pitch 53 (as by Chao), 52 or 51 (Bauer and Benedict, chapter 2.1.1 pp. 115). Many sources state that these two tones are no longer distinguishable in modern Hong Kong Cantonese, which is why some romanisation systems for Cantonese merge them into one tone.

In the abbreviated form of the Yale romanisation that uses numbers to represent tones, this distinction is not made. The user can choose whether the tone number 1 maps to the high level or the high falling tone; this choice matters whenever a conversion involves the abbreviated form. By default the high level tone is used, as the given sources indicate this as the primary use.
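
The effect of this mapping can be illustrated with a small standalone sketch. It does not use cjklib: the function name resolve_numbered_tone is hypothetical, while the tone names and the two possible first-tone values are the ones listed on this page.

```python
# Tone names as listed in the TONES class variable on this page.
TONES = ['1stToneLevel', '1stToneFalling', '2ndTone', '3rdTone',
         '4thTone', '5thTone', '6thTone']


def resolve_numbered_tone(number, yale_first_tone='1stToneLevel'):
    """Map a tone number (1 to 6) to a tone name (hypothetical helper)."""
    if yale_first_tone not in ('1stToneLevel', '1stToneFalling'):
        raise ValueError('unknown first tone: %r' % (yale_first_tone,))
    if number == 1:
        # The number form cannot tell the two high tones apart,
        # so the caller decides which one is meant.
        return yale_first_tone
    if number in (2, 3, 4, 5, 6):
        # TONES holds two entries for the first tone, so for the
        # remaining tones the number n equals the list index n.
        return TONES[number]
    raise ValueError('tone number out of range: %r' % (number,))
```

With the default, resolve_numbered_tone(1) yields '1stToneLevel'; passing '1stToneFalling' as the second argument switches the reading of tone number 1 and leaves the other five tones untouched.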

Sources

Stephen Matthews, Virginia Yip: Cantonese: A Comprehensive Grammar. Routledge, 1994, ISBN 0-415-08945-X.
Robert S. Bauer, Paul K. Benedict: Modern Cantonese Phonology (摩登廣州話語音學). Walter de Gruyter, 1997, ISBN 3-11-014893-5.

See Also:

Cantonese: A Comprehensive Grammar (Preview): http://books.google.de/books?id=czbGJLu59S0C
Modern Cantonese Phonology (Preview): http://books.google.de/books?id=QWNj5Yj6_CgC

Instance Methods

__init__(self, **options)
Creates an instance of the CantoneseYaleOperator.

list

getTones(self)
Returns a set of tones supported by the reading.

str

compose(self, readingEntities)
Composes the given list of basic entities to a string.

str

getTonalEntity(self, plainEntity, tone)
Gets the entity with tone mark for the given plain entity and tone.

tuple

splitEntityTone(self, entity)
Splits the entity into an entity without tone mark and the entity's tone index.

set of str

getPlainReadingEntities(self)
Gets the list of plain entities supported by this reading.

tuple of str

getOnsetRhyme(self, plainSyllable)
Splits the given plain syllable into onset (initial) and rhyme (final).

tuple of str

getOnsetNucleusCoda(self, plainSyllable)
Splits the given plain syllable into onset (initial), nucleus and coda, the latter building the rhyme (final).

Inherited from TonalRomanisationOperator: getReadingEntities, isPlainReadingEntity, isReadingEntity

Inherited from RomanisationOperator: decompose, getDecompositionTree, getDecompositions, isStrictDecomposition, segment

Inherited from RomanisationOperator (private): _hasMergeableSyllables, _hasSyllableSubstring, _recursiveSegmentation

Inherited from ReadingOperator: getOption

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Methods

dict

getDefaultOptions(cls)
Returns the reading operator's default options.

dict

guessReadingDialect(cls, string, includeToneless=False)
Takes a string written in Cantonese Yale and guesses the reading dialect.

Static Methods

list of str

_getDiacriticVowels()
Gets a list of Cantonese Yale vowels with diacritical marks for tones.

Inherited from RomanisationOperator (private): _crossProduct, _treeToList

Class Variables

READING_NAME = 'CantoneseYale'
Unique name of reading

TONES = ['1stToneLevel', '1stToneFalling', '2ndTone', '3rdTone...
Names of tones used in the romanisation.

TONE_MARK_MAPPING = {'Diacritics': {'1stToneFalling': (u'̀', '...
Mapping of tone name to representation per tone mark type.

syllableRegex = re.compile(r'((?:m|ng|h|(?:[bcdfghjklmnpqrstvw...
Regex to split a string in NFD into several syllables in a crude way.

Inherited from RomanisationOperator: readingEntityRegex

Properties

Inherited from object: __class__

Method Details

__init__(self, **options)
(Constructor)

Creates an instance of the CantoneseYaleOperator.

Parameters:

options - extra options
dbConnectInst - instance of a DatabaseConnector, if none is given, default settings will be assumed.
strictSegmentation - if True, segmentation (using segment()) and thus decomposition (using decompose()) will raise an exception if an alphabetic string is parsed which cannot be segmented into single reading entities. If False, the aforesaid string will be returned unsegmented.
toneMarkType - if set to 'Diacritics' tones will be marked using diacritic marks and the character h for low tones, if set to 'Numbers' appended numbers from 1 to 6 will be used to mark tones, if set to 'None' no tone marks will be used and no tonal information will be supplied at all.
missingToneMark - if set to 'noinfo' no tone information will be deduced when no tone mark is found (takes on value None), if set to 'ignore' this entity will not be valid and for segmentation the behaviour defined by 'strictSegmentation' will take effect. This option is only valid if the value 'Numbers' is given for the option toneMarkType.
YaleFirstTone - tone in Yale which the first tone for tone marks with numbers should be mapped to. Value can be '1stToneLevel' to map to the level tone with contour 55 or '1stToneFalling' to map to the falling tone with contour 53.

Overrides: object.__init__

getDefaultOptions(cls)
Class Method

Returns the reading operator's default options.

The default implementation returns an empty dictionary. The keyword 'dbConnectInst' is not regarded as a configuration option of the operator and is thus not included in the dict returned.

Returns: dict: the reading operator's default options.
Overrides: ReadingOperator.getDefaultOptions: (inherited documentation)

_getDiacriticVowels()
Static Method

Gets a list of Cantonese Yale vowels with diacritical marks for tones.

The list includes characters m, n and h for nasal forms.

Returns: list of str: list of Cantonese Yale vowels with diacritical marks

guessReadingDialect(cls, string, includeToneless=False)
Class Method

Takes a string written in Cantonese Yale and guesses the reading dialect.

Currently only the option 'toneMarkType' is guessed. Unless 'includeToneless' is set to True, only the tone mark types 'Diacritics' and 'Numbers' are considered, as the latter can also represent the state of missing tones.

Parameters:

string (str) - Cantonese Yale string

Returns: dict

dictionary of basic keyword settings
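
A guess of this kind can be sketched with a simple standalone heuristic. The following is not cjklib's implementation: the function name guess_tone_mark_type and the order of the checks are assumptions made for illustration, and only the returned key 'toneMarkType' and its values are taken from this page.

```python
import re
import unicodedata

TONE_DIACRITICS = '\u0300\u0301\u0304'  # combining grave, acute, macron

# Letter 'h' after a vowel (with optional diacritic) that does not start
# a new syllable, i.e. the low tone marker of the diacritics form.
LOW_TONE_H = re.compile(r'[aeiou][\u0300\u0301\u0304]?h(?!(?:[aeiou]|yu))')
# A digit from 1 to 6 directly following a letter.
TONE_NUMBER = re.compile(r'[a-z][1-6]')


def guess_tone_mark_type(string, include_toneless=False):
    """Guess how tones are marked in a Cantonese Yale string (sketch)."""
    # Decompose so that tone diacritics show up as separate characters.
    text = unicodedata.normalize('NFD', string.lower())
    if any(ch in TONE_DIACRITICS for ch in text):
        return {'toneMarkType': 'Diacritics'}
    if TONE_NUMBER.search(text):
        return {'toneMarkType': 'Numbers'}
    if LOW_TONE_H.search(text):
        # Mid and low level tones carry no diacritic, only the letter h.
        return {'toneMarkType': 'Diacritics'}
    # No tonal evidence at all: 'Numbers' can also stand for missing tones.
    return {'toneMarkType': 'None' if include_toneless else 'Numbers'}
```

For a string without any tone marks the sketch falls back to 'Numbers', mirroring the statement above that this type can also represent missing tones, and returns 'None' only when include_toneless is set.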

getTones(self)

Returns a set of tones supported by the reading. These tones don't necessarily reflect the tones of the underlying language but may differ to reflect notational or other features.

The default implementation will raise a NotImplementedError.

Returns: list: list of supported tone marks.
Overrides: TonalFixedEntityOperator.getTones: (inherited documentation)

compose(self, readingEntities)

Composes the given list of basic entities to a string.

The default implementation will raise a NotImplementedError.

Parameters:

readingEntities - list of basic entities or other content

Returns: str

composed entities

Overrides: ReadingOperator.compose: (inherited documentation)

getTonalEntity(self, plainEntity, tone)

Gets the entity with tone mark for the given plain entity and tone.

The default implementation will raise a NotImplementedError.

Parameters:

plainEntity - entity without tonal information
tone - tone

Returns: str

entity with appropriate tone

Raises:

InvalidEntityError - if the entity is invalid.
UnsupportedError - if the operation is not supported for the given form.

Overrides: TonalFixedEntityOperator.getTonalEntity

To Do (Lang): Place the tone mark on the first character of the nucleus?
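
How a plain entity and a tone combine in the diacritics form can be shown with a minimal standalone sketch. It is not the library's code: the function name add_tone_diacritic is hypothetical, the mark is placed on the first vowel of the nucleus (the placement the To Do note asks about, taken here as an assumption), and ValueError stands in for InvalidEntityError. The mapping values are those shown for 'Diacritics' in TONE_MARK_MAPPING.

```python
import re
import unicodedata

# (diacritic, low tone marker) per tone, as in TONE_MARK_MAPPING['Diacritics'].
DIACRITICS = {
    '1stToneLevel': ('\u0304', ''),    # combining macron
    '1stToneFalling': ('\u0300', ''),  # combining grave accent
    '2ndTone': ('\u0301', ''),         # combining acute accent
    '3rdTone': ('', ''),
    '4thTone': ('\u0300', 'h'),
    '5thTone': ('\u0301', 'h'),
    '6thTone': ('', 'h'),
}

# Leading consonants, first vowel, remaining vowels, trailing consonants.
SYLLABLE = re.compile(r'([^aeiou]*)([aeiou])([aeiou]*)([a-z]*)')


def add_tone_diacritic(plain_entity, tone):
    """Return plain_entity with the diacritic form of tone (sketch)."""
    diacritic, low_marker = DIACRITICS[tone]
    match = SYLLABLE.fullmatch(plain_entity)
    if match:
        onset, first_vowel, other_vowels, coda = match.groups()
        # Assumption: mark on the first vowel, h after the last vowel.
        marked = (onset + first_vowel + diacritic + other_vowels
                  + low_marker + coda)
    elif plain_entity in ('m', 'ng'):
        # Syllabic nasals: mark on the first letter, h at the end.
        marked = plain_entity[0] + diacritic + plain_entity[1:] + low_marker
    else:
        raise ValueError('invalid entity: %r' % (plain_entity,))
    # Compose base letter and combining mark where Unicode allows it.
    return unicodedata.normalize('NFC', marked)
```

For example, add_tone_diacritic('yut', '6thTone') returns 'yuht', since the sixth tone has no diacritic and is marked by the letter h alone.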

splitEntityTone(self, entity)

Splits the entity into an entity without tone mark and the entity's tone index.

The plain entity returned will always be in Unicode's Normalization Form C (NFC, see http://www.unicode.org/reports/tr15/).

Parameters:

entity (str) - entity with tonal information

Returns: tuple

plain entity without tone mark and entity's tone index (starting with 1)

Raises:

InvalidEntityError - if the entity is invalid.
UnsupportedError - if the operation is not supported for the given form.

Overrides: TonalFixedEntityOperator.splitEntityTone
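
For the number form, the split can be sketched in a few lines. This standalone example is only an illustration: the function name split_numbered_entity is hypothetical, ValueError stands in for InvalidEntityError, and the missing_tone_mark parameter imitates the 'noinfo' and 'ignore' values of the missingToneMark option described for the constructor.

```python
import re
import unicodedata

# Letters followed by an optional tone number from 1 to 6.
NUMBERED_ENTITY = re.compile(r'([a-z]+)([1-6])?')


def split_numbered_entity(entity, missing_tone_mark='noinfo'):
    """Split e.g. 'gwong2' into ('gwong', 2) (sketch, number form only)."""
    # Return the plain entity in Normalization Form C.
    text = unicodedata.normalize('NFC', entity.lower())
    match = NUMBERED_ENTITY.fullmatch(text)
    if match is None:
        raise ValueError('invalid entity: %r' % (entity,))
    plain, digit = match.groups()
    if digit is None:
        if missing_tone_mark == 'ignore':
            # Entities without a tone mark are not accepted in this mode.
            raise ValueError('missing tone mark: %r' % (entity,))
        # 'noinfo': no tone is deduced, the tone is reported as None.
        return plain, None
    return plain, int(digit)
```

Here split_numbered_entity('gwong2') gives ('gwong', 2), while an entity without a number gives a tone of None under 'noinfo' and raises under 'ignore'.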

getPlainReadingEntities(self)

Gets the list of plain entities supported by this reading. Unlike getReadingEntities(), the entities will carry no tone mark.

The default implementation will raise a NotImplementedError.

Returns: set of str: set of supported syllables
Overrides: TonalFixedEntityOperator.getPlainReadingEntities: (inherited documentation)

getOnsetRhyme(self, plainSyllable)

Splits the given plain syllable into onset (initial) and rhyme (final).

The syllabic nasals m, ng will be returned as final. Syllables yu, yun, yut will fall into (y, yu, ), (y, yu, n) and (y, yu, t).

Parameters:

plainSyllable (str) - syllable without tone marks

Returns: tuple of str

tuple of entity onset and rhyme

Raises:

InvalidEntityError - if the entity is invalid.

getOnsetNucleusCoda(self, plainSyllable)

Splits the given plain syllable into onset (initial), nucleus and coda, the latter building the rhyme (final).

The syllabic nasals m, ng will be returned as coda. Syllables yu, yun, yut will fall into (y, yu, ), (y, yu, n) and (y, yu, t).

Parameters:

plainSyllable (str) - syllable in the Yale romanisation system without tone marks

Returns: tuple of str

tuple of syllable onset, nucleus and coda

Raises:

InvalidEntityError - if the entity is invalid (e.g. syllable nucleus or tone invalid).

To Do (Impl): Finals ing, ik, ung, uk, eun, eut, a differ from other finals with same vowels. What semantics/view do we want to provide on the syllable parts?
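
The documented splitting rules, including the special cases for the syllabic nasals and for yu, yun and yut, can be sketched as follows. This is a simplified standalone illustration and not the library's method: the function names, the onset inventory in the pattern, the treatment of vowel sequences as a single nucleus and the use of ValueError instead of InvalidEntityError are all assumptions.

```python
import re

SYLLABLE = re.compile(
    r'(gw|kw|ng|ch|[bpmfdtnlgkhjsyw])?'  # onset (optional)
    r'(yu|[aeiou]+)'                     # nucleus
    r'(ng|[mnptk])?')                    # coda (optional)


def get_onset_nucleus_coda(plain_syllable):
    """Split a plain syllable into (onset, nucleus, coda) (sketch)."""
    if plain_syllable in ('m', 'ng'):
        # Syllabic nasals are returned as coda.
        return ('', '', plain_syllable)
    if plain_syllable in ('yu', 'yun', 'yut'):
        # Documented special case: onset y plus nucleus yu.
        return ('y', 'yu', plain_syllable[2:])
    match = SYLLABLE.fullmatch(plain_syllable)
    if match is None:
        raise ValueError('invalid entity: %r' % (plain_syllable,))
    onset, nucleus, coda = match.groups()
    return (onset or '', nucleus, coda or '')


def get_onset_rhyme(plain_syllable):
    """Split a plain syllable into (onset, rhyme), rhyme = nucleus + coda."""
    onset, nucleus, coda = get_onset_nucleus_coda(plain_syllable)
    return (onset, nucleus + coda)
```

In this sketch get_onset_nucleus_coda('gwong') returns ('gw', 'o', 'ng') and get_onset_rhyme('gwong') returns ('gw', 'ong'). The finals named in the To Do note are not treated specially here.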

Class Variable Details

TONES

Names of tones used in the romanisation.

Value:

['1stToneLevel',
 '1stToneFalling',
 '2ndTone',
 '3rdTone',
 '4thTone',
 '5thTone',
 '6thTone']

TONE_MARK_MAPPING

Mapping of tone name to representation per tone mark type. Representations include a diacritic mark and optionally the letter 'h' marking a low tone.

The 'Internal' dialect is used for conversion between different forms of Cantonese Yale. Conversion to the other dialects can lose information (Diacritics: missing tone, Numbers: distinction between high level and high falling, None: no tones at all), whereas conversion to this dialect retains all information, so it can be used as a standard target reading.

Value:

{'Diacritics': {'1stToneFalling': (u'̀', ''),
                '1stToneLevel': (u'̄', ''),
                '2ndTone': (u'́', ''),
                '3rdTone': (u'', ''),
                '4thTone': (u'̀', 'h'),
                '5thTone': (u'́', 'h'),
                '6thTone': (u'', 'h')},
 'Internal': {None: ('', ''), '1stToneFalling': ('1', ''), '1stToneLev
...

syllableRegex

Regex to split a string in NFD into several syllables in a crude way. The regular expression works for both diacritical and number tone marks. It consists of:

nasal syllables,
initial consonants,
vowels including diacritics,
tone mark h,
final consonants,
tone numbers.

Value:

re.compile(r'((?:m|ng|h|(?:[bcdfghjklmnpqrstvwxyz]*(?:(?:[aeiou]|[\u0304\u0301\u0300])+|yu[\u0304\u0301\u0300]?)))(?:h(?!(?:[aeiou]|yu)))?(?:[mnptk]|ng)?[0123456]?)')
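
The pattern can be tried outside the library with Python's re module. The sketch below copies the value shown above; the helper name split_syllables is hypothetical, and the input is lower-cased and brought into NFD first, as the description requires. Since the split is crude, other inputs may not split as cleanly as these.

```python
import re
import unicodedata

# Pattern copied from the syllableRegex value on this page.
SYLLABLE_REGEX = re.compile(
    r'((?:m|ng|h|(?:[bcdfghjklmnpqrstvwxyz]*'
    r'(?:(?:[aeiou]|[\u0304\u0301\u0300])+|yu[\u0304\u0301\u0300]?)))'
    r'(?:h(?!(?:[aeiou]|yu)))?(?:[mnptk]|ng)?[0123456]?)')


def split_syllables(string):
    """Crudely split a Cantonese Yale string into syllables (sketch)."""
    # The pattern expects decomposed input (NFD) in lower case.
    text = unicodedata.normalize('NFD', string.lower())
    return SYLLABLE_REGEX.findall(text)


print(split_syllables('yut6yu5'))  # prints ['yut6', 'yu5']
# Diacritics form: the acute accent stays a separate combining character.
print(split_syllables('Yuhtyu\u0301h') == ['yuht', 'yu\u0301h'])  # prints True
```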
