Package cjklib :: Package reading :: Module operator :: Class GROperator

Class GROperator


Provides an operator for the Mandarin Gwoyeu Romatzyh romanisation.

Features:

Limitations:

R-colouring

Gwoyeu Romatzyh renders rhotacised syllables (Erlhuah) by spelling out the actual pronunciation. Because r-colouring loses the information about the underlying etymological syllable, an r-coloured form cannot be converted back to its underlying form unambiguously. Furthermore, the finals i, iu, in and iun contrast in the first and second tones but not in the third and fourth tones, so conversion between different tones (including the base form) cannot be done in a general manner: 小鸡儿 sheau-jiel differs from 小街儿 sheau-jie’l, but 几儿 jieel equals 姐儿 jieel (see Chao).

Thus this ReadingOperator lacks general handling of syllable renderings, and many methods narrow the range of syllables allowed. Unlike the original forms without r-colouring, for Erlhuah forms the combination of a plain syllable with a specific tone is limited to the data given in the source, so operations involving tones may fail with an UnsupportedError if the given syllable isn't found with that tone.
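
The ambiguity can be illustrated with the four forms quoted above. This is a minimal sketch, not cjklib code: the table ERLHUAH_FORMS and the helper base_candidates are hypothetical names, the plain syllables ji and jie with their tones are inferred from the characters in the examples, and an ASCII apostrophe stands in for the typographic one.

```python
from collections import defaultdict

# (plain syllable, tone) -> Erlhuah spelling, limited to the examples above.
ERLHUAH_FORMS = {
    ("ji", "1stTone"): "jiel",
    ("jie", "1stTone"): "jie'l",
    ("ji", "3rdTone"): "jieel",
    ("jie", "3rdTone"): "jieel",
}


def base_candidates(rhotacised):
    """Return every (plain syllable, tone) pair that yields the given form."""
    reverse = defaultdict(list)
    for key, form in ERLHUAH_FORMS.items():
        reverse[form].append(key)
    return sorted(reverse.get(rhotacised, []))


print(base_candidates("jiel"))   # a single candidate
print(base_candidates("jieel"))  # two candidates, so the reverse mapping is ambiguous
```

The forward direction is a plain lookup that can fail for missing pairs, which matches the restriction to the data given in the source.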

Sources


See Also:

To Do (Lang): To Do (Impl):
Instance Methods
__init__(self, **options)
Creates an instance of the GROperator.
list
getTones(self)
Returns a set of tones supported by the reading.
str
compose(self, readingEntities)
Composes the given list of basic entities to a string.
bool
isStrictDecomposition(self, readingEntities)
Checks if the given decomposition follows the romanisation format strictly to allow unambiguous decomposition.
list of tuple
_recursiveSegmentation(self, string)
Takes a string written in the romanisation and returns the possible segmentations as a tree of syllables.
list of str
removeApostrophes(self, readingEntities)
Removes apostrophes between two syllables for a given decomposition.
int
getBaseTone(self, tone)
Gets the tone number of the tone or the etymological tone if it is a neutral or optional neutral tone.
tuple of str
splitPlainSyllableCVC(self, plainSyllable)
Splits the given plain syllable into consonants-vowels-consonants.
str
getTonalEntity(self, plainEntity, tone)
Gets the entity with tone mark for the given plain entity and tone.
tuple
splitEntityTone(self, entity)
Splits the entity into an entity without tone mark (plain entity) and the entity's tone.
str
getRhotacisedTonalEntity(self, plainEntity, tone)
Gets the r-coloured entity (Erlhuah form) with tone mark for the given plain entity and tone.
dict
_getAbbreviatedLookup(self)
Gets the abbreviated form lookup table.
list
getAbbreviatedEntities(self)
Gets a list of abbreviated GR spellings.
bool
isAbbreviatedEntity(self, entity)
Returns true if the given entity is an abbreviated spelling.
str
convertAbbreviatedEntity(self, entity)
Converts the given abbreviated GR spelling to the original form.
set of str
getPlainReadingEntities(self)
Gets the list of plain entities supported by this reading without r-coloured forms (Erlhuah forms).
set of str
getFullReadingEntities(self)
Gets a set of full entities supported by the reading excluding abbreviated forms.
list of str
getReadingEntities(self)
Gets a set of all entities supported by the reading.
bool
isReadingEntity(self, entity)
Returns true if the given entity is recognised by the romanisation operator, i.e. it is a valid entity of the reading returned by the segmentation method.

Inherited from TonalRomanisationOperator: isPlainReadingEntity

Inherited from RomanisationOperator: decompose, getDecompositionTree, getDecompositions, segment

Inherited from ReadingOperator: getOption

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Methods
dict
getDefaultOptions(cls)
Returns the reading operator's default options.
dict
guessReadingDialect(cls, string, includeToneless=False)
Takes a string written in GR and guesses the reading dialect.
Static Methods

Inherited from RomanisationOperator (private): _crossProduct, _treeToList

Class Variables
  READING_NAME = 'GR'
Unique name of reading
  TONES = ['1stTone', '2ndTone', '3rdTone', '4thTone', '5thToneE...
  SYLLABLE_STRUCTURE = re.compile(r'^((?:tz|ts|ch|sh|[bpmfdtnlsj...
Regular expression describing the syllable structure in GR (C,V,C).
  _syllableToneLookup = None
Holds the tonal syllable to plain syllable & tone lookup table.
  _abbrConversionLookup = None
Holds the abbreviated entity lookup table.
  DB_RHOTACISED_FINAL_MAPPING = {1: 'GRFinal_T1', 2: 'GRFinal_T2...
Database fields for tonal Erlhuah syllables.
  DB_RHOTACISED_FINAL_MAPPING_ZEROINITIAL = {1: 'GRFinal_T1', 2:...
Database fields for tonal Erlhuah syllables with i, u and iu medials.
  DB_RHOTACISED_FINAL_APOSTROPHE = '\''
Default apostrophe used by GR syllable data in database for marking the longer and back vowel in rhotacised finals.

Inherited from RomanisationOperator: readingEntityRegex

Properties

Inherited from object: __class__

Method Details

__init__(self, **options)
(Constructor)

Creates an instance of the GROperator.

Parameters:
  • options - extra options
  • dbConnectInst - instance of a DatabaseConnector, if none is given, default settings will be assumed.
  • strictSegmentation - if True, segmentation (using segment()) and thus decomposition (using decompose()) will raise an exception if an alphabetic string is parsed that cannot be segmented into single reading entities. If False, the aforesaid string will be returned unsegmented.
  • abbreviations - if set to True abbreviated spellings will be supported.
  • GRRhotacisedFinalApostrophe - an alternate apostrophe that is taken instead of the default one for marking a longer and back vowel in rhotacised finals.
  • GRSyllableSeparatorApostrophe - an alternate apostrophe that is taken instead of the default one for separating 0-initial syllables from preceding ones.
Overrides: object.__init__

getDefaultOptions(cls)
Class Method

Returns the reading operator's default options.

The default implementation returns an empty dictionary. The keyword 'dbConnectInst' is not regarded as a configuration option of the operator and is thus not included in the dict returned.

Returns: dict
the reading operator's default options.
Overrides: ReadingOperator.getDefaultOptions
(inherited documentation)

guessReadingDialect(cls, string, includeToneless=False)
Class Method

Takes a string written in GR and guesses the reading dialect.

The options 'GRRhotacisedFinalApostrophe' and 'GRSyllableSeparatorApostrophe' are guessed. Both will be set to the same value which derives from a list of different apostrophes and similar characters.

Parameters:
  • string (str) - GR string
Returns: dict
dictionary of basic keyword settings
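
The guessing step described above might look roughly like the following. This is a sketch under stated assumptions, not the cjklib implementation: the candidate list APOSTROPHE_CANDIDATES, the "first match wins" rule, the fallback to the ASCII apostrophe and the function name guess_apostrophe_options are all hypothetical. Only the two option names come from the documentation.

```python
# Candidate characters are an assumption; the list used by cjklib may differ.
APOSTROPHE_CANDIDATES = ["'", "\u2019", "\u00b4", "\u2018", "`", "\u02bc"]


def guess_apostrophe_options(string, default="'"):
    """Use the first apostrophe-like character found for both options."""
    found = default
    for char in string:
        if char in APOSTROPHE_CANDIDATES:
            found = char
            break
    # Both options receive the same value, as the documentation states.
    return {
        "GRRhotacisedFinalApostrophe": found,
        "GRSyllableSeparatorApostrophe": found,
    }


print(guess_apostrophe_options("tian\u2019anmen"))
```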

getTones(self)

Returns a set of tones supported by the reading. These tones don't necessarily reflect the tones of the underlying language but may differ to reflect notational or other features.

The default implementation will raise a NotImplementedError.

Returns: list
list of supported tone marks.
Overrides: TonalFixedEntityOperator.getTones
(inherited documentation)

compose(self, readingEntities)

Composes the given list of basic entities to a string. Applies an apostrophe between syllables if the second syllable has a zero-initial.

Parameters:
  • readingEntities (list of str) - list of basic syllables or other content
Returns: str
composed entities
Overrides: ReadingOperator.compose
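
The apostrophe rule can be sketched as follows. This is an illustration only, not the cjklib code: it assumes that a zero-initial syllable is one starting with a, e, i, o or u, and that an apostrophe is inserted only when the preceding entity ends in a letter. The names compose_sketch and ZERO_INITIAL_LETTERS are hypothetical.

```python
# Assumption: zero-initial syllables start with one of these letters.
ZERO_INITIAL_LETTERS = set("aeiou")


def compose_sketch(reading_entities, apostrophe="'"):
    """Join entities, separating a zero-initial syllable from a preceding one."""
    parts = []
    previous = ""
    for entity in reading_entities:
        follows_syllable = previous[-1:].isalpha()
        zero_initial = entity[:1].lower() in ZERO_INITIAL_LETTERS
        if follows_syllable and zero_initial:
            parts.append(apostrophe)
        parts.append(entity)
        previous = entity
    return "".join(parts)


print(compose_sketch(["tian", "an", "men"]))
```

Passing a different apostrophe argument mirrors what the GRSyllableSeparatorApostrophe option is documented to do.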

isStrictDecomposition(self, readingEntities)

Checks if the given decomposition follows the romanisation format strictly to allow unambiguous decomposition.

The romanisation should offer a way/protocol to make an unambiguous decomposition into its basic syllables possible, so that the process of appending syllables to a string is reversible. The test for compliance with this protocol has to be implemented here. Thus this method can only return true for one and only one possible decomposition for all strings.

Parameters:
  • decomposition - decomposed reading string
Returns: bool
False, as this method needs to be implemented by the subclass
Overrides: RomanisationOperator.isStrictDecomposition
(inherited documentation)

_recursiveSegmentation(self, string)

Takes a string written in the romanisation and returns the possible segmentations as a tree of syllables.

The tree is represented by tuples (syllable, subtree).

Parameters:
  • string - reading string
Returns: list of tuple
a tree of possible segmentations (if ambiguous) into single syllables
Overrides: RomanisationOperator._recursiveSegmentation
(inherited documentation)
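
The tree of (syllable, subtree) tuples can be pictured with a small recursive sketch. It is not the cjklib implementation: the function segmentation_tree, the explicit entity set and the use of an empty list for a leaf are assumptions made for illustration.

```python
def segmentation_tree(string, entities):
    """Return all segmentations of string as a list of (syllable, subtree)."""
    tree = []
    for end in range(1, len(string) + 1):
        head, rest = string[:end], string[end:]
        if head not in entities:
            continue
        if not rest:
            # The whole remainder is one syllable: a leaf of the tree.
            tree.append((head, []))
        else:
            subtree = segmentation_tree(rest, entities)
            # Keep the branch only if the rest can be segmented completely.
            if subtree:
                tree.append((head, subtree))
    return tree


entities = {"fan", "fang", "an", "gan"}
print(segmentation_tree("fangan", entities))
```

With this toy entity set the string "fangan" yields two branches, fan + gan and fang + an, which is the kind of ambiguity the separating apostrophe is meant to resolve.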

removeApostrophes(self, readingEntities)

Removes apostrophes between two syllables for a given decomposition.

Parameters:
  • readingEntities (list of str) - list of basic syllables or other content
Returns: list of str
the given entity list without separating apostrophes

getBaseTone(self, tone)

Gets the tone number of the tone or the etymological tone if it is a neutral or optional neutral tone.

Parameters:
  • tone (str) - tone
Returns: int
base tone number
Raises:

splitPlainSyllableCVC(self, plainSyllable)

Splits the given plain syllable into consonants-vowels-consonants.

Parameters:
  • plainSyllable (str) - entity without tonal information
Returns: tuple of str
syllable CVC triple
Raises:

getTonalEntity(self, plainEntity, tone)

Gets the entity with tone mark for the given plain entity and tone. This method only works for plain syllables that are not r-coloured (Erlhuah forms): because of the way GR depicts Erlhuah, the information about the base syllable is lost and the pronunciation partly varies between different syllables. Use getRhotacisedTonalEntity() to get the tonal entity for a given etymological (base) syllable.

Parameters:
  • plainEntity (str) - entity without tonal information
  • tone (str) - tone
Returns: str
entity with appropriate tone
Raises:
Overrides: TonalFixedEntityOperator.getTonalEntity

splitEntityTone(self, entity)

Splits the entity into an entity without tone mark (plain entity) and the entity's tone.

The default implementation will raise a NotImplementedError.

Parameters:
  • entity - entity with tonal information
Returns: tuple
plain entity without tone mark and entity's tone
Raises:
Overrides: TonalFixedEntityOperator.splitEntityTone
(inherited documentation)

getRhotacisedTonalEntity(self, plainEntity, tone)

Gets the r-coloured entity (Erlhuah form) with tone mark for the given plain entity and tone. Not all entity-tone combinations are supported.

Parameters:
  • plainEntity (str) - entity without tonal information
  • tone (str) - tone
Returns: str
entity with appropriate tone
Raises:

To Do (Fix): Build lookup for performance reasons.

_getAbbreviatedLookup(self)

Gets the abbreviated form lookup table.

Returns: dict
lookup table of abbreviated forms

getAbbreviatedEntities(self)

Gets a list of abbreviated GR spellings.

Returns: list
list of abbreviated GR forms

isAbbreviatedEntity(self, entity)

Returns true if the given entity is an abbreviated spelling.

Reading entities will be handled as being case insensitive.

Parameters:
  • entity (str) - entity to check
Returns: bool
True if entity is an abbreviated form.
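
The case-insensitive check, together with the capitalisation handling documented for convertAbbreviatedEntity(), could be sketched like this. The table entries are illustrative assumptions and have not been checked against the cjklib database, which holds the real data. The function names and the exact capitalisation rules are likewise hypothetical.

```python
# Illustrative entries only (assumed); the real table comes from the database.
ABBREVIATIONS = {"sh": "shyh", "tz": "tzy"}


def is_abbreviated(entity):
    """Case-insensitive membership test."""
    return entity.lower() in ABBREVIATIONS


def convert_abbreviated(entity):
    """Expand an abbreviated form and carry over its capitalisation."""
    if not is_abbreviated(entity):
        return entity  # non-abbreviated forms are returned unchanged
    original = ABBREVIATIONS[entity.lower()]
    if len(entity) > 1 and entity.isupper():
        return original.upper()
    if entity[0].isupper():
        return original.capitalize()
    return original


print(convert_abbreviated("Sh"))
```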

convertAbbreviatedEntity(self, entity)

Converts the given abbreviated GR spelling to the original form. Non-abbreviated forms will be returned unchanged. Takes care of capitalisation.

Parameters:
  • entity (str) - reading entity.
Returns: str
original entity
Raises:

To Do (Fix): Move this method to the Converter; AmbiguousConversionError then no longer needs to be imported here.

getPlainReadingEntities(self)

Gets the list of plain entities supported by this reading without r-coloured forms (Erlhuah forms). Unlike the entities returned by getReadingEntities(), these entities carry no tone mark.

Returns: set of str
set of supported syllables
Overrides: TonalFixedEntityOperator.getPlainReadingEntities

getFullReadingEntities(self)

Gets a set of full entities supported by the reading excluding abbreviated forms.

Returns: set of str
set of supported syllables

getReadingEntities(self)

Gets a set of all entities supported by the reading.

The list is used in the segmentation process to find entity boundaries.

Returns: list of str
list of supported syllables
Overrides: TonalFixedEntityOperator.getReadingEntities
(inherited documentation)

isReadingEntity(self, entity)

Returns true if the given entity is recognised by the romanisation operator, i.e. it is a valid entity of the reading returned by the segmentation method.

Reading entities will be handled as being case insensitive.

Parameters:
  • entity - entity to check
Returns: bool
True if string is an entity of the reading, False otherwise.
Overrides: ReadingOperator.isReadingEntity
(inherited documentation)

Class Variable Details

TONES

Value:
['1stTone',
 '2ndTone',
 '3rdTone',
 '4thTone',
 '5thToneEtymological1st',
 '5thToneEtymological2nd',
 '5thToneEtymological3rd',
 '5thToneEtymological4th',
...
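
For the eight tone names visible in this listing, the mapping that getBaseTone() is documented to perform can be sketched as below. The sketch deliberately covers only the listed names, since the rest of the value is truncated here. The helper base_tone_sketch and its regular expression are hypothetical and not taken from cjklib.

```python
import re

# Only the tone names shown in the truncated listing.
LISTED_TONES = [
    "1stTone", "2ndTone", "3rdTone", "4thTone",
    "5thToneEtymological1st", "5thToneEtymological2nd",
    "5thToneEtymological3rd", "5thToneEtymological4th",
]

_TONE_PATTERN = re.compile(
    r"^(?:([1-4])(?:st|nd|rd|th)Tone"
    r"|5thToneEtymological([1-4])(?:st|nd|rd|th))$")


def base_tone_sketch(tone):
    """Return the tone number, or the etymological tone of a neutral tone."""
    match = _TONE_PATTERN.match(tone)
    if match is None:
        raise ValueError("tone name not covered by this sketch: %r" % tone)
    return int(match.group(1) or match.group(2))


print([base_tone_sketch(tone) for tone in LISTED_TONES])
```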

SYLLABLE_STRUCTURE

Regular expression describing the syllable structure in GR (C,V,C).

Value:
re.compile(r'^((?:tz|ts|ch|sh|[bpmfdtnlsjrgkh])?)([aeiouy]+)((?:ngl|ng|n|l)?)$')
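
A short sketch of how this expression splits a plain syllable into its three groups, in the spirit of splitPlainSyllableCVC(). The pattern is copied from the value above with the line-wrap backslash removed. The wrapper split_cvc and its error handling are assumptions and not the cjklib method.

```python
import re

SYLLABLE_STRUCTURE = re.compile(
    r'^((?:tz|ts|ch|sh|[bpmfdtnlsjrgkh])?)([aeiouy]+)((?:ngl|ng|n|l)?)$')


def split_cvc(plain_syllable):
    """Return the (consonants, vowels, consonants) triple of a plain syllable."""
    match = SYLLABLE_STRUCTURE.match(plain_syllable)
    if match is None:
        raise ValueError("not a plain GR syllable: %r" % plain_syllable)
    return match.groups()


for syllable in ("jiang", "chuang", "an", "shy"):
    print(syllable, split_cvc(syllable))
```

Both the initial and the final group are optional, so a syllable such as "an" yields an empty string in the first position.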

DB_RHOTACISED_FINAL_MAPPING

Database fields for tonal Erlhuah syllables.

Value:
{1: 'GRFinal_T1', 2: 'GRFinal_T2', 3: 'GRFinal_T3', 4: 'GRFinal_T4'}

DB_RHOTACISED_FINAL_MAPPING_ZEROINITIAL

Database fields for tonal Erlhuah syllables with i, u and iu medials.

Value:
{1: 'GRFinal_T1',
 2: 'GRFinal_T2',
 3: 'GRFinal_T3_ZEROINITIAL',
 4: 'GRFinal_T4_ZEROINITIAL'}