Class GRPinyinConverter
Provides a converter between the Chinese romanisations Gwoyeu
Romatzyh and Hanyu Pinyin.
Features:
- configurable mapping of the optional neutral tone when converting from GR,
- conversion of abbreviated forms of GR.
Upper or lower case will be transferred between syllables; no special
formatting according to the standards (i.e. Pinyin) will be made. Upper/
lower case will be identified according to three classes: either the
whole syllable is upper case, only the initial letter is upper case, or
otherwise the whole syllable is assumed to be lower case.
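The three case classes described above can be illustrated with a short sketch. The helper names classify_case and transfer_case are hypothetical and not part of the converter's API; the sketch only assumes that the case class is detected on the source syllable and then applied to the converted syllable.

```python
def classify_case(syllable):
    """Return the case class of a syllable: 'upper', 'title' or 'lower'."""
    if syllable.isupper():
        return 'upper'          # whole syllable is upper case
    if syllable[:1].isupper():
        return 'title'          # only the initial letter is upper case
    return 'lower'              # everything else is treated as lower case


def transfer_case(source, target):
    """Apply the case class of `source` to the converted syllable `target`."""
    case = classify_case(source)
    if case == 'upper':
        return target.upper()
    if case == 'title':
        return target[:1].upper() + target[1:]
    return target.lower()


# GR "jong" corresponds to Pinyin "zhōng"
print(transfer_case('Jong', 'zhōng'))
print(transfer_case('JONG', 'zhōng'))
```

Mixed forms such as "jOng" fall into the third class and are handled as lower case.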
Limitations
Conversion cannot in general be done in a one-to-one manner.
Gwoyeu Romatzyh (GR) gives the etymological tone for a syllable
in the neutral tone while Pinyin doesn't. While tones in GR thus carry
more information, r-coloured syllables (Erlhuah) are rendered the way
they are pronounced, thereby losing the original syllable. Converting
those forms to Pinyin in a general manner is not possible, though
consulting the original string in Chinese characters might help to
disambiguate. Another tone-related issue is that Pinyin allows the
changed tone to be specified when dealing with tone sandhi instead of
the etymological one, while GR doesn't. Only working with the Chinese
character string might help to restore the original tone.
Conversion from Pinyin is limited, as the neutral tone in this form
cannot be transferred to GR as described above. More information is
needed to resolve this. For the other direction the neutral tone can be
mapped, but the etymological tone information is lost. For the optional
neutral tone, a mapping is done either to the neutral tone in Pinyin or
to the original (etymological) tone.
CONVERSION_DIRECTIONS = [('GR', 'Pinyin'), ('Pinyin', 'GR')]
List of tuples for specifying supported conversion directions from
reading A to reading B.
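As a minimal sketch of how such a list of direction tuples can be consulted, the following checks whether a requested conversion is supported. The function is_supported is a hypothetical helper for illustration and is not claimed to exist in the library; the list is copied from the value shown above.

```python
# Value as documented for the class variable.
CONVERSION_DIRECTIONS = [('GR', 'Pinyin'), ('Pinyin', 'GR')]


def is_supported(from_reading, to_reading):
    """Return True if converting from_reading to to_reading is listed."""
    return (from_reading, to_reading) in CONVERSION_DIRECTIONS


print(is_supported('GR', 'Pinyin'))
print(is_supported('GR', 'WadeGiles'))
```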
DEFAULT_READING_OPTIONS = {'GR': {'abbreviations': False}, 'Pi ...
Defines the default options of the reading dialect that the converter
works on internally: input is converted from the user-specified dialect
to this dialect before conversion, and output is converted from it to
the user-specified dialect afterwards.
Inherited from object: __class__
__init__(self, *args, **options)
(Constructor)
Creates an instance of the GRPinyinConverter.
- Parameters:
args - optional list of RomanisationOperators to use for handling source
and target readings.
options - extra options
dbConnectInst - instance of a DatabaseConnector, if none is given, default
settings will be assumed.
sourceOperators - list of ReadingOperators used for handling source
readings.
targetOperators - list of ReadingOperators used for handling target
readings.
GROptionalNeutralToneMapping - if set to 'original', GR syllables marked
with an optional neutral tone will be mapped to the etymological tone;
if set to 'neutral', they will be mapped to the neutral tone in Pinyin.
- Overrides:
object.__init__
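The effect of the GROptionalNeutralToneMapping option can be sketched as follows. This is an illustration under stated assumptions and not the library's implementation: tones are represented here as plain numbers with 5 standing for the neutral tone, and the function name map_optional_neutral_tone and its parameters are hypothetical.

```python
NEUTRAL_TONE = 5  # assumption: tone number used here for the neutral tone


def map_optional_neutral_tone(etymological_tone, optional_neutral, mapping):
    """Choose the Pinyin tone for a GR syllable.

    etymological_tone -- tone number (1 to 4) carried by the GR syllable
    optional_neutral  -- True if GR marks the syllable as optionally neutral
    mapping           -- 'original' or 'neutral', as in the option above
    """
    if mapping not in ('original', 'neutral'):
        raise ValueError(
            "mapping must be 'original' or 'neutral', got %r" % (mapping,))
    if optional_neutral and mapping == 'neutral':
        return NEUTRAL_TONE
    # 'original' keeps the etymological tone; so do unmarked syllables
    return etymological_tone


print(map_optional_neutral_tone(2, True, 'original'))
print(map_optional_neutral_tone(2, True, 'neutral'))
```

Syllables without the optional neutral tone marker are unaffected by the option in this sketch.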
getDefaultOptions
Returns the reading converter's default options.
The keyword 'dbConnectInst' is not regarded as a configuration option of
the converter and is thus not included in the dict returned.
- Returns: dict
- the reading converter's default options.
- Overrides:
ReadingConverter.getDefaultOptions
- (inherited documentation)
convertBasicEntity(self, entity, fromReading, toReading)
Converts a basic entity (e.g. a syllable) in the source reading to the
given target reading.
This method is called by convertEntities(), which passes a lower case
entity for conversion. The returned value should be in lower case
characters too, as convertEntities() will take care of capitalisation.
If a single entity needs to be converted, it is recommended to use
convertEntities() instead. In the general case it cannot be ensured that
a mapping from one reading to another can be done by the simple
conversion of a basic entity. One-to-many mappings are possible and
there is no guarantee that any entity of a reading recognised by
operator.ReadingOperator.isReadingEntity() will be mapped here.
The default implementation will raise a NotImplementedError.
- Parameters:
entity - string written in the source reading in lower case letters
fromReading - name of the source reading
toReading - name of the target reading
- Returns: str
- the entity converted to the
toReading in lower case
- Raises:
- Overrides:
EntityWiseReadingConverter.convertBasicEntity
- (inherited documentation)
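The contract described above (lower case in, lower case out, no guarantee that every entity is mapped) can be sketched with a table lookup. Everything in this block is hypothetical and for illustration only: the two-entry table, the exception name UnmappedEntityError and the standalone function are assumptions and do not reflect how the library stores or looks up its mappings.

```python
# Hypothetical excerpt of a GR to Pinyin syllable table.
GR_TO_PINYIN = {
    'jong': 'zhōng',
    'gwo': 'guó',
}


class UnmappedEntityError(Exception):
    """Raised in this sketch when no mapping exists for an entity."""


def convert_basic_entity(entity, table=GR_TO_PINYIN):
    """Convert one lower case syllable using a lookup table."""
    if entity != entity.lower():
        # Capitalisation is the caller's job, so reject mixed/upper case.
        raise ValueError('entity must be given in lower case: %r' % (entity,))
    try:
        return table[entity]
    except KeyError:
        raise UnmappedEntityError('no mapping for %r' % (entity,))


print(convert_basic_entity('jong'))
print(convert_basic_entity('gwo'))
```

A caller in the role of convertEntities() would lower-case each entity before the lookup and restore the capitalisation afterwards.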
DEFAULT_READING_OPTIONS
Defines the default options of the reading dialect that the converter
works on internally: input is converted from the user-specified dialect
to this dialect before conversion, and output is converted from it to
the user-specified dialect afterwards.
The most general reading dialect should be specified so as to allow for
a broad range of input.
- Value:
{'GR': {'abbreviations': False}, 'Pinyin': {'Erhua': 'oneSyllable'}}