
Class PinyinIPAConverter

Provides a converter from the Mandarin Chinese romanisation Hanyu Pinyin to the International Phonetic Alphabet (IPA) for Standard Mandarin. The converter offers only basic support for tones; tone sandhi beyond that has to be handled by the user, who can supply a handler of their own.

The standard conversion table is based on the source mentioned below. IPA transcription depends on many factors and can vary considerably, but this source nevertheless does not appear to be error-free: the final -üan, written [yan], should be similar to -ian [iɛn], and -iong, written [yŋ], should be similar to -ong [uŋ].

Because IPA allows a wide range of representations for the same sounds, at varying levels of detail, no conversion from IPA to Pinyin is offered.

Conversion of Erhua sounds is currently not supported.



Tone sandhi

Speech in tonal languages is generally subject to tone sandhi. In Mandarin, for example, bu4 cuo4 (不错) is realised as bu2 cuo4, and in lao3shi1 (老师), where lao3 has the tone contour 214 and shi1 the contour 55, lao3 is realised with the contour 21.

When translating to IPA the converter has to deal with tone sandhi and therefore provides an option 'sandhiFunction' that can be set to a user-specified handler. PinyinIPAConverter itself provides only a very basic handler, lowThirdAndNeutralToneRule(), which applies the contour 21 to a third tone that is followed by further syllables. It relies on the user to supply proper tone information, e.g. ke2yi3 (可以) instead of the normal rendering ke3yi3, to indicate the tone sandhi on the first syllable.
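The requirement to supply ke2yi3 rather than ke3yi3 can be met by a small preprocessing step before conversion. The following is a minimal sketch, not part of cjklib: the function names are hypothetical, syllables are assumed to be given as (plain syllable, tone number) pairs, and the rule is deliberately simplified (every third tone directly followed by another third tone becomes a second tone, ignoring prosodic grouping).

```python
def apply_third_tone_sandhi(syllables):
    """Return a copy in which a third tone directly before another
    third tone is replaced by a second tone (simplified rule)."""
    result = []
    for i, (plain, tone) in enumerate(syllables):
        # Look at the tone of the following syllable in the original input.
        next_tone = syllables[i + 1][1] if i + 1 < len(syllables) else None
        if tone == 3 and next_tone == 3:
            tone = 2
        result.append((plain, tone))
    return result


def to_numbered_pinyin(syllables):
    """Join (plain syllable, tone number) pairs into numbered Pinyin."""
    return ''.join(f'{plain}{tone}' for plain, tone in syllables)


print(to_numbered_pinyin(apply_third_tone_sandhi([('ke', 3), ('yi', 3)])))
```

The resulting string can then be passed to the converter in place of the dictionary form. Sandhi that depends on the character rather than on the Pinyin syllable, such as 不 before a fourth tone, cannot be derived from tone numbers alone and is left out of this sketch.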

The handler also accounts for varying stress on syllables in the neutral tone. Following a first tone the weak syllable has a half-low pitch, following a second tone a middle pitch, following a third tone a half-high pitch and following a fourth tone a low pitch.

There are further occurrences of tone sandhi:


If tone sandhi is not taken into account, conversion from Pinyin to IPA is straightforward in most cases. There are cases, though, where even apart from tones the phonetic realisation of a syllable depends on its context. The converter handles such coarticulation effects through a hook, coarticulationFunction, to which a user-implemented function can be passed. An example implementation is given with finalECoarticulation().
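The hook mechanism described here can be illustrated with a short sketch. It is not the cjklib implementation: convert_with_hook and its parameters are hypothetical names, the regular conversion is passed in as a stand-in function, and the only behaviour taken from this documentation is that the hook receives the left context, the plain syllable, the tone and the right context, and that a return value of None falls back to the regular conversion.

```python
def convert_with_hook(syllables, regular_conversion, coarticulation_hook=None):
    """Convert (plain syllable, tone) pairs, letting an optional hook
    override the regular conversion for context-dependent syllables."""
    result = []
    for i, (plain, tone) in enumerate(syllables):
        converted = None
        if coarticulation_hook is not None:
            # The hook sees the syllables before and after the current one.
            converted = coarticulation_hook(
                syllables[:i], plain, tone, syllables[i + 1:])
        if converted is None:
            # None means "no special case": use the regular conversion.
            converted = regular_conversion(plain, tone)
        result.append(converted)
    return result


def placeholder_conversion(plain, tone):
    """Stand-in for the real table lookup; only marks the syllable."""
    return f'<{plain}:{tone}>'


def mark_neutral_de(left_context, plain, tone, right_context):
    """Toy hook: handles only a neutral-tone 'de', defers everything else."""
    if plain == 'de' and tone == 5:
        return '<de:special>'
    return None


print(convert_with_hook([('wo', 3), ('de', 5)],
                        placeholder_conversion, mark_neutral_de))
```

Returning None for everything the hook does not care about keeps user-supplied functions small, since they only need to describe the exceptional cases.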


See Also:

To Do (Lang): Support for Erhua in mapping.

To Do (Impl): Two different methods for tone sandhi and coarticulation effects?

Instance Methods
__init__(self, *args, **options)
Creates an instance of the PinyinIPAConverter.
convertEntities(self, readingEntities, fromReading='Pinyin', toReading='MandarinIPA')
Converts a list of entities in the source reading to the given target reading. Returns a list of str.
_convertSyllable(self, plainSyllable, tone)
Converts a single syllable from Pinyin to IPA.
lowThirdAndNeutralToneRule(self, entityTuples)
Converts '3rdToneRegular' to '3rdToneLow' for syllables followed by others and '5thTone' to the respective forms when following another syllable.
finalECoarticulation(self, leftContext, plainSyllable, tone, rightContext)
Example function for handling coarticulation of final e for the neutral tone.

Inherited from ReadingConverter: convert, getOption

Inherited from ReadingConverter (private): _getFromOperator, _getToOperator

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Methods
getDefaultOptions(cls)
Returns the reading converter's default options.
Class Variables
  CONVERSION_DIRECTIONS = [('Pinyin', 'MandarinIPA')]
List of tuples for specifying supported conversion directions from reading A to reading B.
  PINYIN_OPTIONS = {'Erhua': 'ignore', 'case': 'lower', 'missing...
Options for the PinyinOperator.
  TONEMARK_MAPPING = {1: '1stTone', 2: '2ndTone', 3: '3rdToneReg...
  NEUTRAL_TONE_MAPPING = {'1stTone': '5thToneHalfLow', '2ndTone'...
Mapping of neutral tone following another tone.
Properties

Inherited from object: __class__

Method Details

__init__(self, *args, **options)

Creates an instance of the PinyinIPAConverter.

  • args - optional list of RomanisationOperators to use for handling source and target readings.
  • options - extra options
  • dbConnectInst - instance of a DatabaseConnector, if none is given, default settings will be assumed.
  • sourceOperators - list of ReadingOperators used for handling source readings.
  • targetOperators - list of ReadingOperators used for handling target readings.
  • sandhiFunction - a function that handles tonal changes and converts a given list of entities to accommodate sandhi occurrences, see lowThirdAndNeutralToneRule() for the default implementation.
  • coarticulationFunction - a function that handles coarticulation effects, see finalECoarticulation() for an example implementation.
Overrides: object.__init__

getDefaultOptions(cls)
Class Method

Returns the reading converter's default options.

The keyword 'dbConnectInst' is not regarded as a configuration option of the converter and is thus not included in the dict returned.

Returns: dict
the reading converter's default options.
Overrides: ReadingConverter.getDefaultOptions
(inherited documentation)

convertEntities(self, readingEntities, fromReading='Pinyin', toReading='MandarinIPA')

Converts a list of entities in the source reading to the given target reading.

The default implementation will raise a NotImplementedError.

  • readingEntities - list of entities written in source reading
  • fromReading - name of the source reading
  • toReading - name of the target reading
Returns: list of str
list of entities written in target reading
Raises:
  • ConversionError - on operations specific to the conversion between the two readings (e.g. error on converting entities).
  • UnsupportedError - if source or target reading is not supported for conversion.
  • InvalidEntityError - if an invalid entity is given.
Overrides: ReadingConverter.convertEntities
(inherited documentation)

_convertSyllable(self, plainSyllable, tone)

Converts a single syllable from Pinyin to IPA.

  • plainSyllable (str) - plain syllable in the source reading
  • tone (int) - the syllable's tone
Returns: str
IPA representation

lowThirdAndNeutralToneRule(self, entityTuples)

Converts '3rdToneRegular' to '3rdToneLow' for syllables followed by others and '5thTone' to the respective forms when following another syllable.

This function serves as the default rule and can be overridden by passing a function as the option sandhiFunction on instantiation.

  • entityTuples (list of tuple/str) - a list of tuples and strings. An IPA entity is given as a tuple of the plain syllable and its tone; other content is given as a plain string.
Returns: list
converted entity list

To Do (Lang): What to do on several following neutral tones?
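The rule described above can be sketched as a standalone function. This is an illustration under stated assumptions, not the cjklib source: the function name is hypothetical, "followed by others" and "following another syllable" are taken to mean that the directly adjacent list entry is a syllable tuple, and the mapping contains only the NEUTRAL_TONE_MAPPING entries visible in this documentation, so any other preceding tone leaves the neutral tone unchanged.

```python
# Entries as listed under NEUTRAL_TONE_MAPPING in this documentation;
# the listing there is cut off, so this dict may be incomplete.
NEUTRAL_TONE_MAPPING = {
    '1stTone': '5thToneHalfLow',
    '2ndTone': '5thToneMiddle',
    '3rdToneLow': '5thToneHalfHigh',
    '3rdToneRegular': '5thToneHalfHigh',
    '4thTone': '5thToneLow',
    '5thTone': '5thTone',
    '5thToneHalfHigh': '5thToneHalfHigh',
    '5thToneHalfLow': '5thToneHalfLow',
}


def low_third_and_neutral_tone_rule(entity_tuples):
    """Apply the low third tone and the neutral tone variants to a list
    of (plain syllable, tone name) tuples mixed with plain strings."""
    converted = []
    for i, entity in enumerate(entity_tuples):
        if not isinstance(entity, tuple):
            # Non-syllable content is passed through untouched.
            converted.append(entity)
            continue
        syllable, tone = entity
        previous = converted[-1] if converted else None
        following = entity_tuples[i + 1] if i + 1 < len(entity_tuples) else None
        if tone == '3rdToneRegular' and isinstance(following, tuple):
            tone = '3rdToneLow'
        elif tone == '5thTone' and isinstance(previous, tuple):
            # The neutral tone depends on the (already converted) tone before it.
            tone = NEUTRAL_TONE_MAPPING.get(previous[1], tone)
        converted.append((syllable, tone))
    return converted


print(low_third_and_neutral_tone_rule(
    [('lao', '3rdToneRegular'), ('shi', '1stTone')]))
```

Because the lookup uses the already converted preceding tone, a run of neutral tones simply repeats the first variant in this sketch; that is one possible answer to the open question above, not a documented behaviour.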

finalECoarticulation(self, leftContext, plainSyllable, tone, rightContext)

Example function for handling coarticulation of final e for the neutral tone.

Only syllables with final e are considered; for all other syllables None is returned, which triggers the regular conversion method.

Pronunciation of final e

The final e found in syllables such as de and me is pronounced /ɤ/ in the general case (see source below), but /ə/ if tonal stress is missing. This implementation takes care of this for the fifth tone. If no tone is specified (None), a ConversionError is raised for the syllables affected.

Source: Hànyǔ Pǔtōnghuà Yǔyīn Biànzhèng (汉语普通话语音辨正). Page 15, Běijīng Yǔyán Dàxué Chūbǎnshè (北京语言大学出版社), 2003, ISBN 7-5619-0622-6.

  • leftContext (list of tuple/str) - syllables preceding the syllable in question in the source reading
  • plainSyllable (str) - plain syllable in the source reading
  • tone (int) - the syllable's tone
  • rightContext (list of tuple/str) - syllables following the syllable in question in the source reading
Returns: str
IPA representation
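The decision this function makes can be sketched in isolation. The code below is a simplified illustration, not the cjklib implementation: all names are hypothetical, it returns only the vowel symbol rather than a full IPA syllable, it uses the tone number 5 for the neutral tone, it returns /ɤ/ for toned syllables instead of deferring to the regular conversion, and it raises a plain ValueError where the library raises a ConversionError. The final is found by stripping a known Pinyin initial, so that syllables such as jie or ye, whose final is not a plain e, are left alone.

```python
# Pinyin initials; two-letter initials come first so 'zh' wins over 'z'.
INITIALS = ('zh', 'ch', 'sh', 'b', 'p', 'm', 'f', 'd', 't', 'n', 'l',
            'g', 'k', 'h', 'j', 'q', 'x', 'r', 'z', 'c', 's')
NEUTRAL_TONE = 5


def split_final(plain_syllable):
    """Return the syllable without its initial, if it has one."""
    for initial in INITIALS:
        if plain_syllable.startswith(initial):
            return plain_syllable[len(initial):]
    return plain_syllable


def final_e_vowel(left_context, plain_syllable, tone, right_context):
    """Return the vowel for a final e, or None for any other syllable.

    The context arguments mirror the documented signature but are not
    needed for this particular rule.
    """
    if split_final(plain_syllable) != 'e':
        return None
    if tone is None:
        raise ValueError(f'tone information required for {plain_syllable!r}')
    return 'ə' if tone == NEUTRAL_TONE else 'ɤ'


print(final_e_vowel([], 'de', 5, []))
```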

Class Variable Details


PINYIN_OPTIONS

Options for the PinyinOperator.

{'Erhua': 'ignore',
 'case': 'lower',
 'missingToneMark': 'noinfo',
 'toneMarkType': 'Numbers'}


TONEMARK_MAPPING

{1: '1stTone',
 2: '2ndTone',
 3: '3rdToneRegular',
 4: '4thTone',
 5: '5thTone'}
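
How these values fit together can be shown with a small sketch that splits a numbered Pinyin syllable into its plain form and a tone name. It is an illustration only: split_tone is a hypothetical helper, the regular expression is a simplification of real Pinyin syntax, and returning None for a missing tone digit is this sketch's reading of the 'noinfo' setting shown under the PinyinOperator options.

```python
import re

# Values as listed for TONEMARK_MAPPING in this documentation.
TONEMARK_MAPPING = {
    1: '1stTone',
    2: '2ndTone',
    3: '3rdToneRegular',
    4: '4thTone',
    5: '5thTone',
}


def split_tone(syllable):
    """Split e.g. 'lao3' into ('lao', '3rdToneRegular').

    A syllable without a tone digit yields None as its tone.
    """
    # Lower-casing mirrors the 'case': 'lower' option listed above.
    match = re.fullmatch(r'([a-zü:]+)([1-5])?', syllable.lower())
    if match is None:
        raise ValueError(f'not a numbered Pinyin syllable: {syllable!r}')
    plain, digit = match.groups()
    tone = TONEMARK_MAPPING[int(digit)] if digit else None
    return plain, tone


print(split_tone('lao3'))
```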


NEUTRAL_TONE_MAPPING

Mapping of neutral tone following another tone.

{'1stTone': '5thToneHalfLow',
 '2ndTone': '5thToneMiddle',
 '3rdToneLow': '5thToneHalfHigh',
 '3rdToneRegular': '5thToneHalfHigh',
 '4thTone': '5thToneLow',
 '5thTone': '5thTone',
 '5thToneHalfHigh': '5thToneHalfHigh',
 '5thToneHalfLow': '5thToneHalfLow',