Class PinyinIPAConverter
Provides a converter from the Mandarin Chinese romanisation
Hanyu Pinyin to the International Phonetic Alphabet
(IPA) for Standard Mandarin. This converter provides only basic
support for tones; to handle tone sandhi beyond the default rule the
user needs to supply additional handling.
The standard conversion table is based on the source mentioned below.
Depiction in IPA depends on many factors and can therefore vary
considerably, but this source does not appear to be error-free: final
-üan, written [yan], should be similar to -ian [iɛn], and -iong,
written [yŋ], should be similar to -ong [uŋ].
As IPA allows a wide range of different representations for the same
sounds, no conversion from IPA to Pinyin is offered.
Conversion of Erhua sounds is currently not supported.
Features:
- default tone sandhi handling for the low third tone and the neutral tone,
- extensibility of tone sandhi handling,
- extensibility for general coarticulation effects.
Limitations:
- tone sandhi needs special treatment depending on the user's needs,
- transcription of onomatopoeic words is limited to the general
syllable scheme,
- limited linking between syllables (e.g. for 啊、呕) is not
considered, and
- stress, intonation and accented speech are not covered.
Tone sandhi
Speech in tonal languages is generally subject to tone sandhi. For
example, in Mandarin bu4 cuo4 for 不错 is rendered as bu2
cuo4, and in lao3shi1 (老师), with a tone contour of 214 for
lao3 and 55 for shi1, the contour of lao3 is reduced to 21.
When translating to IPA the system has to deal with tone sandhi and
therefore provides an option 'sandhiFunction' that can be set to a
user-specified handler. PinyinIPAConverter itself only provides a
very basic handler, lowThirdAndNeutralToneRule(), which applies the
contour 21 to a third tone that is followed by other syllables. For
all other cases the user has to supply proper tone information, e.g.
ke2yi3 (可以) instead of the normal rendering ke3yi3 to indicate the
tone sandhi on the first syllable.
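A user-specified handler could, for instance, apply the third tone sandhi automatically so that ke3yi3 does not have to be rewritten by hand. The following is only a minimal sketch and not part of the library: the name thirdToneSandhi is hypothetical, the entity format (tuples of plain syllable and tone name, plain strings for other content) follows the description of lowThirdAndNeutralToneRule(), the syllable strings are placeholders, and the exact call signature expected by 'sandhiFunction' (for example whether the converter instance is passed as first argument) is an assumption that should be checked against the source code.

```python
def thirdToneSandhi(entityTuples):
    """Hypothetical handler: a third tone directly before another
    third tone becomes a second tone (ke3yi3 -> ke2yi3)."""
    converted = []
    for idx, entity in enumerate(entityTuples):
        # Entity directly following the current one, if there is any.
        following = None
        if idx + 1 < len(entityTuples):
            following = entityTuples[idx + 1]
        if (isinstance(entity, tuple) and isinstance(following, tuple)
                and entity[1] == '3rdToneRegular'
                and following[1] == '3rdToneRegular'):
            converted.append((entity[0], '2ndTone'))
        else:
            # Strings and all other syllables are passed through.
            converted.append(entity)
    return converted


# Placeholder syllables standing in for ke3 yi3.
print(thirdToneSandhi([('kʰɤ', '3rdToneRegular'), ('i', '3rdToneRegular')]))
```

Each decision is taken against the original list, so in a run of three third tones the first two both change. Whether that is the desired reading depends on the phrase structure, which this sketch does not look at.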
Further support is provided for the varying realisation of syllables
in the neutral tone. Following a first tone the weak syllable has a
half-low pitch, following a second tone a middle pitch, following a
third tone a half-high pitch and following a fourth tone a low pitch.
There are further occurrences of tone sandhi:
- the pronunciations of 一 and 不 vary in tone depending on
their context,
- directional complements like 拿出来 ná chu lai lose their tone
under some circumstances,
- in a three-syllable group ABC the second syllable B changes from
second tone to first tone when A is in the first or second tone and
C is not in the neutral tone.
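As an illustration of the first case, the change of 不 from fourth to second tone before another fourth tone (bu4 cuo4 to bu2 cuo4) could be applied before the syllables are handed to the converter. This is a sketch under stated assumptions: the function applyBuSandhi and its input format of (character, plain syllable, tone number) triples are hypothetical and not part of the library. The character is needed because other words are also read bu4.

```python
def applyBuSandhi(annotated):
    """annotated: list of (character, plainSyllable, toneNumber) triples.
    Returns a new list in which 不 carries the second tone whenever the
    directly following syllable carries the fourth tone."""
    result = []
    for idx, (char, syllable, tone) in enumerate(annotated):
        followedByFourth = (idx + 1 < len(annotated)
                            and annotated[idx + 1][2] == 4)
        if char == '不' and tone == 4 and followedByFourth:
            tone = 2
        result.append((char, syllable, tone))
    return result


print(applyBuSandhi([('不', 'bu', 4), ('错', 'cuo', 4)]))
```

The rules for 一 depend on more context (for example ordinal use) and are left out here.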
Coarticulation
In most cases conversion from Pinyin to IPA is straightforward if
one does not take tone sandhi into account. There are cases though
(even leaving tones aside) where the phonetic realisation of a syllable
depends on its context. The converter allows for handling such
coarticulation effects through a hook coarticulationFunction, to which
a user-implemented function can be given. An example implementation is
given with finalECoarticulation().
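How such a hook can make use of its context may be sketched as follows. Everything here is illustrative and assumed rather than taken from the library: linkingA is a hypothetical hook that realises the particle a as [na] after a syllable ending in -n, convertWithHook is a stand-in for the converter's loop, and TOY_IPA is a two-entry toy table. The parameter list mirrors finalECoarticulation() without the instance argument, and returning None to request the regular conversion follows the behaviour documented for that method.

```python
# Toy lookup table for illustration only, not the converter's table.
TOY_IPA = {'kan': 'kʰan', 'a': 'a'}


def linkingA(leftContext, plainSyllable, tone, rightContext):
    """Hypothetical hook: 'a' becomes [na] after a syllable in -n."""
    if plainSyllable != 'a' or not leftContext:
        return None
    previous = leftContext[-1]
    if isinstance(previous, tuple) and previous[0].endswith('n'):
        return 'na'
    return None


def convertWithHook(entities, hook):
    """Stand-in loop: ask the hook first and fall back to the regular
    table lookup when the hook returns None."""
    result = []
    for idx, entity in enumerate(entities):
        if not isinstance(entity, tuple):
            result.append(entity)  # non-syllable content is kept
            continue
        plainSyllable, tone = entity
        ipa = hook(entities[:idx], plainSyllable, tone, entities[idx + 1:])
        if ipa is None:
            ipa = TOY_IPA[plainSyllable]
        result.append(ipa)
    return result


print(convertWithHook([('kan', 4), ('a', 5)], linkingA))
```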
Source
- Hànyǔ Pǔtōnghuà Yǔyīn Biànzhèng (汉语普通话语音辨正). Page 15, Běijīng Yǔyán
Dàxué Chūbǎnshè (北京语言大学出版社), 2003, ISBN 7-5619-0622-6.
- San Duanmu: The Phonology of Standard Chinese. Second edition,
Oxford University Press, 2007, ISBN 978-0-19-921578-2, ISBN
978-0-19-921579-9.
- Yuen Ren Chao: A Grammar of Spoken Chinese. University of
California Press, Berkeley, 1968, ISBN 0-520-00219-9.
To Do (Lang):
Support for Erhua in mapping.
To Do (Impl):
Two different methods for tone sandhi and coarticulation effects?
list of str
convertEntities(self, readingEntities, fromReading='Pinyin', toReading='MandarinIPA')
Converts a list of entities in the source reading to the given target
reading.
str
list
lowThirdAndNeutralToneRule(self, entityTuples)
Converts '3rdToneRegular' to '3rdToneLow'
for syllables followed by others and '5thTone' to the
respective forms when following another syllable.
str
finalECoarticulation(self, leftContext, plainSyllable, tone, rightContext)
Example function for handling coarticulation of final e for
the neutral tone.
Inherited from ReadingConverter: convert, getOption
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__
CONVERSION_DIRECTIONS = [('Pinyin', 'MandarinIPA')]
List of tuples for specifying supported conversion directions from
reading A to reading B.
PINYIN_OPTIONS = {'Erhua': 'ignore', 'case': 'lower', 'missing...
Options for the PinyinOperator.
TONEMARK_MAPPING = {1: '1stTone', 2: '2ndTone', 3: '3rdToneReg...
NEUTRAL_TONE_MAPPING = {'1stTone': '5thToneHalfLow', '2ndTone'...
Mapping of neutral tone following another tone.
Inherited from object: __class__
__init__(self, *args, **options)
(Constructor)
Creates an instance of the PinyinIPAConverter.
- Parameters:
args - optional list of RomanisationOperators to use for handling source
and target readings.
options - extra options
dbConnectInst - instance of a DatabaseConnector; if none is given, default
settings are assumed.
sourceOperators - list of ReadingOperators used for handling source
readings.
targetOperators - list of ReadingOperators used for handling target
readings.
sandhiFunction - a function that handles tonal changes and converts a given list
of entities to accommodate sandhi occurrences, see lowThirdAndNeutralToneRule() for the default
implementation.
coarticulationFunction - a function that handles coarticulation effects, see finalECoarticulation() for an example
implementation.
- Overrides:
object.__init__
|
Returns the reading converter's default options.
The keyword 'dbConnectInst' is not regarded as a configuration option of
the converter and is thus not included in the dict returned.
- Returns: dict
- the reading converter's default options.
- Overrides:
ReadingConverter.getDefaultOptions
- (inherited documentation)
convertEntities(self, readingEntities, fromReading='Pinyin', toReading='MandarinIPA')
Converts a list of entities in the source reading to the given target
reading.
The default implementation will raise a NotImplementedError.
- Parameters:
readingEntities - list of entities written in source reading
fromReading - name of the source reading
toReading - name of the target reading
- Returns: list of str
- list of entities written in target reading
- Raises:
ConversionError - on operations specific to the conversion between the two readings
(e.g. error on converting entities).
UnsupportedError - if source or target reading is not supported for conversion.
InvalidEntityError - if an invalid entity is given.
- Overrides:
ReadingConverter.convertEntities
- (inherited documentation)
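The check behind UnsupportedError can be pictured as a lookup in CONVERSION_DIRECTIONS. The snippet below is a self-contained sketch and not the library's code: UnsupportedError is redefined locally as a stand-in and the helper checkDirection is hypothetical.

```python
class UnsupportedError(Exception):
    """Local stand-in for the exception of the same name."""


# Value as documented for this class: only Pinyin to IPA is offered.
CONVERSION_DIRECTIONS = [('Pinyin', 'MandarinIPA')]


def checkDirection(fromReading, toReading):
    """Raise UnsupportedError unless the direction is listed."""
    if (fromReading, toReading) not in CONVERSION_DIRECTIONS:
        raise UnsupportedError(
            "conversion from '%s' to '%s' is not supported"
            % (fromReading, toReading))


checkDirection('Pinyin', 'MandarinIPA')  # passes silently
try:
    checkDirection('MandarinIPA', 'Pinyin')
except UnsupportedError as error:
    print(error)
```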
_convertSyllable(self, plainSyllable, tone)
Converts a single syllable from Pinyin to IPA.
- Parameters:
plainSyllable (str) - plain syllable in the source reading
tone (int) - the syllable's tone
- Returns: str
- IPA representation
lowThirdAndNeutralToneRule(self, entityTuples)
Converts '3rdToneRegular' to '3rdToneLow'
for syllables followed by others and '5thTone' to the
respective forms when following another syllable.
This function serves as the default rule and can be overridden by
giving a function as option sandhiFunction on
instantiation.
- Parameters:
entityTuples (list of tuple/str) - a list of tuples and strings. An IPA entity is given as a tuple
with the plain syllable and its tone, other content is given as
plain string.
- Returns: list
- converted entity list
To Do (Lang):
What to do on several following neutral tones?
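The rule described above can be approximated in a few lines. This is a sketch written from the description and not the library's implementation. It assumes that "followed by others" and "following another syllable" refer to directly adjacent tuple entities, uses only the first five entries of NEUTRAL_TONE_MAPPING, and leaves a neutral tone that follows another neutral tone unchanged, since that case is the open question noted above. The function name lowThirdAndNeutralToneSketch and the placeholder syllables are hypothetical.

```python
NEUTRAL_TONE_MAPPING = {
    '1stTone': '5thToneHalfLow',
    '2ndTone': '5thToneMiddle',
    '3rdToneLow': '5thToneHalfHigh',
    '3rdToneRegular': '5thToneHalfHigh',
    '4thTone': '5thToneLow',
}


def lowThirdAndNeutralToneSketch(entityTuples):
    converted = []
    precedingTone = None  # tone of the directly preceding syllable
    for idx, entity in enumerate(entityTuples):
        if not isinstance(entity, tuple):
            # Other content breaks the syllable sequence.
            converted.append(entity)
            precedingTone = None
            continue
        plainSyllable, tone = entity
        followedBySyllable = (idx + 1 < len(entityTuples)
                              and isinstance(entityTuples[idx + 1], tuple))
        if tone == '3rdToneRegular' and followedBySyllable:
            tone = '3rdToneLow'
        elif tone == '5thTone' and precedingTone is not None:
            # Unknown preceding tones leave the neutral tone as it is.
            tone = NEUTRAL_TONE_MAPPING.get(precedingTone, tone)
        converted.append((plainSyllable, tone))
        precedingTone = tone
    return converted


print(lowThirdAndNeutralToneSketch([('ma', '1stTone'), ('ma', '5thTone')]))
```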
finalECoarticulation(self, leftContext, plainSyllable, tone, rightContext)
Example function for handling coarticulation of final e for the
neutral tone.
Only syllables with final e are considered; for other syllables
None is returned, which triggers the regular conversion
method.
Pronunciation of final e
The final e found in syllables de, me and
others is pronounced /ɤ/ in the general case (see source below), but if
tonal stress is missing it is pronounced /ə/. This implementation
takes care of this for the fifth tone. If no tone is specified
(None), a ConversionError is raised for the syllables
affected.
Source: Hànyǔ Pǔtōnghuà Yǔyīn Biànzhèng (汉语普通话语音辨正). Page 15,
Běijīng Yǔyán Dàxué Chūbǎnshè (北京语言大学出版社), 2003, ISBN
7-5619-0622-6.
- Parameters:
leftContext (list of tuple/str) - syllables preceding the syllable in question in the source
reading
plainSyllable (str) - plain syllable in the source reading
tone (int) - the syllable's tone
rightContext (list of tuple/str) - syllables following the syllable in question in the source
reading
- Returns: str
- IPA representation
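The behaviour documented above could look roughly like the following sketch. It is not the library's implementation: ConversionError is redefined locally as a stand-in, finalESketch is a hypothetical name, TOY_INITIALS covers just three initials for illustration, and the test for "final e" (an e preceded only by consonant letters other than y) is an assumption made for this example.

```python
class ConversionError(Exception):
    """Local stand-in for the exception of the same name."""


# Toy table of Pinyin initials and their IPA form, illustration only.
TOY_INITIALS = {'d': 't', 'm': 'm', 'l': 'l'}


def finalESketch(leftContext, plainSyllable, tone, rightContext):
    initial = plainSyllable[:-1]
    # ie, ye, üe and the like do not count as final e here.
    hasFinalE = (plainSyllable.endswith('e')
                 and not any(char in 'aeiouüy' for char in initial))
    if not hasFinalE or initial not in TOY_INITIALS:
        return None  # let the regular conversion handle it
    if tone is None:
        raise ConversionError(
            "tone needed to convert syllable '%s'" % plainSyllable)
    if tone == 5:
        return TOY_INITIALS[initial] + 'ə'
    return None  # toned syllables keep the regular /ɤ/


print(finalESketch([], 'de', 5, []))
```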
PINYIN_OPTIONS
Options for the PinyinOperator.
- Value:
{'Erhua': 'ignore',
 'case': 'lower',
 'missingToneMark': 'noinfo',
 'toneMarkType': 'Numbers'}
TONEMARK_MAPPING
- Value:
{1: '1stTone',
 2: '2ndTone',
 3: '3rdToneRegular',
 4: '4thTone',
 5: '5thTone'}
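A short sketch of how this mapping can be used to turn a syllable with a tone number into a pair of plain syllable and tone name. The helper splitNumberedSyllable is hypothetical, and returning None for a missing tone digit is an assumption made for this example.

```python
TONEMARK_MAPPING = {1: '1stTone', 2: '2ndTone', 3: '3rdToneRegular',
                    4: '4thTone', 5: '5thTone'}


def splitNumberedSyllable(entity):
    """Split e.g. 'lao3' into ('lao', '3rdToneRegular')."""
    if entity and entity[-1] in '12345':
        return entity[:-1], TONEMARK_MAPPING[int(entity[-1])]
    # No tone digit given: keep the syllable, tone unknown.
    return entity, None


print(splitNumberedSyllable('lao3'))
```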
NEUTRAL_TONE_MAPPING
Mapping of neutral tone following another tone.
- Value:
{'1stTone': '5thToneHalfLow',
 '2ndTone': '5thToneMiddle',
 '3rdToneLow': '5thToneHalfHigh',
 '3rdToneRegular': '5thToneHalfHigh',
 '4thTone': '5thToneLow',
 '5thTone': '5thTone',
 '5thToneHalfHigh': '5thToneHalfHigh',
 '5thToneHalfLow': '5thToneHalfLow',
...