18 u"""
Provides functions based on the readings of Chinese characters.
This includes L{ReadingOperator}s used to handle basic operations like
decomposing strings written in a reading into their basic entities (e.g.
syllables) and, for some languages, getting tonal information, syllable onset
and rhyme, and other features. Furthermore it includes L{ReadingConverter}s
which convert strings from one reading to another.
25
26 All basic functionality can be accessed using the L{ReadingFactory} which
27 provides factory methods for creating instances of the supplied classes and also
28 acts as a façade for the functions defined there.
29
30 Examples
31 ========
32 The following examples should give a quick view into how to use this
33 package.
34 - Create the ReadingFactory object with default settings
35 (read from cjklib.conf or using cjklib.db in same directory as default):
36
37 >>> from cjklib.reading import ReadingFactory
38 >>> f = ReadingFactory()
39
40 - Create an operator for Mandarin romanisation Pinyin:
41
42 >>> pinyinOp = f.createReadingOperator('Pinyin')
43
44 - Construct a Pinyin syllable with second tone:
45
46 >>> pinyinOp.getTonalEntity(u'han', 2)
47 u'hán'
48
    - Segment the given Pinyin string into a list of syllables:
50
51 >>> pinyinOp.decompose(u"tiān'ānmén")
52 [u'ti\u0101n', u"'", u'\u0101n', u'm\xe9n']
53
54 - Do the same using the factory class as a façade to easily access
55 functions provided by those classes in the background:
56
57 >>> f.decompose(u"tiān'ānmén", 'Pinyin')
58 [u'ti\u0101n', u"'", u'\u0101n', u'm\xe9n']
59
60 - Convert the given Gwoyeu Romatzyh syllables to their pronunciation in IPA:
61
62 >>> f.convert('liow shu', 'GR', 'MandarinIPA')
63 u'li\u0259u\u02e5\u02e9 \u0282u\u02e5\u02e5'
64
65 Readings
66 ========
Han characters give only a few visual hints about how they are pronounced. The
large number of homophones further complicates deriving a character's actual
pronunciation from the given glyph. This module implements a framework and
desirable functionality to deal with the characteristics of
X{character reading}s.
72
From a programming point of view, readings in languages that use Chinese
characters differ in many ways. Some use the Roman alphabet, some carry tonal
information, some can be mapped character-wise, and some map one Chinese
character to a sequence of characters in the target system while others map
to only one character.
78
One major group of readings are X{romanisations}, which are transcriptions
into the Roman alphabet (or Cyrillic, respectively). Romanisations of tonal
languages are a subgroup that asks for even more detailed functions. The
interface implemented here tries to capture common features on different
abstraction levels while maintaining flexibility.
84
In the context of this library the term I{reading} refers to two things: the
system used to express the pronunciation (e.g. a specific romanisation) on
the one hand, and the specific reading of a given character on the other.
88
89 Technical Implementation
90 ========================
While module L{characterlookup} includes the functions for mapping a character
to its potential readings, module C{reading} specialises in all functionality
that is primarily connected to the reading of characters.
94
95 The main functions implemented here provide ways of handling text written in a
96 reading and converting between different readings.
97
98 Handling text written in a reading
99 ----------------------------------
Text written in a I{character reading} differs from other text in that it
consists of entities which map to corresponding Chinese characters. These
entities can be obtained by breaking the whole string down into a sequence of
single entities. This functionality is provided by all operators on readings
through the interface L{ReadingOperator}. The process of breaking input down
(called decomposition) can be reversed by composing the single entities into a
string.
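
As a rough illustration of decomposition and composition, independent of how
the operators in this package actually work, the sketch below splits a string
by greedy longest match against a tiny made-up syllable table. The names
TOY_SYLLABLES, toy_decompose and toy_compose are hypothetical and not part of
cjklib, and a greedy match cannot resolve genuinely ambiguous input.

```python
# Toy sketch, not cjklib code: TOY_SYLLABLES and both helpers are hypothetical.
TOY_SYLLABLES = {"tian", "an", "men", "bei", "jing"}
MAX_LEN = max(len(s) for s in TOY_SYLLABLES)


def toy_decompose(text):
    # Split text into known syllables (longest match first) and other strings.
    entities = []
    pos = 0
    while pos < len(text):
        for length in range(min(MAX_LEN, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in TOY_SYLLABLES:
                entities.append(candidate)
                pos += length
                break
        else:
            # No syllable starts here: keep the single character as it is.
            entities.append(text[pos])
            pos += 1
    return entities


def toy_compose(entities):
    # Composition reverses decomposition by joining the entities.
    return "".join(entities)


parts = toy_decompose("tian'anmen")
print(parts)
print(toy_compose(parts) == "tian'anmen")
```

Non-reading characters such as the apostrophe are kept as separate entries,
so composing the list gives back the original string.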
107
Many L{ReadingOperator}s provide additional functions, each depending on the
characteristics of the implemented reading. For readings of tonal languages,
for example, they might allow querying the tone of a given reading of a
character.
112
113 G{classtree operator.ReadingOperator}
114
115 Converting between readings
116 ---------------------------
The second part provides means for converting between different readings.
119
What all CJK languages seem to have in common is that the mapping from a
character to its reading is not reversible, as these languages are rich in
homophones. Thus the highest degree of information for a text is obtained from
the pair of characters and their reading (aside from the meaning).
124
If one has a text written in reading A and wants to obtain the text written in
reading B instead, it is not feasible to derive the reading from the
corresponding characters even if they are present, as many characters have
several pronunciations. Instead one wants to convert the reading directly from
A to B.
130
A simple means of converting between readings is provided by classes
implementing L{ReadingConverter}. This conversion might be neither surjective
nor injective, and several L{exception}s can occur.
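
To make the remark on surjectivity and injectivity concrete, here is a
minimal sketch with a made-up mapping table. It does not use any converter
from this package; SOURCE_TO_TARGET, TARGET_ENTITIES, ToyConversionError and
toy_convert_entities are hypothetical names chosen for illustration only.

```python
# Toy sketch, not cjklib code: every name below is hypothetical.
SOURCE_TO_TARGET = {"ma1": "ma", "ma3": "ma", "shi4": "shi"}
TARGET_ENTITIES = {"ma", "shi", "nga"}


class ToyConversionError(Exception):
    # Stand-in for the exceptions a real converter might raise.
    pass


def toy_convert_entities(entities):
    converted = []
    for entity in entities:
        if entity not in SOURCE_TO_TARGET:
            raise ToyConversionError("cannot convert %r" % (entity,))
        converted.append(SOURCE_TO_TARGET[entity])
    return converted


# Not injective: two different source entities share one target entity.
print(toy_convert_entities(["ma1", "ma3"]))

# Not surjective: some target entities are never produced.
print(sorted(TARGET_ENTITIES - set(SOURCE_TO_TARGET.values())))
```

Because information can be lost in one direction, converting back does not
in general restore the original text.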
134
135 G{classtree converter.ReadingConverter}
136
137 Configurable X{Reading Dialect}s
138 --------------------------------
Many readings come in specific representations even if standardised. These
may range from simple differences in typesetting (e.g. punctuation) to
special entities and derivatives.
142
Instead of selecting one default form as a global standard, cjklib lets the
user choose the preferred dialect, while still trying to offer good default
values. It does so by offering a wide range of options for handling and
conversion of readings. These options can be passed in many places and are
handed down by the system to the component that knows about the specific
configuration option. Furthermore each class implements a method that states
which options it uses by default.
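
The idea of per-class defaults combined with user-supplied options might be
sketched as follows. ToyOperator and the option names used here are
hypothetical placeholders, and the sketch makes no claim about how the
classes in this package merge or validate their options.

```python
# Toy sketch, not cjklib code: the class and its option names are hypothetical.
class ToyOperator(object):
    @classmethod
    def getDefaultOptions(cls):
        # Each class states the options it knows and their default values.
        return {"toneMarkType": "diacritics", "case": "lower"}

    def __init__(self, **options):
        unknown = set(options) - set(self.getDefaultOptions())
        if unknown:
            raise ValueError("unknown options: %s" % sorted(unknown))
        # Start from the defaults, then apply what the caller supplied.
        self.options = dict(self.getDefaultOptions(), **options)


op = ToyOperator(toneMarkType="numbers")
print(op.options["toneMarkType"], op.options["case"])
```

Options the caller leaves out keep their default value, while an option the
class does not know about is rejected early.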
150
151 A special notion of X{dialect converters} is used for L{ReadingConverter}s that
152 convert between two different representations of the same reading. These allow
153 flexible switching between reading dialects.
@todo Fix: Be independent of the locale chosen, see
    U{http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats}.
156 """
157
158 __all__ = ['operator', 'converter', 'ReadingFactory']
159
160 from cjklib.exception import UnsupportedError
161 from cjklib.dbconnector import DatabaseConnector
162 import operator
163 import converter

class ReadingFactory(object):
166 """
167 Provides an abstract factory for creating L{ReadingOperator}s and
168 L{ReadingConverter}s and a façade to directly access the methods offered by
169 these classes.
170
171 Instances of other classes are cached in the background and reused on later
172 calls for methods accessed through the façade.
173 L{createReadingOperator()} and L{createReadingConverter} can be used to
174 create new instances for use outside of the ReadingFactory.
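
    The caching described above can be pictured with the following sketch,
    which keys a cache on a name plus a hashable copy of the options. It is
    written for illustration under stated assumptions: hashable_copy and
    get_cached are hypothetical helpers, not methods of this class.

```python
# Toy sketch, not cjklib code: hashable_copy and get_cached are hypothetical.
def hashable_copy(data):
    # Recursively turn lists, sets and dicts into hashable equivalents.
    if isinstance(data, (list, tuple)):
        return tuple(hashable_copy(entry) for entry in data)
    if isinstance(data, (set, frozenset)):
        return frozenset(hashable_copy(entry) for entry in data)
    if isinstance(data, dict):
        return frozenset((key, hashable_copy(value))
                         for key, value in data.items())
    return data


_cache = {}


def get_cached(name, factory, **options):
    # One instance per (name, options) pair; later calls reuse it.
    key = (name, hashable_copy(options))
    if key not in _cache:
        _cache[key] = factory(**options)
    return _cache[key]


first = get_cached("Pinyin", dict, strictSegmentation=True)
second = get_cached("Pinyin", dict, strictSegmentation=True)
third = get_cached("Pinyin", dict, strictSegmentation=False)
print(first is second, first is third)
```

    Equal options give the same cached object, so changing that object is
    visible to every caller sharing it.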
    @todo Impl: What about hiding of inner classes?
        L{_checkSpecialOperators()} method is called for internal converters
        and for external ones delivered by L{createReadingConverter()}. The
        latter method doesn't return internal cached copies though, but
        creates new instances. L{ReadingOperator} also gets copies from
        ReadingFactory objects for internal instances. Sharing saves memory
        but changing one object will affect all other objects using this
        instance.
    @todo Impl: General reading options given for a converter with **options
        need to be used on creating an operator. How to raise errors to save
        the user from specifying an operator twice, once per options, once per
        concrete instance (similar to sourceOptions and targetOptions)?
    @todo Bug: Non-standard reading options seem to be accepted when default
        in converter:
188
189 >>> print f.convert('lao3shi1', 'Pinyin', 'MandarinIPA')
190 lau˨˩.ʂʅ˥˥
191 """
192 READING_OPERATORS = [operator.HangulOperator, operator.PinyinOperator,
193 operator.WadeGilesOperator, operator.GROperator,
194 operator.MandarinIPAOperator, operator.MandarinBrailleOperator,
195 operator.JyutpingOperator, operator.CantoneseYaleOperator,
196 operator.CantoneseIPAOperator, operator.HiraganaOperator,
197 operator.KatakanaOperator, operator.KanaOperator]
198 """A list of supported reading operators."""
199 READING_CONVERTERS = [converter.PinyinDialectConverter,
200 converter.WadeGilesDialectConverter, converter.PinyinWadeGilesConverter,
201 converter.GRDialectConverter, converter.GRPinyinConverter,
202 converter.PinyinIPAConverter, converter.PinyinBrailleConverter,
203 converter.JyutpingDialectConverter,
204 converter.CantoneseYaleDialectConverter,
205 converter.JyutpingYaleConverter, converter.BridgeConverter]
206 """A list of supported reading converters. """
207
208 sharedState = {'readingOperatorClasses': {}, 'readingConverterClasses': {}}
209 """
210 Dictionary holding global state information used by all instances of the
211 ReadingFactory.
212 """
213
    class SimpleReadingConverterAdaptor(object):
215 """
216 Defines a simple converter between two I{character reading}s that keeps
217 the real converter doing the work in the background.
218
219 The basic method is L{convert()} which converts one input string from
220 one reading to another. In contrast to a L{ReadingConverter} no source
221 or target reading needs to be specified.
222 """
        def __init__(self, converterInst, fromReading, toReading):
224 """
225 Creates an instance of the SimpleReadingConverterAdaptor.
226
227 @type converterInst: instance
228 @param converterInst: L{ReadingConverter} instance doing the actual
229 conversion work.
230 @type fromReading: str
231 @param fromReading: name of reading converted from
232 @type toReading: str
233 @param toReading: name of reading converted to
234 """
235 self.converterInst = converterInst
236 self.fromReading = fromReading
237 self.toReading = toReading
238 self.CONVERSION_DIRECTIONS = [(fromReading, toReading)]
239
        def convert(self, string, fromReading=None, toReading=None):
241 """
242 Converts a string in the source reading to the target reading.
243
244 If parameters fromReading or toReading are not given the class's
245 default values will be applied.
246
247 @type string: str
248 @param string: string written in the source reading
249 @type fromReading: str
250 @param fromReading: name of the source reading
251 @type toReading: str
252 @param toReading: name of the target reading
253 @rtype: str
254 @returns: the input string converted to the C{toReading}
255 @raise DecompositionError: if the string can not be decomposed into
256 basic entities with regards to the source reading.
257 @raise ConversionError: on operations specific to the conversion
258 between the two readings (e.g. error on converting entities).
259 @raise UnsupportedError: if source or target reading not supported
260 for conversion.
261 """
262 if not fromReading:
263 fromReading = self.fromReading
264 if not toReading:
265 toReading = self.toReading
266 return self.converterInst.convert(string, fromReading, toReading)
267
        def convertEntities(self, readingEntities, fromReading=None,
            toReading=None):
270 """
271 Converts a list of entities in the source reading to the target
272 reading.
273
274 If parameters fromReading or toReading are not given the class's
275 default values will be applied.
276
277 @type readingEntities: list of str
278 @param readingEntities: list of entities written in source reading
279 @type fromReading: str
280 @param fromReading: name of the source reading
281 @type toReading: str
282 @param toReading: name of the target reading
283 @rtype: list of str
284 @return: list of entities written in target reading
285 @raise ConversionError: on operations specific to the conversion
286 between the two readings (e.g. error on converting entities).
287 @raise UnsupportedError: if source or target reading is not
288 supported for conversion.
289 @raise InvalidEntityError: if an invalid entity is given.
290 """
291 if not fromReading:
292 fromReading = self.fromReading
293 if not toReading:
294 toReading = self.toReading
295 return self.converterInst.convertEntities(readingEntities,
296 fromReading, toReading)
297
        def __getattr__(self, name):
299 return getattr(self.converterInst, name)
300
    def __init__(self, databaseUrl=None, dbConnectInst=None):
302 """
303 Initialises the ReadingFactory.
304
305 If no parameters are given default values are assumed for the connection
306 to the database. The database connection parameters can be given in
307 databaseUrl, or an instance of L{DatabaseConnector} can be passed in
308 dbConnectInst, the latter one being preferred if both are specified.
309
310 @type databaseUrl: str
311 @param databaseUrl: database connection setting in the format
312 C{driver://user:pass@host/database}.
313 @type dbConnectInst: instance
314 @param dbConnectInst: instance of a L{DatabaseConnector}
315 @bug: Specifying another database connector overwrites settings
316 of other instances.
317 """
318
319 self.__dict__ = self.sharedState
320
321 if dbConnectInst:
322 self.db = dbConnectInst
323 else:
324 self.db = DatabaseConnector.getDBConnector(databaseUrl)
325
326
327 if self.db not in self.sharedState:
328 self.sharedState[self.db] = {}
329 self.sharedState[self.db]['readingOperatorInstances'] = {}
330 self.sharedState[self.db]['readingConverterInstances'] = {}
331
332 for readingOperator in self.READING_OPERATORS:
333 self.publishReadingOperator(readingOperator)
334 for readingConverter in self.READING_CONVERTERS:
335 self.publishReadingConverter(readingConverter)
336
337
338
    def publishReadingOperator(self, readingOperator):
340 """
341 Publishes a L{ReadingOperator} to the list and thus makes it available
342 for other methods in the library.
343
344 @type readingOperator: classobj
345 @param readingOperator: a new L{ReadingOperator} to be published
346 """
347 self.sharedState['readingOperatorClasses']\
348 [readingOperator.READING_NAME] = readingOperator
349
    def getSupportedReadings(self):
351 """
352 Gets a list of all supported readings.
353
354 @rtype: list of str
355 @return: a list of readings a L{ReadingOperator} is available for
356 """
357 return self.sharedState['readingOperatorClasses'].keys()
358
    def getReadingOperatorClass(self, readingN):
360 """
361 Gets the L{ReadingOperator}'s class for the given reading.
362
363 @type readingN: str
364 @param readingN: name of a supported reading
365 @rtype: classobj
366 @return: a L{ReadingOperator} class
367 @raise UnsupportedError: if the given reading is not supported.
368 """
369 if readingN not in self.sharedState['readingOperatorClasses']:
370 raise UnsupportedError("reading '" + readingN + "' not supported")
371 return self.sharedState['readingOperatorClasses'][readingN]
372
    def createReadingOperator(self, readingN, **options):
374 """
375 Creates an instance of a L{ReadingOperator} for the given reading.
376
377 @type readingN: str
378 @param readingN: name of a supported reading
379 @param options: options for the created instance
380 @rtype: instance
381 @return: a L{ReadingOperator} instance
382 @raise UnsupportedError: if the given reading is not supported.
383 """
384 operatorClass = self.getReadingOperatorClass(readingN)
385 return operatorClass(dbConnectInst=self.db, **options)
386
    def publishReadingConverter(self, readingConverter):
388 """
389 Publishes a L{ReadingConverter} to the list and thus makes it available
390 for other methods in the library.
391
392 @type readingConverter: classobj
        @param readingConverter: a new L{ReadingConverter} to be published
394 """
395 for fromReading, toReading in readingConverter.CONVERSION_DIRECTIONS:
396 self.sharedState['readingConverterClasses']\
397 [(fromReading, toReading)] = readingConverter
398
    def getReadingConverterClass(self, fromReading, toReading):
400 """
401 Gets the L{ReadingConverter}'s class for the given source and target
402 reading.
403
404 @type fromReading: str
405 @param fromReading: name of the source reading
406 @type toReading: str
407 @param toReading: name of the target reading
408 @rtype: classobj
409 @return: a L{ReadingConverter} class
410 @raise UnsupportedError: if conversion for the given readings is not
411 supported.
412 """
413 if not self.isReadingConversionSupported(fromReading, toReading):
414 raise UnsupportedError("conversion from '" + fromReading \
415 + "' to '" + toReading + "' not supported")
416 return self.sharedState['readingConverterClasses']\
417 [(fromReading, toReading)]
418
    def createReadingConverter(self, fromReading, toReading, *args, **options):
420 """
421 Creates an instance of a L{ReadingConverter} for the given source and
422 target reading and returns it wrapped as a
423 L{SimpleReadingConverterAdaptor}.
424
425 As L{ReadingConverter}s generally support more than one conversion
426 direction the user needs to specify which source and target reading is
427 needed on a regular instance. Wrapping the created instance in the
428 adaptor gives a simple convert() and convertEntities() routine, such
429 that on conversion the source and target readings don't have to be
        specified. Other method signatures remain unchanged.
431
432 @type fromReading: str
433 @param fromReading: name of the source reading
434 @type toReading: str
435 @param toReading: name of the target reading
436 @param args: optional list of L{RomanisationOperator}s to use for
437 handling source and target readings.
438 @param options: options for the created instance
439 @keyword hideComplexConverter: if true the L{ReadingConverter} is
440 wrapped as a L{SimpleReadingConverterAdaptor} (default).
441 @keyword sourceOperators: list of L{ReadingOperator}s used for handling
442 source readings.
443 @keyword targetOperators: list of L{ReadingOperator}s used for handling
444 target readings.
445 @keyword sourceOptions: dictionary of options to configure the
446 L{ReadingOperator}s used for handling source readings. If an
447 operator for the source reading is explicitly specified, no options
448 can be given.
449 @keyword targetOptions: dictionary of options to configure the
450 L{ReadingOperator}s used for handling target readings. If an
451 operator for the target reading is explicitly specified, no options
452 can be given.
453 @rtype: instance
454 @return: a L{SimpleReadingConverterAdaptor} or L{ReadingConverter}
455 instance
456 @raise UnsupportedError: if conversion for the given readings is not
457 supported.
458 """
459 converterClass = self.getReadingConverterClass(fromReading, toReading)
460
461 self._checkSpecialOperators(fromReading, toReading, args, options)
462
463 converterInst = converterClass(dbConnectInst=self.db, *args, **options)
464 if 'hideComplexConverter' not in options \
465 or options['hideComplexConverter']:
466 return ReadingFactory.SimpleReadingConverterAdaptor(
467 converterInst=converterInst, fromReading=fromReading,
468 toReading=toReading)
469 else:
470 return converterInst
471
    def isReadingConversionSupported(self, fromReading, toReading):
473 """
474 Checks if the conversion from reading A to reading B is supported.
475
476 @rtype: bool
477 @return: true if conversion is supported, false otherwise
478 """
479 return (fromReading, toReading) \
480 in self.sharedState['readingConverterClasses']
481
    def getDefaultOptions(self, *args):
483 """
484 Returns the default options for the L{ReadingOperator} or
485 L{ReadingConverter} applied for the given reading name or names
486 respectively.
487
        The keyword 'dbConnectInst' is not regarded as a configuration option
        and is thus not included in the dict returned.
490
        @raise ValueError: if the number of reading names given is neither one
            nor two.
492 @raise UnsupportedError: if no ReadingOperator or ReadingConverter
493 exists for the given reading or readings respectively.
494 """
495 if len(args) == 1:
496 return self.getReadingOperatorClass(args[0]).getDefaultOptions()
497 elif len(args) == 2:
498 return self.getReadingConverterClass(args[0], args[1])\
499 .getDefaultOptions()
500 else:
501 raise ValueError("Wrong number of arguments")
502
    def _getReadingOperatorInstance(self, readingN, **options):
504 """
505 Returns an instance of a L{ReadingOperator} for the given reading from
506 the internal cache and creates it if it doesn't exist yet.
507
508 @type readingN: str
509 @param readingN: name of a supported reading
510 @param options: additional options for instance
511 @rtype: instance
512 @return: a L{ReadingOperator} instance
513 @raise UnsupportedError: if the given reading is not supported.
514 @todo Impl: Get all options when calculating key for an instance and use
515 the information on standard parameters thus minimising instances in
516 cache. Same for L{_getReadingConverterInstance()}.
517 """
518
519 cacheKey = (readingN, self._getHashableCopy(options))
520
521 instanceCache = self.sharedState[self.db]['readingOperatorInstances']
522 if cacheKey not in instanceCache:
523 operator = self.createReadingOperator(readingN, **options)
524 instanceCache[cacheKey] = operator
525 return instanceCache[cacheKey]
526
    def _getReadingConverterInstance(self, fromReading, toReading, *args,
        **options):
529 """
530 Returns an instance of a L{ReadingConverter} for the given source and
531 target reading from the internal cache and creates it if it doesn't
532 exist yet.
533
534 @type fromReading: str
535 @param fromReading: name of the source reading
536 @type toReading: str
537 @param toReading: name of the target reading
538 @param args: optional list of L{RomanisationOperator}s to use for
539 handling source and target readings.
540 @param options: additional options for instance
541 @keyword sourceOperators: list of L{ReadingOperator}s used for handling
542 source readings.
543 @keyword targetOperators: list of L{ReadingOperator}s used for handling
544 target readings.
545 @keyword sourceOptions: dictionary of options to configure the
546 L{ReadingOperator}s used for handling source readings. If an
547 operator for the source reading is explicitly specified, no options
548 can be given.
549 @keyword targetOptions: dictionary of options to configure the
550 L{ReadingOperator}s used for handling target readings. If an
551 operator for the target reading is explicitly specified, no options
552 can be given.
553 @rtype: instance
        @return: a L{ReadingConverter} instance
        @raise UnsupportedError: if conversion for the given readings is not
            supported.
        @todo Fix: Reusing of instances for other supported conversion
            directions isn't that efficient if a special ReadingOperator is
            specified for one direction, that doesn't affect others.
560 """
561 self._checkSpecialOperators(fromReading, toReading, args, options)
562
563
564 cacheKey = ((fromReading, toReading), self._getHashableCopy(options))
565
566 instanceCache = self.sharedState[self.db]['readingConverterInstances']
567 if cacheKey not in instanceCache:
568 conv = self.createReadingConverter(fromReading, toReading,
569 hideComplexConverter=False, *args, **options)
570
571 for convFromReading, convToReading in conv.CONVERSION_DIRECTIONS:
572 oCacheKey = ((convFromReading, convToReading),
573 self._getHashableCopy(options))
574 if oCacheKey not in instanceCache:
575 instanceCache[oCacheKey] = conv
576 return instanceCache[cacheKey]
577
    def _checkSpecialOperators(self, fromReading, toReading, args, options):
579 """
580 Checks for special operators requested for the given source and target
581 reading.
582
583 @type fromReading: str
584 @param fromReading: name of the source reading
585 @type toReading: str
586 @param toReading: name of the target reading
587 @param args: optional list of L{RomanisationOperator}s to use for
588 handling source and target readings.
589 @param options: additional options for handling the input
590 @keyword sourceOperators: list of L{ReadingOperator}s used for handling
591 source readings.
592 @keyword targetOperators: list of L{ReadingOperator}s used for handling
593 target readings.
594 @keyword sourceOptions: dictionary of options to configure the
595 L{ReadingOperator}s used for handling source readings. If an
596 operator for the source reading is explicitly specified, no options
597 can be given.
598 @keyword targetOptions: dictionary of options to configure the
599 L{ReadingOperator}s used for handling target readings. If an
600 operator for the target reading is explicitly specified, no options
601 can be given.
602 @raise ValueError: if options are given to create a specific
603 ReadingOperator, but an instance is already given in C{args}.
604 @raise UnsupportedError: if source or target reading is not supported.
605 """
606
607 for arg in args:
            # ReadingOperator lives in the imported operator module
            if isinstance(arg, operator.ReadingOperator):
609 if arg.READING_NAME == fromReading \
610 and 'sourceOptions' in options:
611 raise ValueError(
612 "source reading operator options given, " \
613 + "but a source reading operator already exists")
614 if arg.READING_NAME == toReading \
615 and 'targetOptions' in options:
616 raise ValueError(
617 "target reading operator options given, " \
618 + "but a target reading operator already exists")
619
620 if 'sourceOptions' in options:
621 readingOp = self._getReadingOperatorInstance(fromReading,
622 **options['sourceOptions'])
623 del options['sourceOptions']
624
625
626 if 'sourceOperators' not in options:
627 options['sourceOperators'] = []
628 options['sourceOperators'].append(readingOp)
629
630 if 'targetOptions' in options:
631 readingOp = self._getReadingOperatorInstance(toReading,
632 **options['targetOptions'])
633 del options['targetOptions']
634
635
636 if 'targetOperators' not in options:
637 options['targetOperators'] = []
638 options['targetOperators'].append(readingOp)
639
640 @staticmethod
    def _getHashableCopy(data):
642 """
643 Constructs a unique hashable (partially deep-)copy for a given instance,
644 replacing non-hashable datatypes C{set}, C{dict} and C{list}
645 recursively.
646
647 @param data: non-hashable object
648 @return: hashable object, C{set} converted to a C{frozenset}, C{dict}
649 converted to a C{frozenset} of key-value-pairs (tuple), and C{list}
650 converted to a C{tuple}.
651 """
652 if type(data) == type([]) or type(data) == type(()):
653 newList = []
654 for entry in data:
655 newList.append(ReadingFactory._getHashableCopy(entry))
656 return tuple(newList)
657 elif type(data) == type(set([])):
658 newSet = set([])
659 for entry in data:
660 newSet.add(ReadingFactory._getHashableCopy(entry))
661 return frozenset(newSet)
662 elif type(data) == type({}):
663 newDict = {}
664 for key in data:
665 newDict[key] = ReadingFactory._getHashableCopy(data[key])
666 return frozenset(newDict.items())
667 else:
668 return data
669
670
671
672
    def convert(self, readingStr, fromReading, toReading, *args, **options):
674 """
675 Converts the given string in the source reading to the given target
676 reading.
677
678 @type readingStr: str
679 @param readingStr: string that needs to be converted
680 @type fromReading: str
681 @param fromReading: name of the source reading
682 @type toReading: str
683 @param toReading: name of the target reading
684 @param args: optional list of L{RomanisationOperator}s to use for
685 handling source and target readings.
686 @param options: additional options for handling the input
687 @keyword sourceOperators: list of L{ReadingOperator}s used for handling
688 source readings.
689 @keyword targetOperators: list of L{ReadingOperator}s used for handling
690 target readings.
691 @keyword sourceOptions: dictionary of options to configure the
692 L{ReadingOperator}s used for handling source readings. If an
693 operator for the source reading is explicitly specified, no options
694 can be given.
695 @keyword targetOptions: dictionary of options to configure the
696 L{ReadingOperator}s used for handling target readings. If an
697 operator for the target reading is explicitly specified, no options
698 can be given.
699 @rtype: str
700 @return: the converted string
701 @raise DecompositionError: if the string can not be decomposed into
702 basic entities with regards to the source reading or the given
703 information is insufficient.
704 @raise ConversionError: on operations specific to the conversion between
705 the two readings (e.g. error on converting entities).
706 @raise UnsupportedError: if source or target reading is not supported
707 for conversion.
708 """
709 readingConv = self._getReadingConverterInstance(fromReading, toReading,
710 *args, **options)
711 return readingConv.convert(readingStr, fromReading, toReading)
712
    def convertEntities(self, readingEntities, fromReading, toReading, *args,
        **options):
715 """
716 Converts a list of entities in the source reading to the given target
717 reading.
718
719 @type readingEntities: list of str
720 @param readingEntities: list of entities written in source reading
721 @type fromReading: str
722 @param fromReading: name of the source reading
723 @type toReading: str
724 @param toReading: name of the target reading
725 @param args: optional list of L{RomanisationOperator}s to use for
726 handling source and target readings.
727 @param options: additional options for handling the input
728 @keyword sourceOperators: list of L{ReadingOperator}s used for handling
729 source readings.
730 @keyword targetOperators: list of L{ReadingOperator}s used for handling
731 target readings.
732 @keyword sourceOptions: dictionary of options to configure the
733 L{ReadingOperator}s used for handling source readings. If an
734 operator for the source reading is explicitly specified, no options
735 can be given.
736 @keyword targetOptions: dictionary of options to configure the
737 L{ReadingOperator}s used for handling target readings. If an
738 operator for the target reading is explicitly specified, no options
739 can be given.
740 @rtype: list of str
741 @return: list of entities written in target reading
742 @raise ConversionError: on operations specific to the conversion between
743 the two readings (e.g. error on converting entities).
744 @raise UnsupportedError: if source or target reading is not supported
745 for conversion.
746 @raise InvalidEntityError: if an invalid entity is given.
747 """
748 readingConv = self._getReadingConverterInstance(fromReading, toReading,
749 *args, **options)
750 return readingConv.convertEntities(readingEntities, fromReading,
751 toReading)
752
753
754
755
    def decompose(self, string, readingN, **options):
757 """
758 Decomposes the given string into basic entities that can be mapped to
759 one Chinese character each for the given reading.
760
        The given input string can contain other non-reading characters, e.g.
        punctuation marks.
763
764 The returned list contains a mix of basic reading entities and other
765 characters e.g. spaces and punctuation marks.
766
767 @type string: str
768 @param string: reading string
769 @type readingN: str
770 @param readingN: name of reading
771 @param options: additional options for handling the input
772 @rtype: list of str
773 @return: a list of basic entities of the input string
774 @raise DecompositionError: if the string can not be decomposed.
775 @raise UnsupportedError: if the given reading is not supported.
776 """
777 readingOp = self._getReadingOperatorInstance(readingN, **options)
778 return readingOp.decompose(string)
779
    def compose(self, readingEntities, readingN, **options):
781 """
782 Composes the given list of basic entities to a string for the given
783 reading.
784
785 @type readingEntities: list of str
786 @param readingEntities: list of basic syllables or other content
787 @type readingN: str
788 @param readingN: name of reading
789 @param options: additional options for handling the input
790 @rtype: str
791 @return: composed entities
792 @raise UnsupportedError: if the given reading is not supported.
793 """
794 readingOp = self._getReadingOperatorInstance(readingN, **options)
795 return readingOp.compose(readingEntities)
796
    def isReadingEntity(self, entity, readingN, **options):
    def isReadingEntity(self, entity, readingN, **options):
        """
        Checks if the given string is an entity of the given reading.

        @type entity: str
        @param entity: entity to check
        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: bool
        @return: C{True} if string is an entity of the reading, C{False}
            otherwise.
        @raise UnsupportedError: if the given reading is not supported.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        return readingOp.isReadingEntity(entity)

    def getDecompositions(self, string, readingN, **options):
        """
        Decomposes the given string into basic entities that can be mapped to
        one Chinese character each. For ambiguous input it returns all
        possible decompositions. This method is a more general version of
        L{decompose()}.

        Each returned list consists of two kinds of items: entities of the
        romanisation and other strings.

        @type string: str
        @param string: reading string
        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: list of list of str
        @return: a list of all possible decompositions consisting of basic
            entities.
        @raise DecompositionError: if the given string has a wrong format.
        @raise UnsupportedError: if the given reading is not supported or the
            reading doesn't support the specified method.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        if not hasattr(readingOp, 'getDecompositions'):
            raise UnsupportedError("method 'getDecompositions' not supported")
        return readingOp.getDecompositions(string)

    def segment(self, string, readingN, **options):
        """
        Takes a string written in the romanisation and returns the possible
        segmentations as a list of syllables.

        In contrast to L{decompose()} this method merely segments continuous
        entities of the romanisation. Characters that are not part of the
        romanisation are not handled; that is the task of the more general
        decompose method.

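        A minimal sketch of how several segmentations can arise for one
        string. The helper C{segment_sketch} and the toy inventory are
        hypothetical, and the exhaustive recursion is only one possible way
        to enumerate the candidates. Unlike this method, the sketch returns
        an empty list instead of raising an error when no segmentation
        exists.

```python
def segment_sketch(string, syllables):
    # Base case: the empty string has exactly one segmentation, the empty one.
    if not string:
        return [[]]
    segmentations = []
    for end in range(1, len(string) + 1):
        head = string[:end]
        if head in syllables:
            # Combine this syllable with every segmentation of the remainder.
            for tail in segment_sketch(string[end:], syllables):
                segmentations.append([head] + tail)
    return segmentations


# Hypothetical toy inventory: "xian" is itself a syllable, but so are its parts.
print(segment_sketch("xian", {"xi", "an", "xian"}))
```

        For the toy inventory the sketch finds both C{xi} + C{an} and the
        single syllable C{xian}, which is the kind of ambiguity meant above.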
        @type string: str
        @param string: reading string
        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: list of list of str
        @return: a list of possible segmentations (several if ambiguous) into
            single syllables
        @raise DecompositionError: if the given string has an invalid format.
        @raise UnsupportedError: if the given reading is not supported or the
            reading doesn't support the specified method.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        if not hasattr(readingOp, 'segment'):
            raise UnsupportedError("method 'segment' not supported")
        return readingOp.segment(string)

    def isStrictDecomposition(self, decomposition, readingN, **options):
        """
        Checks if the given decomposition follows the romanisation format
        strictly to allow unambiguous decomposition.

        The romanisation should offer a way/protocol to make an unambiguous
        decomposition into its basic syllables possible, so that the process
        of appending syllables to a string is reversible. Compliance with
        this protocol is tested by the reading operator's implementation.
        Thus, for any given string, this method returns true for exactly one
        of its possible decompositions.

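        A minimal sketch of such a protocol check, assuming toneless
        Pinyin-like input and a simplified form of the Pinyin apostrophe
        convention. The helper C{is_strict_sketch} is hypothetical and is
        not the check a reading operator actually performs.

```python
def is_strict_sketch(decomposition):
    previous_was_syllable = False
    for item in decomposition:
        # Purely alphabetic items count as syllables, anything else as filler.
        is_syllable = item.isalpha()
        # A syllable starting with a, e or o that directly follows another
        # syllable needs a separating apostrophe to stay unambiguous.
        if is_syllable and previous_was_syllable and item[0].lower() in "aeo":
            return False
        previous_was_syllable = is_syllable
    return True


print(is_strict_sketch(["xi", "'", "an"]))
print(is_strict_sketch(["xi", "an"]))
```

        Under this rule C{xi} + C{an} without an apostrophe is rejected,
        because the composed string would read as the single syllable
        C{xian}.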
        @type decomposition: list of str
        @param decomposition: decomposed reading string
        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: bool
        @return: C{True} if the decomposition follows the format strictly,
            C{False} otherwise.
        @raise UnsupportedError: if the given reading is not supported or the
            reading doesn't support the specified method.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        if not hasattr(readingOp, 'isStrictDecomposition'):
            raise UnsupportedError(
                "method 'isStrictDecomposition' not supported")
        return readingOp.isStrictDecomposition(decomposition)

    def getReadingEntities(self, readingN, **options):
        """
        Gets a set of all entities supported by the reading.

        The set is used in the segmentation process to find entity boundaries.

        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: set of str
        @return: set of supported syllables
        @raise UnsupportedError: if the given reading is not supported or the
            reading doesn't support the specified method.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        if not hasattr(readingOp, 'getReadingEntities'):
            raise UnsupportedError("method 'getReadingEntities' not supported")
        return readingOp.getReadingEntities()

    def getTones(self, readingN, **options):
        """
        Returns the tones supported by the reading.

        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: list
        @return: list of supported tone marks.
        @raise UnsupportedError: if the given reading is not supported or the
            reading doesn't support the specified method.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        if not hasattr(readingOp, 'getTones'):
            raise UnsupportedError("method 'getTones' not supported")
        return readingOp.getTones()

    def getTonalEntity(self, plainEntity, tone, readingN, **options):
        """
        Gets the entity with tone mark for the given plain entity and tone.

        @type plainEntity: str
        @param plainEntity: entity without tonal information
        @param tone: tone
        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: str
        @return: entity with appropriate tone
        @raise InvalidEntityError: if the entity is invalid.
        @raise UnsupportedError: if the given reading is not supported or the
            reading doesn't support the specified method.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        if not hasattr(readingOp, 'getTonalEntity'):
            raise UnsupportedError("method 'getTonalEntity' not supported")
        return readingOp.getTonalEntity(plainEntity, tone)

    def splitEntityTone(self, entity, readingN, **options):
        """
        Splits the entity into an entity without tone mark (plain entity) and
        the entity's tone.

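        A minimal sketch of the split and its inverse, assuming a
        hypothetical notation in which the tone is a trailing digit. Both
        helper names are made up for illustration, and the sketch raises
        C{ValueError} where this method would raise C{InvalidEntityError}.
        Readings that mark tones with diacritics need more work than this.

```python
def get_tonal_entity_sketch(plain_entity, tone):
    # Attach the tone as a trailing digit, e.g. ("han", 2) becomes "han2".
    return plain_entity + str(tone)


def split_entity_tone_sketch(entity):
    # The last character must be a tone digit and something must precede it.
    if len(entity) < 2 or not entity[-1].isdigit():
        raise ValueError("no trailing tone digit in %r" % entity)
    return entity[:-1], int(entity[-1])


tonal = get_tonal_entity_sketch("han", 2)
print(tonal, split_entity_tone_sketch(tonal))
```

        The two helpers are inverses of each other, which is the relation
        between L{getTonalEntity()} and this method.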
        @type entity: str
        @param entity: entity with tonal information
        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: tuple
        @return: plain entity without tone mark and entity's tone
        @raise InvalidEntityError: if the entity is invalid.
        @raise UnsupportedError: if the given reading is not supported or the
            reading doesn't support the specified method.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        if not hasattr(readingOp, 'splitEntityTone'):
            raise UnsupportedError("method 'splitEntityTone' not supported")
        return readingOp.splitEntityTone(entity)

    def getPlainReadingEntities(self, readingN, **options):
        """
        Gets the set of plain entities supported by this reading. Unlike
        L{getReadingEntities()}, the entities carry no tone mark.

        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: set of str
        @return: set of supported syllables
        @raise UnsupportedError: if the given reading is not supported or the
            reading doesn't support the specified method.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        if not hasattr(readingOp, 'getPlainReadingEntities'):
            raise UnsupportedError(
                "method 'getPlainReadingEntities' not supported")
        return readingOp.getPlainReadingEntities()

    def isPlainReadingEntity(self, entity, readingN, **options):
        """
        Returns true if the given plain entity (without any tone mark) is
        recognised by the romanisation operator, i.e. it is a valid entity of
        the reading returned by the segmentation method.

        Reading entities are treated as case insensitive.

        @type entity: str
        @param entity: entity to check
        @type readingN: str
        @param readingN: name of reading
        @param options: additional options for handling the input
        @rtype: bool
        @return: C{True} if string is an entity of the reading, C{False}
            otherwise.
        @raise UnsupportedError: if the given reading is not supported or the
            reading doesn't support the specified method.
        """
        readingOp = self._getReadingOperatorInstance(readingN, **options)
        if not hasattr(readingOp, 'isPlainReadingEntity'):
            raise UnsupportedError(
                "method 'isPlainReadingEntity' not supported")
        return readingOp.isPlainReadingEntity(entity)
