Source Code for Package cjklib

# -*- coding: utf-8 -*-
# This file is part of cjklib.
#
# cjklib is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# cjklib is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with cjklib.  If not, see <http://www.gnu.org/licenses/>.

"""
Han character library. Cjklib provides language routines related to Han
characters (characters based on Chinese characters, named Hanzi, Kanji, Hanja
and chu Han respectively) used in writing Chinese, Japanese, infrequently
Korean and formerly Vietnamese. Functionality is included for character
pronunciations, radicals, glyph components, stroke decomposition and variant
information.

Supported
=========
The following functions are supported by this library:
    - Character readings (pronunciation):
        - Pinyin, Gwoyeu Romatzyh, Wade-Giles, IPA, Braille (all Mandarin)
        - Jyutping, Cantonese Yale (both Cantonese)
        - Hangul (Korean)
        - Hiragana, Katakana (Japanese)
    - Conversion of readings from one into another
    - Mapping between character and reading
    - Mapping between character and Kangxi radical (radical mapping as defined
        by the Unihan database)
    - Mapping between character and multiple radical forms
    - Mapping between character and stroke sequence
    - Mapping between character and stroke count
    - Mapping between Kangxi radical forms, radical variant forms and equivalent
        characters
    - Character variant lookup (including mapping from traditional to
        simplified Chinese forms)

Tools
=====
The following tools come with this library:
    - cjknife, provides most functions from the library on the command line.
    - buildcjkdb, builds the database from source files.

Data
====
The library comes with its own set of sources on:
    - Pinyin syllables
    - Gwoyeu Romatzyh syllables including rhotacised forms and abbreviations
    - Wade-Giles syllables
    - Jyutping syllables
    - Cantonese Yale syllables
    - Pinyin to Gwoyeu Romatzyh mapping
    - Wade-Giles to Pinyin mapping
    - Pinyin to IPA mapping
    - Pinyin to Braille mapping
    - Jyutping to Cantonese Yale mapping
    - Jyutping to IPA mapping
    - mapping of Mandarin syllables to onset and rhyme
    - mapping of Cantonese syllables to onset and rhyme
    - Kangxi radical forms
    - stroke count and stroke order
    - stroke names
    - character decomposition

See the data files for comparison with other sources.

This project makes use of the X{Unicode Han database} provided by the Unicode
Consortium: Unicode Standard Annex #38 - Unicode Han database (X{Unihan}):
U{http://www.unicode.org/reports/tr38/tr38-5.html}, 28E{.}03E{.}2008E{.}

The following data is used:
    - Character Kangxi radical information (from kRSKangxi)
    - Radical residual stroke count (from kRSKangxi) and total stroke count
        (from kTotalStrokes)
    - Mandarin character readings in Pinyin (from kMandarin, kHanyuPinlu,
        kXHC1983)
    - Cantonese character readings in Jyutping (from kCantonese)
    - Korean character readings in Hangul (from kHangul)
    - Character variant forms (from kCompatibilityVariant, kSemanticVariant,
        kSimplifiedVariant, kSpecializedSemanticVariant, kTraditionalVariant,
        kZVariant)

This includes dictionary data from:
    - kXHC1983:  Xiàndài Hànyǔ Cídiǎn (现代汉语词典). Shāngwù Yìnshūguǎn, Beijing,
        1983.
    - kHanyuPinlu: Xiàndài Hànyǔ Pínlǜ Cídiǎn (現代漢語頻率詞典).
        北京語言學院語言教學研究所編著, First edition 1986/6, 2nd printing 1990/4,
        ISBN 7-5619-0094-5.

Currently no data validation scheme is implemented, as this library is still
in early development. Rather than restricting itself to a small set of data,
cjklib tries to support as many options as possible. The library tries to be as
accurate as possible, but mistakes do happen, especially for data that differs
between locales.

Dependencies
============
cjklib is written in Python and is well tested on Python 2.5.
Apart from Python, it needs the SQLAlchemy library and, for most of its parts,
a database back-end.
The following back-ends are currently tested:
    - SQLite, tested on SQLite 3
    - MySQL, tested on MySQL 5.0 (works only with characters from the Basic
        Multilingual Plane in Unicode, BMP)

@author: Christoph Burgmer <cburgmer@ira.uka.de>
@requires: Python 2.5+, SQLAlchemy 0.5+ and either SQLite 3+ or MySQL 5+ and
    MySQL-Python
@version: 0.1alpha

@copyright: Copyright (C) 2006-2009 Christoph Burgmer

    cjklib comes with absolutely no warranty; for details see B{License}.

    Parts of the data used by this library are copyrighted by the following
    organisations:
        - Copyright © 1991-2007 Unicode, Inc. All rights reserved. Distributed
            under the Terms of Use in U{http://www.unicode.org/copyright.html}.

            Permission is hereby granted, free of charge, to any person
            obtaining a copy of the Unicode data files and any associated
            documentation (the "Data Files") or Unicode software and any
            associated documentation (the "Software") to deal in the Data Files
            or Software without restriction, including without limitation the
            rights to use, copy, modify, merge, publish, distribute, and/or sell
            copies of the Data Files or Software, and to permit persons to whom
            the Data Files or Software are furnished to do so, provided that (a)
            the above copyright notice(s) and this permission notice appear with
            all copies of the Data Files or Software, (b) both the above
            copyright notice(s) and this permission notice appear in associated
            documentation, and (c) there is clear notice in each modified Data
            File or in the Software as well as in the documentation associated
            with the Data File(s) or Software that the data or software has been
            modified.

        - The Jyutping phrase box, Linguistic Society of Hong Kong.

            The copyright of the Jyutping phrase box belongs to the Linguistic
            Society of Hong Kong. We would like to thank the Jyutping Group of
            the Linguistic Society of Hong Kong for permission to use the
            electronic file in our research and/or product development. Note
            that the inclusion of the phrase box in the Unihan database requires
            that any products developed using the kCantonese field need to
            include this acknowledgement.

@license: The library and all parts are distributed under the terms of the LGPL
Version 3, 29 June 2007 (U{http://www.gnu.org/licenses/lgpl.html}) if not
otherwise noted.
"""
__version__ = '0.1alpha'
"""The version of cjklib"""

__author__ = 'Christoph Burgmer <cburgmer@ira.uka.de>'
"""The primary author of cjklib"""

__url__ = 'http://code.google.com/p/cjklib/'
"""The URL for cjklib's homepage"""

__license__ = 'LGPL'
"""The license governing the use and distribution of cjklib"""
167