"""
Han character library. Cjklib provides language routines related to Han
characters (characters based on Chinese characters, named Hanzi, Kanji, Hanja
and chu Han in Chinese, Japanese, Korean and Vietnamese respectively), which
are used in writing Chinese and Japanese, infrequently Korean and formerly
Vietnamese. Functionality is included for character pronunciations, radicals,
glyph components, stroke decomposition and variant information.

Supported
=========
The following functions are supported by this library:
    - Character readings (pronunciation):
        - Pinyin, Gwoyeu Romatzyh, Wade-Giles, IPA, Braille (all Mandarin)
        - Jyutping, Cantonese Yale (both Cantonese)
        - Hangul (Korean)
        - Hiragana, Katakana (Japanese)
    - Conversion of readings from one into another
    - Mapping between character and reading
    - Mapping between character and Kangxi radical (radical mapping as defined
        by the Unihan database)
    - Mapping between character and multiple radical forms
    - Mapping between character and stroke sequence
    - Mapping between character and stroke count
    - Mapping between Kangxi radical forms, radical variant forms and equivalent
        characters
    - Character variant lookup (including mapping from traditional to
        simplified Chinese forms)

Tools
=====
The following tools come with this library:
    - cjknife, provides most functions from the library on the command line.
    - buildcjkdb, builds the database from source files.

Data
====
The library comes with its own set of sources on:
    - Pinyin syllables
    - Gwoyeu Romatzyh syllables including rhotacised forms and abbreviations
    - Wade-Giles syllables
    - Jyutping syllables
    - Cantonese Yale syllables
    - Pinyin to Gwoyeu Romatzyh mapping
    - Wade-Giles to Pinyin mapping
    - Pinyin to IPA mapping
    - Pinyin to Braille mapping
    - Jyutping to Cantonese Yale mapping
    - Jyutping to IPA mapping
    - mapping of Mandarin syllables to onset and rhyme
    - mapping of Cantonese syllables to onset and rhyme
    - Kangxi radical forms
    - stroke count and stroke order
    - stroke names
    - character decomposition

See the data files for comparison with other sources.

This project makes use of the X{Unicode Han database} provided by the Unicode
Consortium: Unicode Standard Annex #38 - Unicode Han database (X{Unihan}):
U{http://www.unicode.org/reports/tr38/tr38-5.html}, 28E{.}03E{.}2008E{.}

The following data is used:
    - Character Kangxi radical information (from kRSKangxi)
    - Radical residual stroke count (from kRSKangxi) and total stroke count
        (from kTotalStrokes)
    - Mandarin character readings in Pinyin (from kMandarin, kHanyuPinlu,
        kXHC1983)
    - Cantonese character readings in Jyutping (from kCantonese)
    - Korean character readings in Hangul (from kHangul)
    - Character variant forms (from kCompatibilityVariant, kSemanticVariant,
        kSimplifiedVariant, kSpecializedSemanticVariant, kTraditionalVariant,
        kZVariant)

This includes dictionary data from:
    - kXHC1983: Xiàndài Hànyǔ Cídiǎn (现代汉语词典). Shāngwù Yìnshūguǎn, Beijing,
        1983.
    - kHanyuPinlu: Xiàndài Hànyǔ Pínlǜ Cídiǎn (現代漢語頻率詞典).
        北京語言學院語言教學研究所編著, First edition 1986/6, 2nd printing 1990/4,
        ISBN 7-5619-0094-5.

Currently no data validation scheme is implemented, as this library is still
in early development. Rather than restricting itself to a small set of data,
cjklib tries to support as many options as possible. The library tries to be
as accurate as possible, but mistakes do happen, especially for data that
differs between locales.

Dependencies
============
cjklib is written in Python and is well tested on Python 2.5.
Apart from Python itself, it needs the SQLAlchemy library and, for most of its
parts, a database back-end.
The following back-ends are currently tested:
    - SQLite, tested on SQLite 3
    - MySQL, tested on MySQL 5.0 (works only with characters from the Basic
        Multilingual Plane of Unicode, BMP)

@author: Christoph Burgmer <cburgmer@ira.uka.de>
@requires: Python 2.5+, SQLAlchemy 0.5+ and either SQLite 3+ or MySQL 5+ and
    MySQL-Python
@version: 0.1alpha

@copyright: Copyright (C) 2006-2009 Christoph Burgmer

    cjklib comes with absolutely no warranty; for details see B{License}.

    Parts of the data used by this library are copyrighted by the following
    organisations:
        - Copyright © 1991-2007 Unicode, Inc. All rights reserved. Distributed
            under the Terms of Use in U{http://www.unicode.org/copyright.html}.

            Permission is hereby granted, free of charge, to any person
            obtaining a copy of the Unicode data files and any associated
            documentation (the "Data Files") or Unicode software and any
            associated documentation (the "Software") to deal in the Data Files
            or Software without restriction, including without limitation the
            rights to use, copy, modify, merge, publish, distribute, and/or sell
            copies of the Data Files or Software, and to permit persons to whom
            the Data Files or Software are furnished to do so, provided that (a)
            the above copyright notice(s) and this permission notice appear with
            all copies of the Data Files or Software, (b) both the above
            copyright notice(s) and this permission notice appear in associated
            documentation, and (c) there is clear notice in each modified Data
            File or in the Software as well as in the documentation associated
            with the Data File(s) or Software that the data or software has been
            modified.

        - The Jyutping phrase box, Linguistic Society of Hong Kong.

            The copyright of the Jyutping phrase box belongs to the Linguistic
            Society of Hong Kong. We would like to thank the Jyutping Group of
            the Linguistic Society of Hong Kong for permission to use the
            electronic file in our research and/or product development. Note
            that the inclusion of the phrase box in the Unihan database requires
            that any products developed using the kCantonese field needs to
            include this acknowledgement.

@license: The library and all parts are distributed under the terms of the LGPL
    Version 3, 29 June 2007 (U{http://www.gnu.org/licenses/lgpl.html}) if not
    otherwise noted.
"""
__version__ = '0.1alpha'
"""The version of cjklib"""

__author__ = 'Christoph Burgmer <cburgmer@ira.uka.de>'
"""The primary author of cjklib"""

__url__ = 'http://code.google.com/p/cjklib/'
"""The URL for cjklib's homepage"""

__license__ = 'LGPL'
"""The license governing the use and distribution of cjklib"""