Script buildcjkdb
Builds the database for cjklib.
For the Unihan data, only characters in the Basic Multilingual Plane
(BMP), with code points between U+0000 and U+FFFF, are currently included,
because MySQL < 6 doesn't support 4-byte UTF-8. To include characters
outside the BMP, change 'UnihanBMPBuilder' and 'SlimUnihanBMPBuilder' to
'UnihanBuilder' and 'SlimUnihanBuilder', respectively.
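As a minimal illustration of the BMP restriction described above (not code taken from the script), the following sketch tests whether a character lies in the BMP and shows why characters outside it need 4-byte UTF-8. The helper names is_bmp and utf8_length are assumptions made for this example.

```python
def is_bmp(char):
    """Return True if the single character lies in the BMP (U+0000..U+FFFF)."""
    return ord(char) <= 0xFFFF


def utf8_length(char):
    """Return the number of bytes the character occupies in UTF-8."""
    return len(char.encode("utf-8"))


# U+4E2D lies inside the BMP, U+20000 (CJK Extension B) lies outside it.
for ch in ["\u4e2d", "\U00020000"]:
    print("U+%04X in BMP: %s, UTF-8 bytes: %d"
          % (ord(ch), is_bmp(ch), utf8_length(ch)))
```

Characters inside the BMP need at most three bytes in UTF-8, while every character outside it needs four, which is the case the text says MySQL < 6 cannot store.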
On MS Windows, the default Python versions provided seem to be "narrow
builds" that do not support characters outside the BMP (see e.g.
http://wordaligned.org/articles/narrow-python). Unicode characters outside
the BMP are thus currently not supported on Windows platforms.
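Whether an interpreter is a narrow build can be checked at run time. The sketch below relies only on sys.maxunicode from the standard library; the function name is_narrow_build is an assumption for this example and is not part of the script.

```python
import sys


def is_narrow_build():
    """Return True if this interpreter cannot represent characters
    above U+FFFF as single code points (a "narrow build")."""
    return sys.maxunicode == 0xFFFF


if is_narrow_build():
    print("Narrow build: characters outside the BMP are not supported.")
else:
    print("Wide build: sys.maxunicode is U+%X." % sys.maxunicode)
```

Note that the narrow/wide distinction applies to older interpreters; from Python 3.3 on, sys.maxunicode is always 0x10FFFF, so the check returns False there.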
Some TableBuilders assume specific names for the files they load (the
builder is only given the directory of the data files), so the input files
must be named according to the builder's settings.
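Because a builder only knows the data directories, a lookup by fixed file name might look roughly like the sketch below. This is a hypothetical illustration, not the builders' actual code: the function find_data_file is an assumption, and the file name 'Unihan.txt' is used purely as an example.

```python
import os


def find_data_file(file_name, search_paths):
    """Return the first path in search_paths that contains file_name,
    or None if no directory holds a file of that exact name."""
    for directory in search_paths:
        candidate = os.path.join(directory, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Directories as listed in DEFAULT_DATA_PATH; the file name is illustrative.
search_paths = [".", "/tmp/cjklib-read-only/cjklib/data"]
path = find_data_file("Unihan.txt", search_paths)
if path is None:
    print("File not found; check that it is named as the builder expects.")
else:
    print("Using", path)
```

A file stored under a different name in the same directory would not be found, which is why the naming matters.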
Copyright:
Copyright (C) 2006-2008 Christoph Burgmer.
To Do (Impl):
Add option for rebuilding dependencies by setting
rebuildDepending=False/True (True by default). Consider asking the user
if all dependencies should be rebuilt.

version()
Prints the version of this script.

usage()
Prints the usage for this script.

printFormattedLine(outputString, lineLength=80, subsequentPrefix='')
Formats the given input string to fit an output with a limited line
length and prints it to stdout using the system's encoding.

main()
Main method of the script.

DEFAULT_DATA_PATH = ['.', '/tmp/cjklib-read-only/cjklib/data']

buildModulePath = '/tmp/cjklib-read-only/cjklib'

BUILD_GROUPS = {'KangxiRadicalData': ['CharacterKangxiRadical' ...
Definition of build groups available to the user.

printFormattedLine(outputString, lineLength=80, subsequentPrefix='')
Formats the given input string to fit an output with a limited line
length and prints it to stdout using the system's encoding.
- Parameters:
outputString (str) - a string that is formatted to fit the screen
lineLength (int) - width of the screen
subsequentPrefix (str) - prefix used after a line break

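The wrapping behaviour described for printFormattedLine can be approximated with the standard textwrap module. The sketch below is not the script's implementation: the name format_line is an assumption, and it returns the wrapped text instead of printing it in the system's encoding.

```python
import textwrap


def format_line(output_string, line_length=80, subsequent_prefix=""):
    """Wrap output_string to line_length columns; every line after the
    first is prefixed with subsequent_prefix."""
    return textwrap.fill(output_string, width=line_length,
                         subsequent_indent=subsequent_prefix)


text = "Builds the database for cjklib from the given data files."
print(format_line(text, line_length=30, subsequent_prefix="  "))
```

The prefix counts towards the line length, so continuation lines carry less text than the first line.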
BUILD_GROUPS
Definition of build groups available to the user. Recursive
definitions are not allowed and will lead to a lock-up.
- Value:
{'KangxiRadicalData': ['CharacterKangxiRadical',
                       'KangxiRadical',
                       'KangxiRadicalIsolatedCharacter',
                       'RadicalEquivalentCharacter',
                       'CharacterRadicalResidualStrokeCount',
                       'CharacterResidualStrokeCount'],
 'Readings': ['PinyinSyllables',
              'WadeGilesSyllables',
...
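To illustrate how group names resolve to table names and why a recursive definition is a problem, here is a minimal sketch of group expansion. It is not the script's own code: expand_groups is a hypothetical helper, EXAMPLE_GROUPS copies only the entries visible in the truncated value above (so the 'Readings' list is incomplete), and, unlike the behaviour described above, the sketch raises an error on a recursive definition instead of locking up.

```python
def expand_groups(names, groups):
    """Replace group names by their member tables, keeping order and
    dropping duplicates. Raises ValueError on a recursive definition."""
    result = []

    def visit(name, stack):
        if name in stack:
            raise ValueError("recursive build group: "
                             + " -> ".join(stack + [name]))
        if name in groups:
            for member in groups[name]:
                visit(member, stack + [name])
        elif name not in result:
            result.append(name)

    for name in names:
        visit(name, [])
    return result


# Excerpt only; the full definition is truncated in this document.
EXAMPLE_GROUPS = {
    "KangxiRadicalData": ["CharacterKangxiRadical", "KangxiRadical",
                          "KangxiRadicalIsolatedCharacter",
                          "RadicalEquivalentCharacter",
                          "CharacterRadicalResidualStrokeCount",
                          "CharacterResidualStrokeCount"],
    "Readings": ["PinyinSyllables", "WadeGilesSyllables"],
}

print(expand_groups(["Readings", "KangxiRadical"], EXAMPLE_GROUPS))
```

Names that are not groups are passed through unchanged, so group names and table names can be mixed in one request.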