API¶

中文转换函数¶

zhconv.convert(s, locale, update=None)¶

Main convert function.

参数:	s – must be unicode (Python 2) or str (Python 3). locale – should be one of `('zh-hans', 'zh-hant', 'zh-cn', 'zh-sg' 'zh-tw', 'zh-hk', 'zh-my', 'zh-mo')`. update – a dict which updates the conversion table, eg. `{'from1': 'to1', 'from2': 'to2'}`

>>> print(convert('我幹什麼不干你事。', 'zh-cn'))
我干什么不干你事。
>>> print(convert('我幹什麼不干你事。', 'zh-cn', {'不干': '不幹'}))
我干什么不幹你事。
>>> print(convert('人体内存在很多微生物', 'zh-tw'))
人體內存在很多微生物

zhconv.convert_for_mw(s, locale, update=None)¶

Recognizes MediaWiki’s human conversion format. Use locale=’zh’ for no conversion.

Reference: (all tests passed) https://zh.wikipedia.org/wiki/Help:高级字词转换语法 https://www.mediawiki.org/wiki/Writing_systems/Syntax

>>> print(convert_for_mw('在现代，机械计算-{}-机的应用已经完全被电子计算-{}-机所取代', 'zh-hk'))
在現代，機械計算機的應用已經完全被電子計算機所取代
>>> print(convert_for_mw('-{zh-hant:資訊工程;zh-hans:计算机工程学;}-是电子工程的一个分支，主要研究计算机软硬件和二者间的彼此联系。', 'zh-tw'))
資訊工程是電子工程的一個分支，主要研究計算機軟硬體和二者間的彼此聯繫。
>>> print(convert_for_mw('張國榮曾在英國-{zh:利兹;zh-hans:利兹;zh-hk:列斯;zh-tw:里茲}-大学學習。', 'zh-hant'))
張國榮曾在英國里茲大學學習。
>>> print(convert_for_mw('張國榮曾在英國-{zh:利兹;zh-hans:利兹;zh-hk:列斯;zh-tw:里茲}-大学學習。', 'zh-sg'))
张国荣曾在英国利兹大学学习。
>>> convert_for_mw('-{zh-hant:;\nzh-cn:}-', 'zh-tw') == ''
True
>>> print(convert_for_mw('毫米(毫公分)，符號mm，是長度單位和降雨量單位，-{zh-hans:台湾作-{公釐}-或-{公厘}-;zh-hant:港澳和大陸稱為-{毫米}-（台灣亦有使用，但較常使用名稱為毫公分）;zh-mo:台灣作-{公釐}-或-{公厘}-;zh-hk:台灣作-{公釐}-或-{公厘}-;}-。', 'zh-tw'))
毫米(毫公分)，符號mm，是長度單位和降雨量單位，港澳和大陸稱為毫米（台灣亦有使用，但較常使用名稱為毫公分）。
>>> print(convert_for_mw('毫米(毫公分)，符號mm，是長度單位和降雨量單位，-{zh-hans:台湾作-{公釐}-或-{公厘}-;zh-hant:港澳和大陸稱為-{毫米}-（台灣亦有使用，但較常使用名稱為毫公分）;zh-mo:台灣作-{公釐}-或-{公厘}-;zh-hk:台灣作-{公釐}-或-{公厘}-;}-。', 'zh-cn'))
毫米(毫公分)，符号mm，是长度单位和降雨量单位，台湾作公釐或公厘。
>>> print(convert_for_mw('毫米(毫公分)，符號mm，是長度單位和降雨量單位，-{zh-hans:台湾作-{公釐}-或-{公厘}-;zh-hant:港澳和大陸稱為-{毫米}-（台灣亦有使用，但較常使用名稱為毫公分）;zh-mo:台灣作-{公釐}-或-{公厘}-;zh-hk:台灣作-{公釐}-或-{公厘', 'zh-hk'))  # unbalanced test
毫米(毫公分)，符號mm，是長度單位和降雨量單位，台灣作公釐或公厘
>>> print(convert_for_mw('报头的“-{參攷消息}-”四字摘自鲁迅笔迹-{zh-hans:，“-{參}-”是“-{参}-”的繁体字，读音cān，与简体的“-{参}-”字相同；;zh-hant:，;}-“-{攷}-”是“考”的异体字，读音kǎo，与“考”字相同。', 'zh-tw'))
報頭的「參攷消息」四字摘自魯迅筆跡，「攷」是「考」的異體字，讀音kǎo，與「考」字相同。
>>> print(convert_for_mw('报头的“-{參攷消息}-”四字摘自鲁迅笔迹-{zh-hans:，“-{參}-”是“-{参}-”的繁体字，读音cān，与简体的“-{参}-”字相同；;zh-hant:，;}-“-{攷}-”是“考”的异体字，读音kǎo，与“考”字相同。', 'zh-cn'))
报头的“參攷消息”四字摘自鲁迅笔迹，“參”是“参”的繁体字，读音cān，与简体的“参”字相同；“攷”是“考”的异体字，读音kǎo，与“考”字相同。

其他函数¶

zhconv.issimp(s, full=False)¶

Detect text is whether Simplified Chinese or Traditional Chinese. Returns True for Simplified; False for Traditional; None for unknown. If full=False, it returns once first simplified- or traditional-only character is encountered, so it’s for quick and rough identification; else, it compares the count and returns the most likely one. Use is (True/False/None) to check the result.

s must be unicode (Python 2) or str (Python 3), or you’ll get None.

zhconv.tokenize(s, locale, update=None)¶: Tokenize s according to corresponding locale dictionary. Don’t use this for serious text processing.

zhconv.loaddict(filename='zhcdict.json')¶: Load the dictionary from a specific JSON file.

API¶

中文转换函数¶

其他函数¶

內容目录

Related Topics

本页