Module ghalatawi.ar_ghalat

Ghalatawi: Arabic AutoCorrection module


Author: Taha Zerrouki

Contact: taha dot zerrouki at gmail dot com

Copyright: Arabtechies, Arabeyes, Taha Zerrouki

License: GPL

Date: 2011/01/05

Version: 0.2

Functions
  • isArabicword(word)
    Checks for a valid Arabic word. Returns: Boolean.
  • autocorrectByRegex(word)
    Autocorrects a word using the regular expressions in the replacement table. Returns: unicode or False.
  • autocorrectByWordlist(word, wordlist={u'اذا': u'إذا', u'التى': u'التي', u'الذى': u'الذي', u'الى': u...)
    Autocorrects a word using a word list. Returns: unicode or False.
  • loadAutocorrectWordlistFromFile(myfile)
    Loads an autocorrect word list from a file into the global list autocorrect_arabic_list. Returns: dict or False.
Variables
  ArabicAutocorrectWordlist = {u'اذا': u'إذا', u'التى': u'التي',...
  ReplacementTable = [(re.compile(r'(?u)\b(\u0648|\u0641|)(\u064...
  __package__ = 'ghalatawi'
Function Details

isArabicword(word)

Checks for a valid Arabic word. A valid Arabic word contains no spaces, digits, or punctuation. To avoid some spelling errors, TEH_MARBUTA may appear only at the end of the word.

Parameters:
  • word (unicode) - input word
Returns: Boolean
True if all characters are in the Arabic Unicode block.
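
The checks described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the name is_arabic_word is hypothetical, and the accepted character range (Arabic letters U+0621 to U+064A plus the harakat U+064B to U+0652) is an assumption; the real function may accept a different set of characters.

```python
import re

TEH_MARBUTA = "\u0629"

# Assumption: a "valid" word uses only Arabic letters (U+0621..U+064A)
# and harakat (U+064B..U+0652); spaces, digits and punctuation fail.
_ARABIC_CHARS = re.compile(r"[\u0621-\u0652]+")
_HARAKAT = re.compile(r"[\u064B-\u0652]")


def is_arabic_word(word: str) -> bool:
    """Hypothetical sketch of the checks described for isArabicword."""
    if not _ARABIC_CHARS.fullmatch(word):
        return False
    # Ignore diacritics so that a final TEH_MARBUTA carrying a haraka
    # still counts as being at the end of the word.
    bare = _HARAKAT.sub("", word)
    if not bare:
        return False
    # TEH_MARBUTA may only be the last letter.
    return TEH_MARBUTA not in bare[:-1]
```

Stripping the harakat before the position check is a design choice of this sketch, so that a word such as مدرسةٌ is still accepted.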

autocorrectByRegex(word)

Autocorrects a word using the regular expressions in the replacement table (ReplacementTable).

Example:

>>> word = u"الإجتماعية"
>>> print(autocorrectByRegex(word))
الاجتماعية

Parameters:
  • word (unicode) - the input word.
Returns: unicode or False.
the corrected word if the word is a common error, otherwise False.
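
A minimal sketch of regex-based correction, assuming the replacement table is a list of (compiled pattern, replacement) pairs, as the ReplacementTable value suggests. The one-rule table reuses the first ReplacementTable entry, which rewrites the إنفعال form with a plain alef (انفعال). The names REPLACEMENT_TABLE and autocorrect_by_regex are hypothetical, and applying every rule in sequence is an assumption about how the real function combines rules.

```python
import re

# Hypothetical one-rule table modelled on the first ReplacementTable entry.
# Groups: (1) conjunction و/ف, (2) preposition ك/ب, (3) article ال,
# (4) two root letters, (5) last root letter, (6) optional ي,
# (7) optional suffix ين/ات/ة/تين. The leading إن becomes ان.
# Python 3 str patterns are Unicode-aware, so the original (?u) is omitted.
REPLACEMENT_TABLE = [
    (
        re.compile(
            r"\b(\u0648|\u0641|)(\u0643|\u0628|)(\u0627\u0644|)"
            r"\u0625\u0646(\w\w)\u0627(\w)(\u064a|)"
            r"(\u064a\u0646|\u0627\u062a|\u0629|\u062a\u064a\u0646|)\b"
        ),
        r"\1\2\3ان\4ا\5\6\7",
    ),
]


def autocorrect_by_regex(word, table=REPLACEMENT_TABLE):
    """Apply every rule in order; return the corrected word or False."""
    corrected = word
    for pattern, replacement in table:
        corrected = pattern.sub(replacement, corrected)
    return corrected if corrected != word else False
```

For example, الإنسحاب becomes الانسحاب, while a word no rule matches, such as كتاب, yields False.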

autocorrectByWordlist(word, wordlist={u'اذا': u'إذا', u'التى': u'التي', u'الذى': u'الذي', u'الى': u...)

Autocorrects a word using a word list. The default list is ArabicAutocorrectWordlist.

Example:

>>> autocorrectlist = {
...     u'اذا': u'إذا',
...     u'او': u'أو',
...     u'فى': u'في',
...     u'هى': u'هي',
...     u'انت': u'أنت',
...     u'انتما': u'أنتما',
...     u'الى': u'إلى',
...     u'التى': u'التي',
...     u'الذى': u'الذي',
... }
>>> word = u"اذا"
>>> print(autocorrectByWordlist(word, autocorrectlist))
إذا

Parameters:
  • word (unicode) - the input word.
Returns: unicode or False.
the corrected word if the word is a common error, otherwise False.
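
The word-list lookup amounts to a dictionary lookup with False as the fallback. The sketch below assumes exact-match lookup with no normalization of the input; DEFAULT_WORDLIST holds only the entries visible in the ArabicAutocorrectWordlist value, and both names are hypothetical.

```python
# Subset of the default entries shown under ArabicAutocorrectWordlist.
DEFAULT_WORDLIST = {
    "اذا": "إذا",
    "التى": "التي",
    "الذى": "الذي",
    "الى": "إلى",
    "انت": "أنت",
    "انتما": "أنتما",
    "او": "أو",
    "فى": "في",
}


def autocorrect_by_wordlist(word, wordlist=None):
    """Return the listed correction for word, or False if it is not listed."""
    if wordlist is None:
        # Fall back to the built-in list when no custom list is given.
        wordlist = DEFAULT_WORDLIST
    return wordlist.get(word, False)
```

Using None as the default argument, rather than a dict literal as in the documented signature, avoids sharing one mutable default between calls; it is a choice of this sketch only.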

loadAutocorrectWordlistFromFile(myfile)

Loads an autocorrect word list from a file into the global list autocorrect_arabic_list.

Example:

>>> autocorrectlist = loadAutocorrectWordlistFromFile("data/arabic.acl")
>>> word = u"اذا"
>>> print(autocorrectByWordlist(word, autocorrectlist))
إذا

Parameters:
  • myfile (unicode) - the path of the word list file.
Returns: dict or False.
the word list if it was loaded, otherwise False.
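
This page does not describe the format of the .acl file. The sketch below assumes, purely for illustration, a UTF-8 text file with one entry per line: the misspelled form and its correction separated by whitespace, with blank lines and lines starting with # ignored. Both function names are hypothetical. Parsing is kept separate from file access so that it can be tested without a file.

```python
def parse_autocorrect_lines(lines):
    """Build {misspelling: correction} from 'wrong correct' lines (assumed format)."""
    wordlist = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) >= 2:
            wordlist[parts[0]] = parts[1]
    return wordlist


def load_autocorrect_wordlist(path, encoding="utf-8"):
    """Return the parsed word list, or False if nothing could be loaded."""
    try:
        with open(path, encoding=encoding) as handle:
            wordlist = parse_autocorrect_lines(handle)
    except (OSError, UnicodeDecodeError):
        return False
    return wordlist or False
```

Returning False for an empty or unreadable file mirrors the documented "the word list if it was loaded, otherwise False" contract.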

Variables Details

ArabicAutocorrectWordlist

Value:
{u'اذا': u'إذا',
 u'التى': u'التي',
 u'الذى': u'الذي',
 u'الى': u'إلى',
 u'انت': u'أنت',
 u'انتما': u'أنتما',
 u'او': u'أو',
 u'فى': u'في',
...

ReplacementTable

Value:
[(re.compile(r'(?u)\b(\u0648|\u0641|)(\u0643|\u0628|)(\u0627\u0644|)\u0625\u0646(\w\w)\u0627(\w)(\u064a|)(\u064a\u0646|\u0627\u062a|\u0629|\u062a\u064a\u0646|)\b'),
  u'\1\2\3ان\4ا\5\6\7'),
 (re.compile(r'(?u)\b(\u0648|\u0641|)(\u0644\u0644|)\u0625\u0646(\w\w)\u0627(\w)(\u064a|)(\u064a\u0646|\u0627\u062a|\u062a\u064a\u0646|\u0629|)\b'),
  u'\1\2ان\3ا\4\5\6'),
...