Package aranalex :: Module analex :: Class analex

Class analex

Arabic text morphological analyzer. Provides routines to analyze text. Can treat text as verbs or as nouns.

Instance Methods

__init__(self)
Create Analex instance.

text_treat(self, text)
Deprecated: treat text to eliminate punctuation.
Returns: unicode.

tokenize(self, text=u'')
Tokenize text into words.
Returns: list.

text_tokenize(self, text)
Tokenize text into words, after treatment.
Returns: list.

set_debug(self, debug)
Set the debug attribute to allow printing internal analysis results.

set_limit(self, limit)
Set the number of words to treat in the text.

check_text(self, text, mode='all')
Analyze text morphologically.
Returns: list.

check_text_as_nouns(self, text)
Analyze text morphologically as nouns.
Returns: list.

check_text_as_verbs(self, text)
Analyze text morphologically as verbs.
Returns: list.

check_word(self, word)
Analyze one word morphologically as verbs.
Returns: list.

check_normalized(self, word_vocalised, resulted_data)
To handle some normalized cases: if the entered word resembles a word found in the dictionary, the analyzer returns the matching vocalized words. For example, the word ذئب has the normalized form ذءب, which can match both ذئب and ذؤب in the dictionary.
Returns: list.

check_shadda(self, word_vocalised, resulted_data)
To handle some normalized cases: if the entered word resembles a word found in the dictionary, the analyzer returns the matching vocalized words.
Returns: list.

check_partial_vocalized(self, word_vocalised, resulted_data)
If the entered word is fully or partially vocalized, the analyzer returns the matching vocalized words. This function treats the partially vocalized case.
Returns: list.

check_word_as_stopword(self, word)
Check if the word is a stopword.
Returns: list.

check_word_as_pounct(self, word)
Check if the word is a punctuation mark.
Returns: list.

check_word_as_verb(self, verb)
Analyze the word as verb.
Returns: list.

check_word_as_noun(self, noun)
Analyze the word as noun.
Returns: list.

context_analyze(self, result)
Deprecated: Analyze the context.
Returns: list.

get_number_tags(self, word)
Check the numbers and return tags.
Returns: text.

Method Details

text_treat(self, text)

Deprecated: treat text to eliminate punctuation.

Parameters:
  • text (unicode.) - the input text.
Returns: unicode.
treated text.

tokenize(self, text=u'')

Tokenize text into words

Parameters:
  • text (unicode.) - the input text.
Returns: list.
list of words.
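
The behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: `simple_tokenize` and `TOKEN_RE` are hypothetical names, and the sketch assumes that a word is a run of letters, digits and Arabic diacritics, so punctuation is dropped rather than returned as separate tokens.

```python
import re

# Hypothetical pattern: letters/digits/underscore (\w) plus the Arabic
# diacritics U+064B..U+0652, which \w alone does not match.
TOKEN_RE = re.compile(r"[\w\u064b-\u0652]+")


def simple_tokenize(text=""):
    """Return the list of word tokens found in text (sketch only)."""
    return TOKEN_RE.findall(text)


words = simple_tokenize("ذهب الولد، إلى المدرسة.")
print(words)
```

The real `tokenize` may treat punctuation and whitespace differently; the sketch only shows the input and output shapes documented here (unicode in, list of words out).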

text_tokenize(self, text)

Tokenize text into words, after treatment.

Parameters:
  • text (unicode.) - the input text.
Returns: list.
list of words.

set_debug(self, debug)

Set the debug attribute to allow printing internal analysis results.

Parameters:
  • debug (True/False.) - the debug value.

set_limit(self, limit)

Set the number of words to treat in the text.

Parameters:
  • limit (integer.) - the word number limit.

check_text(self, text, mode='all')

Analyze text morphologically

Parameters:
  • text (unicode.) - the input text.
  • mode (unicode.) - the mode of analysis as 'verbs', 'nouns', or 'all'.
Returns: list.
list of dictionaries of analyzed words with tags.
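
How the `mode` parameter might select the analysis can be sketched as follows. Everything here is an assumption made for illustration: `check_text_sketch`, `analyze_as_verb`, `analyze_as_noun` and the dictionary keys are hypothetical stand-ins, not the class's actual code or result format.

```python
def analyze_as_verb(word):
    # Placeholder: a real analyzer would consult a verb dictionary.
    return [{"word": word, "type": "Verb"}]


def analyze_as_noun(word):
    # Placeholder: a real analyzer would consult a noun dictionary.
    return [{"word": word, "type": "Noun"}]


def check_text_sketch(text, mode="all"):
    """Analyze each whitespace-separated word according to mode."""
    if mode not in ("verbs", "nouns", "all"):
        raise ValueError("mode must be 'verbs', 'nouns', or 'all'")
    results = []
    for word in text.split():
        if mode in ("verbs", "all"):
            results.extend(analyze_as_verb(word))
        if mode in ("nouns", "all"):
            results.extend(analyze_as_noun(word))
    return results


print(check_text_sketch("كتب الولد", mode="verbs"))
```

With `mode='all'` the sketch returns both the verb and the noun analyses for every word, which mirrors the three documented mode values.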

check_text_as_nouns(self, text)

Analyze text morphologically as nouns

Parameters:
  • text (unicode.) - the input text.
Returns: list.
list of dictionaries of analyzed words with tags.

check_text_as_verbs(self, text)

Analyze text morphologically as verbs

Parameters:
  • text (unicode.) - the input text.
Returns: list.
list of dictionaries of analyzed words with tags.

check_word(self, word)

Analyze one word morphologically as verbs

Parameters:
  • word (unicode.) - the input word.
Returns: list.
list of dictionaries of analyzed words with tags.

check_normalized(self, word_vocalised, resulted_data)

To handle some normalized cases: if the entered word resembles a word found in the dictionary, the analyzer returns the matching vocalized words. For example, the word ذئب has the normalized form ذءب, which can match both ذئب and ذؤب in the dictionary. This function filters the normalized results against the given word and returns ذئب.

Parameters:
  • word_vocalised (unicode.) - the input word.
  • resulted_data (list of dict.) - the results found in the dictionary.
Returns: list.
list of dictionaries of analyzed words with tags.
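
The ذئب example can be reproduced with a small sketch. It assumes that normalization maps the seated hamza forms to a bare hamza and that each result dictionary stores its vocalized form under a `'vocalized'` key; the key name and the helpers `strip_tashkeel`, `normalize_hamza` and `filter_normalized` are hypothetical, not taken from the library.

```python
HAMZA = "\u0621"                          # ء
HAMZA_FORMS = "\u0623\u0625\u0624\u0626"  # أ إ ؤ ئ


def strip_tashkeel(word):
    """Remove Arabic diacritics (U+064B..U+0652)."""
    return "".join(ch for ch in word if not "\u064b" <= ch <= "\u0652")


def normalize_hamza(word):
    """Replace every seated hamza with a bare hamza."""
    return "".join(HAMZA if ch in HAMZA_FORMS else ch for ch in word)


def filter_normalized(word, candidates):
    """Keep candidates whose unvocalized spelling equals the input word."""
    target = strip_tashkeel(word)
    return [c for c in candidates if strip_tashkeel(c["vocalized"]) == target]


word = "\u0630\u0626\u0628"  # ذئب
candidates = [
    {"vocalized": "\u0630\u0650\u0626\u0652\u0628"},  # ذِئْب
    {"vocalized": "\u0630\u0624\u0628"},              # ذؤب
]
# Both candidates share the normalized form ذءب ...
print([normalize_hamza(strip_tashkeel(c["vocalized"])) for c in candidates])
# ... but only the first one matches the spelling that was entered.
print(filter_normalized(word, candidates))
```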

check_shadda(self, word_vocalised, resulted_data)

To handle some normalized cases: if the entered word resembles a word found in the dictionary, the analyzer returns the matching vocalized words. This function treats the Shadda case.

Parameters:
  • word_vocalised (unicode.) - the input word.
  • resulted_data (list of dict.) - the results found in the dictionary.
Returns: list.
list of dictionaries of analyzed words with tags.
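
One plausible reading of the Shadda case is sketched below: a Shadda typed by the user must also be present in the dictionary form, while a dictionary form may carry a Shadda the user left out. This rule is an assumption, and `shadda_compatible`, `filter_by_shadda` and the `'vocalized'` key are hypothetical names.

```python
SHADDA = "\u0651"
# Tanwin, short vowels and sukun; Shadda is deliberately excluded.
HARAKAT = set("\u064b\u064c\u064d\u064e\u064f\u0650\u0652")


def strip_harakat(word):
    """Remove vowel marks but keep Shadda."""
    return "".join(ch for ch in word if ch not in HARAKAT)


def shadda_compatible(partial, full):
    """True if every Shadda in partial is also present in full."""
    partial = strip_harakat(partial)
    full = strip_harakat(full)
    i = 0
    for ch in full:
        if i < len(partial) and partial[i] == ch:
            i += 1
        elif ch == SHADDA:
            continue  # the candidate has a Shadda the input omits
        else:
            return False
    return i == len(partial)


def filter_by_shadda(word, candidates):
    return [c for c in candidates if shadda_compatible(word, c["vocalized"])]


typed = "\u0639\u0644\u0651\u0645"  # علّم, typed with Shadda
candidates = [
    {"vocalized": "\u0639\u064e\u0644\u0651\u064e\u0645\u064e"},  # with Shadda
    {"vocalized": "\u0639\u064e\u0644\u0650\u0645\u064e"},        # without Shadda
]
print(len(filter_by_shadda(typed, candidates)))
```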

check_partial_vocalized(self, word_vocalised, resulted_data)

If the entered word is fully or partially vocalized, the analyzer returns the matching vocalized words. This function treats the partially vocalized case.

Parameters:
  • word_vocalised (unicode.) - the input word.
  • resulted_data (list of dict.) - the results found in the dictionary.
Returns: list.
list of dictionaries of analyzed words with tags.
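
A partial-vocalization match can be sketched as an in-order comparison in which the dictionary form may carry extra diacritics, but every letter and every diacritic typed by the user must be present. The helper names and the `'vocalized'` key are hypothetical; this is a sketch of the idea, not the library's algorithm.

```python
def is_diacritic(ch):
    """Arabic diacritics occupy U+064B..U+0652."""
    return "\u064b" <= ch <= "\u0652"


def vocalized_like(partial, full):
    """True if full equals partial once extra diacritics in full are skipped."""
    i = 0
    for ch in full:
        if i < len(partial) and partial[i] == ch:
            i += 1
        elif is_diacritic(ch):
            continue  # diacritic present only in the dictionary form
        else:
            return False
    return i == len(partial)


def filter_partial_vocalized(word, candidates):
    return [c for c in candidates if vocalized_like(word, c["vocalized"])]


typed = "\u0643\u064e\u062a\u0628"  # كَتب, only the first letter is vocalized
candidates = [
    {"vocalized": "\u0643\u064e\u062a\u064e\u0628\u064e"},  # كَتَبَ
    {"vocalized": "\u0643\u064f\u062a\u0650\u0628\u064e"},  # كُتِبَ
]
print(len(filter_partial_vocalized(typed, candidates)))
```

A fully unvocalized input matches both candidates, so the filter only narrows the results when the user supplies at least one diacritic.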

check_word_as_stopword(self, word)

Check if the word is a stopword.

Parameters:
  • word (unicode.) - the input word.
Returns: list.
list of dictionaries of analyzed words with tags.

check_word_as_pounct(self, word)

Check if the word is a punctuation mark.

Parameters:
  • word (unicode.) - the input word.
Returns: list.
list of dictionaries of analyzed words with tags.
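
A punctuation check of this kind can be sketched with the Unicode character categories from the standard library. The function name `check_punct_sketch` and the shape of the returned dictionary are assumptions made for illustration; the library's own tags are not documented here.

```python
import unicodedata


def check_punct_sketch(word):
    """Return one tagged entry if word consists only of punctuation."""
    # Unicode general categories starting with "P" are punctuation.
    if word and all(unicodedata.category(ch).startswith("P") for ch in word):
        return [{"word": word, "type": "PUNCT"}]  # hypothetical result format
    return []


print(check_punct_sketch("\u060c"))  # Arabic comma
print(check_punct_sketch("كتاب"))
```

Returning an empty list for non-punctuation keeps the return type a list in both cases, in line with the documented return type.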

check_word_as_verb(self, verb)

Analyze the word as verb.

Parameters:
  • verb (unicode.) - the input word.
Returns: list.
list of dictionaries of analyzed words with tags.

check_word_as_noun(self, noun)

Analyze the word as noun.

Parameters:
  • noun (unicode.) - the input word.
Returns: list.
list of dictionaries of analyzed words with tags.

context_analyze(self, result)

Deprecated: Analyze the context.

Parameters:
  • result (list of dict.) - analysis result.
Returns: list.
filtered result according to context.

get_number_tags(self, word)

Check the numbers and return tags.

Parameters:
  • word (unicode.) - the input word.
Returns: text.
tags.