Package aranalex :: Module stem_noun :: Class nounStemmer

Class nounStemmer

Arabic noun stemmer

Instance Methods

__init__(self, debug=False)

stemming_noun(self, noun) -> list
Analyze a word morphologically as a noun.

steming_second_level(self, noun, noun2, procletic, encletic) -> list
Analyze a word morphologically by stemming the conjugation affixes.

find_broken_plural(self, broken, VocalisedEntree=False) -> list
Look up the broken plural in the dictionary.

verify_affix(self, word, list_seg, affix_list) -> list of pairs
Verify the possible affixes in the resulting segments against the given affix list.

getStemVariants(self, stem, prefix, suffix) -> list of unicode
Generate the noun stem variants according to the affixes.

is_compatible_proaffix_affix(self, procletic, encletic, suffix) -> True/False
Verify whether the proaffixes (syntactic affixes) are compatible with the affixes (conjugation affixes).

create_index_broken_plural(self)
Deprecated: create an index from the broken_plural dictionary to speed up lookups of broken plurals.

vocalize(self, noun, proclitic, prefix, suffix, enclitic) -> unicode
Join the noun and its affixes, and return the vocalized form.

getSuffixVariant(self, word, suffix, enclitic) -> unicode
Get the suffix variant to be joined to the word.

getWordVariant(self, word, suffix) -> unicode
Get the word variant to be joined to the suffix.

set_debug(self, debug)
Set the debug attribute to allow printing internal analysis results.

Method Details

stemming_noun(self, noun)

Analyze a word morphologically as a noun.

Parameters:
  • noun (unicode.) - the input noun.
Returns: list.
list of dictionaries of analyzed words with tags.
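The analysis starts by splitting off the syntactic clitics (proclitic and enclitic) before the second-level stemming. The sketch below shows only that first split. It assumes a brute-force enumeration over two small clitic tables; `PROCLITICS`, `ENCLITICS` and `first_level_segments` are hypothetical names for illustration, not part of the module's API:

```python
# Hypothetical, deliberately tiny clitic tables; a real analyzer
# would use much larger affix lists.
PROCLITICS = {"", "و", "ف", "ب", "ال", "وال", "بال"}
ENCLITICS = {"", "ي", "ه", "ها", "هم", "نا", "ك"}


def first_level_segments(word):
    """Return every (proclitic, stem, enclitic) split allowed by the tables."""
    results = []
    n = len(word)
    for start in range(n + 1):
        proclitic = word[:start]
        if proclitic not in PROCLITICS:
            continue
        # end starts after `start`, so the stem is never empty
        for end in range(start + 1, n + 1):
            enclitic = word[end:]
            if enclitic in ENCLITICS:
                results.append(
                    {"proclitic": proclitic, "stem": word[start:end], "enclitic": enclitic}
                )
    return results
```

Each candidate stem would still need dictionary lookup and compatibility checks; this sketch deliberately over-generates.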

steming_second_level(self, noun, noun2, procletic, encletic)

Analyze a word morphologically by stemming the conjugation affixes.

Parameters:
  • noun (unicode.) - the input noun.
  • noun2 (unicode.) - the noun stripped of its syntactic affixes.
  • procletic (unicode.) - the syntactic prefix (proclitic) extracted in the first stage.
  • encletic (unicode.) - the syntactic suffix (enclitic) extracted in the first stage.
Returns: list.
list of dictionaries of analyzed words with tags.
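The second stage can be pictured as stripping a conjugation suffix from noun2 while carrying the first-level clitics along. A minimal sketch, assuming a small hypothetical suffix table and no conjugation prefixes; `NOUN_SUFFIXES` and `second_level_candidates` are illustrative names only:

```python
# Hypothetical conjugation (second-level) suffix table, for illustration only.
NOUN_SUFFIXES = ["", "ة", "ات", "ون", "ين", "ان", "تان", "تين"]


def second_level_candidates(noun, noun2, procletic, encletic):
    """Split noun2 (the noun without clitics) into stem + conjugation suffix."""
    candidates = []
    for suffix in NOUN_SUFFIXES:
        if suffix and not noun2.endswith(suffix):
            continue
        stem = noun2[: len(noun2) - len(suffix)]
        if not stem:  # skip splits that consume the whole word
            continue
        candidates.append(
            {"word": noun, "procletic": procletic, "stem": stem,
             "suffix": suffix, "encletic": encletic}
        )
    return candidates
```

For والمدرسات with proclitic وال, this yields the stems مدرسات (no suffix) and مدرس (suffix ات); checking them against the noun dictionary is outside this sketch.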

find_broken_plural(self, broken, VocalisedEntree=False)

Look up the broken plural in the dictionary.

Parameters:
  • broken (unicode.) - the input word.
  • VocalisedEntree (Boolean.) - whether the entry is vocalized.
Returns: list.
list of found words.
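The role of VocalisedEntree can be illustrated with a lookup that compares exact forms for vocalized input and diacritic-free forms otherwise. This is a sketch under stated assumptions: the tiny `BROKEN_PLURALS` dictionary, `strip_harakat` and `find_broken_plural_sketch` are hypothetical and do not reflect the module's actual dictionary format:

```python
# Harakat (tanwin, short vowels, shadda, sukun): U+064B .. U+0652
HARAKAT = {chr(c) for c in range(0x064B, 0x0653)}


def strip_harakat(text):
    """Remove Arabic diacritics so vocalized and bare forms can be compared."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# Hypothetical dictionary: vocalized broken plural -> vocalized singulars.
BROKEN_PLURALS = {
    "\u0643\u064f\u062a\u064f\u0628": ["\u0643\u0650\u062a\u064e\u0627\u0628"],  # kutub -> kitab
    "\u0645\u064e\u062f\u064e\u0627\u0631\u0650\u0633": [                      # madaris ->
        "\u0645\u064e\u062f\u0652\u0631\u064e\u0633\u064e\u0629"                 # madrasa
    ],
}


def find_broken_plural_sketch(broken, VocalisedEntree=False):
    """Return the dictionary entries whose plural matches `broken`."""
    found = []
    for plural, singulars in BROKEN_PLURALS.items():
        if VocalisedEntree:
            match = plural == broken  # exact, diacritic-sensitive comparison
        else:
            match = strip_harakat(plural) == strip_harakat(broken)
        if match:
            found.append({"plural": plural, "singulars": singulars})
    return found
```

A linear scan is used here for clarity; an index keyed on the unvocalized form would avoid scanning the whole dictionary.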

verify_affix(self, word, list_seg, affix_list)

Verify the possible affixes in the resulting segments against the given affix list.

Parameters:
  • word (unicode.) - the input word.
  • list_seg (list of pairs.) - list of word segment indexes (numbers).
Returns: list of pairs.
list of accepted segments.
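Each pair in list_seg can be read as (start, end) indexes: word[:start] is the prefix and word[end:] the suffix. A minimal filtering sketch, assuming (hypothetically) that affix_list stores each accepted combination as a "prefix-suffix" string; `verify_affix_sketch` is an illustrative name:

```python
def verify_affix_sketch(word, list_seg, affix_list):
    """Keep only the (start, end) segments whose affix pair is listed.

    Assumption: each affix_list entry is written as "prefix-suffix",
    with an empty string on either side when that affix is absent.
    """
    accepted = []
    for start, end in list_seg:
        affix = word[:start] + "-" + word[end:]
        if affix in affix_list:
            accepted.append((start, end))
    return accepted
```

With word المدرسة, the segment (2, 6) yields the affix pair ال-ة, so it is kept only if that pair appears in affix_list.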

getStemVariants(self, stem, prefix, suffix)

Generate the noun stem variants according to the affixes.

For example: مدرستي => مدرست + ي => مدرسة + ي.

Return a list of possible cases.

Parameters:
  • stem (unicode.) - the input stem.
  • prefix (unicode.) - prefix.
  • suffix (unicode.) - suffix.
Returns: list of unicode.
list of stem variants.
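The example above (مدرست + ي restored to مدرسة) can be sketched as a single rule: a stem ending in Teh before a suffix may stand for a dictionary form ending in Teh Marbuta. This covers only that one rule and ignores the prefix; `stem_variants` is a hypothetical name:

```python
TEH = "\u062a"          # ت
TEH_MARBUTA = "\u0629"  # ة


def stem_variants(stem, prefix, suffix):
    """Return candidate dictionary forms for `stem` (sketch: one rule only).

    `prefix` is accepted for parity with getStemVariants but unused here.
    """
    variants = [stem]
    # A Teh followed by a suffix may be a Teh Marbuta in the dictionary form.
    if suffix and stem.endswith(TEH):
        variants.append(stem[:-1] + TEH_MARBUTA)
    return variants
```

Keeping the original stem in the result lets the dictionary lookup decide which reading is valid.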

is_compatible_proaffix_affix(self, procletic, encletic, suffix)

Verify whether the proaffixes (syntactic affixes) are compatible with the affixes (conjugation affixes).

Parameters:
  • procletic (unicode.) - first level prefix.
  • encletic (unicode.) - first level suffix.
  • suffix (unicode.) - second level suffix.
Returns: True/False.
True if the affixes are compatible, False otherwise.
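Two grammatical constraints show the kind of check this method performs. The sketch below encodes them as illustrative rules; they are assumptions for the example, not the module's actual rule set, and `is_compatible_sketch` is a hypothetical name:

```python
DEFINITE_ARTICLE = "ال"
NOON = "ن"


def is_compatible_sketch(procletic, encletic, suffix):
    """Illustrative compatibility rules between clitics and conjugation suffix."""
    # Rule 1: a noun made definite by the article cannot also take
    # a pronoun enclitic (al- + kitab + -hu is rejected).
    if DEFINITE_ARTICLE in procletic and encletic:
        return False
    # Rule 2: the final Noon of dual / sound masculine plural suffixes
    # (-an, -in, -un) drops before an enclitic, so the full suffix cannot take one.
    if encletic and suffix.endswith(NOON):
        return False
    return True
```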

vocalize(self, noun, proclitic, prefix, suffix, enclitic)

Join the noun and its affixes, and return the vocalized form.

Parameters:
  • noun (unicode.) - noun found in dictionary.
  • proclitic (unicode.) - first level prefix.
  • prefix (unicode.) - second level prefix.
  • suffix (unicode.) - second level suffix.
  • enclitic (unicode.) - first level suffix.
Returns: unicode.
vocalized word.
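Because the dictionary noun is already vocalized, the joining step mostly needs vocalized forms of the clitics. A simplified sketch, assuming small hypothetical vocalization tables; it ignores case endings, the article's assimilation before sun letters, and Teh Marbuta changes. `VOCALIZED_PROCLITICS`, `VOCALIZED_ENCLITICS` and `vocalize_sketch` are illustrative names:

```python
FATHA, KASRA, SUKUN = "\u064e", "\u0650", "\u0652"

# Hypothetical tables giving a vocalized form for a few clitics.
VOCALIZED_PROCLITICS = {
    "": "",
    "\u0648": "\u0648" + FATHA,              # wa-
    "\u0628": "\u0628" + KASRA,              # bi-
    "\u0627\u0644": "\u0627\u0644" + SUKUN,  # al- (lam with sukun)
}
VOCALIZED_ENCLITICS = {
    "": "",
    "\u0643": "\u0643" + FATHA,                   # -ka
    "\u0647\u0627": "\u0647" + FATHA + "\u0627",  # -ha
}


def vocalize_sketch(noun, proclitic, prefix, suffix, enclitic):
    """Join affixes around an already-vocalized noun, in surface order.

    Clitics missing from the tables are passed through unvocalized.
    """
    return (
        VOCALIZED_PROCLITICS.get(proclitic, proclitic)
        + prefix
        + noun
        + suffix
        + VOCALIZED_ENCLITICS.get(enclitic, enclitic)
    )
```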

getSuffixVariant(self, word, suffix, enclitic)

Get the suffix variant to be joined to the word.

For example: word = مدرس, suffix = ة, enclitic = ي. The suffix is converted to Teh (ت).

Parameters:
  • word (unicode.) - word found in dictionary.
  • suffix (unicode.) - second level suffix.
  • enclitic (unicode.) - first level suffix.
Returns: unicode.
variant of suffix.
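The example above reduces to one rule: a suffix ending in Teh Marbuta becomes an open Teh when an enclitic follows. A minimal sketch of that rule alone; `suffix_variant` is a hypothetical name, and `word` is kept only to mirror the documented signature:

```python
TEH_MARBUTA = "\u0629"  # ة
TEH = "\u062a"          # ت


def suffix_variant(word, suffix, enclitic):
    """Return the form of `suffix` to attach (`word` is unused in this sketch)."""
    if enclitic and suffix.endswith(TEH_MARBUTA):
        # e.g. مدرس + ة + ي: the Teh Marbuta opens to Teh before the enclitic
        return suffix[:-1] + TEH
    return suffix
```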

getWordVariant(self, word, suffix)

Get the word variant to be joined to the suffix.

For example: word = مدرسة, suffix = ي. The word is converted to مدرست.

Parameters:
  • word (unicode.) - word found in dictionary.
  • suffix (unicode.) - suffix (first or second level).
Returns: unicode.
variant of word.
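The word-side counterpart can be sketched with two rules: a final Teh Marbuta becomes Teh before most suffixes, and is dropped entirely before the sound feminine plural suffix ات. These two rules are an illustrative assumption rather than the module's full logic; `word_variant` is a hypothetical name:

```python
TEH_MARBUTA = "\u0629"       # ة
TEH = "\u062a"               # ت
FEM_PLURAL = "\u0627\u062a"  # ات


def word_variant(word, suffix):
    """Return the form of `word` to which `suffix` is joined (sketch)."""
    if not suffix or not word.endswith(TEH_MARBUTA):
        return word
    if suffix.startswith(FEM_PLURAL):
        # madrasa + -at -> madrasat: the Teh Marbuta is dropped
        return word[:-1]
    # madrasa + -i -> madrasati: the Teh Marbuta becomes an open Teh
    return word[:-1] + TEH
```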

set_debug(self, debug)

Set the debug attribute to allow printing internal analysis results.

Parameters:
  • debug (True/False.) - the debug value.