enchant.checker: High-level spellchecking functionality

This package is designed to host higher-level spellchecking functionality than is available in the base enchant package. It should make writing applications that follow common usage idioms significantly easier.

The most useful class is SpellChecker, which implements a spellchecking loop over a block of text. It is capable of modifying the text in-place if given an array of characters to work with.

This package also contains several interfaces to the SpellChecker class, such as a wxPython GUI dialog and a command-line interface.

class enchant.checker.SpellChecker(lang=None, text=None, tokenize=None, chunkers=None, filters=None)

Class implementing stateful spellchecking behaviour.

This class is designed to implement a spell-checking loop over a block of text, correcting/ignoring/replacing words as required. This loop is implemented using an iterator paradigm so it can be embedded inside other control loops.

The SpellChecker object is stateful, and the appropriate methods must be called to alter its state and affect the progress of the spell checking session. At any point during the checking session, the attribute ‘word’ will hold the current erroneously spelled word under consideration. The action to take on this word is determined by calling methods such as ‘replace’, ‘replace_always’ and ‘ignore_always’. Once this is done, calling ‘next’ advances to the next misspelled word.

As a quick (and rather silly) example, the following code replaces each misspelled word with the string “SPAM”:

>>> from enchant.checker import SpellChecker
>>> text = "This is sme text with a fw speling errors in it."
>>> chkr = SpellChecker("en_US", text)
>>> for err in chkr:
...   err.replace("SPAM")
...
>>> chkr.get_text()
'This is SPAM text with a SPAM SPAM errors in it.'
>>>

Internally, the SpellChecker always works with arrays of (possibly unicode) character elements. This allows the string to be modified in place as it is checked, and is the closest thing Python has to a mutable string. The text can be supplied as a normal string, a unicode string, a character array or a unicode character array. The ‘get_text’ method will return the modified array object if an array is used, or a new string object if a string is used.
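To illustrate why a character array behaves like a mutable string, here is a minimal standard-library sketch that does not involve enchant at all. It assumes Python 3, where the "u" typecode is deprecated, so treat it as an illustration of the idea rather than as the checker's actual internals.

```python
from array import array

# Typecode "u" stores unicode characters. It is deprecated in recent
# Python 3 releases, so this is purely illustrative.
text = array("u", "This is sme text.")

# Replace the 3-character slice "sme" with the 4-character "some".
# Slice assignment resizes the array in place; no new object is created.
start = text.tounicode().index("sme")
text[start:start + 3] = array("u", "some")

result = text.tounicode()
print(result)
```

The same array object now holds the corrected text, which is the behaviour the paragraph above describes for ‘get_text’ when an array is supplied.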

Words input to the SpellChecker may be either plain strings or unicode objects. They will be converted to the same type as the text being checked, using Python’s default encoding/decoding settings.

If using an array of characters with this object and the array is modified outside of the spellchecking loop, use the ‘set_offset’ method to reposition the internal loop pointer to make sure it doesn’t skip any words.

add(word=None)

Add given word to the personal word list.

If no word is given, the current erroneous word is added.

add_to_personal(word=None)

Add given word to the personal word list.

If no word is given, the current erroneous word is added.

check(word)

Check correctness of the given word.

coerce_string(text, enc=None)

Coerce string into the required type.

This method can be used to automatically ensure that strings are of the correct type required by this checker - either unicode or standard. If there is a mismatch, conversion is done using Python’s default encoding unless another encoding is specified.

get_text()

Return the spell-checked text.

ignore_always(word=None)

Add given word to list of words to ignore.

If no word is given, the current erroneous word is added.

leading_context(chars)

Get <chars> characters of leading context.

This method returns up to <chars> characters of leading context - the text that occurs in the string immediately before the current erroneous word.

next()

Process text up to the next spelling error.

This method is designed to support the iterator protocol. Each time it is called, it will advance the ‘word’ attribute to the next spelling error in the text. When no more errors are found, it will raise StopIteration.
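The iterator behaviour described above can be sketched with a toy class. ToyChecker and its known-word set are hypothetical stand-ins written for this illustration; they are not PyEnchant code and use no real dictionary.

```python
class ToyChecker:
    """Toy stand-in (hypothetical, not PyEnchant code) that flags words
    missing from a small set of known words."""

    def __init__(self, text, known_words):
        self._words = iter(text.split())
        self._known = known_words
        self.word = None  # current erroneous word

    def __iter__(self):
        return self

    def __next__(self):
        # Advance to the next unknown word. When the inner iterator is
        # exhausted it raises StopIteration, which also ends our loop.
        while True:
            candidate = next(self._words)
            if candidate not in self._known:
                self.word = candidate
                return self  # return the checker itself, not the word

    next = __next__  # Python 2 style name, as used in this documentation


checker = ToyChecker("a fw speling errors", {"a", "errors"})
found = [err.word for err in checker]
print(found)  # ['fw', 'speling']
```

Because each step yields the checker object, the loop variable gives access to every method and attribute of the checker, not just the misspelled word.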

The method will always return self, so that it can be used sensibly in common idioms such as:

for err in checker:
    err.do_something()

replace(repl)

Replace the current erroneous word with the given string.

replace_always(word, repl=None)

Always replace given word with given replacement.

If a single argument is given, this is used to replace the current erroneous word. If two arguments are given, that combination is added to the list for future use.

set_offset(off, whence=0)

Set the offset of the tokenization routine.

For more details on the purpose of the tokenization offset, see the documentation of the ‘enchant.tokenize’ module. The optional argument whence indicates the method by which to change the offset:

  • 0 (the default) treats <off> as an increment
  • 1 treats <off> as a distance from the start
  • 2 treats <off> as a distance from the end
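The three whence modes can be summarised in a small helper. resolve_offset is a hypothetical function written for this illustration, and the exact arithmetic used for the end-relative case is an assumption; the documentation above only says that <off> is a distance from the end.

```python
def resolve_offset(current, off, whence, text_length):
    """Return the new tokenizer offset (hypothetical helper)."""
    if whence == 0:
        return current + off          # increment relative to the current offset
    if whence == 1:
        return off                    # absolute distance from the start
    if whence == 2:
        return text_length - 1 - off  # assumed convention for "from the end"
    raise ValueError("whence must be 0, 1 or 2")


print(resolve_offset(10, 5, 0, 40))  # 15
print(resolve_offset(10, 5, 1, 40))  # 5
print(resolve_offset(10, 5, 2, 40))  # 34
```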

set_text(text)

Set the text to be spell-checked.

This method must be called, or the ‘text’ argument supplied to the constructor, before calling the ‘next()’ method.

suggest(word=None)

Return suggested spellings for the given word.

If no word is given, the current erroneous word is used.
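As a rough sketch of how ‘suggest’ might be combined with the other methods documented here, the hypothetical helper below accepts any object with the SpellChecker interface. It applies the first suggestion to each error and falls back to ‘ignore_always’ when nothing is suggested. Blindly taking the first suggestion is an assumption made for brevity, not a recommended policy.

```python
def autocorrect(checker):
    """Apply the first suggestion to every error, or ignore the word
    for the rest of the session when there are no suggestions.

    `checker` is any object offering the iteration, suggest, replace,
    ignore_always and get_text behaviour documented in this section.
    """
    for err in checker:
        suggestions = err.suggest()
        if suggestions:
            err.replace(suggestions[0])
        else:
            err.ignore_always()
    return checker.get_text()


# Intended usage, assuming PyEnchant and an en_US dictionary are installed:
#   from enchant.checker import SpellChecker
#   print(autocorrect(SpellChecker("en_US", "This is sme text.")))
```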

trailing_context(chars)

Get <chars> characters of trailing context.

This method returns up to <chars> characters of trailing context - the text that occurs in the string immediately after the current erroneous word.
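The phrase "up to <chars> characters" in the two context methods can be illustrated with plain string slicing. The two functions below are hypothetical free-standing helpers that take the text and word position explicitly; the real methods work from the checker's own state.

```python
def leading_context(text, word_pos, chars):
    """Up to `chars` characters immediately before the word."""
    start = max(0, word_pos - chars)  # clamp at the start of the text
    return text[start:word_pos]


def trailing_context(text, word_pos, word, chars):
    """Up to `chars` characters immediately after the word."""
    end = word_pos + len(word)
    return text[end:end + chars]      # slicing clamps at the end of the text


text = "This is sme text with a fw speling errors in it."
pos = text.index("sme")
print(repr(leading_context(text, pos, 5)))          # 's is '
print(repr(trailing_context(text, pos, "sme", 5)))  # ' text'
```

When fewer characters are available than requested, for example near the start or end of the text, the result is simply shorter.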

wants_unicode()

Check whether the checker wants unicode strings.

This method will return True if the checker wants unicode strings as input, False if it wants normal strings. It’s important to provide the correct type of string to the checker.