Search / Comparison
===================

Levenshtein
-----------

Basic Usage
^^^^^^^^^^^

Calculates the Levenshtein distance between ``a`` and ``b``.

Usage::

    >>> from web2py_utils import search
    >>> leve = search.Levenshtein()
    >>> leve.levenshtein('hello', 'hellp')
    1

Suggestion
^^^^^^^^^^

Compares a string to a list of strings and returns a list of the closest
matches, sorted by lowest distance first.

Usage::

    >>> leve.suggestion('hello', ['hello', 'world', 'mello', 'hi', 'hell'], number_of_matches=3)
    ['hello', 'hell', 'mello']

NCD
---

Performs an NCD (Normalized Compression Distance) comparison between two keys.
The keys can be either the contents of a file or search terms.

The result is a float value between 0.0 and 1.1; the lower the value, the more
similar the two keys are.

You can read some documentation on the algorithm here:
http://www.sophos.com/blogs/sophoslabs/?p=188

Basic Usage
^^^^^^^^^^^

Calculates the similarity between ``key1`` and ``key2``.

Usage::

    >>> from web2py_utils.search import ncd
    >>> key1 = "h3ll0"
    >>> key2 = "hello"
    >>> key3 = "lehlo"
    >>> key4 = "aeiou"
    >>> ncd(key1, key2)
    0.071428571428571425
    >>> ncd(key2, key3)
    0.024390243902439025
    >>> ncd(key3, key4)
    0.11904761904761904
    >>> ncd(key1, key3)
    0.095238095238095233
    >>> ncd(key1, key4)
    0.16666666666666666
    >>> ncd(key2, key4)
    0.095238095238095233

NGram
-----

Usage::

    # A simple example
    >>> import pprint
    >>> from web2py_utils import search
    >>> base = ['sdafaf', 'asfwef', 'asdfawe', 'adfwe', 'askfjwehiuasdfji']
    >>> tg = search.Ngram(base, min_sim=0.0)
    >>> pprint.pprint(tg.getSimilarStrings('askfjwehiuasdfji'))
    >>> print
    >>> pprint.pprint(search.Ngram.compare('sdfeff', 'sdfeff'))
    >>> print
    >>> pprint.pprint(tg.getBestMatch('afadfwe', 2))

Constructor

PARAMETERS::

    haystack - String or list of strings. This is where we will look for matches.

OPTIONAL PARAMETERS::

    min_sim    - Minimum similarity score for a string to be considered a match.
                 Default = 0.0
    warp       - If warp > 1, short strings score more favourably; if warp < 1,
                 they score less favourably.
                 Default = 1.0
    ic         - Ignore case?
                 Default = False
    only_alnum - Only consider alphanumeric characters?
                 Default = False
    ngram_len  - n-gram size.
                 Default = 3 (trigram)
    padding    - Padding size.
                 Default = ngram_len - 1
    noise      - Noise characters that should be ignored when comparing.

.. autoclass:: web2py_utils.search.Ngram
    :members:
    :undoc-members:
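
For readers who want to see what the Levenshtein, suggestion and NCD measures
described above compute, the following is a minimal standalone sketch. It is
not the ``web2py_utils`` implementation. The function names
(``levenshtein_distance``, ``closest_matches``, ``compression_distance``) are
hypothetical, the use of ``zlib`` as the compressor is an assumption, and the
code is written for Python 3, so the NCD helper takes ``bytes``::

    import zlib


    def levenshtein_distance(a, b):
        """Edit distance: insertions, deletions, substitutions cost 1 each."""
        # previous[j] holds the distance between a[:i-1] and b[:j].
        previous = list(range(len(b) + 1))
        for i, char_a in enumerate(a, start=1):
            current = [i]  # distance between a[:i] and the empty string
            for j, char_b in enumerate(b, start=1):
                cost = 0 if char_a == char_b else 1
                current.append(min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep)
                ))
            previous = current
        return previous[-1]


    def closest_matches(word, candidates, number_of_matches=3):
        """Return the candidates closest to word, lowest distance first."""
        ranked = sorted(candidates,
                        key=lambda c: levenshtein_distance(word, c))
        return ranked[:number_of_matches]


    def compression_distance(x, y):
        """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).

        C is the compressed length; zlib is an assumed compressor here.
        """
        cx = len(zlib.compress(x))
        cy = len(zlib.compress(y))
        cxy = len(zlib.compress(x + y))
        return float(cxy - min(cx, cy)) / max(cx, cy)

In this sketch ties in ``closest_matches`` keep their input order, so equally
distant candidates may come back in a different order than from
``Levenshtein.suggestion``. The values returned by ``compression_distance``
depend on the compressor and will generally not match the ``ncd`` figures
listed in the usage example.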