Search / Comparison
===================

Levenshtein
-----------

Basic Usage
^^^^^^^^^^^

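Calculates the Levenshtein (edit) distance between two strings, ``a`` and ``b``:
the minimum number of single-character insertions, deletions, and substitutions
needed to turn one into the other.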

Usage::

    >>> from web2py_utils import search

    >>> leve = search.Levenshtein()

    >>> leve.levenshtein('hello', 'hellp')
    1
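
To show what the distance measures, here is a minimal pure-Python sketch of the
classic dynamic-programming algorithm. It is an illustration only and not
necessarily how ``search.Levenshtein`` is implemented; the function name
``levenshtein_distance`` is hypothetical::

    def levenshtein_distance(a, b):
        """Return the edit distance between strings a and b."""
        # Keep the shorter string in the inner loop so each row stays small.
        if len(a) < len(b):
            a, b = b, a

        # previous[j] is the distance between the processed prefix of a
        # and the first j characters of b.
        previous = list(range(len(b) + 1))
        for i, ch_a in enumerate(a, 1):
            current = [i]  # turning a[:i] into '' takes i deletions
            for j, ch_b in enumerate(b, 1):
                cost = 0 if ch_a == ch_b else 1
                current.append(min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep)
                ))
            previous = current
        return previous[-1]

    print(levenshtein_distance('hello', 'hellp'))

Only one row of the distance table is kept at a time, so memory use grows with
the length of the shorter string instead of the product of both lengths.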

Suggestion
^^^^^^^^^^

Compares a string to a list of strings and returns a list of the closest
matches, sorted by lowest distance first.

Usage::

    >>> leve.suggestion('hello', ['hello', 'world', 'mello', 'hi', 'hell'],
                        number_of_matches=3)
    ['hello', 'hell', 'mello']
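
The ranking can be sketched as sorting the candidates by their edit distance to
the query and keeping the first few. The sketch below is an assumption about
the general approach, not the library's code: ``edit_distance`` and
``suggest`` are hypothetical names, and breaking ties alphabetically is a
choice made here for illustration::

    from functools import lru_cache

    def edit_distance(a, b):
        """Memoized recursive Levenshtein distance (stand-in helper)."""
        @lru_cache(maxsize=None)
        def dist(i, j):
            # dist(i, j) is the distance between a[:i] and b[:j].
            if i == 0:
                return j
            if j == 0:
                return i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            return min(dist(i - 1, j) + 1,
                       dist(i, j - 1) + 1,
                       dist(i - 1, j - 1) + cost)
        return dist(len(a), len(b))

    def suggest(word, candidates, number_of_matches):
        """Return the closest candidates, lowest distance first."""
        # Assumption: ties on distance are broken alphabetically.
        ranked = sorted(candidates, key=lambda c: (edit_distance(word, c), c))
        return ranked[:number_of_matches]

    print(suggest('hello', ['hello', 'world', 'mello', 'hi', 'hell'], 3))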

NCD
---

Performs an NCD (Normalized Compression Distance) comparison between two keys.
The keys can be either the contents of a file or search terms.

You can read some documentation on the algorithm here: http://www.sophos.com/blogs/sophoslabs/?p=188

The result is a float between 0.0 and 1.1; the lower the value, the more
similar the two keys are.

Basic Usage
^^^^^^^^^^^

Calculates the similarity between ``key1`` and ``key2``.

Usage::

    >>> from web2py_utils.search import ncd
    >>> key1 = "h3ll0"
    >>> key2 = "hello"
    >>> key3 = "lehlo"
    >>> key4 = "aeiou"

    >>> ncd(key1, key2)
    0.071428571428571425
    >>> ncd(key2, key3)
    0.024390243902439025
    >>> ncd(key3, key4)
    0.11904761904761904
    >>> ncd(key1, key3)
    0.095238095238095233
    >>> ncd(key1, key4)
    0.16666666666666666
    >>> ncd(key2, key4)
    0.095238095238095233
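
For intuition, the usual NCD formula can be sketched with a general-purpose
compressor. This is a minimal illustration assuming ``zlib`` as the compressor;
the source does not say which compressor ``ncd`` uses, so the values it returns
will generally differ from the ones shown above. The name ``ncd_sketch`` is
hypothetical::

    import zlib

    def ncd_sketch(x, y):
        """Normalized Compression Distance using zlib (an assumption)."""
        # Work on bytes so the compressor can handle either input type.
        if isinstance(x, str):
            x = x.encode('utf-8')
        if isinstance(y, str):
            y = y.encode('utf-8')

        c_x = len(zlib.compress(x))
        c_y = len(zlib.compress(y))
        c_xy = len(zlib.compress(x + y))

        # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
        return (c_xy - min(c_x, c_y)) / max(c_x, c_y)

    text = 'the quick brown fox jumps over the lazy dog ' * 20
    other = 'lorem ipsum dolor sit amet consectetur ' * 20
    print(ncd_sketch(text, text) < ncd_sketch(text, other))

Very short inputs compress poorly, so scores on a handful of characters are
noisy; the measure is more informative on longer keys such as file contents.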

NGram
-----

Usage::

    >>> # A simple example
    >>> import pprint
    >>> from web2py_utils import search

    >>> base = ['sdafaf', 'asfwef', 'asdfawe', 'adfwe', 'askfjwehiuasdfji']
    >>> tg = search.Ngram(base, min_sim=0.0)

    >>> pprint.pprint(tg.getSimilarStrings('askfjwehiuasdfji'))
    >>> pprint.pprint(search.Ngram.compare('sdfeff', 'sdfeff'))
    >>> pprint.pprint(tg.getBestMatch('afadfwe', 2))


Constructor
^^^^^^^^^^^

PARAMETERS::

    haystack - String or list of strings. This is where we will look for
               matches

OPTIONAL PARAMETERS::

    min_sim     - minimum similarity score for a string to be considered
                  worthy. Default = 0.0
    warp        - If warp > 1, short strings score better; if
                  warp < 1, they score worse. Default = 1.0
    ic          - Ignore case? Default = False
    only_alnum  - Only consider alphanumeric characters? Default = False
    ngram_len   - n-gram size. Default = 3 (trigram)
    padding     - padding size. Default = ngram_len - 1
    noise       - noise characters that should be ignored when comparing
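
To make ``ngram_len``, ``padding``, and ``warp`` more concrete, the following
sketch shows one way padded n-grams can be extracted and compared. Everything
in it is an assumption made for illustration and is not taken from
``web2py_utils.search.Ngram``: the helper names, the ``$`` padding character,
and the warp formula are all hypothetical::

    from collections import Counter

    def ngrams(text, ngram_len=3, pad_char='$'):
        """Split text into overlapping n-grams, padded on both sides."""
        padding = ngram_len - 1  # mirrors the documented default
        padded = pad_char * padding + text + pad_char * padding
        return [padded[i:i + ngram_len]
                for i in range(len(padded) - ngram_len + 1)]

    def ngram_similarity(a, b, ngram_len=3, warp=1.0):
        """Score two strings from 0.0 (nothing shared) to 1.0 (identical)."""
        grams_a = Counter(ngrams(a, ngram_len))
        grams_b = Counter(ngrams(b, ngram_len))
        shared = sum((grams_a & grams_b).values())  # n-grams in both
        total = sum((grams_a | grams_b).values())   # n-grams in either
        if total == 0:
            return 0.0
        # Assumed warp formula: with warp == 1 this is shared / total;
        # warp > 1 raises scores when few n-grams differ.
        different = total - shared
        return (total ** warp - different ** warp) / total ** warp

    print(ngrams('ab'))
    print(ngram_similarity('sdfeff', 'sdfeff'))

Padding gives the first and last characters of a string as many n-grams as the
characters in the middle, so matches at the edges of a word are not
under-counted.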


.. autoclass:: web2py_utils.search.Ngram
   :members:
   :undoc-members: