errorgeopy package

Submodules

errorgeopy.address module

Contains the Address class, representing a collection of reverse geocoding results. Primarily, this functions as a container for a set of errorgeopy.Location objects after a successful reverse geocode, and exposes methods that operate on this set of results, including:

  • de-duplication
  • extracting the results that best match a pre-expected outcome
  • finding the longest common substring of candidate addresses
class errorgeopy.address.Address(*args, **kwargs)

Bases: object

Represents a collection of parsed reverse geocoder responses (parsed with geopy). Each member of the addresses property (which is iterable) is a geopy.location.Location object. The raw responses can therefore be obtained from an Address instance, address, with:

>>> [a.raw for a in address.addresses]

errorgeopy adds methods that operate on the collection as a whole, treating the addresses as a related set.

Attributes:
addresses (list): Collection of reverse geocoding responses from as many services as were capable of returning a response to a query. Each member of the list is a geopy.location.Location object.
addresses

A list of reverse geocoding results from all configured providers. The single central property of the Address object.

Notes:
Depending on configuration, a provider may return more than one result for a given query. All results from all providers are available in this property, in a flat (not nested) structure. The list may be empty if no provider could match an address.
dedupe(threshold=95)

Produces a fuzzily de-duplicated version of the candidate addresses, using fuzzywuzzy.process.dedupe.

Note:
See https://github.com/seatgeek/fuzzywuzzy/blob/master/fuzzywuzzy/process.py for details of the deduplication algorithm's implementation. This method does not modify the Address.addresses property.
Kwargs:
threshold (int): the similarity score (on a 0 to 100 scale) beyond which two addresses are treated as duplicates. Defaults to 95, which is higher than the fuzzywuzzy default (70). This higher threshold is used by default because addresses are sensitive to small changes: “250 Main Street” and “150 Main Street” have a small edit distance when compared as strings, but may be a considerable distance apart as physical addresses.
Returns:
A list of geopy.location.Location objects (essentially a filtered list of the original set).
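The effect of the threshold can be illustrated with a minimal sketch. It is not errorgeopy's implementation. It swaps in the standard library's difflib.SequenceMatcher ratio, scaled to 0 to 100, for fuzzywuzzy's scorer and uses a simple keep-first rule, so scores and the choice of which duplicate survives may differ from fuzzywuzzy.process.dedupe. The helper names similarity and fuzzy_dedupe are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Similarity score from 0 to 100 (difflib ratio, standing in for fuzzywuzzy)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def fuzzy_dedupe(addresses, threshold=95):
    """Greedy de-duplication: drop an address if its score against any
    already-kept address exceeds `threshold`; otherwise keep it."""
    kept = []
    for candidate in addresses:
        if all(similarity(candidate, k) <= threshold for k in kept):
            kept.append(candidate)
    return kept


candidates = ["250 Main Street", "250 Main Street ", "150 Main Street"]
print(similarity("250 Main Street", "150 Main Street"))  # 93
print(fuzzy_dedupe(candidates))                # ['250 Main Street', '150 Main Street']
print(fuzzy_dedupe(candidates, threshold=70))  # ['250 Main Street']
```

At 95 the two different street numbers survive as distinct candidates; at fuzzywuzzy's default of 70 they collapse into one.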
extract(extraction, limit=4)

Returns the address or addresses within the set of the reverse geocoded addresses that best match an expected result. Uses fuzzywuzzy under the hood for matching.

Args:
extraction (str): The string indicating your expected result for a reverse geocoding operation. It should probably look like an address. Results are returned in the order that best meets this expected address.
Kwargs:
limit (int): The maximum number of match candidates to retrieve from fuzzywuzzy. The returned list may be longer than limit if the set of addresses contains identical addresses that are good matches for the expected address (e.g. if two geocoders resolve to the same string address).
Returns:
list. A list of tuples, where each tuple contains a geopy Location and a matching score based on an extension of the Levenshtein distance between the expectation and the Location’s address (a higher score is a better match). The algorithm is implemented by SeatGeek’s fuzzywuzzy; you can read more here: http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
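The ranking, and how the result can end up longer than limit, can be sketched as follows under stated assumptions: difflib stands in for fuzzywuzzy's scorer (so the scores differ), a namedtuple with an address field stands in for a geopy Location, and rank_candidates is a hypothetical name, not errorgeopy's code.

```python
from collections import namedtuple
from difflib import SequenceMatcher


def score(a: str, b: str) -> int:
    """0-100 similarity (difflib stand-in for fuzzywuzzy's scorer)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def rank_candidates(locations, expectation, limit=4):
    """Return (location, score) tuples, best match first.

    `locations` is any sequence of objects with an `address` attribute,
    standing in for geopy Location objects.
    """
    # Score each distinct address string once, keeping first-seen order for ties.
    unique = list(dict.fromkeys(loc.address for loc in locations))
    best = sorted(unique, key=lambda addr: score(addr, expectation), reverse=True)[:limit]
    # Expand back to every location that produced a kept string, so identical
    # answers from different providers all appear (the result can exceed `limit`).
    return [
        (loc, score(loc.address, expectation))
        for addr in best
        for loc in locations
        if loc.address == addr
    ]


Loc = namedtuple("Loc", ["address", "provider"])
locs = [
    Loc("12 Queen Street, Auckland", "ArcGIS"),
    Loc("12 Queen Street, Auckland", "Nominatim"),
    Loc("Queen Street, Wellington", "GoogleV3"),
]
matches = rank_candidates(locs, "12 Queen St, Auckland", limit=1)
print([loc.provider for loc, _ in matches])  # ['ArcGIS', 'Nominatim']
```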
longest_common_sequence(separator='')

Returns the longest common sequence of the reverse geocoded addresses... or it would, if I had written this code.

Raises:
NotImplementedError
longest_common_substring(dedupe=False)

Returns the longest common substring of the reverse geocoded addresses. Note that if there is no common substring, a string of length zero is returned. If the longest common substring is whitespace, that is stripped, and a string of length zero is returned.

Kwargs:
dedupe (bool): whether to first perform a deduplication operation on the set of addresses. Defaults to False.
Returns:
str
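A brute-force sketch of the behaviour described above: an empty string when nothing is shared, and surrounding whitespace stripped so that an all-whitespace match also becomes empty. The function name common_substring is hypothetical and this is not necessarily how errorgeopy computes the result; a suffix-array approach would scale better, but address strings are short.

```python
def common_substring(strings):
    """Longest substring shared by every string in `strings`.

    Tries substrings of the shortest string from longest to shortest and
    returns the first one found in all the others, with leading and trailing
    whitespace stripped.
    """
    if not strings:
        return ""
    shortest = min(strings, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate.strip()
    return ""


print(common_substring(["12 Queen Street, Auckland", "12 Queen St, Auckland"]))  # 12 Queen St
```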
parse()

Raises:
NotImplementedError

regex()

Returns a regular expression that matches all of the reverse geocoded addresses... well it would if I had written this code.

Raises:
NotImplementedError
tag(summarise=True)

Raises:
NotImplementedError

errorgeopy.geocoders module

Contains the Geocoder and GeocoderPool classes, representing a single pre-configured geocoder and a pool of pre-configured geocoders, respectively.

Geocoder is a very thin wrapper around geopy.geocoders.base.Geocoder that primarily just initialises a geopy.Geocoder instance by referring to it by name and passing its configuration.

GeocoderPool reads the configuration (from a file or a dictionary) for the suite of geocoders you want to use. You should configure these yourself, although a small number are available with no configuration. The GeocoderPool then coordinates requests via individual Geocoder objects, handling failures and geocoding in parallel for efficiency. Both forward and backward (“reverse”) geocoding are supported, but note that not all geocoding services exposed via errorgeopy support both methods.
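A minimal sketch of the fan-out pattern described above, using concurrent.futures from the standard library. Plain callables stand in for configured geopy geocoders, and query_all is a hypothetical name; errorgeopy's own concurrency mechanism and error handling may differ. Catching every Exception keeps the sketch short; real code would more likely catch geocoder-specific errors such as those defined in geopy.exc.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(providers, query):
    """Run `query` against every provider concurrently.

    `providers` maps a name to a callable (standing in for a configured
    geocoder's geocode or reverse method). A provider that raises or returns
    None is skipped; the remaining results are flattened into one list.
    """
    def safe_call(item):
        _name, call = item
        try:
            result = call(query)
        except Exception:
            return []  # treat a failing service as "no result"
        if result is None:
            return []
        return result if isinstance(result, list) else [result]

    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        per_provider = list(pool.map(safe_call, providers.items()))
    return [r for results in per_provider for r in results]


def failing(query):
    raise RuntimeError("service unavailable")


providers = {
    "single": lambda q: f"{q} (single)",
    "many": lambda q: [f"{q} (a)", f"{q} (b)"],
    "down": failing,
}
print(query_all(providers, "Queen Street"))
# ['Queen Street (single)', 'Queen Street (a)', 'Queen Street (b)']
```

Executor.map returns results in input order, so the flattened list follows the provider order even though the calls run concurrently.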

class errorgeopy.geocoders.Geocoder(name, config)

Bases: object

A single geocoder exposing access to a geocoding web service with geopy. Thin wrapping over the geopy.Geocoder set of geocoding services. Used by errorgeopy.GeocoderPool to access the configuration of each component service. The base geopy.Geocoder object can be obtained via the geocoder attribute.

config

The configuration of the geocoder (less the kwargs for the geocode and reverse methods), as a dictionary.

geocoder

The geopy.Geocoder instance.

name

The string name of the geocoder.

class errorgeopy.geocoders.GeocoderPool(config=None, geocoders=None)

Bases: object

A “pool” of objects that inherit from geopy.geocoders.base.Geocoder, with configuration specific to each service. Represents the inputs for geocoding operations that span across multiple providers. Queries are run in parallel across all configured geocoding providers, and results are intended to be a composition of multiple responses from different providers with coherent configuration (e.g. a universal country_bias), although this is not enforced.

config

The (parsed) configuration that will be referred to when geocoding, as a dictionary.

classmethod fromfile(config, caller=None)

Instantiates a GeocoderPool from a configuration file. For example, a config.yml file may look like:

ArcGIS:
  geocode:
    exactly_one: true
  reverse:
    distance: 1000
Nominatim:
  country_bias: "New Zealand"
  geocode:
    addressdetails: true
    language: en
    exactly_one: false
  reverse:
    exactly_one: true
    language: en

Then you could use this classmethod as follows:

>>> import yaml
>>> from errorgeopy.geocoders import GeocoderPool
>>> gpool = GeocoderPool.fromfile('./config.yml', yaml.safe_load)
Args:
config (str): path to a configuration file on your system.
Kwargs:
caller (function): optional function that parses the config file into a Python dictionary whose keys match GeoPy geocoder names and whose values are also dictionaries, holding the keyword arguments for the geocode and reverse methods (under the geocode and reverse keys) and any other geocoder-specific configuration (e.g. country_bias above).
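Assuming the parsed configuration is split the way the Geocoder.config description suggests (constructor settings on one side, geocode and reverse keyword arguments on the other), a hypothetical helper, split_config, could separate the dictionary a YAML parser produces from the config.yml example. This illustrates the shape of the configuration only; it is not errorgeopy's parsing code.

```python
def split_config(config):
    """Split a parsed pool configuration into, per provider:
    (constructor kwargs, geocode kwargs, reverse kwargs)."""
    split = {}
    for name, options in config.items():
        options = dict(options or {})  # copy so the input is not mutated
        geocode_kwargs = options.pop("geocode", None) or {}
        reverse_kwargs = options.pop("reverse", None) or {}
        split[name] = (options, geocode_kwargs, reverse_kwargs)
    return split


# The config.yml example above, as the dictionary a YAML parser would return.
config = {
    "ArcGIS": {"geocode": {"exactly_one": True}, "reverse": {"distance": 1000}},
    "Nominatim": {
        "country_bias": "New Zealand",
        "geocode": {"addressdetails": True, "language": "en", "exactly_one": False},
        "reverse": {"exactly_one": True, "language": "en"},
    },
}
init_kwargs, geocode_kwargs, reverse_kwargs = split_config(config)["Nominatim"]
print(init_kwargs)     # {'country_bias': 'New Zealand'}
print(reverse_kwargs)  # {'exactly_one': True, 'language': 'en'}
```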
geocode(query)

Forward geocoding: given a string address, return a point location. ErrorGeoPy does this, and also provides you with ways to interrogate the spatial error in the result.

Args:
query (str): Address you want to find the location of (with spatial error).
Returns:
An errorgeopy.location.Location instance: the collection of candidate locations returned by all providers.
geocoders

The list of unique geocoders that will be used when geocoding. Each member of the list inherits from geopy.geocoders.base.Geocoder.

reverse(query)

Reverse geocoding: given a point location, returns a string address. ErrorGeoPy does this, and also provides you with ways to interrogate the uncertainty in the result.

Args:
query (geopy.point.Point, iterable of (lat, lon), or string as “%(latitude)s, %(longitude)s”): The coordinates for which you wish to obtain the closest human-readable addresses.
Returns:
An errorgeopy.address.Address instance: the collection of candidate addresses returned by all providers.
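The three accepted query forms can be normalised to a (latitude, longitude) pair roughly as follows. to_lat_lon is a hypothetical helper, not errorgeopy's code. It duck-types geopy.point.Point through its latitude and longitude attributes so that it runs without geopy installed; geopy's Point can also parse coordinate strings itself.

```python
def to_lat_lon(query):
    """Normalise a reverse-geocoding query to a (latitude, longitude) tuple.

    Accepts an object with latitude/longitude attributes (such as a geopy
    Point), a "lat, lon" string, or any iterable of (lat, lon).
    """
    if hasattr(query, "latitude") and hasattr(query, "longitude"):
        return float(query.latitude), float(query.longitude)
    parts = query.split(",") if isinstance(query, str) else list(query)
    if len(parts) != 2:
        raise ValueError(f"expected latitude and longitude, got {query!r}")
    lat, lon = (float(p) for p in parts)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    return lat, lon


print(to_lat_lon("-36.85, 174.76"))    # (-36.85, 174.76)
print(to_lat_lon((-36.85, 174.76)))    # (-36.85, 174.76)
```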

errorgeopy.location module

A “Location” is a collection of responses from geocoding services, each one a distinct attempt either to find a string address given a point (reverse geocode) or to find a point that best matches a string address (forward geocode). A Location is a collection because ErrorGeoPy is fundamentally oriented to working across providers and to considering all of their results as a related set of responses.

A “LocationClusters” object, also defined here, is also a collection of addresses, but is slightly less abstract in that the members of the collection are organised into clusters, based on some clustering algorithm.

Heavy use is made of shapely in return values of methods for these classes.

class errorgeopy.location.Location(*args, **kwargs)

Bases: object

Represents a collection of parsed geocoder responses, each of which is a geopy.Location object, representing the results of different geocoding services for the same query.

addresses

geopy.Location.address properties for all candidate locations as a sequence of strings.

centroid

A shapely.geometry.Point of the centre of all candidate address locations (centre of the multipoint).

clusters

Clusters that have been identified in the Location’s candidate addresses, as an errorgeopy.location.LocationClusters object.

concave_hull

A concave hull of the Location, as a shapely.geometry.Polygon object. Needs at least four candidates, or else this property is None.

Kwargs:
alpha (float): The parameter for the alpha shape
convex_hull

A convex hull of the Location, as a shapely.geometry.Polygon object. Needs at least three candidates, or else this property is None.
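To show where the three-candidate minimum comes from, here is a sketch of a convex hull over planar (x, y) coordinates using Andrew's monotone chain algorithm. It stands in for shapely's convex_hull rather than reproducing it: it returns a list of vertices instead of a Polygon, and it also returns None for collinear input. Treating longitude and latitude as planar coordinates is an assumption that is only reasonable over small areas.

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull of (x, y) points.

    Returns hull vertices in counter-clockwise order, or None when there are
    fewer than three distinct points or all points are collinear.
    """
    pts = sorted(set(points))
    if len(pts) < 3:
        return None

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # drop each chain's last point (shared endpoints)
    return hull if len(hull) >= 3 else None


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
print(convex_hull([(0, 0), (1, 1)]))                          # None
```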

locations

A sequence of geopy.Location objects.

mbc

A shapely.geometry.Polygon representing the minimum bounding circle of the candidate locations.
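A minimum bounding circle can be computed with Welzl's randomised incremental algorithm. The sketch below returns (x, y, radius) in planar coordinates rather than the Polygon this property provides, and it is not errorgeopy's or shapely's implementation. The seed parameter only fixes the shuffle order for reproducibility; the smallest enclosing circle is unique, so any order yields the same circle.

```python
import math
import random


def _circle_from_two(a, b):
    """Circle with segment ab as its diameter."""
    cx, cy = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
    return cx, cy, math.dist(a, (cx, cy))


def _circle_from_three(a, b, c):
    """Circumcircle of a, b, c; falls back to the widest pair if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        pairs = ((a, b), (a, c), (b, c))
        return max((_circle_from_two(p, q) for p, q in pairs), key=lambda circ: circ[2])
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.dist(a, (ux, uy))


def _inside(circle, p, eps=1e-9):
    return math.dist(circle[:2], p) <= circle[2] + eps


def min_bounding_circle(points, seed=0):
    """(x, y, radius) of the smallest circle enclosing all (x, y) points."""
    pts = list(points)
    random.Random(seed).shuffle(pts)
    circle = None
    for i, p in enumerate(pts):
        if circle is None or not _inside(circle, p):
            circle = (p[0], p[1], 0.0)  # p must lie on the boundary
            for j in range(i):
                q = pts[j]
                if not _inside(circle, q):
                    circle = _circle_from_two(p, q)  # p and q on the boundary
                    for k in range(j):
                        r = pts[k]
                        if not _inside(circle, r):
                            circle = _circle_from_three(p, q, r)
    return circle


print(min_bounding_circle([(0, 0), (2, 0), (1, 1)]))  # approximately (1.0, 0.0, 1.0)
```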

most_central_location

A shapely.geometry.Point representing the geometry of the candidate location that is nearest to the geometric centre of all of the candidate locations.
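The centroid and most_central_location properties can be sketched in plain Python over planar (x, y) coordinates, which is how shapely treats them. The centroid need not coincide with any candidate, which is why the most central candidate is a separate property. The function names are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of (x, y) points, i.e. the centroid of a planar multipoint."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def most_central(points):
    """The input point nearest (planar distance) to the centroid."""
    c = centroid(points)
    return min(points, key=lambda p: math.dist(p, c))


candidates = [(0, 0), (4, 0), (0, 4), (4, 4), (1, 1)]
print(centroid(candidates))      # (1.8, 1.8)
print(most_central(candidates))  # (1, 1)
```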

multipoint

A shapely.geometry.MultiPoint of the Location members.

points

An array of geopy.Point objects representing the candidate locations' physical positions. These are geodetic points, with latitude, longitude, and altitude (in kilometres, when supported by providers; defaults to 0).

class errorgeopy.location.LocationClusters(location)

Bases: object

Represents clusters of addresses identified from an errorgeopy.Location object, which is itself one coherent collection of responses from multiple geocoding services for the same query.

cluster_centres

Multipoint of cluster geometric centroids.

clusters

A sequence of clusters identified from the input. May have length 0 if no clusters can be determined.

geometry_collection

GeometryCollection of clusters as multipoint geometries.
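The text above does not name the clustering algorithm, so the sketch below swaps in a simple one for illustration: single-linkage clustering with a distance threshold eps (in coordinate units, e.g. degrees). Groups smaller than min_size are discarded as noise, so, as with the clusters property, the result can be empty. The names and parameters are hypothetical, and errorgeopy's actual algorithm may differ.

```python
import math


def cluster_points(points, eps, min_size=2):
    """Single-linkage clustering: points within `eps` of each other, directly
    or through a chain of neighbours, share a cluster. Clusters with fewer
    than `min_size` members are dropped."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # union the two groups

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return [g for g in groups.values() if len(g) >= min_size]


def cluster_centres(clusters):
    """Mean (x, y) of each cluster, in the same order as `clusters`."""
    return [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
        for c in clusters
    ]


pts = [(0, 0), (0.1, 0), (5, 5), (5.1, 5.1), (10, 0)]
found = cluster_points(pts, eps=0.5)
print(found)  # [[(0, 0), (0.1, 0)], [(5, 5), (5.1, 5.1)]]
print(cluster_centres(found))
```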

Module contents