orangecontrib.earth API reference

Learner/Classifier

EarthLearner and EarthClassifier form the standard Orange learner/classifier pair: the learner induces a model and the classifier makes predictions with it.

class orangecontrib.earth.EarthLearner(degree=1, terms=None, penalty=None, thresh=0.001, min_span=0, new_var_penalty=0, fast_k=20, fast_beta=1, pruned_terms=None, scale_resp=True, store_instances=True, **kwds)

Bases: Orange.regression.base.BaseRegressionLearner

Earth learner class.

Supports both regression and classification problems. For classification, class values are expanded into continuous indicator columns (one for each value if the number of values is greater than 2), and a multi-response model is fit to these new columns. The resulting classifier computes response values on new instances and uses them to select the final predicted class.
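The indicator expansion described above can be illustrated with a small NumPy sketch. This is not the library's code: the function names are hypothetical, and both the single-column treatment of two-valued classes and the 0.5 threshold are assumptions made for illustration.

```python
import numpy as np

def expand_class_to_indicators(class_indices, n_values):
    """Expand integer class indices into continuous 0/1 indicator columns."""
    class_indices = np.asarray(class_indices)
    if n_values > 2:
        # One indicator column per class value.
        return (class_indices[:, None] == np.arange(n_values)).astype(float)
    # Two-valued class: assume a single column marking the second value.
    return (class_indices == 1).astype(float)[:, None]

def select_class(responses):
    """Pick a class index from the model responses (one row per instance)."""
    responses = np.asarray(responses, dtype=float)
    if responses.shape[1] == 1:
        # Single response column: threshold at 0.5 (an assumption).
        return (responses[:, 0] > 0.5).astype(int)
    # Multiple responses: take the column with the largest response.
    return responses.argmax(axis=1)
```

With more than two class values, the predicted class is simply the indicator column whose fitted response is largest.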

Parameters:
  • degree (int) – Maximum degree (number of hinge functions per term) of the terms in the model (default: 1).
  • terms (int) – Maximum number of terms in the forward pass. If set to None (default), min(200, max(20, 2 * n_attributes)) + 1 is used, matching the default of the R earth package.
  • penalty (float) – Penalty for hinges in the GCV computation (used in the pruning pass). Default is 3.0 if degree is above 1, and 2.0 otherwise.
  • thresh (float) – Threshold for RSS decrease in the forward pass (default: 0.001).
  • min_span (int) – TODO.
  • new_var_penalty (float) – Penalty for introducing a new variable in the model during the forward pass (default: 0).
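  • fast_k (int) – Fast MARS k: maximum number of parent terms considered at each step of the forward pass (default: 20).
  • fast_beta (float) – Fast MARS ageing coefficient (default: 1).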
  • pruned_terms (int) – Maximum number of terms in the model after pruning (default: None, no limit).
  • scale_resp (bool) – Scale responses prior to the forward pass (default: True). Ignored for models with multiple responses.
  • store_instances (bool) – Store training instances in the model (default: True).
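
The defaults that apply when terms and penalty are left as None can be written out directly from the parameter descriptions above. The helper names below are hypothetical and are not part of orangecontrib.earth; this is only a sketch of the stated rules.

```python
def default_terms(n_attributes):
    """Default forward-pass term limit when terms=None."""
    return min(200, max(20, 2 * n_attributes)) + 1

def default_penalty(degree):
    """Default GCV penalty when penalty=None."""
    return 3.0 if degree > 1 else 2.0
```

For a data set with 13 attributes, the formula gives min(200, max(20, 26)) + 1 = 27 terms.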
__call__(instances, weight_id=None)

Train an EarthClassifier instance on the instances.

Parameters: instances (Orange.data.Table) – Training instances.

Note

weight_id is ignored.

class orangecontrib.earth.EarthClassifier(domain, best_set, dirs, cuts, betas, subsets=None, rss_per_subset=None, gcv_per_subset=None, instances=None, multitarget=False, expanded_class=None, **kwargs)

Bases: Orange.classification.ClassifierFD

Earth classifier.

__call__(instance, result_type=0)

Predict the response value on instance.

Parameters: instance (Orange.data.Instance) – Input data instance.

base_features()

Return a list of constructed features of Earth terms.

The features can be used in Orange’s domain translation (i.e. they define the proper get_value_from() functions).

Return type: list of Orange.feature.Descriptor
base_matrix(instances=None)

Return the base matrix (bx) of the Earth model for the table.

The base matrix is a len(instances) × num_terms matrix holding the computed value of each model term (not multiplied by beta) for each instance.

If instances is not supplied, the base matrix of the training instances is returned.

Parameters: instances (Orange.data.Table) – Input instances for the base matrix.
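
To make the shape and content of the base matrix concrete, here is a minimal NumPy sketch that evaluates hinge terms for a plain array. It does not use the EarthClassifier internals. The encoding of dirs (+1 for max(0, x - cut), -1 for max(0, cut - x), 0 for an unused variable) is an assumption modelled on the R earth package, and the function name is hypothetical.

```python
import numpy as np

def hinge_base_matrix(X, dirs, cuts):
    """Evaluate each term (a product of hinge functions) for each row of X.

    dirs and cuts are (num_terms, num_variables) arrays. A row of dirs
    that is all zeros is the intercept term, which evaluates to 1.
    """
    X = np.asarray(X, dtype=float)
    dirs = np.asarray(dirs)
    cuts = np.asarray(cuts, dtype=float)
    bx = np.ones((X.shape[0], dirs.shape[0]))
    for t in range(dirs.shape[0]):
        for j in np.flatnonzero(dirs[t]):
            # dirs[t, j] flips the sign of (x - cut) to select the hinge side.
            bx[:, t] *= np.maximum(0.0, dirs[t, j] * (X[:, j] - cuts[t, j]))
    return bx
```

Under these assumptions, model responses would be obtained by multiplying the base matrix by the coefficient vector, for example bx.dot(betas).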
evimp(used_only=True)

Return the estimated variable importances.

Parameters: used_only (bool) – If True, return only the attributes used in the model.
predict(instance)

Predict the response values.

Parameters: instance (Orange.data.Instance) – Data instance.
to_string(precision=3, indent=3)

Return a string representation of the model.

This is also the default string representation of the model (as returned by str()).

used_attributes(term=None)

Return the features used in the term with the given index.

If no term is given, return all features used in the model.

Parameters: term (int) – Term index.

Example:

>>> import Orange, orangecontrib.earth
>>> data = Orange.data.Table("housing")
>>> c = orangecontrib.earth.EarthLearner(data, degree=2, terms=10)
>>> print c
MEDV =
   23.587
   +11.896 * max(0, RM - 6.431)
   +1.142 * max(0, 6.431 - RM)
   -0.612 * max(0, LSTAT - 6.120)
   -228.795 * max(0, NOX - 0.647) * max(0, RM - 6.431)
   +0.023 * max(0, TAX - 307.000) * max(0, 6.120 - LSTAT)
   +0.029 * max(0, 307.000 - TAX) * max(0, 6.120 - LSTAT)
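
The printed model is an ordinary sum of hinge-function products, so it can be transcribed by hand into a plain Python function. The sketch below copies the coefficients and knots from the output above; the function names are hypothetical and the code does not call orangecontrib.earth.

```python
def hinge(x):
    """The hinge function max(0, x)."""
    return max(0.0, x)

def predict_medv(RM, LSTAT, NOX, TAX):
    """Evaluate the printed MEDV model for one set of feature values."""
    return (23.587
            + 11.896 * hinge(RM - 6.431)
            + 1.142 * hinge(6.431 - RM)
            - 0.612 * hinge(LSTAT - 6.120)
            - 228.795 * hinge(NOX - 0.647) * hinge(RM - 6.431)
            + 0.023 * hinge(TAX - 307.0) * hinge(6.120 - LSTAT)
            + 0.029 * hinge(307.0 - TAX) * hinge(6.120 - LSTAT))
```

When every feature sits exactly on its knot, all hinges are zero and the prediction reduces to the intercept, 23.587.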

Feature scoring

class orangecontrib.earth.ScoreEarthImportance(t=10, degree=2, terms=10, score_what='nsubsets', cached=True)

Bases: Orange.feature.scoring.Score

A subclass of Orange.feature.scoring.Score that scores features based on their importance in the Earth model using bagged_evimp.

Parameters:
  • t (int) – Number of earth models to train on the data.
  • degree (int) – The maximum degree of the induced models.
  • terms (int) – Maximum number of terms induced in the forward pass.
  • score_what (str or int) –

    What to return as a score. Can be one of:

    • "nsubsets"
    • "rss"
    • "gcv"

    as a string, or one of the corresponding class constants:

    • NSUBSETS
    • RSS
    • GCV
GCV = 2

Increase in GCV when the feature was removed during the pruning pass (averaged over all t models).

NSUBSETS = 0

The number of subsets that include the feature during the pruning pass.

RSS = 1

Increase in the residual sum of squares (RSS) when the feature was removed during the pruning pass (averaged over all t models).

__call__(attr, data, weight_id=None)

Return the score for attr as evaluated on data.

Parameters:

Note

weight_id is ignored.

See also

Orange.feature.scoring

Utility functions

orangecontrib.earth.gcv(rss, n, n_effective_params)

Return the generalized cross validation (GCV).

gcv = rss / (n * (1 - n_effective_params / n) ^ 2)

Parameters :

rss : array_like

Residual sum of squares.

n : float

Number of training instances.

n_effective_params : array_like

Number of effective parameters.
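
The formula above maps directly onto NumPy. The function below is a stand-alone sketch under a hypothetical name (gcv_sketch), not the library's implementation, and it accepts scalars or arrays for rss and n_effective_params.

```python
import numpy as np

def gcv_sketch(rss, n, n_effective_params):
    """Compute rss / (n * (1 - n_effective_params / n) ** 2) elementwise."""
    rss = np.asarray(rss, dtype=float)
    n_effective_params = np.asarray(n_effective_params, dtype=float)
    return rss / (n * (1.0 - n_effective_params / n) ** 2)
```

For a fixed rss, a larger number of effective parameters shrinks the denominator and so raises the GCV, which is how the pruning pass penalizes larger models.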

orangecontrib.earth.plot_evimp(evimp)

Plot the variable importances as returned from EarthClassifier.evimp().

import Orange, orangecontrib.earth
data = Orange.data.Table("housing")
c = orangecontrib.earth.EarthLearner(data, degree=3)
orangecontrib.earth.plot_evimp(c.evimp())
[Figure: earth-evimp.png]

The left axis shows the nsubsets measure; the right axis shows the normalized RSS and GCV.

orangecontrib.earth.bagged_evimp(classifier, used_only=True)

Extract combined (average) evimp() from an instance of BaggedClassifier using EarthLearner as a base_learner.

Example:

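import Orange
from Orange.ensemble.bagging import BaggedLearner
from orangecontrib.earth import EarthLearner, bagged_evimp
data = Orange.data.Table("housing")
bc = BaggedLearner(EarthLearner(degree=3, terms=10), data)
bagged_evimp(bc)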
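
The idea of combining importances by averaging can be sketched without Orange. The data layout below, one dict per model mapping a feature name to a tuple of three scores, is a hypothetical stand-in: the actual structure returned by evimp() is not specified here. Counting a feature that is missing from a model as zero for that model is also an assumption.

```python
from collections import defaultdict

def average_importances(per_model_scores):
    """Average per-feature score tuples over a list of models.

    per_model_scores is a list of dicts {feature_name: (s0, s1, s2)}.
    A feature absent from a model contributes zeros for that model.
    """
    n_models = len(per_model_scores)
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for scores in per_model_scores:
        for name, values in scores.items():
            for i, value in enumerate(values):
                totals[name][i] += value
    # Divide the accumulated sums by the number of models.
    return {name: tuple(total / n_models for total in sums)
            for name, sums in totals.items()}
```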
