evaluator – Network evaluators

Greedy learning algorithms work by scoring small, local changes to existing networks. To do this efficiently, one must maintain state to avoid redundant computation and unnecessary cache retrievals; that is, only nodes affected by a change should be rescored. Maintaining this state makes efficient learning algorithms harder to implement than naive ones.

The classes in this module provide helpers that encapsulate the state management required for efficient scoring. As long as callers change networks transactionally (using the provided methods), networks are scored efficiently without redundant computation.
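As a minimal, pebl-independent sketch of why only some nodes need rescoring, assume a decomposable score: the network score is the sum of per-node local scores, each depending only on the node and its parent set. Adding or removing an edge (u, v) then changes only v's parent set, so v is the only node whose local score must be recomputed. The helpers `changed_nodes` and `rescore` are hypothetical names, not part of pebl.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Edge = Tuple[int, int]  # (parent, child)


def changed_nodes(add: Iterable[Edge], remove: Iterable[Edge]) -> Set[int]:
    """Return the nodes whose parent sets are affected by an edge change.

    An edge (u, v) only alters the parents of v, so v is the only node
    that needs a new local score.
    """
    return {child for _, child in list(add) + list(remove)}


def rescore(
    parents: Dict[int, Set[int]],
    local_scores: Dict[int, float],
    add: Iterable[Edge],
    remove: Iterable[Edge],
    localscore: Callable[[int, FrozenSet[int]], float],
) -> float:
    """Apply an edge change in place and return the new total score.

    Only the affected children are passed to `localscore`; every other
    stored local score is reused unchanged.
    """
    add, remove = list(add), list(remove)
    for u, v in remove:
        parents[v].discard(u)
    for u, v in add:
        parents[v].add(u)
    for v in changed_nodes(add, remove):
        local_scores[v] = localscore(v, frozenset(parents[v]))
    return sum(local_scores.values())
```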

The main evaluator is the SmartNetworkEvaluator class. Its interface is described below.

Note: Most users shouldn’t need to use this module directly. All included learners encapsulate this functionality; it is really only needed for writing custom learners.

LocalscoreCache

Although most users will never use the localscore cache directly, using pebl with large datasets will require setting the maximum size of the cache to avoid memory issues. There is only one relevant configuration parameter.

localscore_cache.maxsize

Maximum number of localscores to cache. A value of -1 means unlimited size. default=-1
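To illustrate what a maximum size means for such a cache, here is a hypothetical bounded cache keyed by (node, parent set), where maxsize=-1 disables the limit. The least-recently-used eviction policy is an assumption chosen for the sketch; pebl's own eviction strategy is not described here.

```python
from collections import OrderedDict
from typing import Callable, FrozenSet, Hashable, Tuple

Key = Tuple[Hashable, FrozenSet[Hashable]]  # (node, parent set)


class BoundedLocalscoreCache:
    """Hypothetical localscore cache; maxsize=-1 means unlimited.

    Eviction is least-recently-used, an assumption made for illustration.
    """

    def __init__(self, scorer: Callable[[Hashable, FrozenSet[Hashable]], float],
                 maxsize: int = -1) -> None:
        self.scorer = scorer
        self.maxsize = maxsize
        self._cache: "OrderedDict[Key, float]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __call__(self, node: Hashable, parents: FrozenSet[Hashable]) -> float:
        key = (node, parents)
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.misses += 1
        score = self.scorer(node, parents)
        self._cache[key] = score
        if self.maxsize != -1 and len(self._cache) > self.maxsize:
            self._cache.popitem(last=False)  # evict least recently used
        return score

    def __len__(self) -> int:
        return len(self._cache)
```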

SmartNetworkEvaluator

class pebl.evaluator.SmartNetworkEvaluator(data_, network_, prior_=None, localscore_cache=None)

Create a ‘smart’ network evaluator.

This network evaluator eliminates redundant computation by keeping track of changes to the network and rescoring only what has changed. This requires that all changes to the network are made through this evaluator’s methods.

The network can be altered by the following methods:
  • alter_network
  • score_network
  • randomize_network
  • clear_network

The last change applied can be ‘undone’ with restore_network.

alter_network(add=[], remove=[])

Alter the network while retaining the ability to quickly undo the changes.

clear_network()

Clear all edges from the network.

randomize_network()

Randomize the network edges.

restore_network()

Undo the last change to the network (and score).

Undo the last change performed by any of these methods:
  • score_network
  • alter_network
  • randomize_network
  • clear_network
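The transactional alter/restore pattern can be sketched with a hypothetical `TinyEvaluator` class (not pebl's SmartNetworkEvaluator). It assumes a decomposable score, rescores only the children of changed edges, and saves the previous parent sets and local scores of those nodes so the last change can be undone without any rescoring. For brevity it does not check that the altered network stays acyclic, and it keeps only one level of undo, matching the description of restore_network.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Set, Tuple

Edge = Tuple[int, int]  # (parent, child)


class TinyEvaluator:
    """Hypothetical evaluator illustrating alter/restore with one-level undo."""

    def __init__(self, nodes: Iterable[int],
                 localscore: Callable[[int, FrozenSet[int]], float]) -> None:
        self.localscore = localscore
        self.parents: Dict[int, Set[int]] = {n: set() for n in nodes}
        self.scores = {n: localscore(n, frozenset()) for n in self.parents}
        self._undo: Optional[Dict[int, Tuple[Set[int], float]]] = None

    def score(self) -> float:
        return sum(self.scores.values())

    def alter_network(self, add: Iterable[Edge] = (),
                      remove: Iterable[Edge] = ()) -> float:
        add, remove = list(add), list(remove)
        touched = {v for _, v in add + remove}
        # Save just enough state (copies) to undo this change later.
        self._undo = {v: (set(self.parents[v]), self.scores[v]) for v in touched}
        for u, v in remove:
            self.parents[v].discard(u)
        for u, v in add:
            self.parents[v].add(u)
        # Rescore only the nodes whose parent sets changed.
        for v in touched:
            self.scores[v] = self.localscore(v, frozenset(self.parents[v]))
        return self.score()

    def restore_network(self) -> float:
        if self._undo is None:
            raise RuntimeError("no change to undo")
        for v, (pa, s) in self._undo.items():
            self.parents[v], self.scores[v] = pa, s
        self._undo = None
        return self.score()
```

Because the undo record stores only the nodes touched by the last change, both altering and restoring cost time proportional to the size of the change rather than the size of the network.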

score_network(net=None)

Score a network.

If net is provided, score that network. Otherwise, score the network previously set.

Network Evaluators for use with Missing Values

Scoring networks with missing values requires considering the space of possible completions of the missing values, either by sampling from it or, when it is small, by enumerating it. Pebl provides a few algorithms for this.

Configuration Parameters

evaluator.missingdata_evaluator
Evaluator to use for handling missing data. Choices include:
  • gibbs: Gibbs sampling
  • maxentropy_gibbs: Gibbs sampling over all completions of the missing values that result in a maximum entropy discretization for the variables
  • exact: exact enumeration of all possible completions of the missing values (only usable when there are few missing values)

default=gibbs

gibbs.burnin

Burn-in period for the Gibbs sampler, specified as a multiple of the number of missing values. default=10

gibbs.stopping_criteria

Stopping criterion for the Gibbs sampler.

Should be a valid Python expression that evaluates to True when the Gibbs sampler should stop. It can use the following variables:

  • iters: number of iterations
  • n: number of missing values

Examples:

  • iters > n**2 (for n-squared iterations)
  • iters > 100 (for 100 iterations)

default=iters > n**2
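One way to turn such a string into a callable predicate is to compile it once and evaluate it with only iters and n in scope. This is a sketch under that assumption; `make_stopping_rule` is a hypothetical helper, and clearing the builtins restricts the expression for convenience but is not a security sandbox.

```python
from typing import Callable


def make_stopping_rule(expr: str) -> Callable[[int, int], bool]:
    """Compile a gibbs.stopping_criteria-style expression into a predicate.

    The expression may refer to `iters` (iterations so far) and `n`
    (number of missing values).
    """
    code = compile(expr, "<stopping_criteria>", "eval")

    def should_stop(iters: int, n: int) -> bool:
        # Evaluate with no builtins and only the two documented variables.
        return bool(eval(code, {"__builtins__": {}}, {"iters": iters, "n": n}))

    return should_stop
```

For example, with the default expression and n = 5 missing values, make_stopping_rule("iters > n**2") first returns True at iters = 26.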

MissingDataNetworkEvaluator

class pebl.evaluator.MissingDataNetworkEvaluator(data_, network_, prior_=None, localscore_cache=None, **options)

Create a network evaluator for use with missing values.

This evaluator uses a Gibbs sampler to sample over the space of possible completions for the missing values.

For more information about Gibbs sampling, consult:

  1. http://en.wikipedia.org/wiki/Gibbs_sampling
  2. D. Heckerman. A Tutorial on Learning with Bayesian Networks. Microsoft Technical Report MSR-TR-95-06, 1995. p.21-22.

Any configuration parameter in the gibbs section can be passed in via options, using only the option part of the parameter name (for example, burnin rather than gibbs.burnin).
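A toy single-site Gibbs sampler over completions may help make the sampling concrete. This is a sketch, not pebl's sampler. It assumes that each missing entry i takes a value in range(arities[i]), that a completion's unnormalized probability is exp(logscore(completion)) for a hypothetical `logscore` function, that one iteration resamples one entry, that burn-in lasts burnin * n iterations as described for gibbs.burnin, and that iters in the stopping rule counts only post-burn-in iterations.

```python
import math
import random
from typing import Callable, List, Sequence


def gibbs_completions(
    arities: Sequence[int],
    logscore: Callable[[List[int]], float],
    burnin: int = 10,
    should_stop: Callable[[int, int], bool] = lambda iters, n: iters > n ** 2,
    seed: int = 0,
) -> List[float]:
    """Hypothetical single-site Gibbs sampler; returns post-burn-in log scores."""
    rng = random.Random(seed)
    n = len(arities)
    if n == 0:
        return [logscore([])]  # nothing is missing: one trivial completion
    state = [rng.randrange(k) for k in arities]
    samples: List[float] = []
    total = 0
    while True:
        i = total % n  # cycle through the missing entries
        # Conditional distribution of entry i given all the other entries.
        logs = []
        for v in range(arities[i]):
            state[i] = v
            logs.append(logscore(state))
        top = max(logs)
        weights = [math.exp(l - top) for l in logs]
        state[i] = rng.choices(range(arities[i]), weights=weights)[0]
        total += 1
        if total > burnin * n:
            iters = total - burnin * n
            samples.append(logscore(state))
            if should_stop(iters, n):
                return samples
```

The conditional for each entry is computed by rescoring the completion once per candidate value and normalizing with the max-subtraction trick, which keeps exp() from overflowing.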

score_network(net=None, gibbs_state=None)

Score a network.

If net is provided, score that network. Otherwise, score the network previously set.

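By default, sampling stops once more than n**2 iterations have run, where n is the number of missing values (the default gibbs.stopping_criteria is iters > n**2).

gibbs_state is the state of a previous run of the Gibbs sampler. Passing it in resumes that run instead of starting over, for example: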

import pickle
from pebl import evaluator

myeval = evaluator.MissingDataNetworkEvaluator(...)
myeval.score_network(...)
gibbs_state = myeval.gibbs_state
with open('gibbs_state.pkl', 'wb') as f:
    pickle.dump(gibbs_state, f)

# look at results, do other analysis, etc.
# If we decide that we need further Gibbs sampler iterations, we
# don't need to restart.
with open('gibbs_state.pkl', 'rb') as f:
    gibbs_state = pickle.load(f)

# The stopping criterion is a gibbs option (a Python expression in
# iters and n) passed to the constructor, not to score_network.
myeval = evaluator.MissingDataNetworkEvaluator(
    ..., stopping_criteria='iters > 200*n**2'
)

# continue with the previous run of the Gibbs sampler
myeval.score_network(gibbs_state=gibbs_state)

class pebl.evaluator.MissingDataMaximumEntropyNetworkEvaluator(data_, network_, prior_=None, localscore_cache=None, **options)

MissingDataNetworkEvaluator that uses a different space of completions.

This evaluator only samples from missing value completions that result in a maximum entropy discretization for the variables with missing values. This is useful when the other variables are also maximum-entropy discretized, because then all variables have the same entropy.
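For intuition about this restricted space, the sketch below checks whether a completed column is maximum-entropy in the sense of a histogram that is as balanced as possible (every value occurs floor(N/k) or ceil(N/k) times for N samples and arity k), and computes the column's empirical entropy. The helper names are hypothetical, and equating maximum-entropy discretization with balanced value counts is an assumption made for this sketch.

```python
import math
from collections import Counter
from typing import Sequence


def value_entropy(column: Sequence[int]) -> float:
    """Shannon entropy (in nats) of the empirical value distribution."""
    total = len(column)
    return -sum((c / total) * math.log(c / total) for c in Counter(column).values())


def is_max_entropy(column: Sequence[int], arity: int) -> bool:
    """True if each value 0..arity-1 occurs floor(N/k) or ceil(N/k) times.

    For N samples in k bins this is the most balanced histogram possible,
    which is the one with the highest entropy.
    """
    counts = Counter(column)
    per_value = [counts.get(v, 0) for v in range(arity)]
    return max(per_value) - min(per_value) <= 1
```

Swapping the values of two missing entries leaves the value counts unchanged, so a sampler that only proposes swaps would stay inside this space; whether pebl samples this way is not stated here.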

The constructor takes the same arguments and gibbs options as MissingDataNetworkEvaluator.

score_network(net=None, gibbs_state=None)

Score a network.

If net is provided, score that network. Otherwise, score the network previously set.

Note: See MissingDataNetworkEvaluator.score_network for more information about arguments.

class pebl.evaluator.MissingDataExactNetworkEvaluator(data_, network_, prior_=None, localscore_cache=None, **options)

MissingDataNetworkEvaluator that does an exact enumeration.

This network evaluator enumerates all possible completions of the missing values. Since this space grows combinatorially, this class is only feasible for datasets with few missing values.
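The combinatorial growth is easy to see in a brute-force sketch: with missing entries of arities k1, ..., km there are k1 * ... * km completions, each of which must be scored. `n_completions`, `exact_completion_logscores` and `logscore` are hypothetical names, and how pebl combines the resulting scores is not shown.

```python
import itertools
import math
from typing import Callable, List, Sequence


def n_completions(arities: Sequence[int]) -> int:
    """Number of ways to fill missing entries with the given arities."""
    return math.prod(arities)


def exact_completion_logscores(
    arities: Sequence[int], logscore: Callable[[List[int]], float]
) -> List[float]:
    """Score every completion of the missing values by brute force."""
    choices = (range(k) for k in arities)
    return [logscore(list(c)) for c in itertools.product(*choices)]
```

For instance, 20 missing ternary values already give 3**20 (about 3.5 billion) completions.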

The constructor takes the same arguments as MissingDataNetworkEvaluator. Unlike that class, this evaluator does not sample completions with a Gibbs sampler; it enumerates all of them.

score_network(net=None, stopping_criteria=None, gibbs_state=None)

Score a network.

If net is provided, score that network. Otherwise, score the network previously set.

Note: See MissingDataNetworkEvaluator.score_network for more information about arguments.

Factory Functions

pebl.evaluator.fromconfig(data_=None, network_=None, prior_=None)

Create an evaluator based on configuration parameters.

This function will return the correct evaluator based on the relevant configuration parameters.
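A sketch of the kind of dispatch such a factory might perform, using the choices documented for evaluator.missingdata_evaluator. It returns class names rather than instances so it runs without pebl, and both the hypothetical `choose_evaluator` helper and its rule of falling back to SmartNetworkEvaluator when the data has no missing values are assumptions, not documented behavior.

```python
from typing import Dict

# Hypothetical mapping from evaluator.missingdata_evaluator values to the
# class names documented in this module.
MISSINGDATA_EVALUATORS: Dict[str, str] = {
    "gibbs": "MissingDataNetworkEvaluator",
    "maxentropy_gibbs": "MissingDataMaximumEntropyNetworkEvaluator",
    "exact": "MissingDataExactNetworkEvaluator",
}


def choose_evaluator(has_missing: bool, missingdata_evaluator: str = "gibbs") -> str:
    """Return the name of the evaluator class to use (illustrative only)."""
    if not has_missing:
        return "SmartNetworkEvaluator"  # assumed fallback for complete data
    try:
        return MISSINGDATA_EVALUATORS[missingdata_evaluator]
    except KeyError:
        raise ValueError(
            f"unknown evaluator.missingdata_evaluator: {missingdata_evaluator!r}"
        ) from None
```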