discretizer – Discretization algorithms

Currently, Pebl includes only one discretization implementation, but more may be added. Discretization and other data pre-processing steps can have a large impact on the final results.

pebl.discretizer.maximum_entropy_discretize(indata, includevars=None, excludevars=[], numbins=3)

Performs a maximum-entropy discretization of data in-place.

Requirements for this implementation:

  1. Try to make all bins equal sized (maximize the entropy)
  2. If x == y in the original dataset, then disc(x) == disc(y). For example, all datapoints with value 3.245 discretize to 1, even if this violates requirement 1.
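A minimal pure-Python sketch of how these two requirements can be met for a single variable, assuming a plain list of values rather than a Pebl Dataset. The function name max_entropy_bins and its greedy strategy are illustrative assumptions, not Pebl's actual implementation: distinct values are visited in sorted order, each run of identical values goes into one bin as a whole (requirement 2), and a new bin is started wherever that keeps the running count closer to the equal-size target (b + 1) * n / numbins (requirement 1).

```python
from collections import Counter


def max_entropy_bins(values, numbins=3):
    """Hypothetical helper (not part of Pebl): map each value to a bin in [0, numbins).

    Greedy sketch: bins are filled in sorted order toward equal-size
    targets, and identical values always land in the same bin.
    """
    n = len(values)
    counts = Counter(values)      # occurrences of each distinct value
    mapping = {}                  # distinct value -> bin index
    last_bin = numbins - 1
    current_bin = 0
    filled = 0                    # datapoints assigned so far (all bins)
    in_current = 0                # datapoints in the current bin

    for value in sorted(counts):
        c = counts[value]
        target = (current_bin + 1) * n / numbins  # ideal cumulative count at this bin's end
        # If adding this whole run of ties would overshoot the target by more
        # than cutting *before* it would undershoot, start a new bin first.
        if (current_bin < last_bin and in_current > 0 and filled + c > target
                and target - filled < filled + c - target):
            current_bin += 1
            in_current = 0
        mapping[value] = current_bin
        filled += c
        in_current += c
        # Close the bin once its cumulative target has been reached.
        if current_bin < last_bin and filled >= (current_bin + 1) * n / numbins:
            current_bin += 1
            in_current = 0

    return [mapping[v] for v in values]


print(max_entropy_bins([3, 7, 4, 4, 4, 5], numbins=2))  # [0, 1, 0, 0, 0, 1]
```

With numbins=3 the same input yields [0, 2, 1, 1, 1, 2]: the three 4s still share a bin, so the bins hold 1, 3 and 2 values instead of 2 each.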

Example (the output uses two bins):

input:  [3, 7, 4, 4, 4, 5]
output: [0, 1, 0, 0, 0, 1]

Note that all 4s discretize to 0, so bin 0 holds four values and bin 1 only two.
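Keeping the ties together costs some entropy. A small illustrative computation, assuming entropy is measured in bits over the bin labels (the helper bin_entropy is hypothetical, not part of Pebl): the 4/2 split from the example is compared with a 3/3 split that could only be reached by separating the 4s.

```python
import math
from collections import Counter


def bin_entropy(bins):
    """Shannon entropy, in bits, of the distribution of bin labels."""
    n = len(bins)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


tied = bin_entropy([0, 1, 0, 0, 0, 1])   # actual output: bin sizes 4 and 2
equal = bin_entropy([0, 1, 0, 0, 1, 1])  # hypothetical 3/3 split that separates the 4s
print(round(tied, 4), equal)  # 0.9183 1.0
```

The equal split reaches the 1-bit maximum for two bins, but requirement 2 takes precedence, so the discretizer accepts about 0.918 bits.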
