tablecpdfactorization

This module provides tools for creating and using factorized representations of Bayesian networks. A factorized representation consists of discrete CPDs whose values have been flattened into a single array, while the cardinality and stride of each variable in the table are tracked separately. With the proper setup, these flattened structures can be easily multiplied together, reduced, and otherwise operated on. For more information on factors, cf. Koller et al., Ch. 4.
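
As a concrete illustration of this flat layout (a minimal sketch with hypothetical variable names, not part of the library's API): each variable is assigned a stride, and the flat index of a joint assignment is the sum, over variables, of the value index times the variable's stride.

# hypothetical two-variable table: Grade (3 outcomes), Letter (2 outcomes)
card = {"Grade": 3, "Letter": 2}
stride = {"Grade": 1, "Letter": 3}   # Grade varies fastest in the flat array

def flat_index(assignment, stride):
    # flat index = sum of (value index * stride) over the assignment
    return sum(stride[var] * val for var, val in assignment.items())

vals = [0.0] * (card["Grade"] * card["Letter"])   # 6 flattened entries
# the entry for (Grade=2, Letter=1) lives at index 2*1 + 1*3 = 5
assert flat_index({"Grade": 2, "Letter": 1}, stride) == 5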

class libpgm.tablecpdfactorization.TableCPDFactorization(bn)[source]

This class represents a factorized Bayesian network with discrete CPD tables. It contains the attributes bn, originalfactorlist, and factorlist, and the methods refresh, sumproductve, sumproducteliminatevar, condprobve, specificquery, and gibbssample.

This class is constructed with a DiscreteBayesianNetwork instance as its argument. First, it stores the input network in the bn attribute. Then, it transforms each node's standard discrete CPD into a TableCPDFactor instance and stores these instances in a list in the originalfactorlist attribute. Finally, it makes a copy of this list to work with and stores it in factorlist.
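
For example (a sketch, assuming bn is a DiscreteBayesianNetwork whose vertex list is stored in bn.V):

# after construction there is one TableCPDFactor per node
fn = TableCPDFactorization(bn)
assert len(fn.originalfactorlist) == len(bn.V)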

bn = None

The Bayesian network used as argument at instantiation.

originalfactorlist = None

A list of TableCPDFactor instances, one per node.

factorlist = None

A working copy of originalfactorlist.

refresh()[source]

Reset the factorlist attribute to equal originalfactorlist. This is in effect a reset of the system, erasing any changes to factorlist that prior operations have made.
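
Because the query methods below modify factorlist in place, a refresh (or a fresh instance) is needed between successive queries on the same object. A minimal sketch, assuming fn is an existing TableCPDFactorization instance and using the vertex names from the usage examples below ('SAT' and its outcome 'highscore' are assumed names in the example network):

# condprobve consumes factorlist, so reset it before the next query
result1 = fn.condprobve(dict(Grade='A'), dict(Letter='weak'))
fn.refresh()   # restore factorlist from originalfactorlist
result2 = fn.condprobve(dict(SAT='highscore'), dict(Letter='weak'))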

sumproducteliminatevar(vertex)[source]

Multiply all the factors in factorlist that have vertex in their scope, then sum vertex out of the resulting product factor. Replace the factors that were multiplied together with the summed-out product, as sketched after the argument list below.

Arguments:
  1. vertex – The name of the variable to eliminate.
Attributes modified:
  1. factorlist – Modified to reflect the eliminated variable.

For more information on this algorithm, cf. Koller et al., p. 298.
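
The following is a minimal sketch of the step, assuming the copy, multiplyfactor, and sumout methods of TableCPDFactor (see the next topic); it illustrates the logic rather than reproducing the exact implementation:

def eliminate_sketch(factorlist, vertex):
    # split the factors by whether vertex appears in their scope
    relevant = [f for f in factorlist if vertex in f.scope]
    rest = [f for f in factorlist if vertex not in f.scope]
    if not relevant:
        return factorlist
    # multiply all relevant factors into one product factor
    product = relevant[0].copy()
    for f in relevant[1:]:
        product.multiplyfactor(f)
    # marginalize the vertex out of the product
    product.sumout(vertex)
    return rest + [product]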

sumproductve(vertices)[source]

Eliminate each vertex in vertices from factorlist, in order, using sumproducteliminatevar; a sketch follows the argument list below.

Arguments:
  1. vertices – A list of the names of the vertices to be eliminated.
Attributes modified:
  1. factorlist – Modified to become a single factor representing the remaining variables.
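
In effect (a sketch continuing the eliminate_sketch helper from sumproducteliminatevar above; factorlist stands for the working list of factors):

for vertex in vertices:
    factorlist = eliminate_sketch(factorlist, vertex)

# multiply whatever remains into a single factor
final = factorlist[0].copy()
for f in factorlist[1:]:
    final.multiplyfactor(f)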

condprobve(query, evidence)[source]

Eliminate all variables in factorlist except the ones queried, and adjust all distributions for the given evidence. Return the probability distribution over the variables named by the keys of query, given evidence.

Arguments:
  1. query – A dict containing (key: value) pairs reflecting (variable: value) that represents what outcome to calculate the probability of.
  2. evidence – A dict containing (key: value) pairs reflecting (variable: value) that represents what is known about the system.
Attributes modified:
  1. factorlist – Modified to be one factor representing the probability distribution of the query variables given the evidence.

The function returns factorlist after it has been modified as above.

Usage example: this code would return the distribution over a queried node, given evidence:

import json

from libpgm.graphskeleton import GraphSkeleton
from libpgm.nodedata import NodeData
from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork
from libpgm.tablecpdfactorization import TableCPDFactorization

# load nodedata and graphskeleton
nd = NodeData()
skel = GraphSkeleton()
nd.load("../tests/unittestdict.txt")
skel.load("../tests/unittestdict.txt")

# toporder graph skeleton
skel.toporder()

# specify query and evidence
evidence = dict(Letter='weak')
query = dict(Grade='A')

# load bayesian network
bn = DiscreteBayesianNetwork(skel, nd)

# load factorization
fn = TableCPDFactorization(bn)

# calculate probability distribution
result = fn.condprobve(query, evidence)

# output
print(json.dumps(result.vals, indent=2))
print(json.dumps(result.scope, indent=2))
print(json.dumps(result.card, indent=2))
print(json.dumps(result.stride, indent=2))

specificquery(query, evidence)[source]

Eliminate all variables except the ones specified by query, and adjust all distributions to reflect evidence. Return the probability of the specific event (or events) specified by query.

Arguments:
  1. query – A dict containing (key: value) pairs reflecting (variable: value) that represents what outcome to calculate the probability of.
  2. evidence – A dict containing (key: value) pairs reflecting (variable: value) evidence that is known about the system.
Attributes modified:
  1. factorlist – Modified as in condprobve.

The function then selects the entries of factorlist that match the queried event or events, and combines them to return the probability that the specified event (or events) will occur, represented as a float between 0 and 1.

Note that in this function, queries of the type P((x=A or x=B) and (y=C or y=D)) are permitted. They are executed by formatting the query dictionary like so:

{
    "x": ["A", "B"],
    "y": ["C", "D"]
}
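
For instance, reusing the setup from the usage example below, and assuming 'B' is another possible outcome of the Grade vertex, the probability that Grade is A or B given a weak Letter could be computed as:

query = dict(Grade=['A', 'B'])
evidence = dict(Letter='weak')

# returns P(Grade='A' or Grade='B' | Letter='weak') as a float
result = fn.specificquery(query, evidence)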

Usage example: this code would answer the specific query that vertex Grade gets outcome A given that Letter has outcome weak, in this Bayesian network:

from libpgm.graphskeleton import GraphSkeleton
from libpgm.nodedata import NodeData
from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork
from libpgm.tablecpdfactorization import TableCPDFactorization

# load nodedata and graphskeleton
nd = NodeData()
skel = GraphSkeleton()
nd.load("../tests/unittestdict.txt")
skel.load("../tests/unittestdict.txt")

# toporder graph skeleton
skel.toporder()

# specify query and evidence
evidence = dict(Letter='weak')
query = dict(Grade='A')

# load bayesian network
bn = DiscreteBayesianNetwork(skel, nd)

# load factorization
fn = TableCPDFactorization(bn)

# calculate the probability of the event
result = fn.specificquery(query, evidence)

# output
print(result)

gibbssample(evidence, n)[source]

Return a sequence of n samples generated with the Gibbs sampling method, given the evidence specified by evidence. Gibbs sampling is a technique wherein, for each sample, each variable in turn is resampled conditioned on the current outcomes of its neighbors. The method starts by sampling from the prior distribution, i.e., the distribution not conditioned on evidence, but successive samples provably approach the posterior distribution, the distribution conditioned on the evidence. It is thus a good way to account for evidence when generating random samples.
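
Schematically, the loop looks like the following sketch (the resample helper is hypothetical and stands in for drawing a variable's value conditioned on the current values of the others; this is not the library's internal code):

# evidence variables stay fixed; every other variable is resampled
# in turn from its conditional distribution given the current sample
def gibbs_sketch(initial_sample, evidence, n, resample):
    sample = dict(initial_sample)
    sample.update(evidence)
    sequence = []
    for _ in range(n):
        for var in sample:
            if var not in evidence:
                sample[var] = resample(var, sample)
        sequence.append(dict(sample))
    return sequence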

Arguments:
  1. evidence – A dict containing (key: value) pairs reflecting (variable: value) that represents what is known about the system.
  2. n – The number of samples to return.

Returns:

A list of n random samples, each element of which is a dict containing (vertex: value) pairs.

For more information, cf. Koller et al. Ch. 12.3.1

Usage example: This code would generate a sequence of 10 samples:

import json

from libpgm.graphskeleton import GraphSkeleton
from libpgm.nodedata import NodeData
from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork
from libpgm.tablecpdfactorization import TableCPDFactorization

# load nodedata and graphskeleton
nd = NodeData()
skel = GraphSkeleton()
nd.load("../tests/unittestdict.txt")
skel.load("../tests/unittestdict.txt")

# toporder graph skeleton
skel.toporder()

# load evidence
evidence = dict(Letter='weak')

# load bayesian network
bn = DiscreteBayesianNetwork(skel, nd)

# load factorization
fn = TableCPDFactorization(bn)

# sample 
result = fn.gibbssample(evidence, 10)

# output
print(json.dumps(result, indent=2))
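
The returned samples can then be used to estimate posterior probabilities empirically; for instance, the fraction of samples in which Grade equals 'A' approximates P(Grade='A' | Letter='weak') as the number of samples grows:

# estimate P(Grade='A' | Letter='weak') from the samples
freq = sum(1 for s in result if s["Grade"] == 'A') / float(len(result))
print(freq)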
