This module provides tools for creating and using factorized representations of Bayesian networks. Factorized representations are discrete CPDs whose values have been flattened into a single array, with the cardinalities and strides of each variable tracked separately. With the proper setup, these flattened structures can be multiplied together, reduced, and otherwise operated on more easily. For more information on factors, cf. Koller et al., Ch. 4.
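The flattened representation can be sketched in a few lines of plain Python. This is a toy illustration, not libpgm's actual TableCPDFactor code; the names vals, scope, card, and stride simply mirror the factor attributes described in this module:

```python
# A toy factor P(B | A) over two binary variables, stored as a flat
# array with cardinalities and strides tracked separately.
# Entries (hypothetical): P(B|A=0) = [0.2, 0.8], P(B|A=1) = [0.6, 0.4].
vals = [0.2, 0.6, 0.8, 0.4]    # flattened table entries
scope = ["A", "B"]             # variables in the factor's scope
card = {"A": 2, "B": 2}        # cardinality of each variable
stride = {"A": 1, "B": 2}      # step through vals when that variable increments

def value_at(assignment):
    # Look up the value for a full assignment, e.g. {"A": 1, "B": 0}.
    index = sum(assignment[v] * stride[v] for v in scope)
    return vals[index]

print(value_at({"A": 1, "B": 1}))  # -> 0.4
```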
This class represents a factorized Bayesian network with discrete CPD tables. It contains the attributes bn, originalfactorlist, and factorlist, and the methods refresh, sumproductve, sumproducteliminatevar, condprobve, specificquery, and gibbssample.
This class is constructed with a DiscreteBayesianNetwork instance as its argument. First, it stores the input in the bn attribute. Then, it transforms each node's information from standard discrete CPD form into a TableCPDFactor instance and stores the instances in an array in the originalfactorlist attribute. Finally, it makes a working copy of this list and stores it in factorlist.
The Bayesian network used as argument at instantiation.
A list of TableCPDFactor instances, one per node.
A working copy of originalfactorlist.
Refresh the factorlist attribute to equate with originalfactorlist. This is in effect a reset, erasing any changes the program has made to factorlist.
Multiply all the factors in factorlist that have vertex in their scope, then sum vertex out of the resulting product factor. Replace all the factors that were multiplied together with the resulting summed-out product.
For more information on this algorithm, cf. Koller et al., p. 298.
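The eliminate-one-variable step can be sketched as follows. This is a self-contained illustration using dict-based factor tables, not libpgm's flat-array/stride implementation:

```python
import itertools

# Factors are (scope, table) pairs where table maps assignment tuples
# (ordered by scope) to values.

def multiply(f1, f2, card):
    # Product factor over the union of the two scopes.
    scope1, t1 = f1
    scope2, t2 = f2
    scope = scope1 + [v for v in scope2 if v not in scope1]
    table = {}
    for assn in itertools.product(*(range(card[v]) for v in scope)):
        a = dict(zip(scope, assn))
        table[assn] = (t1[tuple(a[v] for v in scope1)]
                       * t2[tuple(a[v] for v in scope2)])
    return (scope, table)

def sum_out(factor, vertex):
    # Marginalize vertex out of the factor.
    scope, t = factor
    new_scope = [v for v in scope if v != vertex]
    table = {}
    for assn, val in t.items():
        key = tuple(x for v, x in zip(scope, assn) if v != vertex)
        table[key] = table.get(key, 0.0) + val
    return (new_scope, table)

def eliminate_var(factorlist, vertex, card):
    # Multiply every factor whose scope contains vertex, sum vertex out
    # of the product, and replace those factors with the result.
    relevant = [f for f in factorlist if vertex in f[0]]
    rest = [f for f in factorlist if vertex not in f[0]]
    if not relevant:
        return factorlist
    product = relevant[0]
    for f in relevant[1:]:
        product = multiply(product, f, card)
    return rest + [sum_out(product, vertex)]

# Toy network A -> B with hypothetical tables P(A) and P(B | A).
f_a = (["A"], {(0,): 0.3, (1,): 0.7})
f_ba = (["A", "B"], {(0, 0): 0.2, (0, 1): 0.8,
                     (1, 0): 0.6, (1, 1): 0.4})
card = {"A": 2, "B": 2}
print(eliminate_var([f_a, f_ba], "A", card))  # one factor over B: P(B)
```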
Eliminate each vertex in vertices from factorlist using sumproducteliminatevar.
Eliminate all variables in factorlist except for the ones queried. Adjust all distributions for the evidence given. Return the resulting probability distribution over the variables named by the keys of query, conditioned on evidence.
The function returns factorlist after it has been modified as above.
Usage example: this code would return the distribution over a queried node, given evidence:
import json
from libpgm.graphskeleton import GraphSkeleton
from libpgm.nodedata import NodeData
from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork
from libpgm.tablecpdfactorization import TableCPDFactorization
# load nodedata and graphskeleton
nd = NodeData()
skel = GraphSkeleton()
nd.load("../tests/unittestdict.txt")
skel.load("../tests/unittestdict.txt")
# toporder graph skeleton
skel.toporder()
# load evidence
evidence = dict(Letter='weak')
query = dict(Grade='A')
# load bayesian network
bn = DiscreteBayesianNetwork(skel, nd)
# load factorization
fn = TableCPDFactorization(bn)
# calculate probability distribution
result = fn.condprobve(query, evidence)
# output
print(json.dumps(result.vals, indent=2))
print(json.dumps(result.scope, indent=2))
print(json.dumps(result.card, indent=2))
print(json.dumps(result.stride, indent=2))
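What condprobve computes can be checked by brute-force enumeration on a toy two-node network A → B. The tables below are hypothetical and are not drawn from unittestdict.txt:

```python
# Hypothetical binary tables for a toy network A -> B.
p_a = {0: 0.3, 1: 0.7}                     # P(A)
p_b_given_a = {(0, 0): 0.2, (0, 1): 0.8,
               (1, 0): 0.6, (1, 1): 0.4}   # P(B | A), keyed (a, b)

def joint(a, b):
    # Chain rule: P(A, B) = P(A) * P(B | A).
    return p_a[a] * p_b_given_a[(a, b)]

def cond_dist_of_a(evidence_b):
    # Unnormalized P(A, B = evidence_b), then normalize over A.
    unnorm = {a: joint(a, evidence_b) for a in (0, 1)}
    z = sum(unnorm.values())
    return {a: v / z for a, v in unnorm.items()}

print(cond_dist_of_a(0))  # P(A | B=0); here P(A=1 | B=0) ≈ 0.875
```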
Eliminate all variables except those specified by query. Adjust all distributions to reflect evidence. Return the exact probability of the specific event specified by query.
The function then selects the entries of factorlist that match the queried event or events, and operates on them to return the probability that the specified event (or events) will occur, represented as a float between 0 and 1.
Note that in this function, queries of the type P((x=A or x=B) and (y=C or y=D)) are permitted. They are executed by formatting the query dictionary like so:
{
"x": ["A", "B"],
"y": ["C", "D"]
}
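Evaluating such a compound query amounts to summing the matching entries of the normalized distribution. Below is a hedged sketch using a hypothetical helper, not part of the libpgm API:

```python
def compound_prob(dist, query):
    # dist: list of (assignment_dict, probability) pairs from a
    # normalized joint distribution; query: {variable: allowed values}.
    # An assignment matches when every queried variable takes one of
    # its allowed values; the matching probabilities are summed.
    return sum(p for a, p in dist
               if all(a[v] in allowed for v, allowed in query.items()))

# Hypothetical distribution over x and y.
dist = [({"x": "A", "y": "C"}, 0.1),
        ({"x": "A", "y": "D"}, 0.2),
        ({"x": "B", "y": "C"}, 0.3),
        ({"x": "E", "y": "C"}, 0.4)]

# P((x=A or x=B) and (y=C or y=D)) sums the first three entries.
print(compound_prob(dist, {"x": ["A", "B"], "y": ["C", "D"]}))
```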
Usage example: this code would answer the specific query that vertex Grade gets outcome A given that Letter has outcome weak, in this Bayesian network:
import json
from libpgm.graphskeleton import GraphSkeleton
from libpgm.nodedata import NodeData
from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork
from libpgm.tablecpdfactorization import TableCPDFactorization
# load nodedata and graphskeleton
nd = NodeData()
skel = GraphSkeleton()
nd.load("../tests/unittestdict.txt")
skel.load("../tests/unittestdict.txt")
# toporder graph skeleton
skel.toporder()
# load evidence
evidence = dict(Letter='weak')
query = dict(Grade='A')
# load bayesian network
bn = DiscreteBayesianNetwork(skel, nd)
# load factorization
fn = TableCPDFactorization(bn)
# calculate probability distribution
result = fn.specificquery(query, evidence)
# output
print(result)
Return a sequence of n samples generated by the Gibbs sampling method, given the evidence specified by evidence. Gibbs sampling is a technique in which, for each sample, each variable in turn is erased and re-sampled conditioned on the outcomes of its neighbors. The method starts by sampling from the prior distribution (the distribution not conditioned on evidence), but the samples provably converge toward the posterior distribution (the distribution conditioned on the evidence). It is thus a good way to incorporate evidence when generating random samples.
Returns:
A list of n random samples, each element of which is a dict containing (vertex: value) pairs.
For more information, cf. Koller et al. Ch. 12.3.1
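The sampling loop can be sketched on a toy two-variable network A → B. The tables and the resampling helper are hypothetical illustrations, not libpgm's internals:

```python
import random

# Hypothetical binary tables for a toy network A -> B.
p_a = {0: 0.3, 1: 0.7}                     # P(A)
p_b_given_a = {(0, 0): 0.2, (0, 1): 0.8,
               (1, 0): 0.6, (1, 1): 0.4}   # P(B | A), keyed (a, b)

def joint(state):
    return p_a[state["A"]] * p_b_given_a[(state["A"], state["B"])]

def resample(state, var):
    # Draw var from its conditional given all other variables, which is
    # proportional to the joint with the rest of the state held fixed.
    weights = []
    for value in (0, 1):
        trial = dict(state)
        trial[var] = value
        weights.append(joint(trial))
    r = random.random() * sum(weights)
    state[var] = 0 if r < weights[0] else 1

def gibbs_sample(evidence, n, burn_in=100):
    # Evidence variables stay clamped; free variables are resampled in
    # turn. Early (burn-in) samples are discarded.
    state = {"A": 0, "B": 0}
    state.update(evidence)
    free = [v for v in state if v not in evidence]
    samples = []
    for i in range(burn_in + n):
        for var in free:
            resample(state, var)
        if i >= burn_in:
            samples.append(dict(state))
    return samples

samples = gibbs_sample({"B": 0}, 1000)
frac = sum(s["A"] for s in samples) / len(samples)
print(frac)  # should be near P(A=1 | B=0) = 0.875
```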
Usage example: This code would generate a sequence of 10 samples:
import json
from libpgm.graphskeleton import GraphSkeleton
from libpgm.nodedata import NodeData
from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork
from libpgm.tablecpdfactorization import TableCPDFactorization
# load nodedata and graphskeleton
nd = NodeData()
skel = GraphSkeleton()
nd.load("../tests/unittestdict.txt")
skel.load("../tests/unittestdict.txt")
# toporder graph skeleton
skel.toporder()
# load evidence
evidence = dict(Letter='weak')
# load bayesian network
bn = DiscreteBayesianNetwork(skel, nd)
# load factorization
fn = TableCPDFactorization(bn)
# sample
result = fn.gibbssample(evidence, 10)
# output
print(json.dumps(result, indent=2))