This module provides tools to represent and handle Bayesian networks with linear Gaussian conditional probability distributions.
In a linear Gaussian distribution, a node has a continuous range of outcomes that follow a normal distribution, which is parameterized by a mean and a variance. The variance of the node is fixed, while the mean is a linear function of the outcomes of the node’s parents. In math terms, the mean \(m(u)\) of a node \(u\) is a base value \(\beta_0\) plus the values \(x_1,\dots,x_n\) of its parents, each weighted by some coefficient \(\beta_i\):
\[ m(u) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i \]
Here \(\beta_0\) corresponds to the mean_base entry and \(\beta_1,\dots,\beta_n\) to the mean_scal entries of the node data format described below.
Linear Gaussians are simple but widely used in statistical modeling.
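The mean calculation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the module's own code: the helper names linear_gaussian_mean and sample_linear_gaussian are hypothetical, and the numbers are invented for the example.

```python
import math
import random


def linear_gaussian_mean(mean_base, mean_scal, parent_values):
    """Return mean_base + sum_i mean_scal[i] * parent_values[i]."""
    if len(mean_scal) != len(parent_values):
        raise ValueError("need exactly one coefficient per parent value")
    return mean_base + sum(b * x for b, x in zip(mean_scal, parent_values))


def sample_linear_gaussian(mean_base, mean_scal, variance, parent_values,
                           rng=random):
    """Draw one outcome for a node, given the outcomes of its parents."""
    mean = linear_gaussian_mean(mean_base, mean_scal, parent_values)
    # random.gauss expects a standard deviation, not a variance.
    return rng.gauss(mean, math.sqrt(variance))


# Invented numbers: base mean 50, two parents with coefficients 0.5 and -2.
mean = linear_gaussian_mean(50.0, [0.5, -2.0], [10.0, 3.0])
print(mean)  # 50 + 0.5*10 - 2*3 = 49.0

# One random draw around that mean with variance 4 (standard deviation 2).
draw = sample_linear_gaussian(50.0, [0.5, -2.0], 4.0, [10.0, 3.0])
```

Note that the variance is converted to a standard deviation before sampling, since that is what random.gauss takes as its second argument.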
This class represents a Bayesian network with linear Gaussian CPDs. It contains the attributes V, E, and Vdata, as well as the method randomsample.
This class can be called either with or without arguments. If it is called without arguments, none of its attributes are instantiated and it is left to the user to instantiate them manually. If it is called with arguments, the attributes will be loaded directly from the inputs. The arguments must be (in order):
- orderedskeleton – An instance of the OrderedSkeleton class, or of the GraphSkeleton class as long as it has been topologically ordered.
- nodedata – An instance of the NodeData class.
If these arguments are present, all attributes of the class (V, E, and Vdata) will be automatically copied from the graph skeleton and node data inputs.
This class requires that the Vdata attribute be loaded with a dictionary of node data in the following format:
"vertex": {
"parents": ["<name of parent 1>", ... , "<name of parent n>"],
"children": ["<name of child 1>", ... , "<name of child n>"],
"mean_base": <the base mean of the Gaussian distribution>,
"mean_scal": [<scalar for parent 1 outcome>, ... , <scalar for parent n outcome>],
"variance": <variance of the Gaussian distribution>
}
Note that additional keys are possible in the dict of each vertex.
Upon loading, the class will also check that the keys of Vdata correspond to the vertices in V.
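To make the format concrete, here is a small hand-written Vdata dictionary together with a consistency check in the spirit of the one described above. It is a sketch under stated assumptions: the vertex names and numbers are invented, and check_vdata is a hypothetical helper, not the check the class actually runs.

```python
REQUIRED_KEYS = ("parents", "children", "mean_base", "mean_scal", "variance")


def check_vdata(V, Vdata):
    """Hypothetical validator: raise ValueError if Vdata does not match V."""
    if set(V) != set(Vdata):
        raise ValueError("keys of Vdata do not correspond to the vertices in V")
    for name in V:
        entry = Vdata[name]
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError("%s is missing keys: %s" % (name, missing))
        # Treat None the same as an empty list for nodes without parents.
        parents = entry["parents"] or []
        scalars = entry["mean_scal"] or []
        if len(parents) != len(scalars):
            raise ValueError("%s needs one mean_scal entry per parent" % name)
        if entry["variance"] < 0:
            raise ValueError("%s has a negative variance" % name)
    return True


# Invented two-node network: Temperature -> Pressure.
V = ["Temperature", "Pressure"]
Vdata = {
    "Temperature": {
        "parents": [],
        "children": ["Pressure"],
        "mean_base": 20.0,
        "mean_scal": [],
        "variance": 4.0,
    },
    "Pressure": {
        "parents": ["Temperature"],
        "children": [],
        "mean_base": 100.0,
        "mean_scal": [0.3],
        "variance": 1.0,
    },
}

check_vdata(V, Vdata)  # passes silently for this data
```

The extra checks on mean_scal length and variance sign go beyond what the text above promises; they are included only because they catch the most likely hand-editing mistakes.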
- V – A list of the names of the vertices.
- E – A list of [origin, destination] pairs of vertices that make edges.
- Vdata – A dictionary containing CPD data for the nodes.
Produce n random samples from the Bayesian Network and return them in a list.
See above for how the means of linear Gaussians are calculated during sampling.
This function takes the following arguments:
- n – The number of random samples to produce.
- evidence – (Optional) A dict containing (vertex: value) pairs that describe the evidence. Use it carefully: it manually overrides the values of the nodes that have evidence rather than conditioning the joint probability distribution of the entire graph.
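The override behaviour of the evidence argument can be illustrated with a small stand-alone forward sampler. This is a minimal sketch, assuming the nodes are visited in topological order and that evidence simply replaces the sampled value; forward_sample and the two-node network are hypothetical and are not taken from the library's implementation.

```python
import math
import random


def forward_sample(order, vdata, evidence=None, rng=random):
    """Sample every node once, visiting nodes in topological order."""
    evidence = evidence or {}
    sample = {}
    for name in order:
        if name in evidence:
            # Evidence overrides the node; nothing upstream is re-weighted.
            sample[name] = evidence[name]
            continue
        node = vdata[name]
        parents = node["parents"] or []
        scalars = node["mean_scal"] or []
        mean = node["mean_base"] + sum(
            coeff * sample[parent] for coeff, parent in zip(scalars, parents)
        )
        sample[name] = rng.gauss(mean, math.sqrt(node["variance"]))
    return sample


# Invented network A -> B, where the mean of B is 2 + 3 * A.
vdata = {
    "A": {"parents": [], "mean_base": 1.0, "mean_scal": [], "variance": 1.0},
    "B": {"parents": ["A"], "mean_base": 2.0, "mean_scal": [3.0],
          "variance": 0.25},
}
rng = random.Random(0)

# Evidence on the parent shifts the child's mean to 2 + 3 * 10 = 32.
downstream = [forward_sample(["A", "B"], vdata, {"A": 10.0}, rng)
              for _ in range(3)]

# Evidence on the child leaves the parent untouched: A is still drawn from
# its own distribution, not from the posterior of A given B.
upstream = forward_sample(["A", "B"], vdata, {"B": 100.0}, rng)
```

In this sketch, evidence on a child never changes how its parents are sampled, which is why the results are not draws from the true conditional distribution given the evidence.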
Usage example: this would generate a sequence of 10 random samples:
import json
from libpgm.nodedata import NodeData
from libpgm.graphskeleton import GraphSkeleton
from libpgm.lgbayesiannetwork import LGBayesianNetwork
# load nodedata and graphskeleton
nd = NodeData()
skel = GraphSkeleton()
nd.load("../tests/unittestlgdict.txt") # an input file
skel.load("../tests/unittestdict.txt")
# topologically order graphskeleton
skel.toporder()
# load bayesian network
lgbn = LGBayesianNetwork(skel, nd)
# sample
result = lgbn.randomsample(10)
# output
print(json.dumps(result, indent=2))