mdpproblog package

Submodules

mdpproblog.engine module

class mdpproblog.engine.Engine(program)[source]

Bases: object

Adapter class to Problog grounding and query engine.

Parameters:program (str) – a valid MDP-ProbLog program
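
A minimal construction sketch; the model file name sysadmin.pl is hypothetical, any valid MDP-ProbLog program string works:

>>> from mdpproblog.engine import Engine
>>> program = open('sysadmin.pl').read()  # hypothetical MDP-ProbLog model file
>>> engine = Engine(program)
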
add_annotated_disjunction(facts, probabilities)[source]

Add a new annotated disjunction to the program database from a list of facts and their probabilities. Return a list of choice nodes.

Parameters:
  • facts (list of problog.logic.Term) – list of probabilistic facts
  • probabilities (list of float in [0.0, 1.0]) – list of valid individual probabilities such that the total probability is less than or equal to 1.0
Return type:

list of int
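
A short usage sketch, assuming the engine instance from above and the hypothetical weather facts below:

>>> from problog.logic import Term
>>> facts = [Term('weather', Term('sun')), Term('weather', Term('rain'))]  # assumed facts
>>> nodes = engine.add_annotated_disjunction(facts, [0.7, 0.3])            # list of choice nodes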

add_assignment(term, value)[source]

Add a new utility assignment of value to term in the program database. Return the corresponding node number.

Parameters:
  • term (problog.logic.Term) – a predicate
  • value (float) – a numeric value
Return type:

int

add_fact(term, probability=None)[source]

Add a new term with a given probability to the program database. Return the corresponding node number.

Parameters:
  • term (problog.logic.Term) – a predicate
  • probability (float) – a number in [0,1]
Return type:

int
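
A brief sketch, assuming the engine instance from above and the hypothetical predicate running(c1):

>>> from problog.logic import Term
>>> node = engine.add_fact(Term('running', Term('c1')), probability=0.9)  # returns the node number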

add_rule(head, body)[source]

Add a new rule defined by the given head and body to the program database. Return the corresponding node number.

Parameters:
  • head (problog.logic.Term) – a predicate
  • body (list of problog.logic.Term or problog.logic.Not) – a list of literals
Return type:

int
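
A brief sketch, assuming the engine instance from above; the head and body predicates are illustrative:

>>> from problog.logic import Term
>>> head = Term('all_running')
>>> body = [Term('running', Term('c1')), Term('running', Term('c2'))]
>>> node = engine.add_rule(head, body)  # returns the node number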

assignments(assignment_type)[source]

Return a dictionary of assignments of type assignment_type.

Parameters:assignment_type (str) – assignment type.
Return type:dict of (problog.logic.Term, problog.logic.Constant) items.
compile(terms=[])[source]

Create compiled knowledge database from ground program. Return mapping of terms to nodes in the compiled knowledge database.

Parameters:terms (list of problog.logic.Term) – list of predicates
Return type:dict of (problog.logic.Term, int)
declarations(declaration_type)[source]

Return a list of all terms of type declaration_type.

Parameters:declaration_type (str) – declaration type.
Return type:list of problog.logic.Term
evaluate(queries, evidence)[source]

Compute probabilities of queries given evidence.

Parameters:
  • queries (dict of (problog.logic.Term, int)) – mapping of predicates to nodes
  • evidence (dict of (problog.logic.Term, {0, 1})) – mapping from predicate to evidence weight
Return type:

list of (problog.logic.Term, [0.0, 1.0])

get_annotated_disjunction(nodes)[source]

Return the list of choice nodes in the table of instructions corresponding to nodes.

Parameters:nodes (list of int) – list of node identifiers
Return type:list of problog.engine.choice
get_assignment(node)[source]

Return the assignment in the table of instructions corresponding to node.

Parameters:node (int) – identifier of assignment in table of instructions
Return type:pair of (problog.logic.Term, problog.logic.Constant)
get_fact(node)[source]

Return the fact in the table of instructions corresponding to node.

Parameters:node (int) – identifier of fact in table of instructions
Return type:problog.engine.fact
get_instructions_table()[source]

Return the table of instructions separated by instruction type as described in problog.engine.ClauseDB.

Return type:dict of (str, list of (node, namedtuple))
get_rule(node)[source]

Return the rule in the table of instructions corresponding to node.

Parameters:node (int) – identifier of rule in table of instructions
Return type:problog.engine.clause
relevant_ground(queries)[source]

Create ground program with respect to queries.

Parameters:queries (list of problog.logic.Term) – list of predicates
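
A plausible end-to-end query sketch pieced together from the method descriptions above; the fluent running(c1,1), the evidence reboot(c1), and the call order are assumptions, not a verified recipe:

>>> from problog.logic import Term, Constant
>>> query = Term('running', Term('c1'), Constant(1))   # assumed next-state fluent
>>> engine.relevant_ground([query])                     # ground the program w.r.t. the query
>>> queries = engine.compile([query])                   # dict of (Term, node)
>>> evidence = {Term('reboot', Term('c1')): 1}          # assumed evidence weights
>>> results = engine.evaluate(queries, evidence)        # list of (Term, probability)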

mdpproblog.fluent module

class mdpproblog.fluent.ActionSpace(actions)[source]

Bases: object

Iterator class for looping over vector representations of actions in a factored MDP. Each action is implemented as an OrderedDict of (problog.logic.Term, 0/1) items.

Parameters:actions (list of problog.logic.Term) – predicates listing possible actions
classmethod index(action)[source]

Return action index in the action space.

Parameters:action (OrderedDict) – action representation
Return type:int
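
A brief iteration sketch, assuming the hypothetical action predicates below; each iterated action is an OrderedDict of (Term, 0/1):

>>> from problog.logic import Term
>>> from mdpproblog.fluent import ActionSpace
>>> actions = [Term('reboot', Term('c1')), Term('reboot', Term('c2'))]  # assumed actions
>>> for action in ActionSpace(actions):
...     print(ActionSpace.index(action), dict(action))
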
class mdpproblog.fluent.Fluent[source]

Bases: object

Factory class for building fluent terms. A fluent term is a problog.logic.Term with a problog.logic.Constant as last argument representing its timestep.

classmethod create_fluent(term, timestep)[source]

Return a new fluent made from term with the given timestep.

Parameters:
  • term (problog.logic.Term) – any problog term
  • timestep (int) – timestep numeric value
Return type:

problog.logic.Term
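
A minimal sketch, assuming the predicate running(c1); per the definition above, the timestep is appended as a trailing constant argument:

>>> from problog.logic import Term
>>> from mdpproblog.fluent import Fluent
>>> fluent = Fluent.create_fluent(Term('running', Term('c1')), 0)  # expected to denote running(c1,0)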

class mdpproblog.fluent.StateSpace(state_fluents)[source]

Bases: object

Iterator class for looping over vector representations of states in a factored MDP defined by state_fluents. Each state is implemented as an OrderedDict of (problog.logic.Term, 0/1) items.

Parameters:state_fluents (list of problog.logic.Term) – predicates defining a state in a given timestep
classmethod index(state)[source]

Return the state index in the state space.

Parameters:state (OrderedDict) – state representation
Return type:int
classmethod state(valuation)[source]

Return the state representation of a valuation of fluents.

Parameters:valuation (list of pairs (Fluent, bool)) – mapping from fluent to boolean value
Return type:OrderedDict
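
A brief iteration sketch, assuming the two hypothetical state fluents below; each iterated state is an OrderedDict of (Term, 0/1):

>>> from problog.logic import Term, Constant
>>> from mdpproblog.fluent import StateSpace
>>> fluents = [Term('running', Term('c1'), Constant(0)),
...            Term('running', Term('c2'), Constant(0))]  # assumed current-state fluents
>>> for state in StateSpace(fluents):
...     print(StateSpace.index(state), dict(state))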

mdpproblog.mdp module

class mdpproblog.mdp.MDP(model)[source]

Bases: object

Representation of an MDP and its components. Implemented as a bridge class to the ProbLog programs specifying the MDP domain and problems.

Parameters:model (str) – a valid MDP-ProbLog program
actions()[source]

Return an ordered list of action objects.

Return type:list of action objects sorted by string representation
current_state_fluents()[source]

Return the ordered list of current state fluent objects.

Return type:list of current state fluent objects sorted by string representation
next_state_fluents()[source]

Return the ordered list of next state fluent objects.

Return type:list of next state fluent objects sorted by string representation
reward(state, action, cache=None)[source]

Return the immediate reward value of the transition induced by applying action to the given state. Optionally cache results if the cache parameter is given.

Parameters:
  • state (list of 0/1 according to state fluents order) – state vector representation of current state fluents
  • action (one-hot vector encoding of action as a list of 0/1) – action vector representation
  • cache (immutable, hashable object) – key to cache results
Return type:

float

reward_model()[source]

Return the reward model of all valid transitions.

Return type:dict of ((state,action), float)
state_fluents()[source]

Return an ordered list of state fluent objects.

Return type:list of state fluent objects sorted by string representation
transition(state, action, cache=None)[source]

Return the probabilities of the next state fluents given the current state and action. Optionally cache results if the cache parameter is given.

Parameters:
  • state (list of 0/1 according to state fluents order) – state vector representation of current state fluents
  • action (one-hot vector encoding of action as a list of 0/1) – action vector representation
  • cache (immutable, hashable object) – key to cache results
Return type:

list of pairs (problog.logic.Term, float)
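
A hedged sketch of querying the model; the program string, the vector sizes, and the fluent/action orderings below are assumptions based on the signatures above:

>>> from mdpproblog.mdp import MDP
>>> mdp = MDP(program)                     # program: a valid MDP-ProbLog string, as above
>>> state = [1, 0]                         # 0/1 values following current_state_fluents() order
>>> action = [1, 0]                        # one-hot vector following actions() order
>>> probs = mdp.transition(state, action)  # list of (Term, float) over next state fluents
>>> value = mdp.reward(state, action)      # float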

transition_model()[source]

Return the transition model of all valid transitions.

Return type:dict of ((state,action), list of probabilities)

mdpproblog.simulator module

class mdpproblog.simulator.Simulator(mdp, policy)[source]

Bases: object

Simulator class for MDPs. Given an mdp and a policy, it generates histories and their corresponding cumulative discounted rewards.

Parameters:
  • mdp (mdpproblog.mdp.MDP object) – an MDP formulation
  • policy (dict of (tuple, str)) – mapping from state to action
run(trials, horizon, start_state, gamma=0.9)[source]

Simulate a number of trials of the given horizon from start_state following the simulator's policy. Compute the discounted reward of each trial using gamma as the discount factor. Return the average reward over all trials, a list of rewards received in each trial, and a list of sampled states for each trial.

Parameters:
  • trials (int) – number of trials
  • horizon (int) – number of timesteps
  • start_state – state from which the simulation starts
  • gamma (float) – discount factor
Return type:

tuple (float, list of list of floats, list of list of states)
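
A usage sketch; the policy dictionary and start state below are hypothetical, and mdp is an mdpproblog.mdp.MDP instance as in the example above (a policy can also be obtained from ValueIteration below):

>>> from mdpproblog.simulator import Simulator
>>> policy = {(1, 1): 'reboot(c1)', (1, 0): 'reboot(c1)',
...           (0, 1): 'reboot(c2)', (0, 0): 'reboot(c2)'}  # hypothetical state -> action mapping
>>> simulator = Simulator(mdp, policy)
>>> avg, rewards, histories = simulator.run(trials=50, horizon=20, start_state=(1, 1), gamma=0.9)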

run_trial(horizon, start_state, gamma=0.9)[source]

Simulate a single trial of the given horizon from start_state following the simulator's policy. Compute the discounted reward using gamma as the discount factor. Return the total discounted reward over all steps of the horizon and the list of sampled states in the trial.

Parameters:
  • horizon (int) – number of timesteps
  • start_state – state from which the simulation starts
  • gamma (float) – discount factor
Return type:

tuple (float, list of states)

mdpproblog.value_iteration module

class mdpproblog.value_iteration.ValueIteration(mdp)[source]

Bases: object

Implementation of the enumerative Value Iteration algorithm. It performs successive synchronous Bellman backups on the infinite-horizon MDP with discount factor gamma until convergence within the given error epsilon is achieved.

Parameters:mdp (mdpproblog.MDP) – MDP representation
run(gamma=0.9, epsilon=0.1)[source]

Execute value iteration until convergence. Return optimal value function, greedy policy and number of iterations.

Parameters:
  • gamma (float) – discount factor
  • epsilon (float) – maximum error
Return type:

triple (dict of (state, value), dict of (state, action), int)
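
A short sketch; mdp is an mdpproblog.mdp.MDP instance, as in the example above:

>>> from mdpproblog.value_iteration import ValueIteration
>>> vi = ValueIteration(mdp)
>>> value_function, policy, iterations = vi.run(gamma=0.9, epsilon=0.1)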

Module contents