mdpproblog package

Submodules

mdpproblog.engine module

class mdpproblog.engine.Engine(program)[source]

Bases: object

Adapter class to Problog grounding and query engine.

Parameters:program (str) – a valid MDP-ProbLog program
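
A minimal construction sketch; the model file name sysadmin.pl is hypothetical, any valid MDP-ProbLog program string works:

>>> from mdpproblog.engine import Engine
>>> program = open('sysadmin.pl').read()  # hypothetical MDP-ProbLog model file
>>> engine = Engine(program)
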
add_annotated_disjunction(facts, probabilities)[source]

Add a new annotated disjunction to the program database from a list of facts and their probabilities. Return a list of choice nodes.

Parameters:
  • facts (list of problog.logic.Term) – list of probabilistic facts
  • probabilities (list of float in [0.0, 1.0]) – list of valid individual probabilities such that the total probability is less than or equal to 1.0
Return type:

list of int
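
A short usage sketch, assuming the engine instance from above and the hypothetical weather facts below:

>>> from problog.logic import Term
>>> facts = [Term('weather', Term('sun')), Term('weather', Term('rain'))]  # assumed facts
>>> nodes = engine.add_annotated_disjunction(facts, [0.7, 0.3])            # list of choice nodes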

add_assignment(term, value)[source]

Add a new utility assignment of value to term in the program database. Return the corresponding node number.

Parameters:
  • term (problog.logic.Term) – a predicate
  • value (float) – a numeric value
Return type:

int

add_fact(term, probability=None)[source]

Add a new term with a given probability to the program database. Return the corresponding node number.

Parameters:
  • term (problog.logic.Term) – a predicate
  • probability (float) – a number in [0,1]
Return type:

int
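
A brief sketch, assuming the engine instance from above and the hypothetical predicate running(c1):

>>> from problog.logic import Term
>>> node = engine.add_fact(Term('running', Term('c1')), probability=0.9)  # returns the node number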

add_rule(head, body)[source]

Add a new rule defined by the given head and body to the program database. Return the corresponding node number.

Parameters:
  • head (problog.logic.Term) – a predicate
  • body (list of problog.logic.Term or problog.logic.Not) – a list of literals
Return type:

int
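
A brief sketch, assuming the engine instance from above; the head and body predicates are illustrative:

>>> from problog.logic import Term
>>> head = Term('all_running')
>>> body = [Term('running', Term('c1')), Term('running', Term('c2'))]
>>> node = engine.add_rule(head, body)  # returns the node number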

assignments(assignment_type)[source]

Return a dictionary of assignments of type assignment_type.

Parameters:assignment_type (str) – assignment type.
Return type:dict of (problog.logic.Term, problog.logic.Constant) items.
compile(terms=[])[source]

Create compiled knowledge database from ground program. Return mapping of terms to nodes in the compiled knowledge database.

Parameters:terms (list of problog.logic.Term) – list of predicates
Return type:dict of (problog.logic.Term, int)
declarations(declaration_type)[source]

Return a list of all terms of type declaration_type.

Parameters:declaration_type (str) – declaration type.
Return type:list of problog.logic.Term
evaluate(queries, evidence)[source]

Compute probabilities of queries given evidence.

Parameters:
  • queries (dict of (problog.logic.Term, int)) – mapping of predicates to nodes
  • evidence (dict of (problog.logic.Term, {0, 1})) – mapping from predicate to evidence weight
Return type:

list of (problog.logic.Term, [0.0, 1.0])

get_annotated_disjunction(nodes)[source]

Return the list of choice nodes in the table of instructions corresponding to nodes.

Parameters:nodes (list of int) – list of node identifiers
Return type:list of problog.engine.choice
get_assignment(node)[source]

Return the assignment in the table of instructions corresponding to node.

Parameters:node (int) – identifier of assignment in table of instructions
Return type:pair of (problog.logic.Term, problog.logic.Constant)
get_fact(node)[source]

Return the fact in the table of instructions corresponding to node.

Parameters:node (int) – identifier of fact in table of instructions
Return type:problog.engine.fact
get_instructions_table()[source]

Return the table of instructions separated by instruction type as described in problog.engine.ClauseDB.

Return type:dict of (str, list of (node, namedtuple))
get_rule(node)[source]

Return the rule in the table of instructions corresponding to node.

Parameters:node (int) – identifier of rule in table of instructions
Return type:problog.engine.clause
relevant_ground(queries)[source]

Create ground program with respect to queries.

Parameters:queries (list of problog.logic.Term) – list of predicates
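
A plausible end-to-end query sketch pieced together from the method descriptions above; the fluent running(c1,1), the evidence reboot(c1), and the call order are assumptions, not a verified recipe:

>>> from problog.logic import Term, Constant
>>> query = Term('running', Term('c1'), Constant(1))   # assumed next-state fluent
>>> engine.relevant_ground([query])                     # ground the program w.r.t. the query
>>> queries = engine.compile([query])                   # dict of (Term, node)
>>> evidence = {Term('reboot', Term('c1')): 1}          # assumed evidence weights
>>> results = engine.evaluate(queries, evidence)        # list of (Term, probability)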

mdpproblog.fluent module

class mdpproblog.fluent.ActionSpace(actions)[source]

Bases: object

Iterator class for looping over vector representations of actions in a factored MDP. Each action is implemented as an OrderedDict of (problog.logic.Term, 0/1) items.

Parameters:actions (list of problog.logic.Term) – predicates listing possible actions
classmethod index(action)[source]

Return action index in the action space.

Parameters:action (OrderedDict) – action representation
Return type:int
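
A brief iteration sketch, assuming the hypothetical action predicates below; each iterated action is an OrderedDict of (Term, 0/1):

>>> from problog.logic import Term
>>> from mdpproblog.fluent import ActionSpace
>>> actions = [Term('reboot', Term('c1')), Term('reboot', Term('c2'))]  # assumed actions
>>> for action in ActionSpace(actions):
...     print(ActionSpace.index(action), dict(action))
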
class mdpproblog.fluent.Fluent[source]

Bases: object

Factory class for building fluent terms. A fluent term is a problog.logic.Term with a problog.logic.Constant as last argument representing its timestep.

classmethod create_fluent(term, timestep)[source]

Return a new fluent made from term with the given timestep.

Parameters:
  • term (problog.logic.Term) – any problog term
  • timestep (int) – timestep numeric value
Return type:

problog.logic.Term
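
A minimal sketch, assuming the predicate running(c1); per the definition above, the timestep is appended as a trailing constant argument:

>>> from problog.logic import Term
>>> from mdpproblog.fluent import Fluent
>>> fluent = Fluent.create_fluent(Term('running', Term('c1')), 0)  # expected to denote running(c1,0)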

class mdpproblog.fluent.StateSpace(state_fluents)[source]

Bases: object

Iterator class for looping over vector representations of states in a factored MDP defined by state_fluents. Each state is implemented as an OrderedDict of (problog.logic.Term, 0/1) items.

Parameters:state_fluents (list of problog.logic.Term) – predicates defining a state in a given timestep
classmethod index(state)[source]

Return the state index in the state space.

Parameters:state (OrderedDict) – state representation
Return type:int
classmethod state(valuation)[source]

Return the state representation of a valuation of fluents.

Parameters:valuation (list of pairs (Fluent, bool)) – mapping from fluent to boolean value
Return type:OrderedDict
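
A brief iteration sketch, assuming the two hypothetical state fluents below; each iterated state is an OrderedDict of (Term, 0/1):

>>> from problog.logic import Term, Constant
>>> from mdpproblog.fluent import StateSpace
>>> fluents = [Term('running', Term('c1'), Constant(0)),
...            Term('running', Term('c2'), Constant(0))]  # assumed current-state fluents
>>> for state in StateSpace(fluents):
...     print(StateSpace.index(state), dict(state))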

mdpproblog.mdp module

class mdpproblog.mdp.MDP(model)[source]

Bases: object

Representation of an MDP and its components. Implemented as a bridge class to the ProbLog programs specifying the MDP domain and problems.

Parameters:model (str) – a valid MDP-ProbLog program
actions()[source]

Return an ordered list of action objects.

Return type:list of action objects sorted by string representation
current_state_fluents()[source]

Return the ordered list of current state fluent objects.

Return type:list of current state fluent objects sorted by string representation
next_state_fluents()[source]

Return the ordered list of next state fluent objects.

Return type:list of next state fluent objects sorted by string representation
reward(state, action, cache=None)[source]

Return the immediate reward value of the transition induced by applying action to the given state. Optionally cache results if the cache parameter is given.

Parameters:
  • state (list of 0/1 according to state fluents order) – state vector representation of current state fluents
  • action (one-hot vector encoding of action as a list of 0/1) – action vector representation
  • cache (immutable, hashable object) – key to cache results
Return type:

float

reward_model()[source]

Return the reward model of all valid transitions.

Return type:dict of ((state,action), float)
state_fluents()[source]

Return an ordered list of state fluent objects.

Return type:list of state fluent objects sorted by string representation
transition(state, action, cache=None)[source]

Return the probabilities of the next state fluents given the current state and action. Optionally cache results if the cache parameter is given.

Parameters:
  • state (list of 0/1 according to state fluents order) – state vector representation of current state fluents
  • action (one-hot vector encoding of action as a list of 0/1) – action vector representation
  • cache (immutable, hashable object) – key to cache results
Return type:

list of pairs (problog.logic.Term, float)
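
A hedged sketch of querying the model; the program string, the vector sizes, and the fluent/action orderings below are assumptions based on the signatures above:

>>> from mdpproblog.mdp import MDP
>>> mdp = MDP(program)                     # program: a valid MDP-ProbLog string, as above
>>> state = [1, 0]                         # 0/1 values following current_state_fluents() order
>>> action = [1, 0]                        # one-hot vector following actions() order
>>> probs = mdp.transition(state, action)  # list of (Term, float) over next state fluents
>>> value = mdp.reward(state, action)      # float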

transition_model()[source]

Return the transition model of all valid transitions.

Return type:dict of ((state,action), list of probabilities)

mdpproblog.simulator module

class mdpproblog.simulator.Simulator(mdp, policy)[source]

Bases: object

Simulator class for MDPs. Given an mdp and a policy, it generates histories and their corresponding cumulative discounted rewards.

Parameters:
  • mdp (mdpproblog.mdp.MDP object) – an MDP formulation
  • policy (dict of (tuple, str)) – mapping from state to action
run(trials, horizon, start_state, gamma=0.9)[source]

Simulate a number of trials of the given horizon from start_state following the simulator's policy. Compute the discounted reward of each trial using gamma as the discount factor. Return the average reward over all trials, a list of rewards received in each trial, and a list of sampled states for each trial.

Parameters:
  • trials (int) – number of trials
  • horizon (int) – number of timesteps
  • start_state – state from which the simulation starts
  • gamma (float) – discount factor
Return type:

tuple (float, list of list of floats, list of list of states)
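
A usage sketch; the policy dictionary and start state below are hypothetical, and mdp is an mdpproblog.mdp.MDP instance as in the example above (a policy can also be obtained from ValueIteration below):

>>> from mdpproblog.simulator import Simulator
>>> policy = {(1, 1): 'reboot(c1)', (1, 0): 'reboot(c1)',
...           (0, 1): 'reboot(c2)', (0, 0): 'reboot(c2)'}  # hypothetical state -> action mapping
>>> simulator = Simulator(mdp, policy)
>>> avg, rewards, histories = simulator.run(trials=50, horizon=20, start_state=(1, 1), gamma=0.9)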

run_trial(horizon, start_state, gamma=0.9)[source]

Simulate a single trial of the given horizon from start_state following the simulator's policy. Compute the discounted reward using gamma as the discount factor. Return the total discounted reward over all steps of the horizon and the list of sampled states in the trial.

Parameters:
  • horizon (int) – number of timesteps
  • start_state – state from which the simulation starts
  • gamma (float) – discount factor
Return type:

tuple (float, list of states)

mdpproblog.value_iteration module

class mdpproblog.value_iteration.ValueIteration(mdp)[source]

Bases: object

Implementation of the enumerative Value Iteration algorithm. It performs successive synchronous Bellman backups on the infinite-horizon MDP with discount factor gamma until convergence within the given error epsilon is achieved.

Parameters:mdp (mdpproblog.MDP) – MDP representation
run(gamma=0.9, epsilon=0.1)[source]

Execute value iteration until convergence. Return optimal value function, greedy policy and number of iterations.

Parameters:
  • gamma (float) – discount factor
  • epsilon (float) – maximum error
Return type:

triple (dict of (state, value), dict of (state, action), int)
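
A short sketch; mdp is an mdpproblog.mdp.MDP instance, as in the example above:

>>> from mdpproblog.value_iteration import ValueIteration
>>> vi = ValueIteration(mdp)
>>> value_function, policy, iterations = vi.run(gamma=0.9, epsilon=0.1)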

Module contents