Policy Intelligence

Implements Coordinated PGT intelligence for Policies for iterSemiNFG objects

Created on Fri Mar 22 15:32:33 2013

GNU Affero General Public License

pynfg.pgtsolutions.intelligence.policy.policy_MC(G, S, noise, X, M, innoise=1, delta=1, integrand=None, mix=False, satisfice=None)

Run Importance Sampling on policies for PGT Intelligence Calculations

For examples, see below or PyNFG/bin/hideandseek.py

Parameters:

G (iterSemiNFG) – the game to be evaluated
S (int) – number of policy profiles to sample
noise (float) – the degree of independence of the proposal distribution from the current value. 1 is fully independent; 0 applies no perturbation.
X (int) – number of samples of each policy profile
M (int) – number of random alt policies to compare
innoise (float) – the perturbation noise for the loop within iq_calc to draw alt CPTs to compare utilities to current CPT.
delta (float) – the discount factor (ignored if SemiNFG)
integrand (func) – a user-supplied function of G that is evaluated for each of the S sampled policy profiles
mix (bool) – False if restricting sampling to pure strategies. True if mixed strategies are included in sampling. Default is False.
satisfice (iterSemiNFG) – game G such that the CPTs of G together with innoise determine the intelligence satisficing distribution.

Returns:

intel - a sample-keyed dictionary of player-keyed iq dictionaries
funcout - a sample-keyed dictionary of the output of the user-supplied integrand.
weight - a sample-keyed dictionary of player-keyed importance weight dictionaries.

Warning

This will throw an error if there is a decision node in G.starttime that is not repeated throughout the net.

Note

This is the policy-approach because intelligence is assigned to a player instead of being assigned to a DecisionNode, and all DNs with the same basename have the same CPT.

Example:

def welfare(G):
    #calculate the welfare of a single sample of the iterSemiNFG G
    G.sample()
    w = 0
    for p in G.players:
        w += G.npv_reward(p, G.starttime, 1.0)
    return w

import copy
GG = copy.deepcopy(G) #G is an iterSemiNFG
S = 50 #number of MC samples
X = 10 #number of samples of utility of G in calculating iq
M = 20 #number of alternative strategies sampled in calculating iq
noise = .2 #noise in the perturbations of G for MC sampling

from pynfg.pgtsolutions.intelligence.policy import policy_MC

intelMC, funcoutMC, weightMC = policy_MC(GG, S, noise, X, M,
                                         innoise=.2,
                                         delta=1,
                                         integrand=welfare,
                                         mix=False,
                                         satisfice=GG)
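
The returned dictionaries can be combined into a single importance-weighted estimate of the integrand. The following is only a minimal sketch under stated assumptions: the helper name weighted_integrand_mean is hypothetical (it is not part of PyNFG), a sample's overall weight is assumed to be the product of its player weights, and the toy dictionaries stand in for funcoutMC and weightMC.

```python
import math


def weighted_integrand_mean(funcout, weight):
    """Self-normalized importance-sampling estimate of the integrand.

    funcout: {sample: float}
    weight:  {sample: {player: float}}
    ASSUMPTION: a sample's overall weight is the product of its
    player-keyed weights. Check this against your own model.
    """
    total = {s: math.prod(w.values()) for s, w in weight.items()}
    norm = sum(total.values())
    if norm == 0:
        raise ValueError("all importance weights are zero")
    return sum(total[s] * funcout[s] for s in total) / norm


# Toy stand-ins for funcoutMC and weightMC (made-up numbers, not PyNFG output)
funcout_demo = {1: 10.0, 2: 20.0}
weight_demo = {1: {'hider': 1.0, 'seeker': 1.0},
               2: {'hider': 3.0, 'seeker': 1.0}}
estimate = weighted_integrand_mean(funcout_demo, weight_demo)
```

Self-normalizing (dividing by the sum of the weights) keeps the estimate usable when the weights are known only up to a constant factor.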

pynfg.pgtsolutions.intelligence.policy.policy_MH(G, S, density, noise, X, M, innoise=1, delta=1, integrand=None, mix=False, satisfice=None)

Run Metropolis-Hastings on policies for PGT Intelligence Calculations

For examples, see below or PyNFG/bin/hideandseek.py

Parameters:

G (iterSemiNFG) – the game to be evaluated
S (int) – number of MH iterations
density (func) – the function that assigns weights to iq
noise (float) – the degree of independence of the proposal distribution from the current value. 1 is fully independent; 0 applies no perturbation.
X (int) – number of samples of each policy profile
M (int) – number of random alt policies to compare
innoise (float) – the perturbation noise for the loop within iq_calc to draw alt CPTs to compare utilities to current CPT.
delta (float) – the discount factor (ignored if SemiNFG)
integrand (func) – a user-supplied function of G that is evaluated at each of the S MH iterations
mix (bool) – if true, proposal distribution is over mixed CPTs. Default is False.
satisfice (iterSemiNFG) – game G such that the CPTs of G together with innoise determine the intelligence satisficing distribution.

Returns:

intel - a sample-keyed dictionary of player-keyed iq dictionaries
funcout - a sample-keyed dictionary of the output of the user-supplied integrand.
dens - a list of the density values, one for each MH draw.

Warning

This will throw an error if there is a decision node in G.starttime that is not repeated throughout the net.

Note

This is the policy-approach because intelligence is assigned to a player instead of being assigned to a DecisionNode, and all DNs with the same basename have the same CPT.

Example:

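import numpy as np

def density(iqdict):
    #calculate the PGT density for a given iqdict
    x = np.array(list(iqdict.values()), dtype=float) #dict views are not arrays
    y = np.power(x, 2)
    z = np.prod(y) #np.product was removed in NumPy 2.0
    return z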

def welfare(G):
    #calculate the welfare of a single sample of the iterSemiNFG G
    G.sample()
    w = 0
    for p in G.players:
        w += G.npv_reward(p, G.starttime, 1.0)
    return w

import copy
GG = copy.deepcopy(G) #G is an iterSemiNFG
S = 50 #number of MH samples
X = 10 #number of samples of utility of G in calculating iq
M = 20 #number of alternative strategies sampled in calculating iq
noise = .2 #noise in the perturbations of G for MH sampling

from pynfg.pgtsolutions.intelligence.policy import policy_MH

intelMH, funcoutMH, densMH = policy_MH(GG, S, density, noise, X, M,
                                       innoise=.2,
                                       delta=1,
                                       integrand=welfare,
                                       mix=False,
                                       satisfice=GG)
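
Because Metropolis-Hastings draws are meant to follow the target density, a plain average of the integrand over the retained draws is a common way to summarize the output. The following is only a sketch under stated assumptions: the helper name mh_mean and the burn parameter are hypothetical (not part of PyNFG), the sample keys are assumed to sort into draw order, and the toy dictionary stands in for funcoutMH.

```python
def mh_mean(funcout, burn=0):
    """Average the integrand over MH draws, skipping the first `burn` draws.

    funcout: {sample: float}
    ASSUMPTION: sorting the sample keys recovers the order of the draws.
    """
    keys = sorted(funcout)[burn:]
    if not keys:
        raise ValueError("no draws left after burn-in")
    return sum(funcout[k] for k in keys) / len(keys)


# Toy stand-in for funcoutMH (made-up numbers, not PyNFG output)
funcout_demo = {0: 5.0, 1: 7.0, 2: 9.0, 3: 11.0}
mean_all = mh_mean(funcout_demo)
mean_after_burn = mh_mean(funcout_demo, burn=2)
```

Discarding early draws reduces the influence of the starting policy profile; how many to drop is a judgment call that depends on the game and on noise.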

pynfg.pgtsolutions.intelligence.policy.policy_calciq(p, G, X, M, mix, delta, innoise, satisfice=None)

Estimate IQ of player’s policy

Parameters:

p (str) – the name of the player whose intelligence is being evaluated.
G (iterSemiNFG) – the iterated semi-NFG to be evaluated
X (int) – number of samples of each policy profile
M (int) – number of random alt policies with which to compare
mix (bool) – if true, proposal distribution is over mixed CPTs. This argument is positional here and has no default.
delta (float) – the discount factor (ignored if SemiNFG)
innoise (float) – the perturbation noise for the inner loop to draw alt CPTs
satisfice (iterSemiNFG) – game G such that the CPTs of G together with innoise determine the intelligence satisficing distribution.

Returns:

an estimate of the fraction of alternative policies that have a lower npv reward than the current policy.
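
The returned quantity can be illustrated with a small stand-alone sketch. This is not the PyNFG implementation: the function name iq_fraction is hypothetical, the rewards are plain numbers rather than sampled npv rewards, and ties are assumed to count as "not lower".

```python
def iq_fraction(current_reward, alt_rewards):
    """Fraction of alternative policies whose reward is strictly lower
    than the reward of the current policy.

    ASSUMPTION: ties do not count as lower.
    """
    if not alt_rewards:
        raise ValueError("need at least one alternative reward")
    lower = sum(1 for r in alt_rewards if r < current_reward)
    return lower / len(alt_rewards)


# Made-up rewards: the current policy beats 2 of the 4 alternatives
iq_demo = iq_fraction(3.0, [1.0, 2.0, 4.0, 5.0])
```

In policy_calciq the current reward would itself be estimated from X samples and the list would hold M alternatives, so the result carries sampling noise from both.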
