# Monte Carlo RL

Implements Monte Carlo reinforcement learning for iterSemiNFG objects.

Created on Mon Feb 18 09:03:32 2013

Copyright (C) 2013 James Bono (jwbono@gmail.com)

GNU Affero General Public License

`pynfg.rlsolutions.mcrl.ewma_mcrl(Game, bn, J, N, alpha, delta, eps, uni=False, pureout=False)`

Use exponentially weighted moving average (EWMA) Monte Carlo RL to approximate the optimal CPT at node bn, given Game.

Parameters:

- **Game** (iterSemiNFG) – the iterated semi-NFG on which to perform the RL
- **bn** (str) – the basename of the node with the CPT to be trained
- **J** (int, list, or np.array) – the number of runs per training episode. If a schedule is desired, enter a list or np.array with size equal to N.
- **N** (int) – the number of training episodes
- **alpha** (int, list, or np.array) – the exponential weight for the moving average. If a schedule is desired, enter a list or np.array with size equal to N.
- **delta** (float) – the discount factor
- **eps** (float) – the maximum step size for policy improvements
- **uni** (bool) – if True, training is initialized with a uniform policy. Default is False, to allow "seeding" with different policies, e.g. level k-1.
- **pureout** (bool) – if True, the policy is turned into a pure policy at the end of training by assigning probability 1 to the argmax actions. Default is False.

Example (assumes an existing iterSemiNFG `G` with a decision node whose basename is `'D1'`):

```python
import copy
import numpy as np
from pynfg.rlsolutions.mcrl import ewma_mcrl

GG = copy.deepcopy(G)  # train on a copy so G itself is left untouched
G1, Rseries = ewma_mcrl(GG, 'D1',
                        J=np.floor(np.linspace(300, 100, num=50)),
                        N=50, alpha=1, delta=0.8, eps=0.4,
                        pureout=True)
```
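Since `J` and `alpha` both accept per-episode schedules (a list or np.array of length N), the schedules can be precomputed with NumPy before calling `ewma_mcrl`. A minimal sketch, assuming N = 50 training episodes; the variable names here are illustrative, not part of the pynfg API:

```python
import numpy as np

N = 50

# Run count per episode: decay linearly from 300 runs down to 100,
# rounded down to whole runs (np.linspace includes both endpoints).
J_schedule = np.floor(np.linspace(300, 100, num=N)).astype(int)

# EWMA weight per episode: decay linearly from 1.0 to 0.1.
alpha_schedule = np.linspace(1.0, 0.1, num=N)
```

Passing `J=J_schedule, alpha=alpha_schedule, N=N` would then give early episodes more runs and a heavier moving-average weight, tapering both as training proceeds.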
