Monte Carlo RL
Implements Monte Carlo Reinforcement Learning for iterSemiNFG objects
Created on Mon Feb 18 09:03:32 2013
Copyright (C) 2013 James Bono (jwbono@gmail.com)
GNU Affero General Public License
pynfg.rlsolutions.mcrl.ewma_mcrl(Game, bn, J, N, alpha, delta, eps, uni=False, pureout=False)
Use exponentially weighted moving average (EWMA) Monte Carlo RL to approximate the optimal CPT at the decision node bn, given the game Game.
Parameters:
- Game (iterSemiNFG) – The iterated semi-NFG on which to perform the RL
- bn (str) – the basename of the node with the CPT to be trained
- J (int, list, or np.array) – The number of runs per training episode. If a schedule is desired,
enter a list or np.array with size equal to N.
- N (int) – The number of training episodes
- alpha (int, list, or np.array) – The exponential weight for the moving average (see the sketch after this parameter list). If a schedule is desired,
enter a list or np.array with size equal to N.
- delta (float) – The discount factor
- eps (float) – The maximum step-size for policy improvements
- uni (bool) – if True, training is initialized with a uniform policy. Defaults to
False to allow “seeding” with different policies, e.g. the level k-1 policy
- pureout (bool) – if True, the policy is turned into a pure policy at the end
of training by assigning probability 1 to the argmax actions. Default is False
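To make the roles of alpha, delta, and eps concrete, here is a minimal sketch of the standard EWMA Monte Carlo building blocks. The helper names and update forms below are illustrative assumptions, not pynfg internals.

import numpy as np

# Hypothetical helpers; NOT pynfg internals, just the usual EWMA-MC pieces.

def discounted_return(rewards, delta):
    """Discounted sum of an episode's rewards with discount factor delta."""
    return sum(r * delta**t for t, r in enumerate(rewards))

def ewma_update(q_old, episode_return, alpha):
    """Exponentially weighted moving average of Monte Carlo returns."""
    return (1 - alpha) * q_old + alpha * episode_return

def policy_step(cpt_row, greedy_row, eps):
    """Move a CPT row at most eps of the way toward the greedy policy."""
    return (1 - eps) * np.asarray(cpt_row) + eps * np.asarray(greedy_row)

# One episode: estimate the return, blend it into the value estimate,
# then nudge the policy toward the current greedy action.
q = ewma_update(0.0, discounted_return([1.0, 0.5, 2.0], delta=0.8), alpha=0.5)
row = policy_step([0.5, 0.5], [1.0, 0.0], eps=0.4)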
Example:
import copy
import numpy as np
from pynfg.rlsolutions.mcrl import ewma_mcrl

# G is an existing iterSemiNFG with a decision node whose basename is 'D1'
GG = copy.deepcopy(G)  # train on a copy so the original game is unchanged
G1, Rseries = ewma_mcrl(GG, 'D1',
                        J=np.floor(np.linspace(300, 100, num=50)),
                        N=50, alpha=1, delta=0.8, eps=0.4,
                        pureout=True)
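A common follow-up is to inspect the returned reward series to judge convergence. A minimal sketch, continuing from the example above and assuming Rseries is a plottable sequence of per-episode rewards:

import matplotlib.pyplot as plt

# A flattening reward curve suggests the trained CPT has stopped improving.
plt.plot(Rseries)
plt.xlabel('training episode')
plt.ylabel('reward')
plt.show()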