Q-Learning

Implements Optimistic Q-Learning for policies in pynfg.iterSemiNFG objects

Created on Fri Mar 22 15:32:33 2013

GNU Affero General Public License

Author: Dongping Xie

pynfg.rlsolutions.qlearning.opt_qlearning(G, bn, w, d, N, r_max=0)

Solve for the optimal policy using Optimistic Q-learning

Optimistic Q-Learning is an off-policy temporal-difference (TD) control algorithm for reinforcement learning (RL).
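
For reference, the standard tabular Q-learning update can be written with this function's learning rate w and discount factor d as below. The symbols Q, s, a, r and s' (action-value table, state, action, reward, next state) are the usual textbook notation and are assumptions here, not names taken from the pynfg source. The update is off-policy because the target uses the maximum over next actions, whichever action the agent actually takes next.

```latex
Q(s, a) \leftarrow Q(s, a) + w \left[ r + d \max_{a'} Q(s', a') - Q(s, a) \right]
```

The "optimistic" part usually refers to initializing Q to a high value so that untried actions look attractive; one common convention is $Q_0 = r_{\max} / (1 - d)$, though whether opt_qlearning uses exactly this initial value is not stated here.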

Parameters:

G (iterSemiNFG) – The iterated semi-NFG on which to perform the RL
bn (str) – The basename of the node with the CPT to be trained
w (float) – The learning rate parameter
d (float) – The discount factor
N (int) – The number of training episodes
r_max (float) – (Optional) An estimated upper bound on the reward in a single time step. Defaults to 0.
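
To illustrate how parameters like these typically interact, here is a minimal, library-independent sketch of tabular Q-learning with optimistic initialization on a toy chain environment. It is not the pynfg implementation: the function optimistic_q_learning, the environment chain_step, the horizon and seed arguments, and the initial value r_max / (1 - d) are all assumptions made for illustration. Only the names w, d, N and r_max mirror the parameters above.

```python
import random


def optimistic_q_learning(n_states, n_actions, step, w, d, N,
                          r_max=0.0, horizon=50, seed=0):
    """Hypothetical tabular sketch; not pynfg's opt_qlearning."""
    rng = random.Random(seed)
    # Assumed optimistic start: the largest possible discounted return (needs d < 1).
    q0 = r_max / (1.0 - d)
    Q = [[q0] * n_actions for _ in range(n_states)]
    avg_rewards = []  # average reward per step, one entry per episode

    for _ in range(N):
        s, total, steps = 0, 0.0, 0
        for _ in range(horizon):
            # Greedy choice with random tie-breaking; optimism drives exploration.
            best = max(Q[s])
            a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            s_next, r, done = step(s, a)
            # Off-policy TD target: bootstrap from the best next action.
            target = r if done else r + d * max(Q[s_next])
            Q[s][a] += w * (target - Q[s][a])
            total += r
            steps += 1
            s = s_next
            if done:
                break
        avg_rewards.append(total / steps)
    return Q, avg_rewards


def chain_step(s, a):
    """Toy 4-state chain: action 1 moves right, action 0 moves left.

    Reaching state 3 yields reward 1 and ends the episode.
    """
    s_next = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    done = s_next == 3
    return s_next, (1.0 if done else 0.0), done


Q, rseries = optimistic_q_learning(4, 2, chain_step,
                                   w=0.1, d=0.95, N=100, r_max=1.0)
```

In this sketch an r_max that is too low removes the exploration incentive, while one that is far too high makes the agent spend many episodes revisiting actions before their values decay toward realistic estimates.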

Returns:

The iterated semi-NFG; a plot of the dynamic average reward; the q table

Example:

G1, rseries, Q1 = opt_qlearning(G, 'D1', w=0.1, d=0.95, N=100)