# Q-Learning

Implements Optimistic Q-Learning for policies in pynfg.iterSemiNFG objects

Created on Fri Mar 22 15:32:33 2013

Copyright (C) 2013 James Bono (jwbono@gmail.com)

Author: Dongping Xie

pynfg.rlsolutions.qlearning.opt_qlearning(G, bn, w, d, N, r_max=0)

Solve for the optimal policy using Optimistic Q-learning

Optimistic Q-Learning is an off-policy temporal-difference (TD) control algorithm for reinforcement learning (RL).
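For reference, the standard tabular Q-learning update can be written with this function's parameter names, where w is the learning rate and d is the discount factor. The second line shows one common optimistic initialization based on r_max; it is an assumption for illustration, since this page does not state how PyNFG initializes its Q table.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + w \left[ r_{t+1} + d \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

Q_0(s, a) = \frac{r_{\max}}{1 - d} \qquad \text{(assumed optimistic initial value)}
```

The max over next actions in the target is what makes the method off-policy: the update evaluates the greedy policy regardless of which action is actually taken next.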

Parameters:

- G (iterSemiNFG) – The iterated semi-NFG on which to perform the RL
- bn (str) – The basename of the node with the CPT to be trained
- w (float) – The learning rate parameter
- d (float) – The discount factor
- N (int) – The number of training episodes
- r_max (float) – (Optional) A guess of the upper bound on the reward in a single time step. Defaults to 0.

Returns: The iterated semi-NFG; a plot of the dynamic average reward; the Q table
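To make the roles of w, d, N and r_max concrete, here is a minimal, self-contained sketch of tabular Q-learning with optimistic initial values on a toy problem. It is not PyNFG's implementation and does not use iterSemiNFG objects: the function `optimistic_q_learning`, the environment `chain_step`, the `horizon` argument, and the initialization `r_max / (1 - d)` are all hypothetical choices made for illustration.

```python
import random


def optimistic_q_learning(n_states, n_actions, step, w, d, N,
                          r_max=0.0, horizon=20, seed=0):
    """Tabular Q-learning with optimistic initial values (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: start every entry at the largest possible discounted
    # return, so that untried actions look attractive and get explored.
    q0 = r_max / (1.0 - d)
    Q = [[q0] * n_actions for _ in range(n_states)]
    avg_rewards = []
    for _ in range(N):
        s, total = 0, 0.0  # every episode starts in state 0
        for _ in range(horizon):
            # Act greedily; break ties between equal values at random.
            best = max(Q[s])
            a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            s_next, r = step(s, a)
            # Off-policy TD target: bootstrap from the best next action.
            target = r + d * max(Q[s_next])
            Q[s][a] += w * (target - Q[s][a])
            total += r
            s = s_next
        avg_rewards.append(total / horizon)
    return Q, avg_rewards


def chain_step(s, a):
    """Hypothetical two-state environment.

    Action 1 moves to state 1 and pays 1 if the agent is already there;
    action 0 moves to state 0 and pays nothing.
    """
    if a == 1:
        return 1, (1.0 if s == 1 else 0.0)
    return 0, 0.0


Q, avg_rewards = optimistic_q_learning(2, 2, chain_step,
                                       w=0.1, d=0.9, N=200, r_max=1.0)
```

In this sketch exploration comes entirely from the optimistic start: every value begins at or above its true level, so each action is tried until its estimate falls below that of a better alternative.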

Example:

G1, rseries, Q1 = opt_qlearning(G, 'D1', w=0.1, d=0.95, N=100)
