Q-Learning

Implements Optimistic Q-Learning for policies in pynfg.iterSemiNFG objects

Created on Fri Mar 22 15:32:33 2013

Copyright (C) 2013 James Bono (jwbono@gmail.com)

GNU Affero General Public License

Author: Dongping Xie

pynfg.rlsolutions.qlearning.opt_qlearning(G, bn, w, d, N, r_max=0)

Solve for the optimal policy using Optimistic Q-learning

Optimistic Q-Learning is an off-policy temporal-difference (TD) control reinforcement learning algorithm.
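The idea can be illustrated outside of pynfg. The following is a minimal generic sketch (not pynfg's implementation): Q-values are initialized to the optimistic bound r_max / (1 - d), so unvisited actions look attractive and the greedy policy explores them; updates then back up toward the greedy (off-policy) target. The `step` environment function and all names here are hypothetical.

```python
def optimistic_q_learning(step, n_states, n_actions, w, d,
                          n_episodes, episode_len, r_max):
    """Generic sketch of optimistic Q-learning.

    step(s, a) is a hypothetical environment callback returning
    (next_state, reward). Q is initialized to r_max / (1 - d), an
    upper bound on discounted return, which drives exploration.
    """
    q_init = r_max / (1.0 - d)  # optimistic initial value
    Q = [[q_init] * n_actions for _ in range(n_states)]
    for _ in range(n_episodes):
        s = 0
        for _ in range(episode_len):
            # act greedily; optimism makes untried actions win ties
            a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = step(s, a)
            # off-policy TD(0) update toward the greedy backup
            Q[s][a] += w * (r + d * max(Q[s2]) - Q[s][a])
            s = s2
    return Q


# Toy deterministic two-state MDP: action 1 pays 1 and leads to
# state 1; action 0 pays 0 and leads to state 0.
def step(s, a):
    return (1, 1.0) if a == 1 else (0, 0.0)


Q = optimistic_q_learning(step, n_states=2, n_actions=2, w=0.5,
                          d=0.9, n_episodes=50, episode_len=20,
                          r_max=1.0)
```

In this toy problem the learned greedy policy picks action 1 in both states, whose true value is 1 / (1 - 0.9) = 10, matching the optimistic bound.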

Parameters:
  • G (iterSemiNFG) – The iterated semi-NFG on which to perform the RL
  • bn (str) – The basename of the node with the CPT to be trained
  • w (float) – The learning rate parameter
  • d (float) – The discount factor
  • N (int) – The number of training episodes
  • r_max (float) – (Optional) a guess of an upper bound on the reward in a single time step. Defaults to 0 if no value is specified.
Returns:

The iterated semi-NFG with the trained CPT; the dynamic average reward series (plotted during training); the Q table

Example:

G1, rseries, Q1 = opt_qlearning(G, 'D1', w=0.1, d=0.95, N=100)
