Implements Optimistic Q-Learning for policies in pynfg.iterSemiNFG objects
Created on Fri Mar 22 15:32:33 2013
Copyright (C) 2013 James Bono (jwbono@gmail.com)
GNU Affero General Public License
Author: Dongping Xie
Solve for the optimal policy using Optimistic Q-learning.
Optimistic Q-learning is an off-policy temporal-difference (TD) control algorithm from reinforcement learning (RL).
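To make the update rule concrete, here is a minimal, self-contained sketch of tabular Q-learning with optimistic initial values, which is one common way to add optimism. It is not PyNFG's implementation and does not use pynfg. The function optimistic_q_learning, the toy environment chain_step, and the parameters alpha (learning rate), gamma (discount factor) and r_max (assumed upper bound on the per-step reward) are hypothetical names chosen for illustration; the exact optimism scheme inside opt_qlearning may differ.

```python
import random

# Illustrative sketch only: not PyNFG's opt_qlearning; all names are hypothetical.


def optimistic_q_learning(n_states, n_actions, step, start, alpha, gamma,
                          episodes, r_max, max_steps=100, seed=0):
    """Tabular Q-learning with optimistic initial values.

    step(s, a) must return (next_state, reward, done); rewards are assumed
    to lie in [0, r_max].
    """
    rng = random.Random(seed)
    # Optimism: start every entry at r_max / (1 - gamma), an upper bound on
    # any discounted return, so actions that have not been tried look attractive.
    q0 = r_max / (1.0 - gamma)
    Q = [[q0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # Behaviour policy: greedy in Q, with ties broken at random.
            best = max(Q[s])
            a = rng.choice([i for i, v in enumerate(Q[s]) if v == best])
            s_next, r, done = step(s, a)
            # The TD target bootstraps from the greedy (max) value of s_next,
            # regardless of which action is taken next: this is what makes
            # Q-learning off-policy.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
    return Q


def chain_step(s, a, n=5):
    """Toy deterministic chain: action 1 moves right, action 0 moves left.
    Reaching state n - 1 pays reward 1 and ends the episode."""
    s_next = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n - 1
    return s_next, (1.0 if done else 0.0), done


Q = optimistic_q_learning(5, 2, chain_step, start=0, alpha=1.0, gamma=0.9,
                          episodes=500, r_max=1.0)
# Greedy action in each non-terminal state (1 means "move right").
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(policy)
```

Because the toy chain is deterministic, a learning rate of 1 is safe here; with stochastic rewards or transitions a smaller value is more typical. Since every entry starts at the optimistic bound, even a purely greedy behaviour policy tries each action, and the learned greedy policy should move right in every state.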
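Parameters: G, the iterated semi-NFG (a pynfg.iterSemiNFG object); the basename of the node whose policy is learned ('D1' in the example below); and the keyword arguments w, d and N.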
Returns: the iterated semi-NFG; a plot of the dynamic average reward; the Q-table.
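The return value includes a plot of the "dynamic average reward". Assuming this means the running mean of the reward collected in each training episode (an interpretation, not confirmed by the source), the plotted series could be computed with a hypothetical helper such as running_average.

```python
def running_average(rewards):
    """Running mean of a reward series (hypothetical helper, not part of pynfg).

    Entry k of the result is the mean of rewards[0] .. rewards[k].
    """
    averages = []
    total = 0.0
    for k, r in enumerate(rewards, start=1):
        total += r
        averages.append(total / k)
    return averages


# Hypothetical per-episode rewards, for illustration only.
print(running_average([1.0, 0.0, 1.0, 0.0]))
```

Plotting this list against the episode index gives a curve that flattens as learning settles.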
Example:
G1, rseries, Q1 = opt_qlearning(G, 'D1', w=0.1, d=0.95, N=100)