Welcome to LSPI Python’s documentation!¶
This is a Python implementation of the Least Squares Policy Iteration (LSPI) reinforcement learning algorithm. For more information on the algorithm, please refer to the paper "Least-Squares Policy Iteration" by Lagoudakis and Parr.
You can also visit their website, where more information and a Matlab version are provided.
http://www.cs.duke.edu/research/AI/LSPI/
Overview¶
When using this library, the first thing you must do is collect a set of samples for LSPI to learn from. Each sample should be an instance of the Sample class.
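For illustration, a sample list might be built by hand roughly as follows. This is only a sketch: it assumes Sample is importable from the top-level lspi package and that its constructor takes (state, action, reward, next_state, absorb=False) with NumPy array states, so check the Sample class documentation for the exact signature:

    import numpy as np
    from lspi import Sample

    samples = []
    state = np.array([0.0])
    for step in range(100):
        action = step % 2                      # assumed: actions are integer indices
        reward = 1.0 if action == 0 else 0.0   # arbitrary reward for this sketch
        next_state = state + 1.0
        # absorb (the assumed final argument) defaults to False here
        samples.append(Sample(state, action, reward, next_state))
        state = next_state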
These samples are then passed into the lspi.learn() method. This method takes in the list of samples, a policy (an instance of the Policy class), and a solver. The learn method then repeatedly calls the solver on the data samples and policy until the policy converges. Once the policy has converged, the agent can use it to find the best action in every state and execute it.
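As a rough end-to-end sketch, the pieces fit together like this. The constructors, the package-level names, the learn() signature, and the best_action method shown here are all assumptions (learn may also accept convergence parameters such as a tolerance and a maximum iteration count); consult the docstrings for the authoritative arguments:

    import numpy as np
    import lspi

    # Hypothetical two-action problem; FakeBasis's constructor argument is assumed.
    basis = lspi.basis_functions.FakeBasis(2)
    initial_policy = lspi.Policy(basis)
    solver = lspi.solvers.LSTDQSolver()

    # A list of Sample instances, e.g. collected as in the previous sketch.
    samples = [lspi.Sample(np.array([0.0]), 0, 1.0, np.array([1.0]))]

    learned_policy = lspi.learn(samples, initial_policy, solver)

    # Once converged, the policy can pick the greedy action for any state.
    print(learned_policy.best_action(np.array([0.0])))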
The Policy class contains the basis function approximation and its associated weights. Weights can be specified explicitly; if left unspecified, they are randomly generated. The policy also contains the probability of taking an exploration action and the discount factor. The Policy class should not need to be modified when using this library.
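As a sketch of how those parameters might be supplied (the keyword names discount, explore, and weights are assumptions based on the description above, not a confirmed signature):

    import numpy as np
    from lspi import Policy
    from lspi.basis_functions import FakeBasis

    basis = FakeBasis(2)  # assumed: constructor takes the number of actions
    policy = Policy(basis,
                    discount=0.9,   # discount factor on future rewards
                    explore=0.1,    # probability of taking a random exploration action
                    weights=np.zeros(basis.size()))  # explicit weights; omit to randomize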
The basis functions all inherit from the abstract base class lspi.basis_functions.BasisFunction. This class provides the minimum interface for a basis function. Instances of this class may contain specialized fields and methods. There are a handful of basis function classes provided in this package, including lspi.basis_functions.FakeBasis, lspi.basis_functions.ExactBasis, lspi.basis_functions.OneDimensionalPolynomialBasis, and lspi.basis_functions.RadialBasisFunction. See each class for its respective construction parameters and how the basis is calculated.
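For example, the polynomial and radial bases might be constructed roughly as follows; the parameter lists shown (polynomial degree plus number of actions, and RBF centers plus a width parameter plus number of actions) are assumptions, so consult each class's documentation:

    import numpy as np
    from lspi.basis_functions import (OneDimensionalPolynomialBasis,
                                      RadialBasisFunction)

    # Assumed: polynomial degree and number of actions.
    poly_basis = OneDimensionalPolynomialBasis(3, 2)

    # Assumed: list of center vectors, a width/scale parameter, and number of actions.
    means = [np.array([float(i)]) for i in range(10)]
    rbf_basis = RadialBasisFunction(means, 0.5, 2)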
You can also implement your own basis function by inheriting from the BasisFunction class and implementing all of the abstract methods, as sketched below.
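The sketch below shows what such a subclass might look like. It assumes the abstract interface consists of size(), evaluate(state, action), and a num_actions property; the real BasisFunction class may differ, so verify against its docstrings:

    import numpy as np
    from lspi.basis_functions import BasisFunction

    class IdentityBasis(BasisFunction):
        """One block of raw state features per action (a hypothetical example)."""

        def __init__(self, state_dim, num_actions):
            self._state_dim = state_dim
            self._num_actions = num_actions

        def size(self):
            # Total length of the feature vector: one state-sized block per action.
            return self._state_dim * self._num_actions

        def evaluate(self, state, action):
            # Copy the state into the block corresponding to the chosen action.
            phi = np.zeros(self.size())
            start = action * self._state_dim
            phi[start:start + self._state_dim] = state
            return phi

        @property
        def num_actions(self):
            return self._num_actions

        @num_actions.setter
        def num_actions(self, value):
            self._num_actions = value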
As mentioned, the learn method takes in a Solver instance. This instance is responsible for performing a single policy update step given the current policy and the samples being learned from. Currently the only implemented Solver is lspi.solvers.LSTDQSolver, which implements the algorithm from Figure 5 of the LSPI paper. There are other variants in the LSPI paper that could also be implemented. Additionally, if a different matrix solving style is needed (e.g. a sparse matrix solver), a new solver can be implemented. To implement a new Solver, simply create a class that inherits from the lspi.solvers.Solver class and implement all of the abstract methods.
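As a sketch, a custom solver might look like the following. It assumes the abstract interface is a single solve(data, policy) method that returns a new weight vector, and that Policy exposes basis, discount, and best_action while Sample exposes state, action, reward, next_state, and absorb; verify these names against the actual classes:

    import numpy as np
    from lspi.solvers import Solver

    class PseudoInverseLSTDQSolver(Solver):
        """LSTDQ-style update that uses a pseudo-inverse (hypothetical example)."""

        def solve(self, data, policy):
            k = policy.basis.size()
            a_mat = np.zeros((k, k))
            b_vec = np.zeros((k, 1))
            for sample in data:
                phi = policy.basis.evaluate(sample.state,
                                            sample.action).reshape((-1, 1))
                if sample.absorb:
                    # Absorbing states contribute no successor features.
                    phi_next = np.zeros((k, 1))
                else:
                    best_action = policy.best_action(sample.next_state)
                    phi_next = policy.basis.evaluate(sample.next_state,
                                                     best_action).reshape((-1, 1))
                a_mat += phi.dot((phi - policy.discount * phi_next).T)
                b_vec += phi * sample.reward
            return np.linalg.pinv(a_mat).dot(b_vec).reshape((-1,))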
For testing and demonstration purposes, the simple ChainDomain from the LSPI paper is included in the lspi.domains module. If you wish to implement other domains, it is recommended that you inherit from the lspi.domains.Domain class and implement the abstract methods.
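For instance, samples for the chain domain might be gathered like this; it assumes ChainDomain takes the number of states as its constructor argument and that the Domain interface includes num_actions() and an apply_action(action) method returning a Sample, so check lspi.domains for the exact API:

    import random
    from lspi.domains import ChainDomain

    domain = ChainDomain(10)   # assumed: number of chain states
    samples = []
    for _ in range(5000):
        action = random.randrange(domain.num_actions())
        samples.append(domain.apply_action(action))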