lspi.solvers module

Contains the main LSPI method and various LSTDQ solvers.

class lspi.solvers.LSTDQSolver(precondition_value=0.1)

Bases: lspi.solvers.Solver

LSTDQ implementation with standard matrix solvers.

Uses the algorithm from Figure 5 of the LSPI paper. If the A matrix is full rank, scipy’s standard linalg solver is used. If it is less than full rank, a least-squares method is used instead.

By default, the diagonal of the A matrix is preconditioned with a small positive value. This helps ensure that the A matrix is full rank even with few samples. To disable preconditioning, set this value to 0.

Parameters:	precondition_value (float) – Value to set the A matrix diagonal entries to. Should be a small positive number. To disable preconditioning, set it to 0.
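
The procedure described above can be sketched as follows. This is a minimal NumPy-only illustration, not the library's implementation: the function name `lstdq_weights` and its arguments (precomputed feature matrices `phi` and `phi_next`, a `rewards` vector, and `discount`) are assumptions made for the example, and `numpy.linalg` stands in for the scipy solver mentioned above. It assumes the usual LSTDQ accumulation of A and b, with the diagonal of A initialised to the preconditioning value.

```python
import numpy as np


def lstdq_weights(phi, phi_next, rewards, discount, precondition_value=0.1):
    """Hypothetical helper: solve A w = b for one LSTDQ step.

    phi      -- (n_samples, k) features of each sampled (state, action)
    phi_next -- (n_samples, k) features of (next_state, policy action);
                rows for absorbing transitions are assumed to be all zeros
    rewards  -- (n_samples,) observed rewards
    """
    phi = np.asarray(phi, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    k = phi.shape[1]

    # Start from a preconditioned diagonal; 0 disables preconditioning.
    a_mat = precondition_value * np.eye(k)
    # A += sum_i phi_i (phi_i - discount * phi_next_i)^T
    a_mat += phi.T @ (phi - discount * phi_next)
    # b = sum_i phi_i * r_i
    b_vec = phi.T @ rewards

    if np.linalg.matrix_rank(a_mat) == k:
        # Full rank: a direct linear solve is possible.
        return np.linalg.solve(a_mat, b_vec)
    # Rank deficient: fall back to a least-squares solution.
    return np.linalg.lstsq(a_mat, b_vec, rcond=None)[0]
```

With a single feature, `phi = [[1.0]]`, `phi_next = [[1.0]]`, a reward of 1, a discount of 0.5, and preconditioning disabled, this sketch returns a weight of 2, which matches the discounted return 1 / (1 - 0.5).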

solve(data, policy)

Run LSTDQ iteration.

See Figure 5 of the LSPI paper for more information.

class lspi.solvers.Solver

Bases: object

Abstract base class (ABC) for LSPI solvers.

Implementations of this class implement the various LSTDQ algorithms using different linear algebra solving techniques. The solver is used by the lspi.learn method, which calls the instance iteratively until the convergence parameters are satisfied.

solve(data, policy)

Return one-step update of the policy weights for the given data.

Parameters:	data – The data used by the solver. In most cases this will be a list of samples, but it can be anything supported by the specific Solver implementation’s solve method.
	policy (Policy) – The current policy to find an improvement to.
Returns:	The new weights as determined by this method.
Return type:	numpy.array
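
To illustrate how a solver with this interface might be called repeatedly until convergence, here is a self-contained sketch. Everything in it is a hypothetical stand-in rather than the library's code: `SketchSolver` mimics the abstract `solve(data, policy)` method, `HalfwaySolver` is a toy implementation, the policy is reduced to a plain dict holding a `"weights"` array, and `iterate_until_converged` with its `epsilon` and `max_iterations` arguments is an assumed driver loop, not the signature of lspi.learn.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchSolver(ABC):
    """Hypothetical stand-in for an abstract solver interface."""

    @abstractmethod
    def solve(self, data, policy):
        """Return a one-step update of the policy weights."""


class HalfwaySolver(SketchSolver):
    """Toy solver: moves the weights halfway toward a fixed target."""

    def __init__(self, target):
        self.target = np.asarray(target, dtype=float)

    def solve(self, data, policy):
        # The data argument is ignored by this toy solver.
        return policy["weights"] + 0.5 * (self.target - policy["weights"])


def iterate_until_converged(solver, data, policy, epsilon=1e-5, max_iterations=100):
    """Call solver.solve until the weight change drops below epsilon."""
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        new_weights = solver.solve(data, policy)
        distance = np.linalg.norm(new_weights - policy["weights"])
        policy["weights"] = new_weights
        if distance < epsilon:
            break
    return policy, iteration


policy, iterations = iterate_until_converged(
    HalfwaySolver([1.0, -1.0]), data=[], policy={"weights": np.zeros(2)}
)
```

The stopping rule here compares successive weight vectors by Euclidean distance. That is one common choice for a convergence check and is assumed for the sketch.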