lspi.sample module
Contains class representing an LSPI sample.
class lspi.sample.Sample(state, action, reward, next_state, absorb=False)
Bases: object
Represents an LSPI sample tuple (s, a, r, s', absorb).
Parameters:
- state (numpy.array) – State of the environment at the start of the sample. s in the sample tuple. (The usual type is a numpy array.)
- action (int) – Index of the action that was executed. a in the sample tuple.
- reward (float) – Reward received from the environment. r in the sample tuple.
- next_state (numpy.array) – State of the environment after executing the sample's action. s' in the sample tuple. (The type should match that of state.)
- absorb (bool, optional) – True if this sample ended the episode, False otherwise. absorb in the sample tuple. (The default is False, which implies that this is a non-episode-ending sample.)
Unless absorb is set, the sample is assumed to be non-absorbing, since the vast majority of samples will be non-absorbing.
This class is just a simple data holder, so the types of the different fields can be anything convenient for the problem domain.
For states represented by vectors, a numpy array works well.
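As an illustration of the constructor described above, here is a minimal sketch of building one ordinary sample and one episode-ending sample. The Sample class below is a stand-in written only for this example. It mirrors the documented signature, and it assumes each argument is stored as an attribute of the same name, which this page does not confirm. Plain lists stand in for the states to keep the sketch dependency-free; a numpy array could be passed in the same way.

```python
class Sample(object):
    """Stand-in for lspi.sample.Sample (assumed attribute names, not the library source)."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb


# An ordinary mid-episode transition: absorb is left at its default of False.
step = Sample([0.0, 1.0], 1, -1.0, [0.5, 1.0])

# An episode-ending transition: absorb has to be set explicitly.
final = Sample([0.5, 1.0], 0, 10.0, [1.0, 1.0], absorb=True)

# Count how many collected samples ended an episode.
samples = [step, final]
num_absorbing = sum(1 for s in samples if s.absorb)
```

Because absorb defaults to False, only the comparatively rare terminal transitions need the extra argument.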