lspi.sample module

Contains class representing an LSPI sample.

class lspi.sample.Sample(state, action, reward, next_state, absorb=False)

Bases: object

Represents an LSPI sample tuple (s, a, r, s', absorb).

Parameters:

state (numpy.ndarray) – State of the environment at the start of the sample; s in the sample tuple. The usual type is a numpy array.
action (int) – Index of the action that was executed; a in the sample tuple.
reward (float) – Reward received from the environment; r in the sample tuple.
next_state (numpy.ndarray) – State of the environment after executing the sample's action; s' in the sample tuple. The type should match that of state.
absorb (bool, optional) – True if this sample ended the episode, False otherwise; absorb in the sample tuple. The default is False, meaning the sample does not end the episode.

Unless absorb is passed explicitly, the sample is assumed to be non-absorbing, since the vast majority of samples do not end an episode.

This class is a plain data holder, so the types of the different fields can be anything convenient for the problem domain.

For states represented by vectors, a numpy array works well.
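
As a rough illustration of the "plain data holder" idea, the sketch below defines a hypothetical stand-in class with the same constructor signature as documented above. It is not the library's source, and the attribute names (state, action, reward, next_state, absorb) are assumed to mirror the parameter names. States here are (row, col) tuples from an imagined grid world, to show that the fields need not be numpy arrays.

```python
class Sample(object):
    """Hypothetical stand-in mirroring the documented constructor signature."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        # Assumption: each argument is stored unchanged under the same name.
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb


# An ordinary transition: absorb is omitted, so it defaults to False.
step = Sample(state=(0, 0), action=1, reward=0.0, next_state=(0, 1))

# An episode-ending transition must set absorb=True explicitly.
goal = Sample(state=(0, 1), action=1, reward=1.0, next_state=(0, 2), absorb=True)

samples = [step, goal]
num_absorbing = sum(1 for s in samples if s.absorb)
```

A numpy array could be passed for state and next_state in exactly the same way; the holder itself places no constraint on the field types.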