lspi.basis_functions module
Abstract Base Class for Basis Function and some common implementations.
-
class lspi.basis_functions.BasisFunction
Bases: object
ABC for basis functions used by LSPI Policies.
A basis function is a function that takes in a state vector and an action index and returns a vector of features. The resulting feature vector is referred to as \(\phi\) in the LSPI paper (pg 9 of the PDF referenced in this package’s documentation). The \(\phi\) vector is dotted with the weight vector of the Policy to calculate the Q-value.
The state vector usually has fewer dimensions than the \(\phi\) vector. However, the \(\phi\) vector usually has far fewer dimensions than an exact representation of the state, which leads to significant savings when computing and storing a policy.
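The relationship between \(\phi\), the weight vector, and the Q-value can be sketched in a few lines. This is a minimal, dependency-free illustration and not the package's implementation: `toy_phi`, `q_value`, and the weight values are hypothetical, and plain Python lists stand in for numpy arrays.

```python
def toy_phi(state, action, num_actions=2):
    """Hypothetical basis: one block of [1, s0, s1, ...] per action, zeros elsewhere."""
    block = [1.0] + [float(x) for x in state]
    phi = [0.0] * (len(block) * num_actions)
    start = action * len(block)
    phi[start:start + len(block)] = block
    return phi


def q_value(phi, weights):
    """Q(s, a) is the dot product of the feature vector and the weight vector."""
    return sum(p * w for p, w in zip(phi, weights))


weights = [0.5, 1.0, -1.0, 0.0, 2.0, 0.25]  # made-up values for illustration
state = [2.0, 4.0]

# Evaluate Q(s, a) for every action and pick the greedy one.
q_values = [q_value(toy_phi(state, a), weights) for a in range(2)]
best_action = max(range(2), key=lambda a: q_values[a])
```

A greedy policy would pick `best_action`, the action whose feature vector gives the largest dot product with the weights.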
-
evaluate(state, action)
Calculate the \(\phi\) vector for the given state-action pair.
The way this value is calculated depends entirely on the concrete implementation of BasisFunction.
Parameters:
- state (numpy.array) – The state to get the features for. When calculating Q(s, a) this is the s.
- action (int) – The action index to get the features for. When calculating Q(s, a) this is the a.
Returns: The \(\phi\) vector. Used by Policy to compute Q-value.
Return type: numpy.array
-
num_actions
Return number of possible actions.
Returns: Number of possible actions.
Return type: int
-
-
class lspi.basis_functions.ExactBasis(num_states, num_actions)
Bases: lspi.basis_functions.BasisFunction
Basis function with no function approximation.
This can only be used in domains with finite, discrete state spaces. For example, the Chain domain from the LSPI paper would work with this basis, but the inverted pendulum domain would not.
Parameters:
- num_states (list) – A list containing integers representing the number of possible values for each state variable.
- num_actions (int) – Number of possible actions.
-
evaluate(state, action)
Return a \(\phi\) vector that has a single non-zero value.
Parameters:
- state (numpy.array) – The state to get the features for. When calculating Q(s, a) this is the s.
- action (int) – The action index to get the features for. When calculating Q(s, a) this is the a.
Returns: \(\phi\) vector
Return type: numpy.array
Raises:
- IndexError – If action index < 0 or action index >= num_actions.
- ValueError – If the size of the state does not match the size of the num_states list used during construction.
- ValueError – If any of the state variables are < 0 or >= the corresponding value in the num_states list used during construction.
-
get_state_action_index(state, action)
Return the non-zero index of the basis.
Parameters:
- state (numpy.array) – The state to get the index for.
- action (int) – The action to get the index for.
Returns: The non-zero index of the basis
Return type: int
Raises:
- IndexError – If action index < 0 or action index >= num_actions.
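To make the "single non-zero value" idea concrete, the sketch below flattens a discrete state and an action into one index. The ordering used here (action-major, then mixed-radix over the state variables) is an assumption chosen for illustration, and the package may lay out its indices differently. `flat_index` and `exact_phi` are hypothetical helpers that work on plain Python lists.

```python
def flat_index(state, action, num_states, num_actions):
    """Map (state, action) to a unique index in [0, prod(num_states) * num_actions)."""
    if action < 0 or action >= num_actions:
        raise IndexError("action index out of range")
    if len(state) != len(num_states):
        raise ValueError("state size does not match num_states")
    states_total = 1
    for n in num_states:
        states_total *= n
    index = 0
    for value, n in zip(state, num_states):
        if value < 0 or value >= n:
            raise ValueError("state variable out of range")
        index = index * n + value  # mixed-radix accumulation
    return action * states_total + index


def exact_phi(state, action, num_states, num_actions):
    """One-hot feature vector: all zeros except a 1 at the flat index."""
    size = num_actions
    for n in num_states:
        size *= n
    phi = [0.0] * size
    phi[flat_index(state, action, num_states, num_actions)] = 1.0
    return phi


# Two state variables with 2 and 3 possible values, and 2 actions: 12 entries.
phi = exact_phi([1, 2], action=1, num_states=[2, 3], num_actions=2)
```

Because every state-action pair gets its own entry, the weight vector in this setting is effectively a lookup table of Q-values.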
-
num_actions
Return number of possible actions.
-
class lspi.basis_functions.FakeBasis(num_actions)
Bases: lspi.basis_functions.BasisFunction
Basis that ignores all input. Useful for random sampling.
When creating a purely random Policy, a basis function is still required. This basis function just returns a \(\phi\) equal to [1.] for all inputs. It will, however, still raise exceptions for impossible values such as negative action indexes.
-
evaluate(state, action)
Return \(\phi\) equal to [1.].
Parameters:
- state (numpy.array) – The state to get the features for. When calculating Q(s, a) this is the s. FakeBasis ignores these values.
- action (int) – The action index to get the features for. When calculating Q(s, a) this is the a. FakeBasis ignores these values.
Returns: \(\phi\) vector equal to [1.].
Return type: numpy.array
Raises:
- IndexError – If action index is < 0.
Example
>>> import numpy as np
>>> from lspi.basis_functions import FakeBasis
>>> FakeBasis(1).evaluate(np.arange(10), 0)
array([ 1.])
-
num_actions
Return number of possible actions.
-
-
class lspi.basis_functions.OneDimensionalPolynomialBasis(degree, num_actions)
Bases: lspi.basis_functions.BasisFunction
Polynomial features for a state with one dimension.
Takes the value of the state and constructs a vector whose length is proportional to the specified degree and number of actions. The polynomial is first constructed as [..., 1, value, value^2, ..., value^k, ...] where k is the degree. The rest of the vector is 0.
Parameters:
- degree (int) – The polynomial degree.
- num_actions (int) – The total number of possible actions.
Raises:
- ValueError – If degree is less than 0.
- ValueError – If num_actions is less than 1.
-
evaluate(state, action)
Calculate the \(\phi\) vector for the given state-action pair.
The \(\phi\) vector is used to calculate the Q function for the given policy.
Parameters:
- state (numpy.array) – The state to get the features for. When calculating Q(s, a) this is the s.
- action (int) – The action index to get the features for. When calculating Q(s, a) this is the a.
Returns: The \(\phi\) vector. Used by Policy to compute Q-value.
Return type: numpy.array
Raises:
- IndexError – If action is outside the range \(0 \le action < num\_actions\).
- ValueError – If the state vector has any number of dimensions other than 1.
Example
>>> import numpy as np
>>> from lspi.basis_functions import OneDimensionalPolynomialBasis
>>> basis = OneDimensionalPolynomialBasis(2, 2)
>>> basis.evaluate(np.array([2]), 0)
array([ 1., 2., 4., 0., 0., 0.])
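The layout shown in the example can be reproduced with a short sketch. It assumes, consistent with the example output and the documented size of (degree + 1) * number of actions, that each action owns a contiguous block of degree + 1 entries. `poly_phi` is a hypothetical helper, not the package's code, and it returns a plain list rather than a numpy array.

```python
def poly_phi(value, action, degree, num_actions):
    """Place [1, value, ..., value**degree] in the block belonging to `action`."""
    if not 0 <= action < num_actions:
        raise IndexError("action index out of range")
    block = [float(value) ** k for k in range(degree + 1)]
    phi = [0.0] * ((degree + 1) * num_actions)
    start = action * (degree + 1)
    phi[start:start + degree + 1] = block
    return phi


print(poly_phi(2, 0, degree=2, num_actions=2))  # [1.0, 2.0, 4.0, 0.0, 0.0, 0.0]
print(poly_phi(2, 1, degree=2, num_actions=2))  # [0.0, 0.0, 0.0, 1.0, 2.0, 4.0]
```

Keeping a separate block per action means each action effectively gets its own set of polynomial weights inside one shared weight vector.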
-
num_actions
Return number of possible actions.
-
size()
Calculate the size of the basis function.
The base size will be degree + 1. This base vector is then duplicated once for every action. Therefore the size is equal to (degree + 1) * number of actions.
Returns: The size of the \(\phi\) vector that will be returned from evaluate.
Return type: int
Example
>>> from lspi.basis_functions import OneDimensionalPolynomialBasis
>>> basis = OneDimensionalPolynomialBasis(2, 2)
>>> basis.size()
6
-
class lspi.basis_functions.RadialBasisFunction(means, gamma, num_actions)
Bases: lspi.basis_functions.BasisFunction
Gaussian Multidimensional Radial Basis Function (RBF).
Given a set of k means \((\mu_1, \ldots, \mu_k)\), produces a feature vector \((1, e^{-\gamma || s - \mu_1 ||^2}, \ldots, e^{-\gamma || s - \mu_k ||^2})\) where s is the state vector and \(\gamma\) is a free parameter. This vector is padded with 0's on either side, depending on the action index and the number of possible actions specified.
Parameters:
- means (list(numpy.array)) – List of numpy arrays representing \((\mu_1, \ldots, \mu_k)\). Each \(\mu\) is a numpy array with dimensions matching the state vector this basis function will be used with. If the dimensions of each vector are not equal, then an exception will be raised. If no means are specified, then a ValueError will be raised.
- gamma (float) – Free parameter which controls the size/spread of the Gaussian “bumps”. This parameter is best selected via tuning through cross validation. gamma must be > 0.
- num_actions (int) – Number of actions. Must be in the range \([1, \infty)\); otherwise an exception will be raised.
Raises:
- ValueError – If means list is empty.
- ValueError – If dimensions of each mean vector do not match.
- ValueError – If gamma is <= 0.
- ValueError – If num_actions is less than 1.
Note
The numpy arrays specifying the means are not copied.
-
evaluate(state, action)
Calculate the \(\phi\) vector.
The vector will have the following form:
\([\cdots, 1, e^{-\gamma || s - \mu_1 ||^2}, \cdots, e^{-\gamma || s - \mu_k ||^2}, \cdots]\)
where the vector will be padded with 0’s on either side depending on the specified action index and the number of possible actions.
Returns: The \(\phi\) vector. Used by Policy to compute Q-value.
Return type: numpy.array
Raises:
- IndexError – If action is outside the range \(0 \le action < num\_actions\).
- ValueError – If the state vector has any number of dimensions other than 1.
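The form of the feature vector described above can be sketched without the package. This is an illustration under stated assumptions and not the library's implementation: `rbf_phi` is a hypothetical helper, the per-action block placement mirrors the zero padding described in the text, the dimension check on each mean is an assumption of this sketch, and plain lists stand in for numpy arrays.

```python
import math


def rbf_phi(state, action, means, gamma, num_actions):
    """Build [..., 1, exp(-gamma*||s - mu_1||^2), ..., exp(-gamma*||s - mu_k||^2), ...]."""
    if not 0 <= action < num_actions:
        raise IndexError("action index out of range")
    block = [1.0]  # constant bias term
    for mu in means:
        if len(mu) != len(state):
            raise ValueError("state and mean dimensions differ")
        sq_dist = sum((s - m) ** 2 for s, m in zip(state, mu))
        block.append(math.exp(-gamma * sq_dist))
    # One block of k + 1 entries per action; all other blocks stay zero.
    phi = [0.0] * (len(block) * num_actions)
    start = action * len(block)
    phi[start:start + len(block)] = block
    return phi


means = [[0.0, 0.0], [1.0, 1.0]]
phi = rbf_phi([0.0, 0.0], action=1, means=means, gamma=0.5, num_actions=2)
```

Each Gaussian term is 1 when the state sits exactly on a mean and decays toward 0 as the squared distance grows, with larger gamma giving narrower bumps.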
-
num_actions
Return number of possible actions.