XCS Tutorial

This is the official tutorial for the xcs package for Python 3. You can find the latest release and get updates on the project's status at the project home page.

What is XCS?

XCS is a Python 3 implementation of the XCS algorithm as described in the 2001 paper, An Algorithmic Description of XCS, by Martin Butz and Stewart Wilson. XCS is a type of Learning Classifier System (LCS), a machine learning algorithm that utilizes a genetic algorithm acting on a rule-based system, to solve a reinforcement learning problem.

In its canonical form, XCS accepts a fixed-width string of bits as its input, and attempts to select the best action from a predetermined list of choices using an evolving set of rules that match inputs and offer appropriate suggestions. It then receives a reward signal indicating the quality of its decision, which it uses to adjust the rule set that was used to make the decision. This process is subsequently repeated, allowing the algorithm to evaluate the changes it has already made and further refine the rule set.

A key feature of XCS is that, unlike many other machine learning algorithms, it not only learns the optimal input/output mapping, but also produces a minimal set of rules for describing that mapping. This is a big advantage over other learning algorithms such as neural networks whose models are largely opaque to human analysis, making XCS an important tool in any data scientist's tool belt.

The XCS library provides not only an implementation of the standard XCS algorithm, but a set of interfaces which together constitute a framework for implementing and experimenting with other LCS variants. Future plans for the XCS library include continued expansion of the tool set with additional algorithms, and refinement of the interface to support reinforcement learning algorithms in general.

Terminology

Being both a reinforcement learning algorithm and an evolutionary algorithm, XCS requires an understanding of terms pertaining to both.

Situation

A situation is just another term for an input received by the classifier.

Action

An action is an output produced by the classifier.

Scenario

A scenario is a series of situations, to each of which the algorithm must respond, in order, with an appropriate action so as to maximize the total reward received. A scenario may be dynamic, meaning that later training cycles can be affected by earlier actions, or static, meaning that each training cycle is independent of the actions that came before it.

Classifier Rule

A classifier rule, sometimes referred to as just a rule or a classifier, is a pairing between a condition, describing which situations can be matched, and a suggested action. Each classifier rule has an associated prediction indicating the expected reward if the suggested action is taken when the condition matches the situation, a fitness indicating its suitability for reproduction and continued use in the population, and a numerosity value which indicates the number of (virtual) instances of the rule in the population. (There are other parameters associated with each rule as well, but these are the most visibly important ones.)

Classifier Set

Also referred to as the population, this is the collection of all rules currently used and tracked by the classifier. The genetic algorithm operates on this set of rules over time to optimize them for accuracy and generality in their descriptiveness of the problem space. Note that the population is virtual, meaning that if the same rule has multiple copies in the population, it is represented only once, with an associated numerosity value to indicate the number of virtual instances of the rule in the population.
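
As a rough illustration of what "virtual" means here, the sketch below stores each distinct rule once and keeps a count alongside it. It uses plain tuples and collections.Counter rather than the xcs library's own classes, and the rules themselves are made up for the example.

```python
from collections import Counter

# Each rule is a (condition, action) pair; '#' is a wildcard.
# These rules are invented for illustration only.
added_rules = [
    ("1#0", True),
    ("1#0", True),
    ("0##", False),
    ("1#0", True),
]

# Virtual population: one entry per distinct rule, with the count
# playing the role of the rule's numerosity.
population = Counter(added_rules)

distinct_rules = len(population)         # entries actually stored
virtual_size = sum(population.values())  # virtual instances represented

for (condition, action), numerosity in population.items():
    print(condition, '=>', action, 'numerosity:', numerosity)
```

Adding a duplicate rule only bumps a counter, so the number of stored entries can stay small even when the virtual population is large.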

Match Set

The match set is the set of rules which match against the current situation.

Action Set

The action set is the set of rules which match against the current situation and recommend the selected action. Thus the action set is a subset of the match set. In fact, the match set can be seen as a collection of mutually exclusive and competing action sets, from which only one is to be selected.
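
The relationship between the population, the match set, and the competing action sets can be sketched in a few lines. This is a simplified stand-in that uses strings for conditions and situations; the matches() helper and the sample rules are hypothetical and are not part of the xcs API.

```python
from collections import defaultdict

def matches(condition, situation):
    """Return True if every non-wildcard position agrees with the situation."""
    return all(c == '#' or c == s for c, s in zip(condition, situation))

# Hypothetical population of (condition, action) rules.
population = [
    ("1#0", True),
    ("1##", False),
    ("0#1", True),
    ("##0", True),
]

situation = "110"

# Match set: every rule whose condition matches the current situation.
match_set = [rule for rule in population if matches(rule[0], situation)]

# Group the match set by suggested action. Each group is one candidate
# action set; only the group for the selected action becomes "the" action set.
action_sets = defaultdict(list)
for condition, action in match_set:
    action_sets[action].append((condition, action))

selected_action = True
action_set = action_sets[selected_action]
```

Every rule in the match set lands in exactly one group, which is why the groups are mutually exclusive.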

Reward

The reward is a floating point value which acts as the signal the algorithm attempts to maximize. There are three types of reward that are commonly mentioned with respect to temporal difference learning algorithms. The immediate reward (aka raw reward) is the original, unaltered reward value returned by the scenario in response to each action. The expected future reward is the estimated payoff for later reward cycles, specifically excluding the current one; the prediction of the action set on the next reward cycle acts in this role in the canonical XCS algorithm. The payoff or combined reward is the combined sum of the immediate reward, plus the discounted expected future reward. (Discounted means the value is multiplied by a non-negative coefficient whose value is less than 1, which causes the algorithm to value immediate reward more highly than reward received later on.) The term reward, when used alone, is generally used to mean the immediate reward.
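
The three kinds of reward fit together in one line of arithmetic. The helper below is only a sketch of that definition; the function and parameter names are my own and are not taken from the xcs library.

```python
def payoff(immediate_reward, expected_future_reward, discount_factor):
    """Combine the immediate reward with the discounted expected future reward."""
    if not 0 <= discount_factor < 1:
        raise ValueError("discount_factor must be in the range [0, 1)")
    return immediate_reward + discount_factor * expected_future_reward

# With a discount factor of 0, only the immediate reward counts.
static_payoff = payoff(1.0, 2.0, 0.0)

# With a discount factor of 0.5, half of the expected future reward is added.
dynamic_payoff = payoff(1.0, 2.0, 0.5)
```

A discount factor of 0 reduces the payoff to the immediate reward, which is the natural choice when later reward cycles do not depend on earlier actions.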

Prediction

A prediction is an estimate by a classifier rule or an action set as to the payoff expected to be received by taking the suggested action in the given situation. The prediction of an action set is formed by taking the fitness-weighted average of the predictions made by the individual rules within it.
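
A fitness-weighted average is simple to write down. The sketch below represents each rule as a (prediction, fitness) pair, which is an assumption made for brevity rather than the library's data structure; returning 0.0 for an action set with no fitness is likewise an arbitrary choice for the example.

```python
def action_set_prediction(rules):
    """Fitness-weighted average of the predictions in an action set.

    `rules` is an iterable of (prediction, fitness) pairs.
    """
    rules = list(rules)
    total_fitness = sum(fitness for _, fitness in rules)
    if total_fitness == 0:
        return 0.0  # arbitrary fallback for this sketch
    weighted = sum(prediction * fitness for prediction, fitness in rules)
    return weighted / total_fitness

# A fit rule predicting 1.0 outweighs a less fit rule predicting 0.0.
combined = action_set_prediction([(1.0, 0.75), (0.0, 0.25)])
```

Here the combined prediction is pulled toward the rule with the higher fitness instead of sitting at the plain average of 0.5.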

Fitness

Fitness is another floating point value similar in function to the reward, except that in this case it is an internal signal defined by the algorithm itself, which is then used as a guide for selecting which rules are to act as parents to the next generation. Each rule in the population has its own associated fitness value. In XCS, unlike strength-based LCS variants such as ZCS, the fitness is based on the accuracy of each rule's reward prediction rather than on its size. Thus a rule with a very low expected reward can have a high fitness provided it is accurate in its prediction of low reward, whereas a rule with very high expected reward may have low fitness because the reward it receives varies widely from one reward cycle to the next. Using reward prediction accuracy instead of reward prediction size helps XCS find rules that describe the problem in a stable, predictable way.
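
The accuracy-based idea can be sketched with the accuracy function from the Butz and Wilson algorithmic description, as I understand it: a rule whose prediction error is below a threshold counts as fully accurate, and accuracy falls off as a power law above that threshold. The parameter names and values below are illustrative assumptions, not the xcs library's attribute names or defaults, and the full algorithm goes on to normalize accuracy within the action set before updating fitness.

```python
def accuracy(error, error_threshold=0.01, alpha=0.1, nu=5):
    """Accuracy of a rule, given its average prediction error."""
    if error < error_threshold:
        return 1.0
    return alpha * (error / error_threshold) ** -nu

# A rule that reliably predicts a low reward: low prediction, no error.
steady_low = {"prediction": 0.0, "error": 0.0}

# A rule that predicts a high reward on average but is often wrong.
erratic_high = {"prediction": 0.9, "error": 0.4}

steady_accuracy = accuracy(steady_low["error"])
erratic_accuracy = accuracy(erratic_high["error"])
```

The size of the prediction never enters the calculation, so the steady low-reward rule ends up far more accurate, and therefore fitter, than the erratic high-reward one.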

Installation

To install xcs, you will of course need a Python 3 interpreter. The latest version of the standard CPython distribution is available for download from the Python Software Foundation, or if you prefer a download that comes with a long list of top-notch machine learning and scientific computing packages already built for you, I recommend Anaconda from Continuum Analytics.

Starting with Python 3.4, the standard CPython distribution includes the package installation tool, pip. Anaconda comes with pip regardless of the Python version. If you have pip, installation of xcs is straightforward:

pip install xcs

If all goes as planned, you should see a message like this:

Successfully installed xcs-1.0.0

If for some reason you are unable to use pip, you can still install xcs manually. The latest release can be found here or here. Download the zip file, unpack it, and cd into the directory. Then run:

python setup.py install

You should see a message indicating that the package was successfully installed.

Testing the Newly Installed Package

Let's start things off with a quick test, to verify that everything has been installed properly. First, fire up the Python interpreter. We'll set up Python's built-in logging system so we can see the test's progress.

In [1]:
import logging
logging.root.setLevel(logging.INFO)


Then we import the xcs module and run the built-in test() function. By default, the test() function runs the canonical XCS algorithm on the 11-bit (3-bit address) MUX problem for 10,000 steps.

In [ ]:
import xcs
xcs.test()

INFO:xcs.scenarios:Possible actions:
INFO:xcs.scenarios:    False
INFO:xcs.scenarios:    True
INFO:xcs.scenarios:Steps completed: 0
INFO:xcs.scenarios:Average reward per step: 0.00000
INFO:xcs.scenarios:Steps completed: 100
INFO:xcs.scenarios:Average reward per step: 0.57000
INFO:xcs.scenarios:Steps completed: 200
INFO:xcs.scenarios:Average reward per step: 0.58500

.
.
.

001#0###### => False
Time Stamp: 9980
Average Reward: 1.0
Error: 0.0
Fitness: 0.8161150828153352
Experience: 236
Action Set Size: 25.03847865419106
Numerosity: 9
11#######11 => True
Time Stamp: 9994
Average Reward: 1.0
Error: 0.0
Fitness: 0.9749473121531844
Experience: 428
Action Set Size: 20.685392494947063
Numerosity: 11

INFO:xcs:Total time: 15.05068 seconds

Your results may vary somewhat from what is shown here. XCS relies on randomization to discover new rules, so unless you set the random seed with random.seed(), each run will be different.

Usage

Now we'll run through a quick demo of how to use existing algorithms and problems. This is essentially the same code that appears in the test() function we called above.

First, we're going to need to import a few things:

In [1]:
from xcs import XCSAlgorithm
from xcs.scenarios import MUXProblem, ScenarioObserver


The XCSAlgorithm class contains the actual XCS algorithm implementation. The ClassifierSet class, which we do not need to import here because the algorithm will create an instance for us, is used to represent the algorithm's state, in the form of a set of classifier rules. MUXProblem is the classic multiplexer problem, which defaults to 3 address bits (11 bits total). ScenarioObserver is a wrapper for scenarios which logs the inputs, actions, and rewards as the algorithm attempts to solve the problem.

Now that we've imported the necessary tools, we can define the actual problem, telling it to give the algorithm 50,000 reward cycles to attempt to learn the appropriate input/output mapping, and wrapping it with an observer so we can see the algorithm's progress.

In [2]:
scenario = ScenarioObserver(MUXProblem(50000))


Next, we'll create the algorithm which will be used to manage the classifier set and learn the mapping defined by the problem we have selected:

In [3]:
algorithm = XCSAlgorithm()


The algorithm's parameters are set to appropriate defaults for most problems, but it is straightforward to modify them if it becomes necessary.

In [4]:
algorithm.exploration_probability = .1
algorithm.discount_factor = 0
algorithm.do_ga_subsumption = True
algorithm.do_action_set_subsumption = True


Here we have selected an exploration probability of .1, which will sacrifice most (9 out of 10) learning opportunities in favor of taking advantage of what has already been learned so far. This makes sense in real-time learning environments; a higher value is more appropriate in cases where the classifier is being trained in advance or is being used simply to learn a minimal rule set. The discount factor is set to 0, since future rewards are not affected at all by the currently selected action. (This is not strictly necessary, since the scenario will inform the algorithm that reward chaining should not be used, but it is useful to highlight this fact.) We have also elected to turn on GA and action set subsumption, which help the system to converge to the minimal effective rule set more quickly in some types of scenarios.
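
The trade-off controlled by the exploration probability can be sketched as a simple explore-or-exploit choice. This is a generic illustration of that strategy, not the library's internal action selection code; select_action and its arguments are hypothetical names.

```python
import random

def select_action(predictions, exploration_probability, rng=random):
    """Pick an action from a mapping of action -> predicted payoff."""
    if rng.random() < exploration_probability:
        # Explore: ignore the predictions and try any action.
        return rng.choice(list(predictions))
    # Exploit: take the action with the highest predicted payoff.
    return max(predictions, key=predictions.get)

predictions = {True: 0.9, False: 0.2}

# Count how often each branch is taken over many simulated decisions.
rng = random.Random(0)
trials = 10000
explored = sum(rng.random() < 0.1 for _ in range(trials))
print('explored on roughly', explored / trials, 'of the decisions')
```

With a probability of .1, about one decision in ten is spent on exploration, which matches the "9 out of 10" figure given above.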

Next, we create the classifier set:

In [5]:
model = algorithm.new_model(scenario)


The algorithm does the work for us, initializing the classifier set as it deems appropriate for the scenario we have provided. It provides the classifier set with the possible actions that can be taken in the given scenario; these are necessary for the classifier set to perform covering operations when the algorithm determines that the classifiers in the population provide insufficient coverage for a particular situation. (Covering is the addition to the population of a randomly generated classifier rule whose condition matches the current situation.)
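
Covering can be sketched as follows. The sketch assumes string situations and a per-bit wildcard probability, and it picks the action uniformly at random; the actual algorithm may choose the action more carefully (for instance, preferring actions not yet represented in the match set), so treat cover() as a hypothetical simplification.

```python
import random

def cover(situation, possible_actions, wildcard_probability, rng=random):
    """Create a random rule whose condition matches the given situation."""
    # Copy each bit of the situation, or generalize it to a wildcard.
    condition = ''.join(
        '#' if rng.random() < wildcard_probability else bit
        for bit in situation
    )
    action = rng.choice(list(possible_actions))
    return condition, action

condition, action = cover("10110", (True, False), 0.33, random.Random(1))
print(condition, '=>', action)
```

Because every non-wildcard position is copied from the situation, the new rule is guaranteed to match the situation that triggered it.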

And finally, this is where all the magic happens:

In [6]:
model.run(scenario, learn=True)


We pass the scenario to the classifier set and ask it to run to learn the appropriate input/output mapping. It executes training cycles until the scenario dictates that training should stop. Note that if you wish to see the progress as the algorithm interacts with the scenario, you will need to set the logging level to INFO, as described in the previous section, before calling the run() method.

Now we can observe the fruits of our labors.

In [ ]:
print(model)

10001#10100 => True
Time Stamp: 41601
Average Reward: 1e-05
Error: 1e-05
Fitness: 1e-05
Experience: 0
Action Set Size: 1
Numerosity: 1
00#00100#00 => True
Time Stamp: 48589
Average Reward: 1e-05
Error: 1e-05
Fitness: 1e-05
Experience: 0
Action Set Size: 1
Numerosity: 1

.
.
.

1111######1 => True
Time Stamp: 49968
Average Reward: 1.0
Error: 0.0
Fitness: 0.9654542879926405
Experience: 131
Action Set Size: 27.598176294274904
Numerosity: 10
010##1##### => True
Time Stamp: 49962
Average Reward: 1.0
Error: 0.0
Fitness: 0.8516265524887351
Experience: 1257
Action Set Size: 27.21325456027306
Numerosity: 13

This gives us a printout of each classifier rule, in the form condition => action, followed by various stats about the rule pertaining to the algorithm we selected. The classifier set can also be accessed as an iterable container:

In [8]:
print(len(model))

87

In [9]:
for rule in model:
    if rule.fitness > .5 and rule.experience >= 10:
        print(rule.condition, '=>', rule.action, ' [%.5f]' % rule.fitness)

0001####### => True  [0.83226]
10#####00## => False  [0.80711]
110######1# => True  [0.83886]
010##1##### => True  [0.85163]
##00#0##### => False  [0.65383]
1111######1 => True  [0.96545]
100####1### => True  [0.81688]
#0111###1## => True  [0.94060]


Defining New Scenario Types

To define a new scenario type, inherit from the Scenario abstract class defined in the xcs.scenarios submodule. Suppose, as an example, that we wish to test the algorithm's ability to find a single important input bit from among a large number of irrelevant input bits.

In [1]:
from xcs.scenarios import Scenario

class HaystackProblem(Scenario):
    pass


We defined a new class, HaystackProblem, to represent this test case, which inherits from Scenario to ensure that we cannot instantiate the problem until the appropriate methods have been implemented.

Now let's define an __init__ method for this class. We'll need a parameter, training_cycles, to determine how many reward cycles the algorithm has to identify the "needle", and another parameter, input_size, to determine how big the "haystack" is.

In [2]:
from xcs.scenarios import Scenario

class HaystackProblem(Scenario):

    def __init__(self, training_cycles=1000, input_size=500):
        self.input_size = input_size
        self.possible_actions = (True, False)
        self.initial_training_cycles = training_cycles
        self.remaining_cycles = training_cycles


The input_size is saved as a member for later use. Likewise, the value of training_cycles was saved in two places: the remaining_cycles member, which tells the instance how many training cycles remain for the current run, and the initial_training_cycles member, which the instance will use to reset remaining_cycles to the original value for a new run.

We also defined the possible_actions member, which we set to (True, False). This is the value we will return when the algorithm asks for the possible actions. We will expect the algorithm to return True when the needle bit is set, and False when the needle bit is clear, in order to indicate that it has correctly identified the needle's location.

Now let's define some methods for the class. The Scenario base class defines several abstract methods, and one abstract property:

• is_dynamic is a property with a Boolean value that indicates whether the actions from one reward cycle can affect the rewards or situations of later reward cycles.
• get_possible_actions() is a method that should return the actions the algorithm can take.
• reset() should restart the problem for a new run.
• sense() should return a new input (the "situation").
• execute(action) should accept an action from among those returned by get_possible_actions(), in response to the last situation that was returned by sense(). It should then return a reward value indicating how well the algorithm is doing at responding correctly to each situation.
• more() should return a Boolean value to indicate whether the algorithm has remaining reward cycles in which to learn.

The abstract methods and the property must each be defined, or we will get a TypeError when we attempt to instantiate the class:

In [3]:
problem = HaystackProblem()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-3ba8a9996059> in <module>()
----> 1 problem = HaystackProblem()

TypeError: Can't instantiate abstract class HaystackProblem with abstract methods execute, get_possible_actions, is_dynamic, more, reset, sense
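
The error message suggests that Scenario relies on Python's standard abc machinery. The stand-alone sketch below reproduces the same behavior with a made-up Shape class, so that the mechanism can be seen without xcs installed; none of these class names come from the library.

```python
from abc import ABCMeta, abstractmethod

class Shape(metaclass=ABCMeta):
    """Hypothetical abstract base class with one required method."""

    @abstractmethod
    def area(self):
        raise NotImplementedError

class Incomplete(Shape):
    # Does not override area(), so it stays abstract.
    pass

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

message = None
try:
    Incomplete()
except TypeError as error:
    message = str(error)

print(message)
print(Square(3).area())
```

The check happens at instantiation time, not at class definition time, which is why defining HaystackProblem with just a pass statement succeeded earlier.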

The implementations for the property and the methods other than sense() and execute() will be trivial, so let's start with those:

In [4]:
from xcs.scenarios import Scenario

class HaystackProblem(Scenario):

    def __init__(self, training_cycles=1000, input_size=500):
        self.input_size = input_size
        self.possible_actions = (True, False)
        self.initial_training_cycles = training_cycles
        self.remaining_cycles = training_cycles

    @property
    def is_dynamic(self):
        return False

    def get_possible_actions(self):
        return self.possible_actions

    def reset(self):
        self.remaining_cycles = self.initial_training_cycles

    def more(self):
        return self.remaining_cycles > 0


Now we are going to get into the meat of the problem. We want to give the algorithm a random string of bits of size input_size and have it pick out the location of the needle bit through trial and error, by telling us what it thinks the value of the needle bit is. For this to be a useful test, the needle bit needs to be in a fixed location, which we have not yet defined. Let's choose a random bit from among the inputs on each run.

In [5]:
import random

from xcs.scenarios import Scenario

class HaystackProblem(Scenario):

    def __init__(self, training_cycles=1000, input_size=500):
        self.input_size = input_size
        self.possible_actions = (True, False)
        self.initial_training_cycles = training_cycles
        self.remaining_cycles = training_cycles
        self.needle_index = random.randrange(input_size)

    @property
    def is_dynamic(self):
        return False

    def get_possible_actions(self):
        return self.possible_actions

    def reset(self):
        self.remaining_cycles = self.initial_training_cycles
        self.needle_index = random.randrange(self.input_size)

    def more(self):
        return self.remaining_cycles > 0


The sense() method is going to create a string of random bits of size input_size and return it. But first it will pick out the value of the needle bit, located at needle_index, and store it in a new member, needle_value, so that execute(action) will know what the correct value for action is.

In [6]:
import random

from xcs.scenarios import Scenario
from xcs.bitstrings import BitString

class HaystackProblem(Scenario):

    def __init__(self, training_cycles=1000, input_size=500):
        self.input_size = input_size
        self.possible_actions = (True, False)
        self.initial_training_cycles = training_cycles
        self.remaining_cycles = training_cycles
        self.needle_index = random.randrange(input_size)
        self.needle_value = None

    @property
    def is_dynamic(self):
        return False

    def get_possible_actions(self):
        return self.possible_actions

    def reset(self):
        self.remaining_cycles = self.initial_training_cycles
        self.needle_index = random.randrange(self.input_size)

    def more(self):
        return self.remaining_cycles > 0

    def sense(self):
        haystack = BitString.random(self.input_size)
        self.needle_value = haystack[self.needle_index]
        return haystack


Now we need to define the execute(action) method. In order to give the algorithm appropriate feedback to make the problem solvable, we should return a high reward when it guesses the correct value for the needle bit, and a low value otherwise. Thus we will return a 1 when the action is the value of the needle bit, and a 0 otherwise. We must also make sure to decrement the remaining cycles to prevent the problem from running indefinitely.

In [7]:
import random

from xcs.scenarios import Scenario
from xcs.bitstrings import BitString

class HaystackProblem(Scenario):

    def __init__(self, training_cycles=1000, input_size=500):
        self.input_size = input_size
        self.possible_actions = (True, False)
        self.initial_training_cycles = training_cycles
        self.remaining_cycles = training_cycles
        self.needle_index = random.randrange(input_size)
        self.needle_value = None

    @property
    def is_dynamic(self):
        return False

    def get_possible_actions(self):
        return self.possible_actions

    def reset(self):
        self.remaining_cycles = self.initial_training_cycles
        self.needle_index = random.randrange(self.input_size)

    def more(self):
        return self.remaining_cycles > 0

    def sense(self):
        haystack = BitString.random(self.input_size)
        self.needle_value = haystack[self.needle_index]
        return haystack

    def execute(self, action):
        self.remaining_cycles -= 1
        return action == self.needle_value
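
To make the call order concrete, the following sketch drives a scenario by hand, calling reset(), more(), sense(), and execute() in the sequence the method descriptions imply. So that it runs without xcs installed, it uses a small stand-in class, TinyHaystack (a hypothetical name), with a list of Booleans in place of BitString. This is not how xcs runs a scenario internally; it only illustrates the protocol.

```python
import random

class TinyHaystack:
    """Stand-in with the same method names as the scenario above."""

    def __init__(self, training_cycles=200, input_size=8, seed=0):
        self.rng = random.Random(seed)
        self.input_size = input_size
        self.initial_training_cycles = training_cycles
        self.remaining_cycles = training_cycles
        self.needle_index = self.rng.randrange(input_size)
        self.needle_value = None

    def reset(self):
        self.remaining_cycles = self.initial_training_cycles

    def more(self):
        return self.remaining_cycles > 0

    def sense(self):
        haystack = [self.rng.random() < 0.5 for _ in range(self.input_size)]
        self.needle_value = haystack[self.needle_index]
        return haystack

    def execute(self, action):
        self.remaining_cycles -= 1
        return action == self.needle_value

def run(scenario, policy):
    """Feed situations to a policy until the scenario says to stop."""
    scenario.reset()
    steps = 0
    total_reward = 0
    while scenario.more():
        situation = scenario.sense()
        total_reward += scenario.execute(policy(situation))
        steps += 1
    return steps, total_reward

scenario = TinyHaystack()

# A "cheating" policy that already knows where the needle is.
steps, total_reward = run(scenario, lambda s: s[scenario.needle_index])

# A policy that ignores the input and always answers True.
blind_steps, blind_reward = run(scenario, lambda s: True)
```

The cheating policy earns the maximum reward on every step, while the blind policy can only be right when the needle bit happens to be set, which is the gap a learning algorithm has to close.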


We have now defined all of the methods that Scenario requires. Let's give it a test run.

In [ ]:
import logging
import xcs

from xcs.scenarios import ScenarioObserver

# Setup logging so we can see the test run as it progresses.
logging.root.setLevel(logging.INFO)

# Create the scenario instance
problem = HaystackProblem()

# Wrap the scenario instance in an observer so progress gets logged,
# and pass it on to the test() function.
xcs.test(scenario=ScenarioObserver(problem))

INFO:xcs.scenarios:Possible actions:
INFO:xcs.scenarios:    False
INFO:xcs.scenarios:    True
INFO:xcs.scenarios:Steps completed: 0
INFO:xcs.scenarios:Average reward per step: 0.00000
INFO:xcs.scenarios:Steps completed: 100
INFO:xcs.scenarios:Average reward per step: 0.55000

.
.
.

INFO:xcs.scenarios:Steps completed: 900
INFO:xcs.scenarios:Average reward per step: 0.51667
INFO:xcs.scenarios:Steps completed: 1000
INFO:xcs.scenarios:Average reward per step: 0.50900
INFO:xcs.scenarios:Run completed.
INFO:xcs.scenarios:Total steps: 1000
INFO:xcs.scenarios:Average reward per step: 0.50900
INFO:xcs:Classifiers:

010#11110##001###01#101001#00#1##100110##11#111#00#00#1#10#10#1110#100110#1#1100#10#111#1011100###1#1##1#0#1##011#1#0#0##1011010011#0#0101#00#01#0#0##01101##100#00010111##111010#100110##1101110##11#01110##1#0#110#000#010#1011##10#00#0#101011#000000##11#00#1#0110#0110100010##0100011#1#0###11#110#0###1##0100##1#11#1##101####111011#01#110101011001#110110#011111##1#0##1010#011000101001#10#10#0#00##1#110##1011100#1111##01#00#11#010001100#10####01###010001###1##1110#10####100#0#01#0#10##100####1110#00 => False
Time Stamp: 169
Average Reward: 0.0
Error: 0.0
Fitness: 0.15000850000000002
Experience: 1
Action Set Size: 1.0
Numerosity: 1
11##101#1###11101#0010####01#111##100011010###10##01#1100#010#11##01011#00##0#0#1001111#0#11011100010100101#1#1#01#0001000##101100###11#1#1111011110010#01010#101010###010##010##001#1#10#1001##0#1101111##0#0#0#1#11#01011000####111#1#1##10110##1###1#1#00#110##00000#11101110010###01#0#11#1###1#1#01#100110####11##0000#01#0#0011#01##10#100##00##010111##0#1#100#0##10#01000000001#00##1#11001#1011##1##1100011#1###01#####0#0111111#00#1101101##101#01#101#11##001#0000#1011#01#0#11#0#0#0##0#1010#0#01110110# => False
Time Stamp: 254
Average Reward: 0.0
Error: 0.0
Fitness: 0.15000850000000002
Experience: 1
Action Set Size: 1.0
Numerosity: 1

.
.
.

###10010010010110#1#01###000100##0#0##0###01#1#1#100101#01#110#0##011#0100#0#1111001##01010##0#1#01011110#0#100110#00##1100##1011##1##0#0####111##111##000##01#001##110##10#01#0#1#00#110#100#10#1#0#1100#010#110##1011##1110#0#01#00#011#0001110#1110#0110111#0#101#01#101#00#0#1110100#1##0#101101#1###11#11###001100010###0#111101##1#111#111010#1##0011##00111000##11110#0#01#0#0#0#1#0#110000###00110##10001001011111#001101#11#111##01#0#1#10#1##000######0110##01#1#010#011#11#001##10111#1101#0#1001##011#10 => True
Time Stamp: 996
Average Reward: 1.0
Error: 0.0
Fitness: 0.15000850000000002
Experience: 1
Action Set Size: 1.0
Numerosity: 1
0101#0010100011#11##1100##001001###010#111001#####111001#1011#1100#1111#00101111#0#1011##1#1###00001011001#10##00###101##011111##1#00#1011001###10001###11####1##1#01#0#1#0#11100001110##11#001001#01#####0110#011011#0#111#1111##0#110111001#100#011111100110#11####0##01#100#11#1000#10#00#00#0#0#1##0100#100#11###01#1100##1###000##01#10#0#0001#0100#10#1#001#11####1001#110#0##11#0#0100#010##0#011100##11#0#11101#000000010#00101#0#0#11110#0010#1100#11#01#11##10#10#10#1100#1#00#0100#10#1##10#00011010100#0 => True
Time Stamp: 998
Average Reward: 1.0
Error: 0.0
Fitness: 0.15000850000000002
Experience: 1
Action Set Size: 1.0
Numerosity: 1

INFO:xcs:Total time: 2.65542 seconds

Hmm, the classifier set didn't do so hot. Maybe we've found a weakness in the algorithm, or maybe some different parameter settings will improve its performance. Let's reduce the size of the haystack and give it more reward cycles so we can see whether it's learning at all.

In [ ]:
problem = HaystackProblem(training_cycles=10000, input_size=100)

xcs.test(scenario=ScenarioObserver(problem))

INFO:xcs.scenarios:Possible actions:
INFO:xcs.scenarios:    False
INFO:xcs.scenarios:    True
INFO:xcs.scenarios:Steps completed: 0
INFO:xcs.scenarios:Average reward per step: 0.00000
INFO:xcs.scenarios:Steps completed: 100
INFO:xcs.scenarios:Average reward per step: 0.47000

.
.
.

INFO:xcs.scenarios:Steps completed: 9900
INFO:xcs.scenarios:Average reward per step: 0.49222
INFO:xcs.scenarios:Steps completed: 10000
INFO:xcs.scenarios:Average reward per step: 0.49210
INFO:xcs.scenarios:Run completed.
INFO:xcs.scenarios:Total steps: 10000
INFO:xcs.scenarios:Average reward per step: 0.49210
INFO:xcs:Classifiers:

11#1001##0110000#101####001010##111111#1110#00#0100#11100#1###0110110####11#011##0#0#1###011#1#11001 => False
Time Stamp: 9771
Average Reward: 1.0
Error: 0.0
Fitness: 8.5e-07
Experience: 0
Action Set Size: 1
Numerosity: 1
#00001100##1010#01111101001#0###0#10#10#11###10#1#0#0#11#11010111111#0#01#111#0#100#00#10000111##000 => False
Time Stamp: 8972
Average Reward: 0.0
Error: 0.0
Fitness: 0.15000850000000002
Experience: 1
Action Set Size: 1.0
Numerosity: 1

.
.
.

100#0010010###0#1001#1#0100##0#1##101#011#0#0101110#1111#11#000##0#1#0##001#1110##001011###1001##01# => True
Time Stamp: 9993
Average Reward: 1.0
Error: 0.0
Fitness: 0.15000850000000002
Experience: 1
Action Set Size: 1.0
Numerosity: 1
10#100##110##00#001##0#100100#00#1110##100##1#1##1111###00#0#1#1##00#010##00011#10#1#11##0#0#01100#0 => False
Time Stamp: 9997
Average Reward: 1.0
Error: 0.0
Fitness: 0.15000850000000002
Experience: 1
Action Set Size: 1.0
Numerosity: 1

INFO:xcs:Total time: 21.50882 seconds

It appears the algorithm isn't learning at all, at least not at a visible rate. But after a few rounds of playing with the parameter values, it becomes apparent that with the correct settings and sufficient training cycles, it is possible for the algorithm to handle the new scenario.

In [ ]:
problem = HaystackProblem(training_cycles=10000, input_size=500)

algorithm = xcs.XCSAlgorithm()

# Default parameter settings in test()
algorithm.exploration_probability = .1

# Modified parameter settings
algorithm.ga_threshold = 1
algorithm.crossover_probability = .5
algorithm.do_action_set_subsumption = True
algorithm.do_ga_subsumption = False
algorithm.wildcard_probability = .998
algorithm.deletion_threshold = 1
algorithm.mutation_probability = .002

xcs.test(algorithm, scenario=ScenarioObserver(problem))

INFO:xcs.scenarios:Possible actions:
INFO:xcs.scenarios:    False
INFO:xcs.scenarios:    True
INFO:xcs.scenarios:Steps completed: 0
INFO:xcs.scenarios:Average reward per step: 0.00000
INFO:xcs.scenarios:Steps completed: 100
INFO:xcs.scenarios:Average reward per step: 0.44000

.
.
.

INFO:xcs.scenarios:Steps completed: 9900
INFO:xcs.scenarios:Average reward per step: 0.71818
INFO:xcs.scenarios:Steps completed: 10000
INFO:xcs.scenarios:Average reward per step: 0.71990
INFO:xcs.scenarios:Run completed.
INFO:xcs.scenarios:Total steps: 10000
INFO:xcs.scenarios:Average reward per step: 0.71990
INFO:xcs:Classifiers:

.
.
.

#########################################################################0#################################################################################################################################################################################################################0######################################################################################################################################################################################################################## => False
Time Stamp: 10000
Average Reward: 1.0
Error: 0.0
Fitness: 0.06603241567396244
Experience: 84
Action Set Size: 35.38393265784616
Numerosity: 2

.
.
.

#########################################################################0##################################################################################################################0####################################################################################################################################################################################################################################################################################################################### => False
Time Stamp: 10000
Average Reward: 1.0
Error: 0.0
Fitness: 0.11504822315545861
Experience: 218
Action Set Size: 35.391204726113
Numerosity: 2

.
.
.

#########################################################################0################################################################################################################################################################################################################################################################################################################################1######################################################################################################### => False
Time Stamp: 10000
Average Reward: 1.0
Error: 0.0
Fitness: 0.09257423134262922
Experience: 67
Action Set Size: 35.07541457874612
Numerosity: 3
#########################################################################0########################################################################################################################################################################################################################################################################################################################################################################################################################################## => False
Time Stamp: 10000
Average Reward: 1.0
Error: 0.0
Fitness: 0.12456467313416163
Experience: 2384
Action Set Size: 35.1459543020003
Numerosity: 5
#################################################1#######################0########################################################################################################################################################################################################################################################################################################################################################################################################################################## => False
Time Stamp: 9998
Average Reward: 1.0
Error: 0.0
Fitness: 0.14507972120595913
Experience: 178
Action Set Size: 35.37225272699431
Numerosity: 5
#########################################################################0####################################################################################################################################################0##################################################################################################################################################################################################################################################################################### => False
Time Stamp: 10000
Average Reward: 1.0
Error: 0.0
Fitness: 0.17931849040575895
Experience: 60
Action Set Size: 37.06726834263018
Numerosity: 5
#########################################################################1########################################################################################################################################################################################################################################################################################################################################################################################################################################## => True
Time Stamp: 9997
Average Reward: 1.0
Error: 0.0
Fitness: 0.9556621050752229
Experience: 2482
Action Set Size: 54.59161655445003
Numerosity: 49

INFO:xcs:Total time: 35.27361 seconds
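
One way to see why the very high wildcard_probability used above makes such a difference is to estimate how often a randomly generated condition will match a new random input. The calculation below is a back-of-the-envelope sketch under the assumption that input bits are independent and equally likely to be 0 or 1. The one-in-three wildcard density used for comparison is only an estimate read off from the rules printed in the earlier, unsuccessful runs, not a documented default.

```python
def match_probability(input_size, wildcard_probability):
    """Chance that a random condition matches a random input."""
    # A position matches if it is a wildcard, or if it is a specified
    # bit that happens to agree with the input (probability 1/2).
    per_bit = wildcard_probability + (1 - wildcard_probability) / 2
    return per_bit ** input_size

def expected_specified_bits(input_size, wildcard_probability):
    """Average number of non-wildcard positions in a random condition."""
    return input_size * (1 - wildcard_probability)

for p in (1 / 3, .998):
    print('wildcard probability %.3f:' % p,
          'about %.1f specified bits,' % expected_specified_bits(500, p),
          'match probability %.3g' % match_probability(500, p))
```

With roughly a third of the positions wild, a 500-bit rule almost never matches a second input, so it never gains experience; with a probability of .998, a rule specifies about one bit on average and matches often enough to be evaluated and refined.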