Introduction
Warholize is a GC3Pie demo application that produces, from a generic
image, a new picture in the style of Warhol’s famous work
Marilyn. The script uses the powerful ImageMagick set of tools
(at least version 6.3.5-7). This tutorial assumes that both
ImageMagick and GC3Pie are already installed and configured.
To produce a similar image we have to apply a series of
transformations to the picture:
1. Convert the original image to grayscale.
2. Colorize the grayscale image using three different colors each
time, based on the gray levels. We may, for instance, paint all
pixels with luminosity between 0% and 33% red, pixels
between 34% and 66% yellow, and pixels between 67% and 100% green.
To do that, we have to:
- create a Color Lookup Table (LUT) using a combination of three
randomly chosen colors
- apply the LUT to the grayscale image
3. Finally, merge all the colorized images together to produce
our warholized image.
Clearly, step 2 depends on step 1, and step 3 depends on step 2, so we
basically have a sequence of tasks. However, since step 2 needs to create
N independent images, we can parallelize this step.
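The band-based coloring described in step 2 can be sketched in plain Python. This is only an illustration of the idea, using the hard thresholds from the example above; the names `band_color` and `palette` are hypothetical and do not belong to GC3Pie or ImageMagick, and the LUT built later in this tutorial is a smooth gradient rather than three flat bands.

```python
def band_color(luminosity, palette=("red", "yellow", "green")):
    """Return the palette color for a luminosity given as a percentage (0-100)."""
    if not 0 <= luminosity <= 100:
        raise ValueError("luminosity must be between 0 and 100")
    if luminosity <= 33:
        return palette[0]
    if luminosity <= 66:
        return palette[1]
    return palette[2]


# A few sample gray levels, expressed as percentages
gray_levels = [0, 20, 33, 34, 50, 66, 67, 100]
colorized = [band_color(level) for level in gray_levels]
print(colorized)
```

Calling the function with a different `palette` for each copy is what gives every tile of the final picture its own color scheme.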
From top to bottom
We will write our script starting from the top and will descend to the
bottom, from the command line script, to the workflow and finally to
the single execution units which compose the application.
The script
The SessionBasedScript class in the gc3libs.cmdline module is used
to create a generic script. It already has everything needed to read
GC3Pie configuration files, manage resources, schedule jobs, etc. The
only missing thing is, well, your application!
Let’s start by creating a new empty file and importing some basic
modules:
import os
import gc3libs
from gc3libs.cmdline import SessionBasedScript
We then create a class which inherits from SessionBasedScript (in
GC3Pie, most customizations are done by inheriting from a more
generic class and overriding the __init__ method and possibly
others):
class WarholizeScript(SessionBasedScript):
    """
    Demo script to create a `Warholized` version of an image.
    """
    version = '1.0'
Please note that you must either write a small docstring, or add a
description attribute. These values are used when the script is
called with options --help or --version, which are
automatically added by GC3Pie.
The way we want to use our script is straightforward:
$ warholize.py inputfile [inputfiles ...]
and this will create a directory Warholized.<inputfile> in which
there will be a file called warhol_<inputfile> containing the
desired warholized image (and a lot of temporary files, at least for now).
But we may want to add some additional options to the script, in order
to decide how many colorized pictures the warholized image will be
made of, or whether we want to resize the image. SessionBasedScript uses
the PyCLI module which is, in turn, a wrapper around the standard
argparse module (or optparse for older Python versions). To customize
the script you may define a setup_options method and put in there
some calls to SessionBasedScript.add_param(), which is inherited
from cli.app.CommandLineApp:
    def setup_options(self):
        self.add_param('--copies', default=4, type=int,
                       help="Number of copies (default: 4). It has to be a perfect square!")
In this example we will accept a --copies option to define how
many colorized copies the final picture will be made of. Please refer
to the documentation of the PyCLI module for details on the syntax
of the add_param method.
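Since PyCLI wraps argparse, the behaviour of such an option can be tried out with the standard library alone. The following is a minimal sketch using plain argparse, not the GC3Pie or PyCLI API; the `perfect_square` validator is a hypothetical helper that moves the "perfect square" check into argument parsing.

```python
import argparse
import math


def perfect_square(value):
    """argparse `type` callable: accept only positive perfect squares."""
    number = int(value)
    if number < 1 or math.isqrt(number) ** 2 != number:
        raise argparse.ArgumentTypeError(
            "%r is not a perfect square" % value)
    return number


parser = argparse.ArgumentParser(prog="warholize.py")
parser.add_argument("--copies", default=4, type=perfect_square,
                    help="Number of copies (default: 4). "
                         "It has to be a perfect square!")
parser.add_argument("args", nargs="+", metavar="inputfile")

# Simulate: warholize.py --copies 9 lena.tiff
params = parser.parse_args(["--copies", "9", "lena.tiff"])
print(params.copies, params.args)
```

Validating at parse time means a wrong value is reported before any task is created, instead of failing later inside the workflow constructor.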
The heart of the script is, however, the new_tasks method, which
will be called to create the initial tasks of the script. In our
case it will be something like:
    def new_tasks(self, extra):
        gc3libs.log.info("Creating main sequential task")
        for (i, input_file) in enumerate(self.params.args):
            extra_args = extra.copy()
            extra_args['output_dir'] = 'Warholized.%s' % os.path.basename(input_file)
            yield WarholizeWorkflow(input_file,
                                    self.params.copies,
                                    **extra_args)
new_tasks is used as a generator (but it could return a list as
well). Each yielded object is a task. In GC3Pie, a task is
either a single application or a complex workflow, and represents an
execution unit. Here we yield a separate WarholizeWorkflow task,
the workflow described before, for each input file. These tasks
will run in parallel.
Please note that we are using the gc3libs.log module to log
information about the execution. This module works like the
logging module and has methods like error, warning, info or
debug, but its logging level is automatically configured by
the SessionBasedScript constructor. This way you can increase the
verbosity of your script by simply adding -v options on the
command line.
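To illustrate the idea of driving the log level from repeated -v options, here is a small sketch based only on the standard logging and argparse modules. It is not how GC3Pie configures gc3libs.log internally; the `level_from_verbosity` mapping and the logger name are assumptions made for this example.

```python
import argparse
import logging


def level_from_verbosity(count):
    """Map the number of -v flags to a logging level (hypothetical mapping)."""
    levels = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]
    return levels[min(count, len(levels) - 1)]


parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="count", default=0)
options = parser.parse_args(["-vv"])  # simulate two -v flags

log = logging.getLogger("warholize-demo")
log.setLevel(level_from_verbosity(options.verbose))
log.addHandler(logging.StreamHandler())

log.info("Creating main sequential task")    # emitted with -vv
log.debug("only emitted with -vvv or more")  # filtered out here
```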
The workflows
Main sequential workflow
The module gc3libs.workflow contains two main objects:
SequentialTaskCollection and ParallelTaskCollection. They execute
tasks in serial and in parallel, respectively. We will use both of
them to create our workflow; the first one, WarholizeWorkflow, is a
sequential task, therefore we have to inherit from
SequentialTaskCollection and customize its __init__ method:
from gc3libs.workflow import SequentialTaskCollection, ParallelTaskCollection
import math
from gc3libs import Run

class WarholizeWorkflow(SequentialTaskCollection):
    """
    Main workflow.
    """

    def __init__(self, input_image, copies, **extra_args):
        self.input_image = input_image
        self.output_image = "warhol_%s" % os.path.basename(input_image)

        gc3libs.log.info(
            "Producing a warholized version of input file %s "
            "and store it in %s" % (input_image, self.output_image))

        self.output_dir = os.path.relpath(extra_args.get('output_dir'))

        self.copies = copies

        # Check that copies is a perfect square
        if math.sqrt(self.copies) != int(math.sqrt(self.copies)):
            raise gc3libs.exceptions.InvalidArgument(
                "`copies` argument must be a perfect square.")

        self.jobname = extra_args.get('jobname', 'WarholizedWorkflow')
        self.grayscaled_image = "grayscaled_%s" % os.path.basename(self.input_image)
Up to now we have just parsed the arguments. The following lines, instead,
create the first task that we want to execute. For now, we can create
only the first one, GrayScaleConvertApplication, which will produce
a grayscale image from the input file:
        self.tasks = [
            GrayScaleConvertApplication(
                self.input_image, self.grayscaled_image, self.output_dir,
                self.output_dir),
        ]
Finally, we call the parent’s constructor:
        SequentialTaskCollection.__init__(
            self, self.tasks)
This will create the initial task list, but we also have to run steps 2
and 3, and this is done by defining a next method. This method will
be called after all the tasks in self.tasks have finished. We cannot
create all the jobs at once because we don’t have all the needed input
files yet. Please note that by creating the tasks in the next method
you can decide at runtime which tasks to run next and what
arguments to give them.
In our case, however, the next method is quite simple:
    def next(self, iteration):
        last = self.tasks[-1]

        if iteration == 0:
            # first time we got called. We have the grayscaled image,
            # we have to run the Tricolorize task.
            self.add(TricolorizeMultipleImages(
                os.path.join(self.output_dir, self.grayscaled_image),
                self.copies, self.output_dir))
            return Run.State.RUNNING
        elif iteration == 1:
            # second time, we already have the colorized images, we
            # have to merge them together.
            self.add(MergeImagesApplication(
                os.path.join(self.output_dir, self.grayscaled_image),
                last.warhol_dir,
                self.output_image))
            return Run.State.RUNNING
        else:
            self.execution.returncode = last.execution.returncode
            return Run.State.TERMINATED
At each iteration, we call self.add() to add an instance of a
task-like class (gc3libs.Application,
gc3libs.workflow.ParallelTaskCollection or
gc3libs.workflow.SequentialTaskCollection, in our case) to complete
the next step, and we return the current state, which will be
gc3libs.Run.State.RUNNING unless we have finished the computation.
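The "add one more task each time next is called" protocol can be mimicked without GC3Pie. The toy below is only a sketch: `ToySequential` and `run` are hypothetical names, tasks are plain strings, and the driver loop is a simplification of what a real scheduler does.

```python
RUNNING, TERMINATED = "RUNNING", "TERMINATED"


class ToySequential:
    """Toy stand-in for a sequential collection (not the GC3Pie class)."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def add(self, task):
        self.tasks.append(task)

    def next(self, iteration):
        # Same shape as WarholizeWorkflow.next: grow the list step by step.
        if iteration == 0:
            self.add("tricolorize")
            return RUNNING
        elif iteration == 1:
            self.add("merge")
            return RUNNING
        return TERMINATED


def run(collection):
    """Run tasks in order, asking `next` what to do after each one ends."""
    executed = []
    iteration = 0
    while True:
        executed.append(collection.tasks[iteration])  # "execute" the task
        if collection.next(iteration) == TERMINATED:
            return executed
        iteration += 1


workflow = ToySequential(["grayscale"])
order = run(workflow)
print(order)
```

The collection starts with a single task and only learns about the later ones once the earlier ones are done, which is exactly why the input files of steps 2 and 3 do not need to exist up front.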
Step one: convert to grayscale
GrayScaleConvertApplication is the application responsible for
converting the input image to grayscale. The command we want to execute
is:
$ convert <input_image> -colorspace gray grayscaled_<input_image>
To create a generic application we create a class which inherits from
gc3libs.Application; we usually only need to customize the
__init__ method:
# A useful function to copy files
from gc3libs.utils import copyfile

class GrayScaleConvertApplication(gc3libs.Application):
    def __init__(self, input_image, grayscaled_image, output_dir, warhol_dir):
        self.warhol_dir = warhol_dir
        self.grayscaled_image = grayscaled_image

        arguments = [
            'convert',
            os.path.basename(input_image),
            '-colorspace',
            'gray',
        ]

        gc3libs.log.info(
            "Creating GrayScale convert application from file %s "
            "to file %s" % (input_image, grayscaled_image))

        gc3libs.Application.__init__(
            self,
            arguments = arguments + [grayscaled_image],
            inputs = [input_image],
            outputs = [grayscaled_image, 'stderr.txt', 'stdout.txt'],
            output_dir = output_dir,
            stdout = 'stdout.txt',
            stderr = 'stderr.txt',
        )
Creating a gc3libs.Application is straightforward: you just
call the constructor with the executable, the arguments, and the
input/output files you will need.
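Before handing an argument list to an application, it can help to look at the command line it corresponds to. The sketch below rebuilds the same list as the constructor above in a stand-alone function (the name `grayscale_arguments` and the sample path are hypothetical) and prints it with the standard shlex module.

```python
import os
import shlex


def grayscale_arguments(input_image, grayscaled_image):
    """Build the argument list used for the grayscale conversion."""
    return [
        "convert",
        # as in the tutorial code, only the file name is passed, not the path
        os.path.basename(input_image),
        "-colorspace",
        "gray",
        grayscaled_image,
    ]


arguments = grayscale_arguments("/home/user/pictures/lena.tiff",
                                "grayscaled_lena.tiff")
command_line = shlex.join(arguments)
print(command_line)
```

Keeping the arguments as a list, rather than a single string, avoids quoting problems with file names that contain spaces.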
If you don’t specify the output_dir directory, gc3pie libraries
will create one starting from the class name. If the output directory
exists already, the old one will be renamed.
To do any kind of post-processing you can define a terminated method
for your application. It will be called after your application has
terminated. In our case we want to copy the grayscale version of the
image to the warhol_dir, so that it will be easily reachable by all
other applications:
    def terminated(self):
        """Move grayscale image to the main output dir"""
        copyfile(
            os.path.join(self.output_dir, self.grayscaled_image),
            self.warhol_dir)
Step two: parallel workflow to create colorized images
TricolorizeMultipleImages is responsible for creating multiple
versions of the grayscale image, each colored with a combination chosen
randomly from a list of available colors. It does so by running
multiple instances of TricolorizeImage with different
arguments. Since we want to run the various colorizations in parallel,
it inherits from the gc3libs.workflow.ParallelTaskCollection class. Like we
did for GrayScaleConvertApplication, we only need to customize the
constructor __init__, creating the various subtasks we want to run:
import itertools
import random

class TricolorizeMultipleImages(ParallelTaskCollection):
    colors = ['yellow', 'blue', 'red', 'pink', 'orchid',
              'indigo', 'navy', 'turquoise1', 'SeaGreen', 'gold',
              'orange', 'magenta']

    def __init__(self, grayscaled_image, copies, output_dir):
        gc3libs.log.info(
            "TricolorizeMultipleImages for %d copies run" % copies)
        self.jobname = "Warholizer_Parallel"
        ncolors = 3
        ### XXX Why I have to use basename???
        self.output_dir = os.path.join(
            os.path.basename(output_dir), 'tricolorize')
        self.warhol_dir = output_dir

        # Compute a unique sequence of random combinations of
        # colors. Please note that we can have at most
        # N!/(3!(N-3)!) of them, where N is len(colors)
        n = len(self.colors)
        assert copies <= (math.factorial(n) //
                          (math.factorial(ncolors) * math.factorial(n - ncolors)))
        combinations = list(itertools.combinations(self.colors, ncolors))
        combinations = random.sample(combinations, copies)

        # Create all the single tasks
        self.tasks = []
        for i, colors in enumerate(combinations):
            self.tasks.append(TricolorizeImage(
                os.path.relpath(grayscaled_image),
                "%s.%d" % (self.output_dir, i),
                "%s.%d" % (grayscaled_image, i),
                colors,
                self.warhol_dir))

        ParallelTaskCollection.__init__(self, self.tasks)
The main loop will fill the self.tasks list with various
TricolorizeImage tasks, each one with a unique combination of
three colors to use to generate the colorized image. The GC3Pie
framework will then run these tasks in parallel, on any available
resource.
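The color selection can be tried in isolation with the standard library. This sketch counts the available three-color combinations for the 12 colors listed above and draws 9 distinct ones; the fixed seed and the variable names are assumptions made only to keep the example repeatable.

```python
import itertools
import math
import random

colors = ['yellow', 'blue', 'red', 'pink', 'orchid',
          'indigo', 'navy', 'turquoise1', 'SeaGreen', 'gold',
          'orange', 'magenta']
ncolors = 3
copies = 9

# Every unordered choice of 3 colors out of 12: 12! / (3! * 9!) of them
all_combinations = list(itertools.combinations(colors, ncolors))
print(len(all_combinations), math.comb(len(colors), ncolors))

rng = random.Random(42)  # fixed seed only to make the demo repeatable
chosen = rng.sample(all_combinations, copies)  # sampling without replacement
for combination in chosen:
    print(combination)
```

Because random.sample draws without replacement, no two copies can end up with the same set of three colors, and asking for more copies than there are combinations raises a ValueError.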
The TricolorizeImage class is itself a SequentialTaskCollection,
since it has to generate the LUT first, and then apply it to the
grayscale image. We already saw how to create a
SequentialTaskCollection: we modify the constructor in order to add
the first job (CreateLutApplication), and the next method will
take care of running the ApplyLutApplication application on the
output of the first job:
class TricolorizeImage(SequentialTaskCollection):
    """
    Sequential workflow to produce a `tricolorized` version of a
    grayscale image
    """
    def __init__(self, grayscaled_image, output_dir, output_file,
                 colors, warhol_dir):
        self.grayscaled_image = grayscaled_image
        self.output_dir = output_dir
        self.warhol_dir = warhol_dir
        self.jobname = 'TricolorizeImage'
        self.output_file = output_file

        if not os.path.isdir(output_dir):
            os.mkdir(output_dir)

        gc3libs.log.info(
            "Tricolorize image %s to %s" % (
                self.grayscaled_image, self.output_file))

        self.tasks = [
            CreateLutApplication(
                self.grayscaled_image,
                "%s.miff" % self.grayscaled_image,
                self.output_dir,
                colors, self.warhol_dir),
        ]

        SequentialTaskCollection.__init__(self, self.tasks)

    def next(self, iteration):
        last = self.tasks[-1]
        if iteration == 0:
            # First time we got called. The LUT has been created, we
            # have to apply it to the grayscale image
            self.add(ApplyLutApplication(
                self.grayscaled_image,
                os.path.join(last.output_dir, last.lutfile),
                os.path.basename(self.output_file),
                self.output_dir, self.warhol_dir))
            return Run.State.RUNNING
        else:
            self.execution.returncode = last.execution.returncode
            return Run.State.TERMINATED
CreateLutApplication is again an application which inherits from
gc3libs.Application. The command we want to execute is something
like:
$ convert -size 1x1 xc:<color1> xc:<color2> xc:<color3> +append -resize 256x1! <output_file.miff>
This will basically create a 256x1 pixel image containing a
gradient through all the listed colors. The code will look like:
class CreateLutApplication(gc3libs.Application):
    """Create the LUT for the image using the 3 colors given
    in `colors`"""

    def __init__(self, input_image, output_file, output_dir, colors, working_dir):
        self.lutfile = os.path.basename(output_file)
        self.working_dir = working_dir
        gc3libs.log.info("Creating lut file %s from %s using "
                         "colors: %s" % (
                             self.lutfile, input_image, str.join(", ", colors)))
        gc3libs.Application.__init__(
            self,
            arguments = [
                'convert',
                '-size',
                '1x1'] + [
                "xc:%s" % color for color in colors] + [
                '+append',
                '-resize',
                '256x1!',
                self.lutfile,
            ],
            inputs = [input_image],
            outputs = [self.lutfile, 'stdout.txt', 'stderr.txt'],
            output_dir = output_dir + '.createlut',
            stdout = 'stdout.txt',
            stderr = 'stderr.txt',
        )
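To get a feel for what a 256-entry gradient LUT contains, the following sketch builds one in pure Python by linear interpolation between three colors. This is only an approximation under stated assumptions: ImageMagick's -resize uses its own resampling filter, so the real pixel values will differ, and the RGB triples, `lerp` and `build_lut` are hypothetical names and values chosen for illustration.

```python
def lerp(a, b, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def build_lut(colors, size=256):
    """Return `size` RGB tuples forming a gradient through `colors`."""
    segments = len(colors) - 1
    lut = []
    for n in range(size):
        position = n * segments / (size - 1)      # ranges over 0 .. segments
        index = min(int(position), segments - 1)  # which pair of colors
        lut.append(lerp(colors[index], colors[index + 1], position - index))
    return lut


# Illustrative RGB values, not necessarily ImageMagick's named colors
first, second, third = (255, 0, 0), (255, 255, 0), (0, 128, 0)
lut = build_lut([first, second, third])
print(len(lut), lut[0], lut[255])
```

The first entry is the first color, the last entry is the third color, and the second color sits around the middle of the table.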
Similarly, the ApplyLutApplication application will run the
following command:
$ convert grayscaled_<input_image> <lutfile.N.miff> -clut grayscaled_<input_image>.<N>
This command will apply the LUT to the grayscaled image: it will
modify the grayscaled image by coloring a generic pixel with a
luminosity value of n (which will be an integer value from 0 to 255,
of course) with the color at position n in the LUT image (actually,
n+1). Each ApplyLutApplication will save the resulting image to a
file named as grayscaled_<input_image>.<N>.
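The lookup itself is just an indexing operation, which the sketch below reproduces on a handful of gray levels. The toy LUT uses color names instead of pixels and three flat bands instead of a gradient; `apply_lut` is a hypothetical helper, not part of ImageMagick or GC3Pie.

```python
def apply_lut(gray_pixels, lut):
    """Replace each gray level (0-255) with the LUT entry at that index."""
    return [lut[level] for level in gray_pixels]


# A toy 256-entry LUT: dark pixels red, mid-tones yellow, bright pixels green
lut = ["red"] * 86 + ["yellow"] * 85 + ["green"] * 85

gray_pixels = [0, 85, 86, 170, 171, 255]
colorized = apply_lut(gray_pixels, lut)
print(colorized)
```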
The class will look like:
class ApplyLutApplication(gc3libs.Application):
    """Apply the LUT computed by `CreateLutApplication` to
    `image_file`"""

    def __init__(self, input_image, lutfile, output_file, output_dir, working_dir):

        gc3libs.log.info("Applying lut file %s to %s" % (lutfile, input_image))
        self.working_dir = working_dir
        self.output_file = output_file

        gc3libs.Application.__init__(
            self,
            arguments = [
                'convert',
                os.path.basename(input_image),
                os.path.basename(lutfile),
                '-clut',
                output_file,
            ],
            inputs = [input_image, lutfile],
            outputs = [output_file, 'stdout.txt', 'stderr.txt'],
            output_dir = output_dir + '.applylut',
            stdout = 'stdout.txt',
            stderr = 'stderr.txt',
        )
The terminated method:
    def terminated(self):
        """Copy colorized image to the output dir"""
        copyfile(
            os.path.join(self.output_dir, self.output_file),
            self.working_dir)
will copy the colorized image file to the top-level directory,
so that it will be easier for the last application to find all the
needed files.
Step three: merge them all together
At this point the main output directory contains a set of
files named grayscaled_<input_image>.N, where N is a sequential
integer and <input_image> is the name of the original image. The last
application, MergeImagesApplication, will produce a
warhol_<input_image> image by merging all of them using the
command (shown here for 9 copies, hence the 3x3 tile):
$ montage grayscaled_<input_image>.* -tile 3x3 -geometry +5+5 -background white warhol_<input_image>
Now it should be easy to write such an application:
import re

class MergeImagesApplication(gc3libs.Application):
    def __init__(self, grayscaled_image, input_dir, output_file):
        # Match only the colorized copies, named <grayscaled_image>.<N>
        ifile_regexp = re.compile(
            r"%s\.[0-9]+" % re.escape(os.path.basename(grayscaled_image)))
        input_files = [
            os.path.join(input_dir, fname) for fname in os.listdir(input_dir)
            if ifile_regexp.match(fname)]
        input_filenames = [os.path.basename(i) for i in input_files]
        gc3libs.log.info("MergeImages initialized")
        self.input_dir = input_dir
        self.output_file = output_file

        tile = math.sqrt(len(input_files))
        if tile != int(tile):
            gc3libs.log.error(
                "We would expect to have a perfect square "
                "of images to merge, but we have %d instead" % len(input_files))
            raise gc3libs.exceptions.InvalidArgument(
                "We would expect to have a perfect square of images to merge, but we have %d instead" % len(input_files))

        gc3libs.Application.__init__(
            self,
            arguments = ['montage'] + input_filenames + [
                '-tile',
                '%dx%d' % (tile, tile),
                '-geometry',
                '+5+5',
                '-background',
                'white',
                output_file,
            ],
            inputs = input_files,
            outputs = [output_file, 'stderr.txt', 'stdout.txt'],
            output_dir = os.path.join(input_dir, 'output'),
            stdout = 'stdout.txt',
            stderr = 'stderr.txt',
        )
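The tile computation can be checked on its own, without ImageMagick or GC3Pie. The sketch below derives the NxN tile from the number of input files using integer arithmetic (math.isqrt, available from Python 3.8) and assembles the same montage arguments; `montage_arguments` and the sample file names are hypothetical.

```python
import math


def montage_arguments(input_filenames, output_file):
    """Build a montage command line for a square grid of images."""
    count = len(input_filenames)
    tile = math.isqrt(count)  # integer square root, no float rounding
    if tile == 0 or tile * tile != count:
        raise ValueError(
            "expected a perfect square of images, got %d" % count)
    return (['montage'] + list(input_filenames) +
            ['-tile', '%dx%d' % (tile, tile),
             '-geometry', '+5+5',
             '-background', 'white',
             output_file])


names = ["grayscaled_lena.tiff.%d" % i for i in range(9)]
args = montage_arguments(names, "warhol_lena.tiff")
print(" ".join(args))
```

Using an integer square root avoids comparing floating-point values, which is a reasonable choice when the count may become large.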
Making the script executable
Finally, in order to make the script executable, we add the
following lines to the end of the file. The WarholizeScript().run()
call will be executed only when the file is run as a script, and will
do all the magic related to argument parsing, creating the session,
etc.:
if __name__ == '__main__':
    import warholize
    warholize.WarholizeScript().run()
Please note that the import warholize statement is important to
address issue 95 and make the gc3pie scripts work with your current
session (gstat, ginfo...)
Testing
To test this script I would suggest using the famous Lena picture,
which can be found in the miscellaneous section of the Signal and
Image Processing Institute page. Download the image, rename it to
lena.tiff and run the following command:
$ ./warholize.py -C 1 lena.tiff --copies 9
(add -r localhost if your gc3pie.conf configuration file supports it and you
want to test it locally).
After completion a file Warholized.lena.tiff/output/warhol_lena.tiff
will be created.