Introduction

Warholize is a GC3Pie demo application that produces, from a generic picture, a new picture in the style of Warhol’s famous work Marilyn. The script uses the powerful ImageMagick set of tools (at least version 6.3.5-7). This tutorial assumes that both ImageMagick and GC3Pie are already installed and configured.

In order to produce a similar image we have to do a series of transformations on the picture:

  1. convert the original image to grayscale.

  2. colorize the grayscale image using three different colors each time, based on the gray levels. We may, for instance, color all pixels with luminosity between 0% and 33% red, pixels between 34% and 66% yellow, and pixels between 67% and 100% green.

    To do that, we first have to:

    1. create a Color Lookup Table (LUT) using a combination of three randomly chosen colors
    2. apply the LUT to the grayscale image
  3. Finally, we can merge together all the colorized images and produce our warholized image.
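As a rough illustration of the colorization idea in step 2, the following sketch maps a luminosity percentage to one of three colors using the example bands given above. The function name `band_color` and the hard thresholds are assumptions made for illustration only; the actual script relies on an ImageMagick lookup table, which produces a gradient rather than three flat bands.

```python
def band_color(luminosity_percent):
    """Return the example color for a luminosity given as 0-100 percent.

    Hypothetical helper: hard bands, whereas the real LUT is a gradient.
    """
    if not 0 <= luminosity_percent <= 100:
        raise ValueError("luminosity must be between 0 and 100")
    if luminosity_percent <= 33:
        return 'red'
    if luminosity_percent <= 66:
        return 'yellow'
    return 'green'


# Three sample pixels, one per band
print([band_color(p) for p in (10, 50, 90)])
```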

Clearly, step 2 depends on step 1, and step 3 depends on step 2, so we basically have a sequence of tasks. However, since step 2 needs to create N different independent images, we can parallelize this step.
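The shape of this workflow (a sequence whose middle step fans out into independent jobs) can be sketched with the standard library alone. This is not how GC3Pie runs things: threads stand in for GC3Pie task collections here, and the three functions are hypothetical stand-ins that only compute file names instead of calling ImageMagick.

```python
from concurrent.futures import ThreadPoolExecutor


def to_grayscale(image):
    # Stand-in for step 1: just returns the name of the grayscale file
    return "grayscaled_%s" % image


def colorize(gray_image, index):
    # Stand-in for step 2: each copy is independent of the others
    return "%s.%d" % (gray_image, index)


def merge(image, colorized):
    # Stand-in for step 3: needs every colorized copy
    return "warhol_%s" % image, sorted(colorized)


def warholize_plan(image, copies):
    gray = to_grayscale(image)                      # step 1
    with ThreadPoolExecutor() as pool:              # step 2, in parallel
        colorized = list(
            pool.map(lambda i: colorize(gray, i), range(copies)))
    return merge(image, colorized)                  # step 3


print(warholize_plan("lena.tiff", 4))
```

Step 3 only starts once the `with` block has collected every result of step 2, which mirrors the dependency described above.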

Workflow of the warholize script

From top to bottom

We will write our script top-down: from the command-line script, to the workflow, and finally to the single execution units which compose the application.

The script

The SessionBasedScript class in the gc3libs.cmdline module is used to create a generic script. It already has everything needed to read GC3Pie configuration files, manage resources, schedule jobs, etc. The only missing thing is, well, your application!

Let’s start by creating a new empty file and importing some basic modules:

import os
import gc3libs
from gc3libs.cmdline import SessionBasedScript

We then create a class which inherits from SessionBasedScript (in GC3Pie, most customizations are done by inheriting from a more generic class and overriding the __init__ method and possibly others):

class WarholizeScript(SessionBasedScript):
    """
    Demo script to create a `Warholized` version of an image.
    """
    version='1.0'

Please note that you must either write a small docstring, or add a description attribute. These values are used when the script is called with options --help or --version, which are automatically added by GC3Pie.

The way we want to use our script is straightforward:

$  warholize.py inputfile [inputfiles ...]

and this will create a directory Warholized.<inputfile> in which there will be a file called warhol_<inputfile> containing the desired warholized image (and a lot of temporary files, at least for now).

But we may want to add some options to the script, in order to decide how many colorized pictures the warholized image will be made of, or whether we want to resize the image. SessionBasedScript uses the PyCLI module which is, in turn, a wrapper around the standard argparse (or optparse for older Pythons) module. To customize the script you may define a setup_options method and put in there some calls to SessionBasedScript.add_param(), which is inherited from cli.app.CommandLineApp:

def setup_options(self):
    self.add_param('--copies', default=4, type=int,
                   help="Number of copies (default: 4). It has to be a perfect square!")

In this example we will accept a --copies option to define how many colorized copies the final picture will be made of. Please refer to the documentation of the PyCLI module for details on the syntax of the add_param method.
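Since PyCLI wraps argparse, the same option can be tried out with plain argparse, without GC3Pie installed. The sketch below is only an approximation of what the script ends up with: the `perfect_square` type function is a hypothetical addition that rejects invalid values at parsing time (the tutorial code checks this later, inside the workflow), and `math.isqrt` assumes Python 3.8 or newer.

```python
import argparse
import math


def perfect_square(value):
    """argparse type (hypothetical): accept only positive perfect squares."""
    number = int(value)
    if number < 1 or math.isqrt(number) ** 2 != number:
        raise argparse.ArgumentTypeError(
            "%r is not a perfect square" % value)
    return number


parser = argparse.ArgumentParser(prog='warholize.py')
parser.add_argument('--copies', default=4, type=perfect_square,
                    help="Number of copies (default: 4). "
                         "It has to be a perfect square!")
# Positional input files, mirroring `warholize.py inputfile [inputfiles ...]`
parser.add_argument('args', nargs='+', metavar='inputfile')

params = parser.parse_args(['--copies', '9', 'lena.tiff'])
print(params.copies, params.args)
```

With this type function, a call such as `warholize.py --copies 8 lena.tiff` would stop with a usage error before any task is created.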

The heart of the script is, however, the new_tasks method, which will be called to create the initial tasks of the script. In our case it will be something like:

def new_tasks(self, extra):
    gc3libs.log.info("Creating main sequential task")
    for (i, input_file) in enumerate(self.params.args):
        extra_args = extra.copy()
        extra_args['output_dir'] = 'Warholized.%s' % os.path.basename(input_file)
        yield WarholizeWorkflow(input_file,
                                self.params.copies,
                                **extra_args)

new_tasks is used as a generator (but it could return a list as well). Each yielded object is a task. In GC3Pie, a task is either a single application or a complex workflow, and represents an execution unit. In our case we create a WarholizeWorkflow task, which is the workflow described before.

We yield a different WarholizeWorkflow task for each input file. These tasks will run in parallel.
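The generator pattern and the naming of the output directories can be tried in isolation. In this sketch, `plan_tasks` is a hypothetical stand-in that yields plain tuples where the real method yields WarholizeWorkflow objects.

```python
import os


def plan_tasks(input_files, copies):
    """Yield one (input_file, copies, output_dir) tuple per input image.

    Hypothetical stand-in for new_tasks: tuples instead of task objects.
    """
    for input_file in input_files:
        output_dir = 'Warholized.%s' % os.path.basename(input_file)
        yield (input_file, copies, output_dir)


for task in plan_tasks(['pics/lena.tiff', 'cat.png'], 4):
    print(task)
```

Because only the basename is used, two input files with the same name in different directories would end up sharing one output directory.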

Please note that we are using the gc3libs.log module to log information about the execution. This module works like the logging module and has methods like error, warning, info or debug, but its logging level is automatically configured by the SessionBasedScript constructor. This way you can increase the verbosity of your script by simply adding -v options on the command line.
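To get a feeling for how repeated -v options can translate into a logging level, here is a small sketch based on the standard logging module. The mapping in `level_for_verbosity` is an assumption chosen for illustration; it is not taken from the GC3Pie sources, which may use different thresholds.

```python
import logging


def level_for_verbosity(verbosity):
    """Map a count of -v flags to a logging level (assumed mapping)."""
    # 0 -> WARNING, 1 -> INFO, 2 or more -> DEBUG
    return max(logging.DEBUG, logging.WARNING - 10 * verbosity)


logging.basicConfig()
log = logging.getLogger('warholize-sketch')
log.setLevel(level_for_verbosity(1))          # as if called with one -v
log.info("Creating main sequential task")     # passes the level check
log.debug("Too detailed for a single -v")     # filtered out
```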

The workflows

Main sequential workflow

The module gc3libs.workflow contains two main objects: SequentialTaskCollection and ParallelTaskCollection. They execute tasks in serial and in parallel, respectively. We will use both of them to create our workflow; the first one, WarholizeWorkflow, is a sequential task, therefore we have to inherit from SequentialTaskCollection and customize its __init__ method:

from gc3libs.workflow import SequentialTaskCollection, ParallelTaskCollection
import math
from gc3libs import Run

class WarholizeWorkflow(SequentialTaskCollection):
    """
    Main workflow.
    """

    def __init__(self, input_image, copies, **extra_args):
        self.input_image = input_image
        self.output_image = "warhol_%s" % os.path.basename(input_image)

        gc3libs.log.info(
            "Producing a warholized version of input file %s "
            "and store it in %s" % (input_image, self.output_image))

        self.output_dir = os.path.relpath(extra_args.get('output_dir'))

        self.copies = copies

        # Check that copies is a perfect square
        if math.sqrt(self.copies) != int(math.sqrt(self.copies)):
            raise gc3libs.exceptions.InvalidArgument(
                "`copies` argument must be a perfect square.")

        self.jobname = extra_args.get('jobname', 'WarholizedWorkflow')
        self.grayscaled_image = "grayscaled_%s" % os.path.basename(self.input_image)

Up to now we have just parsed the arguments. The following lines, instead, create the first task that we want to execute. For now, we can create only the first one, GrayScaleConvertApplication, which will produce a grayscale image from the input file:

self.tasks = [
    GrayScaleConvertApplication(
        self.input_image, self.grayscaled_image, self.output_dir,
        self.output_dir),
    ]

Finally, we call the parent’s constructor:

SequentialTaskCollection.__init__(
    self, self.tasks)

This will create the initial task list, but we also have to run steps 2 and 3, and this is done by creating a next method. This method will be called after all the tasks in self.tasks are finished. We cannot create all the jobs at once because we don’t have all the needed input files yet. Please note that by creating the tasks in the next method you can decide at runtime which tasks to run next and what arguments to give them.

In our case, however, the next method is quite simple:

def next(self, iteration):
    last = self.tasks[-1]

    if iteration == 0:
        # first time we got called.  We have the grayscaled image,
        # we have to run the Tricolorize task.
        self.add(TricolorizeMultipleImages(
            os.path.join(self.output_dir, self.grayscaled_image),
            self.copies, self.output_dir))
        return Run.State.RUNNING
    elif iteration == 1:
        # second time, we already have the colorized images, we
        # have to merge them together.
        self.add(MergeImagesApplication(
            os.path.join(self.output_dir, self.grayscaled_image),
            last.warhol_dir,
            self.output_image))
        return Run.State.RUNNING
    else:
        self.execution.returncode = last.execution.returncode
        return Run.State.TERMINATED

At each iteration, we call self.add() to add an instance of a task-like class (gc3libs.Application, gc3libs.workflow.ParallelTaskCollection or gc3libs.workflow.SequentialTaskCollection, in our case) to complete the next step, and we return the current state, which will be gc3libs.Run.State.RUNNING unless we have finished the computation.
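The interplay between the task list and the next method can be imitated with a toy model that needs no GC3Pie installation. Everything in this sketch is hypothetical: `ToySequence` and `run` only mimic the calling convention described above (run a task, call next with the iteration index, stop when it reports termination) and say nothing about how GC3Pie is actually implemented.

```python
RUNNING, TERMINATED = 'RUNNING', 'TERMINATED'


class ToySequence(object):
    """Toy stand-in for a sequential collection; not GC3Pie code."""

    def __init__(self):
        self.tasks = ['grayscale']     # only the first task is known upfront

    def add(self, task):
        self.tasks.append(task)

    def next(self, iteration):
        if iteration == 0:
            self.add('tricolorize')
            return RUNNING
        elif iteration == 1:
            self.add('merge')
            return RUNNING
        return TERMINATED


def run(sequence):
    """Run tasks one at a time, asking `next` what to do after each."""
    executed = []
    iteration = 0
    while True:
        executed.append(sequence.tasks[iteration])   # "run" the current task
        if sequence.next(iteration) == TERMINATED:
            return executed
        iteration += 1


print(run(ToySequence()))
```

The point of the pattern is that 'tricolorize' and 'merge' do not exist in the task list until the task before them has finished.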

Step one: convert to grayscale

GrayScaleConvertApplication is the application responsible for converting the input image to grayscale. The command we want to execute is:

$    convert <input_image> -colorspace gray grayscaled_<input_image>

To create a generic application we create a class which inherits from gc3libs.Application, and we usually only need to customize the __init__ method:

# A useful function to copy files
from gc3libs.utils import copyfile

class GrayScaleConvertApplication(gc3libs.Application):
    def __init__(self, input_image, grayscaled_image, output_dir, warhol_dir):
        self.warhol_dir = warhol_dir
        self.grayscaled_image = grayscaled_image

        arguments = [
            'convert',
            os.path.basename(input_image),
            '-colorspace',
            'gray',
            ]

        gc3libs.log.info(
            "Creating GrayScale convert application from file %s "
            "to file %s" % (input_image, grayscaled_image))

        gc3libs.Application.__init__(
            self,
            arguments = arguments + [grayscaled_image],
            inputs = [input_image],
            outputs = [grayscaled_image, 'stderr.txt', 'stdout.txt'],
            output_dir = output_dir,
            stdout = 'stdout.txt',
            stderr = 'stderr.txt',
            )

Creating a gc3libs.Application is straightforward: you just call the constructor with the executable, the arguments, and the input/output files you will need.
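The argument list passed to the constructor is an ordinary Python list, so it can be built and inspected on its own. The helper below is a hypothetical sketch that reproduces the list built in the class above. The comment about why only the basename is used is an assumption about how input files are made available to the job, not something stated in this tutorial.

```python
import os


def grayscale_arguments(input_image):
    """Return (arguments, output_name) for the grayscale conversion."""
    name = os.path.basename(input_image)
    output_name = "grayscaled_%s" % name
    # Assumption: the job runs in its own working directory, where the
    # input file is available under its bare name, hence the basename.
    return ['convert', name, '-colorspace', 'gray', output_name], output_name


arguments, output_name = grayscale_arguments('pics/lena.tiff')
print(' '.join(arguments))
```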

If you don’t specify the output_dir directory, gc3pie libraries will create one starting from the class name. If the output directory exists already, the old one will be renamed.

To do any kind of post-processing you can define a terminated method for your application. It will be called after your application has terminated. In our case we want to copy the grayscale version of the image to the warhol_dir, so that it will be easily reachable by all other applications:

def terminated(self):
    """Move grayscale image to the main output dir"""
    copyfile(
        os.path.join(self.output_dir, self.grayscaled_image),
        self.warhol_dir)
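The same post-processing step can be sketched with the standard library, using shutil.copy in place of gc3libs.utils.copyfile. The helper name `publish_output` and the extra existence checks are assumptions added for illustration; the throwaway directories exist only to exercise the helper.

```python
import os
import shutil
import tempfile


def publish_output(task_output_dir, filename, shared_dir):
    """Copy one result file from a task's output dir to a shared dir.

    Hypothetical helper; returns the path of the copied file.
    """
    source = os.path.join(task_output_dir, filename)
    if not os.path.isfile(source):
        raise IOError("expected output %s is missing" % source)
    if not os.path.isdir(shared_dir):
        os.makedirs(shared_dir)
    return shutil.copy(source, shared_dir)


# Exercise the helper with throwaway directories and a fake image file
task_dir = tempfile.mkdtemp()
shared_dir = os.path.join(tempfile.mkdtemp(), 'Warholized.lena.tiff')
with open(os.path.join(task_dir, 'grayscaled_lena.tiff'), 'w') as handle:
    handle.write('fake image data')
copied = publish_output(task_dir, 'grayscaled_lena.tiff', shared_dir)
print(copied)
```

Failing loudly when the expected file is missing makes a failed conversion visible at this step rather than later, when the next task looks for its input.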

Step two: parallel workflow to create colorized images

TricolorizeMultipleImages is responsible for creating multiple versions of the grayscale image, each with a different coloration chosen randomly from a list of available colors. It does so by running multiple instances of TricolorizeImage with different arguments. Since we want to run the various colorizations in parallel, it inherits from the gc3libs.workflow.ParallelTaskCollection class. Like we did for GrayScaleConvertApplication, we only need to customize the constructor __init__, creating the various subtasks we want to run:

import itertools
import random

class TricolorizeMultipleImages(ParallelTaskCollection):
    colors = ['yellow', 'blue', 'red', 'pink', 'orchid',
              'indigo', 'navy', 'turquoise1', 'SeaGreen', 'gold',
              'orange', 'magenta']

    def __init__(self, grayscaled_image, copies, output_dir):
        gc3libs.log.info(
            "TricolorizeMultipleImages for %d copies run" % copies)
        self.jobname = "Warholizer_Parallel"
        ncolors = 3
        ### XXX Why I have to use basename???
        self.output_dir = os.path.join(
            os.path.basename(output_dir), 'tricolorize')
        self.warhol_dir = output_dir

        # Compute a unique sequence of random combinations of
        # colors. Please note that we can have at most
        # N!/(3! * (N-3)!) of them, where N is len(colors)
        combinations = list(itertools.combinations(self.colors, ncolors))
        assert copies <= len(combinations)
        combinations = random.sample(combinations, copies)

        # Create all the single tasks
        self.tasks = []
        for i, colors in enumerate(combinations):
            self.tasks.append(TricolorizeImage(
                os.path.relpath(grayscaled_image),
                "%s.%d" % (self.output_dir, i),
                "%s.%d" % (grayscaled_image, i),
                colors,
                self.warhol_dir))

        ParallelTaskCollection.__init__(self, self.tasks)

The main loop will fill the self.tasks list with various TricolorizeImage tasks, each one with a unique combination of three colors to use to generate the colorized image. The GC3Pie framework will then run these tasks in parallel, on any available resource.
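The selection of color combinations can be tried on its own. The sketch below uses the same color list as the class above; `pick_combinations` and its optional `seed` parameter are hypothetical additions for reproducible experiments, and `math.comb` assumes Python 3.8 or newer.

```python
import itertools
import math
import random

colors = ['yellow', 'blue', 'red', 'pink', 'orchid',
          'indigo', 'navy', 'turquoise1', 'SeaGreen', 'gold',
          'orange', 'magenta']


def pick_combinations(colors, copies, ncolors=3, seed=None):
    """Return `copies` distinct combinations of `ncolors` colors."""
    available = math.comb(len(colors), ncolors)
    if copies > available:
        raise ValueError(
            "only %d combinations available, %d requested"
            % (available, copies))
    rng = random.Random(seed)     # seed is a hypothetical convenience
    return rng.sample(list(itertools.combinations(colors, ncolors)), copies)


print(math.comb(len(colors), 3))      # number of distinct 3-color combinations
print(pick_combinations(colors, 4, seed=0))
```

With 12 colors there are 220 distinct combinations of three, so any reasonable value of --copies is well within the limit.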

The TricolorizeImage class is itself a SequentialTaskCollection, since it has to generate the LUT first, and then apply it to the grayscale image. We already saw how to create a SequentialTaskCollection: we modify the constructor in order to add the first job (CreateLutApplication), and the next method will take care of running the ApplyLutApplication application on the output of the first job:

class TricolorizeImage(SequentialTaskCollection):
    """
    Sequential workflow to produce a `tricolorized` version of a
    grayscale image
    """
    def __init__(self, grayscaled_image, output_dir, output_file,
                 colors, warhol_dir):
        self.grayscaled_image = grayscaled_image
        self.output_dir = output_dir
        self.warhol_dir = warhol_dir
        self.jobname = 'TricolorizeImage'
        self.output_file = output_file

        if not os.path.isdir(output_dir):
            os.mkdir(output_dir)

        gc3libs.log.info(
            "Tricolorize image %s to %s" % (
                self.grayscaled_image, self.output_file))

        self.tasks = [
            CreateLutApplication(
                self.grayscaled_image,
                "%s.miff" % self.grayscaled_image,
                self.output_dir,
                colors, self.warhol_dir),
            ]

        SequentialTaskCollection.__init__(self, self.tasks)

    def next(self, iteration):
        last = self.tasks[-1]
        if iteration == 0:
            # First time we got called. The LUT has been created, we
            # have to apply it to the grayscale image
            self.add(ApplyLutApplication(
                self.grayscaled_image,
                os.path.join(last.output_dir, last.lutfile),
                os.path.basename(self.output_file),
                self.output_dir, self.warhol_dir))
            return Run.State.RUNNING
        else:
            self.execution.returncode = last.execution.returncode
            return Run.State.TERMINATED

The CreateLutApplication is again an application which inherits from gc3libs.Application. The command we want to execute is something like:

$    convert -size 1x1 xc:<color1> xc:<color2> xc:<color3> +append -resize 256x1! <output_file.miff>

This will basically create a 256x1 pixel image, made of a gradient using all the listed colors. The code will look like:

class CreateLutApplication(gc3libs.Application):
    """Create the LUT for the image using the 3 randomly picked
    colors passed in `colors`"""

    def __init__(self, input_image, output_file, output_dir, colors, working_dir):
        self.lutfile = os.path.basename(output_file)
        self.working_dir = working_dir
        gc3libs.log.info("Creating lut file %s from %s using "
                         "colors: %s" % (
            self.lutfile, input_image, str.join(", ", colors)))
        gc3libs.Application.__init__(
            self,
            arguments = [
                'convert',
                '-size',
                '1x1'] + [
                "xc:%s" % color for color in colors] + [
                '+append',
                '-resize',
                '256x1!',
                self.lutfile,
                ],
            inputs = [input_image],
            outputs = [self.lutfile, 'stdout.txt', 'stderr.txt'],
            output_dir = output_dir + '.createlut',
            stdout = 'stdout.txt',
            stderr = 'stderr.txt',
            )
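To make the idea of a 256-entry gradient concrete, here is a pure Python sketch that builds such a table by linear interpolation between RGB triples. It is a simplification: ImageMagick's -resize uses its own resampling filter, so the values in a real LUT file will differ. The function name `build_lut` and the RGB values chosen for the three colors are assumptions for illustration.

```python
def build_lut(colors, size=256):
    """Linearly interpolate `size` RGB entries across the given colors.

    Simplified model of the gradient, not ImageMagick's algorithm.
    """
    if len(colors) < 2 or size < 2:
        raise ValueError("need at least two colors and two entries")
    lut = []
    segments = len(colors) - 1
    for position in range(size):
        # Where this entry falls along the gradient, from 0 to `segments`
        scaled = position * segments / float(size - 1)
        index = min(int(scaled), segments - 1)
        weight = scaled - index
        start, end = colors[index], colors[index + 1]
        lut.append(tuple(
            int(round(a + (b - a) * weight)) for a, b in zip(start, end)))
    return lut


# Red -> yellow -> green, given as RGB triples (assumed values)
lut = build_lut([(255, 0, 0), (255, 255, 0), (0, 128, 0)])
print(lut[0], lut[-1], len(lut))
```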

Similarly, the ApplyLutApplication application will run the following command:

$    convert grayscaled_<input_image> <lutfile.N.miff> -clut grayscaled_<input_image>.<N>

This command will apply the LUT to the grayscaled image: it will modify the grayscaled image by coloring a generic pixel with a luminosity value of n (which will be an integer value from 0 to 255, of course) with the color at position n in the LUT image (actually, n+1). Each ApplyLutApplication will save the resulting image to a file named as grayscaled_<input_image>.<N>.
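The lookup itself is simple enough to sketch in a few lines of Python. Here an image is modelled as a list of rows of gray levels and the LUT as a list of 256 RGB triples. Both representations, the function name `apply_lut` and the toy three-band table are assumptions for illustration, since the real work is done by ImageMagick on image files.

```python
def apply_lut(gray_rows, lut):
    """Replace each gray level (0-255) with the LUT entry at that index."""
    if len(lut) != 256:
        raise ValueError("expected a 256-entry lookup table")
    return [[lut[level] for level in row] for row in gray_rows]


# A toy LUT with three flat bands instead of a gradient (assumed values)
lut = [(255, 0, 0)] * 86 + [(255, 255, 0)] * 85 + [(0, 128, 0)] * 85

# A 2x3 "image" of gray levels
gray = [[0, 100, 255],
        [85, 86, 171]]
for row in apply_lut(gray, lut):
    print(row)
```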

The ApplyLutApplication class will look like:

class ApplyLutApplication(gc3libs.Application):
    """Apply the LUT computed by `CreateLutApplication` to
    `input_image`"""

    def __init__(self, input_image, lutfile, output_file, output_dir, working_dir):

        gc3libs.log.info("Applying lut file %s to %s" % (lutfile, input_image))
        self.working_dir = working_dir
        self.output_file = output_file

        gc3libs.Application.__init__(
            self,
            arguments = [
                'convert',
                os.path.basename(input_image),
                os.path.basename(lutfile),
                '-clut',
                output_file,
                ],
            inputs = [input_image, lutfile],
            outputs = [output_file, 'stdout.txt', 'stderr.txt'],
            output_dir = output_dir + '.applylut',
            stdout = 'stdout.txt',
            stderr = 'stderr.txt',
            )

The terminated method:

def terminated(self):
    """Copy colorized image to the output dir"""
    copyfile(
        os.path.join(self.output_dir, self.output_file),
        self.working_dir)

will copy the colorized image file to the top-level directory, so that it is easier for the last application to find all the needed files.

Step three: merge them all together

At this point the main output directory will contain a bunch of files named grayscaled_<input_image>.N, where N is a sequential integer and <input_image> is the name of the original image. The last application, MergeImagesApplication, will produce a warhol_<input_image> image by merging all of them using the command:

$    montage grayscaled_<input_image>.* -tile 3x3 -geometry +5+5 -background white warhol_<input_image>

Now it should be easy to write such an application:

import re

class MergeImagesApplication(gc3libs.Application):
    def __init__(self, grayscaled_image, input_dir, output_file):
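        # Escape the file name and anchor the pattern, so that only
        # names ending in `.<digits>` are picked up
        ifile_regexp = re.compile(
            r"%s\.[0-9]+$" % re.escape(os.path.basename(grayscaled_image)))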
        input_files = [
            os.path.join(input_dir, fname) for fname in os.listdir(input_dir)
            if ifile_regexp.match(fname)]
        input_filenames = [os.path.basename(i) for i in input_files]
        gc3libs.log.info("MergeImages initialized")
        self.input_dir = input_dir
        self.output_file = output_file

        tile = math.sqrt(len(input_files))
        if tile != int(tile):
            gc3libs.log.error(
                "We would expect to have a perfect square "
                "of images to merge, but we have %d instead" % len(input_files))
            raise gc3libs.exceptions.InvalidArgument(
                "We would expect to have a perfect square of images to merge, but we have %d instead" % len(input_files))

        gc3libs.Application.__init__(
            self,
            arguments = ['montage'] + input_filenames + [
                '-tile',
                '%dx%d' % (tile, tile),
                '-geometry',
                '+5+5',
                '-background',
                'white',
                output_file,
                ],
            inputs = input_files,
            outputs = [output_file, 'stderr.txt', 'stdout.txt'],
            output_dir = os.path.join(input_dir, 'output'),
            stdout = 'stdout.txt',
            stderr = 'stderr.txt',
            )
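The two pieces of logic in this class, selecting the input files and computing the tile layout, can be tested without ImageMagick or GC3Pie. The helpers below are hypothetical sketches: `select_inputs` additionally sorts the files by their numeric suffix, which the tutorial code does not do, and `math.isqrt` assumes Python 3.8 or newer.

```python
import math
import re


def select_inputs(filenames, grayscaled_name):
    """Return the colorized copies of `grayscaled_name`, in numeric order."""
    pattern = re.compile(r"%s\.([0-9]+)$" % re.escape(grayscaled_name))
    matches = []
    for fname in filenames:
        found = pattern.match(fname)
        if found:
            matches.append((int(found.group(1)), fname))
    return [fname for _, fname in sorted(matches)]


def montage_arguments(input_filenames, output_file):
    """Build the montage command line for a square grid of images."""
    tile = math.isqrt(len(input_filenames))
    if tile * tile != len(input_filenames):
        raise ValueError("expected a perfect square of images, got %d"
                         % len(input_filenames))
    return (['montage'] + list(input_filenames) +
            ['-tile', '%dx%d' % (tile, tile), '-geometry', '+5+5',
             '-background', 'white', output_file])


listing = ['grayscaled_lena.tiff', 'grayscaled_lena.tiff.10',
           'grayscaled_lena.tiff.2', 'warhol_lena.tiff']
print(select_inputs(listing, 'grayscaled_lena.tiff'))
```

Sorting by the numeric suffix keeps copy 10 after copy 2, which a plain alphabetical sort would not.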

Making the script executable

Finally, in order to make the script executable, we add the following lines to the end of the file. The WarholizeScript().run() call will be executed only when the file is run as a script, and will do all the magic related to argument parsing, creating the session, etc.:

if __name__ == '__main__':
    import warholize
    warholize.WarholizeScript().run()

Please note that the import warholize statement is important to address issue 95 and make the gc3pie scripts (gstat, ginfo, ...) work with your current session.

Testing

To test this script I suggest using the famous Lena picture, which can be found in the miscellaneous section of the Signal and Image Processing Institute page. Download the image, rename it to lena.tiff and run the following command:

$    ./warholize.py -C 1 lena.tiff --copies 9

(add -r localhost if your gc3pie.conf supports it and you want to test it locally).

After completion a file Warholized.lena.tiff/output/warhol_lena.tiff will be created.

Warholized version of Lena