Running software using Docker

We recommend that users install Docker on their system in order to use the Docker images.
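Before pulling the image, it can help to confirm that the docker client is actually reachable from the shell. The following is a minimal sketch; the helper name have_cmd is hypothetical and not part of APRICOT or Docker.

```shell
#!/bin/sh
# Return success if the named command is available on PATH (POSIX `command -v`).
# `have_cmd` is an illustrative helper name, not part of APRICOT.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
    docker --version
else
    echo "docker not found on PATH; install Docker before continuing" >&2
fi
```

If the client is found, its version string is printed; otherwise a message is written to standard error and nothing else happens.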

In order to work with the Docker image for APRICOT, please follow these directions:

1. Get Docker image

The image can be acquired by simply using this command:

$ docker pull malvikasharan/apricot

2. Create the Docker container for testing the software

$ docker run -it malvikasharan/apricot bash

The following steps are a quick way to test whether the different modules work on your system (without installing the complete filesystem).

Run the analysis in the ``home`` folder

$ cd home
$ apricot -h

Run test/example analysis

The git repository contains a shell script, run_example.sh, whose commands demonstrate the APRICOT installation and run an analysis on an example.

Copy the script from the existing repository into the home folder:

$ cp APRICOT/shell_scripts/run_example.sh .

Or use wget to get the most up-to-date version from the repository:

$ wget https://raw.githubusercontent.com/malvikasharan/APRICOT/master/shell_scripts/run_example.sh

...and run it.

$ sh run_example.sh

By default, this script generates a main analysis folder, APRICOT_analysis. To understand the file structure, please see below (point 5). We recommend that you check out the tutorial to understand each component of the software and the results generated by the analysis.

3. Get the supporting data required for running your queries

Users are required to set up a directory, source_files, containing all the required supporting data. It can be set up in the local filesystem (recommended) or inside the Docker container (in the home folder). See below for details.

Be aware that the supporting data is a collection of large datasets: ~15 GB compressed and ~50 GB uncompressed.
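Given these sizes, checking the free disk space before downloading can save a failed transfer. The sketch below is only illustrative: the helper name free_gb and the 65 GB threshold are assumptions (the threshold simply adds the compressed and uncompressed sizes, on the assumption that both are on disk at the same time during extraction).

```shell
#!/bin/sh
# Print the free space, in whole GB, on the filesystem that holds directory $1.
# `df -Pk` prints POSIX-format output in 1024-byte blocks; field 4 is "Available".
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

REQUIRED_GB=65   # assumption: ~15 GB archive + ~50 GB extracted data
available=$(free_gb .)
if [ "$available" -lt "$REQUIRED_GB" ]; then
    echo "Only ${available} GB free here; about ${REQUIRED_GB} GB may be needed" >&2
else
    echo "${available} GB free; this should be enough for the supporting data"
fi
```

Run it from the directory in which source_files is going to be created, since the check applies to the filesystem of the current directory.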

Options for installation

1. In the local filesystem - RECOMMENDED

This should be set up once (if a container is already running, please leave it first using the command exit) and can be reused in different containers (shown in point 4).

This ensures that users do not have to fetch these files every time a new Docker container for APRICOT is created. Moreover, it keeps the container small, because the large databases are not set up inside it.

2. Inside a new Docker container

In this case the supporting data can be used only inside that Docker container (every Docker container needs this setup individually).

Commands to acquire the supporting data

$ wget http://data.imib-zinf.net/APRICOT-supporting_dataset.zip
$ unzip APRICOT-supporting_dataset.zip

Alternatively, these files can be acquired using the script docker_support.sh provided in the git repository of APRICOT.

$ cp APRICOT/shell_scripts/docker_support.sh .
$ sh docker_support.sh

4. Using the supporting data

When the directory source_files is located in the local system (recommended), use the following command to mount this directory into the Docker container (provide the full path for $FULL_PATH_SOURCE_FILES):

## If Docker is started from the directory that contains source_files, set the path as follows:
#$ FULL_PATH_SOURCE_FILES=`pwd`

$ docker run -it -v /$FULL_PATH_SOURCE_FILES/source_files/:/home/source_files malvikasharan/apricot bash
$ cd home

Skip this step if you are already working inside a Docker container that holds the supporting data.
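A mistyped or relative host path is an easy way to end up with an empty /home/source_files inside the container. The sketch below builds the -v argument and refuses to proceed when the path looks wrong; the helper name mount_arg is hypothetical, and the container-side path is the one used in the command above.

```shell
#!/bin/sh
# Build the -v argument for `docker run` from the directory that CONTAINS source_files.
# Fails if the path is not absolute or if it has no source_files subdirectory.
# `mount_arg` is an illustrative helper name, not part of APRICOT.
mount_arg() {
    case "$1" in
        /*) ;;
        *) echo "need an absolute path, got: $1" >&2; return 1 ;;
    esac
    [ -d "$1/source_files" ] || { echo "no source_files in $1" >&2; return 1; }
    # Strip one trailing slash so the result has no doubled separator.
    printf '%s/source_files:/home/source_files\n' "${1%/}"
}

# Usage (run from the directory that holds source_files):
#   docker run -it -v "$(mount_arg "$(pwd)")" malvikasharan/apricot bash
```

Because the check happens on the host before docker run is called, a wrong path produces an error message instead of a container that silently lacks the data.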

5. Carry out analysis by APRICOT

$ cp APRICOT/shell_scripts/run_example.sh .
$ sh run_example.sh

If the analysis was successful, a directory APRICOT_analysis will be created, containing the following folders with the outputs generated by the different modules of the software.

APRICOT_analysis
├── input                               # Location used by subcommand 'query' to store all the related files
└── output
    ├── 0_predicted_domains             # Location for the output data obtained from the subcommand 'predict'
    ├── 1_compiled_domain_information   # Location for the output data obtained from the subcommand 'filter'
    ├── 2_selected_domain_information
    ├── 3_annotation_scoring            # Location for the output data obtained from the subcommand 'annoscore'
    ├── 4_additional_annotations        # Location for additional annotations for the selected
    │                                   # queries using subcommand 'addanno'
    ├── 5_analysis_summary              # Location for the output data obtained from the subcommand 'summary'
    ├── format_output_data              # Location for the output data obtained from the subcommand 'format'
    └── visualization_files             # Location for the output data obtained from the subcommand 'vis'

For a quick overview of the analysis, check the file APRICOT_analysis_summary.csv in the path APRICOT_analysis/output/5_analysis_summary.
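To confirm that a run produced every expected folder, the directory names from the tree above can be checked in a loop. This is a minimal sketch; the helper name missing_outputs is hypothetical, and it only tests that the folders exist, not that their contents are complete.

```shell
#!/bin/sh
# Print the name of each expected output subfolder that is missing from the
# analysis directory $1. Prints nothing when all of them exist.
# `missing_outputs` is an illustrative helper name, not part of APRICOT.
missing_outputs() {
    for sub in 0_predicted_domains 1_compiled_domain_information \
               2_selected_domain_information 3_annotation_scoring \
               4_additional_annotations 5_analysis_summary \
               format_output_data visualization_files; do
        [ -d "$1/output/$sub" ] || echo "$sub"
    done
}

# Usage:
#   missing_outputs APRICOT_analysis
#   head -n 5 APRICOT_analysis/output/5_analysis_summary/APRICOT_analysis_summary.csv
```

An empty result from missing_outputs means all eight folders are present; the head command then shows the first lines of the summary file.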

To run the analysis on new query proteins, please edit the “Input-1” part of the run_example.sh script; for example, provide the UniProt IDs of your query proteins (DOMAIN_KEYWORDS, line number 78).

For further details, please check the tutorial and the software requirements.

We recommend that users use the APRICOT Docker image, which contains all the tool dependencies and allows the software to run without friction.

Use the following command to pull the image to your local system (Docker must be installed):

$ docker pull malvikasharan/apricot

Run the container:

$ docker run -it malvikasharan/apricot bash

APRICOT is installed and can be called using the command apricot; its libraries are saved at /usr/local/lib/python3.5/site-packages/apricotlib/

Go to the `home` folder to test the software:

$ cd home
$ apricot

Try a test run:

$ cp APRICOT/shell_scripts/run_example.sh .
$ sh run_example.sh

Database requirements for the software

An additional step to fetch the databases is required before the software can carry out an analysis.

The shell script docker_support.sh can be run inside a new Docker container, or it can be run locally to create a dataset that is then used inside (multiple) Docker containers.

$ wget https://raw.githubusercontent.com/malvikasharan/APRICOT/master/shell_scripts/docker_support.sh
$ sh docker_support.sh

This script will create a directory source_files with all the required datasets.

When the script is used to fetch the datasets inside the Docker container (in the home folder), APRICOT can simply be run to carry out the analysis.

When the script is used to create a local dataset, use the following command to mount the directory source_files into the Docker container (set $FULL_PWD to, or replace it with, the path on the host system):

$ docker run -it -v /$FULL_PWD/source_files/:/home/source_files malvikasharan/apricot bash
$ cd home
$ cp APRICOT/shell_scripts/run_example.sh .
$ sh run_example.sh

This ensures that users do not have to fetch the dataset every time a new Docker container for APRICOT is created. Moreover, it keeps the container small, because the large databases are not installed inside it.

Docker image with all dependencies

To avoid the extra step of installing the databases locally (or inside the Docker container), an optional Docker image containing all dependencies can be used.

$ docker pull malvikasharan/apricot_with_dependencies
$ docker run -it malvikasharan/apricot_with_dependencies bash
$ cd home
$ cp APRICOT/shell_scripts/run_example.sh .
$ sh run_example.sh