Running software using Docker
We recommend that users install Docker on their system in order to use the Docker images.
In order to work with the Docker image for APRICOT, please follow these directions:
1. Get the Docker image
The image can be pulled with this command:
$ docker pull malvikasharan/apricot
2. Create the Docker container for testing the software
$ docker run -it malvikasharan/apricot bash
Here is a quick way to test whether the different modules work on your system, without installing the complete filesystem.
Run the analysis in the home folder:
$ cd home
$ apricot -h
Run test/example analysis
The git repository contains a shell script, run_example.sh, whose commands demonstrate the APRICOT installation by running an example analysis.
Copy the script from the existing repository in the home folder:
$ cp APRICOT/shell_scripts/run_example.sh .
Or use wget to get the most recent version from the repository:
$ wget https://raw.githubusercontent.com/malvikasharan/APRICOT/master/shell_scripts/run_example.sh
...and run it.
$ sh run_example.sh
By default, this script generates a main analysis folder, APRICOT_analysis. Its file structure is described below (point 5). We recommend checking out the tutorial to understand each component of the software and the results generated by its analysis.
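The copy-or-download choice above can be wrapped in a small shell function. This is a minimal sketch, not part of APRICOT: the function name get_example_script is hypothetical, and it assumes wget is available whenever no local copy of the repository exists.

```shell
# Hypothetical helper (not part of APRICOT): place run_example.sh in the
# current directory, preferring the copy in a cloned APRICOT repository and
# falling back to the raw GitHub URL given in the text.
get_example_script() {
    repo_copy="APRICOT/shell_scripts/run_example.sh"
    url="https://raw.githubusercontent.com/malvikasharan/APRICOT/master/shell_scripts/run_example.sh"
    if [ -f "$repo_copy" ]; then
        # Local copy found: no network access needed
        cp "$repo_copy" . || return 1
    else
        # No local copy: download the latest version (assumes wget is installed)
        wget -q -O run_example.sh "$url" || return 1
    fi
    # Succeed only if a non-empty script ended up in the current directory
    [ -s run_example.sh ]
}
```

Usage in the home folder: `get_example_script && sh run_example.sh`.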
3. Get the supporting data required for running your queries
Users are required to set up a directory, source_files, containing all the required supporting data. It can be set up in the local filesystem (recommended) or inside the Docker container (in the home folder); see below for details.
Be aware that the supporting data is a collection of large datasets: ~15 GB compressed and ~50 GB uncompressed.
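Given these sizes, it can help to check free disk space before downloading. The sketch below is illustrative only: has_free_space_gb is a hypothetical name, and the threshold of 65 assumes the ~15 GB archive and the ~50 GB extracted data coexist on disk while unzipping.

```shell
# Sketch: succeed if the filesystem holding DIR has at least NEED_GB GiB free.
# Assumption: archive (~15 G) and extracted data (~50 G) coexist during unzip,
# so about 65 is a sensible NEED_GB for the supporting data.
has_free_space_gb() {
    dir=$1
    need_gb=$2
    # df -P prints one data row per filesystem; field 4 is available 1K blocks
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 1
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

For example, `has_free_space_gb . 65 || echo "Less than 65 GB free here"` run in the target directory before fetching the data.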
Options for installation
1. In the local filesystem - RECOMMENDED
This needs to be set up only once (if a container is already running, exit it first with the command exit) and can then be reused in different containers (shown in point 4).
This ensures that users do not have to fetch these files every time a new Docker container for APRICOT is created. Moreover, it keeps the container small, because the large databases are not set up inside it.
2. Inside a new Docker container
The supporting data can then be used only inside that Docker container (every Docker container needs this setup individually).
Commands to acquire the supporting data
$ wget http://data.imib-zinf.net/APRICOT-supporting_dataset.zip
$ unzip APRICOT-supporting_dataset.zip
Alternatively, these files can be acquired using the script docker_support.sh provided in the git repository of APRICOT.
$ cp APRICOT/shell_scripts/docker_support.sh .
$ sh docker_support.sh
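Because the download is large, a guard that skips it when the data is already present may be useful. This is a hypothetical helper, not part of docker_support.sh; it only checks that source_files exists and is non-empty, which does not prove that an earlier download completed.

```shell
# Sketch: succeed (exit 0) when the supporting data still has to be fetched,
# i.e. when DIR/source_files is missing or empty. A non-empty folder is not
# proof of a complete download; this only avoids obvious re-downloads.
needs_support_data() {
    dir=${1:-.}
    [ -d "$dir/source_files" ] || return 0
    # ls -A lists all entries except . and ..; empty output means empty dir
    [ -z "$(ls -A "$dir/source_files")" ]
}
```

For example: `if needs_support_data .; then sh docker_support.sh; fi`.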
4. Using the supporting data
When the directory source_files is located in the local filesystem (recommended), use the following command to mount it into the Docker container (provide the full path for $FULL_PATH_SOURCE_FILES):
## If Docker is started from the directory that contains source_files, set:
$ FULL_PATH_SOURCE_FILES=$(pwd)
$ docker run -it -v "$FULL_PATH_SOURCE_FILES/source_files/:/home/source_files" malvikasharan/apricot bash
$ cd home
Skip this step if the supporting data was set up inside the Docker container you are already working in.
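Docker bind mounts expect an absolute host path, so resolving the directory first avoids mistakes with relative paths. The sketch below is hypothetical (source_files_mount is not an APRICOT command); it only builds the value passed to -v and does not call Docker itself.

```shell
# Sketch: print a "-v" value that mounts PARENT/source_files at
# /home/source_files, the location used in the docker run command above.
# The directory is resolved to an absolute path because Docker expects one.
source_files_mount() {
    parent=${1:-.}
    if [ ! -d "$parent/source_files" ]; then
        echo "source_files not found in $parent" >&2
        return 1
    fi
    # Resolve inside a subshell so the caller's working directory is unchanged
    abs=$(cd "$parent/source_files" && pwd -P) || return 1
    printf '%s:/home/source_files\n' "$abs"
}
```

For example: `mount=$(source_files_mount .) && docker run -it -v "$mount" malvikasharan/apricot bash`.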
5. Carry out an analysis with APRICOT
$ cp APRICOT/shell_scripts/run_example.sh .
$ sh run_example.sh
If the analysis was successful, a directory APRICOT_analysis will be created, containing the following folders with the outputs generated by the different modules of the software:
APRICOT_analysis
├── input                              # Location used by subcommand 'query' to store all the related files
└── output
    ├── 0_predicted_domains            # Location for the output data obtained from the subcommand 'predict'
    ├── 1_compiled_domain_information  # Location for the output data obtained from the subcommand 'filter'
    ├── 2_selected_domain_information
    ├── 3_annotation_scoring           # Location for the output data obtained from the subcommand 'annoscore'
    ├── 4_additional_annotations       # Location for additional annotations for the selected queries using subcommand 'addanno'
    ├── 5_analysis_summary             # Location for the output data obtained from the subcommand 'summary'
    ├── format_output_data             # Location for the output data obtained from the subcommand 'format'
    └── visualization_files            # Location for the output data obtained from the subcommand 'vis'
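After a run, the folder list above can be checked mechanically. This is a sketch with a hypothetical function name; it assumes that format_output_data and visualization_files sit under output/ like the numbered folders, and a folder may legitimately be absent if its subcommand was not run.

```shell
# Sketch: report which expected APRICOT output folders are missing.
# Assumption: format_output_data and visualization_files live under output/.
check_apricot_output() {
    base=${1:-APRICOT_analysis}
    status=0
    for d in input \
             output/0_predicted_domains \
             output/1_compiled_domain_information \
             output/2_selected_domain_information \
             output/3_annotation_scoring \
             output/4_additional_annotations \
             output/5_analysis_summary \
             output/format_output_data \
             output/visualization_files
    do
        if [ ! -d "$base/$d" ]; then
            echo "missing: $base/$d"
            status=1
        fi
    done
    # Non-zero exit status if any folder was missing
    return "$status"
}
```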
For a quick overview of the analysis, check the file APRICOT_analysis_summary.csv in the path APRICOT_analysis/output/5_analysis_summary.
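A quick terminal preview of that summary file might look like the sketch below. The function name preview_summary is hypothetical; it prints lines as-is and makes no assumption about the table's columns.

```shell
# Sketch: show the first N lines (header included) of the summary table at
# BASE/output/5_analysis_summary/APRICOT_analysis_summary.csv.
preview_summary() {
    base=${1:-APRICOT_analysis}
    n=${2:-5}
    csv="$base/output/5_analysis_summary/APRICOT_analysis_summary.csv"
    if [ ! -s "$csv" ]; then
        echo "summary not found or empty: $csv" >&2
        return 1
    fi
    head -n "$n" "$csv"
}
```

Run it from the home folder after `sh run_example.sh`, e.g. `preview_summary APRICOT_analysis 10`.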
To run an analysis on new query proteins, edit the “Input-1” part of the run_example.sh script, for example by providing the UniProt IDs of your query proteins (DOMAIN_KEYWORDS, line number 78).
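Since line numbers can shift between versions of run_example.sh, searching for the variable may be more reliable than relying on line 78. This sketch assumes DOMAIN_KEYWORDS is set with a plain NAME=value assignment at the start of a line; find_setting is a hypothetical helper.

```shell
# Sketch: print "line:content" for each assignment to NAME in FILE.
# Assumes a plain NAME=value assignment at the start of a line.
find_setting() {
    name=$1
    file=${2:-run_example.sh}
    grep -n "^[[:space:]]*$name=" "$file"
}
```

For example, `find_setting DOMAIN_KEYWORDS run_example.sh` shows where to edit.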
For further details, please check the tutorial and the software requirements.
We recommend using the APRICOT Docker image, which comprises all the tool dependencies and lets the software run without friction.
Use the following command to pull the image to your local system (Docker must be installed):
$ docker pull malvikasharan/apricot
Run the container:
$ docker run -it malvikasharan/apricot bash
APRICOT is installed and can be called using the command apricot; its libraries are located at /usr/local/lib/python3.5/site-packages/apricotlib/
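To confirm that the expected commands are available (apricot inside the container, or docker on the host), a small PATH check can be used. This is a generic sketch with a hypothetical name, relying only on the POSIX `command -v` builtin.

```shell
# Sketch: check that each named command is on PATH, reporting any missing.
# Inside the container "require_cmds apricot" should succeed; on the host,
# "require_cmds docker" checks the Docker prerequisite.
require_cmds() {
    status=0
    for c in "$@"; do
        if ! command -v "$c" >/dev/null 2>&1; then
            echo "not found: $c" >&2
            status=1
        fi
    done
    return "$status"
}
```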
Go to the home folder to test the software:
$ cd home
$ apricot
Try a test run:
$ cp APRICOT/shell_scripts/run_example.sh .
$ sh run_example.sh
Database requirements for the software
An additional step, fetching the databases, is required before the software can carry out an analysis.
The shell script docker_support.sh can either be run inside a new Docker container, or be run locally to create a dataset that can be used inside (multiple) Docker containers.
$ wget https://raw.githubusercontent.com/malvikasharan/APRICOT/master/shell_scripts/docker_support.sh
$ sh docker_support.sh
This script will create a directory source_files with all the required datasets.
When the script is used to fetch the datasets inside the Docker container (in the home folder), APRICOT can simply be run to carry out an analysis.
When the script is used to create a local dataset, use the following command to mount the directory source_files into the Docker container (set or replace $FULL_PWD by the full path, on the host system, of the directory that contains source_files):
$ docker run -it -v "$FULL_PWD/source_files/:/home/source_files" malvikasharan/apricot bash
$ cd home
$ cp APRICOT/shell_scripts/run_example.sh .
$ sh run_example.sh
This ensures that users do not have to fetch the dataset every time a new Docker container for APRICOT is created. Moreover, it keeps the container small, because the large databases are not installed inside it.
Docker image with all dependencies
To avoid the extra step of installing the databases locally (or inside the Docker container), an optional Docker image containing all dependencies can be used:
$ docker pull malvikasharan/apricot_with_dependencies
$ docker run -it malvikasharan/apricot_with_dependencies bash
$ cd home
$ cp APRICOT/shell_scripts/run_example.sh .
$ sh run_example.sh