caspo combines PyASP and CellNOpt (through cellnopt.wrapper) to provide easy-to-use software for learning Boolean logic models that describe the immediate-early response of protein signaling networks.
Given a Prior Knowledge Network (PKN) describing causal interactions (in SIF format) and a phospho-proteomics dataset (in MIDAS format), caspo searches for the optimal Boolean logic models derived from the PKN, that is, the models that maximize the fitness between model predictions and experimental observations. For more information, please visit caspo's website.
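To make the notion of fitness more concrete, here is a minimal Python sketch that scores one Boolean logic model by its mean squared error (MSE) against observations, where a lower MSE means a better fit. It is not caspo's implementation (caspo encodes the problem in ASP). The model representation, the names `simulate` and `mse`, the assumption that observations are normalized to [0, 1], and the synchronous update from an all-off state (which only makes sense for acyclic models) are all our own assumptions for illustration.

```python
# Hypothetical representation (not caspo's API): a model maps each target node to a
# list of conjunctions, and each conjunction is a list of (source, sign) pairs with
# sign = +1 for activation and -1 for inhibition. A target is on if any conjunction holds.


def simulate(model, clamped, max_steps=100):
    """Synchronous updates from all-off, keeping clamped nodes fixed (assumes an acyclic model)."""
    nodes = set(model) | set(clamped)
    nodes |= {src for conjs in model.values() for conj in conjs for src, _ in conj}
    state = {n: clamped.get(n, False) for n in nodes}
    for _ in range(max_steps):
        new = dict(state)
        for target, conjs in model.items():
            if target not in clamped:
                # a conjunction holds when activators are on and inhibitors are off
                new[target] = any(
                    all(state[src] == (sign > 0) for src, sign in conj) for conj in conjs
                )
        if new == state:
            break
        state = new
    return state


def mse(model, experiments, readouts):
    """MSE between Boolean predictions (0/1) and observations assumed normalized to [0, 1]."""
    errors = []
    for clamped, observed in experiments:
        state = simulate(model, clamped)
        errors.extend((int(state[r]) - observed[r]) ** 2 for r in readouts)
    return sum(errors) / float(len(errors))


# Example: readout c is active when stimulus a is present and stimulus b is absent.
model = {"c": [[("a", +1), ("b", -1)]]}
experiments = [
    ({"a": True, "b": False}, {"c": 0.9}),  # predicted 1, squared error 0.01
    ({"a": True, "b": True}, {"c": 0.2}),   # predicted 0, squared error 0.04
]
print(mse(model, experiments, ["c"]))  # approximately 0.025
```

Among the models derivable from a PKN, the optimal ones in this reading are those with the lowest such error.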
The typical way to use caspo is to run the caspo.py script, which will be in your PATH after installation:
$ caspo.py pkn.sif midas.csv
To see all available options, ask for help:
$ caspo.py --help
usage: caspo.py [-h] [--version] [--fit F] [--size S] [--discrete D] [--gtts]
                [--cross N K] [--out O]
                pkn midas

positional arguments:
  pkn           Prior knowledge network in SIF format
  midas         Experimental dataset in MIDAS file

optional arguments:
  -h, --help    show this help message and exit
  --version     show program's version number and exit
  --fit F       suboptimal enumeration tolerance (Default to 0)
  --size S      suboptimal size enumeration tolerance (Default to 0). Combined
                with --fit could lead to a huge number of models
  --discrete D  discretization over [0,D] (Default to 100)
  --gtts        compute Global Truth Tables (Default to False). This could
                take some time for many models.
  --cross N K   compute N random K-fold cross validation
  --out O       output directory path (Default to current directory)
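The `--discrete D` option maps observations onto the integer range [0,D]. One plausible reading, sketched below, is that each normalized observation is rounded to the nearest of D+1 integer levels, so that squared errors become integers, presumably because the ASP optimizer minimizes integer weights. The rounding rule and the helper names `discretize` and `discretized_error` are assumptions, not taken from caspo's source.

```python
import math


def discretize(value, d=100):
    """Round a normalized observation in [0, 1] to the nearest integer level in [0, d] (assumed rule)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("observations are assumed to be normalized to [0, 1]")
    # floor(x + 0.5) rounds halves up; Python 3's round() would round halves to even
    return int(math.floor(value * d + 0.5))


def discretized_error(prediction, value, d=100):
    """Integer squared error between a Boolean prediction (0 or 1) and a discretized observation."""
    return (prediction * d - discretize(value, d)) ** 2


print(discretize(0.9))            # 90
print(discretize(0.125, d=10))    # 1 (1.25 rounds down)
print(discretized_error(1, 0.9))  # 100
```

A larger D keeps more of the data's resolution at the cost of larger integer weights in the optimization.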
For example, to compute all logic models within a 2% tolerance of the best fit, together with their Global Truth Tables (input-output behaviors), run:
$ caspo.py pkn.sif midas.csv --fit 0.02 --gtts
Reading input files... done.
Learning Boolean logic models and their Global Truth Tables with ASP... done in 6.63 sec.
192 Boolean logic models and 4 Global Truth Tables have been learned.
Wrote ./models.csv
Wrote ./frequencies.csv
Wrote ./exclusive.csv
Wrote ./inclusive.csv
Wrote ./gtt-[0, 1, 2, 3].csv
Wrote ./gtts_stats.csv
Output files are:
- models.csv: Matrix representation of the logic models
- frequencies.csv: Frequencies of logic conjunctions among the family of models
- exclusive.csv: Mutually exclusive pairs of conjunctions
- inclusive.csv: Mutually inclusive pairs of conjunctions
- gtt-i.csv: Matrix representation of the complete input-output behavior of the i-th GTT
- gtts_stats.csv: Basic GTT statistics
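The frequency, exclusive and inclusive summaries can be illustrated on a small in-memory models matrix. The sketch below assumes one row per model and one 0/1 column per conjunction. It treats two conjunctions as mutually exclusive when each appears in some model but never in the same one, and as mutually inclusive when they appear in exactly the same models (ignoring conjunctions present in every model). These criteria, the CSV layout assumed by `load_models`, and the conjunction names are our reading for illustration, not caspo's exact definitions.

```python
import csv
from itertools import combinations


def load_models(path):
    """Read a models matrix, assuming one row per model and only 0/1 columns, one per conjunction."""
    with open(path) as f:
        return [{name: int(v) for name, v in row.items()} for row in csv.DictReader(f)]


def conjunction_frequencies(models):
    """Fraction of models in which each conjunction appears."""
    return {c: sum(m[c] for m in models) / float(len(models)) for c in models[0]}


def exclusive_pairs(models):
    """Pairs of conjunctions that each appear somewhere but never in the same model."""
    freq = conjunction_frequencies(models)
    present = sorted(c for c, f in freq.items() if f > 0)
    return [(x, y) for x, y in combinations(present, 2)
            if not any(m[x] and m[y] for m in models)]


def inclusive_pairs(models):
    """Pairs of conjunctions that appear in exactly the same models (excluding ubiquitous ones)."""
    freq = conjunction_frequencies(models)
    partial = sorted(c for c, f in freq.items() if 0 < f < 1)
    return [(x, y) for x, y in combinations(partial, 2)
            if all(m[x] == m[y] for m in models)]


# Hypothetical family of four models over four conjunctions (CellNOpt-style names).
models = [
    {"a=c": 1, "b=c": 0, "a+b=d": 1, "c=d": 1},
    {"a=c": 0, "b=c": 1, "a+b=d": 0, "c=d": 1},
    {"a=c": 1, "b=c": 0, "a+b=d": 1, "c=d": 1},
    {"a=c": 0, "b=c": 1, "a+b=d": 0, "c=d": 1},
]
print(conjunction_frequencies(models))
print(exclusive_pairs(models))  # [('a+b=d', 'b=c'), ('a=c', 'b=c')]
print(inclusive_pairs(models))  # [('a+b=d', 'a=c')]
```

Here "c=d" appears in every model (frequency 1.0), so it is shared by the whole family and is left out of the inclusive pairs.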
To validate the learning process with N rounds of random K-fold cross-validation, simply run:
$ caspo.py pkn.sif midas.csv --cross 1 10
Reading input files... done.
Learning Boolean logic models with ASP... done in 0.41 sec.
16 Boolean logic models have been learned.
Wrote ./models.csv
Wrote ./frequencies.csv
Wrote ./exclusive.csv
Wrote ./inclusive.csv
Running 1 random 10-fold cross validation...
Wrote ./cross_validation_1.csv
done.
For each cross-validation round i, you will get:
- cross_validation_i.csv: GTTs, MSE and number of models for each fold in the cross-validation
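The idea behind `--cross N K` can be sketched as follows: in each of the N rounds, the data are shuffled and split into K folds, then models are learned on K-1 folds and scored on the held-out fold. In this sketch the `learn` and `score` callables are hypothetical placeholders for caspo's learning step and its MSE evaluation, and treating the items being split as experimental conditions (rows of the MIDAS file) is an assumption.

```python
import random


def k_fold_splits(conditions, k, rng):
    """Shuffle the conditions and yield one (train, test) pair per fold."""
    items = list(conditions)
    rng.shuffle(items)
    folds = [items[i::k] for i in range(k)]  # k folds whose sizes differ by at most one
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def cross_validation(conditions, n, k, learn, score, seed=None):
    """Run n rounds of random k-fold CV and return one list of per-fold scores per round.

    learn(train) -> model and score(model, test) -> float are hypothetical callables.
    """
    rng = random.Random(seed)
    return [
        [score(learn(train), test) for train, test in k_fold_splits(conditions, k, rng)]
        for _ in range(n)
    ]


# Toy usage: the "model" is the mean of the training values, scored by MSE on the test fold.
def learn_mean(train):
    return sum(train) / float(len(train))


def score_mse(model, test):
    return sum((model - t) ** 2 for t in test) / float(len(test))


values = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6, 0.55, 0.3, 0.7]
print(cross_validation(values, n=1, k=10, learn=learn_mean, score=score_mse, seed=0))
```

Because every item lands in exactly one test fold per round, each observation is used once for evaluation and K-1 times for learning within a round.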
To facilitate the integration of caspo with other software tools, all of its functionality is also accessible through a comprehensive API. Here we show a very simple example; see the caspo Package documentation for the full reference:
# some imports
from __future__ import print_function  # print() behaves the same on Python 2 and 3
from caspo import Network, Dataset, Learner, cno

midas_file = 'path-to-your-midas.csv'
sif_file = 'path-to-your-pkn.sif'

# compress the PKN using CellNOpt
compressed_sif = cno.compress(sif_file, midas_file)

# load the dataset and the compressed network
dataset = Dataset.from_midas(midas_file)
network = Network(compressed_sif)

# create a Learner object for our network and dataset
learner = Learner(network, dataset)

# learn logic models and their GTTs satisfying:
#  - 2% fitness tolerance
#  - 0 size tolerance
#  - 100-valued discretization
#  - True: also compute the GTTs (as --gtts does on the command line)
family = learner.learn(0.02, 0, 100, True)

# print the number of models and the family's weighted MSE
print(len(family))
print(family.weighted_mse(dataset))

# print the MSE and the number of models gathered in each GTT of the family
for gtt in family.gtts:
    print(gtt.mse(dataset), len(gtt))
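Each GTT gathers the models that share the same input-output behavior, which is why `len(gtt)` counts models. The self-contained sketch below illustrates that grouping: it computes each model's truth table over all on/off combinations of the stimuli and buckets models with identical tables. The model representation, the helper names, the synchronous update (assuming acyclic models) and the restriction to stimuli only (ignoring inhibitors) are all assumptions for illustration, not caspo's API.

```python
from itertools import product

# Hypothetical representation (not caspo's API): target -> list of conjunctions,
# each conjunction a list of (source, sign) pairs with sign +1 (activation) or -1 (inhibition).


def fixpoint(model, clamped, max_steps=100):
    """Synchronous updates from all-off with clamped inputs (assumes an acyclic model)."""
    nodes = set(model) | set(clamped)
    nodes |= {src for conjs in model.values() for conj in conjs for src, _ in conj}
    state = {n: clamped.get(n, False) for n in nodes}
    for _ in range(max_steps):
        new = dict(state)
        for target, conjs in model.items():
            if target not in clamped:
                new[target] = any(
                    all(state[src] == (sign > 0) for src, sign in conj) for conj in conjs
                )
        if new == state:
            break
        state = new
    return state


def truth_table(model, stimuli, readouts):
    """Readout values for every on/off combination of the stimuli."""
    return tuple(
        tuple(fixpoint(model, dict(zip(stimuli, values)))[r] for r in readouts)
        for values in product([False, True], repeat=len(stimuli))
    )


def group_by_gtt(models, stimuli, readouts):
    """Bucket models with identical truth tables; each bucket plays the role of one GTT."""
    groups = {}
    for model in models:
        groups.setdefault(truth_table(model, stimuli, readouts), []).append(model)
    return list(groups.values())


# Two structurally different models with the same behavior (c follows a), and one that differs.
direct = {"c": [[("a", +1)]]}
via_b = {"b": [[("a", +1)]], "c": [[("b", +1)]]}
negated = {"c": [[("a", -1)]]}
for gtt in group_by_gtt([direct, via_b, negated], ["a"], ["c"]):
    print(len(gtt))
```

Since all models in a bucket produce the same predictions, they also share the same MSE on a given dataset, which is consistent with reporting one MSE per GTT.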