Dictyostelium discoideum databases (dicty)

The following example downloads experiments from the PIPA database, specifically “RPKM + mapability expression (polyA) - R6” results for all public experiments on Dictyostelium discoideum (dd) at time point 16.

import orangecontrib.bio.dicty

pipa = orangecontrib.bio.dicty.PIPAx()

results = pipa.results_list("R6")
dd16 = [(i, d) for i, d in results.items()
        if d["tp"] == '16' and d["species_id"] == "dd"]

# Group similar experiments by sorting on treatment and replicate.
dd16 = sorted(dd16, key=lambda x: (x[1]["treatment"], x[1]["replicate"]))

data = pipa.get_data([i for i, d in dd16], exclude_constant_labels=True,
                     allowed_labels=["id", "treatment", "replicate"])

def print_data(data):
    for at in data.domain.attributes:
        print("%s treatment: %s replicate: %s" % \
            (at.name, at.attributes["treatment"], at.attributes["replicate"]))
    print("")
    for a in data[:10]:
        print(a)

print_data(data)

print("")
datar = orangecontrib.bio.dicty.join_replicates(data)

print_data(datar)

PIPAx database

class orangecontrib.bio.dicty.PIPAx(address='https://pipa.biolab.si/pipax/api.py', cache=None, username=None, password=None)

An interface to PIPAx API.

__init__(address='https://pipa.biolab.si/pipax/api.py', cache=None, username=None, password=None)
Parameters:
  • address (str) – The address of the API.
  • username (str) – Login username; None for public access.
  • password (str) – Login password; None for public access.
  • cache (CacheSQLite) – A cache that stores results locally (an instance of CacheSQLite).
genomes(reload=False, bufver='0')

Return a list of available genomes as a list of (genome_id, genome_name) tuples.

get_data(ids=None, result_type=None, exclude_constant_labels=False, average=<function median>, callback=None, bufver='0', transform=None, allowed_labels=None, reload=False)

Return data in an Orange.data.Table. Each feature represents a sample and each row is a gene. The feature’s .attributes contain annotations.

Parameters:
  • ids (list) – List of ids as returned by results_list if result_type is None; list of ids as returned by mappings if result_type is set.
  • result_type (str) – Result template type id as returned by result_types.
  • exclude_constant_labels (bool) – If a label has the same value in the whole table, remove it.
  • average (function) – Function that combines multiple readings of the same gene on a chip. If None, no averaging is done. The function should take a list of floats and return an “averaged” float (the default function returns the median).
  • transform (function) – A function that transforms individual values. It should take and return a float. Example use: logarithmic transformation. Default: None.
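
The average and transform callbacks are ordinary Python functions. The following is a minimal sketch, assuming only the signatures described above. The names log2_transform and mean_reading are illustrative and not part of orangecontrib.bio, and the pseudocount of 1 is an assumption made here so that zero values stay finite.

```python
import math
from statistics import mean


def log2_transform(value):
    """Map a single expression value to log2(value + 1)."""
    return math.log2(value + 1.0)


def mean_reading(readings):
    """Combine several readings of the same gene into their arithmetic mean."""
    return float(mean(readings))


# Hypothetical usage (needs orangecontrib.bio and network access):
# data = pipa.get_data(ids, average=mean_reading, transform=log2_transform)
```

Passing average=None instead would skip the combining step entirely, as noted in the parameter description.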
mappings(reload=False, bufver='0')

Return available mappings as a dictionary of { mapping_id: dictionary_of_annotations } where the keys for dictionary_of_annotations are “id”, “data_id”, “data_name”, “genomes_id”.

result_types(reload=False, bufver='0')

Return a list of available result types.

results_list(rtype, reload=False, bufver='0')

Return the available gene expression results for a specific result type as a dictionary, where the keys are result IDs and the values are dictionaries of sample annotations.

Parameters:rtype (str) – Result type to use (see result_types).
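
Because the return value is a plain dictionary of annotation dictionaries, selecting experiments is ordinary dictionary filtering. The sketch below assumes that shape; select_results is a hypothetical helper, and the IDs and annotation values in example_results are made up for illustration (only the keys “tp”, “species_id” and “replicate” appear in the example at the top of this page).

```python
def select_results(results, **criteria):
    """Return sorted (id, annotations) pairs whose annotations match all criteria."""
    selected = [
        (rid, ann) for rid, ann in results.items()
        if all(ann.get(key) == value for key, value in criteria.items())
    ]
    return sorted(selected, key=lambda pair: pair[0])


# Made-up annotations in the shape results_list is documented to return.
example_results = {
    "101": {"tp": "16", "species_id": "dd", "replicate": "1"},
    "102": {"tp": "8", "species_id": "dd", "replicate": "1"},
    "103": {"tp": "16", "species_id": "dp", "replicate": "2"},
}

# Keep only the ids, ready to be passed on to get_data.
dd16_ids = [rid for rid, _ in select_results(example_results, tp="16", species_id="dd")]
```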

DictyExpress database

class orangecontrib.bio.dicty.DictyExpress(address='http://bcm.fri.uni-lj.si/microarray/api/index.php?', cache=None)

Access the DictyExpress data API.

__init__(address='http://bcm.fri.uni-lj.si/microarray/api/index.php?', cache=None)
Parameters:
  • address (str) – The address of the API.
  • cache – A cache that stores results locally (an instance of CacheSQLite).
annotationOptions(ao=None, onlyDiff=False, **kwargs)

Return annotation options for the given query. Return all possible annotations if the query is omitted.

If ao is chosen, only return options for that object id.

annotationTypes()

Return all annotation types.

annotations(type, ids=None, all=False)

Return annotations for specified type and ids.

Parameters:
  • type – Object type (see objects).
  • ids – If set, only annotations corresponding to the given ids are returned. Annotations are in the same order as input ids.
  • all – If False (default), only annotations for “meaningful” annotation types are returned. If True, return annotations for all annotation types.
get_data(type='norms', exclude_constant_labels=False, average=<function median>, ids=None, callback=None, format='short', transform=None, allowed_labels=None, **kwargs)

Return data in an Orange.data.Table. Each feature is a sample and each row (an Orange.data.Instance) is a gene. The feature’s .attributes contain annotations.

Parameters:
  • ids (list) – A list of chip ids. If absent, a search is made instead; in that case any additional keyword arguments are treated as in search.
  • exclude_constant_labels – Remove labels if they have the same value for the whole table.
  • format (str) – If “short”, use short format for downloads.
  • average (function) – Function that combines multiple readings of the same gene on a chip. If None, no averaging is done. The function should take a list of floats and return an “averaged” float (the default function returns the median).
  • transform (function) – A function that transforms individual values. It should take and return a float. Example use: logarithmic transformation. Default: None.
Returns:

Chips with given ids in a single data table.

Return type:

Orange.data.Table

objects()

Return all object types.

search(type, **kwargs)

Search the database. Search is case insensitive.

Parameters:
  • type – Annotation type (list them with DictyExpress().saoids.keys()).
  • kwargs – In the form annotation=values. Values are either strings or lists of strings (interpreted as an OR operator between list elements).

The following example lists ids of normalized entries where platform is minichip and sample is abcC3-:

search("norms", platform='minichip', sample='abcC3-')

The following example lists ids of normalized entries where platform is minichip and sample is abcC3- or abcG15-:

search("norms", platform='minichip', sample=[ 'abcC3-', 'abcG15-'])
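
The query semantics described above (case-insensitive, OR within a list, AND across annotations) can be illustrated locally. This is only a sketch of those rules, not the server's implementation: matches is a hypothetical helper, the entry is made up, and exact (rather than substring) comparison is an assumption.

```python
def matches(annotations, **query):
    """Check whether one entry's annotations satisfy a search-style query."""
    for name, wanted in query.items():
        # A single string is treated as a one-element list of alternatives.
        alternatives = [wanted] if isinstance(wanted, str) else list(wanted)
        actual = str(annotations.get(name, "")).lower()
        # OR within one annotation, AND across annotations.
        if not any(actual == option.lower() for option in alternatives):
            return False
    return True


# Made-up entry, for illustration only.
entry = {"platform": "minichip", "sample": "abcG15-"}

in_either_sample = matches(entry, platform="MiniChip", sample=["abcC3-", "abcG15-"])
in_first_sample = matches(entry, platform="minichip", sample="abcC3-")
```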

Auxiliary functionality

class orangecontrib.bio.dicty.CacheSQLite(filename, compress=True)

An SQLite-based cache.

__init__(filename, compress=True)

Opens an existing cache or creates a new one if it does not exist.

Parameters:
  • filename (str) – The filename.
  • compress (bool) – Whether to use on-the-fly compression.
add(addr, con, version='0', autocommit=True)

Inserts an element into the cache.

Parameters:
  • addr – Element address.
  • con – Contents.
  • version – Version.
clear()

Remove all entries.

commit()

Commit the changes. Needed only if a previous add was called with autocommit=False.

contains(addr)

Return the element’s version, or False if the element does not exist.

Parameters:addr – Element address.
get(addr)

Loads an element from the cache.

Parameters:addr – Element address.
list()

List all element addresses in the cache.
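
The contains, get and add methods combine naturally into a fetch-through pattern. The sketch below relies only on the method signatures documented above; cached_fetch and the fetch callback are hypothetical names, and treating a version mismatch as a cache miss is an assumption about how versions are meant to be used.

```python
def cached_fetch(cache, addr, fetch, version="0"):
    """Return the contents for addr, calling fetch(addr) only on a cache miss."""
    found = cache.contains(addr)
    # contains is documented to return the stored version, or False if absent.
    if found is not False and found == version:
        return cache.get(addr)
    # Miss (or stale version): fetch fresh contents and store them.
    contents = fetch(addr)
    cache.add(addr, contents, version=version)
    return contents


# Hypothetical usage (the filename is an example, not a required location):
# cache = orangecontrib.bio.dicty.CacheSQLite("dicty_cache.sqlite")
# payload = cached_fetch(cache, "some-address", download_function)
```

Any object exposing the same three methods can stand in for CacheSQLite here, which keeps the helper easy to exercise without touching the disk.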