ArrayExpress (arrayexpress)

Access the ArrayExpress web services and database.

ArrayExpress is a database of gene expression experiments that you can query and download.

Retrieve the object representing the experiment with accession E-TABM-25:

>>> from orangecontrib.bio import arrayexpress
>>> experiment = arrayexpress.ArrayExpressExperiment("E-TABM-25")
>>> print(experiment.accession)
E-TABM-25
>>> print(experiment.name)
Transcription profiling of aging in the primate brain
>>> print(experiment.species)
['Pan troglodytes']
>>> print(experiment.files)
[{'kind': ...
>>> # Retrieve the data matrix for experiment 'E-MEXP-2917'
>>> experiment = arrayexpress.ArrayExpressExperiment("E-MEXP-2917")
>>> table = experiment.fgem_to_table()

Low-level ArrayExpress queries using the REST services:

>>> from orangecontrib.bio import arrayexpress
>>> arrayexpress.query_experiments(accession='E-MEXP-31')
{u'experiments': ...
>>> arrayexpress.query_experiments(keywords='glioblastoma')
{u'experiments': ...
>>> arrayexpress.query_files(accession='E-MEXP-32', format="xml")
<xml.etree.ElementTree.ElementTree object ...

Note

Currently, querying ArrayExpress files works only with the xml format.
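Since the xml format yields an xml.etree.ElementTree.ElementTree object, the result can be walked with the standard ElementTree API. The following is a minimal sketch only: the sample document, its element names (files, experiment, file, kind, name) and the file names are invented for illustration and may not match the actual ArrayExpress response layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real ArrayExpress response layout may differ.
SAMPLE_XML = """<files>
  <experiment>
    <accession>E-MEXP-32</accession>
    <file><kind>raw</kind><name>E-MEXP-32.raw.zip</name></file>
    <file><kind>idf</kind><name>E-MEXP-32.idf.txt</name></file>
  </experiment>
</files>"""


def list_files(tree):
    """Return (kind, name) pairs for every <file> element in the tree."""
    return [(f.findtext("kind"), f.findtext("name")) for f in tree.iter("file")]


# Stand-in for the ElementTree object returned by a files query.
tree = ET.ElementTree(ET.fromstring(SAMPLE_XML))
for kind, name in list_files(tree):
    print(kind, name)
```

With a real query result, the element names passed to iter and findtext would need to be adjusted to whatever the service actually returns.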

Interface

class orangecontrib.bio.arrayexpress.ArrayExpressConnection(address=None, timeout=30, cache=None, username=None, password=None)

Constructs and runs REST queries against ArrayExpress.

Parameters:
  • address – Address of the ArrayExpress API.
  • timeout – Timeout for the connection.
format_query(**kwargs)

Format the query arguments in kwargs.

>>> conn.format_query(gxa=True, efcount=(1, 5))
'efcount=[1 TO 5]&gxa=true'
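To make the formatting rules in the example concrete, here is a standalone sketch that reproduces the output shown above. It is not the library's implementation: the function name format_query_sketch and the exact rules (booleans lowercased, two-tuples rendered as ranges, lists joined with spaces, keys sorted, None skipped, no URL escaping) are assumptions inferred from that single example.

```python
def format_query_sketch(**kwargs):
    """Format keyword arguments as a query string (illustrative only)."""

    def format_value(value):
        # Booleans become lowercase 'true' / 'false'.
        if isinstance(value, bool):
            return "true" if value else "false"
        # A (min, max) tuple becomes a range expression.
        if isinstance(value, tuple) and len(value) == 2:
            return "[%s TO %s]" % value
        # A list of terms is joined with spaces (assumed behaviour).
        if isinstance(value, list):
            return " ".join(str(v) for v in value)
        return str(value)

    # Skip unset arguments and emit the rest in sorted key order.
    keys = sorted(k for k, v in kwargs.items() if v is not None)
    return "&".join("%s=%s" % (k, format_value(kwargs[k])) for k in keys)


print(format_query_sketch(gxa=True, efcount=(1, 5)))
```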
open_file(accession, kind='raw', ext=None)

Return a file handle to experiment data.

Parameters:
  • accession (str) – Experiment accession (e.g. 'E-TABM-1087').
  • kind (str) – Experiment data type.
Possible values for the parameter kind:
  • raw: return the raw data if available
  • processed: return the processed data if available
  • biosamples: a PNG or SVG design image
  • idf: investigation description
  • adf: array design description
  • mageml: MAGE-ML file

Example:

>>> raw_file = conn.open_file("E-TABM-1087", kind="raw")
>>> processed_file = conn.open_file("E-TABM-1087", kind="processed")
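A common next step is to store the returned handle's contents locally. The sketch below assumes the handle behaves like a binary file object (it has a read method returning bytes); this documentation does not say whether that holds for open_file, so an io.BytesIO object and an invented target file name stand in for a real download.

```python
import io
import os
import shutil
import tempfile


def save_stream(handle, path, chunk_size=64 * 1024):
    """Copy a binary file-like object to `path` and return the byte count."""
    with open(path, "wb") as out:
        shutil.copyfileobj(handle, out, chunk_size)
    return os.path.getsize(path)


# Stand-in for a handle such as the one returned by conn.open_file(...).
fake_handle = io.BytesIO(b"ID_REF\tsample1\n1007_s_at\t7.5\n")
target = os.path.join(tempfile.mkdtemp(), "E-TABM-1087.processed.txt")
print(save_stream(fake_handle, target))
```

Copying in chunks keeps memory use flat even for large raw data archives.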
query_experiment(**kwargs)

Return an open stream to the experiments query results. Takes the same arguments as the query_experiments function.

query_files(**kwargs)

Return an open stream to the files query results. Takes the same arguments as the query_files function.

query_url(what='experiments', **kwargs)

Return a formatted query URL for the query arguments.

>>> conn.query_url(accession="E-MEXP-31")
'http://www.ebi.ac.uk/arrayexpress/json/v2/experiments?accession=E-MEXP-31'
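The URL in the example can be read as a base address, a result format, the kind of query, and the formatted query string. The sketch below reassembles it under that reading. The address template, the name query_url_sketch and the way the format is spliced into the path are assumptions made for illustration, not the library's code.

```python
# Assumed template, inferred from the example URL above.
DEFAULT_ADDRESS = "http://www.ebi.ac.uk/arrayexpress/{format}/v2/"


def query_url_sketch(what="experiments", address=DEFAULT_ADDRESS,
                     format="json", **kwargs):
    """Build a query URL from the address, the query kind and the arguments."""
    # Plain key=value formatting; value conversion and escaping are omitted.
    query = "&".join(
        "%s=%s" % (key, kwargs[key])
        for key in sorted(kwargs)
        if kwargs[key] is not None
    )
    url = address.format(format=format) + what
    return (url + "?" + query) if query else url


print(query_url_sketch(accession="E-MEXP-31"))
```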
query_url_experiments(**kwargs)

Return a formatted experiments query URL for the query arguments.

query_url_files(**kwargs)

Return a formatted files query URL for the query arguments.

class orangecontrib.bio.arrayexpress.ArrayDesign(adf_file=None)

Array design (contains the contents of the .adf file).

class orangecontrib.bio.arrayexpress.SampleDataRelationship(sdrf_file=None)

Sample-Data Relationship (contains the contents of the .sdrf file).

array_data()

Return the Array Data subsection.

array_data_file()

Return the Array Data File subsection.

array_data_matrix()

Return the Array Data Matrix subsection.

array_data_matrix_file()

Return the Array Data Matrix File subsection.

assay()

Return the Assay subsection.

assay_name()

Return the Assay Name subsection.

derived_array_data()

Return the Derived Array Data subsection.

derived_array_data_file()

Return the Derived Array Data File subsection.

derived_array_data_matrix()

Return the Derived Array Data Matrix subsection.

derived_array_data_matrix_file()

Return the Derived Array Data Matrix File subsection.

extract()

Return the Extract subsection.

extract_name()

Return the Extract Name subsection.

hybridization()

Return the Hybridization subsection.

hybridization_name()

Return the Hybridization Name subsection.

image()

Return the Image subsection.

image_file()

Return the Image File subsection.

labeled_extract()

Return the Labeled Extract subsection.

labeled_extract_name()

Return the Labeled Extract Name subsection.

normalization()

Return the Normalization subsection.

normalization_name()

Return the Normalization Name subsection.

sample()

Return the Sample subsection.

sample_name()

Return the Sample Name subsection.

scan()

Return the Scan subsection.

scan_name()

Return the Scan Name subsection.

source()

Return the Source subsection.

source_name()

Return the Source Name subsection.
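The accessors above are named after SDRF column headers. As a rough standalone picture of header-based lookup in a tab-delimited SDRF file, the sketch below selects one column by its header. It does not show what the class methods actually return; the sample rows and the helper name sdrf_column are invented for illustration.

```python
import csv
import io

# Hypothetical SDRF content: one header row, then one row per sample.
SAMPLE_SDRF = (
    "Source Name\tCharacteristics[organism]\tArray Data File\n"
    "source 1\tHomo sapiens\tsample1.CEL\n"
    "source 2\tHomo sapiens\tsample2.CEL\n"
)


def sdrf_column(sdrf_text, header):
    """Return the values found under `header` in tab-delimited SDRF text."""
    rows = list(csv.reader(io.StringIO(sdrf_text), delimiter="\t"))
    index = rows[0].index(header)  # raises ValueError if the header is absent
    return [row[index] for row in rows[1:]]


print(sdrf_column(SAMPLE_SDRF, "Array Data File"))
```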

transform_tag(tag)

Transform the tag into a proper Python attribute name by replacing all spaces and special characters (e.g. '[', ']') with underscores.
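A minimal sketch of such a transformation, assuming that every character other than an ASCII letter, digit or underscore is replaced. The lowercasing step is an additional assumption, based on attribute names such as investigation_title that appear elsewhere in this documentation; the name transform_tag_sketch is hypothetical.

```python
import re


def transform_tag_sketch(tag):
    """Turn a MAGE-TAB tag into an attribute-style name (illustrative only)."""
    # Replace anything that is not a letter, digit or underscore.
    name = re.sub(r"[^0-9A-Za-z_]", "_", tag)
    # Lowercasing is an assumption, not confirmed by the method description.
    return name.lower()


print(transform_tag_sketch("Investigation Title"))
```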

class orangecontrib.bio.arrayexpress.InvestigationDesign(idf_file=None)

Investigation design (contains the contents of the .idf).

>>> import six
>>> from orangecontrib.bio.arrayexpress import InvestigationDesign
>>> idf_file = six.StringIO(
...     'Investigation Title\tfoo investigation\n' +
...     'Experimental Design\tfubar\tsnafu\n' +
...     'SDRF File\tfoobar.sdrf\n'
... )
>>> idf = InvestigationDesign(idf_file)
>>> print(idf.investigation_title)
foo investigation
>>> print(idf.experimental_design)
['fubar', 'snafu']
>>> print(idf.sdrf_file)
['foobar.sdrf']
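The example input is a tab-delimited file whose first column is a tag and whose remaining columns are that tag's values. A standalone sketch of reading that layout into a dictionary follows. The helper parse_idf is hypothetical and, unlike the class output shown above, it keeps every value as a list, including the single-valued title.

```python
import io


def parse_idf(idf_file):
    """Map each IDF tag (first column) to the list of values that follow it."""
    fields = {}
    for line in idf_file:
        parts = line.rstrip("\n").split("\t")
        if parts[0]:  # skip blank lines
            fields[parts[0]] = [value for value in parts[1:] if value]
    return fields


idf_file = io.StringIO(
    "Investigation Title\tfoo investigation\n"
    "Experimental Design\tfubar\tsnafu\n"
    "SDRF File\tfoobar.sdrf\n"
)
fields = parse_idf(idf_file)
print(fields["Experimental Design"])
```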
transform_tag(tag)

Transform the tag into a proper Python attribute name by replacing all spaces and special characters (e.g. '[', ']') with underscores.

Low-level querying with REST

orangecontrib.bio.arrayexpress.query_experiments(keywords=None, accession=None, array=None, ef=None, efv=None, expdesign=None, exptype=None, gxa=None, pmid=None, sa=None, species=None, expandefo=None, directsub=None, assaycount=None, efcount=None, samplecount=None, rawcount=None, fgemcount=None, miamescore=None, date=None, format='json', wholewords=None, connection=None)

Query ArrayExpress experiments.

Parameters:
  • keywords – A list of keywords to search (e.g. ['glioblastoma']).
  • accession – Search by experiment accession (e.g. 'E-MEXP-31').
  • array – Search by array design name or accession (e.g. 'A-AFFY-33').
  • ef – Experimental factor (names of main variables of experiments).
  • efv – Experimental factor value (Has EFO expansion).
  • expdesign – Experiment design type (e.g. ["dose", "response"]).
  • exptype – Experiment type (e.g. 'RNA-Seq', has EFO expansion).
  • gxa – If True limit the results to the Gene Expression Atlas only.
  • pmid – Search by PubMed identifier.
  • sa – Sample attribute values (e.g. 'fibroblast', has EFO expansion).
  • species – Search by species (e.g. 'Homo sapiens', has EFO expansion).
  • expandefo – If True, expand the search terms with all their child terms in the Experimental Factor Ontology (EFO) (e.g. keywords="cancer" will be expanded to include synonyms and subtypes of cancer).
  • directsub – If True, return only experiments submitted directly to ArrayExpress; if False, return only experiments imported from the GEO database; if None (default), return both.
  • assaycount – A two-tuple (min, max) used to filter on the number of assays (e.g. (1, 5) will return only experiments with at least 1 and no more than 5 assays).
  • efcount – Filter on the number of experimental factors (e.g. (1, 5)).
  • sacount – Filter on the number of sample attribute categories.
  • rawcount – Filter on the number of raw files.
  • fgemcount – Filter on the number of final gene expression matrix (processed data) files.
  • miamescore – Filter on the MIAME compliance score (max 5).
  • date – Filter by release date.
>>> query_experiments(species="Homo sapiens", ef="organism_part", efv="liver") 
{...
orangecontrib.bio.arrayexpress.query_files(keywords=None, accession=None, array=None, ef=None, efv=None, expdesign=None, exptype=None, gxa=None, pmid=None, sa=None, species=None, expandefo=None, directsub=None, assaycount=None, efcount=None, samplecount=None, rawcount=None, fgemcount=None, miamescore=None, date=None, format='json', wholewords=None, connection=None)

Query ArrayExpress files. See query_experiments for the arguments.

>>> query_files(species="Mus musculus", ef="developmental_stage",
...             efv="embryo", format="xml")
<xml.etree.ElementTree.ElementTree object ...