Array Express (arrayexpress)
Access the ArrayExpress web services and database.
ArrayExpress is a database of gene expression experiments that you can query and download.
Retrieve the object representing the experiment with accession E-TABM-25:
>>> from orangecontrib.bio import arrayexpress
>>> experiment = arrayexpress.ArrayExpressExperiment("E-TABM-25")
>>> print(experiment.accession)
E-TABM-25
>>> print(experiment.name)
Transcription profiling of aging in the primate brain
>>> print(experiment.species)
['Pan troglodytes']
>>> print(experiment.files)
[{'kind': ...
>>> # Retrieve the data matrix for experiment 'E-MEXP-2917'
>>> experiment = arrayexpress.ArrayExpressExperiment("E-MEXP-2917")
>>> table = experiment.fgem_to_table()
Low-level ArrayExpress queries using the REST services:
>>> from orangecontrib.bio import arrayexpress
>>> arrayexpress.query_experiments(accession='E-MEXP-31')
{u'experiments': ...
>>> arrayexpress.query_experiments(keywords='glioblastoma')
{u'experiments': ...
>>> arrayexpress.query_files(accession='E-MEXP-32', format="xml")
<xml.etree.ElementTree.ElementTree object ...
Note
Currently, querying ArrayExpress files only works with the XML format.
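Because `query_files(..., format="xml")` returns a standard `xml.etree.ElementTree.ElementTree`, the usual ElementTree API applies to the result. The following is a minimal sketch for getting an overview of an unfamiliar result by counting element tags. It assumes only that the object behaves like any ElementTree; `summarize_tags` is a hypothetical helper, and the XML snippet is an illustrative stand-in, not the real ArrayExpress schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_tags(tree):
    """Count how often each element tag occurs in an ElementTree."""
    # tree.iter() walks every element in document order, root included
    return Counter(elem.tag for elem in tree.iter())


# Hypothetical stand-in for the tree returned by
# query_files(..., format="xml"); the real ArrayExpress schema may differ.
sample_xml = (
    "<files>"
    "<experiment><accession>E-MEXP-32</accession>"
    "<file><name>a.txt</name></file>"
    "<file><name>b.txt</name></file>"
    "</experiment>"
    "</files>"
)
tree = ET.ElementTree(ET.fromstring(sample_xml))
counts = summarize_tags(tree)
print(counts["file"])  # 2
```

Once the tag names are known, `tree.getroot().findall(...)` or `tree.iter(tag)` can be used to pull out specific elements.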
Interface
class orangecontrib.bio.arrayexpress.ArrayExpressConnection(address=None, timeout=30, cache=None, username=None, password=None)
Construct and run REST queries on ArrayExpress.
Parameters: - address – Address of the ArrayExpress API.
- timeout – Timeout for the connection.
format_query(**kwargs)
Format the query arguments in kwargs.
>>> conn.format_query(gxa=True, efcount=(1, 5))
'efcount=[1 TO 5]&gxa=true'
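The `conn.format_query` doctest above implies a few formatting rules: booleans become lowercase strings, a two-tuple becomes an inclusive `[min TO max]` range, and the arguments are joined with `&`. The sketch below reproduces that output under stated assumptions. `format_query_sketch` is hypothetical and is not the library's implementation; sorting the keys, skipping `None` values, and the lack of URL escaping are all assumptions chosen to match the doctest.

```python
def format_query_sketch(**kwargs):
    """Hypothetical re-implementation of the query formatting shown above."""
    parts = []
    # Assumption: keys are sorted, which matches 'efcount' before 'gxa'
    for key in sorted(kwargs):
        value = kwargs[key]
        if value is None:
            continue  # assumption: unset arguments are left out of the query
        if isinstance(value, bool):
            value = str(value).lower()  # True -> 'true'
        elif isinstance(value, tuple) and len(value) == 2:
            value = "[%s TO %s]" % value  # (1, 5) -> '[1 TO 5]'
        parts.append("%s=%s" % (key, value))
    # Note: no URL escaping is done here; a real client would need it
    return "&".join(parts)


print(format_query_sketch(gxa=True, efcount=(1, 5)))
# efcount=[1 TO 5]&gxa=true
```

The `bool` check comes before the tuple check only for clarity; the important detail is that `True` is formatted as `true` rather than Python's `True`.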
open_file(accession, kind='raw', ext=None)
Return a file handle to experiment data.
Parameters: - accession (str) – Experiment accession (e.g. 'E-TABM-1087').
- kind (str) – Experiment data type.
- Possible values for the parameter kind:
- raw: return the raw data if available
- processed: return the processed data if available
- biosamples: a PNG or SVG design image
- idf: investigation description
- adf: array design description
- mageml: MAGE-ML file
Example:
>>> raw_file = conn.open_file("E-TABM-1087", kind="raw")
>>> processed_file = conn.open_file("E-TABM-1087", kind="processed")
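Since `open_file` returns a file handle, its contents can be streamed to disk without loading a whole data file into memory. The following is a minimal sketch that assumes the handle is a readable file-like object yielding bytes; if it yields text instead, open the output in `"w"` mode. `save_handle` is a hypothetical helper, and `io.BytesIO` stands in for the real handle so the example runs offline.

```python
import io
import os
import shutil
import tempfile


def save_handle(handle, path, chunk_size=64 * 1024):
    """Stream a readable binary file-like object to a local file."""
    with open(path, "wb") as out:
        # copyfileobj reads in chunks, so large data files need not fit in memory
        shutil.copyfileobj(handle, out, chunk_size)
    return path


# io.BytesIO stands in for the handle returned by conn.open_file(...)
fake_handle = io.BytesIO(b"ID_REF\tVALUE\nprobe_1\t7.3\n")
tmp_dir = tempfile.mkdtemp()
saved = save_handle(fake_handle, os.path.join(tmp_dir, "raw_data.bin"))
print(saved)
```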
query_experiment(**kwargs)
Return an open stream to the experiments query results. Takes the same arguments as the query_experiments function.
query_files(**kwargs)
Return an open stream to the files query results. Takes the same arguments as the query_files function.
query_url(what='experiments', **kwargs)
Return a formatted query URL for the query arguments.
>>> conn.query_url(accession="E-MEXP-31")
'http://www.ebi.ac.uk/arrayexpress/json/v2/experiments?accession=E-MEXP-31'
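The URL in the `conn.query_url` doctest above has the shape `<address><format>/v2/<what>?<query>`. The following is a minimal sketch that reassembles it. `query_url_sketch` is hypothetical. The base address comes from the example, and the assumption that the format and `what` slot in the same way for other values (e.g. `xml` and `files`) is inferred from that single example, not taken from the library.

```python
# Base address as it appears in the query_url example above
BASE = "http://www.ebi.ac.uk/arrayexpress/"


def query_url_sketch(what="experiments", fmt="json", **kwargs):
    """Hypothetical: build '<base><fmt>/v2/<what>?<key=value&...>'."""
    # Assumption: simple sorted key=value pairs, no URL escaping
    query = "&".join("%s=%s" % (key, kwargs[key]) for key in sorted(kwargs))
    return "%s%s/v2/%s?%s" % (BASE, fmt, what, query)


print(query_url_sketch(accession="E-MEXP-31"))
# http://www.ebi.ac.uk/arrayexpress/json/v2/experiments?accession=E-MEXP-31
```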
query_url_experiments(**kwargs)
Return the experiments query URL for the query arguments.
query_url_files(**kwargs)
Return the files query URL for the query arguments.
class orangecontrib.bio.arrayexpress.ArrayDesign(adf_file=None)
Array design (contains the contents of the .adf file).
class orangecontrib.bio.arrayexpress.SampleDataRelationship(sdrf_file=None)
Sample-Data Relationship (contains the contents of the .sdrf file).
array_data()
Return the Array Data subsection.
array_data_file()
Return the Array Data File subsection.
array_data_matrix()
Return the Array Data Matrix subsection.
array_data_matrix_file()
Return the Array Data Matrix File subsection.
assay()
Return the Assay subsection.
assay_name()
Return the Assay Name subsection.
derived_array_data()
Return the Derived Array Data subsection.
derived_array_data_file()
Return the Derived Array Data File subsection.
derived_array_data_matrix()
Return the Derived Array Data Matrix subsection.
derived_array_data_matrix_file()
Return the Derived Array Data Matrix File subsection.
extract()
Return the Extract subsection.
extract_name()
Return the Extract Name subsection.
hybridization()
Return the Hybridization subsection.
hybridization_name()
Return the Hybridization Name subsection.
image()
Return the Image subsection.
image_file()
Return the Image File subsection.
labeled_extract()
Return the Labeled Extract subsection.
labeled_extract_name()
Return the Labeled Extract Name subsection.
normalization()
Return the Normalization subsection.
normalization_name()
Return the Normalization Name subsection.
sample()
Return the Sample subsection.
sample_name()
Return the Sample Name subsection.
scan()
Return the Scan subsection.
scan_name()
Return the Scan Name subsection.
source()
Return the Source subsection.
source_name()
Return the Source Name subsection.
transform_tag(tag)
Transform the tag into a proper Python attribute name by replacing all spaces and special characters (e.g. '[' and ']') with underscores.
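The subsection names above mirror SDRF column headers such as Source Name, Assay Name or Array Data File. An SDRF file is a tab-separated table whose first row holds these headers. The following is a minimal, standard-library sketch of pulling one column out of such a table, assuming that layout. `sdrf_column` is a hypothetical helper, not part of the module, and the library's own parsing may differ. Because real SDRF files can repeat a header, the sketch only takes the first matching column.

```python
import csv
import io


def sdrf_column(sdrf_file, header):
    """Return all values in the first column whose header equals `header`."""
    reader = csv.reader(sdrf_file, delimiter="\t")
    headers = next(reader)
    index = headers.index(header)  # raises ValueError if the column is missing
    return [row[index] for row in reader if row]


# Hypothetical two-sample SDRF snippet
sdrf_text = (
    "Source Name\tCharacteristics[organism]\tAssay Name\tArray Data File\n"
    "sample 1\tHomo sapiens\tassay 1\tdata1.CEL\n"
    "sample 2\tHomo sapiens\tassay 2\tdata2.CEL\n"
)
print(sdrf_column(io.StringIO(sdrf_text), "Array Data File"))
# ['data1.CEL', 'data2.CEL']
```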
class orangecontrib.bio.arrayexpress.InvestigationDesign(idf_file=None)
Investigation design (contains the contents of the .idf file).
>>> import six
>>> from orangecontrib.bio.arrayexpress import InvestigationDesign
>>> idf_file = six.StringIO(
...     'Investigation Title\tfoo investigation\n' +
...     'Experimental Design\tfubar\tsnafu\n' +
...     'SDRF File\tfoobar.sdrf\n'
... )
>>> idf = InvestigationDesign(idf_file)
>>> print(idf.investigation_title)
foo investigation
>>> print(idf.experimental_design)
['fubar', 'snafu']
>>> print(idf.sdrf_file)
['foobar.sdrf']
transform_tag(tag)
Transform the tag into a proper Python attribute name by replacing all spaces and special characters (e.g. '[' and ']') with underscores.
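The InvestigationDesign doctest above shows the idea: each IDF row starts with a tag such as `Investigation Title`, followed by tab-separated values, and the tag becomes an attribute name such as `investigation_title`. The following is a minimal sketch of that mapping. `transform_tag_sketch` and `parse_idf_sketch` are hypothetical helpers, not the library's code. Unlike the doctest, this sketch keeps every value as a list, including single values such as the investigation title. Lowercasing the tag is inferred from the doctest's attribute names.

```python
import io
import re


def transform_tag_sketch(tag):
    """Hypothetical: 'Investigation Title' -> 'investigation_title'."""
    # Replace every character that is not a letter, digit or underscore
    return re.sub(r"\W", "_", tag.strip()).lower()


def parse_idf_sketch(idf_file):
    """Map each IDF row tag to the list of its tab-separated values."""
    result = {}
    for line in idf_file:
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        result[transform_tag_sketch(fields[0])] = fields[1:]
    return result


idf = parse_idf_sketch(io.StringIO(
    "Investigation Title\tfoo investigation\n"
    "Experimental Design\tfubar\tsnafu\n"
    "SDRF File\tfoobar.sdrf\n"
))
print(idf["experimental_design"])  # ['fubar', 'snafu']
```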
Low-level querying with REST
orangecontrib.bio.arrayexpress.query_experiments(keywords=None, accession=None, array=None, ef=None, efv=None, expdesign=None, exptype=None, gxa=None, pmid=None, sa=None, species=None, expandefo=None, directsub=None, assaycount=None, efcount=None, samplecount=None, rawcount=None, fgemcount=None, miamescore=None, date=None, format='json', wholewords=None, connection=None)
Query ArrayExpress experiments.
Parameters: - keywords – A list of keywords to search (e.g. ['glioblastoma']).
- accession – Search by experiment accession (e.g. 'E-MEXP-31').
- array – Search by array design name or accession (e.g. 'A-AFFY-33').
- ef – Experimental factor (names of the main variables of experiments).
- efv – Experimental factor value (has EFO expansion).
- expdesign – Experiment design type (e.g. ["dose", "response"]).
- exptype – Experiment type (e.g. 'RNA-Seq', has EFO expansion).
- gxa – If True, limit the results to the Gene Expression Atlas only.
- pmid – Search by PubMed identifier.
- sa – Sample attribute values (e.g. 'fibroblast', has EFO expansion).
- species – Search by species (e.g. 'Homo sapiens', has EFO expansion).
- expandefo – If True, expand the search terms with all their child terms in the Experimental Factor Ontology (EFO) (e.g. keywords="cancer" will be expanded to include synonyms and subtypes of cancer).
- directsub – If True, return only experiments submitted directly to ArrayExpress; if False, return only experiments imported from the GEO database; if None (default), return both.
- assaycount – A two-tuple (min, max) to filter on the number of assays (e.g. (1, 5) will return only experiments with at least 1 and no more than 5 assays).
- efcount – Filter on the number of experimental factors (e.g. (1, 5)).
- sacount – Filter on the number of sample attribute categories.
- rawcount – Filter on the number of raw files.
- fgemcount – Filter on the number of final gene expression matrix (processed data) files.
- miamescore – Filter on the MIAME compliance score (max 5).
- date – Filter by release date.
>>> query_experiments(species="Homo sapiens", ef="organism_part", efv="liver")
{...
orangecontrib.bio.arrayexpress.query_files(keywords=None, accession=None, array=None, ef=None, efv=None, expdesign=None, exptype=None, gxa=None, pmid=None, sa=None, species=None, expandefo=None, directsub=None, assaycount=None, efcount=None, samplecount=None, rawcount=None, fgemcount=None, miamescore=None, date=None, format='json', wholewords=None, connection=None)
Query ArrayExpress files. See query_experiments for the arguments.
>>> query_files(species="Mus musculus", ef="developmental_stage",
...             efv="embryo", format="xml")
<xml.etree.ElementTree.ElementTree object ...