============================================================
Gene Set Enrichment Analysis (GSEA, :mod:`gsea`)
============================================================

.. py:currentmodule:: orangecontrib.bio.gsea

Gene Set Enrichment Analysis (GSEA) [GSEA]_ aims to identify enriched gene
sets given gene expression data for multiple samples with their phenotypes.

.. autofunction:: orangecontrib.bio.gsea.run

.. autofunction:: orangecontrib.bio.gsea.direct

Examples: gene expression data
------------------------------

The following examples use a gene expression data set from the GEO database.
They show the same analysis on two layouts of the data.

With samples as instances (in rows):

.. literalinclude:: code/gsea_instances.py

With samples as features (in columns):

.. literalinclude:: code/gsea_genes.py

Both scripts output::

    GSEA results (descriptor: tissue)
    LABEL                              NES    FDR  SIZE  MATCHED
    Porphyrin and chlorophyll meta  -1.817  0.000    43       23
    Staphylococcus aureus infectio  -1.998  0.000    59       28
    Non-homologous end-joining       1.812  0.000    13       12
    Fanconi anemia pathway           1.911  0.000    53       27
    Cell cycle                       1.777  0.000   124      106
    Glycine, serine and threonine   -2.068  0.000    39       29
    HIF-1 signaling pathway         -1.746  0.000   106       90
    Ether lipid metabolism          -1.788  0.000    42       27
    Fc epsilon RI signaling pathwa  -1.743  0.000    70       53
    B cell receptor signaling path  -1.782  0.000    72       62

Example: our own gene sets
--------------------------

We present a simple example on the iris data set. Because it is not a gene
expression data set, we specify our own sets of features that belong
together.

.. literalinclude:: code/gsea1.py

The output::

    LABEL     NES  P-VAL  GENES
    sepal   1.087  0.630  ['sepal width', 'sepal length']
    petal  -1.117  0.771  ['petal width', 'petal length']

Example: directly passing correlation data
------------------------------------------

GSEA can also directly use correlation data between individual genes and a
phenotype.
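
To make "correlation data" concrete, the following is a minimal sketch in
plain Python. It does not use :mod:`orangecontrib.bio.gsea`, and the helper
names ``pearson``, ``rank_genes`` and ``enrichment_score`` are hypothetical.
Assuming a two-class phenotype coded as 0/1, it correlates each gene with the
phenotype, ranks the genes by that correlation, and computes the weighted
running-sum enrichment score (ES) described in [GSEA]_. The normalization to
NES and the permutation-based significance estimates reported by the module
are not reproduced here.

.. code-block:: python

    import math

    # Hypothetical helpers for illustration; not part of orangecontrib.bio.gsea.


    def pearson(xs, ys):
        """Pearson correlation coefficient of two equal-length sequences."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)


    def rank_genes(expression, phenotype):
        """Correlate each gene with the phenotype; sort by decreasing correlation.

        expression: dict mapping gene name -> list of values, one per sample.
        phenotype: list of 0/1 class labels, one per sample (an assumption).
        """
        corr = {gene: pearson(values, phenotype)
                for gene, values in expression.items()}
        return sorted(corr.items(), key=lambda item: item[1], reverse=True)


    def enrichment_score(ranked, gene_set, p=1.0):
        """Weighted running-sum enrichment score, following the idea in [GSEA].

        Walking down the ranked list, a gene in the set raises the running sum
        by |r|**p / N_R (N_R = sum of |r|**p over the set's genes); any other
        gene lowers it by 1 / (number of genes outside the set). The ES is the
        running-sum value farthest from zero.
        """
        hits = [gene in gene_set for gene, _ in ranked]
        n_miss = len(ranked) - sum(hits)
        n_r = sum(abs(r) ** p for (_, r), hit in zip(ranked, hits) if hit)
        if n_miss == 0 or n_r == 0:
            raise ValueError("gene set must match some, but not all, genes")
        running = es = 0.0
        for (_, r), hit in zip(ranked, hits):
            running += abs(r) ** p / n_r if hit else -1.0 / n_miss
            if abs(running) > abs(es):
                es = running
        return es


    if __name__ == "__main__":
        # Made-up toy data: samples 1-2 have phenotype 0, samples 3-4 phenotype 1.
        expression = {
            "G1": [1.0, 2.0, 5.0, 6.0],
            "G2": [1.5, 1.0, 4.0, 5.5],
            "G3": [3.0, 3.2, 2.9, 3.1],
            "G4": [6.0, 5.0, 1.0, 2.0],
        }
        phenotype = [0, 0, 1, 1]
        ranked = rank_genes(expression, phenotype)
        print([gene for gene, _ in ranked])  # ['G1', 'G2', 'G3', 'G4']
        print(round(enrichment_score(ranked, {"G1", "G2"}), 3))  # 1.0

Here the set ``{G1, G2}`` sits at the top of the ranking and gets an ES of
about 1.0, while a set concentrated at the bottom, such as ``{G4}``, gets a
negative ES. In GSEA, a negative score likewise indicates enrichment among
genes anti-correlated with the phenotype.
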

Such correlation data can be given in one of two forms: (1) a data set with
a single example, whose attribute names are gene names, or (2) a data set
with only one continuous feature, in which case gene names are read from the
first :obj:`Orange.feature.String`. The following script outputs the ten
pathways with the smallest p-values:

.. literalinclude:: code/gsea2.py

The output::

    LABEL                                   NES  P-VAL  SIZE  MATCHED
    Biosynthesis of amino acids           1.407  0.056    58       40
    beta-Alanine metabolism               1.165  0.232    13       10
    Taurine and hypotaurine metabolism    1.160  0.413     4        3
    Porphyrin and chlorophyll metabolis  -0.990  0.517    14        5
    Valine, leucine and isoleucine degr   0.897  0.585    29       21
    Ether lipid metabolism                0.713  0.857    10        6
    Biosynthesis of unsaturated fatty a   0.659  0.922    10        6
    Protein processing in endoplasmic r   0.647  0.941    71       40
    RNA polymerase                        0.550  0.943    24        7
    Glycosylphosphatidylinositol(GPI)-a  -0.540  0.946    19        4

.. [GSEA] Subramanian, A. et al. Gene set enrichment analysis: a
   knowledge-based approach for interpreting genome-wide expression profiles.
   *PNAS* 102(43):15545-15550, 2005.