Differential Expression Analysis tool


We provide a web page for performing differential expression analysis between two replicate sets from a gene expression experiment.  This analysis is similar to a fold-change analysis, in that it ranks the genes from most upregulated to most downregulated.  It also performs gene set analysis on sets of genes from subsystems and atomic regulons.   The web page can be accessed here:

http://bioseed.mcs.anl.gov/~dejongh/FIG/seedviewer.cgi?page=DifferentialExpression

Note that you must log in using your RAST account to use this web page.

Select a genome

The first page asks you to select a genome.  If you click in the text box, you will see a drop down list of all the genomes for which we have loaded gene expression data into the SEED database.  The full list of genomes and details about the experiments loaded for them are available here:

http://www.theseed.org/TheBook/GXP.htm

You can also type in the text box to filter the genomes according to the text that you enter.

Click "Load Expression Samples for Genome" to proceed.

Select expression samples

For each Replicate Set, select one or more samples from the drop-down list attached to each text box.  As before, you can type in the text box to filter the samples.  The expression values for each gene are averaged across all samples in a Replicate Set to obtain a single expression value for that gene in that set.  Every sample whose name starts with the prefix "GSM" comes from NCBI's GEO database, which can be searched at http://www.ncbi.nlm.nih.gov/geo/

Click "Run Analysis" to proceed.
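As an illustration of the averaging and ranking described above, here is a minimal Python sketch. It is not the tool's actual code. It assumes that the expression values are already normalized, that the differential expression value of a gene is simply the mean of Replicate Set 2 minus the mean of Replicate Set 1 (the control), and that genes are ranked by that difference. The function names, sample IDs and gene IDs are made up for the example.

```python
from statistics import mean


def differential_expression(expr, control_samples, treatment_samples):
    """Return {gene: mean(treatment) - mean(control)}.

    expr maps sample ID -> {gene ID -> normalized expression value}.
    Using a plain difference of means is an assumption of this sketch.
    """
    result = {}
    for gene in expr[control_samples[0]]:
        control = mean(expr[s][gene] for s in control_samples)
        treatment = mean(expr[s][gene] for s in treatment_samples)
        result[gene] = treatment - control
    return result


def rank_genes(diff):
    """Order genes from most upregulated to most downregulated."""
    return sorted(diff, key=diff.get, reverse=True)


# Toy data: hypothetical sample and gene IDs, not real GEO samples.
expr = {
    "ctrl_1": {"geneA": 5.0, "geneB": 8.0, "geneC": 6.0},
    "ctrl_2": {"geneA": 6.0, "geneB": 8.0, "geneC": 7.0},
    "trt_1": {"geneA": 9.0, "geneB": 4.0, "geneC": 6.0},
    "trt_2": {"geneA": 8.0, "geneB": 5.0, "geneC": 7.0},
}

diff = differential_expression(expr, ["ctrl_1", "ctrl_2"], ["trt_1", "trt_2"])
ranking = rank_genes(diff)
print(ranking)
```

In this toy example geneA comes out as the most upregulated gene and geneB as the most downregulated one, which mirrors the top and bottom of the ranked table that the tool produces.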

Differential Expression Results

The results table has three tabs:

The "Subsystems" tab (green) shows the results of the gene set analysis for each Subsystem.  The "Size" column shows how many pegs/rnas in the query genome are associated with each subsystem in the SEED database.  The "Mean" column shows the mean differential expression value for all features in the subsystem.  The two "p-value" columns, one for over expression and one for under expression, give a probability estimate of how likely it is that a set of genes of the given size will have the given mean differential expression value.  Look for the subsystems with the highest means and the lowest over expressed p-values, as well as those with the lowest means and the lowest under expressed p-values.

The "Atomic Regulons" tab (green) shows a similar gene set analysis for each atomic regulon.

The "Genome Features" tab shows ranked differential expression results for each protein-encoding gene ("peg") and RNA in the genome.  Specific peg and rna features can be found by typing in the text box below the heading "Feature".  Likewise, the "Function", "Subsystems", and "Atomic Regulon" columns are all searchable.  All columns can be sorted by clicking on the up/down arrows.  Since p-values could not be computed for individual features, the column "Rank out of xxxx" can be used instead to evaluate the differential expression result for each feature in the context of the entire genome.
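To make the "Size", "Mean" and "p-value" columns more concrete, the following Python sketch shows one common way such gene set statistics can be estimated: by comparing the mean of a gene set with the means of randomly drawn gene sets of the same size. This is an assumption made for illustration only. This page does not state how the tool computes its p-values, and the function name, the number of random draws and the toy data are all hypothetical.

```python
import random
from statistics import mean


def gene_set_pvalues(diff, gene_set, n_draws=2000, seed=0):
    """Return (size, observed mean, over p-value, under p-value) for one gene set.

    diff maps gene ID -> differential expression value for the whole genome.
    The p-values are empirical estimates from random gene sets of equal size
    (an assumed method, not necessarily the one used by the tool).
    """
    rng = random.Random(seed)
    genome_values = list(diff.values())
    members = [g for g in gene_set if g in diff]
    size = len(members)
    observed = mean(diff[g] for g in members)

    at_least_as_high = 0
    at_least_as_low = 0
    for _ in range(n_draws):
        random_mean = mean(rng.sample(genome_values, size))
        if random_mean >= observed:
            at_least_as_high += 1
        if random_mean <= observed:
            at_least_as_low += 1

    # Add one to numerator and denominator so an estimate is never exactly 0.
    p_over = (at_least_as_high + 1) / (n_draws + 1)
    p_under = (at_least_as_low + 1) / (n_draws + 1)
    return size, observed, p_over, p_under


# Toy genome of 100 genes with values from -5.0 to 4.9 (made-up data).
diff = {f"gene{i}": (i - 50) / 10 for i in range(100)}
top_set = ["gene95", "gene96", "gene97", "gene98", "gene99"]

size, observed, p_over, p_under = gene_set_pvalues(diff, top_set)
print(size, observed, p_over, p_under)
```

A set made of the five most upregulated genes gets a high mean, a very small over expressed p-value and an under expressed p-value close to 1, which is the pattern to look for at the top of the "Subsystems" and "Atomic Regulons" tabs.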

 

Exercise:

For a trial run of this tool, you are welcome either to follow along with the example on the screen or to choose one of the test cases suggested here.

 

  1. Open the list of available experiments, locate Pseudomonas aeruginosa PAO1, and follow the provided link to GEO Platform GPL84
  2. Scroll to about mid-page and open/maximize the description of the Series (58)
  3. Scroll down again (or use your browser's "Find on page" function) to locate the series/experiment "GSE22665"
  4. Click on it to open the detailed description of the experiment "Expression data from P. aeruginosa colonizing the Murine GI Tract" (PMID: 21170272).   A shortcut to this GEO page: http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE22665
  5. Scroll to about mid-page again and open the list of Samples (6)

 * * * * * *

  1. Open the Differential Expression tool in a separate window of your browser.
  2. On the "Select a Genome" page, choose Pseudomonas aeruginosa PAO1 and press the "Load Expression Samples for Genome" button
  3. Start typing, or copy and paste, the GEO ID of one of the control samples (e.g. GSM562112) into the "Expression Sample 1" field.  Use the "Add Another Sample" button to add the two remaining control replicate samples.  They will all be averaged into "Replicate Set 1" (note that this tool treats Replicate Set 1 as the control)
  4. Likewise, enter the three replicate samples (GSM562109, etc.) into the "Replicate Set 2" fields
  5. Hit "Run Analysis"
  6. Explore the Results Table:
    1. Sort the "Subsystems" tab (green) by mean differential expression value.  Which subsystems are preferentially expressed during gut colonization?  Which are repressed?  There are two ways to analyze a subsystem of interest: (i) follow the corresponding link from the "Subsystems" tab, which takes you to the underlying SEED database and leaves the Differential Expression viewer; (ii) alternatively, copy the subsystem (SS) name and paste it into the "Subsystems" text box under the "Genome Features" tab.  This way the expression of each PEG or RNA associated with the subsystem can be explored
    2. Sort the "Genome Features" tab (green) by Rank and explore the results
    3. Which Atomic Regulons are represented among the highest-ranking PEGs?  Find them under the "Atomic Regulons" tab and follow the links
    4. Try typing "rna" in the text box below the heading "Feature"

 

  1. The beauty of the normalization procedure used to build our microarray data compendium is that it allows comparison between any samples within the data collection, regardless of which experiment they were originally generated in.  For example, gene expression in sterile water can be compared to Pseudomonas aeruginosa planktonic growth in rich medium (sample GSM260371), rather than to the colonizing cells in this experiment