Onto-Express,
a tool that translates expression profiling outcomes into functional profiles
High-throughput expression profiling was initially
designed to provide a means of assigning functions to genes at the same pace
as sequencing projects released data. In the mid-1990s, when genome-wide
sequencing projects were launched, the community realized it would no longer
be possible to elucidate gene functions, one at a time, with the same laboratory
set-up. Two major concepts led to the design of high-throughput expression
profiling with microarrays. First, the most tightly controlled event in the
biology of the cell is the initiation of transcription. Accordingly, any gene
whose transcript level changes during a given process is more likely to be
involved in that process. Second, a biological output, triggered by any input,
is the result of a series of events usually grouped in a set referred to as a
cascade. The components of such a cascade, namely proteins, act upon each other
in a stepwise manner, according to some prescribed pattern of interaction. This
pattern of interaction is represented by pathways. If, in a time course
experiment assayed by microarray,
several genes can be clustered together, the ones for which sequence data
were the only experimental evidence of their existence can then be assigned
to a biological process by virtue of being co-regulated with genes of known
functions. This concept was introduced as such by Schena et al. in 1995: "The
temporal, developmental, topographical, histological, and physiological patterns
in which a gene is expressed provide clues to its biological role".
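The co-regulation argument above can be illustrated with a minimal sketch. It assumes that "clustered together" means a high Pearson correlation between time-course profiles; the gene names, expression values, process labels and the 0.9 threshold are all made up for illustration and do not come from any cited study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-regulation signal
    return cov / sqrt(var_x * var_y)

def suggest_process(profile, annotated, min_r=0.9):
    """Return the process of the best-correlated annotated gene, or None.

    min_r is an arbitrary illustrative threshold, not a recommended value.
    """
    best_process, best_r = None, min_r
    for known_profile, process in annotated.values():
        r = pearson(profile, known_profile)
        if r >= best_r:
            best_process, best_r = process, r
    return best_process

# Hypothetical time course: log ratios at five time points.
annotated = {
    "geneA": ([0.1, 1.2, 2.5, 2.0, 0.8], "cell cycle"),
    "geneB": ([2.0, 1.0, 0.2, -0.5, -1.0], "lipid metabolism"),
}
unknown_est = [0.0, 1.0, 2.4, 2.1, 0.9]  # known only from its sequence
suggestion = suggest_process(unknown_est, annotated)
```

Here the unannotated sequence follows the same rise and fall as geneA, so the sketch proposes geneA's process as a working hypothesis. Such an assignment is a clue to be tested, not a proof of function.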
On the other hand, expression profiling captures data that hold information
about the biological object under investigation. As high-throughput genomics
methods have continued to develop over several years, a growing number of
genes have been assigned functions. Therefore, from a transcriptional
response to a given biological event, measured by a high-throughput system
(e.g. SAGE or DNA microarray), one could potentially infer knowledge about
the physiology of the biological object under investigation. One famous example
of this deduction process is the revelation of how human fibroblasts respond
to serum. Iyer et al. (1999) assayed human fibroblast mRNA with a 3.7 k cDNA
array and showed that these cells are committed to the physiology of wound
repair.
In other words, any given expression profiling experiment delivers data that
can potentially be mined to annotate genes, provide insight about which genes
are involved in the inception of a biological event, and finally describe
complex biological systems at the molecular level. The last of these goals
requires translating the outcome of expression profiling, ultimately represented
by files holding gene identifiers, into a biologically meaningful end result.
Accomplishing this task relies both on a common standard for gene
annotations and on an automated system designed for retrieving these annotations
from input files containing lists of gene identifiers. The former task is
undertaken by a group of major biological database administrators, referred to
as the Gene Ontology (GO) Consortium. The GO Consortium is establishing a
"dynamic controlled vocabulary that can be applied to all organisms even
as knowledge of gene and protein roles in cells is accumulating and changing"
(http://www.geneontology.org/).
The aim is to assign each gene product three attributes: its molecular
function(s), the biological process(es) in which it acts, and the cellular
component(s) of which it is part. Loosely speaking, the gene data structure
specified by GO can be seen as an effort to order genes the same way
elements are ordered in the periodic table. The latter task, which consists of
automatically retrieving the biological attributes for each gene
identifier and visualizing the result of the query, requires a sophisticated
software tool. Onto-Express was designed precisely to handle this
task (http://vortex.cs.wayne.edu/Projects.html).
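As a rough illustration of the retrieval step, the sketch below looks up the three GO attributes for each identifier in an input list. The in-memory table, the identifiers and the `annotate` helper are hypothetical stand-ins; they do not reflect the actual Onto-Express database schema or interface.

```python
# Toy stand-in for an annotation database. Identifiers and their term
# assignments are invented for illustration only.
ANNOTATIONS = {
    "ID_001": {
        "molecular_function": ["kinase activity"],
        "biological_process": ["signal transduction"],
        "cellular_component": ["cytoplasm"],
    },
    "ID_002": {
        "molecular_function": ["DNA binding"],
        "biological_process": ["regulation of transcription"],
        "cellular_component": ["nucleus"],
    },
}

def annotate(gene_ids, annotations=ANNOTATIONS):
    """Split an input list into annotated genes and identifiers without data."""
    found, missing = {}, []
    for gene_id in gene_ids:
        if gene_id in annotations:
            found[gene_id] = annotations[gene_id]
        else:
            missing.append(gene_id)
    return found, missing

# "ID_003" is deliberately absent from the toy table.
found, missing = annotate(["ID_001", "ID_003"])
```

Keeping the unannotated identifiers in a separate list makes it explicit how much of an input file could not be translated into biological attributes.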
Onto-Express relies on a large relational database that stores relevant sequence
and annotation information from various public data sources, including LocusLink,
RefSeq, UniGene, dbEST, Gene Ontology, KEGG Genes, KEGG Ligand and KEGG
Pathways. Currently, this database contains information about more than 6
million sequences and is hosted in the Department of Computer Science, Wayne
State University, USA. The Onto-Express tool mines the available functional
annotation data and returns a functional profile of the biological system
being studied. For each set of genes found to be differentially regulated
in a condition, Onto-Express constructs a number of functional profiles. These
functional profiles include the biochemical function, biological process,
cellular component, cellular role and genome map position, provided
these data are available. Onto-Express delivers a meaningful graphical representation
of the functional profiles, referred to as the ontogeny. The ontogeny of the biological
system can, for instance, be displayed as a pie chart whose slices represent
the relative share of each biological process affected in the system under
study. A decisive feature of Onto-Express V2 is that it associates
each result with a statistical significance. This feature allows the investigator
to distinguish biological processes that are significantly affected from those
that appear only by chance.
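To make the idea of a functional profile with attached significance concrete, here is a minimal sketch. The text does not say which statistical model is used, so the one-sided hypergeometric test below is an assumption, chosen because it is a common way to score over-representation of a category. The 20-gene array and its annotations are made up.

```python
from collections import Counter
from math import comb

def functional_profile(gene_ids, gene_to_process):
    """Count how many of the given genes fall into each biological process."""
    return Counter(
        process
        for gene_id in gene_ids
        for process in gene_to_process.get(gene_id, [])
    )

def enrichment_p_value(k, n, K, N):
    """P(X >= k) under a hypergeometric model (assumed, not from the text).

    N genes on the array, K of them in the category, n genes selected,
    k of the selected genes in the category.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Made-up array of 20 genes: 5 annotated to apoptosis, 15 to metabolism.
gene_to_process = {f"g{i}": ["apoptosis"] for i in range(5)}
gene_to_process.update({f"g{i}": ["metabolism"] for i in range(5, 20)})
regulated = ["g0", "g1", "g2", "g10"]  # hypothetical differentially regulated set

profile = functional_profile(regulated, gene_to_process)
array_counts = functional_profile(gene_to_process, gene_to_process)
p_values = {
    process: enrichment_p_value(
        count, len(regulated), array_counts[process], len(gene_to_process)
    )
    for process, count in profile.items()
}
```

In this toy case, three of the four regulated genes belong to a category that covers only a quarter of the array, which gives a p-value of about 0.03 for apoptosis. The single metabolism hit is what chance alone would produce, with a p-value close to 1. The counts in `profile` are also what a pie chart of the profile would display as relative shares.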