Onto-Express, a tool that translates expression profiling outcomes into functional profiles

High-throughput expression profiling was initially designed to provide a means of assigning functions to genes at the same pace as sequencing projects released data. In the mid-1990s, when genome-wide sequencing projects were launched, the community realized that it would no longer be possible to elucidate gene functions one at a time with the conventional laboratory set-up. Two major concepts led to the design of high-throughput expression profiling with microarrays. First, the most tightly controlled event in the biology of the cell is the initiation of transcription; accordingly, any gene whose transcript level changes during a given process is likely to be involved in that process. Second, a biological output triggered by any input is the result of a series of events usually grouped into a set referred to as a cascade. The components of these series, namely proteins, act upon each other in a stepwise manner according to a prescribed pattern of interaction, and this pattern is represented by pathways. If, in a time course experiment assayed by microarray, several genes can be clustered together, those whose existence is supported only by sequence data can be assigned to a biological process by virtue of being co-regulated with genes of known function. This concept was introduced by Schena et al. in 1995: "The temporal, developmental, topographical, histological, and physiological patterns in which a gene is expressed provide clues to its biological role".
On the other hand, expression profiling captures data that hold information about the biological object under investigation. As high-throughput genomics methods have developed over the years, a growing number of genes have been assigned functions. Therefore, from a transcriptional response to a given biological event, measured by a high-throughput system (e.g. SAGE or DNA microarray), one can potentially infer knowledge about the physiology of the biological object under investigation. One famous example of this deduction process is the discovery of how human fibroblasts respond to serum: Iyer et al. (1999) assayed human fibroblast mRNA with a 3.7 k cDNA array and showed that these cells are committed to the physiology of wound repair.
In other words, any given expression profiling experiment delivers data that can potentially be mined to annotate genes, to provide insight into which genes are involved in the inception of a biological event, and finally to describe complex biological systems at the molecular level. The last of these goals requires translating the outcome of expression profiling, ultimately represented by files holding gene identifiers, into a biologically meaningful result. Accomplishing this relies both on a common standard for gene annotation and on an automated system for retrieving these annotations from input files containing lists of gene identifiers. The former task is undertaken by a group of major biological database administrators, the Gene Ontology (GO) Consortium. The GO Consortium is establishing a "dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing" (http://www.geneontology.org/). Each gene product is to be assigned three attributes: its molecular function(s), the biological process(es) in which it acts, and the cellular component(s) of which it is part. By analogy, the gene data structure specified by GO can be seen as an effort to order genes the way atoms are ordered in the periodic table. The latter task, which consists of automatically retrieving the biological attributes for each gene identifier and visualizing the result of the query, requires a sophisticated software tool. Onto-Express has been designed precisely for this task (http://vortex.cs.wayne.edu/Projects.html). Onto-Express relies on a large relational database that stores relevant sequence and annotation information from various public data sources, including LocusLink, RefSeq, UniGene, dbEST, Gene Ontology, KEGG Genes, KEGG Ligand and KEGG Pathways.
Currently, this database contains information about more than 6 million sequences and is hosted in the Department of Computer Science, Wayne State University, USA. The Onto-Express tool mines the available functional annotation data and returns a functional profile of the biological system being studied. For each set of genes found to be differentially regulated in a condition, Onto-Express constructs a number of functional profiles. These include the biochemical function, biological process, cellular component, cellular role and genome map position, provided these data are available. Onto-Express delivers a meaningful graphical representation of the functional profiles, referred to as an ontogeny. The ontogeny of the biological system can, for instance, be displayed as a pie chart whose slices represent the relative proportions of the biological processes affected in the system under study. A decisive feature of Onto-Express V2 is that it associates each result with a statistical significance. This allows the investigator to distinguish biological processes that are significantly affected from those that appear by chance.
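
To make the idea of a functional profile with an attached significance concrete, the following minimal Python sketch counts how many genes in a list fall into each annotation category and computes a p-value for each count. It is an illustration only, not the Onto-Express implementation: the text above does not say which statistical test the tool applies, so the one-sided hypergeometric test used here is an assumption, and the gene identifiers, the annotation table and the function names are all hypothetical.

```python
from collections import Counter
from math import comb

# Hypothetical toy annotation table: gene identifier -> biological process terms.
# A real table would be retrieved from an annotation database.
ANNOTATIONS = {
    "G1": ["wound healing"],
    "G2": ["wound healing", "cell cycle"],
    "G3": ["wound healing"],
    "G4": ["cell cycle"],
    "G5": ["cell cycle"],
    "G6": ["cell cycle"],
    "G7": ["lipid metabolism"],
    "G8": ["lipid metabolism"],
    "G9": [],
    "G10": ["cell cycle"],
}


def functional_profile(genes, annotations):
    """Count how many of the given genes are annotated to each category."""
    counts = Counter()
    for gene in genes:
        # set() guards against a term being listed twice for the same gene.
        for term in set(annotations.get(gene, ())):
            counts[term] += 1
    return counts


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the category (assumed test, see the lead-in)."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment(selected, annotations):
    """Return {category: (count in selection, p-value)}, using every gene in
    the annotation table as the reference set (e.g. all genes on the array)."""
    reference = functional_profile(annotations, annotations)
    selection = functional_profile(selected, annotations)
    N, n = len(annotations), len(selected)
    return {
        term: (k, hypergeom_pvalue(k, n, reference[term], N))
        for term, k in selection.items()
    }


if __name__ == "__main__":
    regulated = ["G1", "G2", "G3", "G4"]  # hypothetical differentially regulated genes
    results = enrichment(regulated, ANNOTATIONS)
    for term, (count, p) in sorted(results.items(), key=lambda item: item[1][1]):
        print(f"{term}: {count} genes, p = {p:.4f}")
```

In this toy example, three of the four selected genes carry the "wound healing" term although only three of the ten reference genes do, which gives a p-value of 7/210 (about 0.033). The two "cell cycle" genes, by contrast, are well within what chance would produce (p = 155/210). The reference set matters: the p-values are only meaningful relative to the full list of genes that could have been selected.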