Genomic Data Viewed on Biological Pathways with New Computer Program

May 06, 2002
News Office: Laura Lane (415) 695-3833

A new computer program is helping genomic researchers make sense of the reams of data -- a massive collection of numbers and decimals -- that result from using DNA microarrays, the widespread method of examining gene expression. Called GenMAPP (Gene MicroArray Pathway Profiler), the program displays gene expression data in the context of known biological pathways so scientists can see how their results fit in with real-life examples.

"GenMAPP provides a new way of looking at genomic information," said project leader Dr. Bruce Conklin, investigator at the Gladstone Institute of Cardiovascular Disease and UCSF associate professor of medicine, molecular and cellular pharmacology. "Gene expression data was in one world and known biology was in another. GenMAPP helps to connect the two."

The program is the subject of a paper published in the May issue of Nature Genetics.

The flood of sequences from various genome-sequencing projects has paved the way for large-scale experiments to study gene expression. Just one experiment can yield information from thousands of genes. GenMAPP organizes the results by biological process, allowing researchers to see coordinated changes in gene expression that would be difficult to see when looking at all the data at once.

Since its beta release one year ago, more than 1,000 scientists from more than 35 different countries have registered to download the program, which is distributed freely to the public through its website. Feedback from these users has been used to refine the program and guide plans for its future development.

The program is the brainchild of the Conklin lab at Gladstone, whose members began drawing graphical representations of biological pathways that included genes for which they had expression data. With these diagrams, they could examine the biological significance of a gene's change in expression, information obtained from DNA microarrays. Finding the diagrams useful, they hired a computer programmer to write a program that would enable them to input their data easily and generate diagrams called MAPPs.

With GenMAPP, the user simply chooses one of the thousands of MAPPs already available at the GenMAPP website or draws a MAPP using the program's graphics tools.

Currently, 1,009 MAPPs are available for mouse, 1,905 for human, 345 for rat and 633 for yeast. The user imports microarray data into the MAPP and picks the colors that will represent the genes' down- and up-regulation. The colored MAPP then appears, displaying the gene expression data.
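The coloring step can be pictured with a short Python sketch. It is an illustration only, not GenMAPP's own code or file format: the function name `color_for`, the threshold of 1.0, the color names and the expression values are all assumptions made up for this example.

```python
# Hypothetical illustration of coloring genes by expression change.
# Not GenMAPP's actual code: names, threshold and values are assumed.

def color_for(log2_fold_change, threshold=1.0,
              up="red", down="blue", unchanged="gray"):
    """Return a display color for one gene's log2 fold change."""
    if log2_fold_change >= threshold:
        return up          # up-regulated
    if log2_fold_change <= -threshold:
        return down        # down-regulated
    return unchanged       # change too small to call either way

# Made-up microarray results (log2 fold change per gene).
expression = {"GeneA": 1.8, "GeneB": -1.3, "GeneC": 0.2}

# Assign each gene box the color chosen for its direction of change.
colors = {gene: color_for(value) for gene, value in expression.items()}
print(colors)
```

The user's choice of colors corresponds to the `up` and `down` arguments here; the cutoff for calling a gene changed is likewise a choice, not a fixed rule.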

In MAPPs, boxes represent genes and arrows represent the direction of the cascade of genes involved in the biological pathway. For example, the MAPP for the fatty acid degradation pathway includes gene boxes for lipoprotein lipase and glycerol kinase, to name a couple of genes. The arrows point from lipoprotein lipase to glycerol kinase, showing that lipoprotein lipase acts on triacylglycerol so that it becomes glycerol and fatty acid. Glycerol kinase then takes the glycerol and transforms it into L-glycerol-3-phosphate.
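One way to picture the boxes and arrows is as a small directed graph. The Python sketch below is a minimal illustration under that assumption: the `Mapp` class and its methods are hypothetical and do not describe how GenMAPP stores its diagrams. Only the two gene names and the direction of the arrow come from the example above.

```python
# Hypothetical data structure for a pathway diagram: gene boxes plus
# directed arrows. Not GenMAPP's actual internal representation.
from dataclasses import dataclass, field


@dataclass
class Mapp:
    name: str
    genes: set = field(default_factory=set)     # gene boxes
    arrows: list = field(default_factory=list)  # (source, target) pairs

    def add_arrow(self, source, target):
        """Add an arrow, creating either gene box if it is missing."""
        self.genes.update((source, target))
        self.arrows.append((source, target))

    def downstream(self, gene):
        """Return the genes that the given gene's arrows point to."""
        return [t for s, t in self.arrows if s == gene]


pathway = Mapp("Fatty acid degradation")
pathway.add_arrow("lipoprotein lipase", "glycerol kinase")
print(pathway.downstream("lipoprotein lipase"))
```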

Each gene box is linked to information about that gene. By clicking on the box, users can get hyperlinks to the GenBank and SWISS-PROT web pages on that gene.

Many of the existing MAPPs available at the GenMAPP web site were constructed from information gleaned from biochemistry textbooks. Some MAPPs were constructed based upon information from recent research by scientists worldwide. GenMAPP comes equipped with illustration tools so scientists can draw their own MAPPs and add to the database.

"GenMAPP becomes more useful as the number of MAPPs grows," said Kam Dahlquist, first author of the paper and Gladstone postdoctoral fellow. "The long-term goal is to have MAPPs representing all biological pathways so we can use a systems approach to analyze gene expression data and thus see the big picture."

The most widespread method of analyzing gene expression data is called hierarchical clustering, which groups genes with similar levels of expression. This method is powerful because it groups genes without any prior knowledge of the genes' functions. GenMAPP takes an opposite, complementary approach. By looking at genes in the context of a known biological process, it is possible to make sense of data that would otherwise be uninterpretable.

Furthermore, small changes in gene expression may be missed in hierarchical clustering, yet have real meaning when displayed on a biological pathway. Hierarchical clustering and the pathway-based GenMAPP work together to help in interpreting biological data.
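The point about small changes can be illustrated with a short Python sketch that groups genes by pathway and averages their changes. This is not GenMAPP's method, which displays the data on a diagram rather than computing a score. The gene names, pathway names and values are made up, and `pathway_means` is a hypothetical helper.

```python
# Hypothetical illustration: modest changes that look unremarkable gene
# by gene can stand out when the genes are grouped by pathway.
from statistics import mean

# Made-up log2 fold changes; none is large on its own.
changes = {"gene1": 0.4, "gene2": 0.5, "gene3": 0.3,
           "gene4": -0.1, "gene5": 0.1}

# Made-up pathway membership.
pathways = {"pathway X": ["gene1", "gene2", "gene3"],
            "pathway Y": ["gene4", "gene5"]}


def pathway_means(changes, pathways):
    """Average the measured changes of the genes in each pathway."""
    result = {}
    for name, genes in pathways.items():
        measured = [changes[g] for g in genes if g in changes]
        if measured:  # skip pathways with no measured genes
            result[name] = mean(measured)
    return result


means = pathway_means(changes, pathways)
for name, value in means.items():
    print(f"{name}: {value:+.2f}")
```

In this invented data set, every gene in pathway X shifts in the same direction while pathway Y shows no net change, the kind of coordinated pattern that is easier to see in pathway context.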

"This is part of a larger process of putting all the pieces together and has tremendous value for figuring out diseases," Conklin said.

Currently, GenMAPP is the only freely available program for drawing, viewing, and sharing pathway information in a format that can be used with gene expression data. There are no barriers to prevent any scientist from using the program to modify MAPPs according to his or her hypothesis, to design new pathways and to share the data with others in the community.

Conklin said he hopes that GenMAPP becomes a standard means of viewing microarray data and pathway information. Already, researchers are using GenMAPP to display results at conferences and in published papers. By providing a common format to present and share data, Conklin said, GenMAPP should help biologists communicate.

"This puts us one step closer to a common goal of trying to determine how we put all the pieces of the biological puzzle together," he said.

Authors of the Nature Genetics paper also include Nathan Salomonis, Gladstone research associate; Karen Vranizan, Gladstone statistical consultant; and Steve Lawlor, professor of computers, technology, and information systems at Foothill College in Los Altos, Calif.

Funding for the development of GenMAPP came from the J. David Gladstone Institutes, the San Francisco General Hospital General Clinical Research Center, the National Heart, Lung, and Blood Institute and the NHLBI Programs for Genomics Applications.

The Gladstone Institute of Cardiovascular Disease is one of three research institutes that make up The J. David Gladstone Institutes, a private nonprofit biomedical research institution affiliated with UCSF. The institute is named for a prominent real estate developer who died in 1971. His will created a testamentary trust that reflects his long-standing interest in medical education and research.

This news release has been modified for the Web site.