PROTEOMICS
Pegging Protein Function to DNA Sequence
Researchers Use Worm Sex Organ to Demonstrate Meaning of Genome Letters
Lost amid the media fanfare greeting new DNA sequencing milestones, most recently that of human chromosome 22, is the less glamorous fact that eye-glazing streams of As, Cs, Ts, and Gs per se reveal little about how a creature works. "How do you get from a complete genome sequence back to the biology of the organism? This is a big, unanswered question," says Marc Vidal, currently HMS instructor in medicine at the Cancer Center at Massachusetts General Hospital. In the Jan. 7 Science, a research team led by Vidal takes a stab at this problem. In a report accompanied by a Perspectives article, the scientists describe how they have tested a way of discovering which proteins interact with one another. The method uses a high-throughput version of a standardized assay to generate data on hundreds of proteins in one experiment.
Marc Vidal (left) and Marian Walhout have developed a method to study how otherwise completely unknown gene products interact.
It is the first functional genomics study that attempts to reveal the protein-protein interactions in the tiny worm Caenorhabditis elegans, which last year became the first animal to have its genome fully sequenced. Previous work in the field interpreted the genomes of yeast or bacteria, and much of it uses DNA chips to assess the expression of all genes of these cells under a given condition. But while expression studies indicate which proteins may be involved in the biological function at hand, it is the proteins that actually carry it out, making protein studies a more direct approach to understanding how things work.
The goal of this research is to find quick ways to "annotate" genes predicted to exist by the genome sequence but that remain unexplored by conventional genetics. In C. elegans, 20 years of molecular genetics, studying a gene at a time, has assigned a function to only 1,277 of the 19,293 genes thought to compose the worm's genome.
Yet quick often implies quick and dirty. Indeed, Vidal concedes that molecular biologists rightly fault functional genomics for the high rate of false-positive and false-negative results produced by the automated screening methods used to test large numbers of genes all at once. That is why the authors devote much of the current paper to discussing ways for validating the potential protein-protein interactions their screen has generated.
Functional genomics as Vidal develops it is routinely used in genomics companies, but academia has been slower to embrace it. Beyond criticizing the artificial nature of its screens, some academic scientists question genomics because it is not driven by a specific hypothesis. Vidal, who is setting up his lab as an HMS assistant professor of genetics at the Dana-Farber Cancer Institute, says, "We do not have a hypothesis." But he adds, "We hope to generate sensible hypotheses that can be tested."
Seeing Who Does What to Whom
In this study, first author Marian Walhout, HMS research fellow in Vidal's lab, used an improved and automated version of the two-hybrid screen, in which C. elegans genes are cloned into yeast strains in such a way that an interaction between two expressed proteins will allow the yeast cells to grow.
To test their approach, the scientists chose to study which proteins interact to form the worm's vulva. This area has been studied extensively with conventional genetics, allowing Vidal's group to check the reliability of their screen. First, Walhout ran a matrix experiment testing 29 proteins implicated in vulva development against each other. The screen picked up six of 11 known interactions and suggested two new ones. Then she tested each of the 29 vulva proteins individually against thousands of C. elegans proteins expressed from a cDNA library. That experiment generated 150 potential protein-protein interactions, not necessarily all connected to vulva development, involving 126 genes. Interestingly, 110 of those genes fall among those 18,000 for which no information is available. "So one small sampling of the genome with just these 29 proteins produced a first annotation for 110 genes," says Vidal.
But how can one tell the real interactions from the artifacts? "We are trying to develop a set of increasing heuristic values that gradually establish confidence in the data generated by the screens," says Vidal. For instance, they introduce the concept of the "interolog," meaning that a known interaction between homologous proteins in another species makes a potential interaction in the worm screens more plausible. Such combing of the literature, however, enables only the transfer of already known protein-protein interactions back to C. elegans. To assess new interactions, the authors adapted a method of data analysis to extract patterns from the 150 interactions.
The method essentially looks for snakes that bite their own tails: patterns where A interacts with B, B with C, C with D, and so on, until one protein harks back to A. The premise is that such clusters again increase the likelihood that the proteins actually work together in the worm. Finally, by integrating these patterns with whatever information is known about some of the involved proteins, the researchers suggest networks of protein interaction that geneticists can test (see image). Even so, Vidal's comprehensive effort to establish protein-protein interactions for most of the C. elegans genes might be wasted if its data were to stand in isolation, he says. But they will not. His is only one part of a larger effort to connect several different genomics/proteomics approaches through a set of hyperlinked databases. A group at Stanford is gathering gene expression data on 12,000 worm genes in parallel. Other academic groups are working on systematically knocking out genes and on visualizing where proteins reside in the cell.
  This nematode has developed multiple female genitals (arrows) thanks to mutations in two of the genes depicted on the right. The diagram exemplifies how scientists including Marc Vidal are beginning to extract from the worm's genome sequence information about its proteins. This work represents an early attempt at handling many proteins at the same time and placing them in functional relationships. In this case, screens testing many proteins in parallel generated large sets of raw data from which Vidal identified clusters of protein interactions that feed back on each other. The diagram represents such a cluster (protein abbreviations shown). Circled proteins were previously described; adjacent ones are known to interact physically; red ones probably act in linear pathways. Plain labels denote gene products merely predicted by the DNA sequence, for which no information is available. Vidal's top-down approach posits that these genes all interact, and he invites genetic experiments to test that notion.
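For readers who want to see the "snake biting its own tail" idea concretely, the closed-loop search can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis the authors actually ran: the function names, the `max_len` cutoff, and the placeholder protein labels A through E are all hypothetical, and the input is simply a list of protein pairs reported as interacting.

```python
from collections import defaultdict


def build_graph(pairs):
    """Turn a list of (protein, protein) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:  # a protein binding itself cannot close a loop of 3 or more
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_closed_loops(pairs, max_len=6):
    """List every closed loop of 3..max_len proteins, each reported once.

    A loop is written starting from its alphabetically smallest member,
    so A-B-C-D and D-C-B-A count as the same loop. max_len is an
    illustrative cutoff, not a value taken from the study.
    """
    graph = build_graph(pairs)
    loops = set()

    def extend(path):
        start, last = path[0], path[-1]
        for nxt in sorted(graph[last]):
            if nxt == start and len(path) >= 3:
                # The chain has come back to where it started.
                # Both travel directions find it; keep one canonical form.
                if path[1] < path[-1]:
                    loops.add(tuple(path))
                else:
                    loops.add((start,) + tuple(reversed(path[1:])))
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only walk to "larger" names so each loop starts at its
                # smallest member, and never revisit a protein.
                extend(path + [nxt])

    for node in sorted(graph):
        extend([node])
    return sorted(loops)


# Hypothetical screen output: A-B-C-D closes on itself, E dangles off D.
example_pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("D", "E")]
print(find_closed_loops(example_pairs))  # [('A', 'B', 'C', 'D')]
```

In this toy input, protein E interacts only with D and so belongs to no loop. By the premise described above, its single interaction would earn less confidence than the four that reinforce one another.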
In Silico Biology
The academic labs are expected to hyperlink their respective Web pages to a central database on worm genes maintained by Proteome Inc., a Beverly, Mass., company that curates published information about genes. Access for academic scientists is free. Vidal and Walhout expect that such databases will eventually be established for other model organisms (one for yeast already exists) and linked together into one powerful resource. Taking the long view, Vidal hopes this resource will change the way biomedical scientists form research hypotheses a few years from now. Before starting the first experiment, a scientist could consult functional genomics databases to learn everything known about dozens of genes involved in a favorite problem, and then decide on a set of genes to study in detail with genetic tests tailored to the problem. In short, genetics and genomics complement, and need, each other, says Vidal.
Gabrielle Strobel