GENETICS

Worm Genes Confirmed and Cloned for Proteomic Tool Kit

With the completion of a high-quality, comprehensive sequence of the human genome, announced this week, scientists face the immense challenge of finding and understanding its working parts.


Behind Marc Vidal is the worm ORFeome, a collection of nearly 20,000 PCR amplifications that yielded about 12,000 clones of open reading frames--the gene coding regions--generated by his lab. (Photo by Phil Farnsworth)


One of the next steps is to identify and make all the proteins from the protein-coding genes, estimated to number about 30,000, and to learn what they do and how they interact. To make a proteome, researchers must extract a gene's exact coding sequences--called open reading frames (ORFs)--interspersed among the so-called junk base pairs along a strand of DNA. But computer predictions for genes can be as imprecise as weather forecasts for New England.

In fact, new evidence suggests that more than half of the predicted genes in sequenced genomes may need correction. A paper in the April 7 online Nature Genetics reports the first systematic attempt to check the protein-coding precision of the genome for the worm C. elegans.

A collaborative effort based in the Dana-Farber Cancer Institute lab of Marc Vidal, HMS assistant professor of genetics, the study generated a collection of high-quality clones of about 12,000 of the 20,000 ORFs in the C. elegans genome. In the process, the researchers found the first biological evidence for about 4,000 predicted genes. In the tradition of genome and software releases, they made their first draft public (version 1.1) and are working to refine it.


"This is the first experimental verification of an entire set of genes for a multicellular organism," said Kerstin Lindblad-Toh, who led the mouse genome sequencing project at the Whitehead Institute/MIT Center for Genome Research. "It generates a very useful catalog of genes. In addition, the Vidal lab is generating actual clones for each gene, which will permit experimentation with these genes."

The results affect both the database and experimental resources for the worm genome. Of the billions of bases in genome databases, as few as 1 percent may be part of protein-coding genes. Except in yeast and simpler organisms, the coding elements, or exons, are interrupted by stretches of noncoding introns. After a piece of one strand of DNA is transcribed into RNA, the introns are excised and the exons are spliced together to make a messenger RNA. The joined exons of a gene make up its ORF. This study refines the annotated open reading frames along the C. elegans genome sequence in the database.
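The relationship between exons, introns, and an ORF can be sketched in a few lines of Python. This is a toy illustration only: the sequence, the exon coordinates, and the function names `splice_exons` and `is_orf` are invented for the example and do not come from the study.

```python
# Toy illustration: join annotated exons and check that the result reads as an ORF.
# The sequence and coordinates below are made up; introns are lowercase for readability.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_exons(genomic, exons):
    """Join exon intervals (0-based, end-exclusive) in order, dropping the introns."""
    return "".join(genomic[start:end] for start, end in exons).upper()


def is_orf(seq):
    """True if seq starts with ATG, ends with a stop codon, stays in frame,
    and has no stop codon before the last one."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


# Three exons separated by two introns.
genomic = "ATGGCC" + "gtaagtttcag" + "AAATTT" + "gtatgcag" + "GGATAA"
exons = [(0, 6), (17, 23), (31, 37)]

orf = splice_exons(genomic, exons)
print(orf, is_orf(orf))
```

If an exon boundary is predicted even one base off, the spliced sequence usually falls out of frame and fails the check, which is one reason a cloned, sequenced ORF is a useful test of a computer prediction.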

Even better, the 12,000 ORF clones are available to other scientists for individual protein studies or wholesale proteomics in a form easily transferred to their favorite protein expression vector.

"Building clones that allow us to make proteins is a key step in taking biology to the next level," said Joshua LaBaer, director of the HMS Institute of Proteomics. "These are really important resources. To understand what genes do, you have to understand proteins. Proteins make biology tick. The vast majority of diseases are a result of protein malfunction in some form or another. And virtually every pharmaceutical that we give today is either a protein itself, or it somehow alters protein function by binding to proteins and changing how they act." LaBaer's group is working on a similar effort for the human genome and other organisms, rolling out this summer with a fully sequenced set of clones for all yeast genes.

In Vidal's lab, the ORFeome project scaled up to high-throughput robotic production under the direction of postdoctoral fellow Jérôme Reboul, now setting up his own lab at INSERM, France's National Institute for Health and Medical Research, in Marseille. Reboul teamed up with graduate student Jean-François Rual, and for the bioinformatics components, postdoc Philippe Vaglio worked with graduate student Philippe Lamesch. The lab also partnered with several companies that provided reagents, PCR primers, and postcloning sequencing confirmation.

ORF Processing

As a starting point, the researchers designed 19,000 primer pairs by computer based on gene predictions from the first draft of the worm sequence made public in 1998. (The latest draft predicts 20,000 genes.) They amplified the ORFs by PCR from a high-quality cDNA library created earlier in collaboration with Sander van den Heuvel at MGH and recombined them into a versatile vector. When they tested their strategy on the small proportion of well-known ORFs annotated by 20 years of worm research from many labs, they found a 93 percent success rate of cloning ORFs from corresponding genes.
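As a rough sketch of what designing a primer pair from a predicted ORF involves, the Python below takes the first bases of the ORF as the forward primer and the reverse complement of the last bases as the reverse primer. The function names and the 20-base default are assumptions made for illustration; the lab's actual design rules (for example melting-temperature checks or vector-specific tails) are not described in this article and are not modeled here.

```python
# Naive primer-pair sketch: one primer at each end of a predicted ORF.
# Hypothetical helper names; not the software used in the study.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primer_pair(orf, length=20):
    """Forward primer = first `length` bases of the ORF;
    reverse primer = reverse complement of the last `length` bases."""
    orf = orf.upper()
    if len(orf) < length:
        raise ValueError("ORF is shorter than the requested primer length")
    forward = orf[:length]
    reverse = reverse_complement(orf[-length:])
    return forward, reverse


# Short toy ORF, so a short primer length is used here.
forward, reverse = design_primer_pair("ATGGCCAAATTTGGATAA", length=6)
print(forward, reverse)
```

Because each primer is tied to the predicted start and end of a gene, a wrong prediction at either end can keep the PCR from amplifying the ORF at all.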


To help scientists move from genome to proteome, Marc Vidal, David Hill, Philippe Lamesch, Jean-François Rual, Laurent Jacotot (clockwise from left), and colleagues published the first systematic attempt to decipher the coding sequences of an entire genome. In the process, they created a community resource of protein-making clones useful for functional proteomics. (Photo by Phil Farnsworth)


"We were surprised at the beginning that it was working so well," Reboul said. The cloning rate dropped to 80 percent and then to about 60 percent as the researchers began working on lesser known parts of the genome. In the early days, they were processing 200 ORFs a week; toward the end, they were processing 1,000 a week.

About half of the ORFs had errors in their predicted exon-intron structure, involving alternative splice sites, novel introns, or additional exons. In many cases, the researchers were able to correct the predictions based on the structure of the expressed gene cloned as an ORF. "You've got to give the computer credit for finding them in the first place," said co-author David Hill, a scientific associate at DFCI.
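One simple way to picture such a correction is to compare the predicted exon intervals with the intervals observed in the cloned sequence. The Python sketch below does this with plain set operations; the coordinates and the function name `compare_structures` are hypothetical, and the study's actual comparison method is not described here.

```python
# Sketch: compare a predicted exon structure with the one observed in a cloned ORF.
# Intervals are (start, end) pairs on the same genomic coordinates; values are made up.


def compare_structures(predicted, observed):
    """Report which exons agree and which differ between prediction and clone."""
    predicted_set, observed_set = set(predicted), set(observed)
    return {
        "confirmed": sorted(predicted_set & observed_set),
        "predicted_only": sorted(predicted_set - observed_set),
        "observed_only": sorted(observed_set - predicted_set),
        "matches": predicted_set == observed_set,
    }


predicted = [(0, 6), (17, 23), (31, 37)]
observed = [(0, 6), (17, 26), (31, 37)]  # middle exon ends at a different splice site

report = compare_structures(predicted, observed)
print(report)
```

Here the first and last exons are confirmed, while the middle exon shows up in both "predicted_only" and "observed_only", flagging a splice site that the annotation would need to update.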

"While this shows that computational gene prediction tools are very important for the identification of new ORFs, it was very interesting to see that the in silico prediction tools were struggling much more with the prediction of the internal intron-exon gene structure," said Lamesch, now working on an updated version of the ORFeome, which he hopes will include more of the approximately 8,000 genes the researchers were unable to amplify. "The most interesting technical aspect of this work is that we were able to amplify genes that are very poorly represented in cDNA libraries by using specific primers."

"This completely facilitates large-scale functional genomics and will accelerate the pace of research in individual labs tremendously," said yeast geneticist Charlie Boone, associate professor at the Banting and Best Department of Medical Research at the University of Toronto. Boone co-authored a commentary on the Vidal paper for the May 2003 print edition of Nature Genetics.

--Carol Cruzan Morton