Chalking up another victory for comparative genomics, researchers from Genoscope (the French National Sequencing Center) in Paris and the Whitehead Institute Center for Genome Research announced Oct. 25 that they have produced a sixfold sequence coverage of Tetraodon nigroviridis, a type of puffer fish whose genome is estimated to be 350 million DNA letters long.
Tetraodon is a freshwater puffer fish separated by 20 million to 30 million years of evolution from Fugu rubripes, another puffer fish whose genome sequence was announced the same day by the Department of Energy's Joint Genome Institute.
Both fish are considered to be important species to sequence because they are vertebrates and because their genomes are eight times as compact as the human genome, having many of the same genes and regulatory content as humans but with much less "junk" DNA. As a result, scientists say, finding genes and regulatory sequences in the puffer fish genome will be easier. This in turn will help researchers identify analogous genes and DNA regions in the human genome.
Along with the genome sequences of other species, the sequences of the two puffer fish will provide key tools for gaining insights into the human genome, which should in turn yield practical knowledge for developing better therapies.
PUFFER SEQUENCE
The Tetraodon puffer fish sequence represents a sixfold coverage of its genome; in other words, the overlapping DNA fragments that were sequenced add up to six times the length of the Tetraodon genome.
Whitehead and Genoscope scientists say that they now have enough coverage of the Tetraodon genome to be able to assemble it, that is, to determine the exact order of the DNA chemical bases (A, T, C and G) along the Tetraodon chromosomes.
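To make the idea of sixfold coverage concrete, the short Python sketch below works through the arithmetic using the 350-million-letter genome estimate quoted above. The second function is an assumption that is not part of the announcement: it applies the standard Lander-Waterman idealization, in which fragments land uniformly at random along the genome, to estimate what fraction of the genome is touched by at least one fragment. Real sequencing projects deviate from that idealization, so the figure is only a rough guide.

```python
import math


def total_bases_sequenced(genome_size: int, coverage: float) -> float:
    """Total DNA letters read for a given genome size and fold coverage."""
    return genome_size * coverage


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of the genome hit by at least one fragment.

    Assumes the Lander-Waterman idealization (fragments placed uniformly
    at random), which is an illustrative assumption, not a claim about
    how the Tetraodon project was actually analyzed.
    """
    return 1.0 - math.exp(-coverage)


GENOME_SIZE = 350_000_000  # estimated Tetraodon genome length, from the article
COVERAGE = 6.0             # sixfold coverage, from the article

total = total_bases_sequenced(GENOME_SIZE, COVERAGE)
fraction = expected_fraction_covered(COVERAGE)

print(f"DNA letters sequenced: {total:,.0f}")
print(f"Expected fraction of genome covered: {fraction:.4f}")
```

Under these assumptions, sixfold coverage corresponds to about 2.1 billion sequenced DNA letters, and the idealized model leaves well under 1 percent of the genome untouched, which is consistent with the scientists' statement that assembly can begin.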
The Tetraodon sequences are available online so scientists everywhere can look for Evolutionarily COnserved REgions (Ecores) using a tool called Exofish, which has already been used to help refine the estimate of the number of genes contained in the human genome. Exofish (which stands for EXOn FInding by Sequence Homology) is a comparative genomics method that identifies genes based on the homology, or sequence similarity, between two species.
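The general idea of finding genes through sequence similarity can be illustrated with a toy Python sketch. This is not the Exofish algorithm, which is considerably more sophisticated; it is a minimal stand-in that slides a fixed-size window along two sequences and reports the positions where the letters agree. The function name `conserved_windows`, its parameters, and both example sequences are made up for illustration.

```python
def conserved_windows(seq_a, seq_b, window=8, min_identity=1.0):
    """Return (pos_a, pos_b, identity) for ungapped windows that match.

    A hypothetical, brute-force illustration of similarity search:
    every window of seq_a is compared letter by letter with every
    window of seq_b, and pairs at or above min_identity are kept.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            matches = sum(x == y for x, y in zip(win_a, win_b))
            identity = matches / window
            if identity >= min_identity:
                hits.append((i, j, identity))
    return hits


# Made-up sequences: a shared 10-letter stretch embedded in unrelated flanks.
shared = "ATGGCTAAGG"
fish = "AAAAAA" + shared + "TTTTTT"
human = "CCCCC" + shared + "GGGGG"

hits = conserved_windows(fish, human, window=8, min_identity=1.0)
for pos_fish, pos_human, identity in hits:
    print(f"fish[{pos_fish}] ~ human[{pos_human}] identity={identity:.2f}")
```

In this toy case the only matching windows fall inside the shared stretch, while the unrelated flanking letters produce no hits. That mirrors, in a very simplified way, why a compact genome with little "junk" DNA makes conserved regions easier to spot.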
In fact, 18 months ago, a genome-wide analysis comparing the Tetraodon and human genomes led Genoscope scientists to propose a first reevaluation of the gene content of the human genome, suggesting that the genome contains 28,000 to 34,000 genes rather than the 50,000 to 90,000 genes of previous estimates.
Although the recently completed draft sequence of the human genome offers an initial look at the human gene content, scientists will have to compare it to the genome sequences of many other species to fully unravel the important information it holds. That's because evolution preserves the most important genetic information across species; if genes and regulatory elements have survived hundreds of millions of years of evolution, they are likely to be functionally important. Other genes may have been lost along the way because they were no longer important for survival in a new environment. So researchers need comparative sequences from species that are both closely and distantly related to humans, because different genetic elements in humans call for comparison with different species.
SEQUENCED TO DATE
So far, scientists have sequenced baker's yeast, the nematode worm and the fruit fly. They are also racing toward completion of the mouse genome sequence. Sequences of many other organisms large and small are also in the works.
Puffer fish diverged from humans around 400 million years ago. So researchers reason that any genes that survived that length of evolutionary time must be important.
A version of this article appeared in MIT Tech Talk on October 31, 2001.