Scientists are racing to sequence the genomes of all 1.8 million known eukaryotic species by 2035 through the Earth BioGenome Project, an ambitious international effort that could revolutionize our understanding of evolution and biodiversity. The project has completed approximately 4,200 genomes as of July 2024, representing just 0.2% of its ultimate goal, but researchers are confident that rapidly advancing sequencing technology will enable the massive scaling required.
The Global Hunt for Genetic Diversity
On an alpine trail above Malles Venosta, Italy, entomologist Benjamin Wiesmair identifies a Chersotis multangula moth drawn to his ultraviolet light trap. This specimen, along with hundreds of others collected during a five-day expedition, will contribute to Project Psyche’s mission to sequence all European butterflies and moths. The project represents one of 60 initiatives under the Earth BioGenome Project umbrella, each focusing on specific taxonomic groups or geographic regions.
Researchers collected more than 200 new species for sequencing during the July expedition, adding to the approximately 1,000 Lepidoptera genomes already completed. With roughly 11,000 species of moths and butterflies across Europe and Britain, the work demonstrates both the progress made and the substantial ground still to cover. The Darwin Tree of Life Project, which is sequencing all species in Britain and Ireland, has contributed about half of all genomes recorded by the EBP so far, showing how regional efforts combine to build the global database.
Technological Revolution Driving the Project
The sequencing revolution has been staggering in scale and speed. “Compared to 2001, when the Human Genome Project was nearing completion, it is now approximately 500,000 times cheaper to sequence DNA,” says Steven Salzberg, director of Johns Hopkins University’s Center for Computational Biology. “And it is also about 500,000 times faster to sequence. That is a scale of acceleration that has vastly outstripped any improvements in computational technology.”
Long-read sequencing technology from companies like Pacific Biosciences and Oxford Nanopore Technologies has been crucial for handling complex genomic regions. PacBio’s Revio system, costing approximately $600,000, can sequence four human genomes in 24 hours for under $1,000 per genome with 99.9% accuracy. Oxford Nanopore’s portable MinION system offers field sequencing capability for just $3,000, while their PromethION 24 handles fragments up to one million base pairs. This technological diversity gives researchers flexibility in tackling different sequencing challenges across the tree of life.
Scientific and Economic Implications
The potential benefits extend far beyond basic scientific understanding. “One idea is that by looking at plants, which have all sorts of chemicals, often which they make in order to fight off insects or pests, we might find new molecules that are going to be important drugs,” says Richard Durbin, professor of genetics at the University of Cambridge. The immunosuppressant and cancer drug rapamycin serves as just one example of valuable compounds discovered through genomic investigation of natural samples.
Beyond pharmaceutical applications, the complete genomic catalog will help answer fundamental evolutionary questions. “With this genomic data, we can get to one of the questions that Darwin asked a long time ago, which is, How does a species arise?” says Mark Blaxter, who leads the Darwin Tree of Life Project. Analyzing the genomes of Lepidoptera, a lineage dating back some 300 million years, will help explain why some evolutionary branches produced more species than others, providing insight into speciation mechanisms across all life forms.
The Road to 2035: Scaling and Challenges
The Earth BioGenome Project faces enormous scaling challenges as it moves into its next phases. “We need to scale, from where we’re at, more than a hundredfold in terms of the number of genomes per year that we’re producing worldwide,” says Harris Lewin, who leads the EBP. The project’s official road map calls for sequencing 150,000 genomes between 2026 and 2030, requiring production to increase from the current 3,000 genomes annually to 37,500.
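The road-map arithmetic can be checked with a short calculation. The sketch below assumes the 2026 to 2030 phase amounts to four years of production, which is the reading under which 150,000 genomes works out to 37,500 per year; the variable names are illustrative only.

```python
# Figures quoted in the article; the four-year phase length is an assumption.
PHASE_TARGET_GENOMES = 150_000  # genomes planned for the 2026-2030 phase
PHASE_YEARS = 4                 # assumed duration of that phase
CURRENT_RATE = 3_000            # genomes produced per year today

# Annual output needed to hit the phase target
required_rate = PHASE_TARGET_GENOMES / PHASE_YEARS

# How much larger that is than current output
scale_up = required_rate / CURRENT_RATE

print(f"Required rate: {required_rate:,.0f} genomes per year")
print(f"Scale-up over current rate: {scale_up:.1f}x")
```

Under that assumption, the required rate is 37,500 genomes per year, about 12.5 times current output. That figure covers this phase alone.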
Cost reduction presents another major hurdle. The current cost of approximately $26,000 per genome must drop to $6,100 by 2030 and ultimately to $1,900 per genome by 2035. If successful, the entire project will cost roughly $4.7 billion, less in real terms than it cost to sequence the human genome alone 22 years ago. The resulting data, occupying just over 1 exabyte of storage, will represent one of science’s most valuable digital resources for understanding life on Earth.
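For a sense of scale, the project totals can be turned into rough per-species averages. This is a back-of-envelope sketch, not a project estimate: it assumes the $4.7 billion and 1 exabyte figures are spread evenly over all 1.8 million species, and that an exabyte means 10^18 bytes.

```python
# Totals quoted in the article
TOTAL_SPECIES = 1_800_000
TOTAL_COST_USD = 4.7e9
GENOMES_DONE = 4_200          # completed as of July 2024

# Assumption: decimal exabyte (10**18 bytes); the article says "just over" 1 EB,
# so the storage average below is a lower bound.
TOTAL_STORAGE_BYTES = 1e18

# Simple averages, assuming cost and storage are spread evenly across species
avg_cost = TOTAL_COST_USD / TOTAL_SPECIES
avg_storage_gb = TOTAL_STORAGE_BYTES / TOTAL_SPECIES / 1e9
fraction_done = GENOMES_DONE / TOTAL_SPECIES

print(f"Average cost per species: ${avg_cost:,.0f}")
print(f"Average storage per species: {avg_storage_gb:,.0f} GB")
print(f"Share of goal completed: {fraction_done:.2%}")
```

The averages come to roughly $2,600 and 556 GB per species, with about 0.23% of the goal complete. The average cost sits between the 2030 and 2035 per-genome targets, as expected if most genomes are sequenced late in the project at the lower price.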