On Feb. 12, 2001, scientists at the Human Genome Project unveiled a bold undertaking nearly 20 years in the making: a first draft of the sequence of the human genome. This early translation—and its more complete version two years later—gave researchers the ability to finally read human DNA. Since then, the blueprint of around 20,000 cataloged genes has informed and transformed healthcare, making it possible for us to predict, diagnose, and treat genetic diseases like Alzheimer’s disease and multiple sclerosis more precisely than ever before.
However, one final hurdle has remained: decoding the entire human genome. The "finished" sequence published in 2003 covered only about 90 percent of the genome, the portion that seemed most vital at the time. Segments of that deciphered DNA (about 2 percent of the genome) encoded instructions for making proteins. But another 8 percent, made up of repetitive sequences that seemed inconsequential, appeared to be "junk" and went untranslated.
“We said it was done with a little asterisk way back in 2003,” Dr. David Valle, a geneticist at Johns Hopkins University School of Medicine who was an advisor to the project, told The Daily Beast.
Since then, scientists have come to realize that junk DNA isn't actually junk. Instead, it contains a treasure trove of information linked to diseases like cancer and developmental disorders like autism. A renewed effort to completely decode these under-appreciated genetic sequences has now allowed a group of scientists to remove that little asterisk, leaving no stone of the human genome unturned.
In a sprawling set of six different studies published Thursday in the journal Science, an international group of 100 scientists calling themselves the Telomere-to-Telomere (T2T) Consortium sequenced the remaining 8 percent. They've patched up the enormous gaps in our book of life and provided new insight into human genomic health and disease.
“This is an impressive tour de force and a landmark accomplishment,” Lloyd Smith, a biochemist at the University of Wisconsin-Madison who was not involved with the T2T project, told The Daily Beast. “It takes tremendous commitment, perseverance, and deep technical knowledge to decipher these most difficult to access regions of the genome.”
The human genome consists of 23 pairs of chromosomes, with one chromosome in each pair inherited from your mom and the other from your dad. Together they contain just over six billion base pairs, each formed by two molecules called nucleotides. To sequence a genome, scientists have to cut it up into many small pieces and then try to rearrange these pieces in an order that makes sense, much like recreating a full movie from short film clips, Jef Boeke, a geneticist at NYU Langone Medical Center who was not involved in the project, told The Daily Beast.
But this method gets tricky when you’re dealing with repetitive sequences of DNA, which are believed to regulate gene expression and ensure chromosomes stay intact.
“The genome has many repetitive components, where similar sequences can be duplicated or replicated hundreds to thousands of times,” Dr. Elizabeth McNally, a geneticist and cardiologist at Northwestern University who was involved in the T2T project, told The Daily Beast in an email. “Sometimes the repetitive regions are near each other and sometimes the repeat copies can be scattered about in many different places on the genome map.”
“They’re hard to put together because you’re putting pieces of a puzzle where the pieces look very similar, sometimes even identical to each other but they’re actually separate,” Erich Jarvis, a neurogeneticist at Rockefeller University and Howard Hughes Medical Institute, and a co-author of the new research, told The Daily Beast. “You have to figure out where they go and that’s hard.”
An even bigger problem with repetitive sequences is telling, for any one individual, which copies came from mom and which came from dad. If the two sets aren't separated, the number of repetitive sequences in a sample will appear inflated. Jarvis said that at the time of the Human Genome Project, no technology was available to separate the two. Even now, it's still an arduous task.
The researchers had a pair of aces up their sleeve to get around these two challenges. One was a cell line called CHM13 in which, by a spontaneous genetic fluke, all 23 pairs of chromosomes came from the father. These cells were grown in the lab but were originally derived from a molar pregnancy (in which a tumor-like growth of tissue develops instead of a pregnancy that can come to term). The other ace was a new generation of DNA sequencing technologies from Pacific Biosciences and Oxford Nanopore, which allowed the scientists to read and piece together ultra-long clips of DNA (versus the traditional approach of using super short snippets).
These two developments allowed the T2T Consortium researchers to add and fix more than 200 million base pairs in the reference genome and discover nearly 2,000 new genes.
The new information is sure to have an impact on medicine, deepening our understanding of how diseases like cancer and neurodegenerative disorders like Parkinson's disease and Lou Gehrig's disease arise from mutations and errors within repetitive sequences and, more importantly, how to treat them. But even with this milestone, the group's work isn't done.
“We pretty much got the genome of the father of [the CHM13 cells] but there’s a lot of genetic diversity out there—no one individual has the entire genome sequences of the entire human population,” said Jarvis.
To gather that information, the researchers are applying what they've learned to the genomes of as many people worldwide as possible. They plan to expand their efforts over the next three to five years and hope to uncover 90 to 95 percent of human genetic diversity, which could also shed light on evolutionary biology and on why some groups are more or less prone to certain diseases, said Valle.
“It’s an ambitious project,” said Jarvis. “We still have gaps and they’re better than anything else that’s out there but it’s not yet the perfection we’re trying to achieve.”