{"id":183927,"date":"2022-04-13T07:00:07","date_gmt":"2022-04-13T11:00:07","guid":{"rendered":"https:\/\/today.uconn.edu\/?p=183927"},"modified":"2023-06-27T11:59:40","modified_gmt":"2023-06-27T15:59:40","slug":"the-human-genome-project-pieced-together-92-of-the-dna-now-scientists-have-finally-filled-in-the-remaining-8","status":"publish","type":"post","link":"https:\/\/today.uconn.edu\/2022\/04\/the-human-genome-project-pieced-together-92-of-the-dna-now-scientists-have-finally-filled-in-the-remaining-8\/","title":{"rendered":"The Human Genome Project Pieced Together 92% of the DNA \u2013 Now Scientists Have Finally Filled in the Remaining 8%"},"content":{"rendered":"<p>When the\u00a0<a href=\"https:\/\/www.genome.gov\/11006929\/2003-release-international-consortium-completes-hgp\">Human Genome Project<\/a>\u00a0announced that they had completed the first human genome in 2003, it was a momentous accomplishment &#8211; for the first time, the DNA blueprint of human life was unlocked. But it came with a catch &#8211; they weren\u2019t actually able to put together all the genetic information in the genome. There were gaps: unfilled, often repetitive regions that were too confusing to piece together.<\/p>\n<p>With advancements in technology that could handle these repetitive sequences, scientists finally\u00a0<a href=\"https:\/\/doi.org\/10.1101\/2021.05.26.445798\">filled those gaps in May 2021<\/a>, and the first end-to-end human genome was\u00a0<a href=\"https:\/\/www.science.org\/doi\/10.1126\/science.abj6987\">officially published on Mar. 31, 2022<\/a>.<\/p>\n<p>I am a\u00a0<a href=\"https:\/\/scholar.google.com\/citations?user=q3BBiy8AAAAJ&amp;hl=en\">genome biologist<\/a>\u00a0who studies repetitive DNA sequences and how they shape genomes throughout evolutionary history. I was part of the team that helped\u00a0<a href=\"http:\/\/www.science.org\/doi\/10.1126\/science.abk3112\">characterize the repeat sequences<\/a>\u00a0missing from the genome. 
And now, with a truly complete human genome, these uncovered repetitive regions are finally being explored in full for the first time.<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">When Human Genome Project researchers announced they had successfully completed sequencing the human genome, it was only about 92% complete. There were still hundreds of gaps or missing DNA sequences. Why was it so difficult to complete the sequence? Let\u2019s break it down! <a href=\"https:\/\/twitter.com\/hashtag\/T2T?src=hash&amp;ref_src=twsrc%5Etfw\">#T2T<\/a> <a href=\"https:\/\/t.co\/2RXDJUNdXM\">pic.twitter.com\/2RXDJUNdXM<\/a><\/p>\n<p>&mdash; National Human Genome Research Institute (@genome_gov) <a href=\"https:\/\/twitter.com\/genome_gov\/status\/1415692472156495875?ref_src=twsrc%5Etfw\">July 15, 2021<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p><strong>The Missing Puzzle Pieces<\/strong><\/p>\n<p>German botanist Hans Winkler coined the word \u201c<a href=\"https:\/\/doi.org\/10.1371\/journal.pgen.1006181\">genome<\/a>\u201d in 1920, combining the word \u201cgene\u201d with the suffix \u201c-ome,\u201d meaning \u201ccomplete set,\u201d to describe the full DNA sequence contained within each cell. Researchers still use this word a century later to refer to the genetic material that makes up an organism.<\/p>\n<p>One way to describe what a genome looks like is to compare it to a reference book. In this analogy, a genome is an anthology containing the DNA instructions for life. It\u2019s composed of a vast array of nucleotides (letters) that are packaged into chromosomes (chapters). Each chromosome contains genes (paragraphs) that are regions of DNA which code for the specific proteins that allow an organism to function.<\/p>\n<p>While every living organism has a genome, the size of that genome varies from species to species. 
An elephant uses the same form of genetic information as the grass it eats and the bacteria in its gut. But no two genomes look exactly alike. Some are short, like the genome of the insect-dwelling bacterium\u00a0<a href=\"https:\/\/doi.org\/10.1093\/gbe\/evt118\"><em>Nasuia deltocephalinicola<\/em><\/a>\u00a0with just 137 genes across 112,000 nucleotides. Some, like the 149 billion nucleotides of the flowering plant\u00a0<a href=\"https:\/\/doi.org\/10.1111\/j.1095-8339.2010.01072.x\"><em>Paris japonica<\/em><\/a>, are so long that it\u2019s difficult to get a sense of how many genes are contained within.<\/p>\n<p>But genes as they\u2019ve traditionally been understood \u2013 as stretches of DNA that code for proteins \u2013 are just a small part of an organism\u2019s genome. In fact, they make up\u00a0<a href=\"https:\/\/dx.doi.org\/10.1038%2Fnature11247\">less than 2% of human DNA<\/a>.<\/p>\n<p>The\u00a0<a href=\"https:\/\/www.science.org\/doi\/10.1126\/science.abj6987\">human genome<\/a>\u00a0contains roughly 3 billion nucleotides and just under 20,000 protein-coding genes &#8211; an estimated 1% of the genome\u2019s total length. The remaining 99% consists of non-coding DNA sequences that don\u2019t produce proteins. Some are regulatory components that work as a switchboard to control how other genes work. Others are\u00a0<a href=\"https:\/\/doi.org\/10.1155\/2012\/424526\">pseudogenes<\/a>, or genomic relics that have lost their ability to function.<\/p>\n<p>And\u00a0<a href=\"https:\/\/doi.org\/10.1101\/2021.07.12.451456\">over half<\/a>\u00a0of the human genome is repetitive, with multiple copies of near-identical sequences.<\/p>\n<p><strong>What is Repetitive DNA?<\/strong><\/p>\n<p>The simplest form of repetitive DNA consists of blocks of DNA repeated over and over in tandem, called\u00a0<a href=\"https:\/\/doi.org\/10.3390\/genes8090230\">satellites<\/a>. 
While\u00a0<a href=\"https:\/\/doi.org\/10.1093\/molbev\/msq198\">the amount of satellite DNA<\/a>\u00a0in a given genome varies from person to person, satellites often cluster toward the ends of chromosomes in regions called\u00a0<a href=\"https:\/\/doi.org\/10.1016\/j.febslet.2004.11.036\">telomeres<\/a>. These regions protect chromosomes from degrading during DNA replication. Satellites are also found in the\u00a0<a href=\"https:\/\/doi.org\/10.3390\/genes10030223\">centromeres<\/a>\u00a0of chromosomes, a region that helps keep genetic information intact when cells divide.<\/p>\n<p>Researchers still lack a clear understanding of all the functions of satellite DNA. But because satellite DNA forms unique patterns in each person, forensic biologists and genealogists use this\u00a0<a href=\"https:\/\/www.yourgenome.org\/facts\/what-is-a-dna-fingerprint\">genomic \u201cfingerprint\u201d<\/a>\u00a0to match crime scene samples and track ancestry. Over 50 genetic disorders are linked to variations in satellite DNA, including\u00a0<a href=\"https:\/\/doi.org\/10.1212\/WNL.0b013e318249f683\">Huntington\u2019s disease<\/a>.<\/p>\n<p>Another abundant type of repetitive DNA is\u00a0<a href=\"https:\/\/doi.org\/10.1007\/s10577-017-9569-5\">transposable elements<\/a>, or sequences that can move around the genome.<\/p>\n<p>Some scientists have described them as selfish DNA because they can insert themselves anywhere in the genome, regardless of the consequences. As the human genome evolved, many transposable sequences collected mutations\u00a0<a href=\"https:\/\/doi.org\/10.1186\/s13100-016-0070-z\">repressing<\/a>\u00a0their ability to move to avoid harmful interruptions. But some can likely still move about. For example, transposable element insertions are linked to a number of\u00a0<a href=\"https:\/\/doi.org\/10.1186\/s13100-016-0065-9\">cases of hemophilia A<\/a>, a genetic bleeding disorder.<\/p>\n<p>But transposable elements aren\u2019t just disruptive. 
They can have\u00a0<a href=\"https:\/\/doi.org\/10.1101\/gr.218149.116\">regulatory functions<\/a>\u00a0that help control the expression of other DNA sequences. When they\u2019re\u00a0<a href=\"https:\/\/doi.org\/10.1016\/j.tig.2004.09.011\">concentrated in centromeres<\/a>, they may also help maintain the integrity of the genes fundamental to cell survival.<\/p>\n<p>They can also contribute to evolution. Researchers recently found that the insertion of a transposable element into a gene important to development might be why some primates, including humans,\u00a0<a href=\"https:\/\/doi.org\/10.1101\/2021.09.14.460388\">no longer have tails<\/a>. Chromosome rearrangements due to transposable elements are even linked to the genesis of new species like the\u00a0<a href=\"https:\/\/doi.org\/10.1093\/molbev\/msab148\">gibbons of southeast Asia<\/a>\u00a0and the\u00a0<a href=\"https:\/\/doi.org\/10.1146\/annurev-animal-021419-083555\">wallabies of Australia<\/a>.<\/p>\n<p><strong>Completing the Genomic Puzzle<\/strong><\/p>\n<p>Until recently, many of these complex regions could be compared to the far side of the moon: known to exist, but unseen.<\/p>\n<p>When the\u00a0<a href=\"https:\/\/www.genome.gov\/11006929\/2003-release-international-consortium-completes-hgp\">Human Genome Project<\/a>\u00a0first launched in 1990, technological limitations made it impossible to fully uncover repetitive regions in the genome.\u00a0<a href=\"https:\/\/www.nature.com\/scitable\/topicpage\/dna-sequencing-technologies-key-to-the-human-828\/\">Available sequencing technology<\/a>\u00a0could only read about 500 nucleotides at a time, and these short fragments had to overlap one another in order to recreate the full sequence. 
Researchers used these overlapping segments to identify the next nucleotides in the sequence, incrementally extending the genome assembly one fragment at a time.<\/p>\n<p>Assembling these repetitive gap regions was like putting together a 1,000-piece puzzle of an overcast sky: When every piece looks the same, how do you know where one cloud starts and another ends? With near-identical overlapping stretches in many spots, fully sequencing the genome piecemeal became unfeasible.\u00a0<a href=\"https:\/\/doi.org\/10.1371\/journal.pcbi.1003628\">Millions of nucleotides<\/a>\u00a0remained hidden in the first iteration of the human genome.<\/p>\n<p>Since then, sequence patches have gradually filled in gaps of the human genome bit by bit. And in 2021, the\u00a0<a href=\"https:\/\/github.com\/marbl\/CHM13#telomere-to-telomere-consortium\">Telomere-to-Telomere (T2T) Consortium<\/a>, an international consortium of scientists working to complete a human genome assembly from end to end, announced that all remaining gaps were\u00a0<a href=\"https:\/\/www.science.org\/doi\/10.1126\/science.abj6987\">finally filled<\/a>.<\/p>\n<p>This was made possible by improved sequencing technology capable of\u00a0<a href=\"https:\/\/doi.org\/10.1038\/s41576-020-0236-x\">reading longer sequences<\/a>\u00a0thousands of nucleotides in length. With more information to situate repetitive sequences within a larger picture, it became easier to identify their proper place in the genome. Like simplifying a 1,000-piece puzzle to a 100-piece puzzle, long-read sequences made it\u00a0<a href=\"http:\/\/www.science.org\/doi\/10.1126\/science.abk3112\">possible to assemble<\/a>\u00a0large repetitive regions for the first time.<\/p>\n<p>With the increasing power of long-read DNA sequencing technology, geneticists are positioned to explore a new era of genomics, untangling complex repetitive sequences across populations and species for the first time. 
And a complete, gap-free human genome provides an invaluable resource for researchers to investigate repetitive regions that shape genetic structure and variation, species evolution and human health.<\/p>\n<p>But one complete genome doesn\u2019t capture it all. Efforts continue to create diverse genomic references that fully represent\u00a0<a href=\"https:\/\/humanpangenome.org\/\">the human population<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.earthbiogenome.org\/\">life on Earth<\/a>. With more complete, \u201ctelomere-to-telomere\u201d genome references, scientists\u2019 understanding of the repetitive dark matter of DNA will become more clear.<\/p>\n<p>&nbsp;<\/p>\n<p><em><a href=\"https:\/\/theconversation.com\/the-human-genome-project-pieced-together-only-92-of-the-dna-now-scientists-have-finally-filled-in-the-remaining-8-176138\">Originally published in The Conversation.<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Over half of the human genome contains repetitive DNA sequences whose functions are still not fully understood<\/p>\n","protected":false},"author":68,"featured_media":183928,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"wds_primary_category":0,"wds_primary_series":0,"wds_primary_attribution":0,"footnotes":""},"categories":[2226,2459,2231,2076,2235,179],"tags":[],"magazine-issues":[],"coauthors":[1902],"class_list":["post-183927","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-clas","category-graduate-students","category-health-well-being","category-research","category-today-homepage","category-uconn-health"],"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"publishpress_future_action":{"enabled":false,"date":"2026-06-17 
02:02:39","action":"change-status","newStatus":"draft","terms":[],"taxonomy":"category","extraData":[]},"publishpress_future_workflow_manual_trigger":{"enabledWorkflows":[]},"_links":{"self":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts\/183927","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/comments?post=183927"}],"version-history":[{"count":3,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts\/183927\/revisions"}],"predecessor-version":[{"id":188089,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts\/183927\/revisions\/188089"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/media\/183928"}],"wp:attachment":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/media?parent=183927"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/categories?post=183927"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/tags?post=183927"},{"taxonomy":"magazine-issue","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/magazine-issues?post=183927"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/coauthors?post=183927"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}