{"id":29778,"date":"2011-02-17T08:42:58","date_gmt":"2011-02-17T13:42:58","guid":{"rendered":"https:\/\/today.uconn.edu\/?p=29778"},"modified":"2011-05-31T12:39:54","modified_gmt":"2011-05-31T16:39:54","slug":"contamination-found-in-nearly-a-quarter-of-genome-databases","status":"publish","type":"post","link":"https:\/\/today.uconn.edu\/2011\/02\/contamination-found-in-nearly-a-quarter-of-genome-databases\/","title":{"rendered":"Contamination Found in Nearly a Quarter of Genome Databases"},"content":{"rendered":"<figure id=\"attachment_29571\" aria-describedby=\"caption-attachment-29571\" style=\"width: 394px\" class=\"wp-caption alignleft\"><a href=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2011\/02\/ONeill1_lg.jpg\"><img decoding=\"async\" class=\"size-full wp-image-29571  img-responsive lazyload\" title=\"Associate professor Rachel O'Neill and graduate student Mark Longo.\" data-src=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2011\/02\/ONeill1_lg.jpg\" alt=\"&lt;p&gt;Associate Professor Rachel O'Neill and her graduate student Mark Longo. Photo by Dan Buttrey.&lt;\/p&gt;\" width=\"394\" height=\"264\" data-srcset=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2011\/02\/ONeill1_lg.jpg 700w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2011\/02\/ONeill1_lg-300x201.jpg 300w\" data-sizes=\"(max-width: 394px) 100vw, 394px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 394px; --smush-placeholder-aspect-ratio: 394\/264;\" \/><\/a><figcaption id=\"caption-attachment-29571\" class=\"wp-caption-text\">Associate professor Rachel O&#39;Neill and graduate student Mark Longo. Photo by Dan Buttrey<\/figcaption><\/figure>\n<p>A new genomics study by molecular biologists at the University of Connecticut has shown that at least 22 percent of non-human genome databases are contaminated with human DNA. 
Their results imply that this level of contamination could also exist in records of the human genome, which could produce major problems in identifying human diseases.<\/p>\n<p>Associate professor Rachel O\u2019Neill, graduate student Mark Longo, and associate professor Michael O\u2019Neill of the molecular and cell biology department in the College of Liberal Arts and Sciences published their findings today in an online edition of the journal PLOS One.<\/p>\n<p>Longo says that he had originally been scanning the genome of zebrafish and comparing it with the human genome to find what are called ultraconserved regions, or bits of DNA that are so ancient they are similar among species that are distantly related, like humans and fish.<\/p>\n<p>But, to Longo\u2019s surprise, he found a region of DNA that was identical to one in humans and couldn\u2019t be a part of the fish genome. That\u2019s when he knew that the fish genome database he was using was contaminated.<\/p>\n<p>\u201cContamination in these databases could be from people\u2019s skin or hair, or it could be DNA from other sequence libraries kept in the same facility,\u201d says Longo. \u201cWe knew we needed to quantify this to see how many of the databases contained human contamination.\u201d<\/p>\n<p>The researchers gathered sequences from all the major global DNA repositories, including the archives at the National Center for Biotechnology Information, the University of California Santa Cruz, the Joint Genome Databases, and the Ensembl genome browser. 
Sequence data from any federally funded sequencing project must be deposited in one of these archives.<\/p>\n<figure id=\"attachment_29573\" aria-describedby=\"caption-attachment-29573\" style=\"width: 382px\" class=\"wp-caption alignright\"><a href=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2011\/02\/ONeill2_lg.jpg\"><img decoding=\"async\" class=\"size-full wp-image-29573  img-responsive lazyload\" title=\"Mark Longo, a graduate student in molecular and cell biology, and associate professor Rachel O'Neill.\" data-src=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2011\/02\/ONeill2_lg.jpg\" alt=\"&lt;p&gt;Graduate student Mark Longo and associate professor Rachel O'Neill. Photo by Dan Buttrey.&lt;\/p&gt;\" width=\"382\" height=\"256\" data-srcset=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2011\/02\/ONeill2_lg.jpg 700w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2011\/02\/ONeill2_lg-300x201.jpg 300w\" data-sizes=\"(max-width: 382px) 100vw, 382px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 382px; --smush-placeholder-aspect-ratio: 382\/256;\" \/><\/a><figcaption id=\"caption-attachment-29573\" class=\"wp-caption-text\">Mark Longo, a graduate student in molecular and cell biology, and associate professor Rachel O&#39;Neill. Photo by Dan Buttrey<\/figcaption><\/figure>\n<p>Using a section of DNA that is specific to primates and abundant in the human genome, the researchers identified 454 non-primate genomes out of the 2,027 they sampled as contaminated with human DNA.<\/p>\n<p>Rachel O\u2019Neill says this result led them to reason that if these non-human genome databases were contaminated with human DNA, then it\u2019s just as likely that many human databases would be contaminated as well. 
But, she says, the catch is that it\u2019s virtually impossible to identify a foreign bit of human DNA in a human genome database.<\/p>\n<p>\u201cIn sequencing, you have to put all the pieces of the genome together like a big jigsaw puzzle. The pieces that don\u2019t fit stand out,\u201d Longo says. \u201cBut if you\u2019re working on a human puzzle, it\u2019s like working on a three-billion-piece puzzle, and it\u2019s all black.\u201d<\/p>\n<p>\u201cIt\u2019s virtually impossible to find human contamination in human genome databases,\u201d O\u2019Neill adds, because the foreign sequences simply don\u2019t stand out as anything unusual in a human genome. This, she says, could lead to some terrible mistakes.<\/p>\n<p>A portion of the National Center for Biotechnology Information includes a Cancer Genome Atlas: a library documenting mutations that occur in cancer cells. O\u2019Neill says there\u2019s no room for error in these databases.<\/p>\n<p>\u201cIt would be very upsetting to be told you have a mutation for breast cancer, when in fact you don\u2019t, and it was just a contamination from another sample,\u201d she says.<\/p>\n<p>O\u2019Neill emphasizes that scientists need to exercise extreme caution when performing their sequencing, and that they should validate results through tests in their own laboratories before submitting them to databases. Longo points out that the UConn researchers found contamination in some sequences that they had produced in their own laboratories, which they then discarded. O\u2019Neill says these practices should be the norm.<\/p>\n<p>\u201cWe\u2019re compounding this problem in our rush to move forward with genomics,\u201d she says. \u201cMillions of dollars are invested each year in these sequence databases, but we\u2019re plowing ahead with less caution than we should. 
The result is that we might have a harder time recognizing the etiology of something like cancer.\u201d<\/p>\n<p>Longo notes that in his analysis, there was one type of DNA database that showed no contamination at all: that of influenza. Because viruses are so dangerous, great care is taken in their preparation, he says \u2013 much more than is usually taken with a commonplace and harmless genome. This kind of caution should be extended to all sequencing, says O\u2019Neill.<\/p>\n<p>\u201cThe sequencing world has moved in leaps and bounds,\u201d she says. \u201cIt\u2019s time for validation to catch up.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>UConn scientists say the results could complicate disease identification in humans.<\/p>\n","protected":false},"author":37,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"wds_primary_category":0,"wds_primary_series":0,"wds_primary_attribution":0,"footnotes":""},"categories":[1],"tags":[],"magazine-issues":[],"coauthors":[63],"class_list":["post-29778","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"publishpress_future_action":{"enabled":false,"date":"2026-07-02 
19:51:42","action":"change-status","newStatus":"draft","terms":[],"taxonomy":"category","extraData":[]},"publishpress_future_workflow_manual_trigger":{"enabledWorkflows":[]},"_links":{"self":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts\/29778","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/users\/37"}],"replies":[{"embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/comments?post=29778"}],"version-history":[{"count":5,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts\/29778\/revisions"}],"predecessor-version":[{"id":37038,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts\/29778\/revisions\/37038"}],"wp:attachment":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/media?parent=29778"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/categories?post=29778"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/tags?post=29778"},{"taxonomy":"magazine-issue","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/magazine-issues?post=29778"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/coauthors?post=29778"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}