Searching for Data in DNA with CRISPR

UConn researchers deploy CRISPR technology to make it easier to search for data stored in DNA—an emerging medium for data storage

DNA is an emerging medium for data storage.

DNA is an emerging medium for data storage. (Changchun Liu photo)

The digital age has led to the explosive growth of data of all kinds. Traditional methods for storing data—such as hard drives—are beginning to face challenges due to limited storage capacities. With the growing demand for data storage on the rise, alternate mediums of data storage are becoming increasingly popular – and necessary.

DNA is one of the emerging solutions to store data due to its physical density, data longevity, and data encryption ability. Any information that can be stored in a hard drive – such as texts, images, sounds, and movies – can also be converted into DNA sequences.

But while DNA is a promising solution to help meet the demand of data storage needs, performing a search within a strand of DNA can be cumbersome and difficult.

“Archiving information in synthetic DNA has emerged as an attractive solution to deal with the exploding growth of data in the modern world. However, quantitatively querying the data stored in DNA is still a challenge,” says Changchun Liu, professor in the Department of Biomedical Engineering at UConn Health.

In Nature Communications, Liu and a team of researchers discovered a way to simply and effectively search for data stored in DNA using a clustered regularly interspaced short palindromic repeats (CRISPR) powered quantitative search engine.

In the paper, Liu introduces Search Enabled by Enzymatic Keyword Recognition (SEEKER), which utilizes CRISPR-Cas12a to quantitatively identify the keyword in files stored in DNA.

“DNA is a promising medium for data storage because of its stability and high information density. Theoretically, one gram of DNA can store 215 petabytes of data, the data size of about 100 million movies. Like a hard drive which stores information in binary data bits, DNA stores information in sequences of four nucleobases— adenine (A), thymine (T), cytosine (C) and guanine (G). The developments in DNA synthesis technology and next-generation sequencing are turning DNA data storage into reality,” explains Jiongyu Zhang, a graduate student in Liu’s lab and first author of the paper.

Liu utilized his expertise in CRISPR technology to help come up with a better solution to search within a strand of DNA.

CRISPR is an acquired immune mechanism that can identify a specific infectious DNA sequence in a cell overwhelmed with interfering genes, similar to a keyword search in a database.

SEEKER, utilizing CRISPR, rapidly generates visible fluorescence, or light, when a DNA target corresponding to the keyword of interest is present. SEEKER is able to successfully perform quantitative text searching since the growth rate of the fluorescence intensity is proportional to the keyword frequency.

In the paper, the researchers successfully identified keywords in 40 files with a background of approximately 8,000 irrelevant terms.

“Overall, the SEEKER provides a quantitative approach to conducting parallel searching-including metadata search—over the complete content stored in DNA with simple implementation and rapid result generation,” explains Liu.