Why Microsoft stored 200MB of data in DNA strands, and what's next

All the movies, images, emails and other digital data from more than 600 basic smartphones (10,000 gigabytes) can be stored in the faint pink smear of DNA at the end of this test tube.

Image: Tara Brown Photography/University of Washington

Imagine storing hundreds of books, videos, and works of art in strands of DNA no larger than the tip of a pencil. Researchers from the University of Washington and Microsoft made this sci-fi concept a reality last month, setting a new record of 200 megabytes for the amount of data stored in the molecules.

The researchers filed away digital versions of works of art (including a high-definition video by the band OK Go), the top 100 books of Project Gutenberg, and the Universal Declaration of Human Rights in more than 100 languages, all on DNA strands. This marks the first time this much data has been stored in DNA.

“This could be the beginning of a new way of using computers,” said Luis Henrique Ceze, a University of Washington associate professor of computer science and engineering, and the university’s principal researcher on the project. “It shows that we can use something we borrowed from nature to build better computers that are more than just silicon.”

Ceze began his career as a computer architect, researching new and more effective ways of using computer systems. A few years ago, his team began researching DNA storage, focusing on two open problems. The first was random access: When you store large amounts of data, how do you read one part without having to read all of it? The second was the need for an end-to-end system: for DNA storage to be viable, archivists must be able to put data in and get it back out again.

After Ceze discovered that random access was a possibility, Microsoft came on board. In 2015, the company funded a lab at the University of Washington dedicated to DNA storage.

“We know how to read DNA, and we’ll always be interested in reading it, since it’s the storage material for life as well,” said Karin Strauss, the principal Microsoft researcher on the project. “The problem of having media that becomes obsolete, like VHS tapes or floppy disks, wouldn’t be a problem with DNA.”

Researchers were attracted to DNA’s density and durability. Studies have shown that you can synthetically encapsulate DNA and store it for 2,000 years at 50 degrees Fahrenheit. But scientists have been able to sequence DNA from fossils that are 700,000 years old, so its storage potential could be much larger, Strauss added.

With this large step forward, “there is no fundamental reason why we shouldn’t see this being used within 10 years,” Ceze said.

DNA storage history

The concept of DNA storage dates back to the 1960s. George Church, professor of genetics at Harvard Medical School and leader of the Synthetic Biology Platform, began working on reading and writing DNA in his lab in 1986. In 2012, Church’s team of Harvard scientists encoded a single book, including photographs, in DNA.

“We came to an epiphany in 2012 when we realized we had the right things in place to encode, copy, and decode something in DNA,” Church said. “Almost immediately there was a great response, and companies came forward and said they had a serious archival problem.”

The University of Washington-Microsoft work “shows how quickly the field is scaling up,” Church said. Church’s team stored 22 megabytes in DNA earlier this year, setting the previous record, which this latest research has now surpassed.

In 2013, scientists from the European Bioinformatics Institute found a way to store at least 100 million hours of high-definition video in about a cup of DNA. All of the digital information in the world could be stored in a load of white, powdery DNA that would fit in a space the size of a large van, according to Nick Goldman of the European Bioinformatics Institute.

Luis Ceze, the UW Torode Family Career Development Professor of Computer Science & Engineering, and research scientist Lee Organick prepare DNA containing digital data for sequencing, which allows them to “read” and retrieve the original files.

Image: Tara Brown Photography/University of Washington

How does it work?

The storage process starts with digital information, be it videos, images, text—anything that can be encoded in digital form. The data is translated from zeros and ones into the letters of the four nucleotide bases of a DNA strand—(A)denine, (C)ytosine, (G)uanine, and (T)hymine.

After the translation is complete, researchers must synthesize DNA molecules that carry the resulting sequences. Once the digital data is mapped into sequences and the sequences become molecules, they can be dehydrated and stored for thousands of years under the right conditions.

When someone needs to retrieve the data, the molecules can be resuspended and run through a DNA sequencer, the same machine used for reading the human genome. The sequencer will read the information in the molecule and translate it back to digital information.
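
As a rough illustration of the translation described above, the Python sketch below maps each pair of bits to one of the four bases and back again. The two-bits-per-base table and the function names encode and decode are assumptions made for this example; the article does not describe the team's actual encoding, and a practical scheme would likely also need to cope with errors introduced during synthesis and sequencing.

```python
# Naive two-bits-per-base mapping. This table is an illustrative assumption,
# not the encoding used by the University of Washington-Microsoft team.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes (zeros and ones) into a string of nucleotide letters."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate nucleotide letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

In this toy model every byte becomes four bases, so the round trip is lossless only as long as each letter is read back exactly as it was written.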

“In the biotech industry, progress on the ability to write and read DNA is moving very fast—by some measures it’s faster than Moore’s law,” Ceze said. “Today it’s too expensive and not fast enough, but there are many reasons to believe it will be cheap enough by the time we’re ready to make this a full reality.”

Church agreed that cost is the biggest barrier to making this technology more mainstream. His Harvard team is now working on reducing costs and replacing the organic chemistry used for synthesis with biochemistry. Church also helped create a tool for reading DNA that is about the size of a Kindle, whereas traditional machines are closer to the size of a refrigerator.

Changing data centers

DNA storage will likely be used as an archival storage technology for data centers—Ceze called it “the ultimate backup system.” His team’s goal is to build an end-to-end system that might look like any other storage box, except with a DNA synthesizer and sequencer inside. Users would not interact with the DNA directly, but rather with this machine.

“It will be safe and easy to manage,” Ceze added. “The user might not even notice it’s using DNA, except they will see storage capacity and the ability to preserve go up.”

Ceze does not expect to see these units used in consumers’ homes, but it is possible that in the future they could become user-friendly enough for that.

“We’re now producing a lot more data than we’re able to store, and a lot of data we would like to store is thrown away,” Strauss said. “With the density and durability of DNA, we may be able to close this gap.”

If it works, the team could target businesses that already archive their data and offer the service at a lower cost, or enable people who would otherwise throw their data away to save it.

Barriers for archivists

Despite excitement over the possibilities DNA storage presents, some archivists remain wary.

DNA storage could help combat bit rot, the loss of data due to bits flipping from zero to one or vice versa, or simply becoming unreadable. However, a bigger issue in the field is how to ensure meaningful and useful access to the information stored on the media, said Cal Lee, a professor at the University of North Carolina’s School of Information and Library Science.

This requires a lot of contextual information, which must be adapted over time, as well as technical knowledge about how to access, render, and use the data. Even if people in the future have equipment to read the nucleotide bases of a DNA strand, they may not have the knowledge of how those bases are supposed to be read, Lee said. One would also need to embed into the DNA an extensive set of specifications about associated formats and applications.

“The notion of setting aside a storage medium in any form—whether that’s a floppy disk, flash drive, magnetic tape or DNA—to be read many years later isn’t a viable digital preservation strategy,” Lee said. “DNA storage holds exciting prospects for dense and persistent storage over intermediate periods of time, maybe around a decade or so. But over longer time horizons, the persistence of the storage medium simply isn’t the most pressing issue.”

Archivists who care about long-term preservation must devote resources to pay professionals over time who can ensure that the materials remain useful, authentic, and understandable, Lee added.

The 3 big takeaways for TechRepublic readers

  1. In July, researchers at the University of Washington and Microsoft reached an important milestone in DNA storage, storing a record 200 megabytes of data on molecular strands. The previous record was 22 megabytes.
  2. The biggest barriers for mainstream use of DNA storage are the cost and the need to be able to retrieve and use the data. The University of Washington-Microsoft team is working on an end-to-end system, so users could input their information and find it again later without having to interact with the DNA itself.
  3. The most likely application for DNA storage will be as backup in large data centers. But it could potentially be available for consumer use sometime in the future.