Can DNA Really Store the World’s Data? A Practical Look

Did you know that data storage media have a shelf life? Yep, they do; every storage method we use today can degrade within decades. For most of us, that’s probably “good enough.” I mean, I really don’t expect my grandchildren to put the same value on the data I’m storing that I do. But what if the future of data storage looks more like biology than silicon? Data stored in DNA can last thousands of years, possibly tens of thousands. So what’s the point of storage that outlives us by that much? It’s a question that has the attention of Microsoft, the University of Washington, Twist Bioscience, Catalog, MIT/Harvard synthetic biology labs, and the United States Government. It’s what we’re exploring today – the what, where, why, and how of DNA data storage.

What is DNA Data Storage?

DNA data storage is data storage that uses sequences of the four DNA bases (A, C, G, and T) instead of 1s and 0s. I’ll describe the process in just a bit. The Human Genome Project accelerated the automation and scaling of DNA sequencing, and that’s what makes reading the data back practical. We’re now able to map each pair of binary digits onto one of the four bases, build a strand with that sequence, and read the sequence back later. Storing the data as DNA provides for a longer – much longer – shelf life.

Although we think of DNA as “the stuff of life,” and although DNA storage uses actual DNA strands, we’re not creating life with this process. This DNA exists outside of a cell and lacks the cellular machinery required for life. It is a real molecule, but not a molecule that can give rise to any lifeform.

Why DNA?

Why would we consider using DNA to store data? It seems like a pretty big jump from digital data to an actual molecule. There are four primary reasons to explore this option: density, longevity, energy efficiency, and future-proofing.

Density

DNA stores up to approximately 215 petabytes per gram. It’s just too simple to say “that’s a lot of data.” When you consider that one petabyte is the approximate equivalent of a quarter of a million DVDs, it’s a whole different dimension. And how much is a gram of DNA? Well, a gram is about the weight of a dollar bill or a paper clip, but it’s harder to visualize a gram of DNA. The individual strands are far too small to see, and even a whole gram would be no more than a pinch of material; dissolved in water, it would make only a small droplet. So, at 215 petabytes per gram, imagine roughly 50 million DVDs encapsulated in a tiny droplet of water. Scale that up, and what we would essentially have is all the world’s data in a shoebox.
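If you’d like to check the DVD math yourself, here’s a quick back-of-the-envelope sketch. The 215 petabytes per gram comes from the paragraph above; the 4.7 GB single-layer DVD capacity and the decimal definition of a petabyte are my assumptions, so treat the result as a rough estimate rather than a precise figure.

```python
# Back-of-the-envelope density math.
# PB_PER_GRAM is the figure quoted in the article.
# DVD_BYTES is an assumption: a single-layer DVD holding 4.7 GB.
PB_PER_GRAM = 215
BYTES_PER_PB = 10**15          # decimal petabyte (assumption)
DVD_BYTES = 4.7 * 10**9

# How many DVDs fit in one petabyte, and in one gram of DNA?
dvds_per_pb = BYTES_PER_PB / DVD_BYTES
dvds_per_gram = PB_PER_GRAM * dvds_per_pb

print(f"DVDs per petabyte:    {dvds_per_pb:,.0f}")
print(f"DVDs per gram of DNA: {dvds_per_gram:,.0f}")
```

With those assumptions, a petabyte comes out a little under a quarter of a million DVDs, and a gram of DNA lands in the tens of millions.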

Longevity

DNA, as long as we keep it cool and dry, is stable for thousands of years. It can far outlast every other storage medium. Regular spinning hard drives are good for five to ten years. Solid State Drives aren’t really any better, topping out at about ten years. CDs and DVDs can do a little better, but the really cheap or poorly-stored discs are still only good for five to ten years. The high-quality archival discs, stored in ideal conditions, can make it 20 to 30 years. We’re not likely to store our tax returns on DNA. That’s not what we need to keep for thousands of years. The purpose of this kind of longevity is to archive data for posterity. We want to tell future generations the story of what life was like in our day.

Energy

CDs and DVDs don’t require power to maintain the data, but both SSDs and spinning hard drives can run into trouble when they sit on a shelf for years. SSDs work very differently from spinning drives, and they have some advantages. However, for long-term use and storage, it’s important to know that they can both fail – they’ll just fail in different ways.

An SSD stores data as an electrical charge. If you don’t keep the power connected to the drive, the charges can slowly leak. Over a long period of time (we’re talking months to years, but that’s the kind of longevity we’re looking for here), the bits can actually flip. Because of the charge leaks, voltage levels drift, and the controller of the drive can misinterpret a 1 as a 0, and vice versa.

A spinning Hard Disk Drive stores bits as magnetic orientation. Magnetic domains can weaken over time, causing a region representing a 1 to present a 0 instead, but that’s not the biggest problem. Because it’s a mechanical piece of equipment, the parts themselves can have – and cause – problems. When the lubricants in the bearings dry out, the components won’t move smoothly, if they move at all. We can also see something we call “stiction,” where the platters and heads stick together after long periods of storage. In effect, the drive may maintain the data perfectly, but be unable to spin up and present it to us.

DNA can preserve data integrity without any electricity (to the data or storage medium itself), moving parts, active cooling, software maintenance, or periodic refreshing. Now, that’s not to say DNA data storage doesn’t require electricity in the grand scheme of things, but the requirement for electricity is environmental, rather than operational. DNA data storage does require a cool, dry environment, darkness, and chemical isolation, but once those conditions exist, we can just let it be. For standard long-term storage, stable indoor temperatures are sufficient.

Future-Proof Format

Every nerd I know still has some floppy disks around, whether or not we have a way to read them. We don’t know why we still have them, but there’s something inside us that just won’t let us get rid of them. I have a USB-connected floppy drive somewhere, but the truth is that the data I may have stored on those disks is so old that it wouldn’t be useful anymore.

There’s some possibility that CDs and DVDs will go the way of the VHS tape, hopefully not in my lifetime. But the media can outlast the hardware required to read it, and that’s where DNA data storage has an amazing advantage. Biology itself is the basis of DNA storage, and biology isn’t going to lose out to something better. DNA will always be readable as long as life itself exists, and if life itself ceases to exist, we’re probably not going to be concerned about what happens to the data stored on DNA strands.

How DNA Data Storage Actually Works

Storing data on DNA and getting it back again requires five steps, and the details are more complicated than what I’m going to present here. I’m not a biologist, so I won’t go deep into that part of it, but I think I can get you to a decent picture.

  1. The first step is the encoding. An algorithm maps the 1s and 0s to a sequence of the four DNA bases A, C, G, and T. At this phase, we have a symbolic sequence of the encoded data.
  2. Next is synthesis. This is where the magic begins, and we create actual DNA. Through a process called solid-phase phosphoramidite synthesis, the encoded A-C-G-T sequence is chemically built, one base at a time. The process is already in use to produce reagents and synthetic DNA for medicine and research. It creates short DNA strands called “oligos.”
  3. Storage is next: freeze-drying, encapsulating, or preserving it in some other way. There’s not much more to say about that, because it’s similar to storing DNA used for any other purpose.
  4. The next phase is retrieval, where we amplify the region we want, or we sequence everything. Understand that this won’t be like web browsing. We will need to know what we’re looking for, but each archive will be curated to include only particular data. For example, an archive may be vital records, or it may be a particular research project, but not both in the same archive. Because of the limitations of retrieval activities, the archive will need to be designed with retrieval in mind.
  5. The final step is decoding, which is using an algorithm to translate the sequences back into the binary code that a computer can read. This is part of what future-proofs the process. We don’t need a particular program to decode it, but we will need a complete, documented description of the encoding method. DNA archives are often designed to be self-describing, by encoding version data in the metadata, with error-correcting parameters included, and with the fragment layout clearly labeled.
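To make steps 1 and 5 a little more concrete, here’s a minimal sketch of encoding and decoding. The two-bits-per-base table (00 to A, 01 to C, 10 to G, 11 to T) and the function names are my own illustration, not a standard. Real schemes are more involved: they typically avoid sequences that are hard to synthesize or read, and they add addressing and error correction on top.

```python
# Hypothetical mapping for illustration only: two bits per base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a symbolic A/C/G/T sequence (step 1)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn an A/C/G/T sequence back into bytes (step 5)."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four bases, so the two-byte message “Hi” turns into an eight-base strand. Notice that decoding needs nothing but the mapping table, which is why a documented description of the encoding matters more than any particular program.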

Obviously, this is a very simplified explanation, and the science behind it is pretty deep, deeper than most readers of this blog want to go. It’s good for a basic overview, though, and once it’s in common use, the process will be more automated.

The Challenges – Where the Tech Isn’t Ready Yet

Cost

It would be pretty cool if this were right around the corner, but DNA synthesis is really expensive per megabyte of data. While sequencing prices are dropping quickly, they haven’t reached what we would call “consumer level,” or anything close to it.

Speed

Additionally, the write speed for DNA is slow. Now, “slow” is relative, so, for comparison, writing to a Solid State Drive or a Hard Disk Drive can range from 100 megabytes per second to about five gigabytes per second. At those speeds, we could write a terabyte of data in anywhere from a few minutes to a few hours. Writing to DNA takes hours to days for a single gigabyte, and at that rate, writing a terabyte would take weeks at the very least. However, we’re not writing to DNA for the purpose of accessing that data often. It’s archival, which we would access only occasionally, and it’s intended to last pretty close to forever under the right conditions.
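Here’s the arithmetic behind that comparison, as a rough sketch. The drive speeds and the “hours to days per gigabyte” range come from the paragraph above. I’m assuming decimal units and, for DNA, one hour per gigabyte at best and one day per gigabyte at worst, with no parallel synthesis; those are simplifications, not measured figures.

```python
# Rough write-time comparison using the figures quoted above.
GB = 10**9
TB = 10**12

# Conventional drives: 100 MB/s at the slow end, 5 GB/s at the fast end.
slow_drive_hours = TB / (100 * 10**6) / 3600
fast_drive_minutes = TB / (5 * GB) / 60

# DNA: assumed 1 hour per GB (best case) to 24 hours per GB (worst case),
# written one gigabyte after another with no parallelism.
gigabytes_per_tb = TB // GB
dna_best_days = gigabytes_per_tb * 1 / 24
dna_worst_days = gigabytes_per_tb * 24 / 24

print(f"Fast drive: {fast_drive_minutes:.1f} minutes per TB")
print(f"Slow drive: {slow_drive_hours:.1f} hours per TB")
print(f"DNA:        {dna_best_days:.0f} to {dna_worst_days:.0f} days per TB")
```

Even the optimistic DNA case works out to about six weeks for a terabyte, which is why nobody is pitching this for everyday storage.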

Error Correction

Error correction is another challenge for DNA data storage. Biological systems tend to introduce insertions and deletions, and the cell almost always repairs the error before it becomes permanent. When the cell misses the error or fails to repair, it can cause problems, but that’s extremely rare. I mean “rare” on a microscopic level. The same insertions and deletions can occur in DNA data, but the error correction isn’t organic to that system, so the correction mechanisms are built into the foundation of every step of the process: overlapping sequences, random indexing, and parity encoding. Adding these protections does add overhead to the process, but they are the reason DNA data storage can work at all.
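To give a feel for what parity encoding buys us, here’s a minimal sketch using simple XOR parity across equal-length fragments. This is my own simplified illustration, with made-up function names; real DNA storage systems use considerably stronger codes. It also only covers the case where a whole fragment goes missing, not insertions or deletions inside a strand.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments: list[bytes]) -> bytes:
    """Build one parity fragment from equal-length data fragments."""
    return reduce(xor_bytes, fragments)


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing fragment from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


fragments = [b"DNA!", b"data", b"last"]   # all the same length
parity = make_parity(fragments)

# Pretend the middle fragment was lost during storage or sequencing.
rebuilt = recover_missing([fragments[0], fragments[2]], parity)
print(rebuilt)  # b'data'
```

The overhead here is one extra fragment for every three, and that trade of extra material for recoverability is the same one the real schemes make on a larger scale.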

Retrieval

Finally, the lack of true random access makes retrieval harder. A DNA archive has no built-in directory: the strands sit together in no particular order, so retrieving one file out of millions means either selectively amplifying the strands we want or sequencing everything. That’s not a trivial thing to do.

Each of these challenges has to be addressed before DNA data storage can become widely available for archival use.
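One way to picture the retrieval problem is to treat the archive as an unordered pile of fragments, each carrying its own index so the original can be put back together. The sketch below is a toy model under that assumption; the fragment size, the tuple layout, and the function names are mine, purely for illustration.

```python
import random


def split_into_fragments(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut data into (index, payload) pairs so order can be restored later."""
    return [(i // size, data[i:i + size]) for i in range(0, len(data), size)]


def reassemble(pool: list[tuple[int, bytes]]) -> bytes:
    """Sort fragments by index and join the payloads back together."""
    return b"".join(payload for _, payload in sorted(pool))


message = b"vital records, 2025"
pool = split_into_fragments(message, 4)

# Strands in a tube have no order, so shuffle to mimic that.
random.shuffle(pool)

print(reassemble(pool) == message)  # True
```

Without the index, the shuffled pool would be unreadable, which is why the archive has to be designed with retrieval in mind from the start.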

Use Cases

I’ve mentioned that DNA data storage would be for archival storage, but who would actually use it? Well, let’s start with cold storage archives. We’re talking here about Google-scale logs, the past records of system requests, compliance or research datasets, and summarized and compressed archives. It’s important to note that these would be historical logs, not live or operational data. Google is more than just a search engine; its parent company, Alphabet, is involved with many internet and computing research efforts. Other candidates for cold storage include NASA data, CERN particle data, and historical records.

We can also envision DNA storage serving cultural preservation. Think of the library at Alexandria on a strand, but with all the literature, films, and manuscripts added since then. It’s also conceivable that we’d store the geologic timescale on DNA.

The Future: What It Will Take to Go Mainstream

The challenges I mentioned don’t have to be deal-breakers; they just require some different solutions. For example, developers will need to build fully automated end-to-end systems, and they’ll need cheaper synthesis, likely enzymatic rather than chemical. We will also need hybrid storage systems: SSDs for everyday reading and writing, with DNA as the deep archive.

When would we see these solutions – how near is this future? Oh, it’s still a ways off – we need to think in terms of decades, not years. I won’t see it in my lifetime, but I still find it fascinating enough to explore it. Why? I’m so glad you asked.

Why It Matters

Let’s go back to the original problem: the shelf life of existing media. Add to that the fact that data tends to grow exponentially, and silicon simply can’t keep up with the growth. DNA offers durability, density, and sustainability. Isn’t it fascinating that we might store humanity’s entire digital universe in the same molecule that carries life’s instructions?

Your Turn

So what do you think? Would you trust your family photos to DNA? What data do you wish could last for thousands of years? Should humanity build a “DNA Library of Alexandria”? Let me know your thoughts in the comments!


Want to dig a little deeper, after all that? Check out these reasonably readable sources:

DNA digital data storage – Wikipedia

How DNA Can Be Used to Store Digital Data – Biology Insights

What is DNA data storage and how does it work? | National Geographic


