
How Much Information is Stored in the Human Genome?

The other day I was having a conversation with a friend of mine who had some background in computer science. The conversation shifted towards my research and the following question came up: What is the amount of digital information stored in a human genome? I started searching in the deep dark corners of my brain, but I realized that I simply did not know the answer. So I decided to do the math to estimate how much information is stored in our genome.

Laying out the information storage capacity of the genome

The human genome contains the complete genetic information of the organism as DNA sequences stored in 23 chromosomes (22 autosomal chromosomes and one X or Y sex chromosome), structures that are organized from DNA and protein. A DNA molecule consists of two strands that form the iconic double-helix “twisted ladder”, whose backbone, which is made of sugar and phosphate molecules, is connected by rungs of nitrogen-containing bases. DNA is composed of 4 different bases: Adenine (A), Thymine (T), Cytosine (C), and Guanine (G). These bases are always paired in such a way that Adenine connects to Thymine, and Cytosine connects to Guanine. These pairings produce 4 different base pair possibilities: A-T, T-A, G-C, and C-G. The haploid human genome (containing only 1 copy of each chromosome) consists of roughly 3 billion of these base pairs grouped into 23 chromosomes. A human being inherits two copies of the genome (one from each parent), and thus two sets of chromosomes, for a total of 46 chromosomes, representing the diploid genome, which contains about 6×10^9 base pairs.

Comparing the genome to computer data storage

In order to represent a DNA sequence on a computer, we need to be able to represent all 4 base pair possibilities in a binary format (0 and 1). These 0 and 1 bits are usually grouped together to form a larger unit, the most common being the “byte”, which consists of 8 bits. We can denote each base pair using a minimum of 2 bits, which yields 4 different bit combinations (00, 01, 10, and 11). Each 2-bit combination would represent one DNA base pair, so a single byte (8 bits) can represent 4 DNA base pairs. In order to represent the entire diploid human genome in terms of bytes, we can perform the following calculation:

6×10^9 base pairs/diploid genome × 1 byte/4 base pairs = 1.5×10^9 bytes, or 1.5 Gigabytes: about 2 CDs’ worth of space, or small enough to fit 3 separate genomes on a standard DVD!
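The 2-bit encoding and the size estimate above can be sketched in a few lines of Python. This is only an illustration: the mapping of bases to bit patterns (`BASE_TO_BITS`) and the helper names `pack_bases` and `genome_bytes` are assumptions made for this example, not an established file format.

```python
# Hypothetical 2-bit code: which base gets which bit pattern is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_bases(sequence: str) -> bytes:
    """Pack a DNA string into bytes, 4 bases per byte (last byte zero-padded)."""
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so the padding bits sit at the end.
        byte <<= 2 * (4 - len(chunk))
        packed.append(byte)
    return bytes(packed)


def genome_bytes(base_pairs: float, bits_per_base_pair: int = 2) -> float:
    """Storage needed for a genome at a fixed number of bits per base pair."""
    return base_pairs * bits_per_base_pair / 8


DIPLOID_BASE_PAIRS = 6e9  # figure used in the article

print(pack_bases("ACGT").hex())  # 00 01 10 11 -> 0x1b
print(f"{genome_bytes(DIPLOID_BASE_PAIRS) / 1e9:.1f} GB")
```

Note that a real sequence file would also need to record the sequence length, since the padding in the last byte is otherwise indistinguishable from trailing A bases under this mapping.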

Data storage across the whole organism

Some interesting questions follow. For example, how many megabytes of genetic data are stored in the human body? For simplicity’s sake, let’s ignore the microbiome (all non-human cells that live in our body), and focus only on the cells that make up our body. Estimates for the number of cells in the human body range between 10 trillion and 100 trillion. Let us take 100 trillion cells as a working estimate at the upper end of that range. So, given that each diploid cell contains 1.5 GB of data (this is very approximate, as I am only accounting for the diploid cells and ignoring the haploid sperm and egg cells in our body), the approximate amount of data stored in the human body is:

1.5 Gbytes × 100 trillion cells = 150 trillion Gbytes, or 150×10^12 × 10^9 bytes = 150×10^21 bytes = 150 Zettabytes (1 Zettabyte = 10^21 bytes)!!!
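As a rough check of this arithmetic, here is a small Python sketch. The cell counts are the article's range, every cell is assumed to carry one diploid genome, and the function name `body_genome_bytes` is made up for this example.

```python
BYTES_PER_DIPLOID_CELL = 1.5e9  # 1.5 GB per diploid genome, as estimated above
ZETTABYTE = 1e21                # decimal (SI) zettabyte


def body_genome_bytes(cell_count: float,
                      bytes_per_cell: float = BYTES_PER_DIPLOID_CELL) -> float:
    """Total genomic data if every cell holds one diploid genome (an assumption)."""
    return cell_count * bytes_per_cell


# Low and high ends of the cell-count range quoted in the article.
for cells in (10e12, 100e12):
    total = body_genome_bytes(cells)
    print(f"{cells:.0e} cells -> {total / ZETTABYTE:.0f} ZB")
```

With the lower estimate of 10 trillion cells, the same calculation gives 15 Zettabytes, so the result is only as good as the cell count fed into it.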

Sexual information exchange

Along the same lines, how much genetic data is exchanged during human reproduction? Human males are heterogametic, and each sperm cell is haploid, meaning that it contains only one of the two sex chromosomes (X or Y) and only one set of the 22 autosomal chromosomes. Thus, each sperm contains about 3 billion base pairs of genetic information, representing 750 Mbytes of digital information. The average human ejaculate contains around 180 million sperm cells. So, that’s 180×10^6 haploid cells × 750 Mbytes/haploid cell = 135×10^9 Mbytes = 135,000 Terabytes!!!! Following this idea even further, while 135,000 Tbytes are transferred, only one sperm cell will fuse with an egg, using only 750 Mbytes of data, combining it with another 750 Mbytes of data from the egg. Thus, essentially 99.9999…% of the data transferred during sexual reproduction is lost in the pipeline … Whether the remaining fraction of information will result in anything constructive is up to good parenting.
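The same back-of-the-envelope numbers can be reproduced in Python. The inputs are the article's figures (180 million sperm cells, 750 Mbytes each), and the variable names are chosen for this sketch only.

```python
SPERM_PER_EJACULATE = 180e6     # article's average figure
BYTES_PER_HAPLOID_CELL = 750e6  # 3e9 base pairs at 2 bits each
TERABYTE = 1e12                 # decimal (SI) terabyte

# Total data carried by all sperm cells.
total_bytes = SPERM_PER_EJACULATE * BYTES_PER_HAPLOID_CELL

# Only one sperm cell fuses with the egg; the rest of the data goes unused.
fraction_used = 1 / SPERM_PER_EJACULATE
fraction_lost = 1 - fraction_used

print(f"transferred: {total_bytes / TERABYTE:,.0f} TB")
print(f"used: {fraction_used:.2e} of the total")
print(f"lost: {fraction_lost:.7%}")
```

In decimal units, 135,000 Terabytes is the same as 135 Petabytes, and the fraction that is used comes out at roughly 5.6 parts per billion.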

Having worked out the above numbers, a whole bunch of other curious questions can be asked. Have you ever wondered about the data capacity of our biological organism? What is the rate of data transmission during cell division? The rate of data transmission during gamete fusion? The rate of data transmission when human lymphocytes circulate through the bloodstream? What amount of data is destroyed daily by apoptosis? What amount of data is created daily?  How does this compare to the rate of data transfer via an optical fiber?

Please feel free to contribute your own dubious calculations and questions below!

59 Comments

  1. A.Mofa on January 18, 2020 at 7:20 pm

    that’s a good point

  2. Henrique on June 20, 2019 at 4:24 am

    Very interesting! This makes me think about the potential to store digital data chemically. A single human cell varies a lot in size, but they’re around 4,000 μm³. 1.5 Gb in 4,000 μm³ is insanely dense, and it can be even denser, since the dna can be much larger.

    I wonder if in the future we will be able to manipulate these molecules so finely that we could reliably use DNA to store data in servers.

  3. Olen on October 27, 2018 at 6:33 am

    I would like to point out that an amount of data that extensive would not be able to, in the end, solve a single thing; all it would do is perpetuate the lack of solid answers and states of life, the universe, and everything. There is already a greater computer, so why should we try to match it when we can’t even interpret the information in front of us? Idk, just seems true.

  4. Stephen Chu on March 21, 2018 at 12:59 pm

    If we have to admit that someone (many someones) is needed to program a computer to work, then who is the programmer of all species in this world? Who wrote my gene code?

    “DNA is like a computer program but far, far more advanced than any software ever created.” ― Bill Gates,

    Of course, there’s nothing related to religion in this question, just: who programmed your DNA?

    • Lorne Thompson on June 21, 2018 at 1:18 pm

      Life is obviously not an accident. Think about it. Your brain and your body are created systems. The program to build you is encoded in every cell. Do you really believe this evolved over time?
      Only a master programmer beyond our imagination could do this.
      “Ever since the creation of the world his eternal power and divine nature, invisible though they are, have been understood and seen through the things he has made. So they are without excuse.”

      • Albert Schey on February 5, 2019 at 6:21 pm

        You are so right on Lorne!

      • James Carrier on November 27, 2019 at 5:17 pm

        Lorne Thompson, how right you are! Has anyone ever asked the question how life could propagate with no coagulation system? Wonder how long that took to evolve? Wonder how childbirth went with no coagulation system? Ponder.

      • JOHN ATWOOD on December 6, 2019 at 7:24 pm

        And one’s mind is not the brain. One’s neural network does not create consciousness. See and research Sir John Eccles, who was a neural physicist.

      • Jonny on February 24, 2020 at 12:06 am

        Wow. Amen to that. Systems are designed. Then science observes the design and attaches theories to such.

    • Theodore St. John on December 9, 2018 at 1:25 pm

      This is an excellent topic. There is enough information stored in the DNA to provide the program necessary for life. But where does the information come from? The question should not be “who?” That assumes a forgone conclusion. Creation is a process, driven by the apparent separation of formless energy (a unity) via relative motion. You may be sitting motionless, but in fact, you are in motion relative to every other particle in the universe. The coexistence of these mutually exclusive states (binary bits) provides the apparent separation of unity and a restoring centripetal force that gives quantum particles their angular momentum. This is the basis for a metamorphic process. The self-sustaining process creates the vibrational energy that collapses into quantum states – bits of information, which creates constructive and destructive interference, which amplifies a holomorphic process (separation, projection, reflection, reunification; see https://www.scholarsresearchlibrary.com/archive/apr-volume-9-issue-2-year-2018.html). It results in self-organizing subatomic particles that make up the atoms that make up the base pairs that make up the DNA molecules… All of the information around us (i.e. Truth… that which actually happened) continuously collapses into every DNA center, and the same process results in cell-division and the destruction of cells that fail to follow the process, allowing the organism to adapt to a changing environment.

      The question, “who started the process?” is also a question fallacy because the process is a circle with no beginning or end. The act of drawing a circle may require a beginning, but the circle itself just is. Time is a human concept that is nothing more than a different way of quantifying motion. So there is no beginning, no creation event. Am I saying there is no God? No. “God” is the name of the process, by which Truth transforms into consciousness, which has no beginning or end… it just is. If it spoke for itself, it would say “I am… I just am. I am that intelligible sphere whose center is everywhere and circumference nowhere.” So God is the energy, the process and the resulting consciousness. Sounds like a holy trinity to me.

      • Vernon Broussard on October 30, 2019 at 10:33 pm

        I thoroughly enjoyed your perspective and explanation. If that’s what you’ve decided to expose in writing, I can only imagine the myriad of thoughts and theories you have danced with. Thank you for sharing.

      • Bryce Robertson on November 8, 2019 at 2:38 am

        You forget that due to entropy the universe is not a circle but a line segment. And that the true God Is the causeless cause.

  5. GXR441 on February 6, 2018 at 4:15 am

    Actual data needed to create a new individual can be huge because we are not taking into account the information needed to interpret the DNA.

    • XanderWilliams on March 9, 2018 at 5:33 pm

      I’m sorry but does anyone else automatically think Assassin’s creed when they read all of this genetic code.

  6. Chris on January 18, 2018 at 1:47 am

    That’s the reason Microsoft plays with using DNA as commercial data storage. I find it magic that biology is actually digital, and a biological byte has 2 bits, which is a power of 2. Then, 3 biological bytes (6 bits) encode one amino acid. Complementarity is an irreflexive bijection equal to its inverse function; we’re pure math. Additionally, I have never thought that transmitting 135 exabytes of data is such a great pleasure, I plan to transmit at least 270 exabytes tonight 😉

  7. anton on October 15, 2017 at 4:40 pm

    2 cd enough to store all the info required to build a human?

    • Nathan on December 20, 2017 at 9:07 pm

      Thank you for this explanation. You calculated the info content of both strands. However, each DNA strand serves as a template for and has the same information content as the other strand. So wouldn’t the non-redundant information content in the human genome be 1.5 Gigabytes/2 = 750Megabytes because of the redundancy in the genetic code?

      • Wyatt on April 5, 2018 at 3:11 am

        They accounted for that. They mentioned that there are four possible base pairs, “A-T T-A C-G G-C,” so it’s two bits per base pair. Without the template side of the DNA, it’s still two bits.

    • helomundo on January 19, 2018 at 7:35 pm

      you need even less information than that to build a human, it fits in 8 bytes of uncompressed ASCII text: “get laid”

  8. Cory B on August 19, 2017 at 4:48 pm

    “Whether the remaining fraction of information will result in anything constructive is up to good
    parenting.”

    LOL. Very cute.

    • B Henriksen on December 5, 2017 at 10:57 am

      Cute, yes, and either political, theological, or stupid. Otherwise the article is fine. Maybe it must go through some kind of censorship, or the writer is slightly schizophrenic 🙂

      With modern technology we can use any cell’s DNA (if it still has low enough entropy and is intact enough) to determine the exact family relations with another person.
      This is possible because all cells in one body have approximately the same (DNA bank) storage of data that advances survival for the next generation, but a different exact nature/condition (like different hard drives with slightly different content) due to the need for specialized cell function.

      So the fact that the sperm carries on only a small percentage of the data is not a notable obstacle to natural selection.

  9. Jim Gates-Patch on November 11, 2016 at 2:17 pm

    Yevgeniy, thank you – you answered my question. (I’m one of those annoying sci-fi writers pondering data storage in the human body.) But the responses from Mal and Andrew tell me there is much more to investigate yet! Thank you all.
