# How Much Information is Stored in the Human Genome?

The other day I was having a conversation with a friend of mine who had some background in computer science. The conversation shifted towards my research and the following question came up: What is the amount of digital information stored in a human genome? I started searching in the deep dark corners of my brain, but I realized that I simply did not know the answer. So I decided to do the math to estimate how much information is stored in our genome.

## Laying out the information storage capacity of the genome

The human genome contains the complete genetic information of the organism as DNA sequences stored in 23 chromosomes (22 autosomal chromosomes and one X or Y sex chromosome), structures made of DNA and protein. A DNA molecule consists of two strands that form the iconic double-helix "twisted ladder": a backbone of sugar and phosphate molecules connected by rungs of nitrogen-containing bases. DNA is composed of 4 different bases: Adenine (A), Thymine (T), Cytosine (C), and Guanine (G). These bases always pair in such a way that Adenine connects to Thymine and Cytosine connects to Guanine, producing 4 different base pair possibilities: A-T, T-A, G-C, and C-G. The haploid human genome (containing only 1 copy of each chromosome) consists of roughly 3 billion of these base pairs grouped into 23 chromosomes. A human being inherits two sets of genomes (one from each parent), and thus two sets of chromosomes, for a total of 46 chromosomes, representing the diploid genome, which contains about 6×10^9 base pairs.

## Comparing the genome to computer data storage

In order to represent a DNA sequence on a computer, we need to be able to represent all 4 base pair possibilities in binary format (0s and 1s). These bits are usually grouped into larger units, the most common being the 8-bit "byte". We can denote each base pair using a minimum of 2 bits, which yields 4 different bit combinations (00, 01, 10, and 11), one for each DNA base pair. A single byte (8 bits) can therefore represent 4 DNA base pairs. To represent the entire diploid human genome in terms of bytes, we can perform the following calculation:

6×10^9 base pairs/diploid genome × 1 byte/4 base pairs = 1.5×10^9 bytes, or 1.5 gigabytes. That is about two CDs' worth of space, or small enough to fit three separate genomes on a standard DVD!

## Data storage across the whole organism

Some interesting questions follow from this. For example, how many megabytes of genetic data are stored in the human body? For simplicity's sake, let's ignore the microbiome (all the non-human cells that live in our body) and focus only on the cells that make up our body. Estimates for the number of cells in the human body range between 10 trillion and 100 trillion. Let us take 100 trillion cells as the generally accepted estimate. So, given that each diploid cell contains 1.5 GB of data (this is very approximate, as I am only accounting for diploid cells and ignoring the haploid sperm and egg cells in our body), the approximate amount of data stored in the human body is:

1.5 Gbytes × 100 trillion cells = 150 trillion Gbytes, or 150×10^12 × 10^9 bytes = 1.5×10^23 bytes = 150 zettabytes (1 ZB = 10^21 bytes)!!!
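For anyone who wants to check the arithmetic, here is the same back-of-the-envelope calculation in Python (decimal units throughout; the cell count is the rough upper estimate used above):

```python
# Whole-body figure: per-cell storage times cell count.
bytes_per_cell = 1.5e9           # 1.5 GB per diploid cell, from the calculation above
cells = 100e12                   # ~100 trillion cells (upper estimate)
total_bytes = bytes_per_cell * cells
print(total_bytes / 1e21)        # zettabytes: ~150
```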

## Sexual information exchange

Along the same lines, how much genetic data is exchanged during human reproduction? Each sperm cell in a human male is heterogametic and haploid, meaning that it contains only one of the two sex chromosomes (X or Y) and only one set of the 22 autosomal chromosomes. Thus, each sperm contains about 3 billion base pairs of genetic information, representing 750 Mbytes of digital information. The average human ejaculate contains around 180 million sperm cells. So, that's 180×10^6 haploid cells × 750 Mbytes/haploid cell = 135×10^9 Mbytes = 135,000 terabytes!!!! Following this idea even further: while 135,000 Tbytes are transferred, only one sperm cell will fuse with the egg, using only 750 Mbytes of data and combining it with another 750 Mbytes of data from the egg. Thus, essentially 99.9999…% of the data transferred during sexual reproduction is lost in the pipeline … Whether the remaining fraction of information will result in anything constructive is up to good parenting.
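As one more quick sanity check, the reproduction numbers in Python (decimal units; the sperm count and per-cell figure are the rough estimates used above):

```python
# Data transferred per ejaculate vs. data actually used at fertilization.
bytes_per_haploid = 750e6            # 750 MB per haploid cell
sperm_per_ejaculate = 180e6          # ~180 million sperm cells
transferred = bytes_per_haploid * sperm_per_ejaculate
print(transferred / 1e12)            # terabytes: 135000.0
used = 2 * bytes_per_haploid         # one sperm plus one egg
print(used / transferred)            # fraction retained: ~1.1e-8
```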

Having worked out the above numbers, a whole bunch of other curious questions can be asked. Have you ever wondered about the data capacity of our biological organism? What is the rate of data transmission during cell division? The rate of data transmission during gamete fusion? The rate of data transmission when human lymphocytes circulate through the bloodstream? What amount of data is destroyed daily by apoptosis? What amount of data is created daily?  How does this compare to the rate of data transfer via an optical fiber?

Please feel free to contribute your own dubious calculations and questions below!

1. Cory B on August 19, 2017 at 4:48 pm

“Whether the remaining fraction of information will result in anything constructive is up to good
parenting.”

LOL. Very cute.

2. Jim Gates-Patch on November 11, 2016 at 2:17 pm

Yevgeniy, thank you – you answered my question. (I’m one of those annoying sci-fi writers pondering data storage in the human body.) But the responses from Mal and Andrew tell me there is much more to investigate yet! Thank you all.

3. Alfred Kaiser on June 7, 2016 at 12:17 am

A very good, if perhaps disquieting, book is Prof (ret) Dr Werner Gitt’s book “Without Excuse”, where he very methodically makes a case for information being a non-material entity. The book is obviously the result of many years of diligent study and research, and should be very thought-provoking (or cause for immediate rejection because it runs too much against the grain). Are you up to the challenge?
The book is available in at least German and English from Dr Gitt’s website or through bookstores, though I do not know the ISBN for ordering it.
Since comments are relatively current, I thought I'd add my 5c worth (we no longer have pennies in Canada, so I can't make it my 2c worth). The combinations and permutations, loops and conditional loops, and the 3-D nature of DNA are stupendous!

4. hadi on May 30, 2016 at 11:23 pm

i think data stored in dna is like the data for a software like lets say 3d studio max! then what actually makes human is the data that is “modeled” in this software not only data in dna. so if we want to transfer a human data for example, we have to transfer dna data plus the data of current state of everything that creates human…
dna only has the data of different softwares (systems of body organs), but each software is acting according to environmental circumstances in a unique way. So the same base dna data can not make two similar humans.
That is like having a 1.5 Gb software like 3Dmax in two different companies and one is used to animate Shrek and one is animating Minions!
and one final conclusion… if some day someone wants to simulate human in computer, he has to make the base cell and do the complete calculations of growing it up virtually…

5. Mal Christison on November 26, 2015 at 12:10 am

I have a computer background and a layman’s interest in the information carrying capacity of DNA. I’m puzzled by your calculations and subsequent comments, on several levels. I understand your first calculation of the data volume in a single cell. However your second calculation is curious because you imply that the information in the 100 trillion cells that form us is different from that in the first cell that conceived us. From an information perspective, the 100 trillion cells are a copy of the first cell, so we can hypothesize that the same 1.5 Gb of information is in every cell. In other words, we don’t possess 150 Zb of data; we only possess 1.5 Gb and we store it 100 trillion times.
I would comment that I find it hard to believe that such a small data set could carry anything more than a tiny fraction of the hereditary information required for a new life. Especially when we consider that this inheritance seems to include a morphogenetic instruction set for the 100 trillion cells, parental characteristics such as mannerism, likeness, preference etc. and the complex instruction sets of instinctive behaviour.
I’ve read a little on epigenetics, and learned that this is a hypothesis that proposes that external factors can modify the behaviour of cells, such that cells behave differently in different environments. This hypothesis is often offered as a way to explain the data shortage apparent in each living cell. However epigenetics seems to imply that data can invent itself separately in each living organism. While I’m prepared to believe that each species developed according to Darwinian Theory over millions of years, I find it hard to believe that each species can individually develop similar datasets from a 1.5Gb starter kit. Cells do respond to their environment in amazing ways, but I would suggest that each response is carefully planned and that it is logical that these plans are delivered in the original dataset. Surely there must be a near infinite volume of data stored in each living cell, billions of times larger than a mere 1.5 Gb. But how is it stored?
Another puzzling issue that eludes me is how this apparent 1.5 Gb of data is converted into action. Data can be compared in some ways to a book. A book doesn’t do anything on its own except exist. It takes a reader to convert the information in the book into action. Where is the reader of the information stored in our DNA? A computer converts digital data into action because it has an operating system driven by a clock that rapidly changes state. Where are the operating systems and the rapid changes of state required to turn data into action in DNA theory? I can only assume these are contained in the DNA itself, but how? Can anyone tell me who in the scientific community is an expert on this problem? I would love to read them.

• Loren Beck on April 8, 2016 at 2:19 pm

Not an expert in biology, although it was my major, but the “action” in the cell is performed by ribosomes that transcribe the DNA and RNA strands into protein chains that are used to perform work in the cell.

As the DNA is read by the ribosomes it is cleaved by a transcriptase (RNA Pol1, 2, or 3), copied to RNA strands, and the “data” in the copied RNA defines the assembly pattern of proteins. Only small portions of the DNA are read at any one time. There are start and stop “words” built into the DNA strand that tells the ribosome where to begin and end its transcription.

Ribosomes are essentially biological CPUs inside the cell, but they run at a pretty slow speed compared to modern computer CPUs. DNA is nothing more than run-time code for the ribosome.

Your observation about the total storage in the body being 1.5 Gb copied 100 trillion times is pretty much spot on. However, it is interesting to consider that 42 human cells could hold as much information as a modern iPad.

• Mal Christison on April 16, 2016 at 8:18 pm

Hi Loren, thanks for replying to my post. I confess that I forgot all about writing this until someone else read it and contacted me. Then I saw your reply. Your description of ribosomes copying sections of DNA, creating RNA and then proteins is fine as far as it goes. Here’s an animated video showing the process you describe, for anyone who’s interested in knowing more. http://www.youtube.com/watch?v=9kOGOY7vthk&list=PLkYW4AaKtDvjgT30GwHSUF1RwWk5W2rIV .
While I am certain all of these theories are well proven, what puzzles me is how many unasked questions they raise. These ribosomes appear to be amazingly complex little machines, performing extraordinary processes that rearrange complex strings of atoms and molecules. Where do they come from? What instruction set builds and operates them? How and why do they do what they do? And if that’s all there is, how do we inherit likeness, mannerisms, preferences, abilities etc. etc.? If that’s all there is, then surely we would all be just a blob of protein.
Does your statement that ‘DNA is nothing more than run-time code for the ribosome’ contradict the hypothesis that DNA is the main molecule responsible for heredity – the passing of characteristics from parents to offspring – providing the ‘instruction manual’ for how our bodies grow and develop? See https://www.youtube.com/watch?v=Ab3fO726pik
I am pleased to read that you agree with my observation that the total data storage in the body is just 1.5 Gb copied 100 trillion times. I wonder why this myth of 150 Zb or whatever is perpetuated by qualified commentators? This is systems 101. Either they are being silly or intentionally misleading. Your next comment regarding 42 human cells holding as much information as an iPad, I assume refers to cells from 42 different individuals.
In my original post I asked, ‘Can anyone tell me who in the scientific community is an expert on this problem?’ I recently had the opportunity to ask a Professor of Statistical Genetics about the information carrying capacity of DNA. I told him I was puzzled by the volume of data delivered in a fertilised egg. We briefly discussed how big this data set might be. After some equivocation, he agreed that it was huge and said he didn’t know exactly how it was stored. He commented that even though he didn’t know, it works just fine.
Then he said that my question was ‘the big one.’ I asked who is working on it so that I could read more. The Professor said he didn’t know. He commented that the subject was popular about twenty years ago but interest seems to have waned since then. I asked why and he said that he didn’t know. He speculated that a large number of discoveries about DNA had occurred in this period and maybe these had occupied everybody’s attention. I didn’t press him on this point. Maybe other discoveries have side-tracked scientists away from investigating ‘the big question,’ but I doubt it. Every field of science, no matter how obscure, has its experts, associations, curricula and publications, except this one.

• Mal Christison on April 20, 2016 at 10:38 pm

Dear moderator, may I request that these sentences be removed from my last post?
I wonder why this myth of 150 Zb or whatever is perpetuated by qualified commentators? This is systems 101. Either they are being silly or intentionally misleading. Your next comment regarding 42 human cells holding as much information as an iPad, I assume refers to cells from 42 different individuals.

Also in my first post could I add the word ‘member’ to this sentence please?
I find it hard to believe that each species member can individually develop similar datasets from a 1.5Gb starter kit.

• Andrew C on May 8, 2016 at 8:21 pm

Just some thoughts on your posts. I feel like this discussion highlights one of the big issues of non-academics discussing complex scientific systems. The theories that make it out to the general population (like DNA being a genetic blueprint for the cell) are invariably simplified, recast into soundbites and explained in ways that someone in high school can memorize and recite and maybe have some vague understanding of what is going on. To really understand what is happening in the cell requires knowing about transcription (and all of the systems that we currently understand that regulate it), translation, post-translational modifications, ion- and protein-gradients, protein-protein interactions, small and non-coding RNAs (and their interactions with DNA and proteins) and the dozens of other small, neither protein nor nucleic acid, chemicals that are found in and around the nucleus. The dynamic and insanely complex network of these different components gives you the varieties of cells, the responses of different cell types to stimuli and if you could instantaneously measure the state of the cell, a way to identify uniquely each cell in the body. Each cell contains an uncountably high amount of data, but it is generally data that is too complicated for us to understand or describe in an easily communicable manner. So we reduce and simplify and are left with ideas like the DNA in the nucleus being the repository for “data” in the cell.

In an extension of the earlier comments, specifically to the charge of “DNA as runtime code for the ribosomes” it is worth noting that it is much more complex than that. Yes, coding DNA is processed by the ribosome (with a few steps in between), but non-coding RNA sequences, siRNA and a large variety of helper proteins and nucleic structures are also present and are hugely important to how a segment of nucleic DNA gets processed.

Even if we want to restrict ourselves to talking about data “held” in the nucleus we also need to consider various epigenetic markers as a markup language that influences how a section of code is processed. Unlike the DNA sequence, which is relatively static, the epigenetic and gene-regulatory system is extremely dynamic as histones slide about, with modifications to their constituent proteins, along with enzymes that are methylating and demethylating the DNA itself (if the nucleotide is a cytosine). At any given time the information in a nucleus would have to include both the sequence of the DNA and the state of the 30-40 million histones, each having some subset of the at least 50 different histone modifications (see http://www.cellsignal.com/common/content/content.jsp?id=science-tables-histone). While certain modifications compete or are tied in to structures and motifs in the DNA itself it is fair to say that the DNA-histone system can hold a significantly larger amount of data than is expressed in this article.

Instead of describing each cell as a 1.5 Gb parcel, it would be more useful to think of each cell as containing 1.5 Gb of “core” code with much more than 1.5 Gb of data (~35 million histones, each with 8 subunits, and each subunit with likely a thousand unique states) that represents the current active state of the code.

That means that from the perspective of data storage potential, each nucleotide actually has many more possible states (is the nucleotide methylated, is there any protein or RNA actively bound to it), plus the state of each histone. If I’m remembering correctly, there are around 50 different histone modifications that commonly exist in mammals (although some exclude others nearby), and each histone takes up 150-200 nucleotides. This gives

• Alan Matthews on June 16, 2016 at 1:02 am

Great response. No one understands the complexity involved which is why it’s not modeled yet. Add in time and temperature variants and this giant chemical soup operating in four dimensions is truly a “big question”.

• Mal Christison on August 3, 2016 at 3:07 pm

Thanks for your comments Andrew. I gained a clearer understanding of the current state of thinking on genetic information. You have given me much food for thought. Your comments that histone variations significantly increase the data storage potential of each cell are most interesting. Unfortunately your post seems to be cut off at the point where you reach your conclusion. I’m keen to read the rest.
Is it true to say that the dynamic epigenetic and gene-regulatory systems that you describe are very complex and highly controlled chemical processes? They are not comparable to natural chemical reactions that can occur spontaneously. If so, what is controlling these processes? Currently science says we don’t know, but if we dig deeper into the minutia of the physical cell structure and chemical processes, we will eventually find the answer.
This empirical approach is important work and has achieved remarkable results. So much so that it’s easy to believe that we are on the verge of understanding the entire system. But are we really on that verge? To illustrate this point, we could modify William Paley’s watchmaker analogy. Let us propose that instead of finding a watch in a world without watches, Paley found a computer and pondered on its operation. He could take the computer apart and learn a great deal about its construction. He could empirically explore the physical components down to the molecular level and map the entire the system. But he would never fully understand the workings of this device until he rationalised an operating system.
You can correctly say that there is no acceptable scientific evidence for an operating system of any kind in living cells. Yet there is considerable circumstantial evidence. Twin studies indicate a massive genetic inheritance of physical, familial and psychological characteristics. Animal studies suggest extraordinary a priori knowledge in the form of instinctive behaviour. But the most telling circumstantial evidence is that we die in a moment. How can the massively complex chemical processes in 100 trillion cells terminate simultaneously in a moment?
It’s life that terminates at the moment of death. It’s life that transfers our genetic inheritance and our process control systems from parent to child in the first cell. It’s life that copies this cell 100 trillion times with a 100 trillion variations and it’s life that controls the chemical processes in these cells. So can we apply reason to the problem of understanding the life force?
The first question must be, is the life force ethereal or real? If we choose real, then the life force must comply with the laws of physics and chemistry. So is the life force based on matter or energy? We quickly come to a fork in the road. We can explore the matter road first and find that it leads to a hugely successful community. In some parts the streets are literally paved with gold, and not without good reason. High achievement abounds with experts, associations, curricula and publications. The high achievers are kings of the world. Who can doubt that this is the road to success?
And yet, energy is much more efficient for information transfer and process control than matter. Energy can change state billions of times per second, can travel at or near light speed, is infinitely variable in form and function, and is preferred in all high performance control systems over matter. So what happens if we hypothesize that the life force is energy based? Let’s explore that road.
The first thing we would see is that the turnoff is a small overgrown track. Very few people pass this way. We would encounter various battle fields where disaster has struck previous unfortunate travellers. Animosities dating back to Aristotle the empiricist and Plato the rationalist are unleashed here. Battle scenes include Volta v Galvani, The Royal Academy v Mesmer, Randi v Puthoff and Targ, Maddox v Benveniste, medical science v quackery, the Ignoble Prize etc.
In the middle of this carnage, huge barricades are erected and manned continuously by the sceptical gatekeepers. It’s like a scene from Monty Python. The barricades have signs that declare, ‘none shall pass’, ‘extraordinary claims require extraordinary proof’ and ‘step right up to claim your million dollars after passing my simple test.’ The barricades are manned by thousands of volunteers, ready to cast heretics into the abyss. It’s a no-go area and only a fool would try.
If we look beyond this scene of devastation, behind the barricades and far off into the distance we can see another land, a magical land of fairies and elves, where science doesn’t matter and proof is not really a requirement. It’s a surprisingly successful place with strong community support. Even the British Royal family patronises their services and many universities provide degree courses in their beliefs and practices.
The gatekeepers point and pour scorn but it makes no real difference to the inhabitants. Occasionally people with a scientific bent come down from this magic land to show some evidence to the gatekeepers that they hope will gain some credibility with the scientific community. But the gatekeepers give them a good thrashing and send them on their way. I wonder if the gatekeepers will give me a good thrashing for writing this. I doubt it. It’s easier to ignore trouble makers. Still there may be a grain of truth in this Chautauqua.

• Sven Viking on July 14, 2017 at 8:19 pm

Curse you, comment character limit! (Re: Andrew C)

• JASON WEGELEBEN on May 19, 2016 at 10:10 am

“Total amount” of data is not dependent on Unique vs Duplicate data
total is total ..total is not how much it can be compressed (removing duplicate data)
I’ve read elsewhere that since there is less than 1% variation between all human individuals’ DNA, you could store the parts that make an individual unique in only 4 MB. Thus the entire human race’s genetic data (8 billion unique individuals) could be stored in about 32,000 terabytes, plus the 1.485 GB (the other 99% of DNA, which is shared).
Considering you can get 3.5″ hard drives with 8 TB capacity, that would require only 4,000 drives, which would take up about 1 cubic yard of space.

• Lance on September 22, 2016 at 10:39 pm

Thank-you Mal Christison for putting into words (actually, quite eloquently) what I have been discovering and experiencing in my own search. Having a background within automotive embedded systems engineering, I understand the complexity that is coming with autonomous vehicles and AI. But it pales in comparison to the complexity you so keenly describe and on a microscopic scale. I too have asked professors about this and got the reply, “Yes, we need more research in this area.” The only conclusion that I have come to is that everyone in the research community knows from history that this is a one-way ticket to end your career because it gets at the Achilles Heel of evolutionary theory. It seems like only ID researchers publish anything in this area. If I’m wrong, I would be happy to be pointed to some research that has attempted to answer these questions. Thanks again for your Chautauqua (new word for me).

• Gordon on December 31, 2016 at 4:03 pm

How much data are required to create an ant, a bird or a monkey? I would envision that this data doesn’t have to be very big but exists in a high-dimensional space (there is probably quantum physics involved) and comprises some sort of adaptive self-manufacture and assembly processes and rules to create a large colony of cooperative and interdependent cells and microorganisms. There should be some scientists who are researching far less complex organisms than humans in order to understand the physics and chemistry of life. The “big question” is not interesting, the answer is obvious, and it would truly be a waste of energy and resources for governments to fund any such research. That research should be funded by those deep-pocketed interest groups and communities to prove their own answer is right. Just be careful not to be named a heretic while adding knowledge to science.

• Mark on February 15, 2017 at 9:06 am

Thank you Mal and Andrew and others for your interesting observations and contributions.

What is very clear to me, as an engineer (who designs computers for a living), not a molecular biologist, is that this “system” didn’t just happen.

It didn’t happen without a specification. A stunningly complex blueprint. And most certainly a goal.

I am stunned that children are still being taught that “life” can be artificially produced in a test tube (based upon the Miller-Urey experiment). Given that for the last 60 years the best scientific minds have been trying to construct (purposely construct) just one protein , that seems to be dumbing down the issue to the point of complete falsehood.
When I challenged a Head of Science Dept why this was so I was told “It is ok to tell them this (small inaccuracy), because we know that it must be so”.

You can put all the component parts of a cell in a test tube and shake it for as many million or billion years as you like, and you are still not going to get a single DNA or RNA or molecular motor (like the kinesin motor that transports cargo along microtubule filaments) – you’ll be lucky to get more than a couple of amino acids holding hands.

Mal, I asked a doctor friend of mine the question of whether all cells die at the same time, or how to they get told to “shut down”. He advised that they only stop when they run out of energy. And that energy is ultimately provided by oxygen from breathing. I think I have recalled that correctly.

It amazes me that SETI still exists, searching the galaxies for signs of Information, when those signs are within every living organism on this planet.

The other thing I find puzzling is that, as an engineer, I find every problem I face already has a solution exquisitely designed within nature. Thermal insulation of a polar bear (far far better than anything man made). Animal night vision? When did we stop trying to mimic birds in flight? Is a fixed wing aircraft even a close approximation to any bird? Marine organisms using optical communications? Where is the information coming from that enables the Zooids within a Pyrosome to simultaneously communicate with light and propel the colony?

There is something more going on here than just information in a strand of DNA. And it most certainly didn’t randomly happen through unguided mutation.

I think Mal has asked some valid questions. But obviously not one that will get any funding!

• Eugene on May 4, 2016 at 2:18 pm

Well, the first DNA-to-RNA stage is actually called transcription; the second, from RNA to protein, is translation. From what I read, the information volume of the human genome in bytes is about 800 Mb, i.e. the size of a compact disk. What is puzzling, though, is that this is not all the information necessary for building a body. A lot of body reconstruction is orchestrated at runtime (e.g. splicing). So DNA is like an index. There’s a lot more to biology than just the DNA/RNA/protein transformation, although that is essential of course. And, remarkably, it does not reduce to chemistry, as the new discipline of biosemiotics points out.

• Rodney on October 15, 2016 at 3:18 am

My thoughts exactly! I’m searching the web looking for the answer to this and while I haven’t found it its great to see someone else asking the exact same question. And that’s what I thought too – its 150 Zb of the same 1.5Gb so really just 1.5 Gb of information storage.

The fact that I can think these thoughts even the fact that people can build internets is ultimately contained in our DNA – I just can’t believe it all fits in 1.5 Gb

• Torres on May 29, 2017 at 6:43 am

Excellent contribution.

• Akkash on June 2, 2017 at 9:36 pm

Super information

6. Mike Levens on February 8, 2014 at 5:15 pm

All of this is a good discussion of the amount of data in the human genome, but it’s not really the same as discussing the amount of *information.* Storing an arbitrary 150-zettabyte value as a human would require the possibility of having human beings who have a completely different genome in each and every cell, which is obviously crazy. So how much redundancy is there, and what is the Shannon entropy of the data being stored?

Suppose my sci-fi future self wanted to write a compression algorithm for storing and transmitting a particular human genotype, say for molecular reassembly and cloning at a remote location or something. What data compression ratio could I achieve in theory? How big would these 150 zettabytes look if LZW’d?

7. chrismckay on September 27, 2012 at 2:00 am

Your calculations are based on the chromosome, I pose the question would 150 Zettabytes be enough to store the human body as a binary form and then be able to reintegrate it?

• smartmoves on July 5, 2013 at 2:52 am

Give or take an order of magnitude, a 1Ghz async network would take 1,750 years to transfer that much data. I guess “beam me up Scotty” is out to lunch for a while…

• Torres on May 29, 2017 at 6:46 am

Very smart point.

8. DougB on April 11, 2012 at 5:48 pm

I have to say much credit is due for even attempting the calculation. How about the data storage capacity of cellular mitochondria? It seems this also would add to the total. I am curious to know how your friend responded to the storage capacity of DNA.

9. 1857 on March 23, 2012 at 8:34 am

hi, this is leili, i would thank you if you add more of the latest seminars and congresses in biology, please…

10. Yevgeniy Grigoryev on March 21, 2012 at 4:45 pm

So my knowledge of how computers work is rather limited, but I think processing power is a good comparison. While the genetic data stored can be compared to RAM (random access memory, a form of computer data storage), the amount of data expressed can be compared to processing or computing power. The epigenetic machinery and promoters can be compared to computer processors that handle massive calculations. Alternatively, they can probably also be compared to the CPU’s electronic clock, which creates a series of electrical pulses at regular intervals. This allows the computer to synchronize all its components and determines the speed at which the computer can pull data from its memory and perform calculations. In a cell that would equate to all the active transcriptional states at any given time.

11. Emily Crow on March 20, 2012 at 8:44 pm

What do you think epigenetics would equate to in computer terms – processing power? Could different promoters be thought of as the amount of operations that can be carried out simultaneously??

12. Balaclava on March 20, 2012 at 4:28 pm

You are completely ignoring the fact that information in DNA is not only present in the base sequences but also (and maybe more so!) in its methylation profile. So when you are trying to represent a single specific genome in the form of bytes you have to take that into consideration.

Furthermore, when you are calculating the amount of data stored in the entire human body you have to take into account that not all promoters are in the same state in every cell (that’s why every cell is different..!)

So how would you represent acetylations and methylations?

• Yevgeniy Grigoryev on March 20, 2012 at 8:16 pm

Dear Balaclava,

Your observation about the methylation and acetylation profile is very valid; such epigenetic factors make the coding capacity of our genome almost infinite! However, what I tried to calculate was merely the data “stored”, not expressed, by the human genome. My calculations are oversimplified, of course. I doubt that there is currently a way to calculate the amount of data “expressed” by the genome that also factors in all the genome-wide epigenetic modification events such as methylation and acetylation.

You are also absolutely right about the varying promoter states across different cells. However, promoter expression does not affect the data stored. I think it would be virtually impossible to calculate the data actually expressed at any moment, taking into account all the possible promoter states and epigenetic events. Of course, all attempts at such daring tasks are more than welcome here; that is the point of this post, after all.