Nearly four months ago, I started working at the Wellcome Trust Sanger Institute. To say it’s a sweet gig would be something of an understatement! I can only think of a handful of cooler places (ESA, NASA, CERN), and mostly by virtue of my understanding physics better than I do biology. Indeed, I’m not a genomicist, or even a bioinformatician; I’m just a run-of-the-mill software engineer (grunt). To be honest, with my barely-layman’s knowledge of genetics, being surrounded by alphageeks is a bit intimidating (think Stuart from The Big Bang Theory), but that’s entirely my problem.
Anyway, since joining, I’ve had a question about genomics that’s been bugging me. I think I’ve figured it out… (I probably should have asked my colleagues from the start, but in retrospect — presuming my understanding is correct — it’s a bit of a stupid question!)
If everyone’s DNA is different, how can there be a single human genome?
The clue to this is largely in the name: genomics is about genes, not nucleotides. DNA is a very long string of nucleotide pairs (the A, C, G and Ts you’re probably familiar with), but certain substrings of basepairs mark where genes begin and end. To analogise, written sentences begin with a capital letter and end with a full stop; likewise genes begin with a particular sequence and end with another. Genes are what define species and that’s what genomics is all about: determining the manifest of genes for a species, what they do and how they interact. So every human has the same set of genes, but the “parameters” of those genes will be different. To simplify, if there were a single gene for eye colour, every person would have it, but some would have the blue version (allele), while others would have brown, etc.
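To make the "capital letter and full stop" idea concrete, here is a toy sketch in Python. It assumes the textbook simplification that a protein-coding stretch starts at the codon ATG and runs, three bases at a time, to the first of the stop codons TAA, TAG or TGA. Real genes are far messier than this, and the function name and example strings are mine, purely for illustration.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_open_reading_frames(dna):
    """Return every stretch that runs from a start codon to the next
    in-frame stop codon, scanning the three forward reading frames.

    Toy model only: it ignores the reverse strand, introns and
    everything else that makes real gene finding hard.
    """
    dna = dna.upper()
    found = []
    for frame in range(3):
        start = None  # index of the start codon we are currently inside
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                found.append(dna[start:i + 3])
                start = None
    return found


if __name__ == "__main__":
    # Two hypothetical "people": same delimiters, one letter different inside.
    person_a = "CCATGAAATGACC"
    person_b = "CCATGAGATGACC"
    print(find_open_reading_frames(person_a))
    print(find_open_reading_frames(person_b))
```

Both made-up strings contain a stretch in the same place with the same start and stop markers, but the letters in between differ, which is the "same gene, different allele" picture in miniature.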
To continue with my literary analogy, think of a genome for a particular species as a constrained poem, like a sonnet or limerick. The sentences (genes) within the poem must have the same meter, syllable count and rhyming scheme, but the actual words can be different. Thus we can have a multitude of poems with the same structure, but with vastly different content. Such is the genome to DNA.
I actually have another, related query, which is probably more of a thought experiment than an answerable question…
The DNA molecule is a polymer, like plastic or starch. Indeed, the two backbones that hold the nucleotides in its quintessential double helix are chains of alternating sugar and phosphate groups; the sugar is deoxyribose, hence the name deoxyribonucleic acid. In principle, polymers can be arbitrarily long (the human genome is over three billion basepairs), but are there non-chemical constraints on its length? For example, is there a point where it becomes too long to hold itself together, or where biochemical processes become too inefficient to be useful?
I ask because, as we are operating over an alphabet of just four symbols, any theoretical maximum genome size \(N\) would give us an upper bound on all varieties of DNA-based life of roughly \(4^N\). (Strictly, \(4^N\) counts sequences of exactly \(N\) bases; allowing every shorter length as well raises the total to \((4^{N+1}-4)/3\), which is the same order of magnitude.) Granted, this upper bound would far exceed the number of atoms in the universe, but presumably vast swathes of DNA don’t translate to viable lifeforms.
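To get a feel for the size of that bound, here is a rough back-of-the-envelope sketch in Python. It assumes the commonly quoted estimate of about \(10^{80}\) atoms in the observable universe, and it uses the human genome's three billion basepairs as a stand-in for \(N\); both are just illustrative choices, as are the function names.

```python
import math

ALPHABET_SIZE = 4  # A, C, G, T


def count_exact_length(n):
    """Number of distinct sequences of exactly n bases: 4**n."""
    return ALPHABET_SIZE ** n


def count_up_to_length(n):
    """Number of distinct sequences of 1 to n bases.

    Closed form of the geometric series 4 + 4**2 + ... + 4**n.
    """
    return (ALPHABET_SIZE ** (n + 1) - ALPHABET_SIZE) // (ALPHABET_SIZE - 1)


def decimal_digits_of_count(n):
    """How many decimal digits 4**n has, without building the number.

    Uses logarithms, so it works for n in the billions.
    """
    return math.floor(n * math.log10(ALPHABET_SIZE)) + 1


def shortest_length_exceeding(threshold):
    """Smallest n for which 4**n is strictly greater than threshold."""
    n = 1
    while ALPHABET_SIZE ** n <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    atoms_in_universe = 10 ** 80  # assumed, commonly quoted rough estimate
    human_genome_length = 3_000_000_000  # assumed stand-in for N

    print(shortest_length_exceeding(atoms_in_universe))
    print(decimal_digits_of_count(human_genome_length))
```

Under those assumptions, a sequence only needs 133 bases before the number of possibilities overtakes the atom count, and for three billion bases \(4^N\) is a number with roughly 1.8 billion decimal digits. Huge, but still finite.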
My point is that DNA-based life therefore can’t be infinite in variety.