Article in Sci-Am on organized systems and genetics

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Mon, 04 Oct 2004 17:25:42 -0400
Message-ID: <n4fsjc.fvs.ln_at_mercury.downsfam.net>



Disclaimer: this post is pure for-the-fun-of-it speculation. It does relate to databases, but in a roundabout way. You have been warned.

The October 2004 issue of Scientific American has a fascinating article on genetics. It seems that current conventional wisdom is that 98.5% of human genetic material is of unknown purpose, and has therefore been termed "junk." Apparently only 1.5% of a gene is composed of useful sequences that combine to form proteins, but that 1.5% is interspersed with sequences that seem to serve no purpose, and which are edited out during transcription.

This mystery is then linked to another mystery, which is that organism complexity does not seem to correspond to the count of proteins coded for in the animal's genes. They give the example of a microscopic nematode that has only 1000 cells yet has genes that code for 19,000 proteins. That is actually more than insects, which have genes for only 13,500 proteins, and not far short of humans, who have genes for 25,000 proteins.

The article then makes a very interesting claim. The authors point out that there is a strong correspondence between organizational complexity and the amount of "junk" DNA the animal possesses. From here they make the leap to the hypothesis that the so-called "junk" DNA is actually organizational information, additional information that controls when and how the proteins are coded. Suddenly 25,000 proteins can do the work of millions, depending on how the organizational DNA allows them to be used.

So I'm thinking that this organizational DNA is like meta-data. But more on that in a minute.

The article has this nifty quote: "Indeed, these results suggest a general rule with relevance beyond biology: organized complexity is a function of regulatory information -- and in virtually all systems...explosions in complexity occur as a result of advanced controls and embedded networking."

So a couple of things struck me. The first is the idea that you only need so many basic ingredients to make every animal seen on Earth. This maps, with many disclaimers about analogies, to a basic claim of my own, which is that a finite number of database design patterns can be used to usefully store any data that needs to be stored. You do not need to keep inventing new table layouts; you just have to learn to recognize how to use the ones we have.

Next is the idea that organizational complexity is not about how many primitives you have, but how many ways you can usefully combine them. That requires a scheme to organize and manage the meta-data/orgDNA.
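To make the database side of the analogy a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the four "primitives", the table names, and the helper functions are all made up for this example and come from neither the article nor any particular tool. The point is just that the primitives stay fixed while the meta-data that combines them is what grows.

```python
import sqlite3

# Hypothetical primitives: a small, fixed set of column patterns.
# (These names are assumptions made up for illustration.)
PRIMITIVES = {
    "key": "INTEGER PRIMARY KEY",
    "name": "TEXT NOT NULL",
    "amount": "NUMERIC",
    "date": "TEXT",
}

# Hypothetical meta-data: each table is just a list of
# (column_name, primitive) pairs. New tables add meta-data,
# not new primitives.
METADATA = {
    "customers": [("customer_id", "key"), ("customer_name", "name")],
    "invoices": [
        ("invoice_id", "key"),
        ("invoice_date", "date"),
        ("total", "amount"),
    ],
}


def build_ddl(table, columns):
    """Turn one meta-data entry into a CREATE TABLE statement."""
    cols = ", ".join(f"{col} {PRIMITIVES[kind]}" for col, kind in columns)
    return f"CREATE TABLE {table} ({cols})"


def build_database(metadata):
    """Create an in-memory SQLite database from the meta-data."""
    conn = sqlite3.connect(":memory:")
    for table, columns in metadata.items():
        conn.execute(build_ddl(table, columns))
    return conn
```

Adding a third or a thirtieth table here means adding entries to METADATA only; PRIMITIVES and the two functions do not change.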

I'll bet anyone a dollar that if the authors' hypothesis turns out to be correct, they will further discover that the junk/organizational DNA is composed of only a small number of primitives that combine in lots of ways. This is very important because it means that the organizational DNA is so voluminous not because it is "ad-hoc code", but because it is a very large amount of well-organized "meta-data."

OK, so maybe I had too much sun on my vacation last week, but there is definitely something in that basic idea, that complexity arises not from the count of primitives, but from the meta-technology of managing primitives. This appeals to me because meta-data appeals to me, but that's enough rambling for now.

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me
Received on Mon Oct 04 2004 - 23:25:42 CEST
