Re: Sourcing Metadata for Database Independence

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Wed, 11 Aug 2004 16:50:51 -0400
Message-ID: <cr0efc.n43.ln_at_mercury.downsfam.net>


Dawn M. Wolthuis wrote:

Dawn, hi! I have been lurking here and there on this forum for a year or so, and noticed you tend to start some interesting conversations. Let me see if I can make a stab at this:

> For database independent applications, such as those written by many
> application software development companies, the most basic of metadata,
> such as the names of attributes, must be sourced so it is useful with any
> target
> database implementation.

Not to start out too contentiously, but I would dispute that the names of attributes are "basic metadata." Perhaps the word "simple," meaning easy to discuss? I would say names are not basic, or primary, because the name of something is not useful to me if I am writing some kind of rules engine. Rather, I need more primary information, such as table and column definitions, constraint definitions and so forth.

> There are many different strategies for where to
> source such metadata (as well as how extensive this metadata should be
> since all code is, itself, metadata).

Is code metadata? I would say no. Since metadata is data, it ought to look like data, meaning I ought to be able to normalize it and store it in tables. Tables are sets of like items; code is a series of instructions executed in sequence; they are simply not the same thing.

But if I store a code fragment in a blob column, is it suddenly data? No, it is a computer program that is being stored in a table instead of being stored in a file on disk. It will still act like a program, meaning that it has no usefulness unless it is given at the appropriate time to another program that is going to interpret and execute it.

Even if I rule out the use of complex statements and decide that I will only allow expressions, I must still parse those expressions in order for them to acquire or produce meaningful results, so they are not atomic data.

So, (catching breath), it may be that the question of where to put this data is not so important as what this data is. What do you store as metadata? Every database designer will tell you (and we can read here in this forum at great length) that a poor database design is a millstone around your neck, while a strong database design pays dividends many times over. So it is with a metadata store, since it is just another data store.

Now finally, if I know my metadata is just normalized data, and I do not allow non-atomic values, then the question of where to store it evaporates. Store it where it is needed and where it will be useful. Outside of a system it can be used to build systems and can be diff'd against systems. It could be XML, comma-delimited or papyrus, so long as you can load it into the internal representation that the engine uses. Also, inside of a database it serves as a vendor-independent data dictionary.
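
Just to make that concrete, here is a rough sketch in Python of metadata as nothing but normalized rows. The file layout and the names are invented for the example, not anything standard:

import csv, io

# A made-up fragment of metadata: nothing but normalized rows.
# In real life it could come from a file, a table, or papyrus.
raw = """table,column,type,size
customers,customer_id,int,
customers,name,char,40
orders,order_id,int,
orders,customer_id,int,
orders,order_date,date,
"""

columns = list(csv.DictReader(io.StringIO(raw)))

# Because it is just data, ordinary operations apply: list every
# column defined for the 'orders' table.
for col in columns:
    if col["table"] == "orders":
        print(col["column"], col["type"])

Nothing up there needs to be parsed or executed; it is just rows you can query, diff, or load anywhere.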

>
> It could be sourced in code, thereby fixing on a particular language while
> remaining database-independent.

Well, I think I blustered at great length about this above :) Code depreciates over time. Data appreciates. IMHO this is simply because data is easier to deal with.

> It could be sourced in the metadata
> repository of a development database from which the product is built for
> any
> environment. It could be sourced in XML documents (or any type of parm
> file) that serve as input for the code and for the database processes.

Aha! Exactly. In my own system, I have what I call the 'fundamental diagram' of using metadata, which is this:

                                              +-------> App Code
       Metadata    -------->  Builder   ------+
                              (rules engine)  +-------> Database

The format of the metadata is of no great importance; as you say, it can be any kind of parm file. It is fed into an engine (my own preference is a builder that builds databases and generates code).
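
A toy version of that builder, again in Python with invented names, just to show the shape of the thing; a real engine would of course emit much more (indexes, RI, app code):

# Toy builder: metadata in, CREATE TABLE statements out.  The same
# walk over the metadata could just as well emit app code.
metadata = {
    "customers": {
        "customer_id": "INTEGER NOT NULL",
        "name":        "VARCHAR(40)",
    },
    "orders": {
        "order_id":    "INTEGER NOT NULL",
        "customer_id": "INTEGER NOT NULL",
        "order_date":  "DATE",
    },
}

def build_ddl(metadata):
    for table, cols in metadata.items():
        body = ",\n  ".join(f"{name} {decl}" for name, decl in cols.items())
        yield f"CREATE TABLE {table} (\n  {body}\n);"

for stmt in build_ddl(metadata):
    print(stmt)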

The act of fetching a package of metadata from a distribution site and running it through the engine is an upgrade.
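
In sketch form an upgrade is little more than a diff; here the "live" schema is faked as a dict, where a real engine would read it from the catalog:

# Upgrade sketch: diff the desired metadata against what the live
# database already has, and emit only the missing pieces.
desired = {
    "orders": {"order_id": "INTEGER", "customer_id": "INTEGER",
               "ship_date": "DATE"},
}
live = {
    "orders": {"order_id": "INTEGER", "customer_id": "INTEGER"},
}

for table, cols in desired.items():
    have = live.get(table, {})
    for name, decl in cols.items():
        if name not in have:
            print(f"ALTER TABLE {table} ADD COLUMN {name} {decl};")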

> It
> could be sourced as data in a database (rather than simply as metadata).
> This could be an embedded database in a metadata service.

Exactly! In my own system I call this the "reference data dictionary." It is a vendor-independent set of tables that makes the database self-describing.
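
Something like the following, though the table names here are made up for the example and are not the ones I actually use:

import sqlite3

# Sketch of a reference data dictionary: the metadata itself lives in
# ordinary tables inside the database, so the database describes
# itself in a vendor-independent way.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dd_tables  (table_name TEXT PRIMARY KEY, description TEXT);
CREATE TABLE dd_columns (table_name TEXT, column_name TEXT,
                         type TEXT, size INTEGER);
""")
db.executemany("INSERT INTO dd_tables VALUES (?, ?)",
               [("customers", "people who buy things"),
                ("orders",    "things they bought")])
db.executemany("INSERT INTO dd_columns VALUES (?, ?, ?, ?)",
               [("customers", "customer_id", "int",  None),
                ("customers", "name",        "char", 40),
                ("orders",    "order_id",    "int",  None)])

# Any tool, on any vendor, can now ask the database about itself.
for row in db.execute("SELECT table_name, column_name, type "
                      "FROM dd_columns ORDER BY table_name"):
    print(row)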

>
> If you figure that the specification of a data type for a given attribute
> is a business rule, of sorts, you could have a business rules repository
> that
> is the source of all metadata.

And again I ask, what is in that repository? You could make the choice that the repository must be *complete*, and define *complete* to mean that it contains enough information to build a database from scratch (generate create table commands, declarative RI, triggers if you choose to use them) and also to generate app code to work with the database (or at least some .net XML web services :) )
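
For instance, a crude completeness check might look like this; the layout and the handful of rules checked are purely illustrative:

# The repository is "complete" only if it holds enough to build the
# database from scratch: every table gets a key, every FK resolves.
repository = {
    "customers": {"columns": {"customer_id": "int", "name": "char(40)"},
                  "primary_key": ["customer_id"],
                  "foreign_keys": {}},
    "orders":    {"columns": {"order_id": "int", "customer_id": "int"},
                  "primary_key": ["order_id"],
                  "foreign_keys": {"customer_id": "customers"}},
}

def check_complete(repo):
    problems = []
    for table, spec in repo.items():
        if not spec.get("primary_key"):
            problems.append(f"{table}: no primary key")
        for col, parent in spec.get("foreign_keys", {}).items():
            if parent not in repo:
                problems.append(f"{table}.{col}: references unknown table {parent}")
            if col not in spec["columns"]:
                problems.append(f"{table}.{col}: FK column not defined")
    return problems

print(check_complete(repository) or "complete enough to build from scratch")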

> You are then tied to a particular rules
> engine (which might then tie you to a language or database too) even if it
> is written in-house.

Ah, but what sweet bondage it is. Let's say that in some hypothetical universe you could state with some certainty that a particular job involves a lot of tables and a lot of code, that the estimate has been fixed at 10,000 hours, and that ten people will do it in six months (of course we laugh at managers who think this way, but that is how these decisions are made). The manager who takes this route will end up with a late project that doesn't quite work right.

You could instead commit 1000 hours to the engine/generator (put that one unusual programmer on it, the guy who gets things done when left alone), and devote another 4000 hours to having non-programmers spec out the metadata, and feed their specs into the engine and try zillions of combinations until they get what they like.

The manager who believes that it will take 10,000 hours is tied forever to the results of his decision: a crawling horror that must be maintained until Judgement Day. Worse, the next job he does will be done the same way, and he will put millstones around his neck one by one. The toolmaker on the other hand, is tied only to a flexible tool that he can re-use on the next job.

I'd rather be tied to the tool :)

> I know there are some not-very-widely-used standards
> for metadata repositories -- are there industry standards for rules
> specifications other than SQL? There are also third party metadata
> repositories.
>
> I'm thinking more about the future than about what are the currently most
> accepted practices. If you are not tied to a specific language or
> database up front and are developing a new software application to be
> deployed at many customer sites on many different databases, where/how
> would you source
> the metadata? Thanks in advance for your thoughts on this..

Putting it all together, and since you ask, I would first define my metadata. What will it store, and to what purpose? (which is really the same question).

To answer this question, we begin with the following, which I will state as an axiom (neatly sidestepping the need to provide proof):

  1. All business rules resolve to database specifications.

A simple statement, but packed with implications. It means that the main goal of all analysis for this project will be to define what goes where, when, and by whom.

  2. Any business information can be modeled with a finite number of techniques.

Another simple statement, but most coders do not know this. Examine every user-supplied business rule with an aim to cast it into a form that has a routine solution. Eye the "unique" cases with extreme skepticism.
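
To illustrate, here is the sort of thing I mean, with a made-up set of pattern names; the only point is that the set is finite and every rule must land in it:

# Every user-supplied rule is forced into one of a small, known set
# of patterns, each of which has a routine implementation.
KNOWN_PATTERNS = {"required", "unique", "foreign_key", "range", "default"}

rules = [
    {"table": "orders", "column": "order_date",  "pattern": "required"},
    {"table": "orders", "column": "customer_id",
     "pattern": "foreign_key", "parent": "customers"},
    {"table": "orders", "column": "qty",
     "pattern": "range", "min": 1, "max": 999},
]

for rule in rules:
    if rule["pattern"] not in KNOWN_PATTERNS:
        # this is the smell of a "unique" case that needs a second look
        raise ValueError(f"no routine solution for {rule}")
print("every rule maps to a routine technique")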

Having these two ideas in hand, we now make the startling realization that:

  3. Business rules can be completely specified in a finite body of atomic metadata.

It may seem obvious, but again, most coders honestly believe that you cannot possibly fully specify a project purely in atomic data. This is true from their perspective and their experience, but that does not make it true universally. The key insight is that you must ruthlessly practice the technique of seeing all problems as database layout questions, instead of code questions. Once the proper database design is determined, the code writes itself.

Speaking of the code writing itself, that brings us to the conclusion:

  4. If you have a finite set of techniques for implementing business rules, why not have the computer build the tables and write the code?

In other words, once you've settled on a complete metadata model that can entirely specify the project, why not just let the computer build the app for you? Then you can write lots of apps, not just one.
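
A trivial illustration of the code writing itself, with invented metadata again; a real generator would emit whole maintenance screens, not one-liners:

# Generate a simple insert statement for each table straight from
# the column metadata.
metadata = {
    "customers": ["customer_id", "name"],
    "orders":    ["order_id", "customer_id", "order_date"],
}

def generate_insert_sql(metadata):
    for table, cols in metadata.items():
        placeholders = ", ".join("?" for _ in cols)
        yield (f"INSERT INTO {table} ({', '.join(cols)}) "
               f"VALUES ({placeholders})")

for sql in generate_insert_sql(metadata):
    print(sql)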

This whole approach works great on upgrades too.

This is why data is a portable, appreciating asset and code is a non-portable, depreciating asset. Go back to that example where one manager spent his 10,000 hours having ten programmers write code, while the other had a single programmer write an engine and four other people generate metadata. If the tool-maker wants to add support for a new backend, he only has to re-spend a *small* *fraction* of the original 1000 hours on the tool, enhancing it to handle the quirky syntax of the new backend. After perhaps 200-300 hours he pushes a button and presto! His investment in metadata produces a complete working app on a new back-end. The guy running the salt mines, OTOH, has no hope: he needs thousands of hours and months, and the two versions will *never* be the same.
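
To show why the new backend costs so little, here is the shape of the dialect layer; the backends and type names are just examples:

# Supporting a new backend is mostly a small dialect table.  The
# metadata itself does not change at all.
DIALECTS = {
    "postgresql": {"int": "INTEGER", "char": "VARCHAR", "date": "DATE"},
    "mysql":      {"int": "INT",     "char": "VARCHAR", "date": "DATE"},
    # one more backend = one more small entry here, not a rewrite
}

columns = [("customers", "customer_id", "int",  None),
           ("customers", "name",        "char", 40)]

def column_ddl(backend, col_type, size):
    base = DIALECTS[backend][col_type]
    return f"{base}({size})" if size else base

for backend in DIALECTS:
    print(backend, [column_ddl(backend, t, s) for _, _, t, s in columns])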

> --dawn

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me