Building the Meta-data Diff Engine
Date: Thu, 21 Oct 2004 11:43:19 -0400
Message-ID: <L9OdnSkAE828ROrcRVn-hg_at_comcast.com>
What follows in more skunk works. You have been warned!
On two of the contracts I worked, one of the things I was called upon to work with was a pair of databases that had diverged, because different DDL had been applied to the two.
I approached the first one by moderating a series of meetings between the two people who controlled the DDL, each on one database. I detected the differences pretty much by hand.
The meta-database contains user tables that are the reflection of the system tables in each of the databases. I only selected a subset of the data in the system tables. There were some tables and columns I didn't need. I had to add one column to each table in the meta-database, called DATABASE_NAME. This is completely unnecessary metadata in the original databases, due to what I'll call the "Tigger principle": ("The most wonderful thing is, I'm the only one!") But, without it, we've got the cat food problem all over again.
Then I loaded the meta-database from the respective databases with an INSERT...SELECT construct. The insert was into the user tables of my meta-database, while the select was from the system tables of one of the databases. The SELECT also has one extra item, a literal name of the databases, like this:
, 'EAST_DB' as DATABASE NAME
This means I needed two separate loader scripts, one for each database, that differed by two trivial details, the connect statement, and the literal database name. Big deal.
This INSERT...SELECT construct is what make me call this approach "skunk works". It violates the "thou shalt not conflate meta-data with data" dictum all over the place. I use skunk works whenever I need to.
I could have done the same for constraints, but it would have been a lot more work.
Among thousands of columns, I found one that was based on a floating point number on one side of the pond, and on an integer on the other side of the pond. And, it turns out, this was the answer to a problem that had plagued the client for months! But that was a byproduct of the effort. The main product was to produce a set of columns where the negotiation could proceed between the two sites as to whether the column should be removed from one site, or added to the other.
Tables that were only at one site were less of an issue, for reasons I can go into later.
All of the above was less than one days' work, from "Eureka!" to presented results. When they asked me how the heck I did it, I said, "I used a computer". The client was not amused.
This topic is tangentially related to a couple of topics started recently by Kenneth. Received on Thu Oct 21 2004 - 17:43:19 CEST