Re: Building the Meta-data Diff Engine

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Thu, 21 Oct 2004 20:53:37 -0400
Message-ID: <nml9lc.1ds.ln_at_mercury.downsfam.net>


Laconic2 wrote:

> What follows in more skunk works. You have been warned!
>
> On two of the contracts I worked, one of the things I was called upon to
> work with was a pair of databases that had diverged, because different DDL
> had been applied to the two.
>

In my case it was 75 and gaining 2 per month at least. The databases were on two platforms (74 on MS SQL Server, 1 on DB/2 on AS/400). Programmers ran this shop, and there were no controls, so they would go connect to a site and pretty much do whatever they wanted. Then they'd need Superman to come fix it and guess who that was.

>
> By the time I hit the second case, I was ready with a more automatic
> approach. I never would have been able to get the DBDs in the same room
> anyway. They were on opposite sides of the pond. And they spoke two
> different languages, which each of them insisted was called "English".
>
> So I built a meta-database.

Yup.

<SNIP>
>
> Once I had completed the loading, I had a database where I could
> construct some easy queries that would answer questions like these:
>
> Find all the tables that are in one database but not both.
> Find all the tables that are in both databases, but have a different
> number of columns.
> Find all the columns that are in only one database, but not both, and are
> in tables that are in both databases.
> Find all the columns that are in both databases, but are based on
> different domains in the two databases.
> Find all the columns that are in both databases, are based on the same
> domain, but have a different datatype, precision, or scale in the two
> databases.
>
> I could have done the same for constraints, but it would have been a lot
> more work.

I did it for constraints (unique, null, and and ri) and it is a lot more work. I used to tell them it was AI-complete but they didn't know what I was talking about. Throw in null rules, defaults, view definitions, stored procedures, UDFs, two platforms, and it gets really fun. We also had data-driven cascading trigger definitions.

That experience took about a year, and I would have to tell all of it or none of it, so it will be none. But the experience of having written that tool and seeing it successfully deployed was what convinced me that all of its code-handling features were non-optimal, and brought me to the conclusion that the "system of record" had to be totally scalar data, nothing less, no compromises.

The reason is simple. The tool did two things, it generated code from data and it loaded manually developed items (like stored procedures and views) provided by the developer. Everything always worked, in that it might generate 30,000 commands in an upgrade and every one of them executed successfully, but the data-driven stuff always turned out to be *right*, and the code-delivery stuff was only as right as the programmer was careful, and you know how that goes.

My objection to constraints coded in SQL comes directly from this experience, it is AI-complete (slight exaggeration) to parse them for diffing, but diffing data is as easy as 1-2-3.

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me
Received on Fri Oct 22 2004 - 02:53:37 CEST

Original text of this message