Re: Building the Meta-data Diff Engine

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Sat, 23 Oct 2004 21:47:24 -0400
Message-ID: <dj1flc.k8d.ln_at_mercury.downsfam.net>


Laconic2 wrote:

<snip>

A generalized diff/install engine follows this algorithm, which I have implemented in outline in Java, though much flesh remains to be added. As a legal point, I should add that I have never coded this algorithm for anybody except my current employer, who, as I have said, is pleasantly generous in allowing it to be exposed.

I. Setup. Create throwaway working tables. Hardcoded. Create several
   sets of identical tables, distinguished by set names, like the "N"
   set, the "D" set and so forth. Explained below.

II. Specification Load

  1. Load the "H" set tables with a description of the working tables, so that they now describe themselves. This is the second of the two hardcoded areas.
  2. Load the "N" tables with the "N"ew information from the external source. Right now I type these into a spreadsheet and save them as CSV files.
  3. Load the "S" tables with the content of the current reference spec, if it exists. Along the way, drop entries that would conflict with the "H" set.
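As a minimal sketch of loading the "N" set from a spreadsheet's CSV export, the following might do; the file layout (first field as key, naive comma split) and the class name are guesses for illustration, not the actual format:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of step II.2: load "N" set rows from CSV text.
// Assumes one row per line with the key in the first field;
// no quoting or escaping is handled in this toy version.
public class CsvSpecLoader {

    public static Map<String, String[]> load(String csv) {
        Map<String, String[]> rows = new LinkedHashMap<>();
        for (String line : csv.split("\n")) {
            if (line.isBlank()) continue;       // skip empty lines
            String[] fields = line.split(",");  // naive split: no quoting
            rows.put(fields[0], fields);        // first field is the key
        }
        return rows;
    }
}
```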

III. Specification Resolution

  1. UNION ALL the "H"ardcoded spec with the "N"ew spec.
  2. Overlay the "S" spec with the combined "HN" spec, so that on any matching key in any table the "HN" spec wins, overwriting the older spec. The result goes into the "C" set, the complete set.
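An in-memory sketch of the resolution step, with each table's rows reduced to key-to-definition strings; the class name and the idea of using map overlay for the "HN wins" rule are mine, not the author's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of step III: union the "H" and "N" specs, then overlay the
// result onto the "S" spec so that "HN" wins on any matching key.
// Real sets would hold full meta-data rows, not single strings.
public class SpecResolver {

    public static Map<String, String> resolve(Map<String, String> h,
                                              Map<String, String> n,
                                              Map<String, String> s) {
        // Start from the older "S" spec...
        Map<String, String> c = new LinkedHashMap<>(s);
        // ...then overlay H and N: on a key collision, "HN" overwrites "S".
        c.putAll(h);   // the "H"ardcoded spec ...
        c.putAll(n);   // ... unioned with the "N"ew spec, applied on top
        return c;      // the "C"omplete set
    }
}
```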

IV. Specification Validation. Ensure the "C" spec can be built.

V. Specification Expand. Take the user-entered information and expand
   it to secondary tables of derived information.

VI. RealityGet. Query the server's catalog to get the current state of
    affairs and put it into the "R"eality set.

VII. Compare. FULL JOIN all tables in set "C" against the "R" set to
     get the "D"ifferences set.
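The FULL JOIN can be emulated in memory over the union of both key sets; this sketch only tags which side each key appeared on, leaving the column-level comparison out, and its names are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch of step VII: a FULL JOIN of the "C"omplete spec against the
// "R"eality set, emulated with maps. Every key present on either side
// lands in the "D"ifferences set, tagged by where it was found.
public class SpecComparer {

    // Which side(s) of the full join a key appeared on.
    public enum Side { C_ONLY, R_ONLY, BOTH }

    public static Map<String, Side> fullJoin(Map<String, String> c,
                                             Map<String, String> r) {
        Set<String> keys = new LinkedHashSet<>(c.keySet());
        keys.addAll(r.keySet());                          // union of key sets
        Map<String, Side> d = new LinkedHashMap<>();
        for (String k : keys) {
            if (!r.containsKey(k))      d.put(k, Side.C_ONLY); // spec only
            else if (!c.containsKey(k)) d.put(k, Side.R_ONLY); // server only
            else                        d.put(k, Side.BOTH);   // both sides
        }
        return d;                                         // "D"ifferences set
    }
}
```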

VIII. Analyze. Mark the items in the "D"ifferences set with an
      instruction code. Some of these are obvious: a table in the "C"
      set that is not in the "R" set gets code "N" for new, and it
      will be built from scratch. Making the problem less than
      AI-complete depends entirely upon what you allow to enter this
      stage. Extremely crucial decisions have to be made to avoid this
      becoming spaghetti queries.
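A sketch of the code assignment for the obvious cases; only the "N" code comes from the text above, and the other codes here ("X", "A", "-") are invented placeholders:

```java
// Sketch of step VIII: mark a row of the "D"ifferences set with an
// instruction code. "N" (in "C" but not "R": build from scratch) is
// from the post; "X", "A" and "-" are hypothetical stand-ins.
public class DiffAnalyzer {

    public static String instructionFor(boolean inSpec, boolean inReality,
                                        boolean definitionsMatch) {
        if (inSpec && !inReality) return "N";  // new: create from scratch
        if (!inSpec && inReality) return "X";  // hypothetical: only on server
        if (!definitionsMatch)    return "A";  // hypothetical: alter in place
        return "-";                            // no action required
    }
}
```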

IX.  Build Plan.  Generate the DDL out of the action codes found in
     the "D" set of tables.

X. Execute the DDL.
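Steps IX and X might then reduce to walking the coded "D" rows and emitting DDL; this sketch covers only the "N" case, and the column list and method names are illustrative, not the author's templates:

```java
import java.util.Map;

// Sketch of step IX: turn an action code from the "D" set into DDL.
// Only "N" (new table) is handled; alters and drops are omitted.
public class DdlPlanner {

    public static String ddlFor(String table, String code,
                                Map<String, String> columns) {
        if (!"N".equals(code)) {
            return "";  // other codes left out of this sketch
        }
        StringBuilder sb = new StringBuilder("CREATE TABLE " + table + " (");
        boolean first = true;
        for (Map.Entry<String, String> col : columns.entrySet()) {
            if (!first) sb.append(", ");
            sb.append(col.getKey()).append(' ').append(col.getValue());
            first = false;
        }
        return sb.append(")").toString();
    }
}
```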

Cases:

  1. Database does not exist, external spec is Complete MegaProduct Version 2.0 spec. This is an INSTALL.
  2. Database exists at version 1.2, external spec is a Complete MegaProduct Version 2.0 spec. This is an UPGRADE.
  3. Database exists at version 1.2, external spec contains a handful of new tables, some formula changes, etc. This is a PATCH or SERVICE PACK.

The "discovery" as it were of this algorithm led me to some very unorthodox conclusions, including the fact that I want to store all biz rules in SCALAR DATA (else how do I validate in step IV?), and that a declarative constraint system is of NO PRACTICAL USE to me (how could I possibly get past step VII?).

From there I reasoned as follows. If I have meta-data entirely in scalar data, arranged in a relational model (I prefer the RDM), then there are only a finite number of primitives I can use in building a database, and therefore the code can be entirely generated from templates, with complex cases built up by COMPOSITION. It just might, maybe, be possible after all these years to actually build complex operational databases entirely from scalar specifications, with no ad-hoc programming.
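As a sketch of that idea, a DDL primitive can be a template over scalar values, with composition feeding one filled template into another; the "{name}" placeholder syntax and both templates are invented for illustration:

```java
import java.util.Map;

// Sketch of template-driven generation: each primitive is a template
// filled from scalar data, and larger fragments are COMPOSED by
// feeding one filled template into another as a plain value.
public class DdlTemplates {

    // Replace each {key} placeholder with its value.
    public static String fill(String template, Map<String, String> values) {
        String out = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            out = out.replace("{" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    // Composition: a filled column primitive becomes input to the
    // table primitive.
    public static String createTable(String table, String columnSql) {
        return fill("CREATE TABLE {table} ({columns})",
                    Map.of("table", table, "columns", columnSql));
    }
}
```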

This is Ken's World.

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me
Received on Sun Oct 24 2004 - 03:47:24 CEST
