Re: Building the Meta-data Diff Engine
Date: Sat, 23 Oct 2004 21:47:24 -0400
Message-ID: <dj1flc.k8d.ln_at_mercury.downsfam.net>
Laconic2 wrote:
<snip>
A generalized diff/install engine follows this algorithm, which I have implemented in Java in outline, though much flesh remains to be added. I should add, as a legal point, that I have never coded this algorithm for anybody except my current employer, who, as I have said, is pleasantly generous in allowing it to be exposed.
I. Setup: create throwaway working tables. Hardcoded. Create
several sets of identical tables, distinguished by set names, such as the "N" set, the "D" set, and so forth. Explained below.
II. Specification Load
- Load the set "H" tables with a description of the working tables, so that they now describe themselves. This is the second of the two hardcoded areas.
- Load the "N" tables with the "N"ew information from the external source. Right now I type these into a spreadsheet and save them as CSV files.
- Load the "S" tables with the content of the current reference spec, if it exists. Along the way, drop entries that would conflict with the "H" set.
III. Specification Resolution
- UNION ALL the "H"ardcoded spec with the "N"ew spec.
- Overlay the HN spec with the "S" spec, so that on any matching key in any table, the "HN" spec wins, overwriting the older spec. Result goes into "C" set, the complete set.
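The overlay precedence in step III can be sketched in a few lines. This is a minimal sketch, assuming each spec set is held as rows keyed by (table, entry); the key layout and sample data are illustrative, not the actual working-table schema:

```python
# Sketch of step III: resolve the complete "C" spec from the
# hardcoded ("H"), new ("N"), and stored ("S") specs.
# On any matching key, the HN entry wins over the older stored spec.

def resolve_spec(h_rows, n_rows, s_rows):
    """UNION ALL of H and N, overlaid onto S; HN wins on key match."""
    complete = dict(s_rows)   # start from the stored reference spec
    complete.update(h_rows)   # "H" overwrites on any matching key
    complete.update(n_rows)   # "N" overwrites on any matching key
    return complete

h = {("tables", "specs"): {"description": "hardcoded working table"}}
s = {("tables", "orders"): {"description": "old stored spec"},
     ("tables", "specs"):  {"description": "stale copy, loses to H"}}
n = {("tables", "orders"):   {"description": "revised in new spec"},
     ("tables", "invoices"): {"description": "brand new table"}}

c = resolve_spec(h, n, s)   # the complete "C" set
```

Note the order: the stored spec is loaded first, and the hardcoded and new specs simply overwrite it on key collisions, which is the "HN wins" rule stated above.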
IV. Specification Validation. Ensure the "C" spec can be built.
V. Specification Expand. Take the user-entered information and
expand to secondary tables of derived information.
VI. RealityGet. Query the server's catalog to get current state
of affairs, put it into the "R"eality set.
VII. Compare. FULL JOIN all tables in set "C" against "R" set to
get the "D"ifferences set.
VIII. Analyze. Mark the items in the "D"ifferences set with an
instruction code. Some of these are obvious: for example, a table in the "C" set that is not in the "R" set gets code "N" for new, and will be built from scratch. Making the problem less than AI-complete depends entirely upon what you allow to enter into this stage. Extremely crucial decisions have to be made here to avoid this becoming spaghetti queries.
IX. Build Plan. Generate the DDL out of the action codes found in the "D" set of tables.
X. Execute the DDL.
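Steps VI through VIII, the heart of the engine, can be sketched with SQLite standing in for the real server catalog. The table layouts and sample data are illustrative assumptions; the FULL JOIN is emulated with two LEFT JOINs UNIONed together, since older SQLite lacks FULL OUTER JOIN. Only code "N" (new) comes from the description above; "A" (alter) and "D" (drop) are assumed codes for the other two outcomes:

```python
# Sketch of steps VI-VIII: FULL JOIN the complete "C" spec against
# the "R"eality set and mark each difference with an action code.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE c_tables (table_name TEXT PRIMARY KEY, col_count INT);
    CREATE TABLE r_tables (table_name TEXT PRIMARY KEY, col_count INT);
    INSERT INTO c_tables VALUES ('orders', 6), ('invoices', 4);
    INSERT INTO r_tables VALUES ('orders', 5), ('legacy', 3);
""")
diff = con.execute("""
    SELECT c.table_name, r.table_name,
           CASE WHEN r.table_name IS NULL THEN 'N'        -- new: build from scratch
                WHEN c.col_count <> r.col_count THEN 'A'  -- assumed code: alter
                ELSE '' END AS action
      FROM c_tables c LEFT JOIN r_tables r
        ON c.table_name = r.table_name
    UNION ALL
    SELECT c.table_name, r.table_name, 'D'                -- assumed code: drop
      FROM r_tables r LEFT JOIN c_tables c
        ON c.table_name = r.table_name
     WHERE c.table_name IS NULL
""").fetchall()
```

The resulting rows are exactly the "D"ifferences set of step VII, already carrying the step VIII action codes that step IX turns into DDL.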
Cases:
- Database does not exist, external spec is Complete MegaProduct Version 2.0 spec. This is an INSTALL.
- Database exists at version 1.2, external spec is a Complete MegaProduct Version 2.0 spec. This is an UPGRADE.
- Database exists at version 1.2, external spec contains a handful of new tables, some formula changes, etc. This is a PATCH or SERVICE PACK.
The "discovery" as it were of this algorithm led me to some very unorthodox conclusions, including the fact that I want to store all biz rules in SCALAR DATA (else how do I validate in step IV?), and that a declarative constraint system is of NO PRACTICAL USE to me (how could I possibly get past step VII?).
From there I figured as follows. If I have meta-data entirely in scalar data, arranged in a relational model (I prefer RDM), then there are only a finite number of primitives I can use in building a database. Therefore the code can be entirely generated from templates, with complex cases built up with COMPOSITION, and it just might, maybe, be possible after all these years to actually build complex operational databases entirely from scalar specifications, with no ad-hoc programming.
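That closing claim, DDL generated from templates by composition over scalar metadata, can be sketched briefly. The metadata layout, template strings, and type names here are assumptions for illustration, not the engine's actual primitives:

```python
# Sketch: compose a CREATE TABLE statement from scalar column rows.
# Each column is one row of metadata; the table is the composition
# of its column fragments into a table template.

COLUMN_TMPL = "{name} {sqltype}{notnull}"
TABLE_TMPL = "CREATE TABLE {table} (\n  {columns}\n);"

def generate_create(table, column_rows):
    """Build DDL purely from scalar metadata, no ad-hoc code."""
    cols = ",\n  ".join(
        COLUMN_TMPL.format(name=c["name"], sqltype=c["type"],
                           notnull=" NOT NULL" if c["required"] else "")
        for c in column_rows)
    return TABLE_TMPL.format(table=table, columns=cols)

ddl = generate_create("invoices", [
    {"name": "invoice_id", "type": "INTEGER", "required": True},
    {"name": "total", "type": "NUMERIC", "required": False},
])
```

Because every column, key, and constraint is just another scalar row, the same handful of templates covers the whole finite set of primitives.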
--
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to email me

Received on Sun Oct 24 2004 - 03:47:24 CEST