Re: Database Builders, Code Generators, On-Topic?

From: Bob Badour <bbadour_at_golden.net>
Date: Mon, 26 May 2003 01:34:47 -0400
Message-ID: <kqiAa.281$od6.72567878_at_mantis.golden.net>


"Kenneth Downs" <ken_remove_underscores_downs_at_downsfam.net> wrote in message news:0gvrab.dhl.ln_at_mercury.downsfam.net...
> Quoting unnamed sources, Bob Badour claimed:
>
> > "Kenneth Downs" <MyUseNetHandle_at_linuxmail.org> wrote in message
> > news:e3arab.kkk.ln_at_mercury.downsfam.net...
> >> Hi folks. I've been lurking here for a few weeks now and have made a
> >> post or two, but up until now have not started a thread.
> >>
> >> There is a topic close to my heart which I do not see discussed here
> >> much,
> >> and I wonder if it is considered topical. Briefly, I spend my days
> >> designing and implementing data-driven systems, wherein we use a
> > collection
> >> of tables that describe tables, columns, keys, references and so forth.
> >
> > This sounds suspiciously like a system catalog.
>
> This may be only semantics, but we reserve the term "system catalog" for
the
> database server's tables or views of the schema.

It's not even semantics. A proper system catalog would necessarily include at least everything mentioned in your earlier post. The system catalog should completely describe all domains and relation variables including all constraints ie. all predicates.

> Our own tables are
> outside of the system catalog.

I suggest this indicates a deficiency in your system catalog.

> This has many advantages, not the least of
> which is that run-time code only needs to know our own schema, instead of
> the various different schemas offered by the platforms we execute against.

Doesn't the security function already take care of this?

> Used in this sense, the purpose of the system catalog is to reveal a
schema
> through DML, while the purpose of our catalog is to store our
specification
> and generate both DDL and DML.

Does your specification not reveal a schema through DML? I think it does. Can one not use a system catalog to generate both DDL and DML? I don't see why not.

> >> We
> >> use this to generate both DDL at install time and DML at run-time, and
> > also
> >> client code (pure under-the-hood, nothing cosmetic).
> >
> > Sounds fair enough. Does it handle schema version upgrades? ie. Does it
> > generate the DDL to update one schema versionb into a new schema? If so,
> > does it do this without data loss?
> >
>
> Upgrades are actually the first reason I developed early versions of this
> system, when I was working independently 7-8 years ago. The flow is:
>
> 1. Our catalog equals "what I want"
> 2. The system catalog reveals "what is"
> 3. Generate a bunch of DDL to turn "what is" into "What I want"
> 4. Optionally Execute. Or stop and report the intended plan for
> review, approval, etc.

In other words, you have two system catalog values: 1. the value after assignment and 2. the value prior to assignment. By comparing the two values, 3. you determine an access path for the assignment, and 4. you either execute the assignment or display the planned access path.

It still looks like a system catalog to me.

> This has the nifty added advantage that a first time installation is the
> same as an upgrade. If "what is" is empty, it builds the entire database.
>
> This should answer your second question also, which is that it can turn
any
> schema into any other schema.
>
> The third question could mean, "does it drop retired columns and tables?"
> Mine does not, we just stop using them.

I expect it to drop retired columns and tables. Otherwise, they simply take up storage and interfere with integrity. For instance, how does the updated schema deal with the fact that inserts no longer reference the retired column while at the same time it still exists in the schema?

> The third question might also mean, "Can you upgrade a production system
> without destroying data?" Since I'm new here I'll figure there is no way
> you would really ask that question and that I do not know what you mean.

Why would I not ask that question? Perhaps, I could have worded it more clearly as: Does it perform the schema evolution without any unecessary or unwanted data loss? Would that sufficiently clarify things for you to answer?

> >> In an effort to give some rigor to our design strategies, I have coined
> > [1]
> >> the term "Scalar Complete" to refer to the insistence that all
meta-data
> > be
> >> composed entirely of scalars, meaning that meta-data must never be an
> >> expression that is evaluated according to the syntax rules of a
language
> >> (the expression "price * qty" may look like passable scalar data, but
it
> > is
> >> not), and must never require parsing.
> >
> > I fail to see how that arbitrary restriction provides rigor. It sounds
> > more like a "Gee, my life as implementer would be easier..." kludge.
>
> The restriction is not arbitrary. If you set out to design a meta-data
> system, then you must make it Scalar Complete.

I disagree. You are making an unwarranted assertion.

> If you break this rule and
> include expressions or something that must be parsed, it breaks down
> quickly.

Again, this is just an assertion.

> When it is not Scalar Complete, it is just another system for
> storing code.

Or just another dbms. The catalog for an rdbms must describe view definitions, type definitions, constraint definitions etc. which are all defined in some language to parse.

Even a regular expression to describe allowed literals for a type requires parsing.

I still do not see how your restriction increases rigor. It sounds like an oversimplified kludge to make an implementer's life easier.

> Perhaps an example will suffice. Let's say you are building a list of
> derived column definitions, something like this:
>
> colname expression Comment
> ======= =============== =================
> dsc_amt price * dsc_pct Discount Amount
> prc_dsc price - dsc_amt Price after discount
> all_amt price * all_pct Allowance Amount
> ...etc...
>
> In this case, notice that you cannot validate the expression with a
> reference to a table of columns, because you have to parse the expression.
> You also cannot tell until run-time if the programmer slipped in a syntax
> error.

Why not? Compilers are great at catching syntax errors.

> A key point here is that we want to use the relational model to
> validate everything, we do not want to reproduce a SQL parser to check the
> validity of expressions.

I agree that choosing SQL would be a bad choice, but I fail to see the point in reinventing the system catalog. Make the system catalog work the way it should in the first place, and nobody has to reinvent anything.

> As for making my life easy, um, guilty. It also makes the system rather
> robust, self-documenting, flexible, cross-platform, and has the potential
> (we have not gone this far yet) to be language-independent. Is this bad?

I disagree that the arbitrary restriction provides any of the features you assert above. How does the arbitrary restriction make anything robust? Isn't an rdbms sufficiently robust in the first place? If not, why not? A system catalog is already self-documenting. How does the arbitrary restriction make the system catalog any more self-documenting?

I fail to see how any restriction can increase flexibility. An arbitrary restriction need not necessarily reduce flexibility, but I don't see how it can increase flexibility. The system catalog you create must still conform to language rules for names etc. so the arbitrary restriction does not make the catalog any more cross-platform or language independent either.

Your argument is nothing but hand waving. Is that good?

> >> We have had great success with this approach for C/S apps, and I would
> > love
> >> to discuss such projects with like-minded inviduals, but as I say I do
> >> not see a lot of discussion of these types of things, either here or
> >> anywhere else that I can find.
> >
> > The folks over at Alphora use their product's system catalog for
automatic
> > application generation. Paul Vernon seems to have an interest in
devising
> > a standard relational system catalog.
>
> I have followed a thread or two on transactions that Paul gets involved
in,
> but I have not seen any comments on a standard relational system catalog.

Paul mentioned it to me in private correspondence. He sees a standard relational system catalog as a necessary step in elevating the relational algebra and the relational calculus to the database level. Instead of viewing a database as a set of relation variables, there may be advantages to viewing it as a single database variable with relation valued attributes. Interoperability would likely require a standardized system catalog, because it would be a component of any assignment statement of the form:

dbVar1 := dbVar2

> Also, this is more than a system catalog.

I have yet to see any evidence supporting this assertion. What does it provide that a system catalog cannot (or should not) provide?

> We believe that the distinction
> between a schema description and the listing of business rules is itself a
> false distinction, based on this principle:
>
> "Because all business rules must materialize as operations against a
> database, then all business rules can be reduced to schema statements."
>
> Taken with the statement given below, "All operations that can be
performed
> against a database can themselves be depicted within a database as scalar
> values", it is not so much a system catalog that we have as a business
> catalog.

I suggest you read Date's _What Not How_ if you have not already. A system catalog must necessarily include all of the relevant business rules as either constraints or derivations. It seems you are agreeing that there is no difference between a system catalog and a representation of a business' rules. I remain unconvinced that scalar values suffice to fully describe an rdbms and even less convinced that they would be the best way to describe an rdbms even if they do suffice.

> >> So again, I am wondering if such discussions would
> >> be considered topical here, or if there are other forums known for
this.
> >
> > Depends. What's the theory?
>
> I don't have yet the command of terms to express it rigorously, but in the
> vernacular it would be "Never code SQL." Or, "Always generate SQL."

I would call those rules, and rather arbitrary rules at that. I would prefer to have a better language than SQL, but it is the commercially dominant language right now.

I do not think the rules above are even principles or at least not worthy as proposed principles. They do not rank with separation of concerns or the principle of cautious design or even with the principle of orthogonal design. I don't know whether it has a succinct name, but I am inclined to call the most pertinent principle here the "principle of intent":

It is best to describe one's desires as close to the level of intent as possible.

I think this principle is also closely tied to the principle of elegant design.

> It is
> based on the simple observation that all operations permitted upon a
> database can themselves be expressed within a database.

It does not answer the question of how best to express them within a database, however.

> This is a very
> good thing because you can in principle reduce the entire specification of
> a business application into tables. This in a nutshell is the theory,
that
> if you can specify it as something that must happen in a database, then
the
> specification itself can be expressed in a database.

Is that a theory or a conjecture or a hypothesis? And how does the arbitrary restriction help or hinder? Received on Mon May 26 2003 - 07:34:47 CEST

Original text of this message