Re: In an RDBMS, what does "Data" mean?

From: Eric Kaun <ekaun_at_yahoo.com>
Date: Wed, 16 Jun 2004 18:49:40 GMT
Message-ID: <8D0Ac.41$NZ6.38_at_newssvr33.news.prodigy.com>


"Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message news:qZztNxCLLizAFwnY_at_thewolery.demon.co.uk...
> In message <qj3yc.2599$9C6.264_at_newssvr15.news.prodigy.com>, Eric Kaun
> <ekaun_at_yahoo.com> writes
> >OK, my mistake - I thought you meant the formulae themselves were axioms.
> >You're referring to underlying assumptions, which of course are.
>
> Some of the formulae may have to be axioms too. If you need to assume,
> then it's an axiom, if you can derive from your previous assumptions
> then it's a theorem.

Yes, agreed...

> >Right, but cascading delete is different than that. Constraints encode
what
> >you just said ("A cannot exist without B"); cascades are useful
shorthands
> >for updates, designed to make sets of operations easier while obeying the
> >constraints.
>
> But the need for a cascading delete is metadata - information that
> should be *implicit* within the database. You're turning it into an
> *external* constraint - putting it where it does NOT belong!

I don't understand your distinction between "implicit" and "external" here. External to what? In relational, both are part of the database, which includes both relations (actually relvars, relation-typed variables which are updated with new relation values) and constraints. In most businesses there are rules which bind multiple relations; I'm sure you have something similar in Pick, though it may be enforced by the application. Using first-order logic over relvars, you can specify most of these (if not all).

Haven't you ever seen a Pick app where deleting from (or updating) a record in FILE A requires a corresponding delete/update in FILE B, yet you don't have the ability to encode that in the file or dictionary?

I think you're suggesting that the data structures themselves should encode the constraints, which gets you into dangerous territory, leading to novel data structures for each individual enterprise. That's fine as long as the query and update operators stay consistent, but you'd quickly find constraints undoing that, leaving you with custom persistence and no standard at all. Remember that even foreign and primary key constraints are just that; SQL and even D give shortcuts, but it's just a 1-1 mapping to a constraint declaration.

> >I may have expressed myself badly. What relational theory says is that
> >statements about line items on an invoice state truths about values which
> >are unrelated to the invoice as a whole, though each line item of course
> >depends on the invoice (header). That statement, while "related to" the
> >invoice header (in that it can't exist without it), has logical meaning
on
> >its own - I can formulate useful queries over line items which don't
involve
> >the header.
>
> I think we're having a bit of fun here :-)

Yes, apparently we both have a sick notion of fun. :-)

> You're saying you want to
> extract data from certain "columns" without caring what the primary key
> is. Fine - no problem there. Ignore the columns you're not interested
> in.

But then why include them at all? Certainly I can ignore attributes, for example in updating attribute A I ignore attribute B. Consistently treating a set of Pick attributes as a group (e.g. the line item attributes), while they're part of INVOICE, seems logically wrong; those attributes are different than, for example, the INVOICE_DATE.

> I'm saying that deleting the primary key should delete all related rows
> - even those in other tables! If your analyst forgot to specify a
> cascading delete (and you say that they're external to the theory,
> anyway), what you're saying is that the theory FAILS to enforce data
> integrity in that you're using something external to theory to keep the
> tables in sync.

Nothing external about it. In Pick the integrity of the files is enforced by the application, yet you don't regard the app as external (at least I've seen arguments to the contrary). Constraints are different from relations because they make statements about those relations. Both are integral to relational.

> Pick just stores it all together so that taking out the primary key
> takes out everything else.

A fine shorthand, and again the CASCADE DELETE (not always what you want, by the way) is simple enough to do, and even to add in later (unlike in Pick, where you have to make that decision up front).

I object less to this than you'd expect; I can see some cases where this buys you a short-term gain. I just see little long-term gain, and expect long-term cost. I've been trying to think of past databases I've worked on, and whether MVs would have bought me anything. Haven't found anything yet... in the few cases where multi-descriptions or multi-coding would have helped, I had cross-business unit and internationalization issues that would have prevented leveraging them anyway. And I can remember a few cases where properly treating a simple code as its own "noun", rather than an adjective, saved me much work later.

Short version: I see adjectives "becoming" relations fairly frequently. I see relations which remain "unused" as such infrequently. Of course, I'm aware that our perceptions are less than objective, and that the lexicons in our head guide our observations more than they should.

> I think I know what you mean though, when you say "gets a little
> murkier". Except, in practice, it doesn't. "customer" is a noun - it
> gets its own FILE. "invoice" likewise. "line item" - is it a noun or
> adjectival clause? Pick Business Analysis would unhesitatingly place it
> in the category of adjectival clause. But I know why you would want to
> treat it as a noun.

And I can see the desire to make it an adjective - believe me, I understand the object-oriented view, the desire to treat the entire business notion as a single object. But I've been bitten too much by doing so, and rarely by "overnormalizing" - and I can usually see an impending need to "add more intelligence" to that "attribute."

> Probably because it makes the General Ledger so much easier :-) you want
> to analyse by line-item, and not by invoice. Actually, that's not
> difficult at all - you just add ledger code as an attribute of invoice,
> grouped as part of line-item :-) But yep. I can see why you wouldn't
> think it as clean - I'm inclined to agree with you. If I was programming
> this, I'd probably say that "line-item" in the general ledger wasn't the
> same as "line-item" in the invoice and that would make my life nice and
> simple :-) but it would have the relational people throwing their hands
> up in horror. Or just make the entries in the GENERAL-LEDGER FILE a list
> of foreign keys pointing at the line item in the invoice file - not hard
> at all. Just a smidgeon more work for the database (but rather more
> mental contortion for the programmer).

And I certainly understand the development-time advantage in reports and GUI screens that a list attribute gives you. I have no doubt that Pick leverages the MV paradigm far, far more than either SQL or SQL libraries leverage relational (or even SQL, for that matter). There are better environments and libraries, but they're far from good.

If you read Michael Jackson (not the king of pop, not the beer expert, but the English software engineer), he has an interesting approach to business (domain) analysis - and it advocates predicates prior to (or instead of) object analysis. It's interesting not just because it accords more with relational (a side benefit), but because it treats "phenomena", modeled by predicates and "owned" by different domains, the basis for design.

> But I've been thinking about a few other things while this reply has
> been sitting half-composed on my computer ... Relational Theory is all
> about capturing *data*. BUT - a lot of information is *metadata* which
> an RDBMS is incapable of storing as such. We were discussing ordering -
> an RDBMS only captures this - as data - if the analyst thinks it
> important. A Pick database captures it as a matter of course.

True. An avenue for discussion might also be what other metadata is useful, other than order. I think that more general question would cut more to the heart of why different data models appeal in different ways.

> And constraints - I categorise them as "natural constraints" and
> "business constraints". You can't have an invoice line item without an
> invoice - that's a "natural constraint". But you *can* have an invoice
> without a valid company. It might be an error, or it might be called a
> receipt. But there's nothing to stop the accounts dept screwing up and
> issuing an invoice to a non-existent company :-) That's what I call a
> "business constraint". You seem to think that should be captured as
> *data*. Pick captures it as *metadata*.

Given that relational advocates (at least in recent writings by Date) a system catalog that is also composed of relational (and which effectively represents a partial second-order relational algebra/calculus), I'd say that relational definitely wants all data, even metadata, as relations and constraints. I have no particular reason to think that that's not desirable, but it also begs the question: what metadata is there, what's useful, and how does the importance of a given "type" of metadata influence the utility of a given data model?

There may be research on such... just haven't stumbled across it.

> Now compare the amount of *metadata* available to Pick and/or
> relational. It doesn't matter what your database is, the data in it is,
> as far as the dbms is concerned, a meaningless "blob". To optimise
> performance, storage, whatever, the only thing available of any use to
> the dbms is *metadata*. Which Pick has in abundance.

What, other than ordering?

> That's why I describe Pick as a superset of relational - it can convert
> metadata into data and present it to the app.

True enough about the ordering, but I'd argue that without any constraints (ordering is implicit), Pick doesn't offer much else. I have to admit not knowing enough about the dictionary, but that seems to be functional transformation, not actual constraints on what's placed into a file... and in particular no constraints that cross multiple files. I think relational constraints can be much, much more descriptive (as well as being proscriptive) - far better than SQL would let on.

> It can also USE the metadata to optimise itself.

How so? I thought the programmer had to make the opimization, by choosing what data is retrieved at one time by virtue of being in the same file? I may be missing something.

> Relational can only store this sort of
> information as *data*, and as such the information is not available to
> the dbms for its internal use.

Well, if the catalog is relational (as it should be), then I'd say this isn't quite correct. One could even enforce database design / naming standards using constraints over system catalog relations!

In any event, it would seem useful even in Pick if there were certain "implicit" files that represented the files in the system - for example, a file called FILE with one record per file, and perhaps an attribute called ATTRIBUTES containing a list of attributes... anyway, you can probably see the utility of that for app generation, enforcing standards, and even implementing the Pick engine (and extensions/plugins). Date advocates that, and I believe that Dataphor uses that heavily.

I do wish the Dataphor folks would chime in... it would be nice to hear something from a real relational engine.

  • erk
Received on Wed Jun 16 2004 - 20:49:40 CEST

Original text of this message