Re: In an RDBMS, what does "Data" mean?

From: Dawn M. Wolthuis <dwolt_at_tincat-group.com>
Date: Wed, 16 Jun 2004 18:34:43 -0500
Message-ID: <caqlek$k4t$1_at_news.netins.net>


"Eric Kaun" <ekaun_at_yahoo.com> wrote in message

news:8D0Ac.41$NZ6.38_at_newssvr33.news.prodigy.com...

> "Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message
> news:qZztNxCLLizAFwnY_at_thewolery.demon.co.uk...
> > In message <qj3yc.2599$9C6.264_at_newssvr15.news.prodigy.com>, Eric Kaun
> > <ekaun_at_yahoo.com> writes
> > >OK, my mistake - I thought you meant the formulae themselves were
axioms.
> > >You're referring to underlying assumptions, which of course are.
> >
> > Some of the formulae may have to be axioms too. If you need to assume,
> > then it's an axiom, if you can derive from your previous assumptions
> > then it's a theorem.
>
> Yes, agreed...
>
> > >Right, but cascading delete is different than that. Constraints encode
> what
> > >you just said ("A cannot exist without B"); cascades are useful
> shorthands
> > >for updates, designed to make sets of operations easier while obeying
the
> > >constraints.
> >
> > But the need for a cascading delete is metadata - information that
> > should be *implicit* within the database. You're turning it into an
> > *external* constraint - putting it where it does NOT belong!
>
> I don't understand your distinction between "implicit" and "external"
here.
> External to what? In relational, both are part of the database, which
> includes both relations (actually relvars, relation-typed variables which
> are updated with new relation values) and constraints. In most businesses
> there are rules which bind multiple relations; I'm sure you have something
> similar in Pick, though it may be enforced by the application. Using
> first-order logic over relvars, you can specify most of these (if not
all).
>
> Haven't you ever seen a Pick app where deleting from (or updating) a
record
> in FILE A requires a corresponding delete/update in FILE B, yet you don't
> have the ability to encode that in the file or dictionary?
>
> I think you're suggesting that the data structures themselves should
encode
> the constraints, which gets you into dangerous territory, leading to novel
> data structures for each individual enterprise. That's fine as long as the
> query and update operators stay consistent, but you'd quickly find
> constraints undoing that, leaving you with custom persistence and no
> standard at all. Remember that even foreign and primary key constraints
are
> just that; SQL and even D give shortcuts, but it's just a 1-1 mapping to a
> constraint declaration.

I suspect that Wol was talking about one of the more common relationships between relations -- that of parent and child. A parent-child relationship is designed and then specified and no additional constraints or logic of any sort is required to ENSURE there are no children without a parent and if the parent goes, the children are gone too. Of course this can be accomplished handily in a SQL-based solution, but it isn't quite as intuitive.

> > >I may have expressed myself badly. What relational theory says is that
> > >statements about line items on an invoice state truths about values
which
> > >are unrelated to the invoice as a whole, though each line item of
course
> > >depends on the invoice (header). That statement, while "related to" the
> > >invoice header (in that it can't exist without it), has logical meaning
> on
> > >its own - I can formulate useful queries over line items which don't
> involve
> > >the header.
> >
> > I think we're having a bit of fun here :-)
>
> Yes, apparently we both have a sick notion of fun. :-)

Count me in on the sick fun, but, nevermind -- I deleted my first response, which is just as well.

> > You're saying you want to
> > extract data from certain "columns" without caring what the primary key
> > is. Fine - no problem there. Ignore the columns you're not interested
> > in.
>
> But then why include them at all? Certainly I can ignore attributes, for
> example in updating attribute A I ignore attribute B. Consistently
treating
> a set of Pick attributes as a group (e.g. the line item attributes), while
> they're part of INVOICE, seems logically wrong; those attributes are
> different than, for example, the INVOICE_DATE.

I'm missing your point. INVOICE_DATE is an attribute of an INVOICE and INVOICE_LINE_ITEM is an attribute of an INVOICE, even if it has both cardinality and degree greater than 1.

> > I'm saying that deleting the primary key should delete all related rows
> > - even those in other tables! If your analyst forgot to specify a
> > cascading delete (and you say that they're external to the theory,
> > anyway), what you're saying is that the theory FAILS to enforce data
> > integrity in that you're using something external to theory to keep the
> > tables in sync.
>
> Nothing external about it. In Pick the integrity of the files is enforced
by
> the application,

In the case that Wol is talking about -- the parent-child relationship, it is the database that enforces integrity, not the application.

> yet you don't regard the app as external (at least I've
> seen arguments to the contrary).

all a matter of definition

> Constraints are different from relations
> because they make statements about those relations. Both are integral to
> relational.

There are constraints that are part of the relation -- an attribute being part of a relation is a constraint of sorts, for example.

> > Pick just stores it all together so that taking out the primary key
> > takes out everything else.
>
> A fine shorthand, and again the CASCADE DELETE (not always what you want,
by
> the way) is simple enough to do, and even to add in later (unlike in Pick,
> where you have to make that decision up front).

Yes, but it does get missed often and application developers have to know whether such logic is left to the app or is encoded in the database.

> I object less to this than you'd expect; I can see some cases where this
> buys you a short-term gain. I just see little long-term gain, and expect
> long-term cost.

It might not be this particular feature, but I suspect it is a part of what makes for agile software development -- it is easy to make a mess of Pick design over time, but there are an amazing number of twenty-year-old systems out there (in need of database refactoring, no doubt).

> I've been trying to think of past databases I've worked on,
> and whether MVs would have bought me anything. Haven't found anything
yet...
> in the few cases where multi-descriptions or multi-coding would have
helped,
> I had cross-business unit and internationalization issues that would have
> prevented leveraging them anyway. And I can remember a few cases where
> properly treating a simple code as its own "noun", rather than an
adjective,
> saved me much work later.

I think some code-offs in the future might be in order.

>
> Short version: I see adjectives "becoming" relations fairly frequently. I
> see relations which remain "unused" as such infrequently. Of course, I'm
> aware that our perceptions are less than objective, and that the lexicons
in
> our head guide our observations more than they should.

Language does A LOT to guide our perceptions. I was just in a meeting with a project manager who is implementing a PICK application (although he doesn't know that) when the last project he managed was SAP on Oracle. He said that comparatively this was a piece of cake except that it is so different that he doesn't know if he is asking all of the right questions. I wanted to tell him to ask the questions that he would have if he had never been in an SAP or Oracle shop, but opted not to say that. It seems to me that relational thinking trains something out of us rather than training something into us. Just thinking outloud.

> > I think I know what you mean though, when you say "gets a little
> > murkier". Except, in practice, it doesn't. "customer" is a noun - it
> > gets its own FILE. "invoice" likewise. "line item" - is it a noun or
> > adjectival clause? Pick Business Analysis would unhesitatingly place it
> > in the category of adjectival clause. But I know why you would want to
> > treat it as a noun.
>
> And I can see the desire to make it an adjective - believe me, I
understand
> the object-oriented view, the desire to treat the entire business notion
as
> a single object. But I've been bitten too much by doing so, and rarely by
> "overnormalizing" - and I can usually see an impending need to "add more
> intelligence" to that "attribute."

It is the combination of the initial structure plus the ability to make changes over time that helps to handle these impending changes. I agree with your statement as it relates to 2nd & 3rd normal forms (functional dependency issues).

> > Probably because it makes the General Ledger so much easier :-) you want
> > to analyse by line-item, and not by invoice. Actually, that's not
> > difficult at all - you just add ledger code as an attribute of invoice,
> > grouped as part of line-item :-) But yep. I can see why you wouldn't
> > think it as clean - I'm inclined to agree with you. If I was programming
> > this, I'd probably say that "line-item" in the general ledger wasn't the
> > same as "line-item" in the invoice and that would make my life nice and
> > simple :-) but it would have the relational people throwing their hands
> > up in horror. Or just make the entries in the GENERAL-LEDGER FILE a list
> > of foreign keys pointing at the line item in the invoice file - not hard
> > at all. Just a smidgeon more work for the database (but rather more
> > mental contortion for the programmer).
>
> And I certainly understand the development-time advantage in reports and
GUI
> screens that a list attribute gives you. I have no doubt that Pick
leverages
> the MV paradigm far, far more than either SQL or SQL libraries leverage
> relational (or even SQL, for that matter).

One person mentioned that Pick is archaic, old-fashioned, or whatever. That is true and you should not give it too much credit, especially on the GUI side (given that didn't exist in the 70's and there has been little enhancement to Pick in the past few decades -- some will disagree with me). My interest in it for the future is as a better starting point for the industry than the SQL-DBMS's are. I can see that it has provided its users with better agility than the SQL-DBMS and that it "thinks like people think" about data (way too vague, I realize).

> There are better environments and
> libraries, but they're far from good.
>
> If you read Michael Jackson (not the king of pop, not the beer expert, but
> the English software engineer), he has an interesting approach to business
> (domain) analysis - and it advocates predicates prior to (or instead of)
> object analysis. It's interesting not just because it accords more with
> relational (a side benefit), but because it treats "phenomena", modeled by
> predicates and "owned" by different domains, the basis for design.
>
> > But I've been thinking about a few other things while this reply has
> > been sitting half-composed on my computer ... Relational Theory is all
> > about capturing *data*. BUT - a lot of information is *metadata* which
> > an RDBMS is incapable of storing as such. We were discussing ordering -
> > an RDBMS only captures this - as data - if the analyst thinks it
> > important. A Pick database captures it as a matter of course.
>
> True. An avenue for discussion might also be what other metadata is
useful,
> other than order. I think that more general question would cut more to the
> heart of why different data models appeal in different ways.
>
> > And constraints - I categorise them as "natural constraints" and
> > "business constraints". You can't have an invoice line item without an
> > invoice - that's a "natural constraint". But you *can* have an invoice
> > without a valid company. It might be an error, or it might be called a
> > receipt. But there's nothing to stop the accounts dept screwing up and
> > issuing an invoice to a non-existent company :-) That's what I call a
> > "business constraint". You seem to think that should be captured as
> > *data*. Pick captures it as *metadata*.
>
> Given that relational advocates (at least in recent writings by Date) a
> system catalog that is also composed of relational (and which effectively
> represents a partial second-order relational algebra/calculus), I'd say
that
> relational definitely wants all data, even metadata, as relations and
> constraints.

Yes, that is my understanding of the theory (not the practice)

> I have no particular reason to think that that's not desirable,
> but it also begs the question: what metadata is there, what's useful, and
> how does the importance of a given "type" of metadata influence the
utility
> of a given data model?
>
> There may be research on such... just haven't stumbled across it.
>
> > Now compare the amount of *metadata* available to Pick and/or
> > relational. It doesn't matter what your database is, the data in it is,
> > as far as the dbms is concerned, a meaningless "blob". To optimise
> > performance, storage, whatever, the only thing available of any use to
> > the dbms is *metadata*. Which Pick has in abundance.
>
> What, other than ordering?

The logical structures are built on lots of derived data (sort-of analogous to stored procedures).

<snip> I'll stop there - I can't get through the entire thing -- you guys can last a long time!
--dawn Received on Thu Jun 17 2004 - 01:34:43 CEST

Original text of this message