# Re: In an RDBMS, what does "Data" mean?

Date: Tue, 01 Jun 2004 14:12:51 GMT

Message-ID: <D90vc.5076$kP7.1240_at_newssvr32.news.prodigy.com>

"Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message
news:c9fdv4$sff$1_at_news.netins.net...

> "mAsterdam" <mAsterdam_at_vrijdag.org> wrote in message
> news:40bb0d26$0$559$e4fe514c_at_news.xs4all.nl...
> > Alfredo Novoa wrote:
> >
> > >>>... It was mathematically proven
> > >>>that it is better than the graph
> > >>>based approaches.
> > >>
> > > mAsterdam wrote:
> > >>This is a very strange statement...
> > >
> > > This is a very basic knowledge taught in every
> > > serious database introductory course.
> >
> > The statement is made in just about every database
> > course, without demonstrating it - that's exactly
> > what I think is strange about it. If it's proven
> > why not give or at least reference the proof?
> > The way it is put, it is propaganda, not basic
> > knowledge.
>
> Exactly. I've read a lot of what people have suggested is a mathematical
> proof that relational database theory is good for business.

I don't think such a thing is even possible. However, it seems obvious to me that in this industry we have the capacity to stay very close to theory, given that computers are very unforgiving, and therefore our programs have to achieve some degree of rigor with respect both to the language at hand and to the business conditions we're trying to automate. So I certainly think that theory is worth a good first look, given that it impacts us in a much more direct way than many other disciplines.

Now look at the relational theory vs. some of the emerging semantics of XPath and XQuery (from Philip Wadler, Don Chamberlin, and many others). The latter is extraordinarily complex in comparison, and furthermore offers nothing like normalization rules (at least that I've seen - Jan Hidders and others may have seen something like that). Therefore its criteria for good design are less formal. A simpler theory, with stronger criteria for good design, is (all other things being equal, which they never are) to be preferred over something with a more complex theory and weaker criteria for good design. At least that's my theory. ;-)

> While the
> mathematical theory itself is fine, the application of it to databases can
> have no mathematical proof of its usefulness (math does not prove its
> usefulness!) and seems to also have no scientific proof of its usefulness
> either.

One could argue the same for XML, Pick, etc. (for which it would still be useful to see a theory), and we're back to the "...in my experience... bang for the buck..." argument. I'm not criticizing that argument, but "in my experience" I have seen many more problems with Pick-like designs, problems that have a direct impact on agility - on my ability to evolve a system toward the expanding and changing needs of a business.

> There are exceptions to this, such as logically proving/showing
> that if you handle functional dependencies one way or another, it affects
> what changes need to be made when requirements change. So, I use those
> techniques. There are tradeoffs. You design one way with agility in mind
> and mitigate the risks.

Agreed - in the final analysis it seems somewhat like a typing exercise to me. Whereof one cannot speak, thereof one must be silent... if the business doesn't know of any rules or structure surrounding Field X, but knows it's always been captured, then perhaps something that's just an untyped list is best. But you have to ask the question, and have it answered, even if the answer is "I dunno."

> > >> ... - Better at what?
> > > Simplicity
> >
> > This reduces the statement to
> > "It was mathematically proven that it is simpler
> > than the graph based approaches." and leaves the
> > judgement to the reader/student. An improvement,
> > but it still leaves the questions unanswered:
> > simpler at what? etc.
>
> There is surely some mathematics that is simpler when putting data into
> what-once-was-the-def-of-1NF (no repeating groups). But it is also simpler
> for the logic in retrieving data to have no relation-valued-attributes and
> yet they have now been tossed into the mix. So, what's simpler?

Even those who discuss relation-valued attributes (RVAs) (Codd, Date, Pascal, etc.) acknowledge that they're more complex, and argue against using them. However, in some cases they're simply the best model. Take, for example, a system catalog, family relationships, or even prime factors (which introduces to me the interesting notion of "relation-valued functions", especially infinite ones). In all these cases, eliminating the RVA introduces keys with no real meaning, and adds some complexity. RVAs introduce a different type of complexity, perhaps - I'm not going to attempt to characterize the two here.
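
To make the prime-factors case a little more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: relations are modeled as frozensets of tuples, and the names `prime_factors`, `with_rva`, and `ungroup` are invented for this example rather than taken from any of the authors mentioned. It shows a relation whose second attribute is itself a relation, plus one possible flattening of it.

```python
from typing import Dict, FrozenSet, Tuple

Factor = Tuple[int, int]  # hypothetical tuple type: (prime, exponent)


def prime_factors(n: int) -> FrozenSet[Factor]:
    """Return the factorization of n as a relation value: a set of (prime, exponent)."""
    counts: Dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            counts[p] = counts.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # whatever is left over is itself prime
        counts[n] = counts.get(n, 0) + 1
    return frozenset(counts.items())


# A relation with a relation-valued attribute: each tuple is (n, factors-of-n).
with_rva = frozenset((n, prime_factors(n)) for n in (1, 7, 12, 30))


def ungroup(rel):
    """Flatten the RVA into ordinary (n, prime, exponent) tuples."""
    return frozenset((n, p, e) for n, factors in rel for p, e in factors)


flat = ungroup(with_rva)
```

One thing the sketch makes visible: 1 has an empty factor relation, so it appears in `with_rva` but leaves no tuple behind in `flat`. Whether that matters depends on the design; it is one small instance of the two forms carrying different kinds of complexity.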

> The old
> version of 1NF or the new version? Is simpler always better? Applying the
> simplest mathematics to complex problems isn't our goal here.

I disagree completely. If you can successfully apply simple math to complex problems, you're way ahead of the game. Part of the problem these days is jumping on one or more complex technologies because in some vague, unanalyzed way they "seem like" the structure of the problem. Mastery of the basic tools is a useful prerequisite to understanding when and how to apply the complex ones.

> > * First of all, Codd realized that to compare the very concrete
> > CODASYL specifications and the much more abstract relational
> > model would be an apples-and-oranges comparison and would
> > involve numerous distracting irrelevancies.
>
> Let me guess -- so instead of taking the relational model to an
> implementation and playing on the IDMS playing field (which would only
> provide data on once instance of each), he brought CODASYL onto his ball
> field and then beat it, right? Sorry, I'm getting ahead of you, excited to
> hear the story unfold.

There isn't a home-court advantage here. The court had yet to be built. Screaming "that's not fair" at an attempt to compare several models, without implementations of both at hand, is disingenuous. The point of a model, which we should all understand, is to characterize X before going to all the work of implementing X (which requires many other concerns and distractions). Even businesses understand that.

> > * Hence, it would be necessary first to define an abstract
> > "network model." The comparison could then be done on a
> > level playing field, as it were, in a fair and sensible
> > manner.
>
> laughing

In what way is the "playing field" unfair?

> > >>I happen to like graph based approaches
> > >>for the overall picture and to elicit design
> > >>ideas from non-IT professionals.

It's fine as a starting point, but the approach quickly breaks down as you get into details. I have no proof, simply my experience doing JAD sessions and user requirements...

> As I understand it, the purpose of the relational model is to have a way to
> "view" the structure of the data.

It's a predicate-based model of data - not sure what you mean by "view". It's to define the structure of data, to retrieve relation values based on other values, and to update relation variables with new values.
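
Those three activities can be sketched in a few lines of Python. This is a toy under loudly stated assumptions, not any real DBMS interface: a relation value is modeled as a frozenset of named tuples, a relation variable is an ordinary Python variable, and `Employee`, `restrict`, and `project` are names made up for the illustration.

```python
from typing import Callable, FrozenSet, NamedTuple


class Employee(NamedTuple):
    """Hypothetical heading; each tuple asserts 'employee emp_id, named name, works in dept'."""
    emp_id: int
    name: str
    dept: str


# Defining structure: the variable holds one (immutable) relation value at a time.
employees: FrozenSet[Employee] = frozenset({
    Employee(1, "Ann", "Sales"),
    Employee(2, "Bob", "Ops"),
})


def restrict(rel: FrozenSet[Employee],
             predicate: Callable[[Employee], bool]) -> FrozenSet[Employee]:
    """Retrieve the tuples for which the predicate holds."""
    return frozenset(t for t in rel if predicate(t))


def project(rel: FrozenSet[Employee], *attrs: str) -> frozenset:
    """Keep only the named attributes; duplicates collapse because the result is a set."""
    return frozenset(tuple(getattr(t, a) for a in attrs) for t in rel)


# Updating: assign a new relation value to the variable; the old value is not mutated.
employees = employees | {Employee(3, "Cy", "Sales")}

# Retrieving relation values based on other values.
sales_names = project(restrict(employees, lambda t: t.dept == "Sales"), "name")
```

Nothing here says how the data is stored or displayed, which is the point: the sketch only covers values, retrieval, and assignment.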

> It isn't intended to be the way that it
> is implemented. So, if users (e.g. me) want to view the data in a graph,
> then that's seems like a good model to use, right?

No. There is a logical and practical difference between what we do and what users see. The above is like saying that since the users see a graphic, Adobe Photoshop is the proper GUI modeling tool.

> Has there been any proof, ever, of the use of the relational model providing
> for a better realized solution for anything than any other model? It is in
> the application of the model that I think we lack evidence.

That's a smooth bit of useless rhetoric - the use of the word "ever" in there is especially galling. I'd like to see what "proof" there is of such a thing for any technology, language, model, etc. You're not asking for evidence about the application of the model - you're looking for something like statistically supported evidence on the success of solutions using X compared with solutions using Y, where both X and Y aren't implementations, or even designs, but models on which those designs and implementations can be based. It isn't there.

Let me ask you this: Has there been any proof, ever, of the use of object-orientation providing for a better realized solution for anything than any other model? Has there been any proof, ever, of the use of three-valued logic providing for a better realized solution for anything than any other model? etc. etc... I was going to babble on with more examples but my fingers are tired.

No, there's no proof. Let's move on to a useful discussion...

- Eric