Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Fri, 21 May 2004 00:33:51 +0200
Message-ID: <0j9qa09trv8qeu86pfbtjarpm7oqmvekfr_at_4ax.com>


On 19 May 2004 21:56:15 -0700, Neo wrote:

>> > But the implemenation is generating [report] from unnormalized data
>>
>> Here are Date's formal definitions for the five
>> (actually six, BCNF never got it's own number) normal forms, copied and
>> pasted from http://www.databasedesign-resource.com/normal-forms.html
>
>Ok, so Date has defined 5 or 6 normal forms. And normal forms are the
>result of normalization. In his book, Date says normalization is "the
>sucessive reduction of a given collection of relations to some more
>desireable form". The problem with such definitions and normal forms
>is that they are RM specific.

Exactly how is that a "problem"?

In your challenge, you asked for an equivalent report "using the relational model (...) from normalized and NULL-less data". It is *you* who asked for normalized data using the relational model. If you know a better source for normalizing data using the relational model then Date, please name it, with relevant URLs.

You didn't set a challenge to make a report using a model that is normalized according to XDb1's own definition of normalisation and then kludged in an RDBMS. You wanted us to use the relational model. The normal forms as defined by Date are part of that model, if you like it or not.

> XDb1 does not have relation-like
>structures and the normal forms don't apply.

So? I don't have a problem with that. Please, by all means, let XDb1 store the data in every way it seems fit, normalized or not - I couldn't care less. The challenge was not about XDb1, it was about making a report from normalized data using the relational model. I certainly won't ask you to use the relation model's way of normalizing data you store in XDb1. If you want to use a hammer and nails, I'll never ask that you insert the nails by rotating the hammer.

> At the begining of the
>same chapter, Date gives a more general meaning of normalization which
>relates to removing redundancy.

My solution had no redundancy. I'll come back to that later.

> In XDb1, nomalization is the process
>of replacing duplicate things with a reference to the original thing.

I don't care how XDb1 normalizes. The challenge was for using the relational model, not for using XDb1's model.

>In the most general sense, normalization is the elmination of
>redundancy. The data in your tables, like the one replicated below,
>have redundancy, they are not normalized.
>
> Table hierarchies
>thing otherThing hierarchy
>--------- ---------- ---------
(snip some rows)
>'fido' 'john' 'leader'
>'fido' 'mary' 'leader'
>'fido' 'luke' 'leader'

You might be surprised to hear this, but there's a reason that the relational model is called "relational". It's about storing relations.

The relations stored in the non-snipped part of the above table can be expressed as follows:
 The thing referenced by the name 'fido' is just below the thing referenced by the name 'john' in the hierarchy referenced by the name 'leader'.
 The thing referenced by the name 'fido' is just below the thing referenced by the name 'mary' in the hierarchy referenced by the name 'leader'.
 The thing referenced by the name 'fido' is just below the thing referenced by the name 'luke' in the hierarchy referenced by the name 'leader'.

I could write this out for the entire table, but my point should already be clear: there is no redundancy. Each row in the table expresses one relation between two instances of the type "thing" and one instance of the type "hierarchy". Though one instance of the type "thing" may be referred to in several of the above expressions, each expression as a whole is different from each other expression. Nothing can be removed from the above table without changing the meaning of the data in the database.

There would have been redundancy if each object's class was stored along with it's reference, like this:

thing class otherThing otherClass hierarchy --------- --------- ---------- ---------- --------- (snip some rows)

'fido'    'dog'     'john'     'person'   'leader'
'fido'    'dog'     'mary'     'person'   'leader'
'fido'    'dog'     'luke'     'person'   'leader'

This example of a bad design has redundant storage of the same relation. The following appears three times in the non-snipped part:  The thing referenced by the name 'fido' is member of the class referenced by the name 'dog'.

You seem to be mixing up two things: storing the same relation multiple times (redundancy) vs storing the same reference to a thing multiple times (not redundancy, since the same thing can participate in many relations and needs to be referred to in all of them).

You seem to be thinking that storing the character strings 'fido' and 'leader' several times is bad. Well, let's for argument's sake assume that I would agree (which I don't) and that I would change the things table and add a hierarchy_types table:

create table things (thingID int not null,

                     thingName varchar(20) not null,
                     primary key (thingID))
create table hierarchy_types (hierarchyID int not null,
                              hierarchyName varchar(10) not null,
                              primary key (hierarchyID))

That would allow me to remove all 'fido's and 'leader's from the hierarchies table. Instead, it would now look like this:

    Table hierarchies
thingID otherThingID hierarchyID
--------- ------------ -----------
(snip some rows)

17        3            1
17        4            1
17        5            1

And the table things would read:

thingID thingName
------- ---------
(...)

3      'john'
4      'mary'
5      'luke'
(...)
17     'fido'

These relations could be expressed as:
 The thing referenced by the number 17 is just below the thing referenced by the number 3 in the hierarchy referenced by the number 1.  (etc)
 The thing referenced by the number 17 is referenced by humans by the name 'fido'.
 (etc)

This does indeed reduce the number of times you'll see the string 'fido' in the database to 1. But we still have multiple references to fido (note the important distinction between fido (no quotes - a dog) and 'fido' (quotes - a name). We didn't remove anything from the database, we only changed the way things are references and added an extra relation.

You claim that my model has redundancy. But if I follow your suggestion of "improving", I end up with the same data in the existing tables and some extra relations that I could do without.

I don't know what definition of "redundant" you've been taught. I think that "Exceeding what is necessary or natural; superfluous." (*) does quite accurately describe the schema with both thingID and thingName, not my original schema!

(*) Source: The American Heritage® Dictionary of the English Language, Fourth Edition
Copyright © 2000 by Houghton Mifflin Company. Published by Houghton Mifflin Company. All rights reserved.

>Also consider what will happen if the first person's name is 'john'
>and the dog's name is also 'john'.

In my relation implementation, this will be rejected because a primary key constraint is violated.

XDb1 fails in a much more spectacular way, as I'll explain in another message.

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Fri May 21 2004 - 00:33:51 CEST

Original text of this message