Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Wed, 19 May 2004 23:49:39 +0200
Message-ID: <uclna09s8gkn0f4c5cj510f4hi9rb9imth_at_4ax.com>


On 18 May 2004 23:20:19 -0700, Neo wrote:

>> > Among other things, the difference in normalization between the
>> > implementations (still looking it over) is quite different.
>>
>> I don't demand that XDb1 should use a relational structure, you should not
>> demand that a RM solution uses XDb1's structure. If you want to compare,
>> do so at the level where it counts.
>
>The report your solution generates is correct, I have no disagreement
>with that.

Thanks for the acknowledgement.

> But the implemenation is generating it from unnormalized
>data

Isn't Google just great? Here are Date's formal definitions for the five (actually six, BCNF never got it's own number) normal forms, copied and pasted from http://www.databasedesign-resource.com/normal-forms.html

(start quote)

First Normal Form

‘A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.’

Second Normal Form

‘A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key.’

Third Normal Form

‘A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.’

Boyce/Codd Normal Form

‘A relation R is in Boyce/Codd normal form (BCNF) if and only if every determinant is a candidate key.’

Fourth Normal Form

‘A relation R is in fourth normal form (4NF) if and only if, wherever there exists an MVD in R, say A -> -> B, then all attributes of R are also functionally dependent on A. In other words, the only dependencies (FDs or MVDs) in R are of the form K -> X (i.e. a functional dependency from a candidate key K to some other attribute X). Equivalently: R is in 4NF if it is in BCNF and all MVD’s in R are in fact FDs.’

Fifth Normal Form

‘A relation R is in fifth normal form (5NF) – also called projection-join normal form (PJ/NF) if and only if every join dependency in R is a consequence of the candidate keys of R.’

For every normal form it is assumed that every occurrence of R can be uniquely identified by a primary key using one ore more attributes in R.

FD = Functional Dependency
MVD = Multi-Valued Dependency

(end quote)

And on another website (http://nunic.nu.edu/~ckettemb/DBNorm.html):

(start quote)

Domain Key Normal Form (DKNF)

We can also always define stricter forms that take into account additional types of dependencies and constraints. The idea behind domain-key normal form is to specify, (theoretically, at least) the "ultimate normal form" that takes into account all possible dependencies and constraints. A relation is said to be in DKNF if all constraints and dependencies that should hold on the relation can be enforced simply by enforcing the domain constraints and the key constraints specified on the relation.

For a relation in DKNF, it becomes very straightforward to enforce the constraints by simply checking that each attribute value in a tuple is of the appropriate domain and that every key constraint on the relation is enforced. However, it seems unlikely that complex constraints can be included in a DKNF relation; hence, its practical utility is limited.

(end quote)

If you really want to maintain that my implementation generated it's report from unnormalized data, please tell me which Normal Form you think I violated and why.

> where as XDb1 generates it from data normalized down to atomic
>symbols (ie a, b, c). Yes, I consider each symbol to be data. When
>XDb1 prints the words 'john' and 'god' on the report it resolves down
>to the one and only symbol 'o' in the db.

But a is not an atomis symbol at all! Most computers store an a as a collection of 8 bits: 01100001 (97, the ASCII value of 'a').

I fail to see why you would you break down 'John' into J, o, h and n, but since that's obviously what XBd1 does, I also fail to see why the breaking down should stop at this (arbitrarily chosen) level. Why letters? Why not bits? nibbles? syllables?

> The implementation provided
>doesn't model the atomic symbols (ie a-z).

And neither did XDb1's model. You're confusing implementation internals with the model. The RDBMS model had phrases such as "insert things (thing) values ('john')" / "insert types (thing, type) values ('john', 'person')". The XDb1 model had phrases such as "john isa person".

Though XDb1's syntax is shorter and more English-like, it doesn't drill down to the level of letters or symbols. I don't see any 'j' on it's own, I only see the symbols 'john' and 'person' as a whole.

How XDb1 stores these symbols internally is interesting, but totally irrelevant to the model. Same goes for the RDBMS. I don't know how SQL Server stores 'john', and frankly I don't want to know. Maybe it calculates a 400-byte bit battern from the individual letters, encrypts that via 196-bit Rijndael encryption, swaps some bits and calculates a checksum on top of that. Maybe it uses exactly the same way to store it's data as XDb1. Maybe it even changes the way it stores data on a daily basis, automatically converting all data at midnight. I don't know and I couldn't care less, as long as "select thing from things" returns "john" and not "jhon".

> As a starting point, the
>word 'john' needs to be normalized as it appears multiple times in
>multiple tables.

And how does that violate normalization rules? If I changed the things table to include an ID and a name, then changed all other tables to use the ID instead of the name to refer to a thing, I'd still have the symbol representing john multiple times in multiple tables. The fact that this symbol now is the number 3 instead of the character string 'john' doesn't change anything.

> Changing any one of them would corrupt the data.

No. Changing any of them would result in a foreign key violation and the transaction would be rolled back by SQL Server. You didn't miss the "references" clauses in my DDL, did you?

> In
>XDb1, there is only one instance of the word 'john'.

And how many pointers (or similar structures that serve the same goal) to that word?

>
>There is another issue that I will point out in another post.

It's now 15 hours since you posted this and I still don't see it. Could you repost that message?

>
>> > In an initial attempt to make them more comparable,
>> > I modified XDb1 algorithm so as to not
>> > resolve things' names and simply prints their IDs.
>>
>> Now you're changing the rules after the competition has started. Not a
>> sign of good sportsmanship, Neo!
>
>I do not think so, data must be as normalized as in XDb1's db.

        "I challenge you to make a wooden stair with screws and a screwdriver in at most twice the time it takes me to make a similar chair with nails and a hammer".

	"Done!"
	"Hmm, nice chair indeed. And you even finished before me. But I
see that you inserted the screws by rotating the screwdriver. That was not the intention. You should have hit the screws repeatedly on the head with the blunt end of the screwdriver, like I hit the nails with my hammer".

The challenge was "to produce the equivalent Nearest Common Ancestor Report from normalized and NULL-less data and the solution must be as generic, meaning allow the user to create any hierarchy, consisting of different types of things (each type to allow different attributes) and each thing in the hierarchy to have any number of parents".

There was nothing in the original message or the webpage mentioned in it about mimicing XDb1's internal data structure, not about storing the character 'o' only once. If that had been the challenge, I'd never have bothered - I'm not in the habit of using a screwdriver for hammering screws in wood.

I fulfilled all requirements you set for the challenge. I just checked all groups this is crossposted to but saw no other entries. I hereby claim the $1000 prize. Please trasfer the money to my bank account: International Bank Account Number (IBAN) NL59 RABO 0118 3365 68 (bank information: BIC/SWIFT address RABONL2U).

> In
>general, there is no way XDb can compete with RM where a report is
>being generated from simply one table.

I fully agree with that. Even without the where-part.

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Wed May 19 2004 - 23:49:39 CEST

Original text of this message