Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Neo <neo55592_at_hotmail.com>
Date: 8 Jun 2004 13:53:18 -0700
Message-ID: <4b45d3ad.0406081253.8deef80_at_posting.google.com>


> > XDb1 normalizes the second two by making references to the first one.
> > There is only one brown in XDb1's db.
> > Using RM#1 or #2, brown is stored three separate times [in 2 tables].
>
> That is not normalisation, it is using surrogate keys or pointers.

In the most general sense, normalization is the process of removing redundancy. If you need to hear this from an authority in order to accept this common sense idea, see C.J. Date's "An Intro to Database Systems", 6th Ed, Chapter 10, Further Normalization I: 1NF, 2NF, 3NF, BCNF, pg 280 where he begins with "what is wrong with this design: redundancy", "redundancy leads to several problems", "so perhaps a good design principle is 'one fact in one place' (ie. avoid redundancy). The subject of further normalization is essentially just a formalization of simple ideas like this". In the summary of the same chapter, he restates "normalization ... the purpose of such reduction is to avoid redundancy".

How a db implements normalization is dependent on its model/implementation, but a main characteristic is replacing redundant things with a data-independent link/reference/id etc. The problem of using data-dependent links was already demonstrated in earlier posts of this thread.

And to what degree normalization is possible/practical is also dependent on its data model and implementation.

> You must be storing SOMETHING 3 times to indicate that "brown" is used
> 3 times. Whether it is the word "brown" or the surrogate ID 1429476
> or a physical memory address or whatever, you still have the SAME
> level of "redundancy" in your model.

The use of underlying, data-independent links/refs (ie ID 1429476) to the one and only original data within the db is not considered redundant in the context of dbs.

In Sol#1 and #2, "brown" is stored once in T_thing and twice in T_attributes_char. Changing one of them and not the others, would essentially corrupt the data. Users (typically) aren't given access to directly modify underlying links/refs.

> Now stop this nonsense and PAY UP!

Please show how RM Sol#1 or #2 can store the data without redundancy. Also, please show how RM Sol#1 or #2 can store the earlier mentioned three thing hierarchy containing an unnamed person and a person with mulitple names.

Thus far RM Sol#1 and #2 fail to meet the challenge because, they contain redundant schema (schema is data) to store the class hierarchy which cannot be accessed by the provided report procedure, and stores redundant data (ie "brown"). Deficiencies in the genericness of the solutions can be shown by the inability to handle unnamed things, things with multiple names, and the three thing hierarchy described in earlier posts.

Below are measurements made thus far using RM solutions that aren't as generic or normalized as XDb1's.

Small Report Generation Summary (provided by Hugo)



Solution Time(ms) Platform Notes
-------------- --------  ----------------- --------------------------
RM#1 SqlSrvr2K  14.3     1.3 Ghz PC
RM#2 SqlSrvr2K  11.0     1.3 Ghz PC
XDb1 4.4.7      16       1.3 Ghz PC        Debug version


Small Report Generation Summary (provided by Neo)



Solution Time(ms) Platform Notes
-------------  --------  ----------------- --------------------------
RM#1 SqlSrvr7   65.0     500 Mhz Server
RM#2 SqlSrvr7   68.9     500 Mhz Server
XDb1 4.5.7       1.632   500 Mhz Server
XDb1 4.5.9       6.561   233 MHz PocketPC


Large Report (28,940 rows) Generation Summary (provided by Neo) 200 Goat Hierarchy (5 generations x 40 goats/generation, each goat having two parents, except 1st gen).



Solution Time(sec) Platform Notes
-------------  --------  ----------------- --------------------------
RM#5 SqlSrvr7   40.5     500 Mhz Server
XDb1 4.5.7       2.9     500 Mhz Server
XDb1 4.5.9      16.971   233 Mhz PocketPC


Larger Report (276,620 rows) Generation Summary (provided by Neo). 400 Goat Hierarchy (10 gen x 40 goats/gen),



Solution Time(ms) Platform Notes
-------------  --------  ----------------- --------------------------
RM#5 SqlSrvr7  105 min   500 Mhz Srvr, NT  Avg of 2 runs
XDb1 4.5.10     44 min   500 Mhz Srvr, NT  Avg of 2 runs
XDb1 4.5.10     57 min   450 Mhz PC, 98    1 run
XDb1 4.5.10 195 min 233 Mhz PocketPC 1 run

Note: If one would like to verify or provide additional measurements, the SQL Server scripts are available from website. Received on Tue Jun 08 2004 - 22:53:18 CEST

Original text of this message