Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Nick Landsberg <hukolau_at_NOSPAM.att.net>
Date: Sun, 23 May 2004 22:31:17 GMT
Message-ID: <VC9sc.54582$hH.1017961_at_bgtnsc04-news.ops.worldnet.att.net>


Neo wrote:

>>OK folks, I have found this discussion fascinating and educational. 
>>Given the possible implementations, how long would this query run 
>>on a 1 million record database? 

>
>
> It could take hours, days, months, maybe even years, depending on the
> number of interconnections between things (I believe the human brain
> performs somewhat similar computations but in a massively parallel
> manner)!
>
> Sorry, but XDb1 is not suitable for commercial applications yet. It is
> currently mostly an experimental db meant to test out an alternate
> data model. ...

Thank you for the reply, Neo, and especially for the bit of unadorned honesty in the above sentence :)

> ... IMO, an apple-to-apple comparison hasn't been made yet
> because of the following differences:
>

[ Much Snippage ]

>
> Summary of processing time for a small data set (8 things):
> RM's simple solution: 65 ms (typical)
> XDb1's simpler solution: 2.23 ms (max)
> XDb1's generic solution: 2 or 3 ms (typical)
>
> On a larger hierarchy consisting of 200 goats (5 generations x 40
> goats/generation, each goat having two parents):
> RM's simple solution: 11 sec (lowest of 11,16,15,15,14,11,13,14,14,13)
> XDb1's generic solution: 5.438 sec (only 1 run thus far with beta
> v4.5.0)
>
> Please note: the above are apple-to-orange comparisons.

Yes they are, but they give me a feel for the relative orders of magnitude. (It was good of you to provide detailed specs on the machine in your reply. And it was also good of you to run the tests. I did not mean to impose.)

As a "less-than-first-order" approximation (zero-th approximation?), it would seem that XDb1 scales better to larger datasets, but with only two data points this is hardly more than conjecture at this time.
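For readers who haven't followed the whole thread, the query being benchmarked can be sketched roughly as follows. This is only an illustrative sketch, not the algorithm either XDb1 or the RM solution actually uses: it assumes the pedigree is stored as a child-to-parents map (each goat having two parents makes this a DAG, not a tree) and finds a common ancestor minimizing the combined distance, with ties broken arbitrarily.

```python
from collections import deque

def ancestors_with_depth(parents, start):
    """BFS upward from start, recording the minimum number of
    parent links needed to reach each ancestor (multi-parent DAG)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, ()):
            if p not in depth:
                depth[p] = depth[node] + 1
                queue.append(p)
    return depth

def nearest_common_ancestor(parents, a, b):
    """Return a common ancestor of a and b that minimizes the
    combined distance to both (one simple notion of 'nearest')."""
    da = ancestors_with_depth(parents, a)
    db = ancestors_with_depth(parents, b)
    common = set(da) & set(db)
    if not common:
        return None
    return min(common, key=lambda n: da[n] + db[n])

# Hypothetical miniature goat pedigree; names are made up.
parents = {
    "kid1": ["nanny", "billy"],
    "kid2": ["nanny", "buck"],
    "nanny": ["granny", "gramps"],
    "billy": ["granny", "gramps"],
}
print(nearest_common_ancestor(parents, "kid1", "kid2"))  # -> nanny
```

Each call does two breadth-first walks up the ancestry, so cost grows with the number of ancestor links rather than the total database size, which may be part of why the two implementations scale so differently.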

What I would be more curious about is something along the following lines (bear with me, please) -

Many users of databases want to have their cake and eat it too. That is, they want speed of access for simple (single-record) retrieval operations and, at the same time, access to all the data for complex queries. In other words, they want the speed for transaction processing along with the flexibility for reports. So, my next question is:

How does this critter perform on a simple query which extracts just the record for "John Smith", which may happen to be at a leaf node? (I hope I have the terminology right.)

No need to run the tests any time soon. I'm just curious. I think this critter has possibilities and may be of commercial interest several years down the road.

(Note that the Relational folks had to go through the same trial-by-fire some 20+ years ago. The first implementations were nowhere near as fast as the then-current CODASYL DBMS's, but we learned how to optimize and do strategic denormalization as we went along. This sounds like it needs to go through the same kind of learning curve.)

Nick L.

-- 
"It is impossible to make anything foolproof
because fools are so ingenious"
  - A. Bloch
