| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> comp.databases.theory -> Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)
On 3 Jun 2004 10:38:23 -0700, Neo wrote:
>> Small Report Generation Summary (provided by Hugo)
>> Platform: 1.3 Ghz PC, Non-SCSI HDs, SQL Server 2000, Windows 2000
>> ---------------------------------------------------------------------
>> Solution Time(ms) Notes
>> ---------- -------- -----------------------------------
>> RM's #1 14.3 Db not normalized to atomic symbols
>> Avg of 10 consequtive runs.
>>
>> RM's #2 11.0 Db not normalized to atomic symbols
>> Avg of 10 consequtive runs.
>>
>> XDb1 4.4.7 16 Db normalized to atomic symbols
>> Avg of 10 separate consequtive runs.
>> Unoptimized, debug version of exe
>
>The challenge is to generate the report from normalized and NULL-less
>data (I still need to convince you that 'o' in 'john' and 'god' is
>data).
Hi Neo,
No, you need to convince me that storing 'john' and 'god' as seperate values in a column is a violation of normalization rules as they apply in (and are defined for) the relational model. I already provided some quotes and URL's for Codd's normalization rules. Since the challenge was NOT "to generate the report from normalized and NULL-less data", but "to generate the report from normalized and NULL-less data (...) "using the relational model", the relational model's definitions for normalization should apply.
(In case you somehow lost your original message containing the challenge, here's a URL: http://tinyurl.com/yvm5g )
> With XDb1, the report is created from normalized and NULL-less
>data every time, without relying on any intermediate/cached results or
>statistics from prior runs. Aside from disk I/O, task switching, etc,
>the first run is as fast as subsequent runs (see consistency of
>reports generated in RAM only in earlier post). With XDb1, every
>report generation is "virgin".
OIC.
>However with SQL Server, the average of 10 runs may not be providing
>the time to generate a "virgin" report. There is a possibility that
>the last 9 runs are faster because they are relying on
>intermediate/cached data or statistics from the first run.
Strange remark. You have the full text of my query available. You can see that the first thing the stored procedure does is to remove all data from the work tables (which would actually serve to make a second execution *slower*, not faster!). Then, they are built again from the input data. Where do you see anything in my script that appears as if I'm reusing old data?
Further, there is excellent documentation on SQL Server itself. I can recommend Kalen Delaney's "Inside SQL Server 2000" if you really want to know all about SQL Server's internals. I've already posted a lot about what SQL Server does and does not cache elsewhere in this thread (thanks to Kalen Delaney's book, where I got it from!).
On the other hand - the source code for XDb1's ANC report is not disclosed and there is no "Inside XDb1" for sale. Why would I believe any of the claims you make above? Let me assure that I *DO* happen to believe these claims, but if you'd ask me why I believe you, I couldn't tell. If I count the claims you made about things that XDb1 apparently could do and had to admit later that it actually didn't or that it was bugged, I really see no reason to believe you, other than "gut feeling".
> Can you
>ensure the time measured is for generating a "virgin" report by using
>steps similar to below?
>
>1. Restart PC.
>2. Create new db.
>3. Run script to add data.
>4. Measure time for the first run.
I actually did the following:
Here's the output:
started ended elapsed ------------------------ ------------------------ --------------2004-06-03 21:36:40.480 2004-06-03 21:36:40.497 00:00:00:017
things hierarchies ancestors NCancestor
ThingX ThingY CmnAnc Dist
To compare this to XDb1 on the same hardware, I went on to do this:
Here's the output:
Common Ancestor Report for 'god'
ThingX ThingY CmnAnc Dist army john army 1 army laptop1 army 2 army fido army 2 army mary army 1 army trinity god 2 army luke god 3 john laptop1 john 1 john fido john 1 john mary army 2 john trinity god 3 john luke god 4 laptop1 fido john 2 laptop1 mary mary 1 laptop1 trinity trinity 2 laptop1 luke trinity 3 fido mary mary 1 fido trinity trinity 2 fido luke luke 1 mary trinity trinity 1 mary luke trinity 2 trinity luke trinity 1
That's 17 ms for RM#2, 15 ms for XDb1. Of course, I should have repeated the above procedure several times to get a good average, but frankly, I couldn't bother. This has taken me way too much tie already.
From the challenge (see URL above): "Report generation must not be more than 2X slower than XDb1 on equivalent hardware". I did it. I just proved that (again).
I think it is now time for you to admit that my entry for the challenge satisfies all requirements.
Best, Hugo
-- (Remove _NO_ and _SPAM_ to get my e-mail address)Received on Thu Jun 03 2004 - 17:20:31 CDT
![]() |
![]() |