Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Fri, 04 Jun 2004 00:20:31 +0200
Message-ID: <507vb054h6o2p0nm6ukre16lvnnmvbr54r_at_4ax.com>


On 3 Jun 2004 10:38:23 -0700, Neo wrote:

>> Small Report Generation Summary (provided by Hugo)
>> Platform: 1.3 Ghz PC, Non-SCSI HDs, SQL Server 2000, Windows 2000
>> ---------------------------------------------------------------------
>> Solution Time(ms) Notes
>> ---------- -------- -----------------------------------
>> RM's #1 14.3 Db not normalized to atomic symbols
>> Avg of 10 consequtive runs.
>>
>> RM's #2 11.0 Db not normalized to atomic symbols
>> Avg of 10 consequtive runs.
>>
>> XDb1 4.4.7 16 Db normalized to atomic symbols
>> Avg of 10 separate consequtive runs.
>> Unoptimized, debug version of exe
>
>The challenge is to generate the report from normalized and NULL-less
>data (I still need to convince you that 'o' in 'john' and 'god' is
>data).

Hi Neo,

No, you need to convince me that storing 'john' and 'god' as seperate values in a column is a violation of normalization rules as they apply in (and are defined for) the relational model. I already provided some quotes and URL's for Codd's normalization rules. Since the challenge was NOT "to generate the report from normalized and NULL-less data", but "to generate the report from normalized and NULL-less data (...) "using the relational model", the relational model's definitions for normalization should apply.

(In case you somehow lost your original message containing the challenge, here's a URL: http://tinyurl.com/yvm5g )

> With XDb1, the report is created from normalized and NULL-less
>data every time, without relying on any intermediate/cached results or
>statistics from prior runs. Aside from disk I/O, task switching, etc,
>the first run is as fast as subsequent runs (see consistency of
>reports generated in RAM only in earlier post). With XDb1, every
>report generation is "virgin".

OIC.
>However with SQL Server, the average of 10 runs may not be providing
>the time to generate a "virgin" report. There is a possibility that
>the last 9 runs are faster because they are relying on
>intermediate/cached data or statistics from the first run.

Strange remark. You have the full text of my query available. You can see that the first thing the stored procedure does is to remove all data from the work tables (which would actually serve to make a second execution *slower*, not faster!). Then, they are built again from the input data. Where do you see anything in my script that appears as if I'm reusing old data?

Further, there is excellent documentation on SQL Server itself. I can recommend Kalen Delaney's "Inside SQL Server 2000" if you really want to know all about SQL Server's internals. I've already posted a lot about what SQL Server does and does not cache elsewhere in this thread (thanks to Kalen Delaney's book, where I got it from!).

On the other hand - the source code for XDb1's ANC report is not disclosed and there is no "Inside XDb1" for sale. Why would I believe any of the claims you make above? Let me assure that I *DO* happen to believe these claims, but if you'd ask me why I believe you, I couldn't tell. If I count the claims you made about things that XDb1 apparently could do and had to admit later that it actually didn't or that it was bugged, I really see no reason to believe you, other than "gut feeling".

> Can you
>ensure the time measured is for generating a "virgin" report by using
>steps similar to below?
>
>1. Restart PC.
>2. Create new db.
>3. Run script to add data.
>4. Measure time for the first run.

I actually did the following:

  1. Restart PC.
  2. End all programs that get auto-started but can be easily stopped.
  3. Using control panel, start the SQL Server service (in normal, multi-user mode, though single-user mode might have been a fairer comparison if XDb1 doesn't support multi-user).
  4. Exit control panel, start command prompt (formerly known as DOS).
  5. Execute the script for RM#2 (reduced to 1 execution) via isql (the least resource-intensive front-end I culd find).
  6. Check output to know execution time for one run.

Here's the output:

 started                  ended                    elapsed        
 ------------------------ ------------------------ -------------- 
 2004-06-03 21:36:40.480 2004-06-03 21:36:40.497 00:00:00:017

 things hierarchies ancestors NCancestor

  • ----------- ----------- ----------- 8 11 28 29

 ThingX ThingY CmnAnc Dist

  • -------------------- -------------------- ----------- army fido army 2 army john army 1 army laptop1 army 2 army luke god 3 army mary army 1 army trinity god 2 fido laptop1 john 2 fido laptop1 mary 2 god army god 1 god fido god 3 god john god 2 god laptop1 god 3 god luke god 2 god mary god 2 god trinity god 1 john fido john 1 john laptop1 john 1 john luke god 4 john mary army 2 luke fido luke 1 luke laptop1 trinity 3 mary fido mary 1 mary laptop1 mary 1 mary luke trinity 2 trinity fido trinity 2 trinity john god 3 trinity laptop1 trinity 2 trinity luke trinity 1 trinity mary trinity 1

To compare this to XDb1 on the same hardware, I went on to do this:

  1. Start XDb1, create new database, copy, paste and executer all lines from Ex076 one by one, save database, exit XDb1.
  2. Restart PC.
  3. End all programs that get auto-started but can be easily stopped.
  4. Start XDb1. The new database I just created was auto-loaded.
  5. Rightclick 'god', ask for nearest common ancestor report.
  6. Check output to know execution time for one run.

Here's the output:

Common Ancestor Report for 'god'

ThingX	ThingY	CmnAnc	Dist
army	john	army	1
army	laptop1	army	2
army	fido	army	2
army	mary	army	1
army	trinity	god	2
army	luke	god	3
john	laptop1	john	1
john	fido	john	1
john	mary	army	2
john	trinity	god	3
john	luke	god	4
laptop1	fido	john	2
laptop1	mary	mary	1
laptop1	trinity	trinity	2
laptop1	luke	trinity	3
fido	mary	mary	1
fido	trinity	trinity	2
fido	luke	luke	1
mary	trinity	trinity	1
mary	luke	trinity	2
trinity	luke	trinity	1

Time elapsed: 15 msec

That's 17 ms for RM#2, 15 ms for XDb1. Of course, I should have repeated the above procedure several times to get a good average, but frankly, I couldn't bother. This has taken me way too much tie already.

From the challenge (see URL above): "Report generation must not be more than 2X slower than XDb1 on equivalent hardware". I did it. I just proved that (again).

I think it is now time for you to admit that my entry for the challenge satisfies all requirements.

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Fri Jun 04 2004 - 00:20:31 CEST

Original text of this message