Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Sat, 22 May 2004 01:38:01 +0200
Message-ID: <ig3ta010uqlq1gsespuhk2tn0v0tr5uc9l_at_4ax.com>


On 20 May 2004 14:22:24 -0700, Neo wrote:

Hi Neo,

>> > > This indicates an elapsed time of less than 3 ms ...
>> >
>> > Among other things, the difference in normalization between the
>> > implementations is quite different.
>> > In an initial attempt to make them more comparable...
>>
>> Now you're changing the rules after the competition has started.
>> Not a sign of good sportsmanship, Neo!
>
>The provided implementation was not similarly generic

Yes, I have to admit that my implementation was not "similarly" generic, but even MORE generic than the XDb1 implementation: I didn't limit the possible hierarchies to a few predefined hierarchy types (as XDb1 does), but allowed ANY hierarchy (as required in the challenge).

> or normalized.

This argument is getting stale. In another message, I quoted Date's definition of the normal forms, as well as a description of Domain Key Normal Form (DKNF), said to be <<(theoretically, at least) the "ultimate normal form">>. Instead of pointing out which of these normal forms you think my implementation violates, your message only repeats uninteresting stuff about XDb1's internals, followed by an unbacked claim that there is redundancy in my table and that it's not normalized.

Back up your claim, if you can. But please leave XDb1 out of it - the challenge was to use the relational model, not XDb1's model.

>To make a very rough comparision, I simplified my implementation to
>that which was provided.

Are you referring to this quote from another message you wrote: "I modified XDb1 algorithm so as to not resolve things' names and simply prints their IDs"?

If so, I don't accept this simplification. The challenge was to create a report (that contains names) from some input data (that contains names). There were no IDs in either the input or the report. Whether there are IDs under the covers is irrelevant.

If you want to start a new challenge to create a report showing only IDs from input data containing names, by all means go ahead. Heck, I might even try and modify my winning entry for this competition to see if I can rid you of another $1000.

> In general, XDb1 can't compete with RM in the
>scope of a single table's worth of data since XDb1 has the overhead of
>variable storage structures for each table cell. An yes, IDs are
>faster than text so the comparision is far from perfect. XDb1 should
>gain advantage as more tables are joined. The provided solution
>doesn't join any tables since it is less generic and unnormalized.

The provided solution is not less, but even more generic. And it is completely normalized.

The reason that it doesn't join any tables is that I made proper use of my RDBMS's tools, instead of falling victim to the brainless automatic insertion of an autonumber/identity/guid table_ID column in every table, as so many of my fellow RDBMS developers do.
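
To illustrate the idea (a hypothetical sketch with made-up table names - NOT the actual tables from my entry): if the thing's name itself is the key, the report can show names without ever joining back to a lookup table to translate an ID.

   -- Hypothetical sketch only, not the schema of my winning entry.
   -- The natural key (the name) is the primary key, so no surrogate
   -- ID column is needed and the report query needs no extra join.
   CREATE TABLE demo_things
      (name varchar(20) NOT NULL PRIMARY KEY)

   CREATE TABLE demo_hierarchies
      (hierarchy varchar(20) NOT NULL,
       thing     varchar(20) NOT NULL REFERENCES demo_things (name),
       parent    varchar(20) NULL     REFERENCES demo_things (name))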

>As I said earlier, in the simplfied case, it takes XDb1 2 or 3 ms on a
>500 Mhz, 512MB Dell PowerEdge Server.

As I said earlier, this is irrelevant to the challenge. In its original form, XDb1 took 16 ms on your machine. Interestingly, it also took 16 ms on my machine (AMD Athlon 1.3 GHz, 256MB).

> I ran the provided SQL Server
>script on the same machine and the report generation takes 65 ms
>(based on the avg of the 3 runs below):
>
>Starting time Ending time Time Elapsed
>------------- ------------ ------------
>14:44:57.670 14:44:57.733 67 ms
>15:02:26.233 15:02:26.297 64 ms
>15:07:57.780 15:07:57.843 63 ms
>
>What might I be doing wrong to get results different than yours?

I assume that you copied and pasted my script straight into Query Analyzer and executed it without any modification.

It's hard to diagnose the cause of this difference without access to your computer. But I did think of one thing that might cause this difference.

When I installed SQL Server, I chose the collation "Latin1_General_BIN" as the default. That means that character data is compared by a straight comparison of the characters' binary values. This is the quickest option, but it has the disadvantage that 'john', 'John' and 'JOHN' are treated as three distinct values. If I wanted to, I could have a person named 'john' and a dog named 'John'.

I don't know exactly what the default collation is, but I do know that it is one of the case insensitive collations - one that treats 'john', 'John' and 'JOHN' as equal. This will of course introduce some overhead for string comparisons, which are used quite extensively in my implementation.
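
If you want to see the difference for yourself, paste something like this into Query Analyzer (the COLLATE clause and SERVERPROPERTY require SQL Server 2000 or later):

   -- Show the server-wide default collation.
   SELECT SERVERPROPERTY('Collation') AS server_collation

   -- Compare 'john' and 'John' under a binary and under a case
   -- insensitive collation; only the second comparison matches.
   SELECT CASE WHEN 'john' = 'John' COLLATE Latin1_General_BIN
               THEN 'equal' ELSE 'different' END AS binary_compare,
          CASE WHEN 'john' = 'John' COLLATE Latin1_General_CI_AS
               THEN 'equal' ELSE 'different' END AS ci_compare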

Since this is the default collation, I assume that your database uses case insensitive comparisons. There is a simple way to find out: change one of the occurrences of 'fido' in the insert statements for hierarchies to 'Fido'. I expect that your database will accept this, showing that it uses a case insensitive collation. If it is not accepted, your database uses a case sensitive collation; that might be a binary collation, but it need not be - it might still be accent insensitive, or even a double-byte character set collation!
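
If you'd rather not touch the challenge script, a throw-away table shows the same effect (again just an illustration, not part of my entry) - note that here a rejected insert means case INsensitive, the opposite of the 'Fido' test above:

   -- Under a case insensitive collation the second insert fails with
   -- a primary key violation, because 'Fido' and 'fido' compare equal.
   -- Under Latin1_General_BIN both rows are accepted.
   CREATE TABLE collation_test
      (name varchar(20) NOT NULL PRIMARY KEY)
   INSERT INTO collation_test (name) VALUES ('fido')
   INSERT INTO collation_test (name) VALUES ('Fido')
   DROP TABLE collation_test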

Another possible explanation might be other activity on the machine running the SQL Server process. I have SQL Server on my desktop; when I timed the queries, I first shut down all other applications. (SQL Server is a "friendly" process - it will grab extra memory or processor time if it needs more, but only if that won't hinder other processes, so shutting down other processes will generally improve SQL Server's performance.) Earlier testing (with several other applications active) resulted in elapsed times between 10 and 20 ms.

If you have SQL Server running on another server, make sure that server is dedicated to SQL Server only, if you want the best performance. If you have SQL Server running on your desktop, close all applications except Query Analyzer when running this query.

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
