Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)
Date: Tue, 25 May 2004 01:25:49 +0200
Message-ID: <dmv4b05cafdulv5rv11681ui91r259tn3l_at_4ax.com>
On Mon, 24 May 2004 02:33:10 GMT, Nick Landsberg wrote:
Hi Nick,
(snip)
>Thanks Hugo,
>
>The reason why I posed the query is not so much to
>make work for you, but to get a flavor of how the
>two technolgies compare /as they are now/.
Your remark made me curious to find out how my model would hold up
against more data. I created a script to generate lots of random data,
ensuring that the hierarchy would remain connected and strictly
top-down, yet distributed as randomly as possible. I think I succeeded
(the generation script is available, in case anyone is interested).
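To give an idea of the approach (this is a sketch in Python with names of my own choosing, not the actual generation script): each new thing picks a parent among the things created before it, which keeps the hierarchy connected and strictly top-down no matter how the random choices fall.

```python
import random

def generate_hierarchy(n_things, extra_links=0, seed=42):
    """Generate a random connected, strictly top-down hierarchy.

    Returns (things, links), where links is a list of (child, parent)
    pairs. Each new thing gets one parent chosen among the things
    created before it, so the graph stays connected and acyclic; extra
    (child, parent) links are then added the same way to give some
    things more than one parent.
    """
    rng = random.Random(seed)
    things = list(range(1, n_things + 1))
    links = []
    for child in things[1:]:                # thing 1 is the single root
        parent = rng.randint(1, child - 1)  # parent was created earlier
        links.append((child, parent))
    for _ in range(extra_links):
        child = rng.randint(2, n_things)
        parent = rng.randint(1, child - 1)
        if (child, parent) not in links:
            links.append((child, parent))
    return things, links

things, links = generate_hierarchy(10, extra_links=5)
# Every link points strictly "up" (parent id < child id), so no cycles.
assert all(parent < child for child, parent in links)
```

Because every non-root thing is linked to an earlier thing, the result is connected by construction; randomness only decides the shape.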
The setup I used for these tests is:

* SQL Server 2000 SP3a, running as a service on my desktop computer, with
only one active application (Query Analyzer).
* 1.3 GHz Athlon processor / 256 MB RAM.
* Two 40GB 7200 rpm IDE hard disks. The LOG file for my test DB is on one
of them. The DATA file for my test DB, plus the DATA and LOG files for
tempdb (used by SQL Server to store intermediate result sets), are on the
other hard disk. My Windows page file is on that hard disk as well (though
on another partition).
* I made sure that both the test database (DATA and LOG!!) and tempdb were
large enough to hold all intermediate data and the results. SQL Server will
automatically allocate extra disk space if a database needs more room, but
repeatedly allocating extra disk space is a performance killer (for the
first execution only!!), so I decided to preallocate the space.
* For all tests, I first recreated the tables and procedure, filled the
tables with the original data, and then added extra rows to the things and
hierarchies tables.
* The results below show: the number of rows in the things and hierarchies
tables (indicating input volume), the number of rows in the ancestors and
NCancestor tables (indicating the size of the intermediate and output
tables), and the time elapsed for one execution of the stored procedure.
* I did some minor tweaking to the procedure after carrying out these
tests but before posting Neo the code (see other message). I think the
improvement is only marginal - at least not enough to warrant repeating
all the tests.

Here are the results I gathered:

things      hierarchies ancestors   NCancestor  elapsed
----------- ----------- ----------- ----------- --------------
         18          31          81         153 00:00:00:017
        108         210        1307        5778 00:00:00:513
        208         505        4566       21528 00:00:02:407
        408        1101       16938       83028 00:00:14:377
        608        1700       36619      184528 00:00:51:877
       1008        2900       94318      507528 00:05:03:577

And two larger tests in a very atypical setup (running SQL Server in
single-user mode - normally only used for maintenance, but apparently
also quicker):

things      hierarchies ancestors   NCancestor  elapsed
----------- ----------- ----------- ----------- --------------
       1000        2883       94255      499500 00:04:00:250
       2500        7381      464258     3123750 01:02:13:580

As you can see, elapsed time grows much faster than the size of the
input - not surprising, since the NCancestor output alone already grows
quadratically with the number of things.
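One pattern worth pointing out in the result figures: every NCancestor row count is exactly n*(n-1)/2 for n things (1000 things give 499500 rows, 2500 give 3123750), i.e. one nearest common ancestor per pair of things. The logic involved can be sketched in Python (my own naming and simplified tie-breaking rules, not the actual T-SQL procedure): first materialize every thing's ancestor set with distances, then pick, for each pair, the common ancestor with the smallest combined distance.

```python
from collections import defaultdict

def ancestor_distances(links):
    """links: list of (child, parent) pairs.
    Returns {thing: {ancestor: shortest distance in steps}}."""
    parents = defaultdict(list)
    things = set()
    for child, parent in links:
        parents[child].append(parent)
        things.update((child, parent))
    dist = {}
    def walk(t):                        # memoized upward walk
        if t in dist:
            return dist[t]
        d = {}
        for p in parents[t]:
            d[p] = min(d.get(p, 10**9), 1)
            for a, k in walk(p).items():
                d[a] = min(d.get(a, 10**9), k + 1)
        dist[t] = d
        return d
    for t in things:
        walk(t)
    return dist

def nearest_common_ancestors(links):
    """For every pair of things, pick the common ancestor with the
    smallest combined distance (simplified tie-breaking)."""
    dist = ancestor_distances(links)
    result = {}
    things = sorted(dist)
    for i, a in enumerate(things):
        for b in things[i + 1:]:
            # a thing counts as its own ancestor at distance 0 here
            da = dict(dist[a]); da[a] = 0
            db = dict(dist[b]); db[b] = 0
            common = set(da) & set(db)
            if common:
                result[(a, b)] = min(common, key=lambda c: da[c] + db[c])
    return result

# Tiny diamond: 4 has two parents (2 and 3), both children of root 1.
nca = nearest_common_ancestors([(2, 1), (3, 1), (4, 2), (4, 3)])
assert nca[(2, 3)] == 1 and nca[(2, 4)] == 2
```

In a connected hierarchy with a single root, every pair has at least the root as a common ancestor, which is why the output holds all n*(n-1)/2 pairs and dominates the growth of the result table.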
If I really had to make this run as fast as possible, this is what I'd do:

- Have SQL Server run on a dedicated server. Enable only the services that SQL Server needs; shut down everything else.
- Get the fastest processor currently available. Though SQL Server can use multiple processors, I don't think they would help for this particular query (but if money were no issue, I'd still try, just to be sure).
- Install as much RAM as the server will hold. Memory over 2GB is only used by SQL Server Enterprise Edition, so get a license for that version as well.
- Use at least five separate I/O devices: two stripe sets (for test DB DATA and tempdb DATA - the more physical disks in the stripe set, the better), and three non-striped disks (one for test DB LOG, one for tempdb LOG, and one for the operating system, including the page file).
- Hire a tuning expert. Though I do know a few tricks, I'm definitely not an expert on fine-tuning SQL Server.
- Buy Joe Celko's latest book on trees and hierarchies in SQL. Unfortunately, the alternative approach of his that I hinted at in an earlier message will not do - I expect it to be blindingly fast, but it can only handle hierarchies where each thing has at most one parent. Maybe he's got some other neat tricks in his book that I might want to steal, though.
>
>If Neo's techniques prove out, they may
>be interesting for commercial work in several
>years. One never knows. As I alluded to upthread,
>these discussion remind me a lot about the
>CODASYL vs. RDBMS discussions of roughly 20-25
>years ago. Assuming that Neo's ideas pan out,
>I would not presume that they would supplant
>traditional RDBMS's, just as RDBMS's have
>not completely supplanted hierarchical and
>network models. There is more than one way
>to skin a cat, and you use the best tools
>available depending on the size of the cat :)
Agreed. There is no "one size fits all" in our industry.
>
>You and Neo have had an interesting discussion.
>And, what is rare on usenet, haven't let the
>discussion degenerate into a flame war.
>I congratulate you both!
--
(Remove _NO_ and _SPAM_ to get my e-mail address)