Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Neo <neo55592_at_hotmail.com>
Date: 2 Jun 2004 17:09:24 -0700
Message-ID: <4b45d3ad.0406021609.299d6c01_at_posting.google.com>

Small Report Generation Summary (provided by Hugo)

Solution   Time(ms)  Platform            Notes
---------- --------  ----------          ----------------------------
RM's #1       14.3   1.3 Ghz PC          Apple-to-orange
RM's #2       11.0   1.3 Ghz PC          A step closer to apples
XDb1 4.4.7    16     1.3 Ghz PC          Unoptimized, debug version

Small Report Generation Summary (provided by Neo)

Solution Time(ms) Platform Notes

---------- --------  -----------------  -----------------------------
RM's #1      65.0    500 Mhz Server     Apple-to-orange
RM's #2      68.9    500 Mhz Server     A step closer to apples
RM's #4     152.13   500 Mhz Server     Another step closer to apples
XDb1 4.4.7   16      500 Mhz Server     Unoptimized, debug version
XDb1 4.5.7    1.632  500 Mhz Server     Optimized version

XDb1 4.5.9 6.561 233 MHz Pocket PC 32 MB

Large Report (28,940 rows) Generation Summary (provided by Neo)

Solution Time(sec) Platform Notes

---------- --------  -----------------  ----------------------------
RM's #1      15.2    500 Mhz Server     Apple-to-orange
XDb1 4.5.7    2.9    500 Mhz Server     Optimized version

Since RM Solution #2, didn't generate a class hierarchy report and didn't allow things with no or multiple names, etc... I have updated it to RM Solution#4 (see script below) which is another step closer to making an apple-to-apple comparision. The class hierarchy which had been stored in redundant tables T_Classes, T_ClassOfThings and T_ClassHierarchy have been dropped and the data moved to T_hierarchies, thus allowing a report on the class hierarchy also. In addition, the name attribute of things was moved to T_attributes_char, thus allowing a thing with no name or multiple names. However, T_attributes_char will have redundant data if a thing's property has multiple values. The next RM Sol should normalize by adding T_AttribChar_Value. As one can see from the measurements above, RM's solution gets slower as it is begins to approach the level of genericness and normalization in XDb1 (which is normalized down to atomic symbols).

If someone can also run the below script on SQL Server for PocketPC, I would be interested in comparing its performance.

RM Sol#4, Classes normalized into hierarchy table, Name Attrib normalized
suppress "(..) row(s) affected" messages set nocount on
define some tables to hold the user input create table things (thing int not null identity, junk int, primary key (thing)) create table HierarchyTypes (hierarchy int not null identity, hierarchyName varchar(20) not null unique, primary key (hierarchy))
update and delete triggers needed to cascade updates and deletes of classes to SuperClass;
SQL Server doesn't accept this second DRI action, unfortunately. create table attributes_int (thing int not null references things, -- on update cascade on delete cascade, attribute varchar(10) not null, value int not null, primary key (thing, attribute)) create table attributes_char (thing int not null references things, -- on update cascade on delete cascade, attribute varchar(10) not null, value varchar(100) not null, primary key (thing, attribute)) create table hierarchies (thing int not null references things, -- on update cascade on delete cascade, otherthing int not null references things,
on update cascade on delete cascade, hierarchy int not null references HierarchyTypes, -- on update cascade on delete cascade, primary key (hierarchy, thing, otherthing), check (thing <> otherthing))
update and delete triggers needed to cascade updates and deletes of things to otherthing;
SQL Server doesn't accept this second DRI action, unfortunately.
this table is used to generate the report create table ancestors (thing int not null, ancestor int not null, dist int not null, primary key (thing, ancestor)) create unique nonclustered index testindex on ancestors(ancestor, thing)
this table will hold the report - here, the things' name is used create table NCancestor (ThingX varchar(20) not null, ThingY varchar(20) not null, CmnAnc varchar(20) not null, Dist int not null, primary key (ThingX, ThingY, CmnAnc)) go
this procedure creates the report create proc CommonAncestors (_at_hierarchyName varchar(20)) as declare _at_hierarchy int select _at_hierarchy = hierarchy from HierarchyTypes where hierarchyName = _at_hierarchyName
delete data from previous execution truncate table ancestors truncate table NCancestor
starting data: thing itself is dist 0, direct leader is dist 1 insert ancestors with (tablockx holdlock) (thing, ancestor, dist) select distinct thing, thing, 0 from things with (tablockx holdlock) union all select thing, otherthing, 1 from hierarchies with (tablockx holdlock) where hierarchy = _at_hierarchy
use iteration to determine indirect leaders while (_at__at_rowcount<>0) insert ancestors with (tablockx holdlock) (thing, ancestor, dist) select a1.thing, a2.ancestor, min(a1.dist + a2.dist) from ancestors AS a1 join ancestors AS a2 on a2.thing = a1.ancestor where not exists (select * from ancestors AS a3 where a3.thing = a1.thing and a3.ancestor = a2.ancestor) group by a1.thing, a2.ancestor
now find nearest common ancestor for each pair of things insert NCancestor with (tablockx holdlock) (ThingX, ThingY, CmnAnc, Dist)
select t1.thingName, t2.thingName, ta.thingName, a1.dist + a2.dist select attrib1.value, attrib2.value, attribA.value, a1.dist + a2.dist from ancestors AS a1 with (tablockx holdlock) join ancestors AS a2 on a1.ancestor = a2.ancestor and a1.thing < a2.thing join things AS t1 on t1.thing = a1.thing join attributes_char AS attrib1 on attrib1.thing = t1.thing and attrib1.attribute = 'name' join things AS t2 on t2.thing = a2.thing join attributes_char AS attrib2 on attrib2.thing = t2.thing and attrib2.attribute = 'name' join things AS ta on ta.thing = a1.ancestor join attributes_char AS attribA on attribA.thing = ta.thing and attribA.attribute = 'name' where not exists (select * from ancestors AS a3 join ancestors AS a4 on a3.ancestor = a4.ancestor and a3.thing < a4.thing where a3.thing = a1.thing and a4.thing = a2.thing and a4.dist+a3.dist < a1.dist+a2.dist) go
dummy execution on empty table forces compilation of stored proc exec CommonAncestors _at_hierarchyName = 'whatever' go --
now add the predefined data insert things (junk) values (0) -- 1 'god' insert things (junk) values (0) -- 2 'army' insert things (junk) values (0) -- 3 'trinity' insert things (junk) values (0) -- 4 'john' insert things (junk) values (0) -- 5 'mary' insert things (junk) values (0) -- 6 'luke' insert things (junk) values (0) -- 7 'fido' insert things (junk) values (0) -- 8 'laptop1' insert things (junk) values (0) -- 9 'force' insert things (junk) values (0) -- 10 'church' insert things (junk) values (0) -- 11 'person' insert things (junk) values (0) -- 12 'dog' insert things (junk) values (0) -- 13 'computer' insert things (junk) values (0) -- 14 'thing' go insert HierarchyTypes (hierarchyName) values ('leader') insert HierarchyTypes (hierarchyName) values ('class') go insert attributes_char (thing, attribute, value) values (1, 'name', 'god') insert attributes_char (thing, attribute, value) values (2, 'name', 'army') insert attributes_char (thing, attribute, value) values (3, 'name', 'trinity') insert attributes_char (thing, attribute, value) values (4, 'name', 'john') insert attributes_int (thing, attribute, value) values (4, 'age', 35) insert attributes_char (thing, attribute, value) values (5, 'name', 'mary') insert attributes_int (thing, attribute, value) values (5, 'weight', 130) insert attributes_char (thing, attribute, value) values (6, 'name', 'luke') insert attributes_char (thing, attribute, value) values (6, 'color', 'red') insert attributes_char (thing, attribute, value) values (7, 'name', 'fido') insert attributes_char (thing, attribute, value) values (8, 'name', 'laptop1') insert attributes_char (thing, attribute, value) values (9, 'name', 'force') insert attributes_char (thing, attribute, value) values (10, 'name', 'church') insert attributes_char (thing, attribute, value) values (11, 'name', 'person') insert attributes_char (thing, attribute, value) values (12, 'name', 'dog') insert attributes_char (thing, attribute, value) values (13, 'name', 'computer') insert attributes_char (thing, attribute, value) values (14, 'name', 'thing') go insert hierarchies (thing, otherthing, hierarchy) values (9, 14, 2) insert hierarchies (thing, otherthing, hierarchy) values (10, 14, 2) insert hierarchies (thing, otherthing, hierarchy) values (11, 14, 2) insert hierarchies (thing, otherthing, hierarchy) values (12, 14, 2) insert hierarchies (thing, otherthing, hierarchy) values (13, 14, 2) insert hierarchies (thing, otherthing, hierarchy) values (2, 9, 2) insert hierarchies (thing, otherthing, hierarchy) values (3, 10, 2) insert hierarchies (thing, otherthing, hierarchy) values (4, 11, 2) insert hierarchies (thing, otherthing, hierarchy) values (5, 11, 2) insert hierarchies (thing, otherthing, hierarchy) values (6, 11, 2) insert hierarchies (thing, otherthing, hierarchy) values (7, 12, 2) insert hierarchies (thing, otherthing, hierarchy) values (8, 13, 2) -- insert hierarchies (thing, otherthing, hierarchy) values (2, 1, 1) insert hierarchies (thing, otherthing, hierarchy) values (3, 1, 1) insert hierarchies (thing, otherthing, hierarchy) values (4, 2, 1) insert hierarchies (thing, otherthing, hierarchy) values (5, 2, 1) insert hierarchies (thing, otherthing, hierarchy) values (5, 3, 1) insert hierarchies (thing, otherthing, hierarchy) values (6, 3, 1) insert hierarchies (thing, otherthing, hierarchy) values (8, 4, 1) insert hierarchies (thing, otherthing, hierarchy) values (8, 5, 1) insert hierarchies (thing, otherthing, hierarchy) values (7, 4, 1) insert hierarchies (thing, otherthing, hierarchy) values (7, 5, 1) insert hierarchies (thing, otherthing, hierarchy) values (7, 6, 1) go -- print 'Initialization done - starting actual test' go
generate the report - show elapsed time for 10 declare _at_started datetime, @ended datetime set _at_started = current_timestamp exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' set _at_ended = current_timestamp select convert(char(24), _at_started, 121) as started, convert(char(24), _at_ended, 121) as ended, convert(char(14), (_at_ended - @started), 114) as "elapsed (10 executions)"
display the results select * from NCancestor order by ThingX, ThingY go --
cleanup after execution drop proc CommonAncestors drop table NCancestor drop table ancestors drop table hierarchies drop table attributes_int drop table attributes_char drop table HierarchyTypes drop table things go

Received on Thu Jun 03 2004 - 02:09:24 CEST

This message: [ Message body ]
Next message: Dawn M. Wolthuis: "Re: In an RDBMS, what does "Data" mean?"
Previous message: Paul: "Re: In an RDBMS, what does "Data" mean?"
Maybe in reply to: Dieter Nöth: "Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)"
Next in thread: Dawn M. Wolthuis: "Re: In an RDBMS, what does "Data" mean?"
Reply: Neo: "Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)"
Reply: Hugo Kornelis: "Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message