Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)
Date: Tue, 25 May 2004 00:48:07 +0200
Message-ID: <mts4b09m5olm07v0b9ppienrdk8au4md6c_at_4ax.com>
Hi Neo,
I promised a second post - here it is.
When I was testing my algorith against larger amounts of data, I noticed that it was bugged. The algorithm to determine all ancestors of each thing may sometimes try to insert the same thing/ancestor combination with two different path lengths. I had to use a GROUP BY and a MIN(..) to solve this.
I didn't like the performance hit I took (up to 14.3 ms avg) - still better than the 16 ms that XDb1 needs on this same machine but I figured I could do better.
I changed my model to use integer columns with the IDENTITY property (SQL Server's name for autoincrementing artificial ID) as primary keys. This improved the performance of my report generation; the average execution time is now down to 11.0 ms. Note that my report still shows the names of the things in the hierarchies, not the ID's. (SQL and output are down below).
Incidentally, this quicker implementation is also better equipped to adapt
to many of the changes you suggested after your original post.
You want different things to share the same name? Easy - just remove the
unique constraint on thingName and you're done.
You want things without name? Remove thingName from the things table,
create a new table thingNames with columns thing (primary key / foreign
key to things) and thingName.
You want things with multiple names? As above, but primary key on
thingNames table should span both columns.
All this doesn't matter for the challenge you set, as these were all requirements that you brought into the equation at a later time. What does matter is my improved model, that you'll find below. The average execution time is 11.0 ms (as you can see from the output). This is based on an execution on my desktop system (1.3 GHz Athlon, 256 MB RAM, 2 harddisks at 40 GB / 7200 rpm each) with several open applications. XDb1 takes 16 ms for generating the report on the same machine with the same applications still active.
(Start of SQL for improved model)
- define some tables to hold the user input create table things (thing int not null identity, thingName varchar(20) not null unique, primary key (thing)) create table classes (class int not null identity, className varchar(20) not null unique, primary key (class)) create table HierarchyTypes (hierarchy int not null identity, hierarchyName varchar(20) not null unique, primary key (hierarchy)) create table ClassOfThings (thing int not null references things on update cascade on delete cascade, class int not null references classes on update cascade on delete cascade, primary key (thing)) create table ClassHierarchy (SubClass int not null references classes on update cascade on delete cascade, SuperClass int not null, -- references classes on update cascade on delete cascade, primary key(SubClass, SuperClass))
- update and delete triggers needed to cascade updates and deletes of classes to SuperClass;
- SQL Server doesn't accept this second DRI action, unfortunately. create table attributes_int (thing int not null references things on update cascade on delete cascade, attribute varchar(10) not null, value int not null, primary key (thing, attribute)) create table attributes_char (thing int not null references things on update cascade on delete cascade, attribute varchar(10) not null, value varchar(100) not null, primary key (thing, attribute)) create table hierarchies (thing int not null references things on update cascade on delete cascade, otherthing int not null, -- references things on update cascade on delete cascade, hierarchy int not null references HierarchyTypes on update cascade on delete cascade, primary key (hierarchy, thing, otherthing), check (thing <> otherthing))
- update and delete triggers needed to cascade updates and deletes of things to otherthing;
- SQL Server doesn't accept this second DRI action, unfortunately.
- this table is used to generate the report create table ancestors (thing int not null, ancestor int not null, dist int not null, primary key (thing, ancestor)) create unique nonclustered index testindex on ancestors(ancestor, thing)
- this table will hold the report - here, the things' name is used create table NCancestor (ThingX varchar(20) not null, ThingY varchar(20) not null, CmnAnc varchar(20) not null, Dist int not null, primary key (ThingX, ThingY, CmnAnc)) go
- this procedure creates the report create proc CommonAncestors (_at_hierarchyName varchar(20)) as declare _at_hierarchy int select _at_hierarchy = hierarchy from HierarchyTypes where hierarchyName = _at_hierarchyName
- delete data from previous execution truncate table ancestors truncate table NCancestor
- starting data: thing itself is dist 0, direct leader is dist 1 insert ancestors with (tablockx holdlock) (thing, ancestor, dist) select distinct thing, thing, 0 from things with (tablockx holdlock) union all select thing, otherthing, 1 from hierarchies with (tablockx holdlock) where hierarchy = _at_hierarchy
- use iteration to determine indirect leaders while (_at__at_rowcount<>0) insert ancestors with (tablockx holdlock) (thing, ancestor, dist) select a1.thing, a2.ancestor, min(a1.dist + a2.dist) from ancestors AS a1 join ancestors AS a2 on a2.thing = a1.ancestor where not exists (select * from ancestors AS a3 where a3.thing = a1.thing and a3.ancestor = a2.ancestor) group by a1.thing, a2.ancestor
- now find nearest common ancestor for each pair of things insert NCancestor with (tablockx holdlock) (ThingX, ThingY, CmnAnc, Dist) select t1.thingName, t2.thingName, ta.thingName, a1.dist + a2.dist from ancestors AS a1 with (tablockx holdlock) join ancestors AS a2 on a1.ancestor = a2.ancestor and a1.thing < a2.thing join things AS t1 on t1.thing = a1.thing join things AS t2 on t2.thing = a2.thing join things AS ta on ta.thing = a1.ancestor where not exists (select * from ancestors AS a3 join ancestors AS a4 on a3.ancestor = a4.ancestor and a3.thing < a4.thing where a3.thing = a1.thing and a4.thing = a2.thing and a4.dist+a3.dist < a1.dist+a2.dist) go
- dummy execution on empty table forces compilation of stored proc
- (I assume that Xdb1 uses a precompiled procedure as well)
exec CommonAncestors _at_hierarchyName = 'whatever'
go
--
- now add the predefined data
insert things (thingName)
values ('god')
insert things (thingName)
values ('army')
insert things (thingName)
values ('trinity')
insert things (thingName)
values ('john')
insert things (thingName)
values ('mary')
insert things (thingName)
values ('luke')
insert things (thingName)
values ('fido')
insert things (thingName)
values ('laptop1')
go
insert classes (className)
values ('force')
insert classes (className)
values ('church')
insert classes (className)
values ('person')
insert classes (className)
values ('dog')
insert classes (className)
values ('computer')
go
insert HierarchyTypes (hierarchyName)
values ('leader')
go
insert ClassOfThings (thing, class)
values (2, 1)
insert ClassOfThings (thing, class)
values (3, 2)
insert ClassOfThings (thing, class)
values (4, 3)
insert ClassOfThings (thing, class)
values (5, 3)
insert ClassOfThings (thing, class)
values (6, 3)
insert ClassOfThings (thing, class)
values (7, 4)
insert ClassOfThings (thing, class)
values (8, 5)
go
insert attributes_int (thing, attribute, value)
values (4, 'age', 35)
insert attributes_int (thing, attribute, value)
values (5, 'weight', 130)
insert attributes_char (thing, attribute, value)
values (6, 'color', 'red')
go
insert hierarchies (thing, otherthing, hierarchy)
values (2, 1, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (3, 1, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (4, 2, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (5, 2, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (5, 3, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (6, 3, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (8, 4, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (8, 5, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (7, 4, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (7, 5, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (7, 6, 1)
go
--
- random name / hierarchy generation for lots of data goes here
--
print 'Initialization done - starting actual test' go
--
- generate the report - show elapsed time declare _at_started datetime, @ended datetime set _at_started = current_timestamp exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' exec CommonAncestors _at_hierarchyName = 'leader' set _at_ended = current_timestamp select convert(char(24), _at_started, 121) as started, convert(char(24), _at_ended, 121) as ended, convert(char(14), (_at_ended - @started), 114) as "elapsed (10 executions)"
- display the results
select *
from NCancestor
order by ThingX, ThingY
go
--
- cleanup after execution
drop proc CommonAncestors
drop table NCancestor
drop table ancestors
drop table hierarchies
drop table attributes_int
drop table attributes_char
drop table ClassHierarchy
drop table ClassOfThings
drop table HierarchyTypes
drop table classes
drop table things
go
(Output after running the above script)
Initialization done - starting actual test
started ended elapsed (10 executions)2004-05-25 00:37:03.780 2004-05-25 00:37:03.890 00:00:00:110
------------------------ ------------------------ -----------------------
ThingX ThingY CmnAnc Dist
-------------------- -------------------- -------------------- -----------
army fido army 2 army john army 1 army laptop1 army 2 army luke god 3 army mary army 1 army trinity god 2 fido laptop1 john 2 fido laptop1 mary 2 god army god 1 god fido god 3 god john god 2 god laptop1 god 3 god luke god 2 god mary god 2 god trinity god 1 john fido john 1 john laptop1 john 1 john luke god 4 john mary army 2 luke fido luke 1 luke laptop1 trinity 3 mary fido mary 1 mary laptop1 mary 1 mary luke trinity 2 trinity fido trinity 2 trinity john god 3 trinity laptop1 trinity 2 trinity luke trinity 1 trinity mary trinity 1
-------------------------------------------------------------------------
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address) Received on Tue May 25 2004 - 00:48:07 CEST
