Hi Neo,
I promised a second post - here it is.
When I was testing my algorith against larger amounts of data, I noticed
that it was bugged. The algorithm to determine all ancestors of each thing
may sometimes try to insert the same thing/ancestor combination with two
different path lengths. I had to use a GROUP BY and a MIN(..) to solve
this.
I didn't like the performance hit I took (up to 14.3 ms avg) - still
better than the 16 ms that XDb1 needs on this same machine but I figured I
could do better.
I changed my model to use integer columns with the IDENTITY property (SQL
Server's name for autoincrementing artificial ID) as primary keys. This
improved the performance of my report generation; the average execution
time is now down to 11.0 ms. Note that my report still shows the names of
the things in the hierarchies, not the ID's. (SQL and output are down
below).
Incidentally, this quicker implementation is also better equipped to adapt
to many of the changes you suggested after your original post.
You want different things to share the same name? Easy - just remove the
unique constraint on thingName and you're done.
You want things without name? Remove thingName from the things table,
create a new table thingNames with columns thing (primary key / foreign
key to things) and thingName.
You want things with multiple names? As above, but primary key on
thingNames table should span both columns.
All this doesn't matter for the challenge you set, as these were all
requirements that you brought into the equation at a later time. What does
matter is my improved model, that you'll find below. The average execution
time is 11.0 ms (as you can see from the output). This is based on an
execution on my desktop system (1.3 GHz Athlon, 256 MB RAM, 2 harddisks at
40 GB / 7200 rpm each) with several open applications. XDb1 takes 16 ms
for generating the report on the same machine with the same applications
still active.
(Start of SQL for improved model)
- define some tables to hold the user input
create table things (thing int not null identity,
thingName varchar(20) not null unique,
primary key (thing))
create table classes (class int not null identity,
className varchar(20) not null unique,
primary key (class))
create table HierarchyTypes (hierarchy int not null identity,
hierarchyName varchar(20) not null unique,
primary key (hierarchy))
create table ClassOfThings (thing int not null references things on update
cascade on delete cascade,
class int not null references classes on
update cascade on delete cascade,
primary key (thing))
create table ClassHierarchy (SubClass int not null references classes on
update cascade on delete cascade,
SuperClass int not null, -- references
classes on update cascade on delete cascade,
primary key(SubClass, SuperClass))
- update and delete triggers needed to cascade updates and deletes of
classes to SuperClass;
- SQL Server doesn't accept this second DRI action, unfortunately.
create table attributes_int (thing int not null references things on
update cascade on delete cascade,
attribute varchar(10) not null,
value int not null,
primary key (thing, attribute))
create table attributes_char (thing int not null references things on
update cascade on delete cascade,
attribute varchar(10) not null,
value varchar(100) not null,
primary key (thing, attribute))
create table hierarchies (thing int not null references things on update
cascade on delete cascade,
otherthing int not null, -- references things on
update cascade on delete cascade,
hierarchy int not null references HierarchyTypes
on update cascade on delete cascade,
primary key (hierarchy, thing, otherthing),
check (thing <> otherthing))
- update and delete triggers needed to cascade updates and deletes of
things to otherthing;
- SQL Server doesn't accept this second DRI action, unfortunately.
- this table is used to generate the report
create table ancestors (thing int not null,
ancestor int not null,
dist int not null,
primary key (thing, ancestor))
create unique nonclustered index testindex on ancestors(ancestor, thing)
- this table will hold the report - here, the things' name is used
create table NCancestor (ThingX varchar(20) not null,
ThingY varchar(20) not null,
CmnAnc varchar(20) not null,
Dist int not null,
primary key (ThingX, ThingY, CmnAnc))
go
- this procedure creates the report
create proc CommonAncestors
(@hierarchyName varchar(20))
as
declare @hierarchy int
select @hierarchy = hierarchy
from HierarchyTypes
where hierarchyName = @hierarchyName
- delete data from previous execution
truncate table ancestors
truncate table NCancestor
- starting data: thing itself is dist 0, direct leader is dist 1
insert ancestors with (tablockx holdlock) (thing, ancestor, dist)
select distinct thing, thing, 0
from things with (tablockx holdlock)
union all
select thing, otherthing, 1
from hierarchies with (tablockx holdlock)
where hierarchy = @hierarchy
- use iteration to determine indirect leaders
while (@@rowcount<>0)
insert ancestors with (tablockx holdlock) (thing, ancestor, dist)
select a1.thing, a2.ancestor, min(a1.dist + a2.dist)
from ancestors AS a1
join ancestors AS a2
on a2.thing = a1.ancestor
where not exists
(select *
from ancestors AS a3
where a3.thing = a1.thing
and a3.ancestor = a2.ancestor)
group by a1.thing, a2.ancestor
- now find nearest common ancestor for each pair of things
insert NCancestor with (tablockx holdlock) (ThingX, ThingY, CmnAnc, Dist)
select t1.thingName, t2.thingName, ta.thingName, a1.dist + a2.dist
from ancestors AS a1 with (tablockx holdlock)
join ancestors AS a2
on a1.ancestor = a2.ancestor
and a1.thing < a2.thing
join things AS t1
on t1.thing = a1.thing
join things AS t2
on t2.thing = a2.thing
join things AS ta
on ta.thing = a1.ancestor
where not exists
(select *
from ancestors AS a3
join ancestors AS a4
on a3.ancestor = a4.ancestor
and a3.thing < a4.thing
where a3.thing = a1.thing
and a4.thing = a2.thing
and a4.dist+a3.dist < a1.dist+a2.dist)
go
- dummy execution on empty table forces compilation of stored proc
- (I assume that Xdb1 uses a precompiled procedure as well)
exec CommonAncestors @hierarchyName = 'whatever'
go
--
- now add the predefined data
insert things (thingName)
values ('god')
insert things (thingName)
values ('army')
insert things (thingName)
values ('trinity')
insert things (thingName)
values ('john')
insert things (thingName)
values ('mary')
insert things (thingName)
values ('luke')
insert things (thingName)
values ('fido')
insert things (thingName)
values ('laptop1')
go
insert classes (className)
values ('force')
insert classes (className)
values ('church')
insert classes (className)
values ('person')
insert classes (className)
values ('dog')
insert classes (className)
values ('computer')
go
insert HierarchyTypes (hierarchyName)
values ('leader')
go
insert ClassOfThings (thing, class)
values (2, 1)
insert ClassOfThings (thing, class)
values (3, 2)
insert ClassOfThings (thing, class)
values (4, 3)
insert ClassOfThings (thing, class)
values (5, 3)
insert ClassOfThings (thing, class)
values (6, 3)
insert ClassOfThings (thing, class)
values (7, 4)
insert ClassOfThings (thing, class)
values (8, 5)
go
insert attributes_int (thing, attribute, value)
values (4, 'age', 35)
insert attributes_int (thing, attribute, value)
values (5, 'weight', 130)
insert attributes_char (thing, attribute, value)
values (6, 'color', 'red')
go
insert hierarchies (thing, otherthing, hierarchy)
values (2, 1, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (3, 1, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (4, 2, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (5, 2, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (5, 3, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (6, 3, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (8, 4, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (8, 5, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (7, 4, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (7, 5, 1)
insert hierarchies (thing, otherthing, hierarchy)
values (7, 6, 1)
go
--
- random name / hierarchy generation for lots of data goes here
--
print 'Initialization done - starting actual test'
go
--
- generate the report - show elapsed time
declare @started datetime, @ended datetime
set @started = current_timestamp
exec CommonAncestors @hierarchyName = 'leader'
exec CommonAncestors @hierarchyName = 'leader'
exec CommonAncestors @hierarchyName = 'leader'
exec CommonAncestors @hierarchyName = 'leader'
exec CommonAncestors @hierarchyName = 'leader'
exec CommonAncestors @hierarchyName = 'leader'
exec CommonAncestors @hierarchyName = 'leader'
exec CommonAncestors @hierarchyName = 'leader'
exec CommonAncestors @hierarchyName = 'leader'
exec CommonAncestors @hierarchyName = 'leader'
set @ended = current_timestamp
select convert(char(24), @started, 121) as started,
convert(char(24), @ended, 121) as ended,
convert(char(14), (@ended - @started), 114) as "elapsed (10
executions)"
- display the results
select *
from NCancestor
order by ThingX, ThingY
go
--
- cleanup after execution
drop proc CommonAncestors
drop table NCancestor
drop table ancestors
drop table hierarchies
drop table attributes_int
drop table attributes_char
drop table ClassHierarchy
drop table ClassOfThings
drop table HierarchyTypes
drop table classes
drop table things
go
(Output after running the above script)
Initialization done - starting actual test
started ended elapsed (10 executions)
------------------------ ------------------------ -----------------------
2004-05-25 00:37:03.780 2004-05-25 00:37:03.890 00:00:00:110
ThingX ThingY CmnAnc Dist
-------------------- -------------------- -------------------- -----------
army fido army 2
army john army 1
army laptop1 army 2
army luke god 3
army mary army 1
army trinity god 2
fido laptop1 john 2
fido laptop1 mary 2
god army god 1
god fido god 3
god john god 2
god laptop1 god 3
god luke god 2
god mary god 2
god trinity god 1
john fido john 1
john laptop1 john 1
john luke god 4
john mary army 2
luke fido luke 1
luke laptop1 trinity 3
mary fido mary 1
mary laptop1 mary 1
mary luke trinity 2
trinity fido trinity 2
trinity john god 3
trinity laptop1 trinity 2
trinity luke trinity 1
trinity mary trinity 1
-------------------------------------------------------------------------
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Mon May 24 2004 - 17:48:07 CDT