Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)
Date: Mon, 17 May 2004 01:08:09 +0200
Message-ID: <59rfa0to95cfei0v18ncbo7je7us2hkfqh_at_4ax.com>
On 16 May 2004 11:27:07 -0700, Neo wrote:
>> Tony wrote: If your analogy holds any water at all (to give you the
>> benefit of very large doubt), it suggests that relational theory will do
>> just fine for pretty much anything we ever want to do "in the real world".
>
>$1000 to the first person who replicates the equivalent of
>www.xdb1.com/Example/Ex076.asp using the relational model. To claim
>the prize, one needs to produce the equivalent Nearest Common Ancestor
>Report from normalized and NULL-less data and the solution must be as
>generic, meaning allow the user to create any hierarchy, consisting of
>different types of things (each type to allow different attributes)
>and each thing in the hierarchy to have any number of parents. Report
>generation must not be more than 2X slower than XDb1 on equivalent
>hardware.
Hi Neo,
Furthermore, you seem to desire the possibility to enter untyped data, which is of course impossible in a strong-typed language. I do present "sort of" a way to do this in a relational database, but I'd never use a kludge even remotely like this for real. Just as I consider Xdb1 to be completely worthless for any real probel, for exactly this same reason. Remove types, and nothing prevents your user from entering "banana" as John's age.
One other thing that struck me in the input data:
>age isa thing.
>35 isa age.
>john is 35.
>
>weight isa thing.
>130 isa weight.
>mary is 130.
What is John were older (let's say 85) and Mary lost a lot of weight (let's say 45 pounds). I fail to see how Xdb1 could possibly be able to interpret the following correctly:
age isa thing. 80 isa age. john is 80. weight isa thing. 80 isa weight. mary is 80.
Since 80 is both an age and a weight, how can Xdb1 make sense of this?
Below, you'll find a script that runs against MS SQL Server to reproduce your report. There's one difference in the output: the "things" fido and laptop1 have two common ancestors at the same distance (john and mary), but Xdb1 is only able to find john. I consider this to be a bug in Xdb1's output, since john is definitely NOT a "nearer" common ancestor that mary.
- suppress "(..) row(s) affected" messages set nocount on
- define some tables to hold the user input create table things (thing varchar(20) not null, primary key (thing)) create table types (thing varchar(20) not null references things, type varchar(20) not null, primary key (thing)) create table attributes_int (thing varchar(20) not null references things, attribute varchar(10) not null, value int not null, primary key (thing, attribute)) create table attributes_char (thing varchar(20) not null references things, attribute varchar(10) not null, value varchar(100) not null, primary key (thing, attribute)) create table hierarchies (thing varchar(20) not null references things, otherthing varchar(20) not null references things, hierarchy varchar(20) not null, primary key (hierarchy, thing, otherthing), check (thing <> otherthing))
- this table is used to generate the report create table ancestors (thing varchar(20) not null, ancestor varchar(20) not null, dist int not null, primary key (thing, ancestor))
- this table will hold the report create table NCancestor (ThingX varchar(20) not null, ThingY varchar(20) not null, CmnAnc varchar(20) not null, Dist int not null, primary key (ThingX, ThingY, CmnAnc)) go
- this procedure creates the report create proc CommonAncestors (_at_hierarchy varchar(20)) as
- delete data from previous execution delete ancestors delete NCancestor
- starting data: thing itself is dist 0, direct leader is dist 1 insert ancestors (thing, ancestor, dist) select distinct thing, thing, 0 from things union all select thing, otherthing, 1 from hierarchies where hierarchy = _at_hierarchy
- use iteration to determine indirect leaders while (_at__at_rowcount<>0) insert ancestors (thing, ancestor, dist) select distinct a1.thing, a2.ancestor, a1.dist + a2.dist from ancestors AS a1 join ancestors AS a2 on a2.thing = a1.ancestor where not exists (select * from ancestors AS a3 where a3.thing = a1.thing and a3.ancestor = a2.ancestor)
- now find nearest common ancestor for each pair of things insert NCancestor (ThingX, ThingY, CmnAnc, Dist) select a1.thing, a2.thing, a1.ancestor, a1.dist + a2.dist from ancestors AS a1 join ancestors AS a2 on a1.ancestor = a2.ancestor and a1.thing < a2.thing where not exists (select * from ancestors AS a3 join ancestors AS a4 on a3.ancestor = a4.ancestor and a3.thing < a4.thing where a3.thing = a1.thing and a4.thing = a2.thing and a4.dist+a3.dist < a1.dist+a2.dist) go
- dummy execution on empty table forces compilation of stored proc
- (I assume that Xdb1 uses a precompiled procedure as well) exec CommonAncestors _at_hierarchy = 'whatever' go
- now add all the data insert things (thing) values ('god') insert things (thing) values ('army') insert things (thing) values ('trinity') insert things (thing) values ('john') insert things (thing) values ('mary') insert things (thing) values ('luke') insert things (thing) values ('fido') insert things (thing) values ('laptop1') go insert types (thing, type) values ('army', 'force') insert types (thing, type) values ('trinity', 'church') insert types (thing, type) values ('john', 'person') insert types (thing, type) values ('mary', 'person') insert types (thing, type) values ('luke', 'person') insert types (thing, type) values ('fido', 'dog') insert types (thing, type) values ('laptop1', 'computer') go insert attributes_int (thing, attribute, value) values ('john', 'age', 35) insert attributes_int (thing, attribute, value) values ('mary', 'weight', 130) insert attributes_char (thing, attribute, value) values ('luke', 'color', 'red') go insert hierarchies (thing, otherthing, hierarchy) values ('army', 'god', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('trinity', 'god', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('john', 'army', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('mary', 'army', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('mary', 'trinity', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('luke', 'trinity', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('laptop1', 'john', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('laptop1', 'mary', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('fido', 'john', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('fido', 'mary', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('fido', 'luke', 'leader') go
- generate the report - show elapsed time select current_timestamp AS 'Starting time' exec CommonAncestors _at_hierarchy='leader' select current_timestamp AS 'Ending time'
- display the results select * from NCancestor go
- cleanup after execution drop proc CommonAncestors drop table NCancestor drop table ancestors drop table hierarchies drop table attributes_int drop table attributes_char drop table types drop table things
Output:
Starting time
2004-05-17 00:58:40.047
Ending time
2004-05-17 00:58:40.047
ThingX ThingY CmnAnc Dist
-------------------- -------------------- -------------------- ----------- army fido army 2 army god god 1 army john army 1 army laptop1 army 2 army luke god 3 army mary army 1 army trinity god 2 fido god god 3 fido john john 1 fido laptop1 john 2 fido laptop1 mary 2 fido luke luke 1 fido mary mary 1 fido trinity trinity 2 god john god 2 god laptop1 god 3 god luke god 2 god mary god 2 god trinity god 1 john laptop1 john 1 john luke god 4 john mary army 2 john trinity god 3 laptop1 luke trinity 3 laptop1 mary mary 1 laptop1 trinity trinity 2 luke mary trinity 2 luke trinity trinity 1 mary trinity trinity 1
Note: the output shows the same starting and ending date/time. This indicates an elapsed time of less than 3 milliseconds (the smallest unit of time CURRENT_TIMESTAMP can measure). System used: PC running Windows 2000 + MS SQL Server 2000, AMD Athlon 1.3GHz, 256MB RAM.
Of course, the performance of any query against less than 20 rows of data is completely irrelevant for any real world need. What counts is the performance of a query against millions of rows of data.
Groetjes, Hugo
-- Sorry, vandaag geen grappige sig lines meer.Received on Mon May 17 2004 - 01:08:09 CEST