Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Mon, 17 May 2004 01:08:09 +0200
Message-ID: <59rfa0to95cfei0v18ncbo7je7us2hkfqh_at_4ax.com>


On 16 May 2004 11:27:07 -0700, Neo wrote:

>> Tony wrote: If your analogy holds any water at all (to give you the
>> benefit of very large doubt), it suggests that relational theory will do
>> just fine for pretty much anything we ever want to do "in the real world".
>
>$1000 to the first person who replicates the equivalent of
>www.xdb1.com/Example/Ex076.asp using the relational model. To claim
>the prize, one needs to produce the equivalent Nearest Common Ancestor
>Report from normalized and NULL-less data and the solution must be as
>generic, meaning allow the user to create any hierarchy, consisting of
>different types of things (each type to allow different attributes)
>and each thing in the hierarchy to have any number of parents. Report
>generation must not be more than 2X slower than XDb1 on equivalent
>hardware.

Hi Neo,

I'm not sure if I understand all requirements. You demand you have to start from normalized data, but you fail to specify what normal form you want: 1NF, 2NF, 3NF, etc.

Furthermore, you seem to desire the possibility to enter untyped data, which is of course impossible in a strong-typed language. I do present "sort of" a way to do this in a relational database, but I'd never use a kludge even remotely like this for real. Just as I consider Xdb1 to be completely worthless for any real probel, for exactly this same reason. Remove types, and nothing prevents your user from entering "banana" as John's age.

Third: you ask for the quoted example to be replicated "using the relational model". A model is not something that can be used to produce reports from data. I assume you meant to write "using an RDBMS"?

Many people in the comp.databases.theory group claim that MS SQL Server is not really relational, but as I don't have access to a "real" relational database, I hope it'll do.

One other thing that struck me in the input data:

>age isa thing.
>35 isa age.
>john is 35.
>
>weight isa thing.
>130 isa weight.
>mary is 130.

What is John were older (let's say 85) and Mary lost a lot of weight (let's say 45 pounds). I fail to see how Xdb1 could possibly be able to interpret the following correctly:

	age isa thing.
	80 isa age.
	john is 80.

	weight isa thing.
	80 isa weight.
	mary is 80.

Since 80 is both an age and a weight, how can Xdb1 make sense of this?

Below, you'll find a script that runs against MS SQL Server to reproduce your report. There's one difference in the output: the "things" fido and laptop1 have two common ancestors at the same distance (john and mary), but Xdb1 is only able to find john. I consider this to be a bug in Xdb1's output, since john is definitely NOT a "nearer" common ancestor that mary.

  • suppress "(..) row(s) affected" messages set nocount on
  • define some tables to hold the user input create table things (thing varchar(20) not null, primary key (thing)) create table types (thing varchar(20) not null references things, type varchar(20) not null, primary key (thing)) create table attributes_int (thing varchar(20) not null references things, attribute varchar(10) not null, value int not null, primary key (thing, attribute)) create table attributes_char (thing varchar(20) not null references things, attribute varchar(10) not null, value varchar(100) not null, primary key (thing, attribute)) create table hierarchies (thing varchar(20) not null references things, otherthing varchar(20) not null references things, hierarchy varchar(20) not null, primary key (hierarchy, thing, otherthing), check (thing <> otherthing))
  • this table is used to generate the report create table ancestors (thing varchar(20) not null, ancestor varchar(20) not null, dist int not null, primary key (thing, ancestor))
  • this table will hold the report create table NCancestor (ThingX varchar(20) not null, ThingY varchar(20) not null, CmnAnc varchar(20) not null, Dist int not null, primary key (ThingX, ThingY, CmnAnc)) go
  • this procedure creates the report create proc CommonAncestors (_at_hierarchy varchar(20)) as
  • delete data from previous execution delete ancestors delete NCancestor
  • starting data: thing itself is dist 0, direct leader is dist 1 insert ancestors (thing, ancestor, dist) select distinct thing, thing, 0 from things union all select thing, otherthing, 1 from hierarchies where hierarchy = _at_hierarchy
  • use iteration to determine indirect leaders while (_at__at_rowcount<>0) insert ancestors (thing, ancestor, dist) select distinct a1.thing, a2.ancestor, a1.dist + a2.dist from ancestors AS a1 join ancestors AS a2 on a2.thing = a1.ancestor where not exists (select * from ancestors AS a3 where a3.thing = a1.thing and a3.ancestor = a2.ancestor)
  • now find nearest common ancestor for each pair of things insert NCancestor (ThingX, ThingY, CmnAnc, Dist) select a1.thing, a2.thing, a1.ancestor, a1.dist + a2.dist from ancestors AS a1 join ancestors AS a2 on a1.ancestor = a2.ancestor and a1.thing < a2.thing where not exists (select * from ancestors AS a3 join ancestors AS a4 on a3.ancestor = a4.ancestor and a3.thing < a4.thing where a3.thing = a1.thing and a4.thing = a2.thing and a4.dist+a3.dist < a1.dist+a2.dist) go
  • dummy execution on empty table forces compilation of stored proc
  • (I assume that Xdb1 uses a precompiled procedure as well) exec CommonAncestors _at_hierarchy = 'whatever' go
  • now add all the data insert things (thing) values ('god') insert things (thing) values ('army') insert things (thing) values ('trinity') insert things (thing) values ('john') insert things (thing) values ('mary') insert things (thing) values ('luke') insert things (thing) values ('fido') insert things (thing) values ('laptop1') go insert types (thing, type) values ('army', 'force') insert types (thing, type) values ('trinity', 'church') insert types (thing, type) values ('john', 'person') insert types (thing, type) values ('mary', 'person') insert types (thing, type) values ('luke', 'person') insert types (thing, type) values ('fido', 'dog') insert types (thing, type) values ('laptop1', 'computer') go insert attributes_int (thing, attribute, value) values ('john', 'age', 35) insert attributes_int (thing, attribute, value) values ('mary', 'weight', 130) insert attributes_char (thing, attribute, value) values ('luke', 'color', 'red') go insert hierarchies (thing, otherthing, hierarchy) values ('army', 'god', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('trinity', 'god', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('john', 'army', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('mary', 'army', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('mary', 'trinity', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('luke', 'trinity', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('laptop1', 'john', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('laptop1', 'mary', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('fido', 'john', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('fido', 'mary', 'leader') insert hierarchies (thing, otherthing, hierarchy) values ('fido', 'luke', 'leader') go
  • generate the report - show elapsed time select current_timestamp AS 'Starting time' exec CommonAncestors _at_hierarchy='leader' select current_timestamp AS 'Ending time'
  • display the results select * from NCancestor go
  • cleanup after execution drop proc CommonAncestors drop table NCancestor drop table ancestors drop table hierarchies drop table attributes_int drop table attributes_char drop table types drop table things

Output:

Starting time



2004-05-17 00:58:40.047

Ending time



2004-05-17 00:58:40.047

ThingX ThingY CmnAnc Dist

-------------------- -------------------- -------------------- -----------
army                 fido                 army                 2
army                 god                  god                  1
army                 john                 army                 1
army                 laptop1              army                 2
army                 luke                 god                  3
army                 mary                 army                 1
army                 trinity              god                  2
fido                 god                  god                  3
fido                 john                 john                 1
fido                 laptop1              john                 2
fido                 laptop1              mary                 2
fido                 luke                 luke                 1
fido                 mary                 mary                 1
fido                 trinity              trinity              2
god                  john                 god                  2
god                  laptop1              god                  3
god                  luke                 god                  2
god                  mary                 god                  2
god                  trinity              god                  1
john                 laptop1              john                 1
john                 luke                 god                  4
john                 mary                 army                 2
john                 trinity              god                  3
laptop1              luke                 trinity              3
laptop1              mary                 mary                 1
laptop1              trinity              trinity              2
luke                 mary                 trinity              2
luke                 trinity              trinity              1
mary                 trinity              trinity              1


Note: the output shows the same starting and ending date/time. This indicates an elapsed time of less than 3 milliseconds (the smallest unit of time CURRENT_TIMESTAMP can measure). System used: PC running Windows 2000 + MS SQL Server 2000, AMD Athlon 1.3GHz, 256MB RAM.

Of course, the performance of any query against less than 20 rows of data is completely irrelevant for any real world need. What counts is the performance of a query against millions of rows of data.

Groetjes, Hugo

-- 

Sorry, vandaag geen grappige sig lines meer.
Received on Mon May 17 2004 - 01:08:09 CEST

Original text of this message