On 16 May 2004 11:27:07 -0700, Neo wrote:
>> Tony wrote: If your analogy holds any water at all (to give you the
>> benefit of very large doubt), it suggests that relational theory will do
>> just fine for pretty much anything we ever want to do "in the real world".
>
>$1000 to the first person who replicates the equivalent of
>www.xdb1.com/Example/Ex076.asp using the relational model. To claim
>the prize, one needs to produce the equivalent Nearest Common Ancestor
>Report from normalized and NULL-less data and the solution must be as
>generic, meaning allow the user to create any hierarchy, consisting of
>different types of things (each type to allow different attributes)
>and each thing in the hierarchy to have any number of parents. Report
>generation must not be more than 2X slower than XDb1 on equivalent
>hardware.
Hi Neo,
I'm not sure if I understand all requirements. You demand you have to
start from normalized data, but you fail to specify what normal form you
want: 1NF, 2NF, 3NF, etc.
Furthermore, you seem to desire the possibility to enter untyped data,
which is of course impossible in a strong-typed language. I do present
"sort of" a way to do this in a relational database, but I'd never use a
kludge even remotely like this for real. Just as I consider Xdb1 to be
completely worthless for any real probel, for exactly this same reason.
Remove types, and nothing prevents your user from entering "banana" as
John's age.
Third: you ask for the quoted example to be replicated "using the
relational model". A model is not something that can be used to produce
reports from data. I assume you meant to write "using an RDBMS"?
Many people in the comp.databases.theory group claim that MS SQL Server is
not really relational, but as I don't have access to a "real" relational
database, I hope it'll do.
One other thing that struck me in the input data:
>age isa thing.
>35 isa age.
>john is 35.
>
>weight isa thing.
>130 isa weight.
>mary is 130.
What is John were older (let's say 85) and Mary lost a lot of weight
(let's say 45 pounds). I fail to see how Xdb1 could possibly be able to
interpret the following correctly:
age isa thing.
80 isa age.
john is 80.
weight isa thing.
80 isa weight.
mary is 80.
Since 80 is both an age and a weight, how can Xdb1 make sense of this?
Below, you'll find a script that runs against MS SQL Server to reproduce
your report. There's one difference in the output: the "things" fido and
laptop1 have two common ancestors at the same distance (john and mary),
but Xdb1 is only able to find john. I consider this to be a bug in Xdb1's
output, since john is definitely NOT a "nearer" common ancestor that mary.
- suppress "(..) row(s) affected" messages
set nocount on
- define some tables to hold the user input
create table things (thing varchar(20) not null,
primary key (thing))
create table types (thing varchar(20) not null references things,
type varchar(20) not null,
primary key (thing))
create table attributes_int (thing varchar(20) not null references things,
attribute varchar(10) not null,
value int not null,
primary key (thing, attribute))
create table attributes_char (thing varchar(20) not null references
things,
attribute varchar(10) not null,
value varchar(100) not null,
primary key (thing, attribute))
create table hierarchies (thing varchar(20) not null references things,
otherthing varchar(20) not null references
things,
hierarchy varchar(20) not null,
primary key (hierarchy, thing, otherthing),
check (thing <> otherthing))
- this table is used to generate the report
create table ancestors (thing varchar(20) not null,
ancestor varchar(20) not null,
dist int not null,
primary key (thing, ancestor))
- this table will hold the report
create table NCancestor (ThingX varchar(20) not null,
ThingY varchar(20) not null,
CmnAnc varchar(20) not null,
Dist int not null,
primary key (ThingX, ThingY, CmnAnc))
go
- this procedure creates the report
create proc CommonAncestors
(@hierarchy varchar(20))
as
- delete data from previous execution
delete ancestors
delete NCancestor
- starting data: thing itself is dist 0, direct leader is dist 1
insert ancestors (thing, ancestor, dist)
select distinct thing, thing, 0
from things
union all
select thing, otherthing, 1
from hierarchies
where hierarchy = @hierarchy
- use iteration to determine indirect leaders
while (@@rowcount<>0)
insert ancestors (thing, ancestor, dist)
select distinct a1.thing, a2.ancestor, a1.dist + a2.dist
from ancestors AS a1
join ancestors AS a2
on a2.thing = a1.ancestor
where not exists
(select *
from ancestors AS a3
where a3.thing = a1.thing
and a3.ancestor = a2.ancestor)
- now find nearest common ancestor for each pair of things
insert NCancestor (ThingX, ThingY, CmnAnc, Dist)
select a1.thing, a2.thing, a1.ancestor, a1.dist + a2.dist
from ancestors AS a1
join ancestors AS a2
on a1.ancestor = a2.ancestor
and a1.thing < a2.thing
where not exists
(select *
from ancestors AS a3
join ancestors AS a4
on a3.ancestor = a4.ancestor
and a3.thing < a4.thing
where a3.thing = a1.thing
and a4.thing = a2.thing
and a4.dist+a3.dist < a1.dist+a2.dist)
go
- dummy execution on empty table forces compilation of stored proc
- (I assume that Xdb1 uses a precompiled procedure as well)
exec CommonAncestors @hierarchy = 'whatever'
go
- now add all the data
insert things (thing)
values ('god')
insert things (thing)
values ('army')
insert things (thing)
values ('trinity')
insert things (thing)
values ('john')
insert things (thing)
values ('mary')
insert things (thing)
values ('luke')
insert things (thing)
values ('fido')
insert things (thing)
values ('laptop1')
go
insert types (thing, type)
values ('army', 'force')
insert types (thing, type)
values ('trinity', 'church')
insert types (thing, type)
values ('john', 'person')
insert types (thing, type)
values ('mary', 'person')
insert types (thing, type)
values ('luke', 'person')
insert types (thing, type)
values ('fido', 'dog')
insert types (thing, type)
values ('laptop1', 'computer')
go
insert attributes_int (thing, attribute, value)
values ('john', 'age', 35)
insert attributes_int (thing, attribute, value)
values ('mary', 'weight', 130)
insert attributes_char (thing, attribute, value)
values ('luke', 'color', 'red')
go
insert hierarchies (thing, otherthing, hierarchy)
values ('army', 'god', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('trinity', 'god', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('john', 'army', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('mary', 'army', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('mary', 'trinity', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('luke', 'trinity', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('laptop1', 'john', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('laptop1', 'mary', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('fido', 'john', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('fido', 'mary', 'leader')
insert hierarchies (thing, otherthing, hierarchy)
values ('fido', 'luke', 'leader')
go
- generate the report - show elapsed time
select current_timestamp AS 'Starting time'
exec CommonAncestors @hierarchy='leader'
select current_timestamp AS 'Ending time'
- display the results
select *
from NCancestor
go
- cleanup after execution
drop proc CommonAncestors
drop table NCancestor
drop table ancestors
drop table hierarchies
drop table attributes_int
drop table attributes_char
drop table types
drop table things
Output:
Starting time
2004-05-17 00:58:40.047
Ending time
2004-05-17 00:58:40.047
ThingX ThingY CmnAnc Dist
-------------------- -------------------- -------------------- -----------
army fido army 2
army god god 1
army john army 1
army laptop1 army 2
army luke god 3
army mary army 1
army trinity god 2
fido god god 3
fido john john 1
fido laptop1 john 2
fido laptop1 mary 2
fido luke luke 1
fido mary mary 1
fido trinity trinity 2
god john god 2
god laptop1 god 3
god luke god 2
god mary god 2
god trinity god 1
john laptop1 john 1
john luke god 4
john mary army 2
john trinity god 3
laptop1 luke trinity 3
laptop1 mary mary 1
laptop1 trinity trinity 2
luke mary trinity 2
luke trinity trinity 1
mary trinity trinity 1
Note: the output shows the same starting and ending date/time. This
indicates an elapsed time of less than 3 milliseconds (the smallest unit
of time CURRENT_TIMESTAMP can measure). System used: PC running Windows
2000 + MS SQL Server 2000, AMD Athlon 1.3GHz, 256MB RAM.
Of course, the performance of any query against less than 20 rows of data
is completely irrelevant for any real world need. What counts is the
performance of a query against millions of rows of data.
Groetjes, Hugo
--
Sorry, vandaag geen grappige sig lines meer.
Received on Sun May 16 2004 - 18:08:09 CDT