Re: SQL for Modelling Generalization Hierarchy

From: Mikito Harakiri <mikharakiri_at_iahu.com>
Date: Thu, 27 May 2004 11:12:15 -0700
Message-ID: <Ymqtc.36$wQ3.92_at_news.oracle.com>


"Mikito Harakiri" <mikharakiri_at_iahu.com> wrote in message news:FNptc.35$wQ3.57_at_news.oracle.com...
> It depends how many subtypes you have. If you have 50, for example, you
> could still write it as a 50 table join. How much slower your query would
> be? Consider
>
> select * from Employees e, Doctors d, Nurses n, Janitors j, ...
> where e.id = d.id, e.id = n.id, e.id = j.id, ...
> and e.name = 'John'
>
> Assuming Employees table having the amount of data to fill in 3 levels of
> B-Tree indexes, and that you extract list of 1000 persons, you'll have
>
> access to Employees = 1000 records x 3 block reads + 1000 table access by
> row id
> access to Doctors = 1000 records x height of doctors' B-Tree + #doctors
> table access by row id
> access to Nurses = 1000 records x height of nurses' B-Tree + #nurses
table
> access by row id
> access to Janitors = 1000 records x height of janitors' B-Tree +
#janitors
> table access by row id
>
> Note that #doctors+#nurses+#janitors = 1000, so that second addition
> argument is not a problem. The height of each sybtype table b-tree,
however,
> is between 1 and 3, so that the cost of having 50 sybtypes could
potentially
> slow down your query to almost that factor -- 50.

Actually there is a better solution.

select e.*, d.*, <pad nulls for other types to match signature> from Employees e, Doctors d
where e.id = d.id
and e.name = 'John' and e.type = 'DOCTOR' union all
select e.*, n.*, <pad nulls for other types to match signature> from Employees e, Nurses n
where e.id = n.id
and e.name = 'John' and e.type = 'NURSE' union all
select e.*, j.*, <pad nulls for other types to match signature> from Employees e, Janitors n
where e.id = n.id
and e.name = 'John' and e.type = 'JANITOR'

We certainly want to have a composite index on Employees(type, name). We still assume that the index height is 3. Then, the cost is

1st disjunct:
access to Employees = #of Doctors * 3 block reads + #of Doctors access to Doctors = #of Doctors x height of doctors' B-Tree + #of Doctors

2nd disjunct:
access to Employees = #of Nurses * 3 block reads + #of Nurses access to Doctors = #of Nurses * height of nurses' B-Tree + #of Nurses

....

Now, adding all together we see that the total cost is:

1000*3 + 1000 +
+1000*<number between 1 and 3> + 1000

which is less than twice more expensive than accessing a single Employees table.

In short, this supports the message in the very beginning -- the perceived high cost of dealing with hierarchy implemented via joins is an urban legend. Received on Thu May 27 2004 - 20:12:15 CEST

Original text of this message