Re: SQL for Modelling Generalization Hierarchy

From: Mikito Harakiri <mikharakiri_at_iahu.com>
Date: Thu, 27 May 2004 10:32:28 -0700
Message-ID: <FNptc.35$wQ3.57_at_news.oracle.com>


"Mikito Harakiri" <mikharakiri_at_iahu.com> wrote in message news:Gfptc.34$wQ3.126_at_news.oracle.com...
> "Robert Brown" <robertbrown1971_at_yahoo.com> wrote in message
> news:240a4d09.0405270839.16582e0b_at_posting.google.com...
> > The only solution I found for this is a generalization hierarchy where
> > you have the employee table with all generic attributes and then you
> > add tables for each new employee subtype as necessary. The subtype
> > tables share the primary key of the employee table. The employee table
> > has a "discriminator" field that allows you to figure out which
> > subtype table to load for a particular entity.
> >
> > This solution does not seem to scale, since for each value of
> > "discriminator" I need to perform a join with a different table. What
> > if I need to retrieve 1,000 employees at once? Is that possible with
> > SQL, or do I need to iterate, look at the discriminator, and then
> > perform the appropriate join? If this kind of iteration is necessary
> > then obviously this generalization hierarchy approach does not work,
> > since it would be painfully slow.
>
> Urban legend. If your result set is small then indexed nested loops join
> is fast no matter how many joins are involved. In a typical scenario
>
> select * from Employees e, Doctors d
> where e.id = d.id
> and e.name = 'Tegiri Nenashi'
>
> the RDBMS fetches a small list of Employees by index range scan (several
> block reads) and then, for each Employee record found, it fetches the
> Doctors subtype record by index unique scan (several block reads as well).

That was written before reading your post :-)
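
For concreteness, here is a minimal sketch of the schema described above; the table and column names are illustrative only:

create table Employees (
    id            integer primary key,
    name          varchar(100),
    discriminator varchar(20)     -- e.g. 'DOCTOR', 'NURSE', 'JANITOR'
);

create table Doctors (
    id        integer primary key references Employees(id),
    specialty varchar(100)
);

create table Nurses (
    id   integer primary key references Employees(id),
    ward varchar(100)
);

-- ... and one such table per additional subtype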

It depends on how many subtypes you have. If you have 50, for example, you could still write it as a 50-table join. How much slower would your query be? Consider

select * from Employees e, Doctors d, Nurses n, Janitors j, ...
where e.id = d.id and e.id = n.id and e.id = j.id and ...
and e.name = 'John'
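
In practice, since a given employee normally appears in only one subtype table, such a query would usually be phrased with outer joins so that rows are not lost; a sketch under that assumption, using the same illustrative names as above:

select *
from Employees e
  left outer join Doctors  d on d.id = e.id
  left outer join Nurses   n on n.id = e.id
  left outer join Janitors j on j.id = e.id
  -- ... and so on for the remaining subtype tables
where e.name = 'John'

With an indexed nested loops plan the cost argument below is unchanged: each Employees row still probes every subtype index once.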

Assuming the Employees table holds enough data to fill 3 levels of B-Tree index, and that you extract a list of 1000 persons, you'll have:

access to Employees = 1000 records x 3 block reads + 1000 table accesses by rowid
access to Doctors   = 1000 records x height of Doctors' B-Tree + #doctors table accesses by rowid
access to Nurses    = 1000 records x height of Nurses' B-Tree + #nurses table accesses by rowid
access to Janitors  = 1000 records x height of Janitors' B-Tree + #janitors table accesses by rowid

Note that #doctors + #nurses + #janitors = 1000, so the second addend is not a problem. The height of each subtype table's B-Tree, however, is between 1 and 3, so the cost of having 50 subtypes could potentially slow down your query by a factor approaching 50.
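
If you want to check the actual B-Tree heights rather than guess, Oracle exposes them through the BLEVEL column of USER_INDEXES (the number of branch levels, so height = BLEVEL + 1); a sketch, assuming current index statistics and the illustrative table names above:

select index_name, blevel + 1 as btree_height
from user_indexes
where table_name in ('EMPLOYEES', 'DOCTORS', 'NURSES', 'JANITORS');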
