SQL for Modelling Generalization Hierarchy
Date: 27 May 2004 09:39:48 -0700
Message-ID: <240a4d09.0405270839.16582e0b_at_posting.google.com>
Is there a good approach to modelling many heterogeneous entity types with that have some attributes in common?
Say I have entities "employees" which share some attibutes (e.g. firstname, lastname, dateofbirth) but some subsets of employees (e.g. physicians, janitors, nurses, ambulance drivers) may have additional attributes that do not apply to all employees. Physicians may have attributes specialty and date of board certification, ambulance drivers may have a drivers license id, janitors may have preferredbroomtype and so on.
There are many employee subtypes and more can be dynamically added
after the application is deployed so it's obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).
The only solution I found for this is a generalization hiearchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.
This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once. Is that possible with
SQL or do you I need to iterate look at the discriminator and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work
since it would be painfully slow.
Is there a better approach to modelling these kind of heterogeneous
entities that does not involve creating a table for each new employee
type or having sparce tables (mostly filled with NULLS)
I guess another approach would be to use name/value pairs but that
would make reporting really ugly.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?