SQL for Modeling Generalization Hierarchies

From: Robert Brown <robertbrown1971_at_yahoo.com>
Date: 27 May 2004 09:56:05 -0700
Message-ID: <240a4d09.0405270856.4ce55c7d_at_posting.google.com>



Is there a good approach to modelling many heterogeneous entity types with that have some attributes in common?

Say I have entities "employees" which share some attibutes (e.g. firstname, lastname, dateofbirth) but some subsets of employees (e.g. physicians, janitors, nurses, ambulance drivers) may have additional attributes that do not apply to all employees. Physicians may have attributes specialty and date of board certification, ambulance drivers may have a drivers license id, janitors may have preferredbroomtype and so on.

There are many employee subtypes and more can be dynamically added after the application is deployed so it's obviously no good to keep adding attributes to the employees table because most attributes will be NULL (since janitors are never doctors at the same time).

The only solution I found for this is a generalization hiearchy where you have the employee table with all generic attributes and then you add tables for each new employee subtype as necessary. The subtype tables share the primary key of the employee table. The employee table has a "discriminator" field that allows you to figure out which subtype table to load for a particular entity.

This solution does not seem to scale since for each value of "discriminator" I need to perform a join with a different table. What if I need to retrieve 1,000 employees at once?

Is that possible to obtain a single ResultSet with one SQL statement SQL? Or do you I need to iterate look at the discriminator and then perform the appropriate join? If this kind of iteration is necessary then obviously this generalization hierarchy approach does not work in practice
since it would be painfully slow.

Is there a better approach to modelling these kind of heterogeneous entities with shared attributes that does not involve creating a table for each new employee type or having sparce tables (mostly filled with NULLS) I guess another approach would be to use name/value pairs but that would make reporting really ugly.

Seems like a very common problem. Any ideas? Is this a fundamental limitation of SQL?

Thanks!

  • robert
Received on Thu May 27 2004 - 18:56:05 CEST

Original text of this message