Re: Question on Structuring Product Attributes

From: <derek.asirvadem_at_gmail.com>
Date: Sat, 2 Feb 2013 20:32:54 -0800 (PST)
Message-ID: <fa57be86-0926-4b30-b466-37a7fbb813eb_at_googlegroups.com>


On Saturday, October 22, 2011 8:21:07 PM UTC+11, Hugo Kornelis wrote:

> I was (and apparently still am) waiting for a reply from Derek, so when
> I saw a reply I immediately jumped to the wrong conclusion. My
> apologies.
>
> It's interesting to see that Derek has not yet responded to my
> question, even though he did already post a message elsewhere in this>
> thread a few hours after I posted...

I must have missed it that day, sorry. Your questions are posted in a reply to Kevin, which I did not read too much white noise in other people's posts on this thread). Sorry about the one year lag! I am responding to keep the thread complete.

> But I am a bit surprised, as your first critique on Joe Celko's
> version was not about logical/physical level, but about performance
> and overhead...

Yes. The thread got lost in cacophony to divert attention away from that core issue.

> I don't know where you get the 220% figure. I do agree that there is
> substantial overhead and cost involved in Joe Celko's solution, but I
> expect that your solution will require at least as much, maybe even
> more overhead.

The 220% overhead figure is loosely calculated as:

1. One additional redundant column in each Subtype
2. One additional redundant index in each Subtype
3. One additional redundant index in the Supertype

Of course the base for computing "additional overhead" is the model with the redundant columns removed, which has one index per supertype & subtype. So two extra indices are 200%, and one extra column times two subtypes is 20%

A tighter calculation would be:

1. 10% additional overhead per Subtype (since it has only two columns)
2. 100% additional indexing overhead per Subtype (since it has only one index)
3. 100% additional indexing overhead (since it has only one index)

> In your post to me, you introduce a different argument: that the
> columns introduced by Joe do not belong in the logical level. I do
> agree with that. But does that imply that they should not be in the
> tables?

I think Kevin stated that, not I.

In any case, it has to do with logical vs physical, which is not relevant to my post. Celko is introducing them as a logical construct (and Kevin may be trying to make it a physical one). Subtypes and supertypes are logical, not physical: the user sees them, understands them, and understands that the children of the subtypes are restricted to those subtypes, etc.

Anyway my basic point is, it is silly to implement sub-standard constructs substantial addition overhead for each subtype, and for the supertype, when (a) the standard that has been with us for 35 years, does so without the additional columns and indices, and (b) the implementation is pure DRI since 2007 (it was code until then).

For people who appreciate a Data Model:
http://www.softwaregems.com.au/Documents/Student%20Resolutions/Anders%20DM.pdf

Or those who want DDL including the CHECK constraint: http://www.softwaregems.com.au/Documents/Tutorial/Subtype_CHECK.sql

And code for the Function used by the CHECK constraint: http://www.softwaregems.com.au/Documents/Tutorial/Subtype%20ValidateExclusive_fn.sql

Subtypes eliminate the "nullable foreign keys" that unnormalised data heaps have, with the attendent damaged integrity. They allow full Data and Referential Integrity Declaratively. They also eliminate additional indices (that are otherwise required) in the subordinate tables. Here is an example of that: http://www.softwaregems.com.au/Documents/Documentary%20Examples/College%20Simple%20DM.pdf

Cheers
Derek Received on Sun Feb 03 2013 - 05:32:54 CET

Original text of this message