Re: Question on Structuring Product Attributes

From: Derek Asirvadem <derek.asirvadem_at_gmail.com>
Date: Sat, 15 Oct 2011 02:03:10 -0700 (PDT)
Message-ID: <2e8a9f61-b27f-482d-b720-e9ef021ab973_at_p27g2000prp.googlegroups.com>


Ok, on re-reading my post, I have come to realise the obvious, that if Joe had known how to ensure RI for Exclusive Subtypes, he would not have posted this "class hierarchy" monstrosity, with it 220% redundant Indices; additional processing; Views; Instead-Of triggers; etc. Additionally, he has issued a challenge, which is better answered directly. So here goes.


  • Referential Integrity of Exclusive Subtype --
    0. Attribution.

I am the original author, and you saw it here first, folks. As in, tiny nugget of value that it is, it is my original conception and work, I have been using this and deploying it for years, and I have not seen it anywhere else. It is always a blinking surprise to the development teams that I mentor in my travels. Even in this forum, it seems to be news.

I am publishing it here for the benefit of the community.


  1. No DRI in SQL

It is true that SQL has still not defined, let alone implemented, the DECLARATIVE RI constraint that we need.



2. The Method

Any SQL that supports Instead-Of triggers supports this, so we are using the same level playing field. Write these four SQL commands (plus parms and commentary):

--

  • Validate Supertype (Vehicle AND vehicle_type) for Subtype
    --
    CREATE FUNCTION ValidateExclVehicle_check ( _________at_vin CHAR(17), _________at_vehicle_type CHAR(3) ________) ____RETURNS TINYINT AS ____-- Exists = 1 ____-- Not Exists = Null, 0 substituted ____RETURN ( ________SELECT COALESCE(1, 0) ____________FROM Vehicle ____________WHERE vin = _at_vin ____________AND vehicle_type = _at_vehicle_type ____)

2.1 Note that in Relational terminology, vehicle_type is called the Discriminator, as in, it discriminates the Exclusive Subtype.



3. Changed DDL

The original or source or example or starting point we will use is post #2. That can be referenced, so I won't copy it here. The *changed* DDL required is as follows.

  • You need one of these per parent or supertype

CREATE TABLE Vehicle (

____vin          CHAR(17)  NOT NULL
________PRIMARY KEY,
____vehicle_type CHAR(3)   NOT NULL
________CHECK (vehicle_type IN ('SUV', 'SED') )
____-- no second duplicate index
____)

  • for each subtype:

CREATE TABLE SUV (

____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____-- vehicle_type not duplicated here
____-- no second duplicate index
____CONSTRAINT Vehicle_SUV_fk     -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT SUV_Excl_check     -- validate subtype in parent
________CHECK ( dbo.ValidateExclVehicle_check ( vin, "SUV" ) = 1 )
____)

CREATE TABLE Sedan (

____vin CHAR(17) NOT NULL PRIMARY KEY,
____-- vehicle_type not duplicated here
____-- no second duplicate index
____CONSTRAINT Vehicle_Sedan_fk    -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT Sedan_Excl_check      -- validate subtype in parent
________CHECK ( dbo.ValidateExclVehicle_check ( vin, "SED" ) = 1 )
____)

3.1 All I am doing is implementing the hoped-for requirement of Declarative RI, in code, there is nothing magical about it. If and when SQL comes to the party, all I have to do is search for occs of "%_Excl_check", and change the *syntax* of the content within the brackets.



4. Chain

I repeat, Subtypes are ordinary, normal Relational constructs, they existed as such, decades before OO or 8-p or classes or class hierarchies or X< were invented. Subtypes are very important because they eliminate Nulls; support so-called "optional" Foreign Keys correctly; and service optional columns correctly, thus simplifying code and keeping the database fast.

There is no problem with subtypes having Subtypes of their own or with class hierarchies. The "chain" is clean, carrying the true PK, and nothing else. A stuffed-up chain, or one that is 220% heavier than it needs to be, is plain stupid, a resource drain.

This also corrects Joe's Normalisation error, of mixing vehicle_type and sedan_type together, damaging the integrity of both:

CREATE TABLE Sedan (

____vin        CHAR(17)               NOT NULL
________PRIMARY KEY,
____sedan_type CHAR(3)  DEFAULT '2DR' NOT NULL -- not vehicle_type
________CHECK (sedan_type IN ('2DR', '4DR') ),
____CONSTRAINT Vehicle_Sedan_fk    -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT Sedan_Excl_check    -- validate subtype in parent
________CHECK ( dbo.ValidateExclVehicle_check ( vin, "SED" ) = 1 )
____)

CREATE TABLE TwoDoor (

____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____CONSTRAINT Sedan_TwoDoor_fk    -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT TwoDoor_Excl_check  -- validate subtype in parent
________CHECK ( dbo.ValidateExclSedan_check ( vin, "2DR" ) = 1 )


------------------------

5. Non-Exclusive Subtype

> If an entity doesn't have to be exclusively one subtype, you play with > the root of the class hierarchy:

If the Subtype is Non-Exclusive, simply exclude the CHECK CONSTRAINT and *Function*. The Discriminator is not required for these.

No playing around necessary. The chain is unchanged.



6. Object Classes

Now that the data has been modelled correctly, implementing OO classes *on top of that* is pedestrian. Trying to model data, which is quite different to processes and objects, using process or object modelling methods such as OO, is idiotic. Further, the data exists independently, and it existed long before the objects being contemplated, so it has to be modelled first; not second; not as embedded in an object. The reversal of the ancient chronology is for idiots who only read the books that they themselves have written, legends in their own mind. It is fair enough that unqualified florists are now "developing applications", but it is not fair that:

  1. the big names do not qualify and test the visions of these legendary publishers or
  2. that they do not know the traditional Relational construct for it. (this technique is mine, but any technique, which remains within the relational paradigm, and does not use masses of duplication, would be acceptable.)

7. Gotcha

Please refrain from implementing stupid things such as two-way CONSTRAINTS, otherwise you will tie yourself up in knots of your own making. Electricity flows in one direction only (thatis an analogy, please do not come back with that AC/DC nonsense unless you are willing to have it destroyed). That is a subject in itself, which I won't expand here, but at least, must be mentioned.



8. Why it is better ?

Well, "better" explicitly requires that the compared element is "good". Here the compared element is a monstrosity, there is nothing good about it. Since the Relational Subtype method existed before the bloated class hierarchy *method*, that former should be the measure against which the recent interloper is compared. Therefore, I will provide a comparison as to why the *posted method* of implementing class hierarchies is much worse than the pre-existing Relational method, that I have posted here. To be clear, I fully support Subtypes and class hierarchies and whatever new name isolated writers choose to come up with, I took issue only with the implementation *method*.

The pre-existing method is contained in (3) of this post, and used as the measure. It is the ZERO in the benchmark. In comparison, the "class hierarchy" method in post #2 uses the following *additional resources*, to obtain the same result as above, which is quite unnecessary:

  1. One additional 110% redundant index in every supertype - no matter what Joe believes, every index has a direct overhead, it directly affects the speed of every INSERT and DELETE - unless you are working with no concern for production databases, which means your database is irrelevant, every additional resource needs to be carefully considered, and every redundant resource needs to be identified and eliminated - such an approach eliminates the otherwise predictable post-mortem gathering of mountains of "diagnostic" info; and the laborious working through it; writing programs to filter and report; etc; etc
  2. One additional column in every subtype - this break 1NF, and increases table size, which affects overall performance (less rows on every I/O; less rows in every memory page)
  3. One additional 110% redundant index in every subtype - ditto (a)
  4. added complexity and administration - this cost is substantial, and if one does not treat it with respect, one's job migrates overseas
  5. Hiding, obfuscation, and subversion, as prescribed - odious in the extreme, - makes (d) all the more difficult and costly
  6. One additional View per supertype/subtype to "hide all this complexity" as advised by Joe. - no big deal, just a couple of superfluous views
  7. One additional Instead-Of trigger per supertype/subtype, to be able to INSERT/DELETE/UPDATE the otherwise simple tables (not views). - that is a big deal
  8. Development, testing, debugging, and maintenance of (d)(e)(f) and (g) - refer cost again

9. Conclusion

The total cost of the proposed class hierarchy *implementation method* is substantial, both in time/dollars, and in machine resources; that which I have quantified is at least 220% overhead plus development/ testing/maintenance. All of which can be eliminated, if the author of such (yet unattributed) would only read a book on Relational Databases, before making pronouncements of what it can and cannot do, and inventing monstrosities such as this.

The proposed class hierarchy *implementation method* is devoid of any value whatsoever.

Interested parties are referred instead to study ordinary Relational Subtypes, and pre-existing Relational constructs and SQL CONSTRAINTS as implementation methods.

"... there is nothing new under the Sun." Ecclesiastes 1:9. And those who say there is, are frauds. Those who believe the charlatans, are fools.

Regards
Derek Received on Sat Oct 15 2011 - 11:03:10 CEST

Original text of this message