Re: Question on Structuring Product Attributes

From: Derek Asirvadem <derek.asirvadem_at_gmail.com>
Date: Fri, 14 Oct 2011 23:03:37 -0700 (PDT)
Message-ID: <d682ce48-4725-407c-b880-8c215c874bbc_at_k16g2000prb.googlegroups.com>


On Oct 15, 1:29 pm, -CELKO- <jcelko..._at_earthlink.net> wrote:
> Now you are obligated to actually post better techniques and explain
> why they are better..

Ok. Happy to take up the challenge. I have already identified the "feature" is already available in Relational databases; that it is a CHECK Constraint, no need for duplicating columns and adding indices.

Do you want me to post the code ? Come on, you are the famous SQL guru in the family, I am just a little old practitioner with a few happy customers; your code will probably run circles around mine.

> I am not bothered by increasing the number of indexes.

Yes, that bothers me, especially adding *totally 100% redundant* indices. That is probably partly why my customers are happy.

> Sybase (nee WATCOM)

Actually:
- Their baby DBMS is "SQL Anywhere", which is nee Watcom. It is an excellent product for a limited market.

> You have made
> the assumption that references are done by more indexes. SQL implements  the constraints like
>  FOREIGN KEY (vin, vehicle_type)
>   REFERENCES Vehicles(vin, vehicle_type)
>
> by building pointer chains under the covers, so each compound key
> actually appears once in the schema.

Excuse me, but:

  1. You posted SQL. I answered SQL. In order to maintain focus, avoid tangents, and close this subject like gentlemen, let's talk SQL only.
  2. SQL is a standard, anything that does not comply with the Standard cannot make the claim to be an SQL. The SQL standard requires that all references in the database are by *KEY*, and specifically *NOT* by pointers.
  3. I made no assumptions. The SQL code that *you provided* specifically creates two indices:
    >>CREATE TABLE SUVs
    >>(vin CHAR(17) NOT NULL
    >> PRIMARY KEY, -- Index on (vin) for SUV.PK
    >> vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
    >> CHECK(vehicle_type = 'SUV'),
    >> UNIQUE (vin, vehicle_type), -- Dupe Index [1] on (vin, vehicle_type)
    >> FOREIGN KEY (vin, vehicle_type) -- no suggestion that this FK is an additional index
    >> REFERENCES Vehicles(vin, vehicle_type)
    >> ON UPDATE CASCADE
    >> ON DELETE CASCADE,
    >> ..);
  4. The SQL code that *you provided* specifically identifies data columns, not pointers. Therefore, above the covers, there are two indices, visible, with data columns, visible. No pointers. What a product does under the covers is tangential.
  5. The issue I have is above the covers:

e.1 Index[1] is a 110% duplicate of Index SUV.PK.

e.2 (vin) is unique, so (vin) plus anything will be unique. It has no value (except to support this monstrous structure, which is not reqd to ensure Exclusive Subtype CHECK).

e.3 There is also another, additional, 110% redundant index in the parent, to provide the key for the REFERENCED key in the child/ Your SQL code again:

>>CREATE TABLE Vehicles
>>(vin CHAR(17) NOT NULL
>> PRIMARY KEY, -- Index on (vin) for Vehicle.PK
>> vehicle_type CHAR(3) NOT NULL
>> CHECK(vehicle_type IN ('SUV', 'SED')),
>> UNIQUE (vin, vehicle_type), -- Dupe Index [2] on (vin, vehicle_type)
>> ..);

e.4

> so each compound key actually appears once in the schema.

No. It appears once for each type you declare a CONSTRAINT, not once in the schema. SQL does not normalise the schema (wish that it would!).

> Teradata uses a hashing algorithm.

Teradata is not SQL.

Sybase, MS, DB2, all create Standard SQL visible indices from data columns, from the SQL command that you supplied.

Oracle is non-compliant, in that it twists the SQL definitions in order to appear compliant. It famously uses ROWIDs, above and below the covers, but the indices and data columns are visible, above the covers.

> RDBMS companies have been working on PK-FK joins for a few
> decades, so they are pretty good at it now.

Agreed. But the Foreign *KEYS* that you declared in your SQL code are still ordinary visible data columns, not pointers or pointer chains.

> What do you have against the Nested Sets?

There is a recent thread on that subject. I will take it up there.

Let's close this thread here.

Ambler and Fowler are famous for their "straw man" arguments, suggesting that the RM or RDBMS can't do X, then proposing some monstrous method of doing X. AFAIK, you do not do that. SQL, and any RDBMS supplying SQL, already has everything required to ensure the Referential Integrity of an Exclusive Subtype, with zero overhead. The problem does not exist (we can ensure RI for Exclusive Subtypes), therefore we do not need the "solution" of the additional structure.

Although you did not present the "straw man" intro to this article, you did imply that Exclusive Subtypes cannot be verified in SQL (which is not correct), and then promoted this structure. I do not understand why, when there is a perfectly simple, zero-overhead Exclusive Subtype CHECK Constraint, you Mr Celko, the famous SQL guru, would be promoting this non-SQL "class hierarchy" structure, which is massively inefficient, due to the number of totally redundant columns and redundant indices it requires.

Response please. Received on Sat Oct 15 2011 - 08:03:37 CEST

Original text of this message