Re: Question on Structuring Product Attributes

From: -CELKO- <jcelko212_at_earthlink.net>
Date: Tue, 18 Oct 2011 12:18:19 -0700 (PDT)
Message-ID: <364e8f52-53c5-4c83-b8a9-96d4acbce639_at_k2g2000yqh.googlegroups.com>


>> Sounds and looks like the volumes of text you are famous for posting. Smells like bulldust. All red herrings and distractions, in order to avoid dealing directly with any of the issues I have raised. <<

Personal attacks are not issues. You have never worked with the SQL Standards, probably have worked on less than a dozen SQL implementations, have never been part of a team designing optimizers. You also have a number of conceptual problems, which seem to be the result of confusing the abstract standards and RM versus particular implementations.

>> Tables have "cardinality", do they ? Is that in your new book “Mangled Relations for Dummies* <<

No actually it is in SQL:2005 Part II. I t is a bad idea to go up against a man who keeps the ANSI/ISO Standards on his desk :) And after as decade on X3H2, I have a good idea where to look.



4.14.1 Introduction to tables

A table is a collection of rows having one or more columns. A row is a value of a row type. Every row of the same table has the same row type. The value of the i-th field of every row in a table is the value of the i-th column of that row in the table. The row is the smallest unit of data that can be inserted into a table and deleted from a table.

A table T2 is part of a column C of a table T1 if setting the value of T1.C to a NULL value (ignoring any constraints or triggers defined on T1 or T1.C) would cause T2 to disappear.

The most specific type of a row is a row type. All rows of a table are of the same row type and this is called the row type of that table.

The degree of a table, and the degree of each of its rows, is the number of columns of that table. The number of rows in a table is its cardinality. A table whose cardinality is 0 (zero) is said to be empty.

The term Cardinality is originally from Set Theory, where we are more worried about finite and transfinite sets. And even more with kinds of infinite sets.

>> Well you should start checking the showplans or query plans and testing the code before you provide consulting. Until you post one of those, proving your assertion, it remains bulldust, and just another distraction. <<

Run a EXISTS (SELECT *), EXISTS (SELECT 1), EXISTS (SELECT x), and EXISTS (SELECT a+b), look for any difference in the SHOWPLAN, EXPLAIN, or what each vendor is used for this system utility.

>> 1. A query is a query. There is no such thing as a "table level construct" for what you have identified or for the SELECT you have given. <<

No, sorry. Do not confuse a uniform syntax with uniform semantics. A SELECT in a CURSOR body has limitations, a SELECT in a subquery has to be in parentheses, a SELECT in an INSERT INTO has limitations, etc.

The model we used in SQL is a hierarchy. The outer level is the schema; here the schema information tables and schema objects are exposed to an ADMIN . An ADMIN has the DROP, CREATE and ALTER privileges; a USER does not. Her is where DRI actions live, etc.

The “DROP <schema object>” is at the table level; we do not care about the rows or columns inside the table. Likewise EXISTS (SELECT * ) and COUNT(*) at the table level.

The next level is the rows. The INSERT INTO uses rows are its unit of work, multi-column CHECK() constraints, etc.

Finally, we have the column level, where simple column CHECK(), DEFAULT, NOT NULL, etc.

>> 2. Some of the SQLs are smart enough to *ignore* the column list and others are not. <<

I cannot think of any current SQL product that does not use this optimization pattern:
1) Throw exception if table not in the schema information tables 2) Look for single table parameter without a WHERE clause and check cardinality in schema information tables. This is short-circuit evaluation at the schema level. Hey, it pays off! 3) Look for empty tables in the FROM clause. This is short-circuit evaluation at the table level.
4) Perform the filters in the WHERE, then do the joins in the FROM clause. This step is done with special code for the joins, since we do not have to return the whole row.
5) Return FALSE when the result set has a row, This is short-circuit evaluation at the row level.
6) Otherwise, return TRUE; there is no UNKNOWN in this predicate.

The “special code:” in step #4 is the fun part. Skewed distributions, de-dupping in the join, hashing, throwing out values and using bit flag indexes, etc.

>> 3. Read the freaking manual for your particular platform. Most of them will categorically state: ____In an EXISTS, always use (SELECT 1 FROM ... WHERE ...) <<

Not in any of the five SQLs I have on my desk. The examples in Oracle tend to use a constant, but that is not recommended anywhere. Again, the use of EXIST(SELECT *..) is favored because it is not clear if the <select list> is invalid whether or not the SQL engine should return a warning and continue to process the predicate or throw an exception. The star is always safe, even if the base table columns change.

>> 4. Stop trying to bamboozle the readers with bulldust. <<

I will make a note of that :)

>> 5. You cannot even tell the difference between column issues and ualifiers in the WHERE clause. What you are stating simply does not apply. <,

Again, it is not a good idea to go up against a guy with a copy of the Standards on his machine :) I said

> EXISTS (SELECT * FROM .. WHERE..)
> is equivalent to
> (SELECT COUNT(*) FROM .. WHERE..) > 0

And SQL:2006 says:

8.10 <exists predicate>
Function
Specify a test for a non-empty set.

Format
<exists predicate> ::= EXISTS <table subquery>

Syntax Rules
None.

Access Rules
None.

General Rules
1) Let T be the result of the <table subquery>.

2) If the cardinality of T is greater than 0 (zero), then the result of the <exists predicate> is TRUE; otherwise, the result of the <exists predicate> is FALSE.

Conformance Rules

  1. Without Feature T501, “Enhanced EXISTS predicate”, conforming SQL language shall not contain an <exists predicate> that simply contains a <table subquery> in which the <select list> of a <query specification> directly contained in the <table subquery> does not comprise either an <asterisk> or a single <derived column>.

>> 7. In every SQL, SELECT COUNT(*) has to scan the *entire table* <<

No, the schema information tables have the cardinality of a table. If the stats are out of date, tehn they go to the indexes. The table is used only as a last resort.

>> 8. The is no "equivalence" between the two statements given. You need to take some formal education in SQL before you post further nonsense. Let alone "consult". <<

I remember that next time I see my name in an SQL text book :)

>> 10. The SELECT COUNT(*) is in brackets. That has to be resolved before the comparison outside the brackets is resolved. You need to read books written by others on the subject, not only the fairy tales
that you write about SQL. <<

The optimizer is free to implement an execution plan as it pleases, so long as it “effectively” returns the same results. Do you also believe that the FROM clause is executed before the WHERE clause? Since you did not know the term Cardinality, you might have to look up “Cartesian Product” to understand this point.

>> Finally, I note you have not addressed either the substance of my refutation of your proposal, or my alternative method. <<

I lost track of it in all your name-calling and screaming. Basically, you want to use this pattern of REFERENCES

vehicles ← sedans ← two_door

The classic schema I gave uses:

vehicles ← sedans
vehicles ← two_door

Validate a two-door and you have to access three tables. The classic schema accesses only two
Now drop the sedans and the two-doors are orphaned. The classic schema does not have this problem.

To find data about a two-door sedan I have to use all three tables. But to find the data for my four_door, I only need vehicles ← sedans. Now add convertibles as another class. Do you subordinatre it to two_doors or to sedans? Or directly to vehicles?

Pretty clearly, the paint job is in Vehicles. But where do the door locking attributes fit? This is a version of the network / hierarchical database design problems. How do I handle optional/ mandatory relationship? What level/node does an attribute belong?

Also, Codd did not have anything to do with class models; he was into DOMAINs. And SQL does have DRI actions; that is section 11.8 <referential constraint definition>. Received on Tue Oct 18 2011 - 21:18:19 CEST

Original text of this message