Re: Question on Structuring Product Attributes

From: <derek.asirvadem_at_gmail.com>
Date: Tue, 5 Feb 2013 03:02:36 -0800 (PST)
Message-ID: <4ee26e24-b0ca-4a5d-bcdb-4d5734d36fc1_at_googlegroups.com>


Hugo

A couple of things came to mind while I was waiting for your response.



Ordinary Data & Referential Integrity

A bit more clarity, re the subject UPDATE statement and my points [8] and [10] above, because the contexts are confusing to you. Let's take an example that has nothing to do with Subtypes, so that you can see the issue clearly, without the opportunity to mix things up. Let's say you have a typical Order (the parent) table and an OrderItem (the child) table. The data model states:
• each Order has one-or-many OrderItems
We can implement Declarative Constraints to ensure that a child cannot exist without the appropriate parent (CONSTRAINT Child.FK REFERENCES Parent.PK), but we cannot do so to ensure that at least one child exists for each parent. How do you implement that? In Standard SQL, please, without worrying about syntactic differences.

  1. Via an Add transaction (stored proc, SQL ACID), so that the parent and the first child are inserted as an Atomic change to the database.
  2. And via a Drop transaction, so that the delete does not delete the last OrderItem for an Order. (Or by not providing an OrderItem_Drop.)
  3. I trust that you know this, but for the benefit of anyone reading this who does not know: in a normal secured SQL database, we do not allow users to make changes directly to the tables (and stuff things up); we allow changes to the database *only* through transactions. That scheme is provided via:
     • on every table, SELECT is GRANTed to nominated Roles
     • UPDATE/INSERT/DELETE is not granted to any Role or Group or User
     • dbo aliases are not permitted
     • EXECUTE permission for nominated transactions is GRANTed to nominated Roles
     Otherwise you have chaos, not a database.
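To make [1] and [3] concrete, here is a minimal sketch in generic Transact-SQL style. The proc name, parameters, column names, and Role name are all illustrative assumptions, not from the thread:

```sql
-- Sketch only: illustrative schema and names, generic Transact-SQL style.
-- The parent and its first child are inserted as one Atomic change, so the
-- one-or-many Rule holds at all times.
CREATE PROCEDURE Order_Add
    @OrderNo    INT,
    @CustomerNo INT,
    @ProductNo  INT,    -- the first OrderItem
    @Quantity   INT
AS
BEGIN TRANSACTION
    INSERT "Order" (OrderNo, CustomerNo, OrderDate)
        VALUES (@OrderNo, @CustomerNo, GETDATE())
    IF @@ERROR != 0
        BEGIN
        ROLLBACK TRANSACTION
        RETURN 1
        END
    INSERT OrderItem (OrderNo, ItemNo, ProductNo, Quantity)
        VALUES (@OrderNo, 1, @ProductNo, @Quantity)
    IF @@ERROR != 0
        BEGIN
        ROLLBACK TRANSACTION
        RETURN 1
        END
COMMIT TRANSACTION
RETURN 0
GO
-- Users reach the database only through the transaction:
GRANT EXECUTE ON Order_Add TO OrderClerk_Role
-- and no UPDATE/INSERT/DELETE is granted on "Order" or OrderItem to anyone.
```

The same pattern, with the insert order reversed to deletes in reverse order, gives the Order_Drop transaction.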

(
It is worth noting that the seriously deranged will implement an FK in the parent, that references the child, which results in a circular reference, and then insist that the platform must support "deferred constraint checking". These people have no sense of Rules, or Order of Rules. But those of us who don't have that form of insanity in the first place, don't have the demands in the second place. DB2 and Sybase do not have "deferred constraint checking"; we have enjoyed secured databases with full data and referential integrity for 35 years without it. Typically the platforms that provided "deferred constraint checking" did not have transactions, and were not SQL anyway. Some of them have implemented a weird form of non-ACID non-SQL "transactions" since then. Therefore as stated, SQL only, and no weirdness with non-SQLs that provide asylum facilities. )

It is the same for a Subtype cluster, nothing more, nothing less.

Recall that the Subtype cluster is a single logical unit. Implement:
• one Subtype_Add transaction for adding Subtypes, such that the generic Type and the *correct* Subtype are added together, as an Atomic change to the database
• one Subtype_Drop transaction for deleting Subtypes, such that both the Subtype and the generic Type are deleted together, as an Atomic change to the database
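As a sketch of the Add side, assuming a generic Product table with a Discriminator column and a ProductBook Subtype table (the column names are my assumptions; use the DDL posted earlier in the thread):

```sql
-- Sketch only: assumed column names, not the posted DDL.
-- The generic Type and the *correct* Subtype are inserted as one
-- Atomic change to the database.
CREATE PROCEDURE ProductBook_Add
    @ProductNo INT,
    @Name      VARCHAR(30),
    @NumPages  INT
AS
BEGIN TRANSACTION
    INSERT Product (ProductNo, Name, ProductType)   -- the generic Type
        VALUES (@ProductNo, @Name, 'B')
    IF @@ERROR != 0
        BEGIN
        ROLLBACK TRANSACTION
        RETURN 1
        END
    INSERT ProductBook (ProductNo, NumPages)        -- the correct Subtype
        VALUES (@ProductNo, @NumPages)
    IF @@ERROR != 0
        BEGIN
        ROLLBACK TRANSACTION
        RETURN 1
        END
COMMIT TRANSACTION
RETURN 0
```

ProductBook_Drop is the mirror image: delete the Subtype row, then the generic row, inside one transaction.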

For changing a book to a CD, you have two choices. Implement a transaction that does one of the following. In both cases they are still transactions, Atomic changes to the database:

• (given the above two transactions exist) one Subtype_Alter transaction to
  ••• execute the Drop transaction,
  ••• then execute the Add transaction

• a fresh transaction that performs the following:
  ••• delete the old Subtype and generic Type
  ••• insert the new generic Type and Subtype
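The former choice can be sketched as follows; again, all names are illustrative, and I assume Book and CD Add/Drop transactions of the shape given above:

```sql
-- Sketch only: the Drop and Add transactions are re-used, so the Alter is
-- Atomic and there is no code duplication.
CREATE PROCEDURE ProductBookToCD_Alter
    @ProductNo INT,
    @Name      VARCHAR(30),
    @RunTime   INT
AS
DECLARE @Return INT
BEGIN TRANSACTION
    EXEC @Return = ProductBook_Drop @ProductNo
    IF @Return != 0
        BEGIN
        ROLLBACK TRANSACTION
        RETURN 1
        END
    EXEC @Return = ProductCD_Add @ProductNo, @Name, @RunTime
    IF @Return != 0
        BEGIN
        ROLLBACK TRANSACTION
        RETURN 1
        END
COMMIT TRANSACTION
RETURN 0
```

On Sybase and MS SQL the inner BEGIN/COMMIT TRANSACTION pairs in the called procs simply nest (@@trancount is incremented and decremented); the outermost transaction governs atomicity.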

I can't stand code duplication, so I would go with the former; but some developers like to ensure their job security, so they would go with the latter, along with redundant indices.

I hope this illustrates that the issue you raised re the UPDATE, although it is important, has nothing to do with Subtypes, or with my Subtype structure. It has everything to do with ordinary Data and Referential Integrity across the whole database; with transactions; with the issues re transactions on your platform; and with how you implement them. That is why it is not in the Subtype example code. Whatever it is that you do for those pairs of tables anywhere in the database that require integrity between them ... do that same thing for each Subtype, no more, no less.



Performance

> > ... So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform thousand executions of the subquery.

I am not sure that I understand the specific issue you are concerned about, so forgive me if I am wrong.

I have used MS SQL a fair bit over the decades, although I do not consider myself a specialist, and I can't keep up with the rewrites-from-scratch every three years. As you are probably aware, executing a QP without a subquery thousands of times is a non-issue; the issue is in the qualifying set size and the table size. And here the set size is *one*, since it is not a streamed insert.

My experience of it, and this is true up to version 2008 (after which, I don't know), is that it generates a very good query plan for subqueries (no normalisation of the QP à la current Sybase, but very good, à la pre-2003 Sybase). The QP is "flattened": there is no subquery in it, it is just a node in the tree. So let me assure you that executing a QP *with* a subquery, thousands of times, is likewise a non-issue.

But please don't take my word for it, you can always try it. Use my DDL, so as to save time, load up a Product table with 16 million rows (a thousand is nothing these days). It takes only 5 to 10 mins. In case it needs to be said, please do not use a cursor: not only is it suh-low, it will skew the result badly. I don't have anything MS or MD in my stable, otherwise I would offer.
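For the load itself, a set-based doubling avoids the cursor entirely; a sketch, assuming a simple Product table keyed on ProductNo (names are illustrative):

```sql
-- Sketch only: load millions of rows set-wise, no cursor.
-- Each pass doubles the table; 24 doublings from 1 row gives
-- 2^24 = 16,777,216 rows.
INSERT Product (ProductNo, Name)
    VALUES (1, 'Seed')

DECLARE @i INT, @Max INT
SELECT @i = 1
WHILE @i <= 24
    BEGIN
    SELECT @Max = MAX(ProductNo) FROM Product
    INSERT Product (ProductNo, Name)
        SELECT ProductNo + @Max, Name
        FROM Product
    SELECT @i = @i + 1
    END
```

Each INSERT..SELECT is a single set-wise statement, so the load runs in minutes rather than the hours a row-by-row cursor would take.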

(I have heard that recent releases of MS SQL have an issue with Functions, depending on the content of course, but not with the QP, or with a QP which includes a subquery. I think it has to do with large amounts of code in Functions. Therefore we should be safe with a single-statement Function.)
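For clarity, the single-statement Function under discussion would have roughly this shape; this is my assumption of it, the actual DDL is in the earlier post:

```sql
-- Sketch only: an assumed shape for the single-statement Function.
-- The subquery checks that the generic Product row exists with the
-- correct Discriminator, for use in a CHECK constraint on the Subtype.
CREATE FUNCTION ValidateExclusive (
    @ProductNo   INT,
    @ProductType CHAR(1)
    )
RETURNS INT
AS
BEGIN
    RETURN (
        SELECT COUNT(*)
        FROM Product
        WHERE ProductNo   = @ProductNo
          AND ProductType = @ProductType
        )
END
```

It is that embedded subquery which the QP flattens to a single node in the tree.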

If that succeeds, but for some reason you are not blissfully happy, and you want to really push the construct further with a streamed insert, let me know, and I will supply the code.

> > Your version eliminates the storage overhead, but at the price of a performance overhead.
>
> Er, what "performance overhead" ? Please specify exactly, or withdraw the statement.

In your own time.

Cheers
Derek

Received on Tue Feb 05 2013 - 12:02:36 CET
