Re: Question on Structuring Product Attributes

From: <hugokornelis_at_gmail.com>
Date: Tue, 5 Feb 2013 05:51:01 -0800 (PST)
Message-ID: <3c51789c-67b7-4ce3-8bf6-97177d6972c0_at_googlegroups.com>


Hi Derek,

A quick reply (I really need to get going on some other stuff, but I can't let this slip) to some issues.

On Tuesday, February 5, 2013 12:02:36 PM UTC+1, derek.a..._at_gmail.com wrote:
> -------------------------------------------------
>
> Ordinary Data & Referential Integrity
>
> -------------------------------------------------
>
>
>
> A bit more clarity, re the subject UPDATE statement and my points [8] and [10] above, because the contexts are confusing to you. Let's take an example that has nothing to do with Subtypes, so that you can see the issue clearly, without the opportunity to mix things up. Let's say you have a typical Order (the parent) table and an OrderItem (the child) table. The data model states:
>
> - each Order has one-or-many OrderItems
>
> We can implement Declarative Constraints to ensure that a child cannot exist without the appropriate parent (CONSTRAINT Child.FK REFERENCES Parent.PK), but we cannot do so to ensure that at least one child exists for each parent. How do you implement that ? In Standard SQL, please, not worrying about syntactical differences.

In standard SQL, this *is* in fact possible with declarative constraints. On the Orders table, add a constraint CHECK (EXISTS (SELECT * FROM OrderItems WHERE OrderItems.OrderID = Orders.OrderID)) [actual syntax may be slightly different; I have never worked on a product that supports subqueries in a CHECK constraint, so I don't know the exact syntax for the correlation]. Then make this constraint deferred (unless you want a very complicated way of making the two tables read-only). In a real database, this only works if both subqueries in CHECK constraints and deferrable constraints (feature F271 in SQL:2003) are supported.
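For concreteness, here is a sketch of what that declarative solution might look like. This is hypothetical standard SQL: the table and constraint names are assumed, the syntax is unverified, and (as said) no product I know of accepts it, because it needs both subqueries in CHECK constraints and deferrable constraints (F271).

```sql
-- Hypothetical standard SQL; not accepted by any mainstream product.
-- The CHECK sits on the parent, so every Orders row must find at
-- least one matching OrderItems row; deferral lets the item be
-- inserted later in the same transaction.
ALTER TABLE Orders
    ADD CONSTRAINT Order_Has_Items
    CHECK (EXISTS (SELECT *
                   FROM   OrderItems
                   WHERE  OrderItems.OrderID = Orders.OrderID))
    DEFERRABLE INITIALLY DEFERRED;
```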

Since the databases I know don't implement those features, I agree that this business rule cannot be implemented as a declared constraint.

(snip basic stuff I fully agree with)

> It is worth noting that the seriously deranged will implement an FK in the parent, that references the child, which results in a circular reference, and then insist that the platform must support "deferred constraint checking". These people have no sense of Rules, or Order of Rules. But those of us who don't have that form of insanity in the first place, don't have the demands in the second place. DB2 and Sybase do not have "deferred constraint checking"; we have enjoyed secured databases with full data and referential integrity for 35 years without it. Typically the platforms that provided "deferred constraint checking" did not have transactions, and were not SQL anyway. Some of them have implemented a weird form of non-ACID non-SQL "transactions" since then. Therefore as stated, SQL only, and no weirdness with non-SQLs that provide asylum facilities.

A foreign key instead of a check constraint would be even better, indeed. So, does this make me "seriously deranged"? And if so, why? If you happen to be working on a platform that does support this feature set, using the tools available to you would, in my opinion, be the smart thing to do.

(You'd still need to provide the add and drop stored procedures, and still need to use DCL to secure the database, but if you can use a declared constraint to express a business rule, it would be folly not to do it.)

If your "fully deranged" refers to people trying to implement this on a database that doesn't support deferred constraints and then blaming the vendor - well, in that case I agree with the sentiment you expressed, allthough I would have used different words.

> It is the same for a Subtype cluster, nothing more, nothing less.

I sense a misunderstanding.

I am fully aware that your function-based framework does not force people to insert a row in the ProductBook table for a product of type 'B', or to insert a row in the ProductCD table for a product of type 'C'. Neither does Joe's framework. (And I actually believe that to be a good thing - the business requirements could easily allow us to know about books or CDs for which the number of pages/tracks is unknown.) The code I posted before does indeed end up with a product of type 'C' with no row in ProductCD, but that is not the problem I was trying to demonstrate. The real issue with the code I posted is that, after running it, I have a product of type 'C' that still has a row in ProductBook. This is not allowed. The framework does prevent this if the usual order of actions is followed (insert supertype; insert wrong subtype). But a different order (insert supertype; insert correct subtype; update supertype) is not prevented.
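The update path above is exactly what a composite-key subtype design (my reading of the approach Joe advocates; names and schema here are my own illustrative assumptions, shown in SQLite via Python) closes declaratively: the subtype table carries the discriminator column, CHECKed to its one legal value, and a composite FK ties (ProductID, ProductType) back to the supertype. Flipping the supertype's type while a subtype row still exists then fails immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Products (
        ProductID   INTEGER PRIMARY KEY,
        ProductType CHAR(1) NOT NULL CHECK (ProductType IN ('B', 'C')),
        UNIQUE (ProductID, ProductType)     -- target for the composite FK
    );
    CREATE TABLE ProductBook (
        ProductID   INTEGER PRIMARY KEY,
        ProductType CHAR(1) NOT NULL CHECK (ProductType = 'B'),
        NumPages    INTEGER,
        FOREIGN KEY (ProductID, ProductType)
            REFERENCES Products (ProductID, ProductType)
    );
""")
conn.execute("INSERT INTO Products VALUES (1, 'B')")
conn.execute("INSERT INTO ProductBook VALUES (1, 'B', 300)")
conn.commit()

# The back door: update the supertype to 'C' while ProductBook still
# holds a row. The composite FK rejects it immediately.
try:
    conn.execute("UPDATE Products SET ProductType = 'C' WHERE ProductID = 1")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
print(blocked)  # True: the orphaned-subtype state is unreachable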

(snip)
> For changing a book to a CD, you have two choices. Implement a transaction that does one of the following. In both cases they are still transactions, Atomic changes to the database:
(snip)

Yes, I understand that I can code my way around the limitations. I can also write stored procs to ensure that a normal foreign key relationship is never violated. But I still declare the foreign key relationship.

To me (and your opinion may differ), it is an either/or, with no in-between. Either I accept that my database has no out-of-the-box declarative method to ensure that the integrity of my supertype/subtype relationship is never violated, write access modules that do the work instead, and be done with it. Or I find a creative way to use whatever tools I have available to still get the RDBMS to do that work. (And I'd still write the access modules, as a convenience for the app developers - but I'd also know that, whatever anyone does, my data's integrity is covered.) Joe's solution is in the latter category. Yours is in-between - it uses functions and check constraints to prevent violation in some (admittedly the most common) cases, but leaves a back door open, so that I still *need* those access modules, and *need* to ensure that they are 100% reliable in all cases.

(snip)

> -----------------
>
> Performance
>
> -----------------
>
>
>
> > > ... So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform thousand executions of the subquery.
>
>
>
> I am not sure that I understand the specific issue you are concerned about, so forgive me if I am wrong.

There's actually a combination of two issues, both specific to SQL Server.

1. SQL Server is heavily optimized towards set-based operations. With very, very few exceptions (like running totals before the ROWS or RANGE clause of the OVER clause was implemented), set-based SQL will always run rings around the row-by-row counterpart. (I know that this is not the case in all DBMSs. For instance, I have heard from reliable sources that Oracle has almost no performance difference between well-coded row-by-row logic and the equivalent query.)

2. Scalar user-defined functions cannot be "inlined" - that is, their definition cannot be sucked into the main query before it is sent to the optimizer. The optimizer will produce a plan for the main query that builds most of the result set as efficiently as possible. Then it will send all rows one by one into the function - basically a cursor in disguise.

Add 1 and 2 together, and you'll see that every time a scalar UDF has to be evaluated against a large row set, performance will suffer.
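The "invoked once per row" behaviour is easy to observe outside SQL Server too. The sketch below uses SQLite through Python (my own illustration, not SQL Server itself, and the function name `triple` is made up): an application-defined scalar function is registered, a query applies it to a thousand-row table, and a counter shows the engine called it a thousand times rather than folding it into the plan.

```python
import sqlite3

calls = 0

def triple(x):
    # Scalar UDF; the engine cannot inline it into the query plan,
    # so it is invoked once per qualifying row - a cursor in disguise.
    global calls
    calls += 1
    return 3 * x

conn = sqlite3.connect(":memory:")
conn.create_function("triple", 1, triple)
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(1000)])
rows = conn.execute("SELECT triple(n) FROM t").fetchall()
print(calls)  # 1000: one invocation per row
```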

(snip SQL Server basics I already knew)
> But please don't take my word for it, you can always try it. Use my DDL, so as to save time, load up a Product table with 16 million rows (a thousand is nothing these days). It takes only 5 to 10 mins. In case it needs to be said, please do not use a cursor: not only is it suh-low, it will skew the result badly. I don't have anything MS or MD in my stable, otherwise I would offer.

As promised in my previous reply - I will test when I have the time.

> (I have heard that recent releases of MS SQL have an issue with Functions, depending on the content of course, but not the QP or the QP which includes a subquery. I think it has to do with large amounts of code in Functions. Therefore we should be safe with a single-statement function.)

Not just recent releases. All versions that support user-defined functions (which, off the top of my head, is every version since SQL Server 2005) suffer from this issue. And the number of statements in the function is irrelevant. See for example this blog post, where I demonstrate the performance difference with a UDF that merely triples its input value: http://sqlblog.com/blogs/hugo_kornelis/archive/2012/05/20/t-sql-user-defined-functions-the-good-the-bad-and-the-ugly-part-1.aspx

Cheers,
Hugo

Received on Tue Feb 05 2013 - 14:51:01 CET