Re: Question on Structuring Product Attributes

From: Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info.INVALID>
Date: Fri, 21 Oct 2011 22:44:49 +0200
Message-ID: <kik3a7dc143n5nvcqlmcje0ts81ssjsred_at_4ax.com>


On Thu, 20 Oct 2011 14:07:41 -0700 (PDT), Kevin Kirkpatrick <kvnkrkptrck_at_gmail.com> wrote:

(snip)
>This seemed the most appropriate "jumping in" point. In my SQL vendor
>of choice (Oracle), while there is not support for ANSI CHECK
>constraints, I have found a combination of features that allow for
>cross-table constraints to be enforced in a purely declarative and
>high-performance manner. The basic idea is to use instant-refresh
>materialized-views that bring the data together, then put indexes on
>the materialized views to enforce cross-row constraints. All such
>objects (materialized view logs, materialized views, constraint
>functions, and constraint indexes) used to enforce the constraints
>should be considered physical implementation details like table
>indexes, i.e. NOT exposed at the logical level and NOT accessible to
>users/application code .
(snip)

Thanks for your reply. I have no experience with Oracle, but I think I can parse the code sufficiently to understand what is going on. You create a materialized view (probably very similar to the indexed view feature SQL Server has) that joins data from subtype and supertype table together. Then you add an extra index on that materialized view, but not on one of the columns itself, but on a virtual computed column that is computed using the DECODE function (not supported by SQL Server, but from the looks of it it appears to be a shorthand for a CASE expression), that will invoke a function if the type in the supertable does not match what should be in the subtable; and that function then invokes an error.

I *think* a similar approach would be possible in SQL Server, but using a few different angles. I can't force an error from a function (SQL Server functions allow no side effects), but I could add a column to the view that is computed using a CASE expression with possible results 1 or 1/0. That would not give a similar user-friendly error message, but it would provide the functionality (I think - I have not tried it, though!).

But I am a bit surprised, as your first critique on Joe Celko's version was not about logical/physical level, but about performance and overhead. I quote:

On Sat, 15 Oct 2011 02:41:24 -0700 (PDT), Derek Asirvadem <derek.asirvadem_at_gmail.com> wrote:

(...)
>Ok, so you are both fools.
>
>It is not common, because 220% overhead and substantial cost, is just
>plain stupid. People are not quite as stupid as you think.

On Wed, 12 Oct 2011 17:51:11 -0700 (PDT), Derek Asirvadem <derek.asirvadem_at_gmail.com> wrote:

>This "class hierarchy model" is one of the most idiotic things I have
>seen in my 34 years in databases.
(...)
>a. basically doubles the number of indices in the relevant tables, and
>that has a substantial overhead.
>
>b. requires vehicle_type to be carried in the subtypes, which AFAIC is
>completely redundant (in SUVs, we know all the rows are SUV, why carry
>a column in which every value will be "SUV").
(...)
>e. introduces complexity and processing; views; instead-of triggers,
>etc. I would rather the database remain open as possible, without
>such complications for use. Particularly when such nonsense is
>completely unnecessary.

I have no idea how Oracle implements materialized views, but the name alone suggests that, like SQL Server, the content of the view is stored on disk. So the storage overhead will be even MORE than the single extra column that Joe Celko's version adds. SQL Server also requires an index for each materialized view (hence the name indexed view). Maybe Oracle doesn't, but you do create one yourself. That should be about the same overhead as the index created for the extra UNIQUE constraint in Joe Celko's version (for those RDBMS's that use an index to implement a UNIQUE constraint). Maybe even more because of the computed column.

I don't know where you get the 220% figure. I do agree that there is substantial overhead and cost involved in Joe Celko's solution, but I expect that your solution will require at least as much, maybe even more overhead.

In your post to me, you introduce a different argument: that the columns introduced by Joe do not belong in the logical level. I do agree with that. But does that imply that they should not be in the tables? There are several examples of columns that are not part of the logical model but are added to the tables. One example is a surrogate key column. Another example is a rowversion column (that's the name in SQL Server; I don't know if Oracle has a similar feature) that automatically changes value every time a row is updated, to facilitate the implementation of optimistic concurrency checking.

There are various ways to not expose these columns to the database user, such as building views on top of the tables that don't expose these columns, or creating stored procedures to implement the data access layer.

-- 
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Received on Fri Oct 21 2011 - 22:44:49 CEST

Original text of this message