Re: Question on Structuring Product Attributes
Date: Sun, 3 Feb 2013 17:01:28 -0800 (PST)
Message-ID: <341d8a98-9245-4b46-b233-5066d8fa371e_at_googlegroups.com>
Hugo
On Sunday, 3 February 2013 22:45:43 UTC+11, hugoko..._at_gmail.com wrote:
>
> > A tighter calculation would be:
> > 1. 10% additional overhead per Subtype (since it has only two columns)
> > 2. 100% additional indexing overhead per Subtype (since it has only one index)
> > 3. 100% additional indexing overhead (since it has only one index)
> > [Corrected to]
> > 3. 100% additional indexing overhead for the Supertype (since it has only one index)
>
> Well, that explains why your overhead estimation is so much higher than I could imagine. Your mathematics are off.
<shopping snipped>
That is neither logic, nor my logic, nor my mathematics, not mathematics. That is your interpretation, and application to some other scenario. I do not see any point in commenting on it. I will answer on the basis that you are not being contrary, that you actually do not understand, and limit my answer to the statements made.
If you want to understand what I stated, please just read my words (which I am responsible for), do not jump to conclusions (which in any case, you are responsible for).
2. If my Subtype table has one index, and Joe's Subtype table (to perform the same function, provide the same integrity, etc) has two indices, then it is a simple fact that Joe's table has "100% additional indexing overhead per Subtype"
3. If my Supertype table has one index, and Joe's Supertype table (to perform the same function, provide the same integrity, etc) has two indices, then it is a simple fact that Joe's table has "100% additional indexing overhead per Supertype"
I did not state "100% additional overall overhead", I limited my calculations to the subject matter, the implementation of SubType. I did not imply anything about the rest of the content of the tables.
Now if you wish to apply my statements to the price of sausage, that's your business, but I can't join you there.
A more reasonable example, if you are trying to apply it to everyday life, would be: • I [Supertype] move around the world, and go about my consulting business with one heavy briefcase; Joe [Supertype] moves around carrying two heavy briefcases wherever he goes. • Additionally, my whole development team [Subtypes] moves around the country carrying one heavy briefcase each; Joe's subordinates [Subtypes] each carry two heavy briefcases. • Joe's team performs the same tasks as mine, using 100% additional *briefcase* overhead. No implications are made re the weight of each person, or their speed, or their eyesight, or the price of fish.
> The actual overhead of the extra columns and indexes depends on a lot of things,
Yes.
That is not so relevant. All that exercise would produce is a difference between 100% and 98% or between 100% and 102%.
In the example we have one index, and Joe has two, so unless you are moving away from the example, the fact of the existence of the second *additional, redundant* index is 100% (two minus one divided by one).
The number of indices that the table may have for other reasons is irrelevant to the problem at hand, we are discussing Subtype implementation, so regardless of the number of indices on the table, if you implement the standard Subtypes structure, you need one more index; if you implement Joe's structure, you need two more indices, which is "100% additional indexing overhead per Subtype, and 100% additional indexing overhead per Supertype" than the standard.
Yes, that might be a good approach to benchmark the *overall* overhead, a new subject, not relevant to the statements that I made.
The fact remains the that two minus one divided by one = 100% additional. If there are 10 Subtypes, Joe has 11 additional redundant indices, and 10 additional redundant columns.
> > Anyway my basic point is, it is silly to implement sub-standard constructs substantial addition overhead for each subtype, and for the supertype, when
> > (a) the standard that has been with us for 35 years, does so without the additional columns and indices, and
> > (b) the implementation is pure DRI since 2007 (it was code until then).
>
> The last version of the ANSI standard I have access to is 2003, so I can't comment on point b.
>
> re point a: I was under the impression that user-defined functions were not in the ANSI standard, but luckily I verified and turned out to be wrong. They are. So technically your are right - your method that relies on user-defined functions is in the standard.
Yes, I know.
- You statement re the goal is too simplistic, but let's not get distracted.
- Sure, the syntax is different. So what ? The syntax for every SQL statement is different, particularly in the area of (a) extensions and (b) features that each vendor implemented before the standard syntax was published. So what ?
If you can't read my code, and make the necessary syntax changes on your platform to implement my code on your platform, then yes, there is a problem, of a different sort. Name your platform and I will write the code for you.
3. Regardless of the consideration of Subtype implementation, migration from any flavour of SQL to another flavour has a series of problems. Those problems are neither more nor less due to the use of standard support for Subtypes or a sub-standard support of Subtypes which doubles the number of indices. The point does not apply to the discussion.
> I checked the links you provided, and despite all I've said above, I do like your approach. In the future, if people ask me how to implement subtypes, I will probably give them the two versions.
>
> Choosing which one to use depends on various things. Your version eliminates the storage overhead, but at the price of a performance overhead.
> When inserting or updating a single row, that won't be a big problem. But what happens when inserting or updating thousands or even millions of rows in a single statement?
That is a very stupid thing to do, for any scenario, not only for Subtypes, because it will blow the transaction log. ANSI/IEC/ISO SQL is transactional. So whatever proscriptions apply to such attempts, apply to Subtype insertions as well, no more, no less. Each platform will handle that differently.
> This probably depends on the chosen DBMS. I know that SQL Server is unable to inline scalar user-defined functions into an execution plan, so for SQL Server, the only option is to execute a seperate call to the function for each row (or at least for each unique set of input parameters -which in this case would be the same, since the primary key is included in the input parameters-). So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform thousand executions of the subquery. Your performance will, excusez mon French, suck.
Excuse me, no, *your* performance will suck.
My performance (DB2 and Sybase) flies. I already have close to one hundred standard Subtype structures in production. Millions of transactional inserts have been tested. Probably a thousand full table unload/reloads (thousands to millions of rows each) have been tested.
Oracle carks itself with subqueries, so I wouldn't try even the single row insert on it. If an Oracle person reads this, and decides that they cannot implement Standard Subtypes, that they have to implement Joe's monstrosity instead, that would be a valid decision. (There are many standard features with Oracle does not support, resulting in their famously bloated databases, but let's not get distracted.)
> Another possible problem you have to be aware of with your solution is that, because the subquery is hidden in the UDF, it is unknown to the DRI mechanisms. Some scenarios will allow violations of the CHECK constraints, becuase the RDBMS does not know it has to re-check.
Excuse me, but it *IS* a DRI CHECK Constraint. Please read the code again.
"Some scenarios" is not correct, no, scenarios implies the general case, not platform specific. What you are talking about is how a particular platform operates, not scenarios. So the honest statement is, "some platforms will allow violations of the CHECK Constraint".
Well, that is a serious issue, and on the face of it I would say the indicated platform is totally unreliable, as well as non-compliant.
> Here's an example:
>
> -- I've got a new book:
>
> INSERT Product (ProductId, ProductType, Name) VALUES (1, 'B', 'Talk That Talk');
> INSERT ProductBook (ProductId, NumPages) VALUES (1, 11);
>
> -- Oh, wait - it's not a book, it's a CD!
>
> UPDATE Product SET ProductType = 'C' WHERE ProductId = 1;
>
> This won't trigger an error in SQL Server, because the CHECK constraint is only checked when the specific row in ProductBook is inserted or updated.
>
> Now this is an unusual scenario, but there are also lots of subtype situations where instances can migrate to another subtype. E.g. an employee that starts as a contractor but then decides to join the company and moves to the Payroll subtype. Or a supertype Person with subtypes Single, Married, Widowed, Divorced.
Yeah, sure. It appears that you do not understand context.
- The example code provides the implementation details for standard Subtypes. There is no suggestion that it does anything more than that. It is not a full blown application.
- The Supertype-Subtype structure is supposed to be considered as a single unit, not as a bunch of tables with disparate and singular relationships between them. You should not be updating anything in the cluster without considering the whole cluster, transactionally.
- When writing an application, you are best advised to implement all inserts/updates/deletes as transactions; you should never write singular inserts/updates/deletes.
- Due to the limitations of the SQL Spec (the Committee is about 20 years behind the high-end vendors, and about 35 years behind E F Codd), part of your data and referential integrity will be implemented declaratively, and the remainder will be implemented in transaction code. Those of us who care about quality and maintenance maximise the declarative part.
- The example code, being an example of Subtype implementation, uses singular inserts/updates/deletes, rather than transactions, as is commonly accepted in posted examples.
- The code is syntactically Sybase as declared in the header, but it is presented as standard code; you need to make the minor syntactic changes required for your platform.
- More important, when contemplating an application of said Subtypes, (as you have pointed out) you need to consider how your platform operates with regard to constraint checking; deferred constraint checking, limitations of functions; limitations of check constraints; limitations of transactions; your transaction design; etc, and design accordingly. I have not provided that; I have provided standard code only, and it is up to the competent implementer to implement it appropriately for their platform.
- I consider the constructs required to prevent your UPDATE to be ordinary Data Integrity or ordinary Referential Integrity, and that is another reason why the considered declarations or code is not in my posted example. UPDATE Product without the corresponding DELETE ProductBook, INSERT ProductCD falls into the same category of: • INSERT Product without a INSERT ProductBook • DELETE ProductBook without the corresponding DELETE Product • etc
Therefore:
9. It is a no-brainer for me, on my platform, to ensure that your example UPDATE as well as those of the same category [8] are disallowed, as part of the implementation of ordinary Data and Referential Integrity for the whole database. How I do that will be standard SQL, but the implementation will be specific to my platform; its limitations; my existing transactions; my transaction standards; etc.
> Bottom line: I like your solution. But it's not 100% safe, and though you avoid the storage overhead, you pay for this with a performance overhead. I'd recommend people to choose between the two solutions on a case by case basis.
I refute the "not 100% safe" on the basis described above: implement whatever safety you need, that is not shown in , and cannot reasonably be expected to be shown in, the example of Subtypes.
I strongly recommend that you do not recommend anything at all about this subject until:
Until then, your recommendations are based on contemplation, not facts, not experience.
Eg. I recommend that people use standard structures which require half the number of indices, over one that requires twice the number of indices. That is a general recommendation, not limited to Subtypes.
Cheers
Derek
Received on Mon Feb 04 2013 - 02:01:28 CET