Re: Principal of Orthogonal Design (was Re: Relational Catalog (was ...))

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Tue, 10 Jun 2003 05:51:10 GMT
Message-ID: <3EE571A0.2010102_at_earthlink.net>


Leandro GuimarĂ£es Faria Corsetti Dutra wrote:
> On Mon, 09 Jun 2003 05:02:53 -0700, Alfredo Novoa wrote:

>>But Hugh Darwen told me recently that Date and himself are
>>reconsidering the POOD.

>
> Why would it need reconsideration?

See http://www.thethirdmanifesto.com/ and the article (presentation) on "How to Handle Missing Information without Using Nulls". Consider POOD and the proposed solution.

The original article expounding the POOD (Principle of Orthogonal Design, though it was not called that in the article) is available in C J Date 'Relational Database Writings 1991-94' under the title "A New Database Design Principle".

Personally, I find the motivating example - the LOVES and HATES relations - less than wholly compelling. The proposed solution has a pair of tables, each with a constant column in it. Unless I'm misunderstanding something, that means the table is not in BCNF, let alone any higher form.

Hugh Darwen mentioned the reconsideration to me in mid-May, in the context of the article cited above. I wrote back:

  • begin **

Hugh Darwen said:
 >Did you notice that my proposed solution to the missing info problem  >involves violating the POOD (though this is not absolutely >essential)?

...double check - POOD must be Orthogonal Design (found it by reference in RDBW 1994-97, and rechecked it back to RDBW 1991-94). It's the one that argues (paraphrasing - probably badly) that in a well-designed database, all non-loss decompositions of tables over columns without default values must not have any projections with overlapping meanings...

I have felt that it is a laudable goal, but I've not been sure that it is achievable in real databases. And the idea that you should be able to take a set of values given just the type and value and determine where to insert the data in the database does seem to follow in general, but I've never been sure that it was doable in practice, though I haven't had a concrete counter-example.

However, in the motivating 'loves and hates' example, I don't feel very comfortable with the recommended design where the HATES table has a constant attribute in it -- that has always seemed wasteful. If I wasn't afraid of being shown to be wrong, I'd say it isn't in a proper normal form - at least, not in a high-order normal form. Having just spent the time doing some thinking about it (again), I think that if the column is constant 'by definition', then it contributes no useful information as can be seen by noting that there is a non-loss decomposition of the relation HATES(X,R,Y) - where R is the column containing the 'hates' value from the L_OR_H domain - into two tables, one of them X_HATES_Y(X,Y) with one row for each row in HATES, and the other being H_CONST(R) containing a single row containing the single value 'hates'. Further, the join needed to reconstitute the HATES(X,R,Y) table is a degenerate join over zero common attributes, aka a cartesian product.

And you violated POOD? No, I didn't notice. Let's look...

Hmm, slide 11 has the tables UNEMPLOYED, JOB_UNK, SALARY_UNK and UNSALARIED, and those tables all have a single column containing just the Id of someone. Yes, those tables violate POOD, if I understand it correctly. There is no way for the system to tell whether or not to insert a given combination of Id and Name other than it must go in the CALLED table (which is trivial) and must not go in the EARNS or DOES_JOB tables (because the second value needed is missing), but the other four - no, it can't tell where to put the data without human assistance. Did you add a constraint that each entry in CALLED must have a matching entry in one of DOES_JOB, JOB_UNK and UNEMPLOYED? Yes: slide 12 point 2. And without the external information, there is no way for the system to determine which of JOB_UNK and UNEMPLOYED applies.

  • end **

Hugh hasn't responded to that commentary, so either I've goofed completely or I'm sufficiently on target that there was no need to add extra commentary. [Or he was too busy, or the email got lost in the spam, or ...]

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
Received on Tue Jun 10 2003 - 07:51:10 CEST

Original text of this message