Re: quick FK question
Date: Mon, 21 Dec 2009 09:00:15 -0800 (PST)
So.... Stepping in the waters and hope I can swim on this one....
The web link is interesting reading. I'll confess that as I read it I was answering phones, talking to people and generally not giving it 100% attention. :) That being said, one thing that seemed to come to mind is that the debate on this page is about the relational mechanics/reasons/debate about NULL or "default" values. It is an RDBMS agnostic debate, and as such one important aspect isn't addressed, the performance of whatever the "perfect" solution is. In my mind, in the end, if I can assure data integrity (whatever the method) then performance becomes the next important consideration. The question of data integrity is a combination of the application and the data model (I could add things like testing, change control, etc too)...
As this is a data warehouse, I would assume (I know... I know) that the data comes from one or more OLTP systems, and I would assume (sigh...) that the data integrity of these systems would not be in question. With a warehouse, one might well be bringing in data from disparate sources. Simply based on that rule, some of the basic relational rules may not be able to be enforced all the time. In fact with a data warehouse we denormalize with a vengeance. For example, we may have facts without associated dimensions in a warehouse for a time (because of data integration issues) .... We might well have defined but not enforced FK's in a warehouse (later versions of Oracle), or we might not have any FK's if we are running older versions of Oracle that don't support non-enforcement of FK's. Thus, we ignore relational rules in warehouses all the time, in part for performance and in part based on an assumption that the source of record system is consistent.
In my mind in this case the real question is one of performance, not trying to protect relational integrity. That being the case, then the next question is "Is there a problem with performance". If the answer is no, then I would not worry about it. If the answer is Yes, then your modeler will need to try to justify his assertion that the design works better. If he can not justify, back to the drawing board....
Kimbal has addressed the issue of NULL's in a data warehouse... one link is here:
Robert G. Freeman
Ask me about on-site Oracle Training! RMAN, DBA, Tuning, you name it! Author:
Oracle Database 11g RMAN Backup and Recovery (Oracle Press) - ON ITS WAY SOON! OCP: Oracle Database 11g Administrator Certified Professional Study Guide (Sybex) Oracle Database 11g New Features (Oracle Press) Oracle Database 10g New Features (Oracle Press) Other various titles
Blog: http://robertgfreeman.blogspot.com Check out my new blog series on installing Oracle Database 11gR2 on Windows using VMWare!
From: Cary Millsap <cary.millsap_at_method-r.com> To: mwf_at_rsiz.com
Cc: rgravens_at_gmail.com; gints.plivna_at_gmail.com; JSweetser_at_icat.com; oracle-l_at_freelists.org Sent: Mon, December 21, 2009 8:03:38 AM
Subject: Re: quick FK question
This isat the root of the Codd vs. Date debate summarized at http://www.dbdebunk.com/page/page/1706814.htm.
I am eager to see Chris Date present on this near our office next month. Enrollment is still open, and there's plenty of space available (wink): http://method-r.com/education/107-cj-date-course
Method R Corporation
On Mon, Dec 21, 2009 at 8:36 AM, Mark W. Farnham <mwf_at_rsiz.com> wrote:
>This is subject to testing and experimentation to see which way works better
>>with the current technology.
>>It does raise and interesting question: When faced with incomplete data in
>>columns that are not mere scalars but rather are either part of a key of the
>>current relation or to another relation, is it better to include the tuple
>>in your representation of the relation with some indication the tuple is
>>incomplete or is it better to relegate the incomplete tuple to a staging
>>location until its critical column values are known?
>>There is meaning added to the the database by having the constraint and a
>>value that declares "this is a provisional value because we don't know the
>>real value" as opposed to merely allowing the column to be null. When the
>>provisional value can be used to drive a process to identify quality
>>problems in the data without discarding the bits of the tuple that are known
>>some good can come out of it. In terms of total work that must be done in an
>>Oracle database to complete the data, this may be superior to having the
>>incomplete tuples in a separate table.
>>But you mentioned "warehouse," which to a certain extent implies the data is
>>in final form. Will the answers gotten from the warehouse be improved or
>>degraded from including the incomplete tuple? Is there a good general answer
>>to this question? (Am I mistaken or is this at root one of the bits Codd and
>>Date had a mild disagreement about?)
>>Ah, the quasi-religious war that can be waged over this one!
>>So, does your "data warehouse modeler" produce models that work well to
>>answer the questions put to it? Does this practice tend to make the models
>>more or less usable?
>>From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
>On Behalf Of Rumpi Gravenstein
>>Sent: Monday, December 21, 2009 8:01 AM
>>Cc: JSweetser_at_icat.com; oracle-l_at_freelists.org
>>Subject: Re: quick FK question
>>I've seen once from some duhvelopers - they created all FK columns NOT
>>NULL. Unfortunately of course there were cases with FK columns where
>>actual value could not be provided. So what did they do? An obvious
>>solution! ;) Added one row with id = -1 as a stub to all db tables.
>>Gints I wanted to follow-up on this statement as our warehouse data
>>modeler insists on doing this for all dimension joins claiming that
>>this is good warehouse design as it avoids having to worry about outer
>>joins. How would you respond?