Re: Null-valued foreign keys...?

From: Radu Lascae <r.nospam.lascae_at_wanadoo.nl>
Date: Sun, 21 Oct 2001 10:39:53 GMT
Message-ID: <ZBxA7.50$Qd3.375_at_castor.casema.net>


Hi Lane.

Comments inline.

"Lane Wimberley" <lane_at_wayport.net> wrote in message news:3BD23FBA.5CDCEC94_at_wayport.net...
> In practice, I was wondering how folks view null-valued
> foeign keys; that is, allowing the value of a foreign key
> to be null.

Liberally:-)

> My guess is that a purist might argue that
> foreign keys should never have a null value. If a relationship
> is optional, then this should be modelled using a separate relating
> table. The presence of a row in that table relating rows in two
> other tables indicates a relationship, and an absence indicates no
> relationship.

I don't remember Cobb sugesting anything of the kind and I'd have trouble visualing the ideea. I admit though that my books have been a mess lately, due to some long due improvement work to my apartment, so I'd have trouble getting the reference right now:-)

To the point: the optionality of the relation would become a relation in itself. Is this 1:1? 1:n? In your sugestion, it appears like a n:n relation, to model any 1:n relation which is optional. In this case, how do you suggest this model would work when modelling an optional n:n relation?:-))

>Of course, this make joins a little more involved.

I'd say this is an understatement:-)

> In practice, I imagine that many DB engineers feel that it is
> reasonable to allow null as a valid foreign key value. The value
> null simply indicates no relationship.

That's an entirely different discussion, about the "purist" versus the "practical" view about the use of NULL. Please see the example I'm discussing bellow.

> To complicate matters, I can imagine that there are cases
> where the optional nature of a particular relationship is only
> temporary, in which case we might say that a foreign key can
> have a null value for a while, but ultimately it should have a
> valid, non-null value.

Ok, let's get back on track here: there are such cases, in which case we're talking about practical ways to make a model expandable. I'll show this aspect later on, too.

> Does anyone have a particularly strong opinion one way or
> another? What do the modelling theorists say? Is there a
> good engineering practice, an accepted convention?

I hold the strong opinion that engineers are involved into modelling real-world businesses. A good understanding of the business model and it's processes is the first step to a sound model. As far as the discussion on nullability of foreign keys is concerned, practical aspects such as readibility and performance are involved.

In the following example, one is modelling a LEAD, as part of the business process SALES. Comments inserted between rows apply to the rows bellow them. You (and I, for that matter) may or may not agree with certain aspects of this model, please feel free to comment. Note that this is not the entire table structure, only that part which I felt relevant to this discussion.

CREATE TABLE LEAD

  • The PK LEAD_ID NUMBER(14,4) NOT NULL,
  • FK's COMP_ID NUMBER(14,4) NOT NULL, -- references COMPANY CONT_ID NUMBER(14,4) NOT NULL, -- references CONTACT SALES_GRP_CODE VARCHAR2(12) NOT NULL, -- references SALES_GRP STRATEGY_CODE VARCHAR2(12), -- references STRATEGY LEAD_INTRST_CODE VARCHAR2(12), -- references LEAD_INTRST LEAD_STAT_CODE VARCHAR2(12), -- references LEAD_STATUS LEAD_TYPE_CODE VARCHAR2(12), -- references LEAD_TYPE LEAD_SRC_CODE VARCHAR2(12), -- references LEAD_SOURCE SLS_TERR_CODE VARCHAR2(12), -- references SALES_TERR PHASE_CODE VARCHAR2(12), -- references PHASE PROSP_ID NUMBER(14,4), -- references PROSPECT CAMPN_ID NUMBER(14,4), -- references CAMPAIGN PRIME_REP VARCHAR2(12), -- references EMPLOYEE (on EMP_CODE)
  • Candidates for FK's, not (yet) implemented
  • No table to reference exists yet LOGISTICAL_COMP VARCHAR2(24), -- LOGISTICAL_COMPANY? REAS_WON_CODE VARCHAR2(24), -- REASON? REAS_LOST_CODE VARCHAR2(24), -- REASON?
    • CONTACT? then change type to NUMBER(14,4)
    • EMPLOYEE? then change type to VARCHAR2(12)
    • REFERENCES? no such table exists yet, then think model through REFD_BY VARCHAR2(45),
    • Referenced table exists but the FK is implied, not coded as constraint LOCALE_CODE NUMBER DEFAULT 1033, -- LOCALE CREATE_EMP VARCHAR2(12) DEFAULT 'some_emp', -- EMPLOYEE LAST_MDFY_EMP VARCHAR2(12) DEFAULT 'some_emp', -- EMPLOYEE
  • De-normalized fields NAME VARCHAR2(120), -- CONTACT FIRST_NAME VARCHAR2(45), -- CONTACT LAST_NAME VARCHAR2(45), -- CONTACT PHONE VARCHAR2(18), -- ADDRESS PHONE_EXT VARCHAR2(6) -- ADDRESS ) ;

A discussion: this model pertains to a particular type of sales process. Evidently, in this model, a LEAD _must_ have come from a particular company, _must_ be handled by a sales team which in turn _must_ have a contact person in the company (the not-nullable fields). The company not being nullable tells us this must be a B2B sales proces or you'll have to implement it in a very convoluted way (read: non-relational).

The nullable foreign keys are clearly optional. Now we may look at _each_ of them and get into the long discussion of what NULL is:-) Being a bit of a purist myself and knowledgeable of this particular process, I'd say that at least LEAD_TYPE_CODE merits a NOT NULL DEFAULT 'unknown', as the entire follow-up, including pricing, depends on it. Introducing this complication to any of the other fields would overcomplicate the model.

The candidate FK's are a bit more complicated. I consider them all design flaws, for different reasons.

  • The LOGISTICAL_COMP is a field outside the model, which relates a COMPANY to its back-office in a logistical package to which the sales application integrates, giving the pricing model. As such, it has no business in LEAD, rather in QUOTE and QUOTE_DTL, which are the tables holding the FK's to the price model area - you see, I'm reluctanct to call it an entity:-)
  • REAS_WON_CODE and REAS_LOST_CODE are mutually exclusive, this needs either different modelling, certainly a CHECK constraint. They should stay NULL until the deal was either won or lost (info held in LEAD_STAT_CODE), which implies a CHECK constraint to this effect. Btw, this gives you the case in which the optional character of the relation is temporary.
  • REFD_BY (referred by->some person) gives a name, making it impossible to search for the persons details, as such details are not held anywhere.
  • LOCALE_CODE, CREATE_EMP and LAST_MDFY_EMP are straightforward FK's to existing tables in the model and should have been implemented as such.

As for de-normalization, that's a very practical thing that relates to performance: one felt that the fields involved are important enough for the LEAD to hold in the table itself rather then first joining to the respective table to make them available. From a business perspective, a sales rep wants his/her contact name and phone number asap. It's arguably a correct solution.

To conclude: real business processes lead to real life data models. Nullable foreign keys, surrogate PK's (as oposed to natural PK's), CHECK constraints and de-normalizations are facts of life. Understanding the business process and compromising when necessary it's important, or we'll build beautiful models nobody can build on.

Regards,
Radu Received on Sun Oct 21 2001 - 12:39:53 CEST

Original text of this message