Re: Newbie - what's all this BCNF stuff anyway?

From: Larry Coon <larry_at_assist.org>
Date: Tue, 19 Nov 2002 09:13:38 -0800
Message-ID: <3DDA7142.765D_at_assist.org>


Seth Brewer wrote:

> O.K. I'm starting to come around...
>
> > The correct "fix" to the CLASS table would be to decompose
> > it (break it up) like this:
>
> > TEACHER-SUBJECT
> > teacher subject
> > ------- -------
> > White Math
> > Green Physics
> > Brown Physics
> >
> >
> > STUDENT-TEACHER
> > student teacher
> > ------- -------
> > Smith White
> > Smith Green
> > Jones White
> > Jones Green
>
> Yes, I didn't think that through far enough. So now it's in 3nf AND BCNF?

Yes. In TEACHER-SUBJECT the functional dependency ("FD" from now on) is teacher --> subject. It's in 1NF because everything is scalar. It's a simple PK, so it's in 2NF by default. There are no FDs among non-key attributes, so it's in 3NF. And there is only one candidate key, so it's in BCNF by default.

For STUDENT-TEACHER, the same thing applies. The FD is student --> teacher, and the normalization works out exactly the same.

> I read the previous posts you sent, and I understand you are not
> anti-identity field, but I'm still left with some confusion...
>
> 1. True or false: Our example would be 3nf, but would NOT be BCNF if we had
> used identities because there would be another table involved (SUBJECT (ID,
> Subject)) and we would not be using natural keys.

I think you're lumping two different subjects together when you shouldn't be. The issues of normalization are separate and apart from the issuess of identity columns. In the previous CLASS table, you could use ALL surrogate keys and be in the same dilemma:

          CLASS
studID subID teachID
------- ------- -------

123      10       999
123      20       998
456      10       999
456      20       997

Here every column is replaced by a surrogate key, yet this is still not in BCNF. The FDs are the same, and so are the candidate keys, so the normal form is the same as well. Surrogate keys (such as identity values) are simply a different topic.

> 2. As it is, what if another teacher named White joins the faculty? (this is
> what I thought was a big part of the reason for identity fields.)

Sure, but again, this is a different topic. Here are three versions of the CLASS table, all of which are in 3NF but not BCNF:           CLASS1
student subject teacher
------- ------- -------
Smith Math White
Smith Physics Green
Jones Math White
Jones Physics Brown

          CLASS2
student subject teach_ln teach_fn
------- ------- -------- --------

Smith    Math     White     Jim
Smith    Physics  Green     John
Jones    Math     White     Jim
Jones    Physics  Brown     Joe

          CLASS

student subject teachID
------- ------- -------
Smith Math 999
Smith Physics 998
Jones Math 999
Jones Physics 997

Again, the identity field addresses a specific issue (unique identification), as does using teach_ln + teach_fn (unless two teachers have the same first + last), but this is not the same issue as the normalization issue.

Let's go back to the fully-surrogated version of the CLASS table:

          CLASS
studID subID teachID
------- ------- -------

123      10       999
123      20       998
456      10       999
456      20       997

Now let's fill in the related tables:

    STUDENT
studID student
------ -------

123     Smith
456     Jones

    SUBJECT
subID subject
----- -------

10     Math
20     Physics

    TEACHER
teachID teach_ln teach_fn
------- -------- --------

999      White     Jim
998      Green     John
997      Brown     Joe
996      Smith     Judy

Note that I added another teacher, Smith. What subject does she teach? You can't tell unless there's a row in the CLASS table in which 996 (Smith) appears. But you can't have a row in the CLASS table until a student enrolls in Smith's class. So a fact (the subject Smith teaches) can't be represented in this database unless something unrelated happens. That's an insertion anomaly. That's why we need BCNF.

> 3. Isn't it better to have something small like "123" in three places than
> something like "Johnjakobjinglehiemerschmidt" in two?

Yes, brevity is one argument for surrogate keys.

> 4. Could I (loosely) define BCNF as: The use of "natural" keys alone or in
> conjuction to establish a unique identity and/or relationship?

No. It doesn't have anything to do with "natural" keys, and unique identity is something you need anyway, but is a separate and distinct issue. If you're looking for a way to describe BCNF in as few words as possible, you might go with "No overlapping composite candidate keys."

> 5. Should I lose sleep over this?

Depends on what you're using you data for. Is it used in conjunction with a life-support system, where a data anomaly has the potential to result in a loss of life? Is it used in conjunction with a heavy-use financial system, where an anomaly can cost your company $1 million, for which someone's head will roll?

That being said, BCNF iteself isn't that big of an issue, because most 3NF designs are already in BCNF by default. Not that many tables have multiple, overlapping candidate keys. It's pretty easy to recognize it when it comes up, and remind yourself "don't do that."

Larry Coon
University of California
larry_at_assist.org
and lmcoon_at_home.com Received on Tue Nov 19 2002 - 18:13:38 CET

Original text of this message