Re: Newbie - what's all this BCNF stuff anyway?

From: Larry Coon <larry_at_assist.org>
Date: Mon, 18 Nov 2002 15:01:14 -0800
Message-ID: <3DD9713A.4688_at_assist.org>


Seth Brewer wrote:

> Well, I wouldn't use them in a join table, say when I'm joining two tables
> in a many-to-many, but I would use the auto-incremented ID fields from the
> two joining tables as FKs. Why would someone say they are pure evil?

Joe Celko has been one of the most vocal opponents of identity columns. Here is one of his posts to this newsgroup on the subject:

<http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=93de1q%24cjk%241%40nnrp1.deja.com>

But like I said, I'm not entirely in the same camp. Here's one of my posts on the subject, in reply to the above post:

<http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=3A5B5C31.78AF%40assist.org>

> This may just further place my ignorace out on display, but wouldn't the
> WHERE clause in your SQL query address this?

No. Bad designs leave open the possibility for data anomalies. Queries don't fix the designs.

> > Whichever candidate key is chosen as the primary key, the
> > table will be in 3NF (I'll let you verify this for yourself).
> > However, it's not in BCNF because teacher is a determinant
> > but not a candidate key. An example anomaly that can result
> > is when we delete the row (Jones, Physics, Brown) -- we lose
> > the fact that Brown teaches Physics, which is a deletion
> > anomaly.
>
> YIKES! I though this meant it was NOT 3nf.

3NF simply means: A) It's in 2NF; and B) No dependencies among non-key attributes.

The 3NF definition relies on the 2NF definition, which is: A) It's in 1NF, and B) No dependencies exist where a key attribute determines a non-key attribute, where the key attribute is not the entire primary key...

...which relies on the definition of 1NF, which is that all attributes are scalar.

So the CLASS example -is- in 3NF, according to the definition. That's why BCNF was proposed -- to close the loophole.

> Therefore, I would have moved
> teacher into a new table, added the evil auto-incremented primary key and
> created this:
>
> CLASS
> student subject teacherID
> ------- ------- -------
> Smith Math 1
> Smith Physics 2
> Jones Math 1
> Jones Physics 3
>
> TEACHER
> teacherID name
> ------- -------
> 1 White
> 2 Green
> 3 Brown
>
> Am I missing the concept completely? If so, please slap me up side the head
> and send me somewhere that and idiot can go to learn more about this.

Your solution does not fix the problem, it only adds a layer of indirection. The teacherID is simply an alternate name for teacher, but the design of the class table is still the same, except the alternate name for teacher is now used. So the CLASS table is still in 3NF but still not in BCNF.

Again, what happens to the fact that Brown teaches Physics if you delete the row (Jones, Physics, 3) from the CLASS table? You lose that fact. This is a deletion anomaly.

What subject does Smith teach? You can't tell until you have a student enroll in one of Smith's classes. This is an insertion amonaly.

The presence of these anomalies indicate a bad design, which is what normalization is supposed to address. But anomalies can still exist with lower forms of normalization, which is why the higher normal forms exist.

The correct "fix" to the CLASS table would be to decompose it (break it up) like this:

 TEACHER-SUBJECT
teacher subject
------- -------

White     Math
Green     Physics
Brown     Physics


 STUDENT-TEACHER
student teacher
------- -------

Smith     White
Smith     Green
Jones     White
Jones     Green


Larry Coon
University of California
larry_at_assist.org
and lmcoon_at_home.com Received on Tue Nov 19 2002 - 00:01:14 CET

Original text of this message