Re: Newbie - what's all this BCNF stuff anyway?

From: Seth Brewer <fieury_at_hotmail.com>
Date: Tue, 19 Nov 2002 20:25:25 GMT
Message-ID: <V6xC9.53742$P31.24986_at_rwcrnsc53>


O.K. I think I'm begining to see the light. Thanks for sticking with me! I thought I might be mixing up concepts! I'm going to start watching for this and using it in practice when I recognize it.

Here is the way I would have approached the data we have defined. If you are a believer in the evil of identies it is a nightmare, but I believe it to be 3nf and BCNF (there are no overlapping CKs). I will warrent that it makes things more complex and scripting querys might be a little scary, but I feel more comfortable with it because it eliminates the possibilty of encountering two teachers with the same first and last name. I don't see any way to define PKs and CKs within this and eliminate any of the identity columns. Of course if the teachers and students had SSNs in their tuples then we could eliminate the identity field in those tables (am I anywhere near getting it?)

TEACHERS

TchID        name
----            ----
1                White
2                Green
3                Brown

STUDENTS
StuID        name
----            ----
1                Smith
2                Jones

SUBJECTS
SubID        name
----            ----
1                Math
2                Physics

JOIN_TEACHERS_SUBJECTS
TSID        TchID        SubID
----            ----            ----
1                1               1
2                2               2
3                3               2

JOIN_STUDENTS
StuID        TSID
----            ----
1                1
1                2
2                1
2                3

Again, thanks so much Larry, this has been very enlightening.

Seth

"Larry Coon" <larry_at_assist.org> wrote in message news:3DDA7142.765D_at_assist.org...
> Seth Brewer wrote:
>
> > O.K. I'm starting to come around...
> >
> > > The correct "fix" to the CLASS table would be to decompose
> > > it (break it up) like this:
> >
> > > TEACHER-SUBJECT
> > > teacher subject
> > > ------- -------
> > > White Math
> > > Green Physics
> > > Brown Physics
> > >
> > >
> > > STUDENT-TEACHER
> > > student teacher
> > > ------- -------
> > > Smith White
> > > Smith Green
> > > Jones White
> > > Jones Green
> >
> > Yes, I didn't think that through far enough. So now it's in 3nf AND
BCNF?
>
> Yes. In TEACHER-SUBJECT the functional dependency ("FD" from
> now on) is teacher --> subject. It's in 1NF because everything
> is scalar. It's a simple PK, so it's in 2NF by default. There
> are no FDs among non-key attributes, so it's in 3NF. And there
> is only one candidate key, so it's in BCNF by default.
>
> For STUDENT-TEACHER, the same thing applies. The FD is
> student --> teacher, and the normalization works out exactly
> the same.
>
> > I read the previous posts you sent, and I understand you are not
> > anti-identity field, but I'm still left with some confusion...
> >
> > 1. True or false: Our example would be 3nf, but would NOT be BCNF if we
had
> > used identities because there would be another table involved (SUBJECT
(ID,
> > Subject)) and we would not be using natural keys.
>
> I think you're lumping two different subjects together when you
> shouldn't be. The issues of normalization are separate and
> apart from the issuess of identity columns. In the previous
> CLASS table, you could use ALL surrogate keys and be in the
> same dilemma:
>
> CLASS
> studID subID teachID
> ------- ------- -------
> 123 10 999
> 123 20 998
> 456 10 999
> 456 20 997
>
> Here every column is replaced by a surrogate key, yet this
> is still not in BCNF. The FDs are the same, and so are the
> candidate keys, so the normal form is the same as well.
> Surrogate keys (such as identity values) are simply a
> different topic.
>
> > 2. As it is, what if another teacher named White joins the faculty?
(this is
> > what I thought was a big part of the reason for identity fields.)
>
> Sure, but again, this is a different topic. Here are three
> versions of the CLASS table, all of which are in 3NF but not
> BCNF:
>
> CLASS1
> student subject teacher
> ------- ------- -------
> Smith Math White
> Smith Physics Green
> Jones Math White
> Jones Physics Brown
>
>
> CLASS2
> student subject teach_ln teach_fn
> ------- ------- -------- --------
> Smith Math White Jim
> Smith Physics Green John
> Jones Math White Jim
> Jones Physics Brown Joe
>
> CLASS
> student subject teachID
> ------- ------- -------
> Smith Math 999
> Smith Physics 998
> Jones Math 999
> Jones Physics 997
>
> Again, the identity field addresses a specific issue (unique
> identification), as does using teach_ln + teach_fn (unless
> two teachers have the same first + last), but this is not
> the same issue as the normalization issue.
>
> Let's go back to the fully-surrogated version of the CLASS
> table:
>
> CLASS
> studID subID teachID
> ------- ------- -------
> 123 10 999
> 123 20 998
> 456 10 999
> 456 20 997
>
> Now let's fill in the related tables:
>
> STUDENT
> studID student
> ------ -------
> 123 Smith
> 456 Jones
>
> SUBJECT
> subID subject
> ----- -------
> 10 Math
> 20 Physics
>
> TEACHER
> teachID teach_ln teach_fn
> ------- -------- --------
> 999 White Jim
> 998 Green John
> 997 Brown Joe
> 996 Smith Judy
>
> Note that I added another teacher, Smith. What subject does
> she teach? You can't tell unless there's a row in the CLASS
> table in which 996 (Smith) appears. But you can't have a row
> in the CLASS table until a student enrolls in Smith's class.
> So a fact (the subject Smith teaches) can't be represented in
> this database unless something unrelated happens. That's an
> insertion anomaly. That's why we need BCNF.
>
> > 3. Isn't it better to have something small like "123" in three places
than
> > something like "Johnjakobjinglehiemerschmidt" in two?
>
> Yes, brevity is one argument for surrogate keys.
>
> > 4. Could I (loosely) define BCNF as: The use of "natural" keys alone or
in
> > conjuction to establish a unique identity and/or relationship?
>
> No. It doesn't have anything to do with "natural" keys, and
> unique identity is something you need anyway, but is a separate
> and distinct issue. If you're looking for a way to describe
> BCNF in as few words as possible, you might go with "No
> overlapping composite candidate keys."
>
> > 5. Should I lose sleep over this?
>
> Depends on what you're using you data for. Is it used in
> conjunction with a life-support system, where a data anomaly
> has the potential to result in a loss of life? Is it used
> in conjunction with a heavy-use financial system, where an
> anomaly can cost your company $1 million, for which someone's
> head will roll?
>
> That being said, BCNF iteself isn't that big of an issue,
> because most 3NF designs are already in BCNF by default. Not
> that many tables have multiple, overlapping candidate keys.
> It's pretty easy to recognize it when it comes up, and remind
> yourself "don't do that."
>
>
> Larry Coon
> University of California
> larry_at_assist.org
> and lmcoon_at_home.com
Received on Tue Nov 19 2002 - 21:25:25 CET

Original text of this message