Re: BCNF and relational algebra

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Tue, 24 Feb 2004 06:14:00 GMT
Message-ID: <IYB_b.7208$aT1.5237_at_newsread1.news.pas.earthlink.net>


Perception wrote:

> "Jonathan Leffler" <jleffler_at_earthlink.net> wrote:

>>What are the unspoken rules about the data?  Can a single patient see
>>more than one doctor on a given date?  Can they see the same doctor
>>several times at the same clinic, in the same ward on the same date?

>
> All yes.

In that case, you don't have a primary key for the schema you outline (unless your interpretation of 'date' includes a sufficiently precise time component). If the 'date' only identifies the day, then the only way to record the fact that the same patient sees the same doctor several times at the same clinic in the same ward on the same date is by using some sort of duplicate record - a possibility which the RM (relational model) denies.

>>Given the example data, it seems we can infer that both clinics 3 and
>>4 have a ward 2.  We can clearly create a 1NF relation by replicating
>>the patient ID, patient name, and ward values for the second row.  We
>>are then missing all the information that would help us deduce how
>>this data can be normalized further.
>>
>>The chances are good that Patient ID --> Patient Name.
>>
>>Every other FD in your initial solution is disputable.  A given doctor
>>may work at several clinics.  There is unlikely to be a rule that once
>>a patient has been admitted to ward 2 at some clinic, thereafter, he
>>must always be admitted to ward 2 at every clinic.  As already
>>discussed in other postings, the unary relation date is not significant. 

>
> I actually mentioned in my first post that a doctor is only associated with
> one clinic, and that patients will not change wards during their stay (and
> that doctor names are unique). These were not an assumption on my part, but
> facts that came with the data. Sorry if I wasn't clearer.

OK - I didn't read that bit carefully enough; sorry.

So, we have an FD that Doctor --> Clinic. We also appear to have an FD that { Patient ID, Date } --> { Clinic, Ward }, though I'm not quite sure about the Clinic/Ward interrelationships.

>>Note that relational databases work very largely by comparing data in
>>relations (tables) on the basis of comparable attributes (columns).
>>Depending on the ideological purity of your RTD (relational theory of
>>data), the column names and types of the attributes (columns) should
>>be the same in the two relations being compared.  In your breakdown
>>into 3 relations, there are no common attributes between any pair of
>>the relations.  As Codd delighted in saying, a database is
>>partitionable into subsets c and C if the relations in c use no common
>>domains (types) with those in C.  Your initial solution is fully
>>partionable into three relations.  As you said in your original post,
>>it is not possible to frame any queries between such disjoint tables.

>
>
> Yes exactly, which is why this is all so confusing. If I left the data in
> 1NF then the FDs would be:
>
> Patient Id, Date, Doctor -> Patient Name

That is, I suppose, an FD, but it is not a minimal FD. The minimal FD is Patient ID --> Patient Name. The Date and Doctor have no effect on things.

> Patient Id, Date, Doctor -> Ward

You have not said that a given doctor only works on one ward at the given clinic.

> Patient Id, Date, Doctor -> Clinic

Again, you have a non-minimal FD; Doctor --> Clinic you said, so the augmented determinant { Patient ID, Date, Doctor } --> Clinic is also an FD, but it is not minimal.

> since (Patient Id, Date, Doctor) must be the primary key.

Not according to what you said at the start of this - a single patient can be visited several times on a given date by a given doctor.

However, I'm willing to think that you misunderstood what I was asking and that this system only records that a given patient saw the given doctor on the given date, but not how many times the doctor actually visited the patient (or, equivalently, that the doctor only sees the patient once on any given date). Under that modified defnition, { Patient ID, Date, Doctor } is the primary key.

> But also I can
> deduce that Patient Id -> Patient Name, and Patient Id -> Ward

No; { Patient ID, Date } --> Ward.

> and finally
> Doctor -> Clinic from the info I mentioned a couple of lines above. Also
> Date on its own doesn't determine anything.

Agreed.

> And basically that was how I
> decomposed my data into BCNF, by having three relations where Patient Id ->
> Patient Name, Ward and Doctor -> Clinic and Date which followed from the
> above observations. Clearly this is BCNF because the determinants are
> candidate keys (and obviously 3NF since full functional dependence and no
> transitivity). The puzzling thing is that I can't make any query on this (as
> mentioned), so either I've missed a functional dependence somewhere or there
> is an alternate way to express the data in BCNF (but I can't see how at
> all). Or perhaps I can make a query if I do a cross product of date with the
> other two relations and then join the results over date--is this
> possible/make sense (since the cross product operaton does not require
> common attributes)?

In another posting, you said:
> The original un-normalised datebase had tuples of the form:
>
> Patient Id, Patient Name, Ward, Date, Clinic, Doctor
>
> with data of the form, for instance
>
> 3842, Smith, 2, 6.7.03, 4, Anderson
> 9.9.03, 3, Haswell
>
> corresponding to the above attributes for one particular patient.

So, we have:

{ Patient ID, Date, Doctor } --> { Patient Name, Ward, Clinic }

But this is nowhere near BCNF because there are all sorts of other FDs too. Notably:

PATIENT:       { Patient ID } --> { Patient Name }
DOCTOR:        { Doctor } --> { Clinic }
PATIENT_VISIT: { Patient ID, Date } --> { Ward }

There is still an FD associating Patient, Date and Doctor: DOCTOR_VISIT: { Patient ID, Date, Doctor } --> { }

I'm not clear about how clinics and wards are related, if at all.

Given the four relations implied by the FDs named above (primary key on the LHS of the FD), it is possible to reconstruct your original data. There might also be some extra unidentified constraints related to clinics and wards - there's also a constraint that the Patient ID, Date combination in DOCTOR_VISIT must appear in PATIENT_VISIT; this is a foreign key constraint.

PATIENT:       3842 --> Smith
DOCTOR:        Haswell --> 3
                Anderson --> 4
PATIENT_VISIT: 3842, 6.7.03 --> 2
                3842, 9.9.03 --> 3
DOCTOR_VISIT:  3842, 6.7.03, Anderson --> { }
                3842, 9.9.03, Haswell  --> { }

These four relations can be joined on their various sets of common columns to recreate the original data. Oh well, I guess that sorts out your homework for you...

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
Received on Tue Feb 24 2004 - 07:14:00 CET

Original text of this message