Re: BCNF and relational algebra

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Mon, 23 Feb 2004 04:36:54 GMT
Message-ID: <Grf_b.5791$aT1.1280_at_newsread1.news.pas.earthlink.net>


Perception wrote:
> I have the following relations (expressed as functional dependencies) which
> I consider to be in BCNF (please correct me if I'm wrong):
>
> Patient Identifier -> Patient Name, Ward
> Date
> Doctor -> Clinic

And later, Perception also wrote:

> Sorry for the misunderstanding. Specific dates correspond to when a patient
> visited a doctor. Thus if a date is in the relation, it means a patient had
> seen a doctor/attended a clinic on that date.
>
> The original un-normalised datebase had tuples of the form:
>
> Patient Id, Patient Name, Ward, Date, Clinic, Doctor
>
> with data of the form, for instance
>
> 3842, Smith, 2, 6.7.03, 4, Anderson
> 9.9.03, 3, Haswell
>
> corresponding to the above attributes for one particular patient.

Your BCNF breakdown bears little or no resemblance to the original data - you have lost lots of information, but the decomposition process is supposed to be lossless.

What are the unspoken rules about the data? Can a single patient see more than one doctor on a given date? Can they see the same doctor several times at the same clinic, in the same ward on the same date? Given the example data, it seems we can infer that both clinics 3 and 4 have a ward 2. We can clearly create a 1NF relation by replicating the patient ID, patient name, and ward values for the second row. We are then missing all the information that would help us deduce how this data can be normalized further.

The chances are good that Patient ID --> Patient Name.

Every other FD in your initial solution is disputable. A given doctor may work at several clinics. There is unlikely to be a rule that once a patient has been admitted to ward 2 at some clinic, thereafter, he must always be admitted to ward 2 at every clinic. As already discussed in other postings, the unary relation date is not significant.

You still have not given us all the information we need to do your homework for you, I'm afraid.

Note that relational databases work very largely by comparing data in relations (tables) on the basis of comparable attributes (columns). Depending on the ideological purity of your RTD (relational theory of data), the column names and types of the attributes (columns) should be the same in the two relations being compared. In your breakdown into 3 relations, there are no common attributes between any pair of the relations. As Codd delighted in saying, a database is partitionable into subsets c and C if the relations in c use no common domains (types) with those in C. Your initial solution is fully partionable into three relations. As you said in your original post, it is not possible to frame any queries between such disjoint tables.

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
Received on Mon Feb 23 2004 - 05:36:54 CET

Original text of this message