Re: Proposal: 6NF

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Thu, 5 Oct 2006 20:36:34 +0300
Message-ID: <Pine.SOL.4.62.0610051811000.18113_at_kruuna.helsinki.fi>


On 2006-10-05, Roy Hann wrote:

> I always suspect an erroneous intention when I seem to need an outer
> join. Animal is a supertype of bird. Why should I expect to be able to
> ask how many eggs a camel lays?

Suppose you want a list of all animals which lay few to no eggs, with no eggs taken to include the inapplicability of the whole concept of laying anything. Furthermore, at the time your code is expected to be finalized, you only know that there will be animals, some of which lay eggs, but you don't know which kinds of properties all of the animals eventually described in the database might have. Structural multiple inheritance in its full generality is a factor, so your DBA insists that the joint presence and absence of attributes must be efficiently enforceable. At the current moment it must be that any animal which lays eggs also has a known wingspan, but unbeknownst to you, two years down the road a second kind of egg-laying animal will enter the database: the one with a known extinction date but no wingspan. Eventually your listing has to include both kinds of animals because both do lay eggs, but your code can no longer be changed at the time it finally dawns on the data architect that extinct species have to be included as well.

Evidently you can make all of this work by querying a union view of all animals, with nulls for the eggs_laid attribute for those species that don't lay anything, from the start.

You cannot implement the view without the nulls because the laying stuff is not applicable to all animals, and you want both the animals which lay few eggs and those to which the concept of laying eggs is inapplicable. You cannot factor out the number of eggs laid into its own relation because then it would be difficult to enforce the constraint that it occurs in conjunction with either wingspan or extinction date, but not both; each indivisible group of attributes has to live in a single relation, while partial overlaps between the sets of attributes possessed by each type can cause each individual attribute to live in multiple relations at the same time. Hence, at the time you compile your routine, you cannot know the name of the relation eventually containing the extinct species, so you cannot work around the problem by taking a union of projections in your code; in the interests of logical data independence, the union has to be on the database side. In principle you could use two views, one summing up all the animals with eggs_laid and one summing up all the ones without, but this approach seems rather inelegant to me, if applied more than once. Finally, it just so happens that your DBMS/DML doesn't support introspection, so that you cannot find all the relations containing the eggs_laid attribute dynamically, by catalog lookup.

The example is somewhat contrived, I know, and I'm not completely sure one couldn't enforce the typing constraints in some less cumbersome fashion. But still, I find this sort of thing interesting, if not else then because it gives a nontrivial example of horizontal splitting.

> Usually this kind of idiocy appears when using crappy tools that
> simply assume you're OK about outer joins and their consequent spray
> of nulls, instead of solving the problem properly. Report generators
> are a particular object of my loathing in this respect.

Agreed. Nulls aren't exactly an ease-of-use feature.

-- 
Sampo Syreeni, aka decoy - mailto:decoy_at_iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Thu Oct 05 2006 - 19:36:34 CEST

Original text of this message