Re: How to normalize this?

From: <nvitacolonna_at_gmail.com>
Date: Fri, 10 May 2013 04:24:35 -0700 (PDT)
Message-ID: <8ea50cab-aefe-47c8-9ff7-1ab190ba2f43_at_googlegroups.com>


> The old goal was to remove redundancy, something we can
> define exactly and could always achieve, albeit sometimes at the cost
> of inter-relational dependencies, but with INDs there's always a little
> bit of redundancy that we cannot really remove and probably don't want
> to.

Indeed.

> First let me define what I mean with redundancy: a database schema
> allows redundancy if there is an instance of that schema where there is
> a a particar field of a tuple that is derivable from the remainder of
> the instance. In other words, if we replace the value of the field with
> a variable x the we can derive the value of x. I will call a schema
>that does not allow redundancy, redundancy-free.

Your definition is the so-called “value redundancy”, which characterizes BCNF. It has already been proved that any database schema with non trivial INDs cannot be devoid of value redundancy (see Levene & Millist's paper, Lemma 4.2). In that same paper, they give a more appropriate definition of redundancy and they prove that IDNF (inclusion dependency normal form) is equivalent to avoiding that kind of redundancy (a database schema in IDNF also avoids insertion and modification anomalies, in a sense that can be precisely defined). So, I think that this direction has been already explored and, unless one disagrees with the definitions, IDNF captures the salient notions adequately.

For completeness, a database schema with a set of FDs F and a set of INDs I is in IDNF iff it is in BCNF with respect to F and the INDs represent acyclic referential integrity constraints (where the targets are keys, not super-keys).

Despite its nice properties, this normal form may be too restrictive in two respects: first, cyclic dependencies arise in practice (the authors, at the end of the paper, propose to relax IDNF by allowing INDs that express “pairwise consistency”, as in your example); second, are designs with INDs that are not key-based necessarily “bad”? One motivation for restricting INDs that way is that those conditions imply that FDs and IND do not interact (in particular, the keys of each schema can be determined based only on the FDs on that schema).  

> Clearly the goal is not being completely redundancy-free, although it
> is of course better if we can achieve that.

Definitely.

> So we are maximally normalized if:
> 1. There are no columns that can be omitted, and
> 2. The schema is redundancy-free if we consider all derivable
> dependencies except the INDs.

It would be interested to verify how this compares with the above. I am not sure whether I understand exactly what you mean by 2.

> Final note on the type of operations we can use when normalizing; I
> suspect we should also allow equi-joins, or actualy any conjunctive
> query.

So, your starting point is an arbitrary set of schemas instead of one, to which you add FDs and inter-relational constraints as per your requisites, and then you apply some rule to split as well as merge the schemas. Sounds challenging :) Btw, is there any intrinsic difference in considering several schemas instead of a single universal schema, as far as this process is concerned?

Nicola Received on Fri May 10 2013 - 13:24:35 CEST

Original text of this message