Re: How to normalize this?

From: Jan Hidders <hidders_at_gmail.com>
Date: Sat, 11 May 2013 12:20:14 +0200
Message-ID: <518e1b5e$0$6092$e4fe514c_at_dreader36.news.xs4all.nl>


On 2013-05-10 15:34:34 +0000, Jan Hidders said:

> On 2013-05-10 11:24:35 +0000, nvitacolonna_at_gmail.com said:
>

>>> 
>>> The old goal was to remove redundancy, something we can> define exactly 
>>> and could always achieve, albeit sometimes at the cost> of 
>>> inter-relational dependencies, but with INDs there's always a little> 
>>> bit of redundancy that we cannot really remove and probably don't want> 
>>> to.
>> 
>> Indeed.
>> 
>>> First let me define what I mean with redundancy: a database schema> 
>>> allows redundancy if there is an instance of that schema where there 
>>> is> a a particar field of a tuple that  is derivable from the remainder 
>>> of> the instance. In other words, if we replace the value of the field 
>>> with> a variable x the we can derive the value of x. I will call a 
>>> schema>that does not allow redundancy, redundancy-free.
>> 
>> Your definition is the so-called “value redundancy”, which 
>> characterizes BCNF. It has already been proved that any database schema 
>> with non trivial INDs cannot be devoid of value redundancy (see Levene 
>> & Millist's paper, Lemma 4.2). In that same paper, they give a more 
>> appropriate definition of redundancy and they prove that IDNF 
>> (inclusion dependency normal form) is equivalent to avoiding that kind 
>> of redundancy  (a database schema in IDNF also avoids insertion and 
>> modification anomalies, in a sense that can be precisely defined). So, 
>> I think that this direction has been already explored and, unless one 
>> disagrees with the definitions, IDNF captures the salient notions 
>> adequately.
>> 
>> For completeness, a database schema with a set of FDs F and a set of 
>> INDs I is in IDNF iff it is in BCNF with respect to F and the INDs 
>> represent acyclic referential integrity constraints (where the targets 
>> are keys, not super-keys).

>
> You are right (I have looked in the past at that paper, but apparently
> had forgotten about the details) but I don't agree that this is closed
> subject. By no means. Note that I am in a sense trying to generalize
> what they did for larger classes of dependencies. I have to do that
> since I want to address the questions of achievabality of the normal
> forms (under liberal/semi-conservative/conservative normalization, as I
> called them). Under conservative normalization it can happen that
> starting from a schema with FDs and INDs I end up with a schema with
> more complex TGDs then just INDs. Therefore I need to be able to take
> those into account. What is also missing in their work is a
> characterization of when you can get to IDNF, and an algorithm that
> when this is possible gets you there. I realized that strictly speaking
> such an algorithm will not exist because of the undecidability of
> reasoning over the dependencies in question, but it would still be
> interesting to see if it exists while assuming an oracle that can do
> the reasoning for you. It might be practical for certain restricted
> sets of dependencies where inference is computable. Maybe this is the
> case, for example for the dependencies implied by conservative
> normalization when starting out with only FDs.
>
> Note that the type of analyis I want to do is the one that tells you
> for example that even if you start with just FDs then (generalized)
> RFNF / BCNF is not always reachable, assuming you conservatively
> normalize. That was actually a bit of a suprise to me. I always assumed
> you could reach a redundancy-free normal form if you were willing to
> pay the price of a non-local dependency, but that is actually not true.
> So what about the cases where you can? Can I use the usual "split of
> violating dependencies" strategy without backtracking? It's that type
> of question that I'd like to investigate.

Answer: yes, the choice of dependency matters. Example:

R{A, B, C, D} with { B -> C, C -> D }

Verify this is not in BCNF and so also not in gRFNF(*). Suppose I start with splitting of B -> D (which of course also holds), then I get:

R1{B, D} R2{B, C} R3{A, B}

But this still allows redundancy because of B -> C (now translated to the corresponding EGD in this schema), and so is not in gRFNF. Yet, there is a better form that does not have this problem:

R1{B, C} R2{C, D} R3{A, B}

Conjecture: if you start with a single relation with a set of FDs then you can reach gRFNF while conservatively normalizing iff the Bernstein synthesis algorithm gets you there.

(*) By gRFNF I mean my generalzation of RFNF as presented earlier.

  • Jan Hidders
Received on Sat May 11 2013 - 12:20:14 CEST

Original text of this message