Re: How to normalize this?
Date: Sat, 11 May 2013 12:20:14 +0200
Message-ID: <518e1b5e$0$6092$e4fe514c_at_dreader36.news.xs4all.nl>
On 2013-05-10 15:34:34 +0000, Jan Hidders said:
> On 2013-05-10 11:24:35 +0000, nvitacolonna_at_gmail.com said:
>
>>> >>> The old goal was to remove redundancy, something we can> define exactly >>> and could always achieve, albeit sometimes at the cost> of >>> inter-relational dependencies, but with INDs there's always a little> >>> bit of redundancy that we cannot really remove and probably don't want> >>> to. >> >> Indeed. >> >>> First let me define what I mean with redundancy: a database schema> >>> allows redundancy if there is an instance of that schema where there >>> is> a a particar field of a tuple that is derivable from the remainder >>> of> the instance. In other words, if we replace the value of the field >>> with> a variable x the we can derive the value of x. I will call a >>> schema>that does not allow redundancy, redundancy-free. >> >> Your definition is the so-called “value redundancy”, which >> characterizes BCNF. It has already been proved that any database schema >> with non trivial INDs cannot be devoid of value redundancy (see Levene >> & Millist's paper, Lemma 4.2). In that same paper, they give a more >> appropriate definition of redundancy and they prove that IDNF >> (inclusion dependency normal form) is equivalent to avoiding that kind >> of redundancy (a database schema in IDNF also avoids insertion and >> modification anomalies, in a sense that can be precisely defined). So, >> I think that this direction has been already explored and, unless one >> disagrees with the definitions, IDNF captures the salient notions >> adequately. >> >> For completeness, a database schema with a set of FDs F and a set of >> INDs I is in IDNF iff it is in BCNF with respect to F and the INDs >> represent acyclic referential integrity constraints (where the targets >> are keys, not super-keys).
>
> You are right (I have looked in the past at that paper, but apparently
> had forgotten about the details) but I don't agree that this is closed
> subject. By no means. Note that I am in a sense trying to generalize
> what they did for larger classes of dependencies. I have to do that
> since I want to address the questions of achievabality of the normal
> forms (under liberal/semi-conservative/conservative normalization, as I
> called them). Under conservative normalization it can happen that
> starting from a schema with FDs and INDs I end up with a schema with
> more complex TGDs then just INDs. Therefore I need to be able to take
> those into account. What is also missing in their work is a
> characterization of when you can get to IDNF, and an algorithm that
> when this is possible gets you there. I realized that strictly speaking
> such an algorithm will not exist because of the undecidability of
> reasoning over the dependencies in question, but it would still be
> interesting to see if it exists while assuming an oracle that can do
> the reasoning for you. It might be practical for certain restricted
> sets of dependencies where inference is computable. Maybe this is the
> case, for example for the dependencies implied by conservative
> normalization when starting out with only FDs.
>
> Note that the type of analyis I want to do is the one that tells you
> for example that even if you start with just FDs then (generalized)
> RFNF / BCNF is not always reachable, assuming you conservatively
> normalize. That was actually a bit of a suprise to me. I always assumed
> you could reach a redundancy-free normal form if you were willing to
> pay the price of a non-local dependency, but that is actually not true.
> So what about the cases where you can? Can I use the usual "split of
> violating dependencies" strategy without backtracking? It's that type
> of question that I'd like to investigate.
Answer: yes, the choice of dependency matters. Example:
R{A, B, C, D} with { B -> C, C -> D }
Verify this is not in BCNF and so also not in gRFNF(*). Suppose I start with splitting of B -> D (which of course also holds), then I get:
R1{B, D} R2{B, C} R3{A, B}
But this still allows redundancy because of B -> C (now translated to the corresponding EGD in this schema), and so is not in gRFNF. Yet, there is a better form that does not have this problem:
R1{B, C} R2{C, D} R3{A, B}
Conjecture: if you start with a single relation with a set of FDs then you can reach gRFNF while conservatively normalizing iff the Bernstein synthesis algorithm gets you there.
(*) By gRFNF I mean my generalzation of RFNF as presented earlier.
- Jan Hidders