Re: Reigniting Probability theory debate

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Mon, 16 Apr 2012 16:46:33 -0700 (PDT)
Message-ID: <232b316e-9a4e-4d89-812e-1423e97249a4_at_do4g2000vbb.googlegroups.com>


On Apr 17, 1:12 am, com..._at_hotmail.com wrote:

I would seriously consider looking at the cloud of articles revolving around this one: http://dl.acm.org/citation.cfm?id=588011.588050 . I mean, those "weak multivalued dependencies" and their ilk -- in two precise articles which I cannot find, whose author I don't remember, and which I don't have access to anymore thanks to being laid off -- basically prove that fourth normal form in database normalization is equivalent to the natural factorization within Bayesian probabilistic networks.

The only difference is that you add an extra attribute to your database which denotes the basic marginal probability of your tuple being true, and then follow the natural rules of probability calculus when taking joins or Cartesian products: the rest of the columns become attributes/selection conditions upon the state of a closed world, any union of such states becomes a sum on the probability column, any intersection becomes a minus, and any join becomes a multiplication+sum.
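To make the "join becomes a multiplication+sum" rule concrete, here is a minimal Python sketch. It is an illustration only, not code from the papers mentioned. The relation contents, the attribute names `rain` and `wet`, and the helpers `join` and `project` are all hypothetical. It also assumes that the second relation carries a conditional probability, P(wet | rain), in its probability column, as a Bayesian network factor would.

```python
from collections import defaultdict

def join(r, s):
    """Natural join of two probabilistic relations.

    Rows match when they agree on all shared attributes;
    the probability of a joined row is the product of its parts.
    """
    out = []
    for row_r, p_r in r:
        for row_s, p_s in s:
            shared = row_r.keys() & row_s.keys()
            if all(row_r[k] == row_s[k] for k in shared):
                out.append(({**row_r, **row_s}, p_r * p_s))
    return out

def project(r, attrs):
    """Projection onto attrs; rows that collapse together have their probabilities summed."""
    acc = defaultdict(float)
    for row, p in r:
        acc[tuple((a, row[a]) for a in attrs)] += p
    return [(dict(key), p) for key, p in acc.items()]

# Hypothetical relation for P(rain): one row per state, plus a probability column.
rain = [({"rain": True}, 0.2), ({"rain": False}, 0.8)]

# Hypothetical relation for P(wet | rain) (assumed conditional, not marginal).
wet_given_rain = [
    ({"rain": True, "wet": True}, 0.9),
    ({"rain": True, "wet": False}, 0.1),
    ({"rain": False, "wet": True}, 0.1),
    ({"rain": False, "wet": False}, 0.9),
]

joint = join(rain, wet_given_rain)      # multiplication: P(rain, wet)
marginal_wet = project(joint, ["wet"])  # sum: P(wet)

for row, p in marginal_wet:
    print(row, round(p, 4))
```

The join plays the role of the product of factors, and the projection that follows it is the marginalization step, so "join then project" amounts to multiplication+sum.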

The first paper in the series shows that normalizing something to 4NF, with the probability column present, is consistent with the resulting database representing a base form Bayesian network. The second one responds to criticism where it was claimed that the RM-BN analogy leads to a contradiction. Its argument is essentially the same one that was in its time used to show that not all universal relations decompose as a natural join of their projections onto their constituent, smaller relations. Only in this case the argument was played in something of a reverse: since Bayesian networks, by their structure, can never lead to the kinds of problems described with respect to the relational model, we can derive an easy contradiction from the supposed claim of inequivalence.
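For readers who have not seen the decomposition argument alluded to here, the following Python sketch shows the standard lossless-join check behind 4NF. It is not the papers' proof, only the textbook fact it builds on: a relation equals the natural join of two of its projections exactly when the corresponding multivalued dependency holds. The relation contents and the helper names `project` and `rejoin` are hypothetical.

```python
def project(rel, idx):
    """Project a set of tuples onto the given column positions."""
    return {tuple(t[i] for i in idx) for t in rel}

def rejoin(rel):
    """Split (course, teacher, book) into (course, teacher) and (course, book),
    then take the natural join of the two projections on course."""
    ct = project(rel, (0, 1))
    cb = project(rel, (0, 2))
    return {(c, t, b) for (c, t) in ct for (c2, b) in cb if c == c2}

# Teachers and books vary independently per course: course ->> teacher holds.
independent = {
    ("db", "ann", "date"),
    ("db", "ann", "ullman"),
    ("db", "bob", "date"),
    ("db", "bob", "ullman"),
}

# Teacher and book are tied together: the multivalued dependency fails.
dependent = {
    ("db", "ann", "date"),
    ("db", "bob", "ullman"),
}

print(rejoin(independent) == independent)  # lossless decomposition
print(rejoin(dependent) == dependent)      # join adds spurious tuples
print(sorted(rejoin(dependent) - dependent))
```

The first relation is the relational counterpart of two attributes being conditionally independent given the course; the second is the case where the factorization does not reproduce the original.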

Seriously, look it up, even if I can't find a proper reference to the first article. It has something to do with 4NF, Bayesianism, weak multivalued normal forms, it's only about two little-known articles with a single author in both, it's probably something that was published by the ACM, and so on. But I at least thought it was a beautiful result, if superficially trivial, when I saw it. At the deeper level it tells us something about why that "independent component" and "normalization" business happened in the first place, in the history of the development of the relational model.

Received on Tue Apr 17 2012 - 01:46:33 CEST
