Re: Reigniting Probability theory debate

From: <compdb_at_hotmail.com>
Date: Mon, 16 Apr 2012 15:12:27 -0700 (PDT)
Message-ID: <20307219.15.1334614347967.JavaMail.geo-discussion-forums_at_pboo1>

On Tuesday, 10 April 2012 15:04:59 UTC-7, Tegiri Nenashi wrote:
> Several years ago there was a lively discussion here on c.d.t about
> parallels between PT and RT. Most people defended the analogy, while
> some refused to accept it. Here is my take:
>
> http://cstheory.stackexchange.com/questions/10873/is-analogy-between-database-and-probability-concepts-legitimate

Having now read the paper, the situation is as expected. (Except its notion of distribution has a very simple relation representation.)

This paper has nothing to do with the notion of tuples' associated 0-to-1 values that was discussed in comp.databases.theory.

1.
It is a great paper. Although it could be clearer about what it means by "correspond" and "parallel". (Hill also doesn't really understand the relational model: why its values and operators and primitive operator set are what they are.)

The paper defines functions representing relations and functions representing probability mass distributions. The functions have domains that are all possible tuples on some attributes. A relation's indicator function returns 0-or-1 per whether a tuple is absent from or present in the relation. A distribution's probability function returns a 0-to-1 probability. By definition the sum of a probability function over all tuples is 1.

The paper gives certain definitions and theorems with parallel structure in expressions and semantics. The definitions and theorems involve the function representations of relations and distributions. (The parallels are not about tuple-based representations per se.) It does not purport to prove them or to make any other claim than that there are a lot of parallels.

2.
The paper does not associate probabilities with tuples in the relation viewpoint. Its use of an indicator function has nothing to do with probabilities.

The paper would be clearer if the indicator functions were boolean instead of 0-or-1. Because Hill uses multiplication (implicitly expressed by adjacency) as a hack to express conjunction: 0-or-1 TIMES 0-or-1 for F-or-T AND F-or-T. Only the distribution semantics has multiplication.

3.
We can define for a distribution a corresponding "positive" relation whose indicator is zero iff the probability is: its tuples are those the distribution gives positive probabilities. A distribution operator maps its arguments' positive relations to its result's positive relation the way its corresponding relation operator maps those positive relations. (There is a homomorphism from distributions to relations.) The paper doesn't mention this. But it isn't needed by the paper!

Since every distribution has probabilities summing to 1, there is at least one tuple in every distribution, and no distribution has an empty positive relation. This is irrelevant to the paper!

Informally, the fact that the relation theorems are parallel to distribution theorems means that they are either about the values returned by functions or about functions; ie they are about non-empty relations. (Formally, they are vacuously true for empty relations.)

Interestingly the (only) non-emtpy 0-attribute relation, holding the (only) 0-attribute tuple, corresponds to the (only) distribution on the (only) 0-dimensional space, namely the one giving 1 for its (only) point, the (only) 0-dimensional point.

4.
The set of tuples mapped by a distribution to positive values can be represented by a relation. (Namely its positive relation.) So distribution operations that drop indicated attributes (marginals) or that drop tuples per conditions (conditionals) have definitions and theorems reminiscent of those of relation projection and restriction (respectively).

A distribution can be represented by a relation. (Namely the relation got from its positive relation by extending each tuple by an attribute giving its image in the probability function.) So some theorems about distributions as mappings are like theorems about relations as mappings.

A relation can be represented by a relation got from it by extending each tuple by an attribute with the value 1. So some theorems about relations having given attributes plus a 0-to-1 one will be similar for such relations and relations representing distributions. And some theorems will be the very same for the 0-to-1 extra-attribute relation representations of relations and distributions. Put another way, some theorems about functions on given attributes returning 0-to-1 will apply to both relation indicator functions and to distribution probability functions.

5.
Is it not obvious what a distribution union is, ie when an event can be generated from either of two independent distributions on the same variables, and that it would involve addition and division by 2, and its positive relation the union of argument positive relations? (Even though that still has nothing to do with corresponding primitive operator sets.)

6.
I don't know what you mean by "the analogy is weak". (FD vs transformation constraint.) The paper is clear about the "analogy". It gives precise definitions and theorems. Read the paper more closely. Don't jump to conclusions about the paper's claims and then criticize a misinterpretation of them.

In conclusion (1) (a) there is no correspondence set up with relations let alone empty relations; but there's nothing wrong with that and (b) it happens that there is a correspondence between distributions and non-empty relations; but there's nothing wrong with that and (c) the paper does not use "indicator function value as probability"; but absent relation tuples are nevertheless associated with zero; but there's nothing wrong with that and (2) (a) there is no correspondence set up with primitive operator sets; but there is no such correspondence; so there's nothing wrong with that and (b) distribution union has a simple correspondence with relation union. You don't seem to understand the senses of the correspondences and parallels.

philip Received on Tue Apr 17 2012 - 00:12:27 CEST

This message: [ Message body ]
Next message: Sampo Syreeni: "Re: Is the party over?"
Previous message: Troels Arvin: "Re: Ping everybody (was: Is the party over?)"
In reply to: compdb_at_hotmail.com: "Re: Reigniting Probability theory debate"
In reply to Tegiri Nenashi: "Reigniting Probability theory debate"
Next in thread: Vadim Tropashko: "Re: Reigniting Probability theory debate"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message