Re: Constraints and Functional Dependencies

From: Marshall <marshall.spight_at_gmail.com>
Date: 26 Feb 2007 00:07:06 -0800
Message-ID: <1172477226.371735.142330_at_m58g2000cwm.googlegroups.com>


On Feb 25, 9:56 am, paul c <toledobythe..._at_oohay.ac> wrote:
> Marshall wrote:
> > ...
> > Hmmm. Can we express keyness otherwise? I can't think how.
> > ...
>
> I've long imagined that Codd would have defined it (as you also did) via
> a cartesian product but some others like to use cardinality, eg., adding
> a "count" operation to their theory. My guess is that they see this as
> something many implementations would want anyway.

I don't know why exactly but I have this vague aversion to using aggregate operators in constraints. I think I'm doing premature optimization, and should probably slap my own wrist for it, but I can't shake the feeling that given two ways to write a constraint, one using an aggregate, and one not, one should use the form without the aggregate.

Of course, I can imagine how enforcing constraints with aggregates could be trivially easy. And aggregates, especially count, being as transparent as they are, I'm probably doubly in the wrong.

Now that you mention it, a constraint that said that the count of a relation projected over the candidate key attributes equals the count of the unprojected relation would enforce uniqueness; that is, would also serve as a key constraint.

(Happily I managed to avoid the word "keyness" in this post.)

> I can't guess whether Codd would have also required a RENAME operator or
> whether he would have stuck with the math approach of numbering
> attributes for explanatory purposes.

The question of naming vs. positioning crops up various places in computer science. I'm specifically thinking of Backus's 1977 Turing award paper, "Can Programming Be Liberated from the Von Neumann Style?" which advocates for what is sometimes called "point free style" or coding without names, a la APL. I have to say I found the advocacy part of the paper totally unconvincing. My current thinking is names are the better choice about 97-98% of the time. The big advantage of point free style is being able to write things like:

  avg = sum / count

instead of the supposedly-cumbersome:

  avg(x) = sum(x) / count()

Woo hoo!

Nonetheless there are some very few cases where inferring or avoiding names entirely seems beneficial.

In fact, there is even some tiny support for this mentioned in TTM:

"If the <agg op name> is COUNT, <attribute ref> is irrelevant and must be omitted; otherwise, it can be omitted if and only if the <relation exp> denotes a relation of degree one, in which case the sole attribute of the result of <relation exp> is assumed by default."

(TTM, 2nd ed, page 71)

Which I take to mean you can invoke an aggregate directly on a relation, and not on a specific attribute in the case where either the relation has exactly one attribute or the aggregate operator is COUNT, or both. If R is a relation with one attribute, you can say "sum(R)" and it will infer the attribute name and sum over it.

> Then there is projection, always
> lurking in the background, and we can't ignore it without breaking who
> knows what.

Projection more than anything else is what dissatisfies me about the canonical relational algebra, and draws me to the Tropashko algebra. I mean, those operators are supposed to be *algebraic* and they don't even limit themselves to relation operands.

(And yes, Paul, I do owe you a longer post about that topic.)

> I sometimes wonder whether an alternative concept or two,
> such as a variation on D&D's GROUP/UNGROUP operators might allow
> definition of keys without rename or your prime operator. By itself I
> doubt whether there would be any practical point to examining that
> unless some other useful questions could be handled as well, perhaps
> deciding when two relations that involve seemingly different rva's are
> equivalent but I don't have anything actually useful to say about that.

I would like to explore group/ungroup and rva issues in general in more depth relatively soon. I don't really know the right questions to get the ball rolling, though.

The problem that primes solve is a fundamental one, though. There are other ways to solve it, but the problem itself derives directly from the system having named attributes and needing to refer to more than one instance of an attribute in a single constraint. Once you have both of those, you cannot escape needing to have multiple names for the same thing.

Marshall

PS. This was supposed to be a short post; what happened?! Received on Mon Feb 26 2007 - 09:07:06 CET

Original text of this message