Re: Functions and Relations

From: Marshall <marshall.spight_at_gmail.com>
Date: 18 Nov 2006 13:51:05 -0800
Message-ID: <1163886665.418503.107140_at_h48g2000cwc.googlegroups.com>


On Nov 17, 12:19 pm, "Aloha Kakuikanu" <aloha.kakuik..._at_yahoo.com> wrote:
> > NENASHI, Tegiri wrote:
> > How one presents 'sqrt' as a relation in the database ? Please show.
> This looks like a good opportunity to post a reworked snippet from my
> exchange with Marshall.
>
> Function u=f(x,y,z) in general is a predicate Pf(x,y,z,u) with some
> additional constraint. Given that a predicate is often an infinite
> relation we have obvious difficulty implementing it. Let's discuss this
> later, but focus on the benefits of representing functions as
> relations. We have
>
> 1. Function invocation is a join:
>
> f(x) /\ x=1

(A small detail, but function invocation (I like to say "application") would
more usually be defined as join followed by projection over the result attributes.) We can be more general, however, and define a form of application that also encompasses partial application. Thus, application
is join followed by projection over the symmetric difference of the operand attributes. This lets us do things like

predicate Power = {x, y, z | x ^ y = z}

partial application:

Square = Power /\ `y=2` \/ `x, z`

(The join with the empty (x,z) relation is what differentiates plain join from application.)

We are left with a function Square(x) = x^2

Thus, partial application (defined only in terms of the relational lattice operations) can encompass what functional programming languages call Currying.

> 2. Function composition is a join:
>
> g(f(x)) <==> g(z,y) /\ f(y,x)

Also an exciting result.

There is a subtle concern here with variable capture aka alpha conversion. In other words, we can run into trouble when f and g have attributes with names in common that we didn't expect. We can get out of it via the usual way: renaming. I recall that the original Tropashko paper pointed out how renaming could also be defined only in relational lattice terms.

> Suppose we have the relation R(x,y) where y=x^2. No function yet, just
> a relation.
>
> As it has two columns the simplest joins we can think of are:
>
> R /\ `y=9`
>
> R /\ `x=3`
>
> where `y=9` and `x=3` are constant unary relations.
>
> In the first case, we probably want to project the resulting relation
> to `x`
>
> (R /\ `y=9`) \/ 'x'
>
> in the second case to `y`
>
> (R /\ `x=3`) \/ 'y'
>
> Informally, in the first case we want to know the result of sqrt(9), in
> the first case just 3^2
>
> How the join is evaluated in both cases? Well, let's go into the second
> case, because it's easier. As one relation is infinite, the only
> feasible join method is nested loops. Moreover, we have to start from
> the relation which is finite, that is `x=3`. Now, there are the two
> possibilities:
> 1. Scan the whole R relation and find all the matching tuples. Not
> feasible too!

Indeed, the difference between the two kinds of relations seems to be exactly that: intensional relations are computation and extensional relations are the result of computation! Thus, we cannot scan the relation until/unless we have performed the computation.

> 2. Find matching tuples by a some kind of index:
>
> create index unique_x on R(x);
>
> (pseudo SQL syntax). This index is not a conventional b-tree of course,
> as all what is needed to do when x is known, and y is not is just to
> calculate y by a simple formula x^2. The corollary here is that the
> function x->x2 is essentially an index.

I love this perspective; it is so totally foreign to me, and yet it has that same property of an elegant proof, in that when you get it, it makes you smile.

> The other case is only marginally more complex. Likewise, we quickly
> arrive to the conclusion that the only feasible execution is the
> indexed nested loops with the scan of the `y=9` as the leading
> relation. Then, the required index is the function y->sqrt(y).
>
> In a word, functions to the predicates are what indexes to the tables
> in traditional RDBMS are. In RDBMS world there are index organized
> tables. In the functions analogy we have "function organized"
> predicates.
>
> Indexes are not something that is supposed to be exposed to the end
> user, at least in theory. Indexes should be created/destroyed
> automatically by RDBMS engine. In todays imperfect world this is done
> by DBAs.
>
> This little snippet is just an unconventional perspective to Meyer's
> "design-by-contract" idea. The sqrt() function interface is defined
> by the assertion y=x^2, and the programmer's job is to write the
> sqrt() implementation. We see that functions being implementational
> detail has much in common with indexes being implementation details
> too.

Since I am in the business of what should have been called Computation Science, I am more inclined to view the index as primary and the predicate as metadata. In the typed lambda calculus, the predicate would be approximately comparable to the type and the "index" as you call it would be comparable to the lambda form. (And yet ...)

> Here is little more complex example. For x^2+y^2=1, there are 3 access
> methods:
> 1. Given x, and y return {(x,y)} if they satisfy the equation, and {}
> otherwise.
> 2. Given x return {(x,sqrt(1-x^2)),(x,-sqrt(1-x^2))} or empty set.
> 3. Given y return {(sqrt(1-y^2),y),(-sqrt(1-x^2),y)} or empty set.
> Once again the way to represent infinite predicates in future
> relational systems is via access path specification.

And yet, the above shows that my earlier viewpoint is incomplete. Here we see three distinct lambda forms described, and yet it is clear from the above that while distinct, they are all explicitly related.

I'm short on time so I have to sign off before I'm done, but I'll just briefly mention the predicate DivMod over integers { x, y, z, r | x, y, z, r : N, x * y = z + r }

Marshall Received on Sat Nov 18 2006 - 22:51:05 CET

Original text of this message