Re: MERGE as the imperative form of aggregation

From: Aloha Kakuikanu <aloha.kakuikanu_at_yahoo.com>
Date: 9 Apr 2007 12:06:02 -0700
Message-ID: <1176145562.666066.17210_at_l77g2000hsb.googlegroups.com>


On Apr 7, 11:46 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
> But there is MERGE.
>
> Merge is annoying in that the way it is usually
> specified it takes a lot of parameters. Wikipedia
> gives its general form thusly:
>
> MERGE INTO table_name USING table_name ON (condition)
> WHEN MATCHED THEN
> UPDATE SET column1 = value1 [, column2 = value2 ...]
> WHEN NOT MATCHED THEN
> INSERT column1 [, column2 ...] VALUES (value1 [, value2 ...])
>
> It's clear why all those parameters are there, but it's
> still a mess. Furthermore, I find that I only ever want to
> use it in a more restricted sense. The UPDATE part
> and the INSERT part are completely uncorrelated, but
> that's more generality than is useful. In fact, the UPDATE
> and the INSERT ought to be thought of as the same
> transformation. The UPDATE is a transform on an
> existing row, and the INSERT ought to be that same
> transform applied to the identity for that transform.
> The existing row must be uniquely identified (that is,
> a key must be specified.)

I personally didn't invest even a minute of my time looking into what Merge is -- you post changed that:-) First of all, there is a nice symmetry of insert and delete, and update is a combination of the insert and delete. So why don't we go along generalizing update, instead of introducing an ugly combination of insert and update?

Next, there are insert, update, and delete triggers. Now merge is expected to be combination of insert, update; naturally insert and update triggers are expected to be fired when issuing a merge statement?

> Example: I am maintaining a count of something
> on a per-foo basis. I have a table of foo ids and
> counts. For all foo ids that are not present in the
> count table, the count is 0. The sort of MERGE
> I would do is
>
> MERGE into FooCount f using
> (select id, count from NewFooCount) f'
> ON f.id = f'.id
> WHEN MATCHED THEN
> UDPATE f.count = f.count + f'.count
> WHEN NOT MATCHED THEN
> insert id, count values (id, count)
>
> So, what else has a transform that has a concept of
> identity and also specifies a key? An aggregate
> with GROUP BY.
>
> MERGE NewFooCount f' into FooCount f
> GROUP BY id
> SET f.count = sum(f.count, f'.count);
>
> It's less general, but syntactically and conceptually
> cleaner and simpler.
>
> Thus, MERGE can be seen as the imperative form of
> GROUP BY and aggregates.

In a way insert and delete statements change counts too. Consider converting a relation into characteristic function:

R(x,y,z) --> RC(x,y,z,cnt)

where RC is an infinite relation. For every tuple (x,y,z) in the R, we have tuple (x,y,z,1) in the RC. For every tuple (x,y,z) which is not in the R,
we have tuple (x,y,z,0) in the RC.

Now, insert, delete, or update on the original relation is just an update of the cnt column of the RC. Received on Mon Apr 09 2007 - 21:06:02 CEST

Original text of this message