MERGE as the imperative form of aggregation

From: Marshall <marshall.spight_at_gmail.com>
Date: 7 Apr 2007 12:46:24 -0700
Message-ID: <1175975183.973567.299670_at_l77g2000hsb.googlegroups.com>



INSERT is the imperative form UNION. There are some restrictions, because imperative statements must operate on variables, and so are restricted in what they can do about types.

  declare newRows;

  var v;
  v = v UNION newRows;

does the same thing as

  INSERT into v values newRows;

Because we are assigning the result of the union back into v, newRows and v have to have the same header.

Similarly, DELETE is the imperative form of SUBTRACT. DELETE could be defined
to remove a relation, but instead it is usually expressed as a predicate P(x), where the type of x is the header of v. (Tuple type instead of relation type.)

  v = v MINUS (v where P(x));

  DELETE from v where P(x);

UPDATE is a little weird; it specifies
both an indicator predicate, like DELETE does, and also a function f:x->x', where the type of x and x' is the row type. It can also be expressed as a delete and
an insert accordingly.

For a while I had the idea that there would likely be a whole bunch of useful imperative statements, defined in terms of assignment and relational operators. But that doesn't seem to be true. Outside of INSERT/DELETE/UPDATE there doesn't seem to be much.

But there is MERGE.

Merge is annoying in that the way it is usually specified it takes a lot of parameters. Wikipedia gives its general form thusly:

    MERGE INTO table_name USING table_name ON (condition)     WHEN MATCHED THEN
      UPDATE SET column1 = value1 [, column2 = value2 ...]     WHEN NOT MATCHED THEN
      INSERT column1 [, column2 ...] VALUES (value1 [, value2 ...])

It's clear why all those parameters are there, but it's still a mess. Furthermore, I find that I only ever want to use it in a more restricted sense. The UPDATE part and the INSERT part are completely uncorrelated, but that's more generality than is useful. In fact, the UPDATE and the INSERT ought to be thought of as the same transformation. The UPDATE is a transform on an existing row, and the INSERT ought to be that same transform applied to the identity for that transform. The existing row must be uniquely identified (that is, a key must be specified.)

Example: I am maintaining a count of something on a per-foo basis. I have a table of foo ids and counts. For all foo ids that are not present in the count table, the count is 0. The sort of MERGE I would do is

  MERGE into FooCount f using
    (select id, count from NewFooCount) f'     ON f.id = f'.id
  WHEN MATCHED THEN
    UDPATE f.count = f.count + f'.count
  WHEN NOT MATCHED THEN
    insert id, count values (id, count)

So, what else has a transform that has a concept of identity and also specifies a key? An aggregate with GROUP BY.

  MERGE NewFooCount f' into FooCount f
  GROUP BY id
  SET f.count = sum(f.count, f'.count);

It's less general, but syntactically and conceptually cleaner and simpler.

Thus, MERGE can be seen as the imperative form of GROUP BY and aggregates.

Comments, questions, criticisms?

Marshall Received on Sat Apr 07 2007 - 21:46:24 CEST

Original text of this message