Oracle FAQ Your Portal to the Oracle Knowledge Grid

Home -> Community -> Usenet -> comp.databases.theory -> Re: MERGE as the imperative form of aggregation

Re: MERGE as the imperative form of aggregation

From: Brian Selzer <>
Date: Mon, 16 Apr 2007 03:16:52 GMT
Message-ID: <EUBUh.5764$>

"Marshall" <> wrote in message
> On Apr 9, 12:06 pm, "Aloha Kakuikanu" <>
> wrote:
>> On Apr 7, 11:46 am, "Marshall" <> wrote:
>> > But there is MERGE.
>> > Merge is annoying in that the way it is usually
>> > specified it takes a lot of parameters. Wikipedia
>> > gives its general form thusly:
>> > MERGE INTO table_name USING table_name ON (condition)
>> > UPDATE SET column1 = value1 [, column2 = value2 ...]
>> > INSERT column1 [, column2 ...] VALUES (value1 [, value2 ...])
>> > It's clear why all those parameters are there, but it's
>> > still a mess. Furthermore, I find that I only ever want to
>> > use it in a more restricted sense. The UPDATE part
>> > and the INSERT part are completely uncorrelated, but
>> > that's more generality than is useful. In fact, the UPDATE
>> > and the INSERT ought to be thought of as the same
>> > transformation. The UPDATE is a transform on an
>> > existing row, and the INSERT ought to be that same
>> > transform applied to the identity for that transform.
>> > The existing row must be uniquely identified (that is,
>> > a key must be specified.)
>> I personally didn't invest even a minute of my time looking into what
>> Merge is -- you post changed that:-) First of all, there is a nice
>> symmetry of insert and delete, and update is a combination of the
>> insert and delete. So why don't we go along generalizing update,
>> instead of introducing an ugly combination of insert and update?
> That is certainly one way to go. In fact a friend has suggested
> that I ought to try to design a single generalized imperative
> statement: one that can insert, update, or delete. If I am ever
> able to come up with such a statement that has any conceptual
> integrity I might be more attracted to this idea. Every attempt
> has just resulted in an uncoordinated heap of parameters and
> special cases.
> I find the symmetry and completeness of insert/delete to be
> compelling. My reaction is that update is *already* an "ugly
> combination of insert and delete."

Update may be "ugly," but it's a necessary and primative operation. It is not just a combination of insert and delete. Because a key value can only be used to identify a tuple in single database state, update provides the means to correlate the tuples in the preceding state with those in the succeeding state so that transition constraints can be enforced.

> Insert can be seen as a simple imperative generalization
> of union. It can be even the non-relational union, since
> it doesn't make much sense for imperative operations
> to alter the type of the variable. Delete is usually thought
> of as a subtraction from a set by a subset thereof determined
> by a characteristic function, but we could instead consider
> it a join (specifically, an intersection!) of a set and a subset
> of that set determined by the negation of that characteristic
> function.
> So that's what got me to thinking about trying to make a
> closer correspondence between the imperative and algebraic
> operators. I have to say I find the idea appealing.
>> Next, there are insert, update, and delete triggers. Now merge is
>> expected to be combination of insert, update; naturally insert and
>> update triggers are expected to be fired when issuing a merge
>> statement?
> Triggers, sigh. I have a poor grasp of what they are for and
> what they mean from a theoretical standpoint. What are they
> necessary for? I have seen good descriptions of how to
> express view updating as triggers that model the inverse
> view, but would that be necessary if we had full view updating
> in the engine? What are triggers necessary for? I am clear
> on the value of *notification* of changes being sent to other
> components, but *actions* are less clear to me. Especially
> "instead of" actions.

From a theoretical standpoint, triggers are necessary because there is no deterministic mechanism defined in the Relational Model to correlate tuples during a transition from one database state to the next. You can't use a key:

Given a relation PartLocation with two attributes, PART# and BIN,

Preceding State               Succeeding State
---------------------       ---------------------
PART#    BIN                PART#     BIN
--------    ----------        --------    ----------
4567        A123              4567        A313
4567        A223              4567        A332

Since PART#, BIN is the only key, which tuple in the preceding state corresponds to the tuple {PART#:4567, BIN:A313}? It can't be determined!

How then can you define a transition constraint? There are two ways: You could add a surrogate, thus guaranteeing that the values for at least one key will remain constant throughout the transition, or you could write a FOR EACH ROW trigger. Both offend my senses. One adds extraneous information, whereas the other is inherently iterative. A better solution would be to clarify the Model so that transition constraints could be defined declaratively. I suggest that three constructs be made available for each relation that would be used for declaring transition constraints: deleted, updated, and inserted. In the above example, PartLocation.updated might look like this:


--------- --------- --------- -------
4567        A313       4567        A223
4567        A332       4567        A123

Now there's no question about how the tuples correlate. With these constructs, it would be possible to define transition constraints declaratively.

>> > Example: I am maintaining a count of something
>> > on a per-foo basis. I have a table of foo ids and
>> > counts. For all foo ids that are not present in the
>> > count table, the count is 0. The sort of MERGE
>> > I would do is
>> > MERGE into FooCount f using
>> > (select id, count from NewFooCount) f'
>> > ON = f'.id
>> > UDPATE f.count = f.count + f'.count
>> > insert id, count values (id, count)
>> > So, what else has a transform that has a concept of
>> > identity and also specifies a key? An aggregate
>> > with GROUP BY.
>> > MERGE NewFooCount f' into FooCount f
>> > GROUP BY id
>> > SET f.count = sum(f.count, f'.count);
>> > It's less general, but syntactically and conceptually
>> > cleaner and simpler.
>> > Thus, MERGE can be seen as the imperative form of
>> > GROUP BY and aggregates.
>> In a way insert and delete statements change counts too. Consider
>> converting a relation into characteristic function:
>> R(x,y,z) --> RC(x,y,z,cnt)
>> where RC is an infinite relation. For every tuple (x,y,z) in the R, we
>> have tuple (x,y,z,1) in the RC. For every tuple (x,y,z) which is not
>> in the R,
>> we have tuple (x,y,z,0) in the RC.
>> Now, insert, delete, or update on the original relation is just an
>> update of the cnt column of the RC.
> Ah! That's much like how I think of boolean functions. They
> could be thought of as total functions from domain D to boolean:
> f: D -> boolean
> or they could be thought of as partial functions to 0 return
> values:
> f: D ->?
> (Here "->" indicates a total function and "->?" indicates a
> partial function.)
> Marshall
> PS. May be a repeat; posted a second time after the original post
> didn't show up for 10 minutes.
Received on Sun Apr 15 2007 - 22:16:52 CDT

Original text of this message