Re: MERGE as the imperative form of aggregation

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 16 Apr 2007 03:16:52 GMT
Message-ID: <EUBUh.5764$5e2.3469_at_newssvr11.news.prodigy.net>


"Marshall" <marshall.spight_at_gmail.com> wrote in message news:1176652634.000524.188710_at_w1g2000hsg.googlegroups.com...
> On Apr 9, 12:06 pm, "Aloha Kakuikanu" <aloha.kakuik..._at_yahoo.com>
> wrote:
>> On Apr 7, 11:46 am, "Marshall" <marshall.spi..._at_gmail.com> wrote:
>>
>>
>>
>> > But there is MERGE.
>>
>> > Merge is annoying in that the way it is usually
>> > specified it takes a lot of parameters. Wikipedia
>> > gives its general form thusly:
>>
>> > MERGE INTO table_name USING table_name ON (condition)
>> > WHEN MATCHED THEN
>> > UPDATE SET column1 = value1 [, column2 = value2 ...]
>> > WHEN NOT MATCHED THEN
>> > INSERT column1 [, column2 ...] VALUES (value1 [, value2 ...])
>>
>> > It's clear why all those parameters are there, but it's
>> > still a mess. Furthermore, I find that I only ever want to
>> > use it in a more restricted sense. The UPDATE part
>> > and the INSERT part are completely uncorrelated, but
>> > that's more generality than is useful. In fact, the UPDATE
>> > and the INSERT ought to be thought of as the same
>> > transformation. The UPDATE is a transform on an
>> > existing row, and the INSERT ought to be that same
>> > transform applied to the identity for that transform.
>> > The existing row must be uniquely identified (that is,
>> > a key must be specified.)
>>
>> I personally didn't invest even a minute of my time looking into what
>> Merge is -- you post changed that:-) First of all, there is a nice
>> symmetry of insert and delete, and update is a combination of the
>> insert and delete. So why don't we go along generalizing update,
>> instead of introducing an ugly combination of insert and update?
>
> That is certainly one way to go. In fact a friend has suggested
> that I ought to try to design a single generalized imperative
> statement: one that can insert, update, or delete. If I am ever
> able to come up with such a statement that has any conceptual
> integrity I might be more attracted to this idea. Every attempt
> has just resulted in an uncoordinated heap of parameters and
> special cases.
>
> I find the symmetry and completeness of insert/delete to be
> compelling. My reaction is that update is *already* an "ugly
> combination of insert and delete."
>

Update may be "ugly," but it's a necessary and primative operation. It is not just a combination of insert and delete. Because a key value can only be used to identify a tuple in single database state, update provides the means to correlate the tuples in the preceding state with those in the succeeding state so that transition constraints can be enforced.

> Insert can be seen as a simple imperative generalization
> of union. It can be even the non-relational union, since
> it doesn't make much sense for imperative operations
> to alter the type of the variable. Delete is usually thought
> of as a subtraction from a set by a subset thereof determined
> by a characteristic function, but we could instead consider
> it a join (specifically, an intersection!) of a set and a subset
> of that set determined by the negation of that characteristic
> function.
>
> So that's what got me to thinking about trying to make a
> closer correspondence between the imperative and algebraic
> operators. I have to say I find the idea appealing.
>
>
>> Next, there are insert, update, and delete triggers. Now merge is
>> expected to be combination of insert, update; naturally insert and
>> update triggers are expected to be fired when issuing a merge
>> statement?
>
> Triggers, sigh. I have a poor grasp of what they are for and
> what they mean from a theoretical standpoint. What are they
> necessary for? I have seen good descriptions of how to
> express view updating as triggers that model the inverse
> view, but would that be necessary if we had full view updating
> in the engine? What are triggers necessary for? I am clear
> on the value of *notification* of changes being sent to other
> components, but *actions* are less clear to me. Especially
> "instead of" actions.
>

From a theoretical standpoint, triggers are necessary because there is no deterministic mechanism defined in the Relational Model to correlate tuples during a transition from one database state to the next. You can't use a key:

Given a relation PartLocation with two attributes, PART# and BIN,

Preceding State               Succeeding State
---------------------       ---------------------
PART#    BIN                PART#     BIN
--------    ----------        --------    ----------
4567        A123              4567        A313
4567        A223              4567        A332

Since PART#, BIN is the only key, which tuple in the preceding state corresponds to the tuple {PART#:4567, BIN:A313}? It can't be determined!

How then can you define a transition constraint? There are two ways: You could add a surrogate, thus guaranteeing that the values for at least one key will remain constant throughout the transition, or you could write a FOR EACH ROW trigger. Both offend my senses. One adds extraneous information, whereas the other is inherently iterative. A better solution would be to clarify the Model so that transition constraints could be defined declaratively. I suggest that three constructs be made available for each relation that would be used for declaring transition constraints: deleted, updated, and inserted. In the above example, PartLocation.updated might look like this:

PartLocation.updated



PART# BIN PART#' BIN'
--------- --------- --------- -------
4567        A313       4567        A223
4567        A332       4567        A123

Now there's no question about how the tuples correlate. With these constructs, it would be possible to define transition constraints declaratively.

>
>> > Example: I am maintaining a count of something
>> > on a per-foo basis. I have a table of foo ids and
>> > counts. For all foo ids that are not present in the
>> > count table, the count is 0. The sort of MERGE
>> > I would do is
>>
>> > MERGE into FooCount f using
>> > (select id, count from NewFooCount) f'
>> > ON f.id = f'.id
>> > WHEN MATCHED THEN
>> > UDPATE f.count = f.count + f'.count
>> > WHEN NOT MATCHED THEN
>> > insert id, count values (id, count)
>>
>> > So, what else has a transform that has a concept of
>> > identity and also specifies a key? An aggregate
>> > with GROUP BY.
>>
>> > MERGE NewFooCount f' into FooCount f
>> > GROUP BY id
>> > SET f.count = sum(f.count, f'.count);
>>
>> > It's less general, but syntactically and conceptually
>> > cleaner and simpler.
>>
>> > Thus, MERGE can be seen as the imperative form of
>> > GROUP BY and aggregates.
>>
>> In a way insert and delete statements change counts too. Consider
>> converting a relation into characteristic function:
>>
>> R(x,y,z) --> RC(x,y,z,cnt)
>>
>> where RC is an infinite relation. For every tuple (x,y,z) in the R, we
>> have tuple (x,y,z,1) in the RC. For every tuple (x,y,z) which is not
>> in the R,
>> we have tuple (x,y,z,0) in the RC.
>>
>> Now, insert, delete, or update on the original relation is just an
>> update of the cnt column of the RC.
>
> Ah! That's much like how I think of boolean functions. They
> could be thought of as total functions from domain D to boolean:
>
> f: D -> boolean
>
> or they could be thought of as partial functions to 0 return
> values:
>
> f: D ->?
>
> (Here "->" indicates a total function and "->?" indicates a
> partial function.)
>
>
> Marshall
>
> PS. May be a repeat; posted a second time after the original post
> didn't show up for 10 minutes.
>
Received on Mon Apr 16 2007 - 05:16:52 CEST

Original text of this message