Re: A new proof of the superiority of set oriented approaches: numerical/time serie linear interpolation

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sun, 29 Apr 2007 14:33:50 -0400
Message-ID: <jy5Zh.4247$uJ6.3542_at_newssvr17.news.prodigy.net>


"Cimode" <cimode_at_hotmail.com> wrote in message news:1177863186.258396.136950_at_y5g2000hsa.googlegroups.com...
> On 29 avr, 16:21, "Brian Selzer" <b..._at_selzer-software.com> wrote:
>> I would have done it differently. I would have used a cursor to generate
> [Snipped]
> I respectfully think you miss the point of the entire thread.
> Thinking procedurally (use of cursors) is inherently inferior to set
> oriented approaches independently of a specific implementation and
> regardless of how one defines performance as response time, ability to
> run well in concurrent query environment and so forth.
>

You're right. As a rule, set-based thinking is superior to procedural thinking. But there are times when it becomes necessary to break the rules. It is critical to think procedurally in order to understand the execution plans that an optimizer produces. Consider a merge join: both inputs are sorted, and then a single pass is made through each result--similar to an old COBOL program that updates a master file from a transaction file. If you don't understand what each step in an execution plan does and how it works, then how can you possibly optimize a query?

When it becomes clear that an iterative approach could eliminate several self-joins, thus decreasing the duration that locks are held, reducing the impact on temporary storage and decreasing cpu utilization, then it might be a good idea to break the rules. You mentioned that switching from a poorly designed procedural solution to one that is set-based dropped the time it took for the process to run from 10 hours to 30 minutes. That's great! I merely suggest that reintroducing a cursor to eliminate the several self-joins in your solution might further decrease the time it would take for the process to run--without any negative impact on concurrency or overall performance.

>
>> I would bet the process would run much faster than your solution. In
>> rare
>> cases, cursors can actually increase performance.
> I don't think one can establish that for a certainty.

The proof is in the pudding. I suggest you try it. I think you'll be pleasantly surprised. Through judicious use of cursors, I was able to transform a set-based process that was taking 42 hours to complete into a hybrid solution that ran in just under 18 minutes. I also transformed a set-based solution that was only able to process 14,000 transactions per hour into a hybrid solution that was able to process over 150,000 transactions per hour. Key to both of these improvements was the elimination of multiple self-joins from the execution plan.

> If you ask me I tend to think that even in direct image systems, set
> oriented are superior because they give less time outs in highly
> concurrent environment. I had the opportunity to take away several
> hundred cursors based processes that would run faster on development
> environments but would totally time out in heavy load environment. I
> have not encoutered so far any cursor based solution that was not
> imposed by a poor design....

That has also been my experience--almost all of the time. Cursors should only be used as a last resort, and definitely not by the uninitiated. A thorough, intimate understanding of not only the database, but also the underlying implementation should be a prerequisite for breaking the rules.

> ...And I have not found any satifying cursor
> based solution....

It should always put a bad taste in your mouth, but that doesn't mean you should dismiss it without due consideration.

> ...Often the use of cursor and (procedural) dictates
> unacceptable tarde off where evrything is given up for reponse time
> sake...
>

I agree. But eliminating self-joins doesn't trade anything off. Quite the opposite: it can improve both throughput and concurrency.

> One should keep in mind that response time is only one of several
> criterias that may define performance. Defining performance only
> according to response time is simply a consequence of using poorly
> designed direct image systems.
> [Snipped implementation specific diret image internals]
>
>> I generally don't advocate the use of cursors, but sometimes the
>> optimizer
>> just isn't smart enough to generate what I know to be a minimum plan, so
>> in
>> those cases I use them.
> Speaking of optimizers in the case of direct image systems is like
> asking a cow to run a race. Only a TRDBMS that would correctly
> separate logicala and physical could inherentlyy allow optimization.
>

Too bad none exists today.

>
> Regards...
>
Received on Sun Apr 29 2007 - 20:33:50 CEST

Original text of this message