Re: A new proof of the superiority of set oriented approaches: numerical/time serie linear interpolation

From: Brian Selzer <brian_at_selzer-software.com>
Date: Thu, 03 May 2007 12:22:28 GMT
Message-ID: <8uk_h.20772$Um6.11010_at_newssvr12.news.prodigy.net>


"Cimode" <cimode_at_hotmail.com> wrote in message news:1178184812.726479.264240_at_l77g2000hsb.googlegroups.com...
> On May 3, 10:12 am, "Brian Selzer" <b..._at_selzer-software.com> wrote:

>> "Cimode" <cim..._at_hotmail.com> wrote in message
>>
>> news:1178173985.960152.227740_at_l77g2000hsb.googlegroups.com...
>> [snip]
>>
>> >> Eliminating self-joins is beneficial regardless of the implementation.
>> > Beneficial to what ?  Performance ?  What performance ? Response
>> > time ? concurrency ? cost of administration ?  (Please answer these
>> > precise questions).  Do you realize that despite some lengthy (maybe
>> > worthy) attempts at clarifying your point, some people here have no
>> > clue what you are talking about.
>>
>> I'll try to be as concise as possible.  Eliminating multiple self-joins
>> reduces the number of passes through the data required to answer a query.

> So you are taking about physical IO's. Right? Can you explain to me
> how you reduce physical IO's when itterating on the basis of number of
> lines multiplied by the number of operations required.

With a cursor, each row is read once. With a multiple self-join set-based operation, each row is read once and then once per join. So for six joins, that's seven reads instead of one.

> Can you
> explain to me how

>

>> This reduces (1) execution time,

85% fewer reads, fewer compares.

> How ? Could you establish that execution time is reduced in any
> concurrent environment.
>

You would have to sample execution under different loads and develop an average, but 85% fewer reads....

> (2) the duration that locks are held,
> How ? What kind of lock ? What isolation level are you refering to ?
>

I would use update locks. That would make the isolation level moot. I would use update locks in the set-based solution too--reduces the possibility of a deadlock.

>> (3) the amount of I/O required,

> What IO's ? physical ? logical IO?

>

Both. 85% fewer reads.

>> (4) CPU utilization, and
> Do you mean percentage of usage ?

Total CPU cycles.

>
>> (5) the amount of
>> memory required.  The reduction in execution time improves response time.

> I am not sure I understand I understand this last sentence. Are you
> aware that most of the time used in a process is mostly (compile +
> binding). Execution is a minor part of the resources used.
>

Depends on the implementation. In Sql Server, for example, execution plans are cached and reused. The lion share of compile + binding time is the generation of the execution plan, and if you compare execution plans, you'll find that with the cursor the number of operations that bind to database objects is smaller than with the solution with multiple self-joins. That means fewer index selections, fewer potential plans, etc.

>> The reduction in the duration that locks are held improves concurrency.

> What locks ? How does the duration of a lock improve concurrency in
> most modern systems. Are your familiar with lock sharing ? Do you
> realize that most SQL DBMS do mutualize object physical accesses in a
> multi transactional context?
>

I am indeed familiar with lock sharing. Locks, however, use resources, and update locks block other updates, so reducing the duration of locks is a good thing.

>> The
>> reduction in I/O required, CPU utilization, and memory required improves
>> overall throughput.  I would venture a guess that the cost of 
>> administration
>> or at least of maintenance would be reduced as well: I personally think 
>> it's
>> a lot easier to follow a simple fetch loop followed by a simple UPDATE 
>> FROM
>> than a complicated set-based solution with four outer theta-joins, an 
>> outer
>> equijoin, an inner equijoin, an aggregation and a union.

> I still do not understand how having one view would be more cumbersome
> to maintain then several operations (update, creation of additional
> objects)...
>

It's not so much more cumbersome as it is easier to follow. A few simple operations within the body of a fetch loop and then a simple update. My eyes crossed a couple times when I was trying to figure out what your statement did.

> Cursors are the cause of most timeouts I face daily. (whether due to
> dealock, excessive HD/RAM swapping or exlusive object locking).
>

That is the result of poor design. The instances when a cursor is indicated are rare. In one system I wrote that has over 300 stored procedures and triggers, there are only 6 cursors, and two were used because Sql Server doesn't support FOR EACH ROW triggers, so I had to improvise in order to implement a transition constraint.

> Regards...

>> [snip]
>

>
Received on Thu May 03 2007 - 14:22:28 CEST

Original text of this message