Re: A new proof of the superiority of set oriented approaches: numerical/time serie linear interpolation

From: Kevin Kirkpatrick <kvnkrkptrck_at_gmail.com>
Date: 3 May 2007 04:41:25 -0700
Message-ID: <1178192485.028723.208580_at_l77g2000hsb.googlegroups.com>


On May 3, 4:33 am, Cimode <cim..._at_hotmail.com> wrote:
> On May 3, 10:12 am, "Brian Selzer" <b..._at_selzer-software.com> wrote:
> > "Cimode" <cim...@hotmail.com> wrote in message
>
> >news:1178173985.960152.227740_at_l77g2000hsb.googlegroups.com...
> > [snip]
>
> > >> Eliminating self-joins is beneficial regardless of the implementation.
> > > Beneficial to what ? Performance ? What performance ? Response
> > > time ? concurrency ? cost of administration ? (Please answer these
> > > precise questions). Do you realize that despite some lengthy (maybe
> > > worthy) attempts at clarifying your point, some people here have no
> > > clue what you are talking about.
>
> > I'll try to be as concise as possible. Eliminating multiple self-joins
> > reduces the number of passes through the data required to answer a query.
>
> So you are talking about physical IO's, right? Can you explain to me
> how you reduce physical IO's when iterating on the basis of the number
> of lines multiplied by the number of operations required?
>
> > This reduces (1) execution time,
>
> How ? Could you establish that execution time is reduced in any
> concurrent environment.
>
> > (2) the duration that locks are held,
>
> How ? What kind of lock ? What isolation level are you referring to ?
>
> > (3) the amount of I/O required,
>
> What IO's ? physical ? logical IO?
>
> > (4) CPU utilization, and
>
> Do you mean percentage of usage ?
>
> > (5) the amount of
> > memory required. The reduction in execution time improves response time.
>
> I am not sure I understand this last sentence. Are you aware that
> most of the time used in a process is spent on compile + binding?
> Execution is a minor part of the resources used.
>
> > The reduction in the duration that locks are held improves concurrency.
>
> What locks ? How does reducing the duration of a lock improve
> concurrency in most modern systems ? Are you familiar with lock
> sharing ? Do you realize that most SQL DBMS share physical object
> accesses in a multi-transactional context?
>
> > The
> > reduction in I/O required, CPU utilization, and memory required improves
> > overall throughput. I would venture a guess that the cost of administration
> > or at least of maintenance would be reduced as well: I personally think it's
> > a lot easier to follow a simple fetch loop followed by a simple UPDATE FROM
> > than a complicated set-based solution with four outer theta-joins, an outer
> > equijoin, an inner equijoin, an aggregation and a union.
>
> I still do not understand how having one view would be more cumbersome
> to maintain than several operations (update, creation of additional
> objects)...
>
> Cursors are the cause of most timeouts I face daily (whether due to
> deadlock, excessive HD/RAM swapping or exclusive object locking).
>
> Regards...
>
>
>
> > [snip]

Not sure if it advances the conversation, but I decided to look into this a bit further. Obviously, no by-hand iteration-based code is going to beat a declarative solution executed by a "sufficiently intelligent" optimizer. Yet I think the query can be expressed much more concisely (allowing the optimizer a much better chance at an efficient execution) than what Cimode posted.

To test this theory, I created an N-row table in Oracle called K1:

K1 (n int, d float, t time) -- "Station <n> is <d> miles from previous station and recorded the train at time <t>"

The table was populated as follows (given some number N):
  n = 2 .. N-1
  d = random float between 0 and 1
  t = coin-toss: if "heads" then current_time + <n> minutes + random(0,1) minutes, else NULL
... plus 2 boundary tuples:
  <1, 0, current_time>
  <N, 1, current_time + N minutes>
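For anyone wanting to reproduce the setup, here is a small sketch of that population scheme in Python over SQLite (the table and column names follow the post; substituting SQLite for Oracle, and a fixed start time, are my assumptions):

```python
import random
import sqlite3
from datetime import datetime, timedelta

def build_k1(n_rows, conn):
    # K1(n, d, t): station number, miles from the previous station,
    # and recorded time -- NULL when the coin-toss comes up "tails".
    conn.execute("CREATE TABLE k1 (n INTEGER PRIMARY KEY, d REAL, t TEXT)")
    start = datetime(2007, 5, 2, 12, 50, 13)
    fmt = "%Y-%m-%d %H:%M:%S"
    rows = [(1, 0.0, start.strftime(fmt))]      # boundary tuple <1, 0, current_time>
    for n in range(2, n_rows):
        d = random.random()                     # random float between 0 and 1
        if random.random() < 0.5:               # the coin-toss
            t = (start + timedelta(minutes=n + random.random())).strftime(fmt)
        else:
            t = None
        rows.append((n, d, t))
    rows.append((n_rows, 1.0,                   # boundary tuple <N, 1, current_time + N minutes>
                 (start + timedelta(minutes=n_rows)).strftime(fmt)))
    conn.executemany("INSERT INTO k1 VALUES (?, ?, ?)", rows)
    return rows

conn = sqlite3.connect(":memory:")
rows = build_k1(10, conn)
```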

A sample N=10 table:

1	0	5/2/2007 12:50:13 PM
2	0.47
3	0.94	5/2/2007 12:53:38 PM
4	0.65	5/2/2007 12:54:51 PM
5	0.64
6	0.98
7	0.6
8	0.65
9	0.12
10	1	5/2/2007 1:00:13 PM

I then wrote a query which produced this output for the above K1 (which can be checked for accuracy):

1	0	5/2/2007 12:50:13 PM
2	0.47	5/2/2007 12:51:21 PM
3	0.94	5/2/2007 12:53:38 PM
4	0.65	5/2/2007 12:54:51 PM
5	0.64	5/2/2007 12:55:43 PM
6	0.98	5/2/2007 12:57:02 PM
7	0.6	5/2/2007 12:57:50 PM
8	0.65	5/2/2007 12:58:43 PM
9	0.12	5/2/2007 12:58:52 PM
10	1	5/2/2007 1:00:13 PM
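The interpolation rule behind that output: within each gap between two known readings, a station's time is the gap's start time plus the fraction of the gap's total distance covered so far, times the elapsed time. A quick hand-check of station 2 in Python (my own illustration of the arithmetic, not the query itself):

```python
from datetime import datetime

def interpolate(ts, te, gap_ds, i):
    # ts, te  : times at the known endpoints of the gap
    # gap_ds  : d values of the stations strictly after the start
    #           endpoint, up to and including the end endpoint
    # i       : index within gap_ds of the station to estimate
    rd = sum(gap_ds)                 # total distance across the gap
    covered = sum(gap_ds[:i + 1])    # distance travelled to this station
    return ts + (covered / rd) * (te - ts)

# Stations 1 and 3 bracket station 2 in the sample table:
ts = datetime(2007, 5, 2, 12, 50, 13)
te = datetime(2007, 5, 2, 12, 53, 38)
t2 = interpolate(ts, te, [0.47, 0.94], 0)
# 0.47 / 1.41 of a 205-second gap puts station 2 at 12:51:21
```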

I then loaded K1 with N=1000000 (1 million), and submitted the query, storing all results into a new table, K2. The index was built on K1(n) in 5 seconds, and the 1-million-row query ran in 1 minute 42 seconds, giving a total runtime of 1 minute 47 seconds. Based on 50k, 100k, and 500k tests, the performance is O(N log N), so you can project this to any K1 tablesize you'd like.

I'll forgo a complete proof of correctness and simply note that Card(K2) = 1000000, and that a spot-check comparison of K1 and the resulting K2 using the ten IDs from 400000 to 400009 yields:

In K1:

400000	0.22
400001	0.53	2/4/2008 8:10:19 AM
400002	0.17
400003	0.27	2/4/2008 8:11:50 AM
400004	1	2/4/2008 8:12:57 AM
400005	0.63
400006	0.97
400007	0.37
400008	0.56
400009	0.36	2/4/2008 8:18:06 AM

and in K2:
400000	0.22	2/4/2008 8:08:43 AM
400001	0.53	2/4/2008 8:10:19 AM
400002	0.17	2/4/2008 8:10:54 AM
400003	0.27	2/4/2008 8:11:50 AM
400004	1	2/4/2008 8:12:57 AM
400005	0.63	2/4/2008 8:14:04 AM
400006	0.97	2/4/2008 8:15:48 AM
400007	0.37	2/4/2008 8:16:28 AM
400008	0.56	2/4/2008 8:17:28 AM
400009	0.36	2/4/2008 8:18:06 AM


Two caveats before posting the query.

Caveat1: Remember my words, "sufficiently intelligent" optimizer? The Oracle optimizer (a.k.a. the CBO) is good, but comes up short on theta-joins. Here, the CBO consistently fails to realize that joining K1(n) against some relation R (n_start, n_end, ...) with the criterion "WHERE n BETWEEN n_start AND n_end" can be performed in O(r_n log k1_n) by indexing K1(n), then scanning R and finding the matching rows in K1 using the index.
Lacking this insight, the Oracle CBO consistently chose to join K1 with R using an O(r_n * k1_n) approach: match each row in K1 with every row in R where n >= n_start, then scan the results for all rows where n <= n_end. Thus, I needed to manually build an index on K1(n) and force the CBO into the O(n log n) algorithm via "hints" (embedded comments that force the CBO's hand), e.g. /*+ ordered use_nl(md) index(md) */.
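Caveat1's point, for the procedurally inclined: with K1(n) sorted/indexed, each range of R costs two binary searches rather than a scan of K1. A toy sketch of that O(r_n log k1_n) plan, using Python's bisect (my illustration, not anything the CBO emits):

```python
import bisect

def index_range_join(k1_n_sorted, ranges):
    # For each (n_start, n_end) in ranges, locate the matching K1 keys
    # with two binary searches on the sorted "index" -- O(log k) per
    # range instead of scanning K1, which is what the hints force.
    out = []
    for n_start, n_end in ranges:
        lo = bisect.bisect_left(k1_n_sorted, n_start)
        hi = bisect.bisect_right(k1_n_sorted, n_end)
        out.append(k1_n_sorted[lo:hi])
    return out
```

For example, index_range_join([1, 2, 3, 4, 5, 6], [(2, 4), (5, 5)]) returns [[2, 3, 4], [5]].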

Caveat2: With respect to the window function "RANK() OVER (ORDER BY n)", I'd be curious to see a solution that returns the same results as the "hd" subquery, yet runs in O(n log n), using only standard relational operators. I could only come up with the following O(n^2) version:

select k1.n, k1.d, k1.t, count(*) rnk
from   k1, k1 t2
where  k1.t is not null
  and  t2.t is not null
  and  t2.n <= k1.n
group  by k1.n, k1.d, k1.t;
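Outside the relational operators the ranking is of course trivial: sort the non-NULL rows by n and number them, O(n log n). A Python sketch of what "hd" computes (an illustration, not an answer to the challenge):

```python
def hd(rows):
    # rows: (n, d, t) tuples. Keep the rows with a recorded time and
    # attach their 1-based rank in n order -- what RANK() OVER (ORDER BY n)
    # produces here, done as a plain sort + enumerate.
    known = sorted((r for r in rows if r[2] is not None), key=lambda r: r[0])
    return [(n, d, t, rnk) for rnk, (n, d, t) in enumerate(known, start=1)]
```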

That said, here is the query:



with hd as (select n, d, t, rank() over (order by n) rnk
            from    k1
            where   t is not null),
     rng as (select  /*+ ordered use_nl(k1) */
                    s.n ns, e.n ne, s.t ts, e.t te, sum(k1.d) rd
             from   hd s, hd e, k1
             where  e.rnk = s.rnk+1
               and  k1.n > s.n and k1.n <= e.n
             group  by s.n , e.n , s.t , e.t
               )
select  /*+ ordered use_nl(md) use_nl(prev) */
        md.n, md.d, ts+(sum(prev.d) / rd)*(te-ts) t
from rng, k1 md, k1 prev
where md.n > rng.ns and md.n < rng.ne
  and prev.n > rng.ns and prev.n <= md.n
group by md.n, md.d, rd, ts, te
union
select n, d, t from hd;
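As a sanity check, essentially the same query can be run against the N=10 sample table in Python's sqlite3 — my translation, not the original: the Oracle hints are dropped, caveat2's O(n^2) subquery stands in for RANK() (older SQLite builds lack window functions), and the date arithmetic goes through julianday().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE k1 (n INTEGER PRIMARY KEY, d REAL, t TEXT)")
conn.executemany("INSERT INTO k1 VALUES (?, ?, ?)", [
    (1, 0,    "2007-05-02 12:50:13"),
    (2, 0.47, None),
    (3, 0.94, "2007-05-02 12:53:38"),
    (4, 0.65, "2007-05-02 12:54:51"),
    (5, 0.64, None),
    (6, 0.98, None),
    (7, 0.6,  None),
    (8, 0.65, None),
    (9, 0.12, None),
    (10, 1,   "2007-05-02 13:00:13"),
])

result = conn.execute("""
with hd as (select k1.n, k1.d, k1.t, count(*) rnk   -- caveat2's rank substitute
            from   k1, k1 t2
            where  k1.t is not null
              and  t2.t is not null
              and  t2.n <= k1.n
            group  by k1.n, k1.d, k1.t),
     rng as (select s.n ns, e.n ne, s.t ts, e.t te, sum(k1.d) rd
             from   hd s, hd e, k1
             where  e.rnk = s.rnk + 1
               and  k1.n > s.n and k1.n <= e.n
             group  by s.n, e.n, s.t, e.t)
select  md.n, md.d,
        datetime(julianday(ts) +
                 (sum(prev.d) / rd) * (julianday(te) - julianday(ts))) t
from    rng, k1 md, k1 prev
where   md.n > rng.ns and md.n < rng.ne
  and   prev.n > rng.ns and prev.n <= md.n
group   by md.n, md.d, rd, ts, te
union
select  n, d, t from hd
order   by n
""").fetchall()
```

Station 2 comes back as 12:51:21, matching the output table above.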

I should add another bonus of declarative programming - the ease with which I can take advantage of concurrency. The query took 107 seconds. With a simple parallel hint of degree 8, the query ran in 45 seconds (indexing K1 with parallel degree 8 took about 2 seconds). Degree 8 happens to be the maximum concurrency of the dev box used for this test, but that's a fraction of what our production machine could do (were I to implement this in our production system with full allocation of all parallel resources, it would conceivably run in 15 seconds or less).

Brian - does this compare favorably with your procedural approach?

Received on Thu May 03 2007 - 13:41:25 CEST
