Re: A new proof of the superiority of set oriented approaches: numerical/time serie linear interpolation

From: Kevin Kirkpatrick <kvnkrkptrck_at_gmail.com>
Date: 3 May 2007 05:45:36 -0700
Message-ID: <1178196336.567341.221580_at_c35g2000hsg.googlegroups.com>


On May 3, 6:53 am, Cimode <cim..._at_hotmail.com> wrote:
> On May 3, 1:41 pm, Kevin Kirkpatrick <kvnkrkpt..._at_gmail.com> wrote:> Not sure if it advances the conversation, but I thought I'd look into
> > this a bit further.
> > I decided to look into this a bit further. Obviously, no by-hand
> > iteration-based code is going to beat a declarative solution executed
> > by a "sufficiently intelligent" optimizer. Yet, I think the query can
> > be expressed much more concisely (allowing the optimizer a much better
> > chance at an efficient execution) than what Cimode posted.
>
> The query can probably be seriously improved while remaining in set
> oriented approach.
>
Probably - I've found that my habits in writing SQL have a bias towards using joins where set operations would be much more sensible. I have no doubt my solution can be improved upon by a better query (again, Vadim's commentary points in this direction); however, I'm quite skeptical that a procedural approach exists which could even compete.

> [Snipped]
> Kevin,
>
> It seems you diverted from the initial example given.
>
> I suggest to redo the initial experience without changing the initial
> facts (I purposely posted it for that and without changing the
> incorrect design of it - that's how it was given to me), it would
> beneficial to retake the exact same canvas than the one proposed.
> Ideally I will try to adapt Vadim's regressive solution. Finally I
> believe it would be difficult to consider such experiment as
> conclusive without some kind of stress tool simulating concurrent
> random activity while running the query.
>
> Then and only then we could imagine some experiment to see how current
> SQL DBMS's behave...(ORACLE, SQL Server,...)..
>
> Regards...

> It seems you diverted from the initial example given.
>
> I suggest to redo the initial experience without changing the initial
> facts (I purposely posted it for that and without changing the
> incorrect design of it - that's how it was given to me), it would
> beneficial to retake the exact same canvas than the one proposed.
> Ideally I will try to adapt Vadim's regressive solution. Finally I
> believe it would be difficult to consider such experiment as
> conclusive without some kind of stress tool simulating concurrent
> random activity while running the query.
>
> Then and only then we could imagine some experiment to see how current
> SQL DBMS's behave...(ORACLE, SQL Server,...)..
>
> Regards...

Cimode - I confess - I read the problem statement but did not apply the forumla you presented (I'd skimmed over the formula, assuming it was a straight linear interpolation formula that I could easily intuit from the problem). Upon re-reading the OP, I realized that the formula you presented is not valid. I'd been assuming that the station ID's corresponded to an ordering of stations, and the distances were distances between stations. So in your example, I saw the problem as:

The train was at ID=24 at 6:29, passed through ID 25, 26, 27, 28, 29, and 30; then arrived at ID=31 at 6:33 (having traveled 2.078 miles). Given the distances between stations, compute the interpolated arrival times for stations 25-29.

However, if "what the boss says, gos" and the boss says that this formula is correct (and believe me, I've got a lot of experience with "tune the process, and prove it's correct by virtue of yielding identical results as the original" translating into a need to reverseengineer  hideously incorrect "solutions" that nobody would allow me to correct), then here's the tweaked query (which is actually simpler):

select * from (
with hd as (select n, d, t, rank() over (order by n) rnk

            from    k1
            where   t is not null),
     rng as (select  /*+ ordered use_nl(k1) index(k1) */
                    s.n ns, e.n ne, s.t ts, e.t te, sum(k1.d) rd
             from   hd s, hd e, k1
             where  e.rnk = s.rnk+1
               and  k1.n > s.n and k1.n < e.n
             group  by s.n , e.n , s.t , e.t
               )
select  /*+ ordered use_nl(md) index(md) use_nl(prev) index(prev) */
        md.n, md.d, ts+(md.d / rd)*(te-ts) t
from rng, k1 md
where md.n > rng.ns and md.n < rng.ne
union
select n, d, t from hd
order by n);

before:

24	0.289	1/1/2000 6:29:00 AM
25	0.193
26	0.299
27	0.131
28	0.444
29	0.16
30	0.665
31	0.186	1/1/2000 6:33:00 AM

after:
24	0.289	1/1/2000 6:29:00 AM
25	0.193	1/1/2000 6:29:24 AM
26	0.299	1/1/2000 6:29:38 AM
27	0.131	1/1/2000 6:29:17 AM
28	0.444	1/1/2000 6:29:56 AM
29	0.16	1/1/2000 6:29:20 AM
30	0.665	1/1/2000 6:30:24 AM
31	0.186	1/1/2000 6:33:00 AM

When presenting this solution, I'd be sure to point out to the boss how interesting it is that the train arrived at station 27 before 25 and 26... but that's just me... Received on Thu May 03 2007 - 14:45:36 CEST

Original text of this message