Re: A new proof of the superiority of set oriented approaches: numerical/time serie linear interpolation

From: Cimode <cimode_at_hotmail.com>
Date: 3 May 2007 06:00:09 -0700
Message-ID: <1178197209.115692.309840_at_y5g2000hsa.googlegroups.com>


On May 3, 2:45 pm, Kevin Kirkpatrick <kvnkrkpt..._at_gmail.com> wrote:
> On May 3, 6:53 am, Cimode <cim..._at_hotmail.com> wrote:> On May 3, 1:41 pm, Kevin Kirkpatrick <kvnkrkpt...@gmail.com> wrote:> Not sure if it advances the conversation, but I thought I'd look into
> > > this a bit further.
> > > I decided to look into this a bit further. Obviously, no by-hand
> > > iteration-based code is going to beat a declarative solution executed
> > > by a "sufficiently intelligent" optimizer. Yet, I think the query can
> > > be expressed much more concisely (allowing the optimizer a much better
> > > chance at an efficient execution) than what Cimode posted.
>
> > The query can probably be seriously improved while remaining in set
> > oriented approach.
>
> Probably - I've found that my habits in writing SQL have a bias
> towards using joins where set operations would be much more sensible.
> I have no doubt my solution can be improved upon by a better query
> (again, Vadim's commentary points in this direction); however, I'm
> quite skeptical that a procedural approach exists which could even
> compete.
>
>
>
> > [Snipped]
> > Kevin,
>
> > It seems you diverted from the initial example given.
>
> > I suggest to redo the initial experience without changing the initial
> > facts (I purposely posted it for that and without changing the
> > incorrect design of it - that's how it was given to me), it would
> > beneficial to retake the exact same canvas than the one proposed.
> > Ideally I will try to adapt Vadim's regressive solution. Finally I
> > believe it would be difficult to consider such experiment as
> > conclusive without some kind of stress tool simulating concurrent
> > random activity while running the query.
>
> > Then and only then we could imagine some experiment to see how current
> > SQL DBMS's behave...(ORACLE, SQL Server,...)..
>
> > Regards...
> > It seems you diverted from the initial example given.
>
> > I suggest to redo the initial experience without changing the initial
> > facts (I purposely posted it for that and without changing the
> > incorrect design of it - that's how it was given to me), it would
> > beneficial to retake the exact same canvas than the one proposed.
> > Ideally I will try to adapt Vadim's regressive solution. Finally I
> > believe it would be difficult to consider such experiment as
> > conclusive without some kind of stress tool simulating concurrent
> > random activity while running the query.
>
> > Then and only then we could imagine some experiment to see how current
> > SQL DBMS's behave...(ORACLE, SQL Server,...)..
>
> > Regards...
>
> Cimode - I confess - I read the problem statement but did not apply
> the forumla you presented (I'd skimmed over the formula, assuming it
> was a straight linear interpolation formula that I could easily intuit
> from the problem). Upon re-reading the OP, I realized that the
> formula you presented is not valid. I'd been assuming that the
> station ID's corresponded to an ordering of stations, and the
> distances were distances between stations. So in your example, I saw
> the problem as:
>
> The train was at ID=24 at 6:29, passed through ID 25, 26, 27, 28, 29,
> and 30; then arrived at ID=31 at 6:33 (having traveled 2.078 miles).
> Given the distances between stations, compute the interpolated arrival
> times for stations 25-29.
>
> However, if "what the boss says, gos" and the boss says that this
> formula is correct (and believe me, I've got a lot of experience with
> "tune the process, and prove it's correct by virtue of yielding
> identical results as the original" translating into a need to reverse-
> engineer hideously incorrect "solutions" that nobody would allow me to
> correct), then here's the tweaked query (which is actually simpler):
>
> select * from (
> with hd as (select n, d, t, rank() over (order by n) rnk
> from k1
> where t is not null),
> rng as (select /*+ ordered use_nl(k1) index(k1) */
> s.n ns, e.n ne, s.t ts, e.t te, sum(k1.d) rd
> from hd s, hd e, k1
> where e.rnk = s.rnk+1
> and k1.n > s.n and k1.n < e.n
> group by s.n , e.n , s.t , e.t
> )
> select /*+ ordered use_nl(md) index(md) use_nl(prev) index(prev) */
> md.n, md.d, ts+(md.d / rd)*(te-ts) t
> from rng, k1 md
> where md.n > rng.ns and md.n < rng.ne
> union
> select n, d, t from hd
> order by n);
>
> before:
> 24 0.289 1/1/2000 6:29:00 AM
> 25 0.193
> 26 0.299
> 27 0.131
> 28 0.444
> 29 0.16
> 30 0.665
> 31 0.186 1/1/2000 6:33:00 AM
>
> after:
> 24 0.289 1/1/2000 6:29:00 AM
> 25 0.193 1/1/2000 6:29:24 AM
> 26 0.299 1/1/2000 6:29:38 AM
> 27 0.131 1/1/2000 6:29:17 AM
> 28 0.444 1/1/2000 6:29:56 AM
> 29 0.16 1/1/2000 6:29:20 AM
> 30 0.665 1/1/2000 6:30:24 AM
> 31 0.186 1/1/2000 6:33:00 AM
>
> When presenting this solution, I'd be sure to point out to the boss
> how interesting it is that the train arrived at station 27 before 25
> and 26... but that's just me...
I had already noticed the inadequacies of both the formula and data but that was not what I wanted to discuss. As I said I pick up the example from a another helping board and posted it as it is to trigger a discussion about how interpolation could be a way to handle missing data.

Regards... Received on Thu May 03 2007 - 15:00:09 CEST

Original text of this message