Re: Self Joins and optimization
Date: Sat, 05 May 2007 17:06:52 GMT
Message-ID: <MQ2%h.4456$H_.2052_at_newssvr21.news.prodigy.net>
"David Cressey" <cressey73_at_verizon.net> wrote in message
news:ZE%_h.163$D52.68_at_trndny04...
> In the "interpolation" thread, Brian has been expounding on the idea that
> he can discover algorithms that he knows to be correct, and that
> outperform
> anything an optimizer can generate. He's mentioned "self joins" and the
> idea that he can get a job doen with only one read, where the automatic
> solution generated in response to declarative code generates multiple
> passes
> through the data.
>
> My experience leads me to the opposite conclusion from that.
>
> Here's the questions I'd like the newsgroup to consider:
>
> What, if any, algorithms that are demonstrably correct are going to be
> overlooked by the optimizer, although discoverable by a human coder? Why
> would an optimizer fail to discover the possibility of eliminating
> self-joins? Are there any undecidability issues here?
>
Here's a maybe not-so-simple example:
Given a table with the columns,
EMPLOYEE#, JOB#, STEP#, MACHINE#, TX_TIME, TX_TYPE
where TX_TYPE can be either 'ON' or 'OFF' and the key is the entire heading,
compute the total labor performed to date for each job step.
Notes: (1) an employee cannot be perfoming the same job step on the same
A procedural approach would require two passes through the data: One to
compute labor per job step per employee, and one to compute the aggregate,
labor per job step.
A set-based approach would require a minimum of six passes through the data:
Determining the intervals during which each employee was performing a set of
job steps requires a left outer self-join, or two passes through the data.
Computing the cardinality of each of those sets so that labor can be
Since the final pass is common to both approaches, the procedural approach
effectively replaces five passes through the data with a single pass.
Why would the optimizer fail? Because I don't think it's possible for the
optimizer to detect every intrarelational dependency in a table, nor do I
think it would be advisable, unless it were possible to enumerate them in
the schema.
[snip] Received on Sat May 05 2007 - 19:06:52 CEST