| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> comp.databases.theory -> Re: Self Joins and optimization
"David Cressey" <cressey73_at_verizon.net> wrote in message
news:uBj0i.14554$rm.10970_at_trndny03...
[snip]
>
> Brian,
>
> I'm not ignoring your example. I'm trying to digest it, a little bit at
> a
> time. As you said, it's not quite so simple. I have no response at this
> time, other than vague philosophical generalities. I think it would be
> insulting to offer such generalities in response to your input of code and
> data.
>
> So I'm hoping, before too long, to have something more specific to say.
>
> In the meantime, your example provides me much food for thought about
> temporal data, transition versus state, and simultaneity.
>
Temporal data can really make you tear what little hair you have left out.
I bought Date, Darwin and Lorentzos' /Temporal Data and the Relational
Model/ recently, and found that there really isn't much in it that I hadn't
already figured out on my own. I also think some of their arguments
are lacking. Their arguments against using NOW for present data are totally
unfounded, and I find their solution, splitting current and historical data
into
separate relvars abhorrent. First of all, an interval of the form [NOW,
d16]
does not make any sense in a closed world. Secondly, why couldn't there
be different kinds of intervals: definite, indefinite start, indefinite end,
and
universal. (They certainly allow for different granularities and
precision.)
In a closed world, an indefinite interval would have to refer to either
positive
or negative infinity, and even given the discreteness assumption, the number
of possible timestamps is countably infinite. The introduction of
indefinite
intervals would eliminate the need for separate relvars, and would also
allow
invariant information to be recorded in the same relvar. The presence of the
interval (-infinity,+infinity) would clearly indicate that the information
represented by the tuple applies for all time, and as such, must never
change.
This could even be enforced by the engine. So their solution is the result
of
a faulty argument and a poorly designed interval data type.
Sorry for diverging from the topic at hand. Below is a procedural solution. It took a bit of extra time to write, because there are so many conditions. Also, in order to make the set-based solution easier to understand, I'm reincluding it here further annotated for comparison with the procedural solution.
/*
Need to cache CURRENT_TIMESTAMP so that apples and apples can be compared
*/
DECLARE @CURRENT_TIME DATETIME SET @CURRENT_TIME = CURRENT_TIMESTAMP DECLARE @NUMJOBS INT DECLARE @PREV_EMPLOYEE INT, @NEXT_EMPLOYEE INT, @PREV_TX_TIME DATETIME, @NEXT_TX_TIME DATETIME, @PREV_TX_TYPE CHAR(3), @NEXT_TX_TYPE CHAR(3), @PREV_JOB INT, @NEXT_JOB INT, @PREV_STEP INT, @NEXT_STEP INT,
|
V
SET @X = CURSOR LOCAL STATIC FORWARD_ONLY FOR
SELECT EMPLOYEE#, TX_TIME, TX_TYPE
FROM TX
ORDER BY EMPLOYEE#, TX_TIME, TX_TYPE DESC
OPEN @X
FETCH @X INTO @PREV_EMPLOYEE, @PREV_TX_TIME, @PREV_TX_TYPE
IF @@FETCH_STATUS = 0
BEGIN
IF @PREV_TX_TYPE = 'ON'
BEGIN
SET @NUMJOBS = 1
FETCH @X INTO @NEXT_EMPLOYEE, @NEXT_TX_TIME, @NEXT_TX_TYPE
WHILE @@FETCH_STATUS = 0
BEGIN
IF @PREV_EMPLOYEE = @NEXT_EMPLOYEE
BEGIN
IF @PREV_TX_TIME < @NEXT_TX_TIME
IF @NUMJOBS > 0
INSERT @I VALUES (@PREV_EMPLOYEE, @PREV_TX_TIME, @NEXT_TX_TIME,
@NUMJOBS)
SELECT @PREV_EMPLOYEE = @NEXT_EMPLOYEE,
@PREV_TX_TIME = @NEXT_TX_TIME,
@PREV_TX_TYPE = @NEXT_TX_TYPE
|
V
SET @X = CURSOR LOCAL FORWARD_ONLY FOR
SELECT EMPLOYEE#, JOB#, STEP#, MACHINE#, TX_TIME, TX_TYPE
FROM TX
ORDER BY EMPLOYEE#, JOB#, STEP#, MACHINE#, TX_TIME, TX_TYPE DESC
OPEN @X
FETCH @X INTO @PREV_EMPLOYEE, @PREV_JOB, @PREV_STEP, @PREV_MACHINE,
@PREV_TX_TIME, @PREV_TX_TYPE
WHILE @@FETCH_STATUS = 0
BEGIN
FETCH @X INTO @NEXT_EMPLOYEE, @NEXT_JOB, @NEXT_STEP, @NEXT_MACHINE,
@NEXT_TX_TIME, @NEXT_TX_TYPE
IF (@@FETCH_STATUS != 0 OR @PREV_EMPLOYEE != @NEXT_EMPLOYEE OR @PREV_JOB != @NEXT_JOB OR @PREV_STEP != @NEXT_STEP OR @PREV_MACHINE != @NEXT_MACHINE) AND @PREV_TX_TYPE = 'ON'
AND @PREV_JOB = @NEXT_JOB
AND @PREV_STEP = @NEXT_STEP
AND @PREV_MACHINE = @NEXT_MACHINE
AND @PREV_TX_TYPE = 'ON'
AND @NEXT_TX_TYPE = 'OFF'
@PREV_JOB = @NEXT_JOB,
@PREV_STEP = @NEXT_STEP,
@PREV_MACHINE = @NEXT_MACHINE,
@PREV_TX_TIME = @NEXT_TX_TIME,
@PREV_TX_TYPE = @NEXT_TX_TYPE
Finally join the two table variables and aggregate to produce the result.
It should be noted that the other half of the Table Spool/Eager Spool
optimizer steps referred to above would feed a similar join in the set-based
solution.
It should also be noted that the table variables are already in the correct
order
for the join, due to the primary key constraint on each. I think the
set-based
solution might require an extra sort, though as is often the case, I may be
wrong.
*/
SELECT JOB#, STEP#, SUM(CAST(DATEDIFF(ss, a.START_TIME, a.STOP_TIME) AS
FLOAT)/(3600.0 * CAST(NUMJOBS AS FLOAT))) AS LABOR
FROM @I a JOIN @P b
ON (b.EMPLOYEE# = a.EMPLOYEE#
AND b.START_TIME < a.STOP_TIME
AND b.STOP_TIME > a.START_TIME)
GROUP BY JOB#, STEP#
--Here's the further annotated set-based solution:
/*
This view does with a correlated subquery combined with aggregation
exactly what the cursor that produces @P does in the set-based solution.
I acknowledge that there may be another, better way to do this that doesn't
generate a Loop Join. I used this method because I feel that it is easier
to read and understand.
*/
CREATE VIEW PAIRED_TX AS
SELECT a.EMPLOYEE#,
a.JOB#,
a.STEP#,
a.MACHINE#,
TX_TIME AS TIME_ON,
COALESCE(
(SELECT MIN(TX_TIME)
FROM TX
WHERE EMPLOYEE# = a.EMPLOYEE#
AND JOB# = a.JOB#
AND STEP# = a.STEP#
AND MACHINE# = a.MACHINE#
AND TX_TIME > a.TX_TIME
AND TX_TYPE = 'OFF'),
CURRENT_TIMESTAMP) AS TIME_OFF
Note: this was acutally a requirement in the original solution: if the same
employee clocked onto or off of more than one job during the same
3 minute period, then the clock time of the first job applied to them all.
*/
CREATE VIEW INTERVALS AS
SELECT a.EMPLOYEE#, a.START_TIME, a.STOP_TIME
FROM
(SELECT DISTINCT a.EMPLOYEE#,
a.TX_TIME AS START_TIME,
(SELECT MIN(TX_TIME)
FROM TX
WHERE EMPLOYEE# = a.EMPLOYEE#
AND TX_TIME > a.TX_TIME) AS STOP_TIME
FROM TX a
UNION
SELECT EMPLOYEE#,
MAX(TX_TIME) AS START_TIME,
CURRENT_TIMESTAMP AS STOP_TIME
FROM TX
GROUP BY EMPLOYEE#
HAVING SUM(CASE WHEN TX_TYPE = 'ON' THEN 1 ELSE -1 END) > 0) a
WHERE STOP_TIME IS NOT NULL
SELECT JOB#,
STEP#,
SUM(CAST(DATEDIFF(ss, d.START_TIME, d.STOP_TIME) AS FLOAT) / PAIRS) /
3600.0 AS LABOR
(SELECT b.EMPLOYEE#, START_TIME, STOP_TIME, CAST(COUNT(*) AS FLOAT) AS PAIRS
FROM PAIRED_TX a
JOIN INTERVALS b
ON (b.EMPLOYEE# = a.EMPLOYEE#
AND b.START_TIME >= a.TIME_ON
AND b.STOP_TIME <= a.TIME_OFF)
GROUP BY b.EMPLOYEE#, b.START_TIME, b.STOP_TIME) c
JOIN
(SELECT a.EMPLOYEE#, a.JOB#, a.STEP#, a.MACHINE#, START_TIME, STOP_TIME
FROM PAIRED_TX a
JOIN INTERVALS b
ON (b.EMPLOYEE# = a.EMPLOYEE#
AND b.START_TIME >= a.TIME_ON
AND b.STOP_TIME <= a.TIME_OFF)) d
ON (d.EMPLOYEE# = c.EMPLOYEE#
AND d.START_TIME = c.START_TIME
AND d.STOP_TIME = c.STOP_TIME)
/*
Finally, the result is joined and aggregated to produce the result.
Here's where the results of the above subqueries would need to be sorted,
or hashes computed (but not so likely, due to the theta joins involved--
it all depends on what the optimizer produces) in order to complete
the join. Then the result of the join would need to be sorted or hashed
again in order to compute the aggregate. Of course, the final sort/hash
step would be necessary in either solution.
*/
GROUP BY JOB#, STEP#
Received on Wed May 09 2007 - 11:15:01 CDT
![]() |
![]() |