Re: Self Joins and optimization

From: David Cressey <cressey73_at_verizon.net>
Date: Mon, 14 May 2007 20:31:39 GMT
Message-ID: <LG32i.8392$1X1.5731_at_trndny02>


"Brian Selzer" <brian_at_selzer-software.com> wrote in message news:JBe%h.16839$YL5.304_at_newssvr29.news.prodigy.net...
>
> "David Cressey" <cressey73_at_verizon.net> wrote in message
> news:Vg5%h.166$83.81_at_trndny08...
> >
> > "Brian Selzer" <brian_at_selzer-software.com> wrote in message
> > news:MQ2%h.4456$H_.2052_at_newssvr21.news.prodigy.net...
> >>
> >> "David Cressey" <cressey73_at_verizon.net> wrote in message
> >> news:ZE%_h.163$D52.68_at_trndny04...

> >> Here's a maybe not-so-simple example:
> >>
> >> Given a table with the columns,
> >>
> >> EMPLOYEE#, JOB#, STEP#, MACHINE#, TX_TIME, TX_TYPE
> >>
> >> where TX_TYPE can be either 'ON' or 'OFF' and the key is the entire
> > heading,
> The table definition:
>
> CREATE TABLE TX
> (
> EMPLOYEE# INT NOT NULL,
> JOB# INT NOT NULL,
> STEP# INT NOT NULL,
> MACHINE# INT NOT NULL,
> TX_TYPE CHAR(3) NOT NULL
> CHECK (TX_TYPE = 'ON' OR TX_TYPE = 'OFF'),
> TX_TIME DATETIME NOT NULL,
> PRIMARY KEY (EMPLOYEE#, JOB#, STEP#, MACHINE#, TX_TYPE, TX_TIME)
> )
>
> A little bit of test data:
>
> INSERT TX VALUES (1,1,1,1,'ON', '2007-05-03 08:00:00')
> INSERT TX VALUES (1,2,1,1,'ON', '2007-05-03 08:00:00')
> INSERT TX VALUES (1,1,1,1,'OFF', '2007-05-03 09:00:00')
> INSERT TX VALUES (1,2,1,1,'OFF', '2007-05-03 09:30:00')
> INSERT TX VALUES (1,1,1,1,'ON', '2007-05-03 10:00:00')
> INSERT TX VALUES (1,1,1,1,'OFF', '2007-05-03 11:00:00')
> INSERT TX VALUES (1,1,2,1,'ON', '2007-05-03 11:30:00')
> INSERT TX VALUES (1,1,2,1,'OFF', '2007-05-03 12:00:00')
> INSERT TX VALUES (1,2,2,1,'ON', '2007-05-03 13:00:00')
> INSERT TX VALUES (1,2,2,1,'OFF', '2007-05-03 14:00:00')
>

Except for droping the pound signs, and not putting in the NOT NULL constyraints, my TX Table is the same as the above.

Ok, I have another step in producing a set oriented solution to the not so simple example given.

Ok, I have another step in a comparison between set oriented as procedural solutions for the labor cost example problem. This time, I created "segments" as a view off of TX, the table that you (Brian) supplied as the sample inputs.

Here's what I built:

I added a tiny table to store the translation between 'ON' and 1 and 'OFF' and -1.

CREATE TABLE TXREF
    (TX_TYPE CHAR(3),
      TX_OPEN INT); INSERT TXREF VALUES ('ON', 1);
INSERT TXREF VALUES ('OFF', -1); and a view that delivers TX_OPEN instead of TX_TYPE.

CREATE VIEW TXR AS
SELECT TX.EMPLOYEE, TX.JOB, TX.STEP, TX.MACHINE, TXREF.TX_OPEN, TX.TX_TIME FROM TXREF INNER JOIN TX ON TXREF.TX_TYPE = TX.TX_TYPE; Next, I created the Segments view, that presents the same data as in my last example, but this time derived from TX (and TXREF).

CREATE VIEW SEGMENTS AS SELECT

     A.EMPLOYEE,
     A.JOB,
     A.STEP,
     A.MACHINE,
     B.TX_TIME AS START_TIME,
     MIN (C.TX_TIME) AS STOP_TIME,
    (SELECT SUM(D.TX_OPEN) FROM TXR AS D WHERE
            D.EMPLOYEE = A.EMPLOYEE AND D.TX_TIME <= B.TX_TIME) AS PAIRING

FROM (TXR AS A
              INNER JOIN TX AS B
                    ON (B.EMPLOYEE = A.EMPLOYEE) AND (B.TX_TIME >=
A.TX_TIME))
              INNER JOIN TX AS C
                    ON (C.TX_TIME > B.TX_TIME) AND (C.EMPLOYEE = A.EMPLOYEE)

GROUP BY
            A.EMPLOYEE,
            A.JOB,
            A.STEP,
            A.MACHINE,
            B.TX_TIME

HAVING SUM (A.TX_OPEN) > 0; While this passes the test data, it still needs a little work. One of the INNER JOINs should be an outer join, in order to deal with the case where a job step is still running when the data is pulled. Then the current time should be coalesced into the stop time, to replace resulting NULLS that come out the outer join. The test data has no incomplete job steps, so this error didn't affect my test run. There could be other bugs.

Then, finally, there's the query that summarizes the labor costs, as before:

CREATE VIEW LABOR_COSTS AS
SELECT

       SEGMENTS.JOB,
       SEGMENTS.STEP,
       SUM(DATEDIFF("S",START_TIME,STOP_TIME)/3600/PAIRING) AS LABOR
FROM SEGMENTS GROUP BY SEGMENTS.JOB, SEGMENTS.STEP; The next step might be to collect enough data to analyze performance, and run it side by side against a procedural solution. I don't know how the race is going to go, but with the right indexes, a six way join in Oracle doesn't waste enough time to bother with further greasing of the code, IMO.

I suppose, if I get down to analyzing performance, it would behoove me to download a copy of SQL Server Express, so I could have something better than MS Access to run on. I may just leave this effort where it is. Brian isn't going to be persuaded in any event, and no one else in the NG has expressed interest in it. Received on Mon May 14 2007 - 22:31:39 CEST

Original text of this message