Re: Self Joins and optimization

From: David Cressey <cressey73_at_verizon.net>
Date: Thu, 17 May 2007 13:49:51 GMT
Message-ID: <34Z2i.21146$5Z6.19472_at_trndny05>

"Brian Selzer" <brian_at_selzer-software.com> wrote in message news:R%t1i.17665$YL5.5318_at_newssvr29.news.prodigy.net...
>
> "David Cressey" <cressey73_at_verizon.net> wrote in message
> news:zEk1i.1282$4a1.80_at_trndny07...

> > You have the same kind of problems with the TX table. If you delete an
> > "OFF" entry, then it affects the Pairing for every interval between the
> > entry deleted and the current time. If you insert a new "ON" entry,
you
> > have to recompute the pairing for every interval between the new "ON"
and
> > its corresponding "OFF".
> >
>
> I wouldn't consider them the same kind of problems. No recomputation is
> necessary until it is queried! What is recorded in the database is
assumed
> to be true, so the delete and insert anomalies you're describing to are
not
> anomalies at all--just the current state of affairs. Of course, if you
try
> to delete an 'ON' row where there also exists a corresponding 'OFF' row,
> then that would leave the table in an inconsistent state, so therefore,
that
> delete should be rejected.
>
> > With all due respect, if this data is being captured in real time, and
> > never altered afterwards, then the anomalies associated with random
> > inserts, updates, and deletes of partially normalized data are
irrelevant
> > to
> > the process at hand (both data collection and query processing).
> >
> >
> People make mistakes. Sometimes they enter the wrong machine#; sometimes
> they enter the wrong step#, sometimes they even screw up their own
> employee#. Someone has to fix them.

I took me a while, but I just now noticed that your two assertions above are mutually contradictory. "What is in the database is assumed to be true" and "People make mistakes ... Someone has to fix them." are opposite arguments.

BTW, I finally came up with a set oriented solution that could be matched against some procedural (cursor oriented?) solution you might have. It seems that MS Access 97 lacks the CASE construct, and the COALESCE function. So I had to make do with IIf and isnull (I'm suitably embarrassed.) I used an OUTER JOIN instead of a subquery.

Note: I only needed 3 self-joins (4 references to TX) instead of the number you gave. I wonder what the reason is for the difference.

At the end of this post I've included an SQL script derived from the MS Access solution, hopefully without transcription errors.

Now what I need is enough data to make MS Access run slow, so that I can see a noticeable delay. ;) (The smiley face is included for lurkers who can't recognize irony without it.)

Brian, can you tell me approximately how much time you saved your enterprise by doing this procedurally, instead of set oriented?

/*
The segments view transforms the transitions given in TX into segments of a job step.
The ratio between elapsed time and labor charged time is constant within a segment.

For performance comparison to a procedural approach to the same problem.

DGC 5-16-07 */

CREATE VIEW SEGMENTS AS
SELECT

      A.Employee,
      C.Job,
      C.Step,
      C.Machine,
      A.TX_TIME AS START_TIME,
      min (IIf (isnull(B.TX_Time), NOW, B.TX_TIME))  AS STOP_TIME,
      (SELECT SUM(IIf (D.TX_Type = 'ON', 1, -1)) FROM TX AS D WHERE

D.EMPLOYEE = A.EMPLOYEE
AND D.TX_TIME <= A.TX_TIME) AS PAIRING
FROM

      (TX AS A
      LEFT OUTER JOIN TX AS B
                        ON  B.Employee = A.Employee
                       AND  B.TX_Time > A.TX_Time)
      INNER JOIN TX AS C
                       ON C.Employee = A.Employee
                       AND   C.TX_Time  <=  A.TX_Time
 GROUP BY
     A.Employee,
     C.Job,
     C.Step,
     C.Machine,
     A.TX_Time

HAVING
sum (IIf (C.TX_Type= 'ON', 1, -1)) > 0;

/* The Labor cost Query calculates the amount of employee labor allocated to each job step. */

CREATE VIEW LABOR_COST AS
SELECT

     Segments.Job,
     Segments.Step,
     Sum(DateDiff("s",START_TIME,STOP_TIME)/3600/Pairing) AS Labor

FROM Segments
GROUP BY

     Segments.Job,
     Segments.Step;

/* The TX Table stores the inputs as given. The actual TX table was built using MS Access design view, And it is supposed to mimic what's given below. */ CREATE TABLE TX
(
EMPLOYEE# INT NOT NULL,
JOB# INT NOT NULL,
STEP# INT NOT NULL,
MACHINE# INT NOT NULL,
TX_TYPE CHAR(3) NOT NULL
CHECK (TX_TYPE = 'ON' OR TX_TYPE = 'OFF'), TX_TIME DATETIME NOT NULL,
PRIMARY KEY (EMPLOYEE#, JOB#, STEP#, MACHINE#, TX_TYPE, TX_TIME) )

A little bit of test data:

INSERT TX VALUES (1,1,1,1,'ON', '2007-05-03 08:00:00')
INSERT TX VALUES (1,2,1,1,'ON', '2007-05-03 08:00:00')
INSERT TX VALUES (1,1,1,1,'OFF', '2007-05-03 09:00:00')
INSERT TX VALUES (1,2,1,1,'OFF', '2007-05-03 09:30:00')
INSERT TX VALUES (1,1,1,1,'ON', '2007-05-03 10:00:00')
INSERT TX VALUES (1,1,1,1,'OFF', '2007-05-03 11:00:00')
INSERT TX VALUES (1,1,2,1,'ON', '2007-05-03 11:30:00')
INSERT TX VALUES (1,1,2,1,'OFF', '2007-05-03 12:00:00')
INSERT TX VALUES (1,2,2,1,'ON', '2007-05-03 13:00:00')
INSERT TX VALUES (1,2,2,1,'OFF', '2007-05-03 14:00:00')

/* Notes:

In my implementation, I used "Iif" and "isnull" instead of "case" and "coalesce". "Case" and "coalesce" are better, IMO, but they aren't available in MS Access 97, AFAIK.

An additional entry in TX, to provide an unmatched "ON" provides a better test of the data.
It's probably a good idea to make about an hour before the present time.*/ Received on Thu May 17 2007 - 15:49:51 CEST

This message: [ Message body ]
Next message: Bob Hairgrove: "Re: Newsgroup Signal to Noise Ratio"
Previous message: Brian Selzer: "Re: Isolation levels"
Maybe in reply to: David Cressey: "Self Joins and optimization"
In reply to Brian Selzer: "Re: Self Joins and optimization"
Next in thread: Bob Hairgrove: "Re: Newsgroup Signal to Noise Ratio"
Reply: Brian Selzer: "Re: Self Joins and optimization"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message