Re: Need to query 4 tables....3 are 'subordinates'...(need outer join?)

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 4 Oct 2002 12:31:32 -0700
Message-ID: <c0d87ec0.0210041131.3caeed4b_at_posting.google.com>


>> 1. Is this hub-and-spoke design common? <<

Yep. And it is getting to be even more so when OO programmers strart to map OO type-subtype designs into SQL.

>> 2. Is it possible to do a multi-table outer join? <<

Not as such. The JOIN operators are binary. The outer join operators do not commute, associate or play in a nice simple algebraic way, so you have to watch your parentheses

>> 3. I'm working w/ some mainframe programmers who say they never
write
crazy SQL. Instead, they just go out and get the data for each row one table at a time. <<

They are morons. You can tell them I said this and let them look up my name on the Internet to see that I have the credentials to back up that statement. I will make a guess that they were all COBOL programmers who took their old, crappy legacy code and used a text editor to make each file into a table, then each READ into a FETCH. They have never actually written SQL.

There is an old joke about the Polish lumberjack who has been using an axe all of his life and finally decides to buy his first chainsaw. He comes back to the hardware store the next day and the owner asked him how many trees he cut on the first day with the new tool.

"Just one."

"That cannot be right. Let me see that chainsaw."

The hardware store owner clears out the teeth, inspects the machine, then fills up the gas tank with fresh gasoline. "It looks a little beat up, but otherwise it looks fine to me."

Then he pulls the cord and starts the engine.

The Polish lumberjack jumps staight and backwards, screaming "Why is it making all that noise?"

>> 4. Does it matter whether I do a LEFT OUTER vs. a RIGHT OUTER
join,
and switch the order of the tables in the query? <<

They are the same thing, just different "syntax sugar" on them.

>> 5. I took a shot at SQL code for all four tables below.

 SELECT * -- replace with actual columns in real code    FROM T1 -- the preserved table

        LEFT OUTER JOIN
        T2
        ON T1.pk1 = T2.pk1 
           AND T1.pk2 = T2.pk2      
        LEFT OUTER JOIN
        T3
        ON T1.pk1 = T3.pk1 
           AND T1.pk2 = T3.pk2
           AND T3.col = <rt_val> 
        LEFT OUTER JOIN
        T4
        ON T1.pk1 = T4.pk1 
           AND T1.pk2 = T4.pk2
           AND T4.col = <rt_val>;

What will happen is that all the rows in T1 will be preserved in the result set. The other three tables will be "attached" to it in the order of the left joins. What I guessed at was the placement of the "col = <val>" predicates. You might want to put them in a WHERE clause instead.

Here is how OUTER JOINs work in SQL-92. Assume you are given:

Table1       Table2
 a   b        a   c
 ======       ======
 1   w        1   r
 2   x        2   s
 3   y        3   t

 4 z

and the outer join expression:

 Table1
 LEFT OUTER JOIN
 Table2

 ON Table1.a = Table2.a      <== join condition
    AND Table2.c = 't';      <== single table condition

We call Table1 the "preserved table" and Table2 the "unpreserved table" in the query. What I am going to give you is a little different, but equivalent to the ANSI/ISO standards.

  1. We build the CROSS JOIN of the two tables. Scan each row in the result set.
  2. If the predicate tests TRUE for that row, then you keep it. You also remove all rows derived from it from the CROSS JOIN
  3. If the predicate tests FALSE or UNKNOWN for that row, then keep the columns from the preserved table, convert all the columns from the unpreserved table to NULLs and remove the duplicates.

So let us execute this by hand:

 Let _at_ = passed the first predicate
 Let * = passed the second predicate

 Table1 CROSS JOIN Table2
 a b a c


 1   w       1   r _at_
 1   w       2   s
 1   w       3   t *
 2   x       1   r
 2   x       2   s _at_
 2   x       3   t *
 3   y       1   r
 3   y       2   s
 3   y       3   t _at_* <== the TRUE set
 4   z       1   r
 4   z       2   s
 4   z       3   t *

 Table1 LEFT OUTER JOIN Table2
 a   b        a   c
 =========================
 3   y     3      t      <= only TRUE row
 -----------------------
 1   w     NULL   NULL   Sets of duplicates
 1   w     NULL   NULL
 1   w     NULL   NULL
 -----------------------
 2   x     NULL   NULL
 2   x     NULL   NULL
 2   x     NULL   NULL
 3   y     NULL   NULL  <== derived from the TRUE set - Remove  
 3   y     NULL   NULL
 -----------------------
 4   z     NULL   NULL
 4   z     NULL   NULL
 4   z     NULL   NULL

the final results:

 Table1 LEFT OUTER JOIN Table2
 a b a c


 1   w     NULL   NULL
 2   x     NULL   NULL
 3   y     3      t
 4   z     NULL   NULL

The basic rule is that every row in the preserved table is represented in the results in at least one result row.

There are limitations and very serious problems with the extended equality version of an outer join used in some diseased mutant products. Consider the two Chris Date tables

 Suppliers        SupParts
 supno             supno partno qty
 =========        ==============
 S1               S1   P1    100
 S2               S1   P2    250
 S3               S2   P1    100
                  S2   P2    250

and let's do an extended equality outer join like this:

 SELECT *
  FROM Supplier, SupParts
 WHERE Supplier.supno *= SupParts.supno
   AND qty < 200;

If I do the outer first, I get:

 Suppliers LOJ SupParts
 supno supno partno qty


 S1     S1   P1    100
 S1     S1   P2    250
 S2     S2   P1    100
 S2     S2   P2    250

 S3 NULL NULL NULL Then I apply the (qty < 200) predicate and get

 Suppliers LOJ SupParts
 supno supno partno qty



 S1 S1 P1 100
 S2 S2 P1 100

Doing it in the opposite order

 Suppliers LOJ SupParts
 supno supno partno qty



 S1 S1 P1 100
 S2 S2 P1 100
 S3 NULL NULL NULL Sybase does it one way, Oracle does it the other and Centura (nee Gupta) lets you pick which one -- the worst of both non-standard worlds! In SQL-92, you have a choice and can force the order of execution. Either do the predicates after the join ...

 SELECT *
   FROM Supplier

        LEFT OUTER JOIN
        SupParts
        ON Supplier.supno = SupParts.supno
 WHERE qty < 200;

 ... or do it in the joining:

 SELECT *
  FROM Supplier

       LEFT OUTER JOIN
       SupParts
       ON Supplier.supno = SupParts.supno
          AND qty < 200;

Another problem is that you cannot show the same table as preserved and unpreserved in the extended equality version, but it is easy in SQL-92. For example to find the students who have taken Math 101 and might have taken Math 102:

 SELECT C1.student, C1.math, C2.math
  FROM (SELECT * FROM Courses WHERE math = 101) AS C1

       LEFT OUTER JOIN
       (SELECT * FROM Courses WHERE math = 102) AS C2
       ON C1.student = C2.student;

The other advice you got is wrong. The use of a surrogate key only adds meaningless data to the table, making the tables bigger. You have to maintain that redundant, meaningless data and you have absolutely no way to verify the surrogate against the reality you are modeling.

A lot of older programmers still don't realize just how fast and large a modern computer is. The time to do a comparison of almost any size byte strings inside main storage is far, far less than the time to locate, read and write data from a disk drive.

Cursors are typically orders of magnitude slower than pure SQL code, not portable, and lock down the tables. I have written about five cursors in my career and I could have avoided three of them if we had a CASE expression back then. Cursors are like the axe. Received on Fri Oct 04 2002 - 21:31:32 CEST

Original text of this message