Re: Finding Orphans

From: Joe Celko <71062.1056_at_compuserve.com>
Date: Wed, 10 Jan 2001 18:14:07 GMT
Message-ID: <93i8p4$6ih$1_at_nnrp1.deja.com>


>> I have a situation whereby I am trying to determine the number of
orphans
in an addresses table by checking for those address keys NOT IN the five tables which use it, to wit: <<

I assume you mean "not in ANY of five tables" instead of "not in ALL of five tables". The first thing to do is to add DRI to your tables, so they will reference the address table and will automatically drop rows when someone is taken out of the central address book. You might also want to use collective or plural nouns for table names, since they are sets or classes of entities (if you do have only one client, then that was a good name).

Also, why do you want a COUNT(*) instead of a list of the orphans? This is like George Carlin's joke -- "and now for some baseball scores; 7, 12, 2 and 4."

I would give this a try, since [NOT] EXISTS () predicates are often faster than [NOT] IN () predicates.

SELECT A1.addr_key
  FROM Addresses AS A1
 WHERE NOT EXISTS (SELECT *

                     FROM Clients AS C1
                    WHERE C1.addr_key = A1.addr_key)
   AND NOT EXISTS (SELECT *
                     FROM Employees AS E1
                    WHERE E1.addr_key = A1.addr_key)
   AND NOT EXISTS (SELECT *
                     FROM Groups AS G1
                    WHERE G1.addr_key = A1.addr_key)
   AND NOT EXISTS (SELECT *
                     FROM Patients AS P1
                    WHERE P1.addr_key = A1.addr_key)
   AND NOT EXISTS (SELECT *
                     FROM Providers AS PV1
                    WHERE PV1.addr_key = A1.addr_key);

If the optimizer is not smart about DeMorgan's Laws in this case because of the subquery predicates, re-write it as:

SELECT A1.addr_key
  FROM Addresses AS A1
 WHERE NOT

       (EXISTS (SELECT *
                  FROM Clients AS C1
                 WHERE C1.addr_key = A1.addr_key)
        OR EXISTS (SELECT *
                     FROM Employees AS E1
                    WHERE E1.addr_key = A1.addr_key)
        OR EXISTS (SELECT *
                     FROM Groups AS G1
                    WHERE G1.addr_key = A1.addr_key)
        OR EXISTS (SELECT *
                     FROM Patients AS P1
                    WHERE P1.addr_key = A1.addr_key)
        OR EXISTS (SELECT *
                     FROM Providers AS PV1
                    WHERE PV1.addr_key = A1.addr_key));

Order the clauses by table size, so that the most likely match will occur first.

>> After pulling hair (and I don't have that much left to pull!), I
still cannot see what I am missing. Any ideas? Is there a better way to do this? <<

You think you have hair problems, wimp? I have varicose veins and liver spots on my skull <g>?

--CELKO--
Joe Celko, SQL Guru & DBA at Trilogy
When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc) which can be cut and pasted into Query Analyzer is appreciated.

Sent via Deja.com
http://www.deja.com/ Received on Wed Jan 10 2001 - 19:14:07 CET

Original text of this message