Re: Finding Orphans

From: Vaughan Powell <vaughan_at_workmail.com>
Date: Wed, 10 Jan 2001 11:13:24 GMT
Message-ID: <93hg4h$frg$1_at_nnrp1.deja.com>


In article <Pine.GSO.4.31.0101081607350.21269-100000_at_shell1>,   "Lucky Leavell [RIS]" <ris_at_iglou.com> wrote:
> I have a situation whereby I am trying to determine the number of
 orphans
> in an addresses table by checking for those address keys NOT IN the
 five
> tables which use it, to wit:
>
> SELECT count(*)
> FROM address
> WHERE addr_key NOT IN (SELECT addr_key
> FROM client)
> AND addr_key NOT IN (SELECT addr_key
> FROM employee)
> AND addr_key NOT IN (SELECT addr_key
> FROM groups)
> AND addr_key NOT IN (SELECT addr_key
> FROM patient)
> AND addr_key NOT IN (SELECT addr_key
> FROM provider)
>
> These tables contain anywhere from a few hundred to a few million
 rows and
> only the address and patient tables have an index on the addr_key
 column.
> After running three hours the query returns a zero count which I know
 is
> NOT correct.
>
> After pulling hair (and I don't have that much left to pull!), I still
> cannot see what I am missing. Any ideas? Is there a better way to do
> this?
>
> Thank you,
> Lucky
>
> Lucky Leavell Phone: (800) 481-2393 or (812) 366-
 4066
> UniXpress - Your Source for SCO FAX: (888) 231-9640 or (812) 366-
 3618
> 1560 Zoar Church Road NE Email: lucky_at_UniXpress.com
> Corydon, IN 47112-7374 WWW Home Page: http://www.UniXpress.com
>
>

If my understanding of your question is correct, and it is not made clear, you want to ensure that each address occurs in ALL 5 of the other tables. Your query actually checks that the address appears in any 1 of the 5 tables.

If my interpretation is correct the query should be:

 SELECT count(*)
   FROM address
  WHERE addr_key NOT IN (SELECT addr_key

                           FROM client)
    OR addr_key NOT IN (SELECT addr_key
                           FROM employee)
    OR addr_key NOT IN (SELECT addr_key
                           FROM groups)
    OR addr_key NOT IN (SELECT addr_key
                           FROM patient)
    OR addr_key NOT IN (SELECT addr_key
                          FROM provider)

Alternatively

 SELECT count(*)
   FROM address
  WHERE NOT
   (
    addr_key IN (SELECT addr_key

                 FROM client)
AND addr_key IN (SELECT addr_key
                 FROM employee)
AND addr_key IN (SELECT addr_key
                 FROM groups)
AND addr_key IN (SELECT addr_key
                 FROM patient)
AND addr_key IN (SELECT addr_key
                 FROM provider)

   )

Both of the above are logically the same but you should compare the query plans for both and use the most efficient.

Could it be worth creating indexes on the addr_key where they curently do not exist - I would seriously consider this - you will find that it speeds up your query and may well speed up the system generally. Dowside is that INSERT may be slowed and more disk space needed. The trade-off is your decision.

Just a final helpfull suggestion to help you keep your remaining hair - why don't you test you queries on a small subset of the data first, simulating the conditions you are looking for?

--
Vaughan Powell MCDBA, MCSD, MCSE
Data Architecture and Applications Design Manager
BuildOnline


Sent via Deja.com
http://www.deja.com/
Received on Wed Jan 10 2001 - 12:13:24 CET

Original text of this message