Re: Finding Orphans
Date: Wed, 10 Jan 2001 11:13:24 GMT
Message-ID: <93hg4h$frg$1_at_nnrp1.deja.com>
In article <Pine.GSO.4.31.0101081607350.21269-100000_at_shell1>,
"Lucky Leavell [RIS]" <ris_at_iglou.com> wrote:
> I have a situation whereby I am trying to determine the number of
orphans
> in an addresses table by checking for those address keys NOT IN the
five
> tables which use it, to wit:
>
> SELECT count(*)
> FROM address
> WHERE addr_key NOT IN (SELECT addr_key
> FROM client)
> AND addr_key NOT IN (SELECT addr_key
> FROM employee)
> AND addr_key NOT IN (SELECT addr_key
> FROM groups)
> AND addr_key NOT IN (SELECT addr_key
> FROM patient)
> AND addr_key NOT IN (SELECT addr_key
> FROM provider)
>
> These tables contain anywhere from a few hundred to a few million
rows and
> only the address and patient tables have an index on the addr_key
column.
> After running three hours the query returns a zero count which I know
is
> NOT correct.
>
> After pulling hair (and I don't have that much left to pull!), I still
> cannot see what I am missing. Any ideas? Is there a better way to do
> this?
>
> Thank you,
> Lucky
>
> Lucky Leavell Phone: (800) 481-2393 or (812) 366-
4066
> UniXpress - Your Source for SCO FAX: (888) 231-9640 or (812) 366-
3618
> 1560 Zoar Church Road NE Email: lucky_at_UniXpress.com
> Corydon, IN 47112-7374 WWW Home Page: http://www.UniXpress.com
>
>
If my understanding of your question is correct, and it is not made clear, you want to ensure that each address occurs in ALL 5 of the other tables. Your query actually checks that the address appears in any 1 of the 5 tables.
If my interpretation is correct the query should be:
SELECT count(*)
FROM address
WHERE addr_key NOT IN (SELECT addr_key
FROM client) OR addr_key NOT IN (SELECT addr_key FROM employee) OR addr_key NOT IN (SELECT addr_key FROM groups) OR addr_key NOT IN (SELECT addr_key FROM patient) OR addr_key NOT IN (SELECT addr_key FROM provider)
Alternatively
SELECT count(*)
FROM address
WHERE NOT
(
addr_key IN (SELECT addr_key
FROM client) AND addr_key IN (SELECT addr_key FROM employee) AND addr_key IN (SELECT addr_key FROM groups) AND addr_key IN (SELECT addr_key FROM patient) AND addr_key IN (SELECT addr_key FROM provider)
)
Both of the above are logically the same but you should compare the query plans for both and use the most efficient.
Could it be worth creating indexes on the addr_key where they curently do not exist - I would seriously consider this - you will find that it speeds up your query and may well speed up the system generally. Dowside is that INSERT may be slowed and more disk space needed. The trade-off is your decision.
Just a final helpfull suggestion to help you keep your remaining hair - why don't you test you queries on a small subset of the data first, simulating the conditions you are looking for?
-- Vaughan Powell MCDBA, MCSD, MCSE Data Architecture and Applications Design Manager BuildOnline Sent via Deja.com http://www.deja.com/Received on Wed Jan 10 2001 - 12:13:24 CET