Re: Finding Orphans

From: Lucky Leavell [RIS] <ris_at_iglou.com>
Date: Thu, 11 Jan 2001 10:48:08 -0500
Message-ID: <Pine.GSO.4.31.0101111039160.7079-100000_at_shell1>


On Wed, 10 Jan 2001, Joe Celko wrote:

>
> >> I have a situation whereby I am trying to determine the number of
> orphans
> in an addresses table by checking for those address keys NOT IN the five
> tables which use it, to wit: <<
>
> I assume you mean "not in ANY of five tables" instead of "not in ALL of
> five tables". The first thing to do is to add DRI to your tables, so
> they will reference the address table and will automatically drop rows
> when someone is taken out of the central address book. You might also
> want to use collective or plural nouns for table names, since they are
> sets or classes of entities (if you do have only one client, then that
> was a good name).
>
Correct, I want to know how many entries in the address tables no longer have an entry in any one or more of the "parent" tables.

> Also, why do you want a COUNT(*) instead of a list of the orphans?
> This is like George Carlin's joke -- "and now for some baseball scores;
> 7, 12, 2 and 4."
>
The count is preliminary to formulating a DELETE to remove them ...

> I would give this a try, since [NOT] EXISTS () predicates are often
> faster than [NOT] IN () predicates.
>
> SELECT A1.addr_key
> FROM Addresses AS A1
> WHERE NOT EXISTS (SELECT *
> FROM Clients AS C1
> WHERE C1.addr_key = A1.addr_key)
> AND NOT EXISTS (SELECT *
> FROM Employees AS E1
> WHERE E1.addr_key = A1.addr_key)
> AND NOT EXISTS (SELECT *
> FROM Groups AS G1
> WHERE G1.addr_key = A1.addr_key)
> AND NOT EXISTS (SELECT *
> FROM Patients AS P1
> WHERE P1.addr_key = A1.addr_key)
> AND NOT EXISTS (SELECT *
> FROM Providers AS PV1
> WHERE PV1.addr_key = A1.addr_key);
>
I had already done something like this which ran MUCH faster (approx. 3 times!) on Informix 7.31. A sidekick added indices on those tables which didn't have them and it ran 4 times faster ... yep, a grand total of 12 times faster!

 > If the optimizer is not smart about DeMorgan's Laws in  this case > because of the subquery predicates, re-write it as:
>

As an Ingres refugee, I make no comments about the Informix query optimizer or anything else Informix except to say I MISS INGRES!

> SELECT A1.addr_key
> FROM Addresses AS A1
> WHERE NOT
> (EXISTS (SELECT *
> FROM Clients AS C1
> WHERE C1.addr_key = A1.addr_key)
> OR EXISTS (SELECT *
> FROM Employees AS E1
> WHERE E1.addr_key = A1.addr_key)
> OR EXISTS (SELECT *
> FROM Groups AS G1
> WHERE G1.addr_key = A1.addr_key)
> OR EXISTS (SELECT *
> FROM Patients AS P1
> WHERE P1.addr_key = A1.addr_key)
> OR EXISTS (SELECT *
> FROM Providers AS PV1
> WHERE PV1.addr_key = A1.addr_key));
>

Didn't try this variant ... yet.

> Order the clauses by table size, so that the most likely match will
> occur first.
>
That sound like a winner! I'll try it.

> >> After pulling hair (and I don't have that much left to pull!), I
> still cannot see what I am missing. Any ideas? Is there a better way
> to do this? <<
>
> You think you have hair problems, wimp? I have varicose veins and
> liver spots on my skull <g>?
>
Actually, I'm not hung up on hair or lack of it. I am VERY thankful for excellent health ... my last hospital stay was in August, 1945 on the occasion of my birth but that was an involuntary admission.

 > --CELKO--
> Joe Celko, SQL Guru & DBA at Trilogy
> When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc)
> which can be cut and pasted into Query Analyzer is appreciated.
>
>
> Sent via Deja.com
> http://www.deja.com/
>

Thank you,
Lucky

Lucky Leavell                      Phone: (800) 481-2393 or (812) 366-4066
UniXpress - Your Source for SCO      FAX: (888) 231-9640 or (812) 366-3618
1560 Zoar Church Road NE           Email: lucky_at_UniXpress.com
Corydon, IN 47112-7374     WWW Home Page:  http://www.UniXpress.com
Received on Thu Jan 11 2001 - 16:48:08 CET

Original text of this message