Re: Which is the more efficient SQL formulation?

From: Lawrence James <James.Lawrence_at_epamail.epa.gov>
Date: Thu, 20 Oct 1994 19:10:51 GMT
Message-ID: <James.Lawrence.9.000E2EB3_at_epamail.epa.gov>


In article <380j53$n7k_at_ixnews1.ix.netcom.com> skp_at_ix.netcom.com (SANJAY PURI) writes:
>From: skp_at_ix.netcom.com (SANJAY PURI)
>Subject: Re: Which is the more efficient SQL formulation?
>Date: 18 Oct 1994 13:38:43 GMT
 

>In <37vbo3$4f0_at_grivel.une.edu.au> mpaulo_at_metz.une.edu.au (Maria Paulo) writes:
 

>>
>>Hi all,
>>
>>Consider a table MY_TABLE that has the following values:
>>
>> column1 column2 column3
>> ------- ------- -------
>> 123 Citizen John
>> 456 Citizen john
>> 789 citizen John
>> 001 Kent Clark
>> 002 Lang Loise
>>
>>The first 3 records above actually refer to the same
>>person but has been entered in the database 3 times
>>with 3 different values in column1. Records 5 and 6
>>are unique records and are alright.
>>
>>What I would like to do is create a view that displays
>>only the records that I have same values in column1 and
>>column3. The view, for example, should display the
>>following:
>>
>> column1 column2 column3
>> ------- ------- -------
>> 123 Citizen John
>> 456 Citizen john
>> 789 citizen John
>>
>>I can think of 2 formulations which are:
>>
>> 1) create view MY_VIEW
>> as select distinct *
>> from MY_TABLE A, MY_TABLE B
>> where upper(ltrim(rtrim(A.column1)))=upper(ltrim(rtrim(B.column1))) and
>> upper(ltrim(rtrim(A.column2)))=upper(ltrim(rtrim(B.column2))) and
>> A.rowid != B.rowid;
>>
>> 2) create view MY_VIEW
>> as select *
>> from MY_TABLE A
>> where EXISTS (select '*' from MY_TABLE B
>> where upper(ltrim(rtrim(A.column1)))=upper(ltrim(rtrim(B.column1))) and
>> upper(ltrim(rtrim(A.column2)))=upper(ltrim(rtrim(B.column2))) and
>> A.rowid != B.rowid);
>>
>>My question is, which of the 2 formulations is more efficient?
>>
>>I appreciate any help.
>>
>>Maria (mpaulo_at_mossman.une.edu.au)
>>
 

>I think the second option is the good one.
 

>VK Nair

It sounds like you are trying to locate potential duplicates in your table. The most obvious solution from the example you have shown is to stop allowing lower case and put a unique index on the combination of last name and first name. For more complex situations such as James Smith and Jim Smith you should explore the use of soundex(). I have used the combination of soundex on lastnames with other data such as zip code and/or first initial to attempt to find duplicates in customer databases. Received on Thu Oct 20 1994 - 20:10:51 CET

Original text of this message