Re: Simple SQL Puzzle

From: Chris Brady <chrisjbrady_at_yahoo.com>
Date: Thu, 01 Nov 2001 15:35:38 GMT
Message-ID: <3be167db.16960542@news.baplc.com>

Folks - thank you very much evereyone for you help. Indeed using MINUS solution appears to have worked!!

I tried

SELECT DISTINCT(i.myfield) FROM table1 i WHERE i.myfield NOT IN
(SELECT w.myfield FROM table2 w)

and this took 8 hours to run - there are 4.5 millions records in table1 and 5,000 in table2.

Then I tried

SELECT DISTINCT(i.myfield) FROM table1 i MINUS
SELECT DISTINCT(w.myfield) FROM table2 w

and this took only 30 mins.

Now I THINK that the table2 above is a pure subset of table1. So the above produced the records I needed.

However my question now is that if table1 and table2 overlap in the conventional Venn diagram kind of way, that is if (table1 INTERSECT table2) is not NULL, and table one contains records not in table2, and table2 contains records not in table1 - does:

SELECT DISTINCT(i.myfield) FROM table1 i MINUS
SELECT DISTINCT(w.myfield) FROM table2 w

give the same results as

SELECT DISTINCT(w.myfield) FROM table2 w MINUS
SELECT DISTINCT(i.myfield) FROM table1 i

which are all the records not in the INTERSECTION of table1 and table2.

If so this is not quite what I want because I really need all of the records that are in table2 but not in table1 - guaranteed.

I think the logic for this is:

table 1 MINUS (table1 INTERSECT table2)

or rather

SELECT DISTINCT(i.myfield) from table1 i MINUS
(SELECT DISTINCT(j.myfield) from table1 j INTERSECT DISTINCT(w.myfield) from table2 w)

Does anyone agree please?!!

Thanks again - Chris B.

On Tue, 30 Oct 2001 12:45:46 -0000, "Niall Litchfield" <n-litchfield_at_audit-commission.gov.uk> wrote:

>Hi
>
>I thought you might be doing this.
>
>Try instead
>
>select myfield from table1
>MINUS
>select myfield from table2;
>
>or
>
>select distinct(i.myfield) from table1
>where i.myfield not in
>(select myfield from table2);
>
>
>I assume myfield is indexed in both cases.
>--
>Niall Litchfield
>Oracle DBA
>Audit Commission UK
>"Chris Brady" <chrisjbrady_at_yahoo.com> wrote in message
>news:51604466.0110300417.6f1f05d9_at_posting.google.com...
>> Niall - many thanks for responding so quickly. In fact the reason I
>> asked is that I need to extract data from one table (table1) that is
>> not in table2. Both tables have identical structure but the contents
>> of table2 is a subset of table1.
>>
>> However I had a suspicion that this didn't work as expected, hence the
>> original question:
>>
>> SQL> select count(distinct(i.myfield)) from table1 i, table2 w
>> where i.myfield <> w.myfield;
>>
>> Instead I've used:
>>
>> SQL> select distinct(i.myfield) from table1 i
>> where i.myfield not in
>> (select distinct(w.myfield) from table2 w);
>>
>> The only problem with half a million records in table1 and a few
>> thousand in table 2 is that this take three hours to run!! I think
>> that this does the second select for every record from the first
>> select!!
>>
>> By-the-way I'm a newbie to Oracle and SQL as you might have realised!!
>>
>> Chris B.
>>
>> "Niall Litchfield" <n-litchfield_at_audit-commission.gov.uk> wrote in message
>news:<3bdd3db5$0$225$ed9e5944_at_reading.news.pipex.net>...
>> > Comments below
>> > "Chris Brady" <chrisjbrady_at_yahoo.com> wrote in message
>> > news:51604466.0110290314.226d2546_at_posting.google.com...
>> > > Please can someone explain this? Its the same table compared with
>> > > itself. The first case I can understand but why is the second case
>> > > giving the same total? I would have thought that it would be zero.
>> > <snip>
>> > > SQL> select count(distinct(i.myfield)) from temp_data i, temp_data w
>> > > where i.myfield <> w.myfield;
>> > >
>> > > COUNT(DISTINCT(I.MYFIELD))
>> > > --------------------------
>> > > 47548
>> >
>> > Your table has 47548 distinct values of myfield. Each of those is
>clearly
>> > not equal to the other 47547 values in the table. Thus each value of
>myfield
>> > will satisfy your where clause at least 47547 times. Assuming myfield is
>not
>> > in fact unique try taking the distinct keyword out of the two pieces of
>sql
>> > to see the results more clearly.
>
>
Received on Thu Nov 01 2001 - 09:35:38 CST