Home » SQL & PL/SQL » SQL & PL/SQL » Minus to Not Exists
Minus to Not Exists [message #203911] Thu, 16 November 2006 23:03 Go to next message
a_developer
Messages: 194
Registered: January 2006
Senior Member
I have an SQL:

select e.emp_id, t.share_id
  from employee e, temp_emp t
 where e.comp    = 10
   and e.it_id   = t.it_id
   and e.code_id = t.code_id
minus
select e2.emp_id, c.share_id
  from employee e2, company  c
 where e2.comp   = 10
   and e2.emp_id = c.emp_id 

When I converted it to NOT EXISTS, I am not getting the same result set:

               select e.emp_id,
                      t.share_id
                from employee e,
                     temp_emp t
               where e.comp    = 10
                 and e.it_id   = t.it_id
                 and e.code_id = t.code_id
                 and not exists (select 1
                                   from company c
                                  where e.emp_id = c.emp_id)


What am I missing here?? Please help.. thanks!
Re: Minus to Not Exists [message #203939 is a reply to message #203911] Fri, 17 November 2006 00:31 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
The 1st SQL compares SHARE_ID as well as EMP_ID. It is equivalent to:

select e.emp_id,
       t.share_id
  from employee e,
       temp_emp t
 where e.comp    = 10
   and e.it_id   = t.it_id
   and e.code_id = t.code_id
   and not exists (select 1
     from employee e2, company  c
    where e2.comp   = 10
      and e2.emp_id = c.emp_id
      and e2.emp_id = e.emp_id
      and c.share_id = t.share_id )


Ross Leishman
Re: Minus to Not Exists [message #203942 is a reply to message #203939] Fri, 17 November 2006 00:36 Go to previous messageGo to next message
a_developer
Messages: 194
Registered: January 2006
Senior Member
Thanks, Ross. I got that but I am yet to verify if the result sets are the same. The table 'employee' is big. It has 110,000 records. Temp_emp has 9.5M and company has 22.3M. Hence, I prefer to read it only once (i think it's faster, is it??). Something like:

   select e.emp_id,
                      t.share_id
                from employee e,
                     temp_emp t
               where e.comp    = 10
                 and e.it_id   = t.it_id
                 and e.code_id = t.code_id
                 and not exists (select 1
                                   from company c
                                  where e.emp_id = c.emp_id
                                    and t.share_id = c.share_id)


It is still running (for the last one hour).. not sure though if this will return the same rows. But I have no doubt about your SQL.

[Updated on: Fri, 17 November 2006 00:40]

Report message to a moderator

Re: Minus to Not Exists [message #203946 is a reply to message #203942] Fri, 17 November 2006 00:42 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Depends on your execution plan. I hope you have an index on company(emp_id, company).

Ross Leishman
Re: Minus to Not Exists [message #203953 is a reply to message #203946] Fri, 17 November 2006 01:16 Go to previous messageGo to next message
a_developer
Messages: 194
Registered: January 2006
Senior Member
Quote:
...an index on company(emp_id, company).



You mean company(emp_id, shared_id), right? Yes, I do, but not composite.
Re: Minus to Not Exists [message #204019 is a reply to message #203953] Fri, 17 November 2006 06:09 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Well then, you have a problem.

Ross Leishman
Re: Minus to Not Exists [message #204097 is a reply to message #203911] Fri, 17 November 2006 15:43 Go to previous messageGo to next message
Bill B
Messages: 1484
Registered: December 2004
Senior Member
For something like this, the minus union will normally be much faster.
Re: Minus to Not Exists [message #204108 is a reply to message #204097] Fri, 17 November 2006 19:41 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I dispute that. MINUS will perform a SORT on both data sets. A NOT EXISTS that uses a UNIQUE index and does not need to lookup the table will perform a signgle pass through the main table, and unique probes into a largely cached index. This will be noticebaly faster, especially in large data sets that cannot be sorted in memory.

MINUS will tend to perform better when the sub-query probe cannot be satisfied entirely by the columns in the index, or if it needs to range-scan multiple rows in the index.

Ross Leishman
Re: Minus to Not Exists [message #204621 is a reply to message #204108] Tue, 21 November 2006 07:16 Go to previous messageGo to next message
a_developer
Messages: 194
Registered: January 2006
Senior Member
Thanks for all your feedback. I would further appreciate if you can look into this.

On COMPANY, I have b-tree indexes on emp_id (indx_company_emp) and share_id (indx_company_share).
On EMPLOYEE, primary key is emp_id, composite index on comp, it_id and code_id (indx_cic)
On TEMP_EMP, b-tree indexes on it_id (indx_temp_it) and code_id (indx_temp_code)

Using NOT EXISTS:

        select e.emp_id, t.share_id
         from employee e,
              temp_emp t
        where e.comp    = 10
          and e.it_id   = t.it_id
          and e.code_id = t.code_id
          and not exists (select 1
                            from company c
                           where e.emp_id = c.emp_id
                             and t.share_id = c.share_id)

Plan Table
--------------------------------------------------------------------------------------
| Operation                 |  Name           |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------------
| SELECT STATEMENT          |                 |     2K|   87K|   8262 |       |       |
|  FILTER                   |                 |       |      |        |       |       |
|   NESTED LOOPS            |                 |     2K|   87K|   3432 |       |       |
|    VIEW                   |index$_join$_004 |    16K|  345K|     58 |       |       |
|     HASH JOIN             |                 |    16K|  345K|        |       |       |
|      INDEX RANGE SCAN     |INDX_CIC         |    16K|  345K|     33 |       |       |
|      INDEX FAST FULL SCAN |XPKT_EMPLOYEE    |    16K|  345K|     33 |       |       |
|    TABLE ACCESS BY INDEX R|TEST_EMP         |     1 |   16 |   3432 |       |       |
|     AND-EQUAL             |                 |       |      |        |       |       |
|      INDEX RANGE SCAN     |INDX_TEMP_IT     |       |      |        |       |       |
|      INDEX RANGE SCAN     |INDX_TEMP_CODE   |    16K|      |        |       |       |
|   AND-EQUAL               |                 |     1 |   12 |        |       |       |
|    INDEX RANGE SCAN       |INDX_COMPANY_EMP |       |      |        |       |       |
|    INDEX RANGE SCAN       |INDX_COMPANY_SHA |   115 |      |      3 |       |       |
---------------------------------------------------------------------------------------

USING MINUS:

select e.emp_id, t.share_id
  from employee e, temp_emp t
 where e.comp    = 10
   and e.it_id   = t.it_id
   and e.code_id = t.code_id
minus
select e2.emp_id, c.share_id
  from employee e2, company  c
 where e2.comp   = 10
   and e2.emp_id = c.emp_id 

Plan Table
---------------------------------------------------------------------------------
| Operation                 |  Name           |  Rows | Bytes|  Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------------
| SELECT STATEMENT          |                 |    48K|   99M|  28199 |       |       |
|  MINUS                    |                 |       |      |        |       |       |
|   SORT UNIQUE             |                 |    48K|    1M|   3598 |       |       |
|    NESTED LOOPS           |                 |    48K|    1M|   3432 |       |       |
|     VIEW                  |index$_join$_001 |    16K|  345K|     58 |       |       |
|      HASH JOIN            |                 |    16K|  345K|        |       |       |
|       INDEX RANGE SCAN    |INDX_CIC         |    16K|  345K|     33 |       |       |
|       INDEX FAST FULL SCAN|XPKT_EMPLOYEE    |    16K|  345K|     33 |       |       |
|     TABLE ACCESS BY INDEX |TEST_EMP         |     3 |   48 |   3432 |       |       |
|      AND-EQUAL            |                 |       |      |        |       |       |
|       INDEX RANGE SCAN    |INDX_TEMP_IT     |       |      |        |       |       |
|       INDEX RANGE SCAN    |INDX_TEMP_CODE   |    16K|      |        |       |       |
|   SORT UNIQUE             |                 |     4M|   97M|  24601 |       |       |
|    HASH JOIN              |                 |     4M|   97M|  13767 |       |       |
|     VIEW                  |index$_join$_003 |    16K|  164K|     51 |       |       |
|      HASH JOIN            |                 |    16K|  164K|        |       |       |
|       INDEX RANGE SCAN    |INDX_CIC         |    16K|  164K|     33 |       |       |
|       INDEX FAST FULL SCAN|XPKT_EMPLOYEE    |    16K|  164K|     33 |       |       |
|     TABLE ACCESS FULL     |EMPLOYEE         |    22M|  256M|  11184 |       |       |
---------------------------------------------------------------------------------------


Both are taking more than an hour. Ross, before I resort to creating a composite index on share_id and emp_id for COMPANY, is there a way to further improve my query?? I tried an OUTER JOIN but it became worse..

thanks in advance..
Re: Minus to Not Exists [message #204720 is a reply to message #204621] Tue, 21 November 2006 19:49 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
The AND-EQUAL step in your first plan is very telling. AND-EQUAL is an access path used only by the RULE BASED OPTIMIZER. The cases where it is useful are rare, and usually does more harm than good.

The reason it is being used is because you have not gathered statistics on all of your indexes (INDX_TEMP_IT and INDX_COMPANY_EMP) and possibly the tables as well. It could also be that you have the OPTIMIZER_MODE set to RULE.

Even if you do this, it may still perform badly without the index I mentioned earlier. Without the index, you could improve the performance with a
NOT IN (SELECT emp_id, share_id FROM ...)
rather than NOT EXISTS, as this should permit the use of a HASH Anti-Join, providing both emp_id and share_id are non-nullable in COMPANY.

Ross Leishman

[Updated on: Tue, 21 November 2006 19:51]

Report message to a moderator

Re: Minus to Not Exists [message #204773 is a reply to message #204720] Wed, 22 November 2006 01:33 Go to previous messageGo to next message
a_developer
Messages: 194
Registered: January 2006
Senior Member
Thanks, Ross.

Now, here's what's up. I checked my optimizer_mode and it's 'CHOOSE'. I again computed the stats for the TEMP_EMP and COMPANY tables where the AND-EQUAL was showing. Then ran explain plan. The explain plans for both are THE SAME as before. The AND-EQUAL is still there Sad


But surprisingly, the 'minus' ran for 6 minutes and the 'not exists' was still running past 30 mins.
6 minutes is already acceptable. But here's another puzzle. I have to insert the result set to COMPANY.

insert /*+ append nologging */
into COMPANY (company_id, emp_id, share_id)
  select seq_company_id.nextval, emp_id, share_id
    from (select e.emp_id, t.share_id
            from employee e, temp_emp t
           where e.comp = 10
             and e.it_id = t.it_id
             and e.code_id = t.code_id
          minus
          select e2.emp_id, c.share_id
            from employee e2, company c
           where e2.comp = 10
             and e2.emp_id = c.emp_id);


Now, this has taken more than 12 hours! I had to abort it. I am planning to try several things:
- disable indexes/constraints before insert and then enable them after insert
- put the result set of the query to a collection and then insert them into COMPANY

I don't know if the second one would help.. but I'm running out of options. Can you please advise?? thanks.
Re: Minus to Not Exists [message #204950 is a reply to message #204773] Wed, 22 November 2006 19:50 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Did you compute stats on the indexes? Cos it didn't look like it on your old plan.

Just for a baseline, try CREATE TABLE new NOLOGGING AS SELECT ...

How long does that take? That is the best performance you can expect from INSERT /*+ append nologging*/.

Is that fast enough? If not, nothing you do will fix it (well, you could try parallel query, but thats another conversation).

Also, run an Explain Plan on the INSERT statement to make sure it actually using Direct Path load. If the table has triggers or foreign keys, it will ignore the APPEND hint.

Loading via PL/SQL will not be faster.

Ross Leishman
Re: Minus to Not Exists [message #205132 is a reply to message #204950] Thu, 23 November 2006 07:13 Go to previous messageGo to next message
a_developer
Messages: 194
Registered: January 2006
Senior Member
I recomputed the statistics. I used this for TEMP_EMP and COMPANY:
dbms_stats.gather_table_stats (ownname => 'DEV_OWNER',tabname => 'TEMP_EMP',estimate_percent => 20,method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO',cascade => TRUE);


Using the CTAS (with nologging), it took an amazing 3 mins!!

The explain plan for Insert is:
Plan Table
-----------------------------------------------------------------------------------
| Operation                 |  Name        |  Rows | Bytes|  Cost  | Pstart| Pstop |
------------------------------------------------------------------------------------
| INSERT STATEMENT          |              |     5K|  127K|  29836 |       |       |
|  SEQUENCE                 |SEQ_COMPANY_I |       |      |        |       |       |
|   VIEW                    |              |     5K|  127K|  29836 |       |       |
|    MINUS                  |              |       |      |        |       |       |
|     SORT UNIQUE           |              |     5K|  181K|        |       |       |
|      HASH JOIN            |              |     5K|  181K|   3985 |       |       |
|       VIEW                |index$_join$_ |    16K|  345K|     58 |       |       |
|        HASH JOIN          |              |    16K|  345K|        |       |       |
|         INDEX RANGE SCAN  |INDX_CIC      |    16K|  345K|     33 |       |       |
|         INDEX FAST FULL SC|XPKT_EMPLOYEE |    16K|  345K|     33 |       |       |
|       TABLE ACCESS FULL   |TEMP_EMP      |     9M|  142M|   2105 |       |       |
|     SORT UNIQUE           |              |     4M|   77M|        |       |       |
|      HASH JOIN            |              |     4M|   77M|  16804 |       |       |
|       VIEW                |index$_join$_ |    16K|  164K|     51 |       |       |
|        HASH JOIN          |              |    16K|  164K|        |       |       |
|         INDEX RANGE SCAN  |INDX_CIC      |    16K|  164K|     33 |       |       |
|         INDEX FAST FULL SC|XPKT_EMPLOYEE |    16K|  164K|     33 |       |       |
|       TABLE ACCESS FULL   |COMPANY       |    22M|  174M|  14600 |       |       |
------------------------------------------------------------------------------------


Is this using a 'Direct-Path' load? How do I know that??
Quote:
If the table has triggers or foreign keys, it will ignore the APPEND hint.

Never knew this before..thanks! But that's bad news for me Sad ...my tables have foreign keys - no triggers though.

I'm thinking it may be something with the memory, like rollback segments?? since I am inserting 8M rows in one shot. But weird, there's no error that pops up..

Re: Minus to Not Exists [message #205199 is a reply to message #205132] Thu, 23 November 2006 19:51 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Now that the indexes are analyzed, the AND-EQUAL has disappeared in favour of index-joins. These are a bit more trustworthy.

So, 3 mins is the benchmark - you will not get faster than that without parallel query, and even then you won't get musch faster.

Things that could cause it to take longer than 3 minutes:
- Foreign keys. Not only does it check the foreign keys for each row inserted, they stop it using Direct Path. You can see this in the Plan - the INSERT STATEMENT would read LOAD AS SELECT.
- Indexes. These are maintained on the fly even if you use Direct Path. Even if you specify NOLOGGING, index mods will produce both REDO (Archive Logs) and UNDO (Rollback Segments). This will significantly affect your IO.

Try disabling FKs and invalidating indexes, and then rebuilding them after the INSERT.

Ross Leishman

Re: Minus to Not Exists [message #207070 is a reply to message #205199] Mon, 04 December 2006 01:26 Go to previous message
a_developer
Messages: 194
Registered: January 2006
Senior Member
Quote:
Try disabling FKs and invalidating indexes, and then rebuilding them after the INSERT.

Thanks,Ross!! This solves it. It took 12 mins to insert and 15 mins to enable FKs and rebuild indexes.
Previous Topic: outer joins
Next Topic: select random numbers /
Goto Forum:
  


Current Time: Fri Dec 09 11:57:35 CST 2016

Total time taken to generate the page: 0.10249 seconds