RE: Bad Execution Plan with Left Join and predicates in both tables

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Wed, 12 Oct 2016 09:45:03 +0000
Message-ID: <CE70217733273F49A8A162EE074F64D90150343452_at_exmbx05.thus.corp>



Thomas,

I was trying to explain how the original outer join form of the query was constraining the way Oracle handled the execution of the query and the necessity for a rewrite to get to the solution you wanted. I wasn't making any comment about your actual rewrite. In principle the optimizer could be enhanced to produce a suitable concatenation rewrite - and possibly there are already some circumstances (depending on uniqueness, declaration of not null columns, foreign key constraints etc.) where the perfect rewrite would take place.

Your rewrite, below, however, is not generic - it gets the right answer "by accident" - and the optimizer needs to be able to produce generic rewrites, which is why it sometimes doesn't do things that are "obvious" to the human eye.

You have a UNION in your rewrite - if your select was capable of returning two rows from the join that LOOKED identical but were sourced from different initial customer/customer_detail rows then your query would (I presume) have returned the wrong results.

When you turn a query with OR into a concatenation you need to do a UNION ALL and then eliminate the rows that have already appeared in the earlier parts of the concatenation. The following blog post describes the issue (though in a different context): http://jonathanlewis.wordpress.com/2007/02/26/subquery-with-or/

For example;
If you insert (2,11) into customer_detail and run your UNION query you will get 5 rows returned If you change your UNION query to select just c.id, c.last_order you will get 4 rows That means there's something wrong with your GENERAL strategy.

The optimizer needs to find a rewrite which will produce the same number of rows from the original outer join in both cases.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: Thomas Aregger [thomas.aregger_at_gmail.com] Sent: 12 October 2016 10:15
To: Jonathan Lewis
Cc: oracle-l_at_freelists.org
Subject: Re: Bad Execution Plan with Left Join and predicates in both tables

I'm probably a bit dull but I still don't get the difference. Could you please help me along in which situations the two queries would return different results (see testcase below)?

I'm a bit confused because in the second variation of the query, the first select uses a left join and the second select uses a simple join.

The desired result is:

  • Customers with LAST_ORDER = x and no detail
  • AND Customers with LAST_ORDER = x and detail
  • AND Customers with detail LAST_ORDER = X

I did a small testcase:

create table customer (id number(10), last_order number(10), status number(10)); create table customer_detail (id number(10), last_order number(10));

create index i_custdet_id on customer_detail(id);
create index i_custdet_last_order on customer_detail(last_order);
create index i_cust_last_order on customer(last_order);
create index i_cust_id on customer(id);

insert into customer values (1, 10, 3);

insert into customer values (2, 10, 3);
insert into customer_detail values (2, 10);

insert into customer values (3, 11, 3);
insert into customer_detail values (3, 10);

insert into customer values (4, 11, 3);
insert into customer_detail values (4, 11);

insert into customer values (5, 11, 3);

insert into customer values (6, 11, 3);
insert into customer_detail values (6, 10);

insert into customer_detail values (7, 10);

With this data I executed both versions of the query (with last_order = 10 and status <> 1) and got the same results.

SELECT *
  FROM CUSTOMER C
  LEFT JOIN CUSTOMER_DETAIL CD
    ON C.ID<http://c.id/> = CD.ID<http://cd.id/>  WHERE (C.LAST_ORDER = 10 OR CD.LAST_ORDER = 10)    AND C.STATUS <> 1
order by 1
/

SELECT *
  FROM CUSTOMER C
  LEFT JOIN CUSTOMER_DETAIL CD
    ON C.ID<http://c.id/> = CD.ID<http://cd.id/>  WHERE (C.LAST_ORDER = 10)
   AND C.STATUS <> 1
UNION
SELECT *
  FROM CUSTOMER C
  JOIN CUSTOMER_DETAIL CD
    ON C.ID<http://c.id/> = CD.ID<http://cd.id/>  WHERE (CD.LAST_ORDER = 10)
   AND C.STATUS <> 1
order by 1
/

_at_Jonathan: I did not specifiy any hint.

2016-10-12 9:33 GMT+02:00 Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk>>:

Further to Lothar's post.

The left outer join preserves rows from the left table even if there is no match in the right table.

i.e. your ON clause says you should see a customer even if there is no matching customer_detail, it does NOT say that you should see a row if there is a customer detail for which there is no customer, so the optimizer will not consider looking for customer_detail.last_order = 443867 if is hasn't already appeared as it joined every customer to its matching customer_detail.

I am a little puzzled by the fact that the concatenation plan you showed us even appeared - did you have to supply the use_concat hint to get this plan ? It doesn't look to me as if it should have appeared unhinted ?

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org> [oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>] on behalf of Lothar Flatz [l.flatz_at_bluewin.ch<mailto:l.flatz_at_bluewin.ch>] Sent: 12 October 2016 05:21
To: thomas.aregger_at_gmail.com<mailto:thomas.aregger_at_gmail.com>; oracle-l_at_freelists.org<mailto:oracle-l_at_freelists.org> Subject: Re: Bad Execution Plan with Left Join and predicates in both tables

Hi Thomas,

I am afraid the two queries are not identical. A left join means that for any customer i will have a result, wether or not a detail exists. For that reason the plan needs to start with customer. An outer join always determines a direction. Think about what your first query means: I want to see all customers that have an last order of x or a detail with a last order of x. Which is kind of a contradiction. You don't want to see all customers. Your second means:
I want to see customers that have an last order of x or a detail with a last order of x. That is clearer and probably reflect what you really want.

I accept that the difference is subtle and I hope I could make it understandable enough. It helps to look at it from the point of three valued logic. An outer join to customer detail, if the detail does not exists would evaluate to null for cd.last_order. Null = 443867 does not evaluate to false but to null, which is just treated as false. A small but important difference.

Bottom line:
You should probably stick with the second variation of your query.

Regards

Lothar

On 12.10.2016 00:29, Thomas Aregger wrote: Hello

I have a simple query (running on Oracle 12.1.0.2) with a left join and a where clause checking that either a field in the left table is set or a field in the right table is set.

That's how the query looks like:

SELECT *
  FROM CUSTOMER C
  LEFT JOIN CUSTOMER_DETAIL CD
    ON C.ID<http://C.ID> = CD.ID<http://CD.ID>  WHERE (C.LAST_ORDER = 443867 OR CD.LAST_ORDER = 443867)    AND C.STATUS <> 1
/

For some reason Oracle is not able to avoid a full table scan on CUSTOMER, which can be seen in the following execution plan (line 4):



| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |


| 0 | SELECT STATEMENT | | 1 | | 4780 (100)| 1 |00:00:01.10 | 33302 |
| 1 | CONCATENATION | | 1 | | | 1 |00:00:01.10 | 33302 |
|*  2 |   FILTER                               |                        |      1 |        |            |      0 |00:00:01.10 |   33295 |

| 3 | NESTED LOOPS OUTER | | 1 | 253K| 4778 (1)| 255K|00:00:01.06 | 33295 |
|* 4 | TABLE ACCESS FULL | CUSTOMER | 1 | 253K| 4776 (1)| 255K|00:00:00.32 | 16692 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER_DETAIL | 255K| 1 | 1 (0)| 3481 |00:00:00.53 | 16603 |
|* 6 | INDEX RANGE SCAN | I_CUSTDET_ID | 255K| 1 | 1 (0)| 3481 |00:00:00.34 | 15318 | |* 7 | FILTER | | 1 | | | 1 |00:00:00.01 | 7 |
| 8 | NESTED LOOPS OUTER | | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 7 |
|* 9 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 5 | |* 10 | INDEX RANGE SCAN | I_CUST_LAST_ORDER | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 4 |
| 11 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER_DETAIL | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
|* 12 | INDEX RANGE SCAN | I_CUSTDET_ID | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 | ----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   2 - filter("CD"."LAST_ORDER"=443867)
   4 - filter("C"."STATUS"<>3)
   6 - access("C"."ID"="CD"."ID")
   7 - filter(LNNVL("CD"."LAST_ORDER"=443867))
   9 - filter("C"."STATUS"<>3)
  10 - access("C"."LAST_ORDER"=443867)
  12 - access("C"."ID"="CD"."ID")


In the first half of the plan (under the first FILTER operation) I would expect Oracle to use the index on CUSTOMER_DETAIL.LAST_ORDER to access CUSTOMER_DETAIL and then join to the CUSTOMER table. For some reason Oracle is not doing that.

I rewrote the query a little bit:

SELECT *
  FROM CUSTOMER C
  LEFT JOIN CUSTOMER_DETAIL CD
    ON C.ID<http://C.ID> = CD.ID<http://CD.ID>  WHERE (C.LAST_ORDER = 443867)
   AND C.STATUS <> 3
UNION
SELECT *
  FROM CUSTOMER C
  JOIN CUSTOMER_DETAIL CD
    ON C.ID<http://C.ID> = CD.ID<http://CD.ID>  WHERE (CD.LAST_ORDER = 443867)
   AND C.STATUS <> 3
/

The rewritten query leads to the following execution plan:



| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |


| 0 | SELECT STATEMENT | | 1 | | 6 (100)| 1 |00:00:00.01 | 8 |
| 1 | SORT UNIQUE | | 1 | 2 | 6 (34)| 1 |00:00:00.01 | 8 |
| 2 | UNION-ALL | | 1 | | | 1 |00:00:00.01 | 8 |
| 3 | NESTED LOOPS OUTER | | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 6 |
|*  4 |     TABLE ACCESS BY INDEX ROWID BATCHED | CUSTOMER                |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       4 |
|*  5 |      INDEX RANGE SCAN                   | I_CUST_LAST_ORDER       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       3 |

| 6 | TABLE ACCESS BY INDEX ROWID BATCHED | CUSTOMER_DETAIL | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
|* 7 | INDEX RANGE SCAN | I_CUSTDET_ID | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
| 8 | NESTED LOOPS | | 1 | 1 | 2 (0)| 0 |00:00:00.01 | 2 |
| 9 | NESTED LOOPS | | 1 | 1 | 2 (0)| 0 |00:00:00.01 | 2 |
| 10 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER_DETAIL | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
|* 11 | INDEX RANGE SCAN | I_CUSTDET_LAST_ORDER | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 | |* 12 | INDEX UNIQUE SCAN | I_CUST_ID | 0 | 1 | 1 (0)| 0 |00:00:00.01 | 0 | |* 13 | TABLE ACCESS BY INDEX ROWID | CUSTOMER | 0 | 1 | 1 (0)| 0 |00:00:00.01 | 0 | ------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   4 - filter("C"."STATUS"<>3)
   5 - access("C"."LAST_ORDER"=443867)
   7 - access("C"."ID"="CD"."ID")
  11 - access("CD"."LAST_ORDER"=443867)
  12 - access("C"."ID"="CD"."ID")
  13 - filter("C"."STATUS"<>3)


The second plan is obviously a lot more efficient. Am I missing something, or is it just an opimizer limitation in the first version of the query?

Regards
Thomas

--








--
http://www.freelists.org/webpage/oracle-l
Received on Wed Oct 12 2016 - 11:45:03 CEST

Original text of this message