Re: Bad Execution Plan with Left Join and predicates in both tables

From: Thomas Aregger <thomas.aregger_at_gmail.com>
Date: Wed, 12 Oct 2016 11:15:18 +0200
Message-ID: <CAOJe5Ki6piLqGNqvE-xQWsvJ1DktY4uXifQd4A+jKRi3SmmvRg_at_mail.gmail.com>



I'm probably a bit dull but I still don't get the difference. Could you please help me along in which situations the two queries would return different results (see testcase below)?

I'm a bit confused because in the second variation of the query, the first select uses a left join and the second select uses a simple join.

The desired result is:

  • Customers with LAST_ORDER = x and no detail
  • AND Customers with LAST_ORDER = x and detail
  • AND Customers with detail LAST_ORDER = X

I did a small testcase:

create table customer (id number(10), last_order number(10), status number(10));
create table customer_detail (id number(10), last_order number(10));

create index i_custdet_id on customer_detail(id);
create index i_custdet_last_order on customer_detail(last_order);
create index i_cust_last_order on customer(last_order);
create index i_cust_id on customer(id);

insert into customer values (1, 10, 3);

insert into customer values (2, 10, 3);
insert into customer_detail values (2, 10);

insert into customer values (3, 11, 3);
insert into customer_detail values (3, 10);

insert into customer values (4, 11, 3);
insert into customer_detail values (4, 11);

insert into customer values (5, 11, 3);

insert into customer values (6, 11, 3);
insert into customer_detail values (6, 10);

insert into customer_detail values (7, 10);

With this data I executed both versions of the query (with last_order = 10 and status <> 1) and got the same results.

SELECT *
  FROM CUSTOMER C
  LEFT JOIN CUSTOMER_DETAIL CD
    ON C.ID <http://c.id/> = CD.ID <http://cd.id/>  WHERE (C.LAST_ORDER = 10 OR CD.LAST_ORDER = 10)    AND C.STATUS <> 1
order by 1
/

SELECT *
  FROM CUSTOMER C
  LEFT JOIN CUSTOMER_DETAIL CD
    ON C.ID <http://c.id/> = CD.ID <http://cd.id/>  WHERE (C.LAST_ORDER = 10)
   AND C.STATUS <> 1
UNION
SELECT *
  FROM CUSTOMER C
  JOIN CUSTOMER_DETAIL CD
    ON C.ID <http://c.id/> = CD.ID <http://cd.id/>  WHERE (CD.LAST_ORDER = 10)
   AND C.STATUS <> 1
order by 1
/

_at_Jonathan: I did not specifiy any hint.

2016-10-12 9:33 GMT+02:00 Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>:

>
> Further to Lothar's post.
>
> The left outer join preserves rows from the left table even if there is no
> match in the right table.
>
> i.e. your ON clause says you should see a customer even if there is no
> matching customer_detail, it does NOT say that you should see a row if
> there is a customer detail for which there is no customer, so the optimizer
> will not consider looking for customer_detail.last_order = 443867 if is
> hasn't already appeared as it joined every customer to its matching
> customer_detail.
>
>
> I am a little puzzled by the fact that the concatenation plan you showed
> us even appeared - did you have to supply the use_concat hint to get this
> plan ? It doesn't look to me as if it should have appeared unhinted ?
>
>
>
> Regards
> Jonathan Lewis
> http://jonathanlewis.wordpress.com
> _at_jloracle
> ------------------------------
> *From:* oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org] on
> behalf of Lothar Flatz [l.flatz_at_bluewin.ch]
> *Sent:* 12 October 2016 05:21
> *To:* thomas.aregger_at_gmail.com; oracle-l_at_freelists.org
> *Subject:* Re: Bad Execution Plan with Left Join and predicates in both
> tables
>
> Hi Thomas,
>
> I am afraid the two queries are not identical.
> A left join means that for any customer i will have a result, wether or
> not a detail exists.
> For that reason the plan needs to start with customer. An outer join
> always determines a direction.
> Think about what your first query means:
> I want to see *all *customers that have an last order of x or a detail
> with a last order of x. Which is kind of a contradiction. You don't want to
> see all customers.
> Your second means:
> I want to see customers that have an last order of x or a detail with a
> last order of x.
> That is clearer and probably reflect what you really want.
>
> I accept that the difference is subtle and I hope I could make it
> understandable enough.
> It helps to look at it from the point of three valued logic.
> An outer join to customer detail, if the detail does not exists would
> evaluate to null for cd.last_order. Null = 443867 does not evaluate to
> false but to null, which is just treated as false.
> A small but important difference.
>
> Bottom line:
> You should probably stick with the second variation of your query.
>
> Regards
>
> Lothar
>
>
> On 12.10.2016 00:29, Thomas Aregger wrote:
>
> Hello
>
> I have a simple query (running on Oracle 12.1.0.2) with a left join and a
> where clause checking that either a field in the left table is set or a
> field in the right table is set.
>
> That's how the query looks like:
>
> SELECT *
> FROM CUSTOMER C
> LEFT JOIN CUSTOMER_DETAIL CD
> ON C.ID = CD.ID
> WHERE (C.LAST_ORDER = 443867 OR CD.LAST_ORDER = 443867)
> AND C.STATUS <> 1
> /
>
> For some reason Oracle is not able to avoid a full table scan on CUSTOMER,
> which can be seen in the following execution plan (line 4):
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> ----------------
> | Id | Operation | Name |
> Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
> ------------------------------------------------------------
> ------------------------------------------------------------
> ----------------
> | 0 | SELECT STATEMENT | |
> 1 | | 4780 (100)| 1 |00:00:01.10 | 33302 |
> | 1 | CONCATENATION | |
> 1 | | | 1 |00:00:01.10 | 33302 |
> |* 2 | FILTER | |
> 1 | | | 0 |00:00:01.10 | 33295 |
> | 3 | NESTED LOOPS OUTER | |
> 1 | 253K| 4778 (1)| 255K|00:00:01.06 | 33295 |
> |* 4 | TABLE ACCESS FULL | CUSTOMER |
> 1 | 253K| 4776 (1)| 255K|00:00:00.32 | 16692 |
> | 5 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER_DETAIL |
> 255K| 1 | 1 (0)| 3481 |00:00:00.53 | 16603 |
> |* 6 | INDEX RANGE SCAN | I_CUSTDET_ID |
> 255K| 1 | 1 (0)| 3481 |00:00:00.34 | 15318 |
> |* 7 | FILTER | |
> 1 | | | 1 |00:00:00.01 | 7 |
> | 8 | NESTED LOOPS OUTER | |
> 1 | 1 | 2 (0)| 1 |00:00:00.01 | 7 |
> |* 9 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER |
> 1 | 1 | 1 (0)| 1 |00:00:00.01 | 5 |
> |* 10 | INDEX RANGE SCAN | I_CUST_LAST_ORDER |
> 1 | 1 | 1 (0)| 1 |00:00:00.01 | 4 |
> | 11 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER_DETAIL |
> 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
> |* 12 | INDEX RANGE SCAN | I_CUSTDET_ID |
> 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
> ------------------------------------------------------------
> ------------------------------------------------------------
> ----------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 2 - filter("CD"."LAST_ORDER"=443867)
> 4 - filter("C"."STATUS"<>3)
> 6 - access("C"."ID"="CD"."ID")
> 7 - filter(LNNVL("CD"."LAST_ORDER"=443867))
> 9 - filter("C"."STATUS"<>3)
> 10 - access("C"."LAST_ORDER"=443867)
> 12 - access("C"."ID"="CD"."ID")
>
>
> In the first half of the plan (under the first FILTER operation) I would
> expect Oracle to use the index on CUSTOMER_DETAIL.LAST_ORDER to access
> CUSTOMER_DETAIL and then join to the CUSTOMER table. For some reason Oracle
> is not doing that.
>
> I rewrote the query a little bit:
>
> SELECT *
> FROM CUSTOMER C
> LEFT JOIN CUSTOMER_DETAIL CD
> ON C.ID = CD.ID
> WHERE (C.LAST_ORDER = 443867)
> AND C.STATUS <> 3
> UNION
> SELECT *
> FROM CUSTOMER C
> JOIN CUSTOMER_DETAIL CD
> ON C.ID = CD.ID
> WHERE (CD.LAST_ORDER = 443867)
> AND C.STATUS <> 3
> /
>
>
> The rewritten query leads to the following execution plan:
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------
> | Id | Operation | Name
> | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------
> | 0 | SELECT STATEMENT |
> | 1 | | 6 (100)| 1 |00:00:00.01 | 8 |
> | 1 | SORT UNIQUE |
> | 1 | 2 | 6 (34)| 1 |00:00:00.01 | 8 |
> | 2 | UNION-ALL |
> | 1 | | | 1 |00:00:00.01 | 8 |
> | 3 | NESTED LOOPS OUTER |
> | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 6 |
> |* 4 | TABLE ACCESS BY INDEX ROWID BATCHED | CUSTOMER
> | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 4 |
> |* 5 | INDEX RANGE SCAN | I_CUST_LAST_ORDER
> | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 3 |
> | 6 | TABLE ACCESS BY INDEX ROWID BATCHED | CUSTOMER_DETAIL
> | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
> |* 7 | INDEX RANGE SCAN | I_CUSTDET_ID
> | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
> | 8 | NESTED LOOPS |
> | 1 | 1 | 2 (0)| 0 |00:00:00.01 | 2 |
> | 9 | NESTED LOOPS |
> | 1 | 1 | 2 (0)| 0 |00:00:00.01 | 2 |
> | 10 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER_DETAIL
> | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
> |* 11 | INDEX RANGE SCAN | I_CUSTDET_LAST_ORDER
> | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
> |* 12 | INDEX UNIQUE SCAN | I_CUST_ID
> | 0 | 1 | 1 (0)| 0 |00:00:00.01 | 0 |
> |* 13 | TABLE ACCESS BY INDEX ROWID | CUSTOMER
> | 0 | 1 | 1 (0)| 0 |00:00:00.01 | 0 |
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------
>
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 4 - filter("C"."STATUS"<>3)
> 5 - access("C"."LAST_ORDER"=443867)
> 7 - access("C"."ID"="CD"."ID")
> 11 - access("CD"."LAST_ORDER"=443867)
> 12 - access("C"."ID"="CD"."ID")
> 13 - filter("C"."STATUS"<>3)
>
>
> The second plan is obviously a lot more efficient. Am I missing something,
> or is it just an opimizer limitation in the first version of the query?
>
> Regards
> Thomas
>
>
>
> --
>
>
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Oct 12 2016 - 11:15:18 CEST

Original text of this message