Completely insane execution plan with 11.2.0.3 but not with 11.2.0.2

From: Thomas Kellerer <thomas.kellerer_at_mgm-tp.com>
Date: Mon, 23 Jul 2012 17:01:59 +0200
Message-ID: <500D6767.30307_at_mgm-tp.com>



Hello,

(I have also posted this to forums.oracle.com in case this sounds familiar to someone).

we have a production server that is running Oracle 11.2.0.2 on Windows and a Development/Test server that is running 11.2.0.3 on Linux.

We have one statement that selects from a rather large view (that statement does a simple select * from viewname). The execution plan shows a cost of ~20k and a row estimate of ~800

On our test server which has similar data the optimizer completely mis-estimating the cost for one join which results in totally insane estimates higher up the chain. All other row estimates are pretty close to reality.

After testing several things we discovered that setting optimizer_features_enable = '11.2.0.2' things are fine on our test server.

The bad plan (optimizer_features_enable = '11.2.0.3')



| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |


| 0 | SELECT STATEMENT | | 39P| 7042P| | 14T(100)|999:59:59 |
| 1 | VIEW | XV_POSITION | 39P| 7042P| | 14T(100)|999:59:59 |
| 2 | WINDOW SORT | | 39P| 7750P| 8784P| 14T(100)|999:59:59 |
|*  3 |    VIEW                             |                      |    39P|  7750P|       |  9614G(100)|999:59:59 |

| 4 | WINDOW SORT | | 39P| 3538P| 4026P| 9614G(100)|999:59:59 |
| 5 | WINDOW SORT | | 39P| 3538P| 4026P| 9614G(100)|999:59:59 |
|* 6 | HASH JOIN RIGHT OUTER | | 39P| 3538P| 10G| 189G(100)|999:59:59 |
| 7 | VIEW | V_POS_GUELTIGKEIT | 199M| 8377M| | 43M (1)|146:17:54 |
| 8 | HASH GROUP BY | | 199M| 121G| 126G| 43M (1)|146:17:54 |
|* 9 | HASH JOIN RIGHT OUTER | | 199M| 121G| | 6573K (2)| 21:54:40 |
| 10 | TABLE ACCESS FULL | VORGANG | 7698 | 135K| | 32 (0)| 00:00:01 |
|* 11 | HASH JOIN RIGHT OUTER | | 199M| 118G| | 6572K (2)| 21:54:28 |
| 12 | VIEW | V_ALLE_POSITIONEN | 25974 | 634K| | 6006 (2)| 00:01:13 |
.......
| 594 | NESTED LOOPS | | 199M| 113G| | 6564K (2)| 21:52:58 |
|*595 | HASH JOIN RIGHT OUTER | | 7686 | 315K| | 303 (1)| 00:00:04 |
| 596 | TABLE ACCESS FULL | VORGANG | 7698 | 135K| | 32 (0)| 00:00:01 |
| 597 | VIEW | | 7686 | 180K| | 270 (1)| 00:00:04 |
| 598 | UNION-ALL | | | | | | |
|*599 | TABLE ACCESS FULL | VERTRAGSKOPF | 1089 | 30492 | | 37 (0)| 00:00:01 |
.......

The good plan: (optimizer_features_enable = '11.2.0.2')



| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |


| 0 | SELECT STATEMENT | | 296 | 58904 | 20370 (2)| 00:04:05 |
| 1 | VIEW | XV_POSITION | 296 | 58904 | 20370 (2)| 00:04:05 |
| 2 | WINDOW SORT | | 296 | 64824 | 20370 (2)| 00:04:05 |
|*  3 |    VIEW                             |                     |   296 | 64824 | 20369   (2)| 00:04:05 |

| 4 | WINDOW SORT | | 296 | 34928 | 20369 (2)| 00:04:05 |
| 5 | WINDOW SORT | | 296 | 34928 | 20369 (2)| 00:04:05 |
|* 6 | HASH JOIN RIGHT OUTER | | 296 | 34928 | 20367 (2)| 00:04:05 |
| 7 | VIEW | V_POS_GUELTIGKEIT | 296 | 13024 | 13983 (2)| 00:02:48 |
| 8 | HASH GROUP BY | | 296 | 254K| 13983 (2)| 00:02:48 |
|* 9 | HASH JOIN RIGHT OUTER | | 296 | 254K| 13982 (2)| 00:02:48 |
| 10 | TABLE ACCESS FULL | VORGANG | 7698 | 135K| 32 (0)| 00:00:01 |
|* 11 | HASH JOIN OUTER | | 296 | 249K| 13949 (2)| 00:02:48 | |* 12 | HASH JOIN RIGHT OUTER | | 296 | 241K| 7943 (2)| 00:01:36 |
| 13 | TABLE ACCESS FULL | VORGANG | 7698 | 135K| 32 (0)| 00:00:01 |
|* 14 | HASH JOIN | | 296 | 236K| 7910 (2)| 00:01:35 | .................

Looking at the trace file I could verify that it really does a lot of reads when using the bad plan:

The 11.2.0.3 optimizer decides to use a nested loop (line 594 in the bad plan) instead of the hash join (line 14 in the good plan). The question is now, why is it doing that?

On OTN I was pointed to v$system_fix_control and I tried disabling the following "features" (as shown here http://jonathanlewis.wordpress.com/2009/12/22/optimizer-features/)

   9814067 Sanity check when estimating range inner join   11830663 disallow HASH GROUP BY for subquery (in SELECT) processing    9980661 sanity check when estimating range inner join cardinality   10038373 Fix usage of transitive join predicate list in subquery pruning   11699884 fix selectivity of Table Lookup By NL view and dimensions   12410972 push predicate with NLS_SORT in window function    8683604 allow NLJ if there is no sub-query pruning (absolute dimension)

but none of them changed anything.

A quick search on Metalink did not turn up anything useful either.

How can I find out which feature exactly causes this behaviour?

For the time being it's absolutely OK for us to set optimizer_features_enable='11.2.0.2' but I'm curious which "pattern" triggers this behaviour. If I can pinpoint that, I'm inclined to open a SR with Oracle.

Thanks
Thomas

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 23 2012 - 10:01:59 CDT

Original text of this message