Skip navigation.

Jonathan Lewis

Syndicate content Oracle Scratchpad
Just another Oracle weblog
Updated: 10 hours 2 min ago

Execution Plans

Wed, 2014-12-17 13:31

This is the index to a series of articles I’ve been writing for redgate, published on their AllThingsOracle site, about generating and reading execution plans. I’ve completed a few articles that haven’t yet been published, but I’ll add their URLs when they’re available.

I don’t really know how many parts it’s going to end up as – there’s an awful lot that that you could say about reading execution plans, even when you’re trying to cover just the basics; every time I’ve started writing an episode in the series it’s turned into two episodes.  I’ve delivered 10 parts to redgate so far; the active URLs below are the ones that they are currently online.

Chapter 11 is about to be published, so I’ve popped this catalogue to the top of the stack.  Episode 12 is written, but waiting for its final proof read.

 


push_pred – evolution

Fri, 2014-12-12 08:22

Here’s a query (with a few hints to control how I want Oracle to run it) that demonstrates the difficulty of trying to solve problems by hinting (and the need to make sure you know where all your hinted code is):


select
	/*+
		qb_name(main)
		leading (@main t1@main v1@main t4@main)
		push_pred(v1@main)
	*/
	t1.*,v1.*,t4.*
from
	t1,
	(
	select	/*+ qb_name(inline) no_merge */
		t2.n1, t3.n2, count(*)
	from	t2, t3
	where exists (
		select	/*+ qb_name(subq) no_unnest push_subq */
			null
		from	t5
		where	t5.object_id = t2.n1
		)
	and	t3.n1 = t2.n2
	group by t2.n1, t3.n2
	)	v1,
	t4
where
	v1.n1 = t1.n1
and	t4.n1(+) = v1.n1
;

Nominally it’s a three-table join, except the second table is an in-line view which joins two tables and includes an existence subquery. Temporarily I have made the join to t4 an outer join – but that’s just to allow me to make a point, I don’t want an outer join in the final query. I’ve had to include the no_merge() hint in the inline view to stop Oracle using complex view merging to “join then aggregate” when I want it to “aggregate then join”; I’ve included the no_unnest and push_subq hints to make sure that the subquery is operated as a subquery, but operates at the earliest possible moment in the inline view. Ignoring the outer join (which would make operation 1 a nested loop outer), this is the execution plan I want to see:


-------------------------------------------------------------------------------------------
| Id  | Operation                         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |       |    50 | 12850 |  4060   (1)| 00:00:21 |
|   1 |  NESTED LOOPS                     |       |    50 | 12850 |  4060   (1)| 00:00:21 |
|   2 |   NESTED LOOPS                    |       |    50 | 12850 |  4060   (1)| 00:00:21 |
|   3 |    NESTED LOOPS                   |       |    50 |  7400 |  4010   (1)| 00:00:21 |
|   4 |     TABLE ACCESS FULL             | T1    |  1000 |   106K|     3   (0)| 00:00:01 |
|   5 |     VIEW PUSHED PREDICATE         |       |     1 |    39 |     4   (0)| 00:00:01 |
|   6 |      SORT GROUP BY                |       |     1 |    16 |     4   (0)| 00:00:01 |
|   7 |       NESTED LOOPS                |       |     1 |    16 |     3   (0)| 00:00:01 |
|   8 |        TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
|*  9 |         INDEX UNIQUE SCAN         | T2_PK |     1 |       |     1   (0)| 00:00:01 |
|* 10 |          INDEX RANGE SCAN         | T5_I1 |     1 |     4 |     1   (0)| 00:00:01 |
|  11 |        TABLE ACCESS BY INDEX ROWID| T3    |     1 |     8 |     1   (0)| 00:00:01 |
|* 12 |         INDEX UNIQUE SCAN         | T3_PK |     1 |       |     0   (0)| 00:00:01 |
|* 13 |    INDEX UNIQUE SCAN              | T4_PK |     1 |       |     0   (0)| 00:00:01 |
|  14 |   TABLE ACCESS BY INDEX ROWID     | T4    |     1 |   109 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   9 - access("T2"."N1"="T1"."N1")
       filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("SUBQ") */ 0 FROM
              "T5" "T5" WHERE "T5"."OBJECT_ID"=:B1))
  10 - access("T5"."OBJECT_ID"=:B1)
  12 - access("T3"."N1"="T2"."N2")
  13 - access("T4"."N1"="V1"."N1")

Note, particularly, operation 5: VIEW PUSHED PREDICATE, and the associated access predicate at line 9 “t2.n1 = t1.n1″ where the predicate based on t1 has been pushed inside the inline view: so Oracle will evaluate a subset view for each selected row of t1, which is what I wanted. Then you can see operation 10 is an index range scan of t5_i1, acting as a child to the index unique scan of t2_pk of operation 9 – that’s Oracle keeping the subquery as a subquery and executing it as early as possible.

So what happens when I try to get this execution plan using the SQL and hints I’ve got so far ?

Here’s the plan I got from 10.2.0.5:


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    50 | 12750 |    62   (4)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |    50 | 12750 |    62   (4)| 00:00:01 |
|*  2 |   HASH JOIN                  |       |    50 |  7350 |    12  (17)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | T1    |  1000 |   105K|     3   (0)| 00:00:01 |
|   4 |    VIEW                      |       |    50 |  1950 |     9  (23)| 00:00:01 |
|   5 |     HASH GROUP BY            |       |    50 |   800 |     9  (23)| 00:00:01 |
|*  6 |      HASH JOIN               |       |    50 |   800 |     7  (15)| 00:00:01 |
|*  7 |       TABLE ACCESS FULL      | T2    |    50 |   400 |     3   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN      | T5_I1 |     1 |     4 |     1   (0)| 00:00:01 |
|   9 |       TABLE ACCESS FULL      | T3    |  1000 |  8000 |     3   (0)| 00:00:01 |
|  10 |   TABLE ACCESS BY INDEX ROWID| T4    |     1 |   108 |     1   (0)| 00:00:01 |
|* 11 |    INDEX UNIQUE SCAN         | T4_PK |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."N1"="T1"."N1")
   6 - access("T3"."N1"="T2"."N2")
   7 - filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("SUBQ") */ 0
              FROM "T5" "T5" WHERE "T5"."OBJECT_ID"=:B1))
   8 - access("T5"."OBJECT_ID"=:B1)
  11 - access("T4"."N1"="V1"."N1")

In 10g the optimizer has not pushed the join predicate down into the view (the t1 join predicate appears in the hash join at line 2); I think this is because the view has been declared non-mergeable through a hint. So let’s upgrade to 11.1.0.7:

------------------------------------------------------------------------------------------
| Id  | Operation                        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |       |    50 | 12950 |  4008K  (1)| 05:34:04 |
|   1 |  NESTED LOOPS                    |       |    50 | 12950 |  4008K  (1)| 05:34:04 |
|   2 |   MERGE JOIN CARTESIAN           |       |  1000K|   205M|  2065   (3)| 00:00:11 |
|   3 |    TABLE ACCESS FULL             | T1    |  1000 |   105K|     3   (0)| 00:00:01 |
|   4 |    BUFFER SORT                   |       |  1000 |   105K|  2062   (3)| 00:00:11 |
|   5 |     TABLE ACCESS FULL            | T4    |  1000 |   105K|     2   (0)| 00:00:01 |
|   6 |   VIEW PUSHED PREDICATE          |       |     1 |    43 |     4   (0)| 00:00:01 |
|   7 |    SORT GROUP BY                 |       |     1 |    16 |     4   (0)| 00:00:01 |
|*  8 |     FILTER                       |       |       |       |            |          |
|   9 |      NESTED LOOPS                |       |     1 |    16 |     3   (0)| 00:00:01 |
|  10 |       TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
|* 11 |        INDEX UNIQUE SCAN         | T2_PK |     1 |       |     1   (0)| 00:00:01 |
|* 12 |         INDEX RANGE SCAN         | T5_I1 |     1 |     4 |     1   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID| T3    |  1000 |  8000 |     1   (0)| 00:00:01 |
|* 14 |        INDEX UNIQUE SCAN         | T3_PK |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - filter("T4"."N1"="T1"."N1")
  11 - access("T2"."N1"="T4"."N1")
       filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("SUBQ") */ 0 FROM
              "T5" "T5" WHERE "T5"."OBJECT_ID"=:B1))
  12 - access("T5"."OBJECT_ID"=:B1)
  14 - access("T3"."N1"="T2"."N2")

Excellent – at operation 6 we see VIEW PUSHED PREDICATE, and at operation 11 we can see that the join predicate “t2.n1 = t1.n1″.

Less excellent – we have a Cartesian Merge Join between t1 and t4 before pushing predicates. Of course, we told the optimizer to push join predicates into the view, and there are two join predicates, one from t1 and one from t4 – and we didn’t tell the optimizer that we only wanted to push the t1 join predicate into the view. Clearly we need a way of specifying where predicates should be pushed FROM as well as a way of specifying where they should be pushed TO.

If we take a look at the outline information from the execution plan there’s a clue in one of the outline hints: PUSH_PRED(@”MAIN” “V1″@”MAIN” 3 2) – the hint has a couple of extra parameters to it – perhaps the 2 and 3 refer in some way to the 2nd and 3rd tables in the query. If I test with an outer join to t4 (which means the optimizer won’t be able to use my t4 predicate as a join INTO the view) I get the plan I want (except it’s an outer join, of course), and the hint changes to: PUSH_PRED(@”MAIN” “V1″@”MAIN” 2) – so maybe the 2 refers to t1 and the 3 referred to t4, so let’s try the following hints:


push_pred(v1@main 2)
no_push_pred(v1@main 3)

Unfortunately this gives us the following plan:


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    50 | 12300 |    62   (4)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER          |       |    50 | 12300 |    62   (4)| 00:00:01 |
|*  2 |   HASH JOIN                  |       |    50 |  6900 |    12  (17)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | T1    |  1000 |   105K|     3   (0)| 00:00:01 |
|   4 |    VIEW                      |       |    50 |  1500 |     9  (23)| 00:00:01 |
|   5 |     HASH GROUP BY            |       |    50 |   800 |     9  (23)| 00:00:01 |
|*  6 |      HASH JOIN               |       |    50 |   800 |     7  (15)| 00:00:01 |
|*  7 |       TABLE ACCESS FULL      | T2    |    50 |   400 |     3   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN      | T5_I1 |     1 |     4 |     1   (0)| 00:00:01 |
|   9 |       TABLE ACCESS FULL      | T3    |  1000 |  8000 |     3   (0)| 00:00:01 |
|  10 |   TABLE ACCESS BY INDEX ROWID| T4    |     1 |   108 |     1   (0)| 00:00:01 |
|* 11 |    INDEX UNIQUE SCAN         | T4_PK |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."N1"="T1"."N1")
   6 - access("T3"."N1"="T2"."N2")
   7 - filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("SUBQ") */ 0
              FROM "T5" "T5" WHERE "T5"."OBJECT_ID"=:B1))
   8 - access("T5"."OBJECT_ID"=:B1)
  11 - access("T4"."N1"(+)="V1"."N1")

We don’t have join predicate pushdown; on the other hand we’ve got the join order we specified with our leading() hint – and that didn’t appear previously when we got the Cartesian Merge Join with predicate pushdown (our hints were incompatible, so something had to fail). So maybe the numbering has changed because the join order has changed and I should push_pred(v1 1) and no_push_pred(v1 3). Alas, trying all combinations of 2 values from 1,2, and 3 I can’t get the plan I want.

So let’s upgrade to 11.2.0.4. As hinted we get the pushed predicate with Cartesian merge join, but this time the push_pred() hint that appears in the outline looks like this: PUSH_PRED(@”MAIN” “V1″@”MAIN” 2 1) – note how the numbers have changed between 11.1.0.7 and 11.2.0.4. So let’s see what happens when I try two separate hints again, fiddling with the third parameter, e.g.:


push_pred(v1@main 1)
no_push_pred(v1@main 2)

With the values set as above I got the plan I want – it’s just a pity that I’m not 100% certain how the numbering in the push_pred() and no_push_pred() hints is supposed to work. In this case, though, it no longer matters as all I have to do now is create an SQL Baseline for my query, transferring the hinted plan into the the SMB with the unhinted SQL.

In passing, I did manage to get the plan I wanted in 11.1.0.7 by adding the hint /*+ outline_leaf(@main) */ to the original SQL. I’m even less keen on doing that than I am on adding undocumented parameters to the push_pred() and no_push_pred() hints, of course; but having done it I did wonder if there are any SQL Plan Baslines in 11.1.0.7 production systems that include the push_pred() hint that are going to change plan on the upgrade to 11.2.0.4 because the numbering inside the hint is supposed to change with version.

Footnote:

Loosely speaking, this blog note is the answer to a question posted about five years ago.


Parse Time

Tue, 2014-12-09 17:53

Here’s a little query I wrote some time ago to see where my time went while running a query. It’s nothing sophisticated, just one of those simple things you can do with v$active_session_history (or dba_hist_active_sess_history, if you don’t get to the crime scene in time).


set null CPU

select
        sql_exec_id, in_parse, in_hard_parse, event, count(*)
from
        v$active_session_history
where
        sql_id = '{your choice here}'
group by
        sql_exec_id, in_parse, in_hard_parse, event
order by
        sql_exec_id, in_parse, in_hard_parse, count(*)
;

SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
   40341649 N N db file sequential read                                68
   40341649 N N CPU                                                    21
   40341649 N N db file scattered read                                  9
            Y Y CPU                                                     7

I had run the query that I was tracking exactly once, but my ASH query allows for, and separates, multiple executions of the same query by summing on sql_exec_id (the thing that the SQL Monitor also uses). The last row looks a little odd, though: it does’t have a value for sql_exec_id; that’s because those are ASH samples when the query is being optimized, not being executed – note that I’ve reported the columns in_parse and in_hard_parse – and both are set to “Y” for that row.

So  (statistically speaking) it’s probably taken about 7 CPU seconds for Oracle to optimise the statement, and from the rest of the results you can see that it’s taken about 21 CPU seconds to run, with 68 seconds spent on random I/Os and 9 seconds spent on multiblock reads for a total of 103 seconds elapsed.

Seven seconds sounds like quite a lot of time for parsing – but it was a fairly complex statement. However, the reason I’d been running the statement on a test system (a fairly good clone of production) was that I’d been seeing something stranger on production and I needed to get a baseline on the test system before I starting trying to fix the problem. Here’s the equivalent ASH reports for the same statement when it had run on production at a time that allowed me to capture its ASH samples.


SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
  514257929 N N CPU                                                    21
  514257929 Y Y latch: row cache objects                                1
            Y Y CPU                                                   119

Note the 119 CPU seconds spent parsing to run a 22 second query ! But that wasn’t the worst of it – sometimes the results looked more like this:


SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
  523748347 N N db file sequential read                                 1
  523748347 N N CPU                                                    32
            Y Y resmgr:cpu quantum                                     11
            Y Y latch: row cache objects                               38
            Y Y CPU                                                   415

That’s a pretty catastrophic optimsation time – especially since the statement can be optimised in seven seconds in another environment. You might note the resource manager kicking in there, the session is exceeding the CPU limit set for its resource group – though not very often given how infrequently it seems to be waiting on “resmgr:cpu quantum”. But there’s another important wrinkle to this report – which you can see when compare v$active_session_history with v$sql.


  1  select
  2     sql_id,
  3     round(cpu_time/1000000,2) cpu,
  4     round(elapsed_time/1000000,2) ela from v$sql
  5  where
  6     sql_text like '{some identifying text}'
  7* and        sql_text not like '%v$sql%'
SQL> /

SQL_ID               CPU        ELA
------------- ---------- ----------
2atyuc3vtpswy     285.81     516.13

The figures from v$sql don’t match very well with the summed results from ASH which has a total sample of 497 seconds and a CPU sample of 447 seconds. I think I can live with a statistical error of 4% ((516-497)/516) in a random sample for total time, but how do you explain the 36% error in the CPU time ?

The samples reporting “resmgr:cpu quantum” are a clue: the machine is overloaded; it’s trying to use far more CPU time than is available. As a result a process that gets pushed off the CPU by the operating system scheduler while it’s running can spend a long time in the run queue waiting to start running again. And if it’s an Oracle process that got pre-empted it doesn’t “know” that it’s not running, it didn’t put itself into a wait state so all it “knows” is that it’s not in a wait state.

So how do ASH and v$sql differ ? The code that derives the cpu_time for v$sql issues a call to the O/S asking “how much CPU have I used”. The code that takes an ASH sample says: “is this session active, if so is it in a wait state and if it’s not in a wait state then it’s either on the CPU or in the run queue waiting for the CPU”. So when we compare v$sql with ASH the difference in CPU is (statistically speaking) time spent in the run queue. So of our 447 seconds of CPU recorded by ASH, we spent 161 seconds in the CPU run queue waiting for CPU.

We still have to account for the difference between the 7 CPU seconds on a test system and the variation between 119 CPU seconds and 415 CPU seconds in optimisation on production. In a word – concurrency. Apart from everything else going on at the time there were, in the worst case, 16 slightly different versions of the same statement being called at the same time (on a machine with 24 cores) – all 16 statement were competing violently for the same resources at the same time, and the escalating conflict as more session joined in produced an exponential growth in time spent competing for resources rather than doing the job. (I managed to demonstrate the effect quite nicely during the evening by limiting the batch to 4 concurrent executions – and got a typical parse time of 40 CPU seconds).

I’ve often warned people about the problems of concurrency and encouraged them to think about how much time is being spent in competition rather then doing the job; I think this is the most extreme case I’ve seen in a production system. Given how dramatic the variation is, I can’t help wondering if the problem has been exaggerated by some corner case of sessions spinning for mutexes or latches; perhaps even an error in the code that allows resource management to put a session into “resmgr:cpu quantum” while it’s holding a latch or mutex. (I wasn’t able to emulate such an extreme display of the problem on a slightly newer version of Oracle, but I was able to construct a test that demonstrated the effect with a much smaller wastage of CPU.)

The solution (almost certainly): the statements are extremely similar, varying in just one predicate that is using a literal constant. It ought to be a relatively safe and simple change to make the query use a bind variable in that predicate. If the solution is adopted I’d expect to see the (once only) parse time on production drop back to about 7 seconds. Of course, if any other session tries to call the same statement at the same time it ought to end up reporting 7 seconds waiting on “cursor: pin S wait on X” before it starts executing – but that 7 seconds wait is a lot better than an extra 493 CPU seconds trying to optimise the “same” statement at the same time.

Footnote:

Running a minor variation on my ASH query to report the sql_plan_hash_value along with the sql_exec_id, I found that the “fix-up” code that updates older ASH rows with information that only becomes available later (e.g. plan_hash_value when optimising, or long time_waited values for a single wait event) only goes back 255 rows – so when I queried ASH for the statements that took 500 seconds to optimizer only 255 of the in_parse rows showed the final sql_plan_hash_value.

 


Cardinality Change

Mon, 2014-12-08 15:35

Here’s an entertaining little change across versions of Oracle, brought to my attention by Tony Hasler during UKOUG Tech 14. It’s a join cardinality estimate, so here are a couple of tables to demonstrate the issue – the only columns needed are the alpha_06 columns, but I reused some code from other demonstrations to create my test case, so there are lots of irrelevant columns in the create table script:


create table t1 nologging as
with generator as (
        select rownum id
        from dual
        connect by rownum <= 1000
)
select
        rownum                                          id,
        mod(rownum-1,200)                               mod_200,
        trunc(dbms_random.value(0,300))                 rand_300,
        mod(rownum-1,10000)                             mod_10000,
        trunc(sysdate) +
                trunc(dbms_random.value(0,1000))        date_1000,
        dbms_random.string('l',6)                       alpha_06,
        dbms_random.string('l',20)                      alpha_20
from
        generator,
        generator
where
        rownum <= 1e6
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

create table t2 nologging as select * from t1;
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 1')

I’m going to join t1 to t2 with a predicate based on the alpha_06 columns – using a LIKE predicate. Before I do so I’ll point out that there are are 1,000,000 rows in the table, and (checking the column stats) 985,920 distinct values for alpha_06. Here’s my query, with the execution plan I got from 11.1.0.7:


select
        count(*)
from
        t1, t2
where
        t2.alpha_06 like t1.alpha_06
;

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1122M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |    50G|   651G|  1122M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1123   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   | 50000 |   341K|  1122   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The 50,000 cardinality estimate for t2 looks like the standard 5% guess for “column >= {unknown value}”, following which the join cardinality of 50G is the same 5% guess applied to the Cartesian join between t1 and t2 (1M * 1M * 0.05). It’s not a good estimate in my case because the right answer happens to be close to 1M rows, specifically 1,003,176. So let’s upgrade to 11.2.0.4 and see what we get instead:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1050M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |  2014K|    26M|  1050M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1051   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   |     2 |    14 |  1050   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The estimate has dropped from 50 Billion rows down to 2 Million – a factor of about 25,000: possibly an indicator that the algorithm has changed, and that a few people might find execution plans changing as they upgrade to a newer version of Oracle. The change occurred at 11.2.0.2 as revealed by fix control 9303766 which has the description: “use 1/NDV+1/NROWS for col1 LIKE col2 selectivities”.

Just as a quick check on the arithmetic: there are 1 million rows in table t2, with (as noted above) 985,920 distinct values in the column, so the selectivity should be: 1/1000000 + 1/985920 = 2.014281 * e-6. Multiply the selectivity by 1e6 and you get 2, the cardinality estimate for t2; multiply the selectivity by 1M*1M (the Cartesian join) and you get 2,014,281, the cardinality estimate of the join. QED.

There are workarounds, of course. One would be to reverse out the fix control, either as an initialisation parameter or in a session logon trigger, another might be to modify the SQL – I think the following would be equivalent:


select
        *
from    t1, t2
where
        t2.alpha_06 like substr(t1.alpha_06,1,length(t1.alpha_06))||'%'
and     t1.alpha_06 is not null
and     t2.alpha_06 is not null

This changes the critical predicate from the form “col1 like col2″ to “col1 like {unknown value from function}” i.e. back to a case where the optimizer uses the 5% guess, and the cardinality estimates go back the original values.


Closure

Fri, 2014-12-05 02:11

It’s been a long time since I said anything interesting about transitive closure in Oracle, the mechanism by which Oracle can infer that if a = b and b = c then a = c but only (in Oracle’s case) if one of a, b, or c is a literal constant rather than a column. So with that quick reminder in place, here’s an example of optimizer mechanics to worry you. It’s not actually a demonstration of transitive closure coming into play, but I wanted to remind you of the logic to set the scene.

I have three identical tables, one million rows, no indexes. The SQL to create the first table is one I supplied a couple of days ago to demonstrate changes in join cardinality dependent on Oracle version:


create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1000))	n_1000,
	trunc(dbms_random.value(0,750))		n_750,
	trunc(dbms_random.value(0,600))		n_600,
	trunc(dbms_random.value(0,400))		n_400,
	trunc(dbms_random.value(0,90))		n_90,
	trunc(dbms_random.value(0,72))		n_72,
	trunc(dbms_random.value(0,40))		n_40,
	trunc(dbms_random.value(0,3))		n_3
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

Here’s a simple SQL statement that joins the three tables:


select
	t1.*, t2.*, t3.*
from
	t1, t2, t3
where
	t2.n_90  = t1.n_90
and	t3.n_90  = t2.n_90
and	t3.n_600 = t2.n_600
and	t1.n_400 = 1
and	t2.n_400 = 2
and	t3.n_400 = 3
;

Given the various n_400 = {constant} predicates we should expect to see close to 2,500 rows from each table participating in the join – and that is exactly what Oracle predicts in the execution plan. The question is: what is the cardinality of the final join? Before showing you the execution plan and its prediction I’m going to bring transitivity into the picture.  Note the lines numbered 6 and 7.  If t2.n_90 = t1.n_90 and t3.n_90 = t2.n_90 then t3.n_90 = t1.n_90; so I might have written my query slightly differently – note the small change at line 7 below:


select
	t1.*, t2.*, t3.*
from
	t1, t2, t3
where
	t2.n_90  = t1.n_90
and	t3.n_90  = t1.n_90		-- changed
and	t3.n_600 = t2.n_600
and	t1.n_400 = 1
and	t2.n_400 = 2
and	t3.n_400 = 3
;

So here’s the exciting bit. My two queries are logically equivalent, and MUST return exactly the same row set. Check the final cardinality predictions in these two execution plans (from 12.1.0.2, but you get the same results in 11.2.0.4, older versions have other differences):


First Version - note the predicate for operation 3
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 70949 |  5820K|  1869  (10)| 00:00:01 |
|*  1 |  HASH JOIN          |      | 70949 |  5820K|  1869  (10)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL | T1   |  2500 | 70000 |   622  (10)| 00:00:01 |
|*  3 |   HASH JOIN         |      |  2554 |   139K|  1245  (10)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |  2500 | 70000 |   622  (10)| 00:00:01 |
|*  5 |    TABLE ACCESS FULL| T3   |  2500 | 70000 |   622  (10)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N_90"="T1"."N_90")
   2 - filter("T1"."N_400"=1)
   3 - access("T3"."N_90"="T2"."N_90" AND "T3"."N_600"="T2"."N_600")
   4 - filter("T2"."N_400"=2)
   5 - filter("T3"."N_400"=3)

Second Version - note the predicate for operation 1
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |  3264 |   267K|  1868  (10)| 00:00:01 |
|*  1 |  HASH JOIN          |      |  3264 |   267K|  1868  (10)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL | T1   |  2500 | 70000 |   622  (10)| 00:00:01 |
|*  3 |   HASH JOIN         |      | 10575 |   578K|  1245  (10)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |  2500 | 70000 |   622  (10)| 00:00:01 |
|*  5 |    TABLE ACCESS FULL| T3   |  2500 | 70000 |   622  (10)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N_90"="T1"."N_90" AND "T3"."N_90"="T1"."N_90")
   2 - filter("T1"."N_400"=1)
   3 - access("T3"."N_600"="T2"."N_600")
   4 - filter("T2"."N_400"=2)
   5 - filter("T3"."N_400"=3)

The a small change in the choice of presenting the predicates gives me a factor of 22 in the cardinality estimate – oops!

The actual result with my data was close to 3,000 rows – so one of the estimates in the second version was pretty good; but the point of the blog isn’t that you can “tune” the optimizer by carefully picking your way through transitive closure, the point is that a small “cosmetic” change you might make to a query could result in a significant change in the cardinality calculations which could then make a dramatic difference to the final execution plan. This example, by the way, depends on the same “multi-column sanity check” that showed up in the previous posting.

I will be expanding on this posting some time in the next couple of weeks but, again, the example should come up in my session on calculating selectivity at “Super Sunday” at UKOUG Tech 14.

 

 


Upgrades

Wed, 2014-12-03 02:24

I have a simple script that creates two identical tables , collects stats (with no histograms) on the pair of them, then executes a join. Here’s the SQL to create the first table:


create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1000))	n_1000,
	trunc(dbms_random.value(0,750))		n_750,
	trunc(dbms_random.value(0,600))		n_600,
	trunc(dbms_random.value(0,400))		n_400,
	trunc(dbms_random.value(0,90))		n_90,
	trunc(dbms_random.value(0,72))		n_72,
	trunc(dbms_random.value(0,40))		n_40,
	trunc(dbms_random.value(0,3))		n_3
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

-- gather stats: no histograms

The two tables have 1,000,000 rows each and t2 is created from t1 with a simple “create as select”. The columns are all defined to be integers, and the naming convention is simple – n_400 holds 400 distinct values with uniform distribution from 0 – 399, n_750 holds 750 values from 0 – 749, and so on.

Here’s the simple query:


select
        t1.*, t2.*
from
        t1, t2
where
        t1.n_400 = 0
and     t2.n_72  = t1.n_90
and     t2.n_750 = t1.n_600
and     t2.n_400 = 1
;

Since I’ve created no indexes you might expect the query to do a couple of and a hash join to get its result – and you’d be right; but what do you think the predicted cardinality would be ?

Here are the results from running explain plan on the query and then reporting the execution plan – for three different versions of Oracle:



9.2.0.8
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |    96 |  4992 |  1230  (10)|
|*  1 |  HASH JOIN           |             |    96 |  4992 |  1230  (10)|
|*  2 |   TABLE ACCESS FULL  | T1          |  2500 | 65000 |   617  (11)|
|*  3 |   TABLE ACCESS FULL  | T2          |  2500 | 65000 |   613  (10)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N_750"="T1"."N_600" AND "T2"."N_72"="T1"."N_90")
   2 - filter("T1"."N_400"=0)
   3 - filter("T2"."N_400"=1)

***************************************************************************

10.2.0.5
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   116 |  6032 |  1229  (10)| 00:00:07 |
|*  1 |  HASH JOIN         |      |   116 |  6032 |  1229  (10)| 00:00:07 |
|*  2 |   TABLE ACCESS FULL| T1   |  2500 | 65000 |   616  (11)| 00:00:04 |
|*  3 |   TABLE ACCESS FULL| T2   |  2500 | 65000 |   612  (10)| 00:00:04 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N_750"="T1"."N_600" AND "T2"."N_72"="T1"."N_90")
   2 - filter("T1"."N_400"=0)
   3 - filter("T2"."N_400"=1)

***************************************************************************

11.2.0.4
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  2554 |   139K|  1225  (10)| 00:00:07 |
|*  1 |  HASH JOIN         |      |  2554 |   139K|  1225  (10)| 00:00:07 |
|*  2 |   TABLE ACCESS FULL| T1   |  2500 | 70000 |   612  (10)| 00:00:04 |
|*  3 |   TABLE ACCESS FULL| T2   |  2500 | 70000 |   612  (10)| 00:00:04 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N_72"="T1"."N_90" AND "T2"."N_750"="T1"."N_600")
   2 - filter("T1"."N_400"=0)
   3 - filter("T2"."N_400"=1)

The change for 11.2.0.4 (which is still there for 12.1.0.2. I didn’t check to see if it also appears in 11.1.0.7) is particularly worrying. When you see a simple query like this changing cardinality on the upgrade you can be fairly confident that some of your more complex queries will change their plans – even if there are no clever new optimizer transformations coming into play.

I’ll write up an explanation of how the optimizer has produced three different estimates some time over the next couple of weeks; but if you want an earlier answer this is one of the things I’ll be covering in my presentation on calculating selectivity at “Super Sunday” at UKOUG Tech 14.


UKOUG 2014

Tue, 2014-12-02 05:44

So it’s that time of year when I have to decide on my timetable for the UKOUG annual conference. Of course, I never manage to stick to it, but in principle here are the sessions I’ve highlighted:

Sunday
  • 12:30 – How to Avoid a Salted Banana – Lothar Flatz
  • 13:30 – Calculating Selectivity  – Me
  • 15:00 – Advanced Diagnostics Revisited – Julian Dyke
  • 16:00 – Testing Jumbo Frames for RAC – Neil Johnson
Monday
  • 9:00 – Oracle Indexes Q & A session – Richard Foote
  • 10:00 – How Oracle works in 50 minutes – Martin Widlake
  • 11:30 – Predictive Queries in 12c – Brendan Tierney
  • 14:30 – Oracle Database In-Memory DB and the Query Optimizer – Christian Antognini
  • 16:00 – Instrumenting, Analysing, & Tuning the Performance of Oracle ADF Applications – Frank Houweling
  • 17:00 – Techniques for Strategic Testing – Clive King
Tuesday
  • 9:30 – Top Five Things You Need To Know About Oracle Database In-Memory Option – Maria Colgan
  • 10:30 – How to Write Better PL/SQL – Andrew Clarke
  • 12:00 – Optimizer Round Table – Tony Hasler
  • 14:00 – What we had to Unlearn & Learn when Moving from M9000 to Super Cluster -Philippe Fierens
  • 15:00 – Maximum Availability Architecture: A Recipe for Disaster? – Julian Dyke
  • 16:30 – Chasing the Holy Grail of HA – Implementing Transaction Guard & Application Continuity in Oracle Database 12c -Mark Bobak
  • 17:30 – Five Hints for Efficient SQL – Me
Wednesday
  • 9:00 – Fundamentals of Troubleshooting (without graphics) pt.1 – Me
  • 10:00 – Fundamentals of Troubleshooting (without graphics) pt.2 – Me
  • 11:30 – Indexing in Exadata – Richard Foote

 

 


Line Numbers

Fri, 2014-11-28 06:49

One of the presentations I went to at the DOAG conference earlier on this month was called “PL/SQL Tuning, finding the perf. bottleneck with hierarchical profiler” by Radu Parvu from Finland. If you do a lot of PL/SQL programming and haven’t noticed the dbms_hprof package yet make sure you take a good look at it.

A peripheral question that came up at the end of the session asked about problems with line numbers in pl/sql procedures; why, when you get a run-time error, does the reported line number sometimes look wrong, and how do you find the right line. I can answer (or give at least one reason for) the first part, but not the second part; Julian Dontcheff had an answer for the second bit, but unfortunately I failed to take a note of it.

Here’s the SQL to create, run and list a very simple (and silly) procedure.


define m_value = 3
set timing on

create or replace procedure silly
as
        m_n number;
begin
        for i in 1..1000000 loop
                m_n := exp(sqrt(ln(&m_value)));
        end loop;
end;
/

execute silly

list

Here’s the output I got from running this, and the thing I want you to note is the time it takes to run, and the line number reported for the assignment:


Procedure created.

Elapsed: 00:00:00.01

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
  1  create or replace procedure silly
  2  as
  3          m_n number;
  4  begin
  5          for i in 1..1000000 loop
  6                  m_n := exp(sqrt(ln(&m_value)));
  7          end loop;
  8* end;

It seems to take my instance o.oo seconds to perform 1,000,000 evaluations of the exponent of the square root of the natural logarithm of 3 at line 6 of the code. But let’s make two changes; first, let’s try that with the value -1 (which is a little careless if you know your logarithms).


Procedure created.

Elapsed: 00:00:00.02
BEGIN silly; END;

*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at "TEST_USER.SILLY", line 5
ORA-06512: at line 1

Notice that the 6502 error is reported at line 5, not line 6.

Now let’s go back to the value 3, but start the script with the command: alter session set plsql_optimize_level = 0; (the default level is 2, the range is 0 to 3):


Session altered.

Elapsed: 00:00:00.00

Procedure created.

Elapsed: 00:00:00.01
BEGIN silly; END;

*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at "TEST_USER.SILLY", line 6
ORA-06512: at line 1

Reducing the pl/sql optimisation level to zero (or one) results in the the error message reporting the problem at line 6 – which matches our original code. So let’s check the effect of running the code at level zero with a valid number for the input.


Session altered.

Elapsed: 00:00:00.00

Procedure created.

Elapsed: 00:00:00.02

PL/SQL procedure successfully completed.

Elapsed: 00:00:24.56


The run time for our pl/sql call has gone up from 0.00 seconds to 24.56 seconds.

Like all good compilers the pl/sql compiler had recognised (at level 2) that we were assigning a constant inside a loop, so it had (effectively) rewritten our code to move the assignment outside the loop, effectively swapping lines 6 and 5, storing the change in the “object” level code, but not in the database copy of the source. The benefit we get (from the default settings) is a saving of 25 seconds of CPU, the (small) penalty we pay is that the lines reported for run-time errors aren’t always going to identify the text that caused the problem.


Lunchtime quiz

Wed, 2014-11-26 06:41

There was a question on OTN a few days ago asking the following question:

Here’s a query that ran okay on 11g, but crashed with Oracle error “ORA-01843: not a valid month” after upgrade to 12c; why ?

The generically correct answer, of course, is that the OP had been lucky (or unlucky, depending on your point of view) on 11g – and I’ll explain that answer in another blog posting.

That isn’t the point of this posting, though. This posting is a test of observation and deduction. One of the respondants in the thread had conveniently supplied a little bit of SQL that I copied and fiddled about with to demonstrate a point regarding CPU costing, but as I did so I thought I’d show you the following and ask a simple question.’


drop table T;

Create Table T
As
with
periods as (
                  Select 'January' period, 1 cal  From Dual
        union all Select 'February' period , 2 cal From Dual
        union all Select 'March' period , 3 cal From Dual
        union all Select 'April' period , 4 cal From Dual
        union all Select 'May'  period, 5 cal From Dual
        union all Select 'June' period, 6 cal From Dual
        union all Select 'July' period, 7 cal From Dual
        union all Select 'August' period, 8 cal From Dual
        union all Select 'September' period, 9 cal  From Dual
        union all Select 'October' period, 10 cal From Dual
        union all Select 'November' period, 11 cal From Dual
        Union All Select 'December' Period, 12 Cal From Dual
        Union All Select '13 Series' Period, Null Cal  From Dual
)
Select  Period,Cal
from periods;

prompt  ==================================
prompt  When we invoke below SQL it works.
prompt  ==================================

set autotrace on explain

select *
from    (
        select
                Period,
                Cal,
                to_date(Period || ', ' || 2014,'Month, YYYY') col1 ,
                to_date('November, 2014','Month, YYYY') col2
        From  T
        Where  Cal > 0
        )
;

prompt  ================================================
prompt  But when we add comparison operations , it fails
prompt  ================================================

select *
from    (
        select
                Period,
                Cal,
                to_date(Period || ', ' || 2014,'Month, YYYY')   col1,
                to_date('November, 2014','Month, YYYY')         col2
        From  T
        Where  Cal > 0
        )
where
        col1 >= col2
;

set autotrace off



All I’ve done is create a table then run and generate the execution plans for two queries – with a comment that if you try to run one query it will succeed but if you try to run the other it will fail (and raise ORA-01843). As far as the original supplier was concerned, both queries succeeded in 11g and the failure of the second one appeared only in 12c. In fact, for reasons that I won’t discuss here, it is POSSIBLE for the failure to appear in 11g as well, though not necessarily with this exact data set.

Here’s the COMPLETE output I got from running the code above on an 11.2.0.4 instance:



Table dropped.


Table created.

==================================
When we invoke below SQL it works.
==================================

PERIOD           CAL COL1      COL2
--------- ---------- --------- ---------
January            1 01-JAN-14 01-NOV-14
February           2 01-FEB-14 01-NOV-14
March              3 01-MAR-14 01-NOV-14
April              4 01-APR-14 01-NOV-14
May                5 01-MAY-14 01-NOV-14
June               6 01-JUN-14 01-NOV-14
July               7 01-JUL-14 01-NOV-14
August             8 01-AUG-14 01-NOV-14
September          9 01-SEP-14 01-NOV-14
October           10 01-OCT-14 01-NOV-14
November          11 01-NOV-14 01-NOV-14
December          12 01-DEC-14 01-NOV-14

12 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    12 |   228 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    12 |   228 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CAL">0)

Note
-----
   - dynamic sampling used for this statement (level=2)

================================================
But when we add comparison operations , it fails
================================================

PERIOD           CAL COL1      COL2
--------- ---------- --------- ---------
November          11 01-NOV-14 01-NOV-14
December          12 01-DEC-14 01-NOV-14

2 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    19 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    19 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CAL">0 AND TO_DATE("PERIOD"||', '||'2014','Month,
              YYYY')>=TO_DATE(' 2014-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


So this is the question. What’s the anomaly in this output ?

Bonus question: What’s the explanation for the anomaly ?

Answers:

If I had asked why the query might, or might not, crash – the answer would be about the order of predicate evaluation, and simply collecting stats (or not) might have made a difference. Ever since “system stats”  and “CPU costing” appeared the optimizer has been able to change the order in which it applies filter predicates to a table (there’s a pdf of an article of mine from Oracle magazine in the 9i / 10g timeline linked at this URL) .  In this case, applying the “cal > 0″ predicate first luckily eliminates the rows that would fail the second predicate. Since the effect is driven by the optimizer’s stats this type of failure could occur ANY TIME you have a predicate that requires coercion between types to take place – which is one reason why you see the injunctions to use the correct data types; and why, if you need coercion to work around incorrect data types you have to consider writing your own functions to trap and resolve the necessary errors raised by Oracle’s implicit conversion mechanisms.

For a quick sketch of the optimizer strategy, the arithmetic is roughly:  predicate A costs c1 and predicate B costs c2; if I apply predicate A to every row I have to apply predicate B to only N surviving rows; if I apply predicate B to every row I have to apply predicate A to M surviving rows; which is smaller: (input_rows * c1 + N * c2) or (input_rows * c2 + M * c1).

The answer to the question I actually asked is this, though: I stressed the fact that this was the COMPLETE output because, as Narenda highlighted in comment 7 below –  the first query shows a note about dynamic sampling and the second query does not. This is a little surprising; we don’t have stats on the table, and the two queries are different so we have to optimizer both of them.  In 12c, of course, it’s possible that the optimizer may have done something clever with statistics feedback (formerly cardinality feedback) and created an SQL directive – but even then we should have seen a note about that.

For the bonus question: given the second output doesn’t report dynamic sampling we should be curious why not – did the optimizer simply decide not to try, did it try then decide not to use the results for some reason, or is there some other reason.  The obvious next step is to look at the 10053 (optimizer) trace – where you find that the optimizer DID do dynamic sampling or rather, it tried to do dynamic sampling but the query generated to take the sample failed with Oracle error ORA-01843, as suggested by Chinar Aliyev in comment 9  and expanded by Mohamed Houri in comment 11.

The irony of the sampling problem (hinted by Chinar Aliyev in comment 10) is that you could be in a position where you have a large table and oracle picks a small sample which happens to miss any of the problem rows and then return a sample that persuades the optimizer to pick an execution plan that is bound to find a problem row; alternatively the SQL used to generate the sample might apply the predicate in an order that manages to eliminate the problem rows, while the final plan derived after sampling persuades the optimizer to use the predicate in the order B, A.

 


Baselines

Sun, 2014-11-23 14:58

I’m not very keen on bending the rules on production systems, I’d prefer to do things that look as if they could have happened in a completely legal fashion, but sometimes it’s necessary to abuse the system and here’s an example to demonstrate the point. I’ve got a simple SQL statement consisting of nothing more than an eight table join where the optimizer (on the various versions I’ve tested, including 12c) examines 5,040 join orders (even though _optimizer_max_permutations is set to the default of 2,000 – and that might come as a little surprise if you thought you knew what that parameter was supposed to do):

select
	/*+ star */
	d1.p1, d2.p2, d3.p3,
	d4.p4, d5.p5, d6.p6,
	d7.p7,
	f.small_vc
from
	dim1		d1,
	dim2		d2,
	dim3		d3,
	dim4		d4,
	dim5		d5,
	dim6		d6,
	dim7		d7,
	fact_tab	f
where
	d1.v1 = 'abc'
and	d2.v2 = 'def'
and	d3.v3 = 'ghi'
and	d4.v4 = 'ghi'
and	d5.v5 = 'ghi'
and	d6.v6 = 'ghi'
and	d7.v7 = 'ghi'
and	f.id1 = d1.id
and	f.id2 = d2.id
and	f.id3 = d3.id
and	f.id4 = d4.id
and	f.id5 = d5.id
and	f.id6 = d6.id
and	f.id7 = d7.id
;

It’s useful to have such extreme examples because they make it easy to notice certain features of Oracle’s behaviour – in this case the difference between SQL Plan Baselines and SQL Profiles. After I had created a baseline for this statement the optimizer still examined 5,040 join orders because that’s the way that baselines work – the optimizer first tries to optimize the statement without the baseline in case it can produce a better plan (for future evaluation and evolution) than the plan dictated by the baseline.

In my example this wasn’t really a problem (apart from the memory requirement in the SGA) but one of my clients has a production query that takes 7 seconds to optimize then runs in a few seconds more, so I’d like to get rid of that 7 seconds … without touching the code. Adding a baseline won’t reduce the time to optimize. (Note: 7 seconds may not seem like a lot, but when several copies of this query are executed concurrently using a different literal value as an input, that 7 seconds can go up to anything between 40 and 500 seconds of CPU parse time per query).

If I take a different tack and write some code to acquire the content of the SQL Plan Baseline (viz. the outline section from the in-memory execution plan) but store it as an SQL Profile the optimizer simply follows the embedded hints and examines just one join order (because the set of hints includes a leading() hint specifying exactly the join order required). This is why I will, occasionally, take advantage of the code that Kerry Osborne and Randolf Geist produced some time ago to raid the library cache for a child cursor and store its plan outline as an SQL profile.

Footnote:

This dirty trick doesn’t always solve the problem – the first example I tried to use for my demonstration did a complex concatenation that took a couple of minutes to optimise, and storing the baseline as a profile didn’t help.

 

 


Parallel Costs

Fri, 2014-11-21 05:36

While creating a POC of a SQL rewrite recently I received a little surprise as I switched my query from serial execution to parallel execution and saw the optimizer’s estimated cost increase dramatically. I’ll explain why in a moment, but it made me think it might be worth setting up a very simple demonstration of the anomaly. I created a table t1 by copying view all_source – which happened to give me a table with about 100,000 rows and 1117 blocks – and then ran the query ‘select max(line) from t1;’ repeating the query with a /*+ parallel(t1 2) */ hint. From 11.2.0.4 here are the two execution plans I got:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |   153   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 99173 |   387K|   153   (4)| 00:00:01 |
---------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     4 |  1010   (1)| 00:00:05 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     4 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     4 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     4 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          | 99173 |   387K|  1010   (1)| 00:00:05 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T1       | 99173 |   387K|  1010   (1)| 00:00:05 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

It seems that the optimizer thinks that running the query parallel 2 will take five times as long as running it serially ! Is it, perhaps, some cunning fix to the optimizer that is trying to allow for process startup time for the parallel execution slaves ? Is it a bug ? No – it’s just that I hadn’t got around to setting my system stats and they were complete garbage thanks to various other tests I had been running over the previous couple of weeks. Critically, I had not adjusted the “parallel-oriented” statistics to be consistent with the “serial-oriented” stats.

Here, from the 10053 trace file for the parallel run, is the section on the Single Table Access costing, together with the system stats and base statistical information:


SYSTEM STATISTICS INFORMATION
-----------------------------
  Using WORKLOAD Stats
  CPUSPEED: 1110 millions instructions/sec
  SREADTIM: 4.540000 milliseconds
  MREADTIM: 12.440000 millisecons
  MBRC:     21 blocks
  MAXTHR:   3000000 bytes/sec
  SLAVETHR: 1000000 bytes/sec

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 99173  #Blks:  1117  AvgRowLen:  76.00  ChainCnt:  0.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 99173.000000  Rounded: 99173  Computed: 99173.00  Non Adjusted: 99173.00
  Access Path: TableScan
    Cost:  151.13  Resp: 1010.06  Degree: 0
      Cost_io: 147.00  Cost_cpu: 20826330
      Resp_io: 1007.76  Resp_cpu: 11570183

I’m going to walk through the optimizer’s calculations that got the serial I/O cost (cost_io: 147.00) and the parallel I/O cost (Resp_io: 1007.76), but before I do that I’ll point out how inconsistent some of the system stat are. The multiblock read time (mreadtim) is 12.44 milliseconds, to read an average of 21 blocks (mbrc) which, converting to bytes per second means (21 * 8192) * 1000/12.44 = 13,828,938 bytes per second; but the I/O rate for a single parallel execution slave (slavethr) is only 1,000,000 bytes per second – which means a single serial session can (apparently) tablescan nearly 14 times as fast as an individual parallel execution slave. It’s not surprising that somehow the optimizer thinks a serial tablescan will be faster than parallel 2) – but let’s check exactly how the arithmetic falls out.

Serial:

  • #Blks: 1117, MBRC = 21 => read requests = 1117/21 = 53.19
  • sreadtim = 4.54 milliseconds, mreadtim = 12.44 milliseconds = 2.74 * sreadtim
  • Cost = 53.19 * 2.74 + 1 (_tablescan_cost_plus_one = true) = 146.74 — which looks close enough.

Parallel:

  • #Blks: 1117, block size = 8KB => total I/O requirement = 9,150,464 bytes
  • slavethr: 1,000,000 bytes/sec, degree 2 => total throughput 2,000,000 bytes/sec => elapsed I/O time = 4.575232 seconds
  • sreadtim = 4.54 milliseconds => cost = 4.575232 / 0.00454 = 1007.76 QED.
Two final thoughts:

First, if I increase the degree of parallelism to 3 the cost drops to 673 (671.84 plus a bit of CPU); if I increase the degree any further the cost doesn’t drop any further – that’s because the maxthr (maximum throughput) is 3,000,000. The optimizer uses the smaller of maxthr and (degree * slavethr) in the parallel arithmetic.

Secondly, and most importantly, the way the optimizer behaves will be affected by the various parameters relating to parallelism, so here’s a list of the relevant settings on the instance I was using when creating this example:


SQL> show parameter parallel

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
parallel_adaptive_multi_user         boolean     TRUE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_limit                string      CPU
parallel_degree_policy               string      MANUAL
parallel_execution_message_size      integer     16384
parallel_force_local                 boolean     FALSE
parallel_instance_group              string
parallel_io_cap_enabled              boolean     FALSE
parallel_max_servers                 integer     80
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_min_time_threshold          string      AUTO
parallel_server                      boolean     FALSE
parallel_server_instances            integer     1
parallel_servers_target              integer     32
parallel_threads_per_cpu             integer     2
recovery_parallelism                 integer     0

Note, particularly, that I have not enabled parallel_automatic_tuning.

For further details on parallel costs and the cost/time equivalence that the optimizer uses in recent versions, here’s a link to an article by Randolf Geist.


Quantum Data

Thu, 2014-11-20 05:30

That’s data that isn’t there until you look for it, sort of, from the optimizer’s perspective.

Here’s some code to create a sample data set:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum					id,
	mod(rownum-1,200)			mod_200,
	mod(rownum-1,10000)			mod_10000,
	lpad(rownum,50)				padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

Now derive the execution plans for a couple of queries noting, particularly, that we are using queries that are NOT CONSISTENT with the current state of the data (or more importantly the statistics about the data) – we’re querying outside the known range.


select * from t1 where mod_200  = 300;
select * from t1 where mod_200 >= 300;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2462 |   151K|  1246   (5)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T1   |  2462 |   151K|  1246   (5)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MOD_200"=300)

SQL> select * from t1 where mod_200 >=300;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2462 |   151K|  1246   (5)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T1   |  2462 |   151K|  1246   (5)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MOD_200">=300)

The predicted cardinality for mod_200 = 300 is the same as that for mod_200 >= 300. So, to be self-consistent, the optimizer really ought to report no rows (or a token 1 row) for any value of mod_200 greater than 300 – but it doesn’t.


SQL> select * from t1 where mod_200 = 350;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1206 | 75978 |  1246   (5)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T1   |  1206 | 75978 |  1246   (5)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MOD_200"=350)

SQL> select * from t1 where mod_200 =360;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   955 | 60165 |  1246   (5)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T1   |   955 | 60165 |  1246   (5)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MOD_200"=360)

SQL> select * from t1 where mod_200 =370;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   704 | 44352 |  1246   (5)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T1   |   704 | 44352 |  1246   (5)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MOD_200"=370)

SQL> select * from t1 where mod_200 =380;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   452 | 28476 |  1246   (5)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T1   |   452 | 28476 |  1246   (5)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MOD_200"=380)

SQL> select * from t1 where mod_200 in (350, 360, 370, 380);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  3317 |   204K|  1275   (7)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| T1   |  3317 |   204K|  1275   (7)| 00:00:07 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("MOD_200"=350 OR "MOD_200"=360 OR "MOD_200"=370 OR "MOD_200"=380)

The IN-list results are consistent with the results for the individual values – but the result for the IN-list is NOT consistent with the result for the original mod_200 >= 300. The optimizer uses a “linear decay” strategy for handling predicates that go out of range, but not in a consistent way. It seems that, as far as out-of-range, range-based predicates are concerned, the data doesn’t exist until the wave front collapses.

Footnote:

This type of anomaly COULD affect some execution plans if your statistics haven’t been engineered to avoid the problems of “out of range” traps.


Comparisons

Wed, 2014-11-19 06:47

“You can’t compare apples with oranges.”

Oh, yes you can! The answer is 72,731,533,037,581,000,000,000,000,000,000,000.


SQL> 
SQL> create table fruit(v1 varchar2(30));
SQL> 
SQL> insert into fruit values('apples');
SQL> insert into fruit values('oranges');
SQL> commit;
SQL> 
SQL> 
SQL> begin
  2  	     dbms_stats.gather_table_stats(
  3  		     ownname	      => user,
  4  		     tabname	      =>'FRUIT',
  5  		     method_opt       => 'for all columns size 2'
  6  	     );
  7  end;
  8  /
SQL> 
SQL> select
  2  	     endpoint_number,
  3  	     endpoint_value,
  4  	     to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx') hex_value
  5  from
  6  	     user_tab_histograms
  7  where
  8  	     table_name = 'FRUIT'
  9  order by
 10  	     endpoint_number
 11  ;

ENDPOINT_NUMBER                                   ENDPOINT_VALUE HEX_VALUE
--------------- ------------------------------------------------ -------------------------------
              1  505,933,332,254,715,000,000,000,000,000,000,000  6170706c65731ad171a7dca6e00000
              2  578,664,865,292,296,000,000,000,000,000,000,000  6f72616e67658acc6c9dcaf5000000
SQL> 
SQL> 
SQL> 
SQL> select
  2  	     max(endpoint_value) - min(endpoint_value) diff
  3  from
  4  	     user_tab_histograms
  5  where
  6  	     table_name = 'FRUIT'
  7  ;

                                            DIFF
------------------------------------------------
  72,731,533,037,581,000,000,000,000,000,000,000
SQL> 
SQL> spool off



Plan puzzle

Mon, 2014-11-17 07:43

I was in Munich a few weeks ago running a course on Designing Optimal SQL and Troubleshooting and Tuning, but just before I flew in to Munich one of the attendees emailed me with an example of a statement that behaved a little strangely and asked me if we could look at it during the course.  It displays an odd little feature, and I thought it might be interesting to write up what I did to find out what was going on. We’ll start with the problem query and execution plan:


select     s section_size,
           max(program_id) ,avg(program_id)
from       fullclones_test
cross join  (select distinct section_size s from fullclones_test)
where      section_size = (select min(section_size) from fullclones_test)
and        clone_size >= s
group by   s
order by   1; 

Since I found this a little difficult to follow (with repetitions of the same table name, and column aliases switching back and forth) I did a little cosmetic modification; all I’ve done is add table aliases and query block names, then arranged the text for my visual benefit. The aliases and query block names can help when dissecting the anomaly.


select
	/*+ qb_name(main) */
	max(ft1.program_id),
	avg(ft1.program_id),
	ftv.s	section_size
from
	fullclones_test	ft1
cross join
	(
	select	/*+ qb_name(inline) */
		distinct ft2.section_size s
	from	fullclones_test	ft2
	)	ftv
where	ft1.section_size = (
		select 	/*+ qb_name(subq) */
			min(section_size)
		from	fullclones_test	ft3
	)
and	ft1.clone_size >= ftv.s
group by
	ftv.s
order by
	ftv.s
; 

This query ran reasonably quickly (about half a second), producing the following execution plan:


--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                 |      1 |        |   404 (100)|      4 |00:00:00.05 |    1116 |       |       |          |
|   1 |  SORT GROUP BY NOSORT  |                 |      1 |      5 |   404   (6)|      4 |00:00:00.05 |    1116 |       |       |          |
|   2 |   MERGE JOIN           |                 |      1 |  48637 |   299   (7)|  50361 |00:00:00.58 |    1116 |       |       |          |
|   3 |    SORT JOIN           |                 |      1 |      5 |   114  (11)|      5 |00:00:00.02 |     372 |  2048 |  2048 | 2048  (0)|
|   4 |     VIEW               |                 |      1 |      5 |   114  (11)|      5 |00:00:00.02 |     372 |       |       |          |
|   5 |      HASH UNIQUE       |                 |      1 |      5 |   114  (11)|      5 |00:00:00.02 |     372 |  5894K|  3254K|  884K (0)|
|   6 |       TABLE ACCESS FULL| FULLCLONES_TEST |      1 |  99999 |   105   (3)|  99999 |00:00:00.31 |     372 |       |       |          |
|*  7 |    SORT JOIN           |                 |      5 |  20000 |   185   (4)|  50361 |00:00:00.16 |     744 |   619K|   472K|  550K (0)|
|*  8 |     TABLE ACCESS FULL  | FULLCLONES_TEST |      1 |  20000 |   106   (4)|  20076 |00:00:00.09 |     744 |       |       |          |
|   9 |      SORT AGGREGATE    |                 |      1 |      1 |            |      1 |00:00:00.01 |     372 |       |       |          |
|  10 |       TABLE ACCESS FULL| FULLCLONES_TEST |      1 |  99999 |   105   (3)|  99999 |00:00:00.29 |     372 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$071BB01A
   4 - INLINE       / FTV@SEL$1
   5 - INLINE   
   6 - INLINE       / FT2@INLINE   
   8 - SEL$071BB01A / FT1@SEL$1
   9 - SUBQ 
  10 - SUBQ         / FT3@SUBQ

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("FT1"."CLONE_SIZE">="FTV"."S")
       filter("FT1"."CLONE_SIZE">="FTV"."S")
   8 - filter("FT1"."SECTION_SIZE"=)

As you can see by comparing the block name / object alias information, we can identify a single full tablescan being executed at line 9 to produce the min(section_size) in the subquery.

We can also see that the “select distinct” executes at lines 5/6 to produce 5 rows which are then joined with a merge join to the first full tablescan of t1.

If you’re wondering about the appearance of a sel$1 despite my efforts to name every query block, that’s the (almost) inevitable side effect of using ANSI syntax – virtually every join after the first two tables will introduce a new (unnameable) query block to introduce the next table.

Now here’s the anomaly: if we eliminate the avg() from the select list we’re going to produce a result that ought to require less work – but look what happens:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name            | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                 |      1 |        | 10802 (100)|      4 |00:02:48.83 |    1116 |       |       |          |
|   1 |  SORT GROUP BY NOSORT  |                 |      1 |      5 | 10802  (94)|      4 |00:02:48.83 |    1116 |       |       |          |
|   2 |   MERGE JOIN           |                 |      1 |    972M| 10697  (95)|   1007M|03:21:28.41 |    1116 |       |       |          |
|   3 |    SORT JOIN           |                 |      1 |  99999 |   380   (4)|  80042 |00:00:00.39 |     372 |  2037K|   674K| 1810K (0)|
|   4 |     TABLE ACCESS FULL  | FULLCLONES_TEST |      1 |  99999 |   105   (3)|  99999 |00:00:00.26 |     372 |       |       |          |
|*  5 |    SORT JOIN           |                 |  80042 |  20000 |   185   (4)|   1007M|00:57:11.13 |     744 |   619K|   472K|  550K (0)|
|*  6 |     TABLE ACCESS FULL  | FULLCLONES_TEST |      1 |  20000 |   106   (4)|  20076 |00:00:00.11 |     744 |       |       |          |
|   7 |      SORT AGGREGATE    |                 |      1 |      1 |            |      1 |00:00:00.01 |     372 |       |       |          |
|   8 |       TABLE ACCESS FULL| FULLCLONES_TEST |      1 |  99999 |   105   (3)|  99999 |00:00:00.28 |     372 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$6B65F52B
   4 - SEL$6B65F52B / FT2@INLINE
   6 - SEL$6B65F52B / FT1@SEL$1
   7 - SUBQ
   8 - SUBQ         / FT3@SUBQ

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("FT1"."CLONE_SIZE">="FT2"."SECTION_SIZE")
       filter("FT1"."CLONE_SIZE">="FT2"."SECTION_SIZE")
   6 - filter("FT1"."SECTION_SIZE"=)

Ignore the timings from lines 2 and 5 – I was using the hint gather_plan_statistics to collect the rowsource execution stats, and those lines are showing a massive sampling error. The query took about 7 minutes 30 seconds to run. The key difference is that line 4 shows that the “select distinct” is NOT aggregated early – the optimizer has used complex view merging to “join then aggregate” rather than “aggregate then join”. As you can see, this was a bad choice and the join has produced over a billion (US) rows at line 2 which then have to aggregated down to just 4 rows in line 1.

The question then is why ? If I put a /*+ no_merge */ hint in query block named “inline” the optimizer accepts the hint and goes back to the plan that aggregates early and runs very quickly – so it’s not a question of the optimizer bypassing some mechanism to avoid getting the wrong answer. I think the only option available to use for further investigation at this point is to examine the 10053 (optimizer) trace file to see what’s going on.

From the (12c)  trace file where we select the avg() we see the following lines:

OJE: Begin: find best directive for query block INLINE (#0)
OJE: End: finding best directive for query block INLINE (#0)
CVM: CBQT Marking query block INLINE (#0) as valid for CVM.
CVM:   Not Merging INLINE (#0) into SEL$1 (#0) due to CBQT directive.

From the equivalent position in the trace file where we select only the max() we see the lines:

OJE: Begin: find best directive for query block INLINE (#0)
OJE: End: finding best directive for query block INLINE (#0)
CVM:   Merging SPJ view INLINE (#0) into SEL$1 (#0)

It’s always hard to decide exactly WHY things happen – but it looks as if the optimizer merges the view heuristically in the max() case “because it can”, but has a heuristic (internal directive) that stops it from merging in the avg() case. What a good job we have hints !

Footnote:

In cases like this it’s always possible that there’s a generic case that might produce wrong results even though the specific case won’t – so it’s worth spending a little time thinking about how the wrong results might appear. It’s also worth keep hold of the SQL as a test case because if there’s a generic block in place to handle specific cases you may find that future enhancements allow the block to be lifted for some cases, and it’s nice to be able to check for such cases as you upgrade.

On the other hand, you can get back to the efficient plan if you change the inline view to be:


        (
        select  /*+
                        qb_name(inline)
                */
                ft2.section_size s , count(*) ct
        from    fullclones_test ft2
        group by
                ft2.section_size
        )       ftv


That’s just a little bit surprising – but I’m not going to pursue this one any further, beyond noting that there are some interesting anomalies available with inline aggregate views, even in 12.1.0.2.

Footnote 2:

If you want to experiment further, here’s the SQL to generate the data set:

create table fullclones_test (
        program_id      number(5,0),
        start_line      number,
        clone_id        number(5,0),
        clone_line      number,
        clone_size      number,
        range_start     number,
        section_size    number(4,0)
)
;

insert into fullclones_test (
        program_id, start_line, clone_id, clone_line,
        clone_size, range_start, section_size
)
Select
        1,
        trunc(dbms_random.value(1,1000)),
        trunc(dbms_random.value(1,10)),
        trunc(dbms_random.value(1,1000)),
        trunc(dbms_random.value(20,100)),
        0,
        20*trunc(dbms_random.value(1,6))
from
        dual
connect by
        level <100000
;

commit;

exec dbms_stats.gather_table_stats(user,'fullclones_test')

Finally, for consistently reproducible results I had engineered my system stats as follows:


        begin
                dbms_stats.set_system_stats('MBRC',16);
                dbms_stats.set_system_stats('MREADTIM',10);
                dbms_stats.set_system_stats('SREADTIM',5);
                dbms_stats.set_system_stats('CPUSPEED',1000);
        end;


Shrink Tablespace

Fri, 2014-11-14 12:16

If you start moving objects around to try and reclaim space in a tablespace there are all sorts of little traps that make it harder than you might hope to get the maximum benefit with the minimum effort.  I’ve written a couple of notes in the past about how to proceed and, more recently, one of the difficulties involved. This is just a brief note about a couple of ideas to make life a little easier.

  • Check that you’ve emptied the recyclebin before you start
  • Before you try moving/rebuilding an object check that the total free space “below” that object is greater than  the size of the object or you’ll find that parts of the object move “up” the tablespace.
  • Before moving a table, mark its indexes as unusable. If you do this then (in recent versions of Oracle) the default behaviour is for the index space be freed as the segment vanishes and you may find that the extents of the table can move further “down” the tablespace.  (If you’ve kept tables and indexes in different tablespaces this is irrelevant, of course).
  • When you move an object think a little carefully about whether specifying an minimum initial extent size would help or hinder the move.
  • Don’t assume that moving the “highest” object first is the best strategy – work out where you expect the final tablespace HWM to be and you may find that it makes more sense to move other objects that are above the point first.
  • Moving objects with several small (64KB) extents first may allow you to free up larger (1MB, 8MB) gaps that can be used by other larger objects”””””
  • Creating a new tablespace and moving objects to it from the old tablespace “top down” may be the quickest way to proceed.  Work towards dropping the old tablespace HWM regularly.

 


Parallel Fun

Wed, 2014-11-12 10:42

As I write, there’s an ongoing thread on Oracle-L that started with the (paraphrased) question: “I’ve got this query that returns 7 million rows; when I change it to ‘select count(*)’ it returns in 4 seconds but when we display the full result set on screen it takes hours, and every second or two the screen pauses; how do I make it go faster.”

The general rapid response was: “You shouldn’t be running 7M rows to a screen – the time is the time for the network traffic and display.”

The first part of the statement is right – the second part is quite likely to be wrong and there’s a very strong hint in the question that makes me say that, it’s the “pauses every second or two”. Of course we don’t know what the OP isn’t telling us, and we don’t know how accurate he is in what he is telling us, so any ideas we have may be completely wrong. For example, we haven’t been given any idea of how long a “pause” is, we don’t really know how accurate that “second or two” might be and whether “every” is an exaggeration, and maybe the query is returning CLOB columns (and that could make a big difference to what you can do to improve performance).

If we take the statement at face value, though, there is one very obvious inference: although some of the time will be due to network traffic time, most of the time is probably due to Oracle doing something expensive for a significant fraction of the rows returned. The pattern of activity probably looks like this:

  • client: call server to fetch next array of rows
  • server: spend some time populating array  — this is where the client sees a pause
  • client: display result array
  • client: call server to fetch next array of rows
  •  etc…

Here’s a trivial example:

connect / as sysdba
set arraysize 500
set pagesize 40

select
        o1.spare1 ,
        (
        select  max((ctime))
        from    obj$    o2
        where   o2.owner# = o1.owner#
        and     o2.obj# < o1.obj#
        ) ct
from obj$ o1
;

On my laptop, running an instance of 11.2.0.4 with about 80,000 rows in obj$ (and a lot of them owned by SYS), I can count seconds and find that (approximately) I alternate between one second watching results scrolling up the screen and one second waiting as the server generates the next 500 rows.

Of course it’s possible to argue that the problem really is the network and nothing but the network struggling to cope with the never-ending stream of little packets produced by 7M rows. Could there be a choke point that causes the data to stop and start with great regularity, maybe – but previous experience says probably not. I have experienced bad network problems in the past, but when they’ve occurred I’ve always observed extremely random stop/go behaviour. The regularity implied in the question makes the Oracle-based problem seem far more likely.

Conveniently a couple of people asked for more clues – like the query text and the execution plan; even more conveniently the OP supplied the answers in this response. Since the email format makes them a little hard to read I’ve copied them here:


SELECT  bunch of stuff.....,

        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr1.RELATED_SID
                        ||
                        ',')
                ORDER BY sr1.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr1
                WHERE   sr1.SID                    = slv.SID
                        AND sr1.RELATIONSHIP_LEVEL = '1'
                GROUP BY sr1.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL1,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr2.RELATED_SID
                        ||
                        ',')
                ORDER BY sr2.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr2
                WHERE   sr2.SID                    = slv.SID
                        AND sr2.RELATIONSHIP_LEVEL = '2'
                GROUP BY sr2.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL2,
        (
               SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr3.RELATED_SID
                        ||
                        ',')
                ORDER BY sr3.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr3
                WHERE   sr3.SID                    = slv.SID
                        AND sr3.RELATIONSHIP_LEVEL = '3'
                GROUP BY sr3.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL3,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr4.RELATED_SID
                        ||
                        ',')
                ORDER BY sr4.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr4
                WHERE   sr4.SID                    = slv.SID
                        AND sr4.RELATIONSHIP_LEVEL = '4'
                GROUP BY sr4.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL4,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr5.RELATED_SID
                        ||
                        ',')
                ORDER BY sr5.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr5
                WHERE   sr5.SID                    = slv.SID
                        AND sr5.RELATIONSHIP_LEVEL = '5'
                GROUP BY sr5.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL5
FROM    service_lookup slv
        LEFT JOIN service_location sl
        ON      sl.service_location_id = slv.service_location_id;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1570133209

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                      |  7331K|  5593M|  1877   (5)| 00:00:01 |        |      |            |
|   1 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|   2 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10000             |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | PCWC |            |
|*  5 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | PCWP |            |
|   6 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|   7 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|   8 |    PX SEND QC (RANDOM)           | :TQ20000             |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | P->S | QC (RAND)  |
|   9 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | PCWC |            |
|* 10 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | PCWP |            |
|  11 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  12 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  13 |    PX SEND QC (RANDOM)           | :TQ30000             |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | P->S | QC (RAND)  |
|  14 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | PCWC |            |
|* 15 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | PCWP |            |
|  16 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  17 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  18 |    PX SEND QC (RANDOM)           | :TQ40000             |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | P->S | QC (RAND)  |
|  19 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | PCWC |            |
|* 20 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | PCWP |            |
|  21 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  22 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  23 |    PX SEND QC (RANDOM)           | :TQ50000             |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | P->S | QC (RAND)  |
|  24 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | PCWC |            |
|* 25 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | PCWP |            |
|  26 |  PX COORDINATOR                  |                      |       |       |            |          |        |      |            |
|  27 |   PX SEND QC (RANDOM)            | :TQ60002             |  7331K|  5593M|  1877   (5)| 00:00:01 |  Q6,02 | P->S | QC (RAND)  |
|* 28 |    HASH JOIN RIGHT OUTER BUFFERED|                      |  7331K|  5593M|  1877   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  29 |     PX RECEIVE                   |                      |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,02 | PCWP |            |
|  30 |      PX SEND HASH                | :TQ60000             |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | P->P | HASH       |
|  31 |       PX BLOCK ITERATOR          |                      |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | PCWC |            |
|  32 |        TABLE ACCESS STORAGE FULL | SERVICE_LOCATION     |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | PCWP |            |
|  33 |     PX RECEIVE                   |                      |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  34 |      PX SEND HASH                | :TQ60001             |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | P->P | HASH       |
|  35 |       PX BLOCK ITERATOR          |                      |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWC |            |
|  36 |        TABLE ACCESS STORAGE FULL | SERVICE_LOOKUP       |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------

We have a simple two-table outer join, and five scalar subqueries in the select list. (Not being very familiar with the various XML calls I had no idea of what the scalar subqueries were doing, or how they produced a result, beyond the fact that they were querying and aggregating multiple rows. In fact the combination of calls does much the same as listagg(), though it allows for a CLOB result (which could be part of the performance problem, of course) rather than being limited to a varchar2() result).

Would you like to guess at this point why I constructed my demonstration query again obj$ the way I did when presenting the idea of high-cost per row queries as a reason for regular pauses in the output ? The execution plan matched one of my two initial guesses about what the query was going to look like. When you “select count(*) from {this query}”, the optimizer will factor out the scalar subqueries and only have to count the result set from the hash join – and it might even manage to use a couple of parallel index fast full scans to get that result rather than doing the tablescans. When you run the query you have to run the scalar subqueries.

If we trust the statistics, we have 5 subqueries to run for each row of the hash join – and the hash join is predicted to return 7.3 million rows. Given that the subqueries are all going to run parallel tablescans against a fairly large table (note – the cost of the tablescans on SERVICE_RELATIONSHIP is 368, compared to the cost of the tablescan on SERVICE_LOCATION which is 366 to return 3.1M rows) that’s an awful lot of work for each row returned – unless we benefit from an enormous amount of scalar subquery caching.

Here’s another performance threat that the plan shows, though: notice where the PX SEND QC operation appears – that means the PX slaves send their (7M) rows to the Query Co-ordinator and the QC is responsible for doing all the work of running the scalar subqueries. Another interesting little threat visible in the plan shows up in the TQ column – the plan uses six “data flow operations” (using the original naming convention, though that changed some time ago but survived in the column names of v$pq_tqstat). In principle each DFO could allocate two sets of PX slaves (and every DFO could have a different degree of parallelism); in this example DFO number 6 (the driving hash join) uses two sets of slave, and the other five DFOs (the scalar subqueries) use a single set each. The upshot of this is that if the default degree of parallelism in play is N this query will allocate 7N parallel query slaves. It gets a little nastier than that, though (based on checking the output from v$sql_plan_monitor), because each time one of the scalar subqueries runs Oracle seems to allocate and deallocate the slaves that are supposed to run it – which is probably going to cause some contention if there are other parallel queries trying to run at the same time.

Optimisation

So what could you do with this query ? It depends on how much change you want to make to the code.

It’s possible that an index on service_relationship(relationship_level, sid) – with compress 1 – might help if it’s very precise, and if the target table stays in the buffer cache for the duration of the query – but, in the absence scalar subquery caching that could still leave the query co-ordinator executing 35 million (5 queries x 7 million rows) subqueries in a serialised process.

A better bet may be to convert from subqueries to joins – remembering that the listagg() / xmlserialize() calls will require you to aggregate (which means sorting in this case) an estimated 25 rows per driving row per relationship_level; in other words you may need to sort 7M * 125 = 875M rows – but at least you could do that in parallel, and there’s always the possibility that the estimated 25 drops off as you work through the different levels. You could choose to do 5 outer hash joins or (as Iggy Fernandez outlined in the thread) you could do a single outer join with a decode on the relationship_level. Another variation on this theme (which would probably have a plan showing ‘join then aggregate’) would be to ‘aggregate then join’. It’s possible that creating a non-mergeable inline view for the 5 values of relationsip_level from a single table access, aggregating it to produce the five required columns, then using the result in an outer join, would be the most efficient option. In the absence of a detailed understanding of the data volume and patterns it’s hard to make any prediction of which strategy would work best.

Footnote:

I may be wrong in my analysis of this problem. When I first saw the question the reason for the performance pattern suggested an “obvious” design error in either the SQL or the infrastructure, and when I saw that the query and execution plan matched my prediction it became very hard for me to think that there might be some other significant cause.

There were a couple of interesting details in the execution plan that made me pursue the problem a little more. In the first case I built a very simple model to get an estimate of the time needed to display 7M rows of a reasonable width in SQL*Plus running across a typical LAN (my estimate was in the order of 45 minutes – not hours); then I spent a little more time (about 10 minutes) to build a model that reproduced the key features of the execution plan shown.

I then spent two or three hours playing with the model, and I’ll be writing a further blog with some of the results later on. One detail to carry away today, though, is that in 12c Oracle can do a new form of subquery unnesting which transformed the query from its 5 scalar subquery form into the seven table join form that was one of the suggestions made on the thread; even more interestingly, if I blocked the unnesting (to force the subquery execution) Oracle 12.1.0.2 came up with a new operator (EXPRESSION EVALUATION) that allowed it to run the subqueries from the PX slaves before passing the results to the query co-ordinator – in other words eliminating the serialisation point.

To be continued …


Quiz night

Fri, 2014-11-07 12:37

Prompted by an email from Yves Colin (who’ll be presenting on the Tuesday of UKOUG Tech14) I was prompted to dig out a little script I wrote some years ago and re-run an old test, leading to this simple question:  what’s the largest size array insert that Oracle will handle ?

If you’re tempted to answer, watch out – it’s not exactly a trick question, but there is a bit of a catch.

Answer:

There is an internal limit of 255 on the size of array that Oracle can insert as a unit. I leave it as an exercise to the reader to decide whether or not this makes any significant difference to performance, since the effects of row size, number of indexes maintained, and disk I/O requirements may make the effects of the limit virtually undetectable. To demonstrate the limit all we need do is insert a few hundred rows into a table and do a block dump, optionally followed by a dump of the redo log.


create table t1 (n1 number not null) segment creation immediate;

alter system switch logfile;

insert into t1 select rownum from all_objects where rownum <= 1000;
commit;

execute dump_log

start rowid_count t1

/*
   FILE_NO   BLOCK_NO ROWS_IN_BLOCK
---------- ---------- -------------
         5        180           660
         5        184           340

2 rows selected.


ROWS_IN_BLOCK     BLOCKS
------------- ----------
          340          1
          660          1

2 rows selected.
*/

alter system flush buffer_cache;

-- SQL> alter system dump datafile 5 block 180;

The dump_log procedure is simply a pl/sql wrapper for a call to ‘alter system dump logfile {current log}'; I the script rowid_count.sql extracts the file and block numbers from rowids in the given table and aggregates them in different ways.  The reason for running the script is to find a table block with a lot of rows in it; the block I dumped actually held the first 660 rows of the insert. Here’s a tiny extract from the block dump (with one little comment added):


tab 0, row 0, @0x1904			-- 6,400 dec
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 02
tab 0, row 1, @0x190a
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 03
tab 0, row 2, @0x1910
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 04
tab 0, row 3, @0x1916
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1

This shows the first four rows inserted, and you can see that the offset to the first row is 6,400 bytes, and then each subsequent row is 6 bytes further down the block (when the numbers get larger the row lengths will increase to 7 bytes). The positioning of these rows is, at first sight, a little odd – you might have guessed that they would either start at offset “zero” and work down to offset “8K”, or start at “8K” and work backwards up to zero – why is the first row 3/4 of the way down the block ?

Rather than show you more row dumps, I’ll give you a carefully selected extract from the row directory:


0x12:pri[0]     offs=0x1904
0x14:pri[1]     offs=0x190a

...

0x20c:pri[253]  offs=0x1f8a
0x20e:pri[254]  offs=0x1f91		End of first 255 rows
0x210:pri[255]  offs=0x120e		Start of second 255 rows
0x212:pri[256]  offs=0x1215

...

0x40a:pri[508]  offs=0x18f6
0x40c:pri[509]  offs=0x18fd		End of second 255 rows
0x40e:pri[510]  offs=0xdf5		Start of last 150 rows
0x410:pri[511]  offs=0xdfc

...

0x536:pri[658]  offs=0x1200		End of last 150 rows
0x538:pri[659]  offs=0x1207

The first 255 rows inserted are stacked at the bottom of the block at offsets 0x1904 to 0x1f91.
The second 255 rows inserted are stacked above them at offsets 0x120e to 0x18fd (note 0x18fd + 6 = 0x1903)
The last 150 rows inserted are stack above them at offsets 0xdf5 to 0x1207 (note 0x1207 + 6 = 0x120d)

No matter how large your attempted array insert, the maximum number of rows (or index entries) Oracle can insert into a block in a single internal array operation is 255.

Further corroboration comes from the redo log dump – here’s a tiny bit of a single change vector (i.e. a single atomic change to a single Oracle block) from the redo generated while this insert was going on:


CHANGE #18 CON_ID:0 TYP:0 CLS:1 AFN:5 DBA:0x014000c7 OBJ:95876 SCN:0x0000.007528dd SEQ:1 OP:11.11 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x01  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F  xid:  0x0007.007.000009f2    uba: 0x01000bda.02a9.14
KDO Op code: QMI row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x014000c7  hdba: 0x01400083
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 lock: 1 nrow: 255
slot[0]: 0
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 02
slot[1]: 1
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 03
slot[2]: 2
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 04

Although the CON_ID in the line 1 tells you this dump is from 12c the same limit holds across all (modern) versions of Oracle. The operation (OP 11.11 – in line 1) is “Insert Multiple Rows”, and at line 9 you can see: “nrow: 255″. I haven’t included the rest of the change vector, but all it does is show the remaining 252 rows.


Cardinality Feedback

Wed, 2014-11-05 12:43

A fairly important question, and a little surprise, appeared on Oracle-L a couple of days ago. Running 11.2.0.3 a query completed quickly on the first execution then ran very slowly on the second execution because Oracle had used cardinality feedback to change the plan. This shouldn’t really be entirely surprising – if you read all the notes that Oracle has published about cardinality feedback – but it’s certainly a little counter-intuitive.

Of course there are several known bugs related to cardinality feedback that could cause this anomaly to appear – a common complaint seems to relate to views on the right-hand (inner table) side of nested loop joins, and cardinality feedback being used on a table inside the view; but there’s an inherent limitation to cardinality feedback that makes it fairly easy to produce an example of a query doing more work on the second execution.

The limitation is that cardinality feedback generally can’t be used (sensibly) on all the tables where better information is needed. This blog describes the simplest example I can come up with to demonstrate the point. Inevitably it’s a little contrived, but it captures the type of guesswork and mis-estimation that can make the problem appear in real data sets. Here’s the query I’m going to use:


select
	t1.n1, t1.n2, t2.n1, t2.n2
from
	t1, t2
where
	t1.n1 = 0
and	t1.n2 = 1000
and	t2.id = t1.id
and	t2.n1 = 0
and	t2.n2 = 400
;

You’ll notice that I’ve got two predicates on both tables so, in the absence of “column-group” extended stats the optimizer will enable cardinality feedback as the query runs to check whether or not its “independent columns” treatment of the predicates gives a suitably accurate estimate of cardinality and a reasonable execution plan. If the estimates are bad enough the optimizer will use information it has gathered as the query ran as an input to re-optimising the query on the next execution.

So here’s the trick.  I’m going to set up the data so that there seem to be only two sensible plans:  (a) full scan of t1, with nested loop unique index access to t2; (b) full scan of t2, with nested loop unique index access to t1. But I’m going to make sure that the optimizer thinks that (a) is more efficient than (b) by making making the stats look as if (on average) the predicates on t1 should return 100 rows while the predicates on t2 return 200 rows.

On the other hand I’ve set the data up so that (for this specific set of values) t1 returns 1,000 rows which means Oracle will decide that its estimate was so far out that it will re-optimize with 1,000 as the estimated single table access cardinality for t1 – and that means it will decide to do the nested loop from t2 to t1. But what the optimizer doesn’t know (and hasn’t been able to find out by running the first plan) is that with this set of predicates t2 will return 20,000 rows to drive the nested loop into t1 – and the new execution plan will do more buffer gets and use more CPU (and time) than the old plan. Since cardinality feedback is applied only once, the optimizer won’t be able to take advantage of the second execution to change the plan again, or even to switch back to the first plan.

Here’s the setup so you can test the behaviour for yourselves:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,2)		n1,
	mod(rownum,2000)	n2,	-- 200 rows for each value on average
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 4e5
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,2)		n1,
	2 * mod(rownum,1000)	n2,	-- 400 rows for each value on average, same range as t1
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 4e5
;

alter table t2 add constraint t2_pk primary key(id);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
end;
/

--
-- Now update both tables to put the data out of sync with the statistics
-- We need a skewed value in t1 that is out by a factor of at least 8 (triggers use of CF)
-- We need a skewed value in t2 that is so bad that the second plan is more resource intensive than the first
--

update t1 set n2 = 1000 where n2 between 1001 and 1019;
update t2 set n2 =  400 where n2 between 402 and 598;
commit;

Here are the execution plans for the first and second executions (with rowsource execution statistics enabled, and the “allstats last” option used in a call to dbms_xplan.display_cursor()).


----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |  1049 (100)|      0 |00:00:00.36 |   11000 |   6588 |
|   1 |  NESTED LOOPS                |       |      1 |    100 |  1049   (3)|      0 |00:00:00.36 |   11000 |   6588 |
|   2 |   NESTED LOOPS               |       |      1 |    100 |  1049   (3)|   2000 |00:00:00.35 |    9000 |   6552 |
|*  3 |    TABLE ACCESS FULL         | T1    |      1 |    100 |   849   (4)|   2000 |00:00:00.30 |    6554 |   6551 |
|*  4 |    INDEX UNIQUE SCAN         | T2_PK |   2000 |      1 |     1   (0)|   2000 |00:00:00.02 |    2446 |      1 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T2    |   2000 |      1 |     2   (0)|      0 |00:00:00.01 |    2000 |     36 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))
   4 - access("T2"."ID"="T1"."ID")
   5 - filter(("T2"."N2"=400 AND "T2"."N1"=0))

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |  1249 (100)|      0 |00:00:00.66 |   32268 |   1246 |
|   1 |  NESTED LOOPS                |       |      1 |    200 |  1249   (3)|      0 |00:00:00.66 |   32268 |   1246 |
|   2 |   NESTED LOOPS               |       |      1 |    200 |  1249   (3)|  20000 |00:00:00.56 |   12268 |    687 |
|*  3 |    TABLE ACCESS FULL         | T2    |      1 |    200 |   849   (4)|  20000 |00:00:00.12 |    6559 |    686 |
|*  4 |    INDEX UNIQUE SCAN         | T1_PK |  20000 |      1 |     1   (0)|  20000 |00:00:00.19 |    5709 |      1 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T1    |  20000 |      1 |     2   (0)|      0 |00:00:00.15 |   20000 |    559 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("T2"."N2"=400 AND "T2"."N1"=0))
   4 - access("T2"."ID"="T1"."ID")
   5 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))

Note
-----
   - cardinality feedback used for this statement

The second plan does fewer reads because of the buffering side effects from the first plan – but that’s not what the optimizer is looking at. The key feature is that the first plan predicts 100 rows for t1, with 100 starts for the index probe, but discovers 2,000 rows and does 2,000 probes. Applying cardinality feedback the optimizer decides that fetching 200 rows from t2 and probing t1 200 times will be lower cost than running the join the other way round with the 2,000 rows it now knows it will get – but at runtime Oracle actually gets 20,000 rows, does three times as many buffer gets, and spends twice as much time as it did on the first plan.

Hinting

Oracle hasn’t been able to learn (in time) that t2 will supply 20,000 rows – but if you knew this would happen you could use the cardinality() hint to tell the optimizer the truth about both tables /*+ cardinality(t1 2000) cardinality(t2 20000) */ this is the plan you would get:

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |  1698 (100)|      0 |00:00:00.06 |   13109 |  13105 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |   2000 |  1698   (4)|      0 |00:00:00.06 |   13109 |  13105 |  1696K|  1696K| 1647K (0)|
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   2000 |   849   (4)|   2000 |00:00:00.05 |    6554 |   6552 |       |       |          |
|*  3 |   TABLE ACCESS FULL| T2   |      1 |  20000 |   849   (4)|  20000 |00:00:00.09 |    6555 |   6553 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   2 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))
   3 - filter(("T2"."N2"=400 AND "T2"."N1"=0))

Unfortunately, unless you have used hints, it doesn’t matter how many times you re-run the query after cardinality feedback has pushed you into the bad plan – it’s not going to change again (unless you mess around flushing the shared pool or using dbms_shared_pool.purge() to kick out the specific statement).

Upgrade

If you upgrade to 12c the optimizer does a much better job of handling this query – it produces an adaptive execution plan (starting with the nested loop join but dynamically switching to the hash join as the query runs). Here’s the full adaptive plan pulled from memory after the first execution – as you can see both the t1/t2 nested loop and hash joins were considered, then the nested loop was discarded in mid-execution. Checking the 10053 trace file I found that Oracle has set the inflexion point (cross-over from NLJ to HJ) at 431 rows.


----------------------------------------------------------------------------------------------------------------------------------------------------
|   Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT              |       |      1 |        |  1063 (100)|      0 |00:00:00.06 |   13113 |  13107 |       |       |          |
|  *  1 |  HASH JOIN                    |       |      1 |    100 |  1063   (3)|      0 |00:00:00.06 |   13113 |  13107 |  1519K|  1519K| 1349K (0)|
|-    2 |   NESTED LOOPS                |       |      1 |    100 |  1063   (3)|   2000 |00:00:00.11 |    6556 |   6553 |       |       |          |
|-    3 |    NESTED LOOPS               |       |      1 |    100 |  1063   (3)|   2000 |00:00:00.10 |    6556 |   6553 |       |       |          |
|-    4 |     STATISTICS COLLECTOR      |       |      1 |        |            |   2000 |00:00:00.09 |    6556 |   6553 |       |       |          |
|  *  5 |      TABLE ACCESS FULL        | T1    |      1 |    100 |   863   (4)|   2000 |00:00:00.08 |    6556 |   6553 |       |       |          |
|- *  6 |     INDEX UNIQUE SCAN         | T2_PK |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|- *  7 |    TABLE ACCESS BY INDEX ROWID| T2    |      0 |      1 |     2   (0)|      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  *  8 |   TABLE ACCESS FULL           | T2    |      1 |      1 |     2   (0)|  20000 |00:00:00.07 |    6557 |   6554 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   5 - filter(("T1"."N2"=1000 AND "T1"."N1"=0))
   6 - access("T2"."ID"="T1"."ID")
   7 - filter(("T2"."N2"=400 AND "T2"."N1"=0))
   8 - filter(("T2"."N2"=400 AND "T2"."N1"=0))

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

Footnote:

For reference, here are a couple of the bug (or patch)  numbers associated with cardinality feedback:

  • Patch 13454409: BAD CARDINALITY FROM FEEDBACK (CFB) ON VIEW RHS OF NLJ
  • Bug 16837274 (fixed in 12.2): Bad cost estimate for object on RHS of NLJ
  • Bug 12557401: The table that is being incorrectly adjusted is in the right hand side of a nested loops.
  • Bug 8521689: Bad cardinality feedback estimate for view on right-hand side of NLJ

 


Upgrades

Mon, 2014-11-03 12:31

One of the worst problems with upgrades is that things sometimes stop working. A particular nuisance is the execution plan that suddenly stops appearing, to be replaced by an alternative plan that is much less efficient.

Apart from the nuisance of the time spent trying to force the old plan to re-appear, plus the time spent working out a way of rewriting the query when you finally decide the old plan simply isn’t going to re-appear, there’s also the worry about WHY the old plan won’t appear. Is it some sort of bug, is it that some new optimizer feature has disabled some older optimizer feature, or is it that someone in the optimizer group realised that the old plan was capable of producing the wrong results in some circumstances … it’s that last possibility that I find most worrying.

Here’s an example that appeared recently on OTN that’s still got me wondering about the possibility of wrong results (in the general case). We start with a couple of tables, a view, and a pipelined function. This example is a simple model of the problem that showed up on OTN; it’s based on generated data so that anyone who wants to can play around with it to see if they can bypass the problem without making any significant changes to the shape of the code:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	rownum			n1,
	mod(rownum,100)		n_100,
	rpad('x',100)		padding
from
	generator	v1
;

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	rownum			n1,
	mod(rownum,100)		n_100,
	rpad('x',100)		padding
from
	generator	v1
;

alter table t2 add constraint t2_pk primary key(id);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);

end;
/

create or replace type myScalarType as object (
        x int,
        y varchar2(15),
        d date
)
/

create or replace type myArrayType as table of myScalarType
/

create or replace function t_fun1(i_in number)
return myArrayType
pipelined
as
begin
	pipe row (myscalartype(i_in,     lpad(i_in,15),     trunc(sysdate) + i_in    ));
	pipe row (myscalartype(i_in + 1, lpad(i_in + 1,15), trunc(sysdate) + i_in + 1));
	return;
end;
/

create or replace view v1
as
select
	--+ leading(t2 x) index(t2)
	x.x, x.y, x.d,
	t2.id, t2.n1
from
	t2,
	table(t_fun1(t2.n_100)) x
where
	mod(t2.n1,3) = 1
union all
select
	--+ leading(t2 x) index(t2)
	x.x, x.y, x.d,
	t2.id, t2.n1
from
	t2,
	table(t_fun1(t2.n_100)) x
where
	mod(t2.n1,3) = 2
;

A key part of the problem is the UNION ALL view, where each subquery holds a join to a pipeline function. We’re about to write a query that joins to this view, and wants to push a join predicate into the view. Here’s the SQL:


select
	/*+ leading(t1 v1) use_nl(v1) */
	v1.x, v1.y, v1.d,
	v1.n1,
	t1.n1
from
	t1,
	v1
where
	t1.n_100 = 0
and	v1.id = t1.n1
;

You’ll notice that the join v1.id = t1.n1 could (in principle) be pushed inside the view to become t2.id = t1.n1 in the two branches of the UNION ALL; this would make it possible for the nested loop that I’ve hinted between t1 and v1 to operate efficiently – and in 11.1.0.7 this is exactly what happens:


------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |        | 16336 |   733K|   123   (1)| 00:00:01 |
|   1 |  NESTED LOOPS                         |        | 16336 |   733K|   123   (1)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                   | T1     |   100 |   700 |    23   (5)| 00:00:01 |
|   3 |   VIEW                                | V1     |   163 |  6357 |     1   (0)| 00:00:01 |
|   4 |    UNION-ALL PARTITION                |        |       |       |            |          |
|   5 |     NESTED LOOPS                      |        |  8168 |   103K|    16   (0)| 00:00:01 |
|*  6 |      TABLE ACCESS BY INDEX ROWID      | T2     |     1 |    11 |     2   (0)| 00:00:01 |
|*  7 |       INDEX UNIQUE SCAN               | T2_PK  |     1 |       |     1   (0)| 00:00:01 |
|   8 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |       |       |            |          |
|   9 |     NESTED LOOPS                      |        |  8168 |   103K|    16   (0)| 00:00:01 |
|* 10 |      TABLE ACCESS BY INDEX ROWID      | T2     |     1 |    11 |     2   (0)| 00:00:01 |
|* 11 |       INDEX UNIQUE SCAN               | T2_PK  |     1 |       |     1   (0)| 00:00:01 |
|  12 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |       |       |            |          |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N_100"=0)
   6 - filter(MOD("T2"."N1",3)=1)
   7 - access("T2"."ID"="T1"."N1")
  10 - filter(MOD("T2"."N1",3)=2)
  11 - access("T2"."ID"="T1"."N1")

For each row returned by the tablescan at line 2 we call the view operator at line 3 to generate a rowsource, but we can see in the predicate sections for lines 7 and 11 that the join value has been pushed inside the view, allowing us to access t2 through its primary key index. Depending on the data definitions, constraints, view definition, and version of Oracle, you might see the UNION ALL operator displaying the PARTITION option or the PUSHED PREDICATE option in cases of this type.

So now we upgrade to 11.2.0.4 (probably any 11.2.x.x version) and get the following plan:


------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |        |  1633K|    99M|   296K  (4)| 00:24:43 |
|   1 |  NESTED LOOPS                         |        |  1633K|    99M|   296K  (4)| 00:24:43 |
|*  2 |   TABLE ACCESS FULL                   | T1     |   100 |   700 |    23   (5)| 00:00:01 |
|*  3 |   VIEW                                | V1     | 16336 |   909K|  2966   (4)| 00:00:15 |
|   4 |    UNION-ALL                          |        |       |       |            |          |
|   5 |     NESTED LOOPS                      |        |   816K|    10M|  1483   (4)| 00:00:08 |
|*  6 |      TABLE ACCESS BY INDEX ROWID      | T2     |   100 |  1100 |   187   (2)| 00:00:01 |
|   7 |       INDEX FULL SCAN                 | T2_PK  | 10000 |       |    21   (0)| 00:00:01 |
|   8 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |  8168 | 16336 |    13   (0)| 00:00:01 |
|   9 |     NESTED LOOPS                      |        |   816K|    10M|  1483   (4)| 00:00:08 |
|* 10 |      TABLE ACCESS BY INDEX ROWID      | T2     |   100 |  1100 |   187   (2)| 00:00:01 |
|  11 |       INDEX FULL SCAN                 | T2_PK  | 10000 |       |    21   (0)| 00:00:01 |
|  12 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |  8168 | 16336 |    13   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N_100"=0)
   3 - filter("V1"."ID"="T1"."N1")
   6 - filter(MOD("T2"."N1",3)=1)
  10 - filter(MOD("T2"."N1",3)=2)

In this plan the critical join predicate appears at line 3; the predicate hasn’t been pushed. On the other hand the index() hints in the view have, inevitably, been obeyed (resulting in index full scans), as has the use_nl() hint in the main query – leading to a rather more expensive and time-consuming execution plan.

The first, quick, debugging step is simply to set the optimizer_features_enable back to 11.1.0.7 – with no effect; the second is to try adding the push_pred() hint to the query – with no effect; the third is to generate the outline section of the execution plans and copy the entire set of hints from the good plan into the bad plan, noting as we do so that the good plan actually uses the hint OLD_PUSH_PRED(@”SEL$1″ “V1″@”SEL$1″ (“T2″.”ID”)) – still no effect.

Since I happen to know a few things about what is likely to appear in the 10053 (optimizer) trace file, my next step would be to flush the shared pool, enable the trace, and then check the trace file (using grep or find depending on whether I was running UNIX or Windows) for the phrase “JPPD bypassed”; this is what I got:


test_ora_9897.trc:OJPPD:     OJPPD bypassed: View contains TABLE expression.
test_ora_9897.trc:JPPD:     JPPD bypassed: View not on right-side of outer-join.
test_ora_9897.trc:JPPD:     JPPD bypassed: View not on right-side of outer-join.

So 11.1.0.7 had a plan that used the old_push_pred() hint, but 11.2.0.4 explicitly bypassed the option (the rubric near the top of the trace file translates OJPPD to “old-style (non-cost-based) JPPD”, where JPPD translates to “join predicate push-down”). It looks like the plan we got from 11.1.0.7 has been deliberately blocked in 11.2.0.4. So now it’s time to worry whether or not that means I could have been getting wrong results from 11.1.0.7.

In my test case, of course, I can bypass the problem by explicitly rewriting the query – but I’ll have to move the join with t1 inside the view for both subqueries; alternatively, given the trivial nature of the pipeline function, I could replace the table() operator with a join to another union all view. In real life such changes are not always so easy to implement.

Footnote: the restriction is still in place on 12.1.0.2.

Footnote 2: somewhere I’ve probably published a short note explaining that one of my standard pre-emptive strikes on an upgrade is to run the following command to extract useful information from the executable: “strings -a oracle | grep -v bypass”: it can be very helpful to have a list of situations in which some query transformation is bypassed.

 


first_rows(10)

Fri, 2014-10-31 11:31

No, not the 10th posting about first_rows() this week – whatever it may seem like – just an example that happens to use the “calculate costs for fetching the first 10 rows” optimizer strategy and does it badly. I think it’s a bug, but it’s certainly a defect that is a poster case for the inherent risk of using anything other than all_rows optimisation.  Here’s some code to build a couple of sample tables:


begin
	dbms_stats.set_system_stats('MBRC',16);
	dbms_stats.set_system_stats('MREADTIM',12);
	dbms_stats.set_system_stats('SREADTIM',5);
	dbms_stats.set_system_stats('CPUSPEED',1000);
end;
/

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(1,1000))	n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t1_n1 on t1(id, n1);

create table t2
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(10001,20001))	x1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t2_i1 on t2(x1);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

end;
/


create or replace view  v1
as
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 101 and 300
union all
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 501 and 700
;

The key feature of this demonstration is the UNION ALL view and what the optimizer does with it when we have first_rows_N optimisation – this is a simplified model of a production problem I was shown a couple of years ago, so nothing special, nothing invented. Here’s a query that behaves badly:


select
	/*+ gather_plan_statistics */
	v1.small_vc,
	v1.n1
from
	v1,
	t2
where
	t2.id = v1.id
and	t2.x1 = 15000
;

I’m going to execute this query in three different ways – as is, using all_rows optimisation; as is, using first_rows_10 optimisation, then using all_rows optimisation but with the necessary hints to make it follow the first_rows_10 execution path. Here are the resulting plans from an instance of 12.1.0.2 (the same thing happens in 11.2.0.4):


first_rows_10 plan
----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |     1 |    35 |   107   (0)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |     1 |    35 |   107   (0)| 00:00:01 |
|   2 |   VIEW                               | V1    |    24 |   600 |     4   (0)| 00:00:01 |
|   3 |    UNION-ALL                         |       |       |       |            |          |
|*  4 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

all_rows plan
------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   1 |  NESTED LOOPS                          |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
|   4 |   VIEW                                 | V1    |     1 |    25 |     8   (0)| 00:00:01 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

first_rows_10 plan hinted under all_rows optimisation
---------------------------------------------------------------------------------------------- 
| Id  | Operation                    | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | 
---------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT             |       |   200 |  8600 |       |  6124   (3)| 00:00:01 | 
|*  1 |  HASH JOIN                   |       |   200 |  8600 |    17M|  6124   (3)| 00:00:01 |
|   2 |   VIEW                       | V1    |   402K|    12M|       |  5464   (3)| 00:00:01 | 
|   3 |    UNION-ALL                 |       |       |       |       |            |          | 
|*  4 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2731   (3)| 00:00:01 | 
|*  5 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2733   (3)| 00:00:01 | 
|   6 |   TABLE ACCESS BY INDEX ROWID| T2    |   100 |  1000 |       |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |   100 |       |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

I’m not sure why the first_rows_10 plan uses “table access by rowid batched”, but I’d guess it’s because the optimiser calculates that sorting the index rowids before visiting the table may have a small benefit on the speed of getting the first 10 rows – eventually I’ll get around to checking the 10053 trace file. The important thing, though, is the big mistake in the strategy, not the little difference in table access.

In the first_rows_10 plan the optimizer has decided building an in-memory hash table from the UNION ALL of the rows fetched from the two copies of the t1 table will be fast and efficient; but it’s made that decision based on the assumption that it will only get 10 rows from each copy of the table – and at run-time it HAS to get all the relevant t1 rows to build the hash table before it can get any t2 rows. We can get some idea of the scale of this error when we look at the hinted plan under all_rows optimisation – it’s a lot of redundant data and a very expensive hash table build.

In contrast the all_rows plan does an efficient indexed access into the t2 table then, for each row, does a join predicate pushdown into the union all view using an indexed access path. If we only wanted to fetch 10 rows we could stop after doing a minimum amount of work. To demonstrate the error more clearly I’ve re-run the experiment for the first two plans from SQL*PLus, setting the arraysize to 11, the pagesize to 5, and stopping after the first 10 rows. Here are the plans showing the rowsource execution stats:


first_rows_10 plan
------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   107 (100)|     12 |00:00:00.43 |   35150 |       |       |          |
|*  1 |  HASH JOIN                           |       |      1 |      1 |   107   (0)|     12 |00:00:00.43 |   35150 |    24M|  3582K|   23M (0)|
|   2 |   VIEW                               | V1    |      1 |     24 |     4   (0)|    400K|00:00:06.48 |   35118 |       |       |          |
|   3 |    UNION-ALL                         |       |      1 |        |            |    400K|00:00:04.20 |   35118 |       |       |          |
|*  4 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.71 |   17559 |       |       |          |
|*  5 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.63 |   17559 |       |       |          |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |       |       |          |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

all_rows plan
-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |      1 |        |   904 (100)|     12 |00:00:00.01 |     213 |
|   1 |  NESTED LOOPS                          |       |      1 |     43 |   904   (1)|     12 |00:00:00.01 |     213 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |
|   4 |   VIEW                                 | V1    |     28 |      1 |     8   (0)|     12 |00:00:00.01 |     181 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |     28 |        |            |     12 |00:00:00.01 |     181 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    212K|     4   (0)|      8 |00:00:00.01 |      93 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      8 |00:00:00.01 |      85 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    213K|     4   (0)|      4 |00:00:00.01 |      88 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      4 |00:00:00.01 |      84 |
-----------------------------------------------------------------------------------------------------------------------

If I had set the optimizer_mode to first_rows_10 because I really only wanted to fetch (about) 10 rows then I’ve managed to pay a huge overhead in buffer visits, memory and CPU for the privilege – the all_rows plan was much more efficient.

Remember – we often see cases where the first_rows(n) plan will do more work to get the whole data set in order to be able to get the first few rows more quickly (the simplest example is when the optimizer uses a particular index to get the first few rows of a result set in order without sorting rather than doing a (faster) full tablescan with sort. This case, though, is different: the optimizer is choosing to build a hash table as if it only has to put 10 rows into that hash table when it actually HAS to build the whole has table before it can take any further steps – we don’t get 10 rows quicker and the rest more slowly; we just get 10 very slow rows.

Footnote:

It’s possible that this is an example of bug 9633142: (FIRST_ROWS OPTIMIZER DOES NOT PUSH PREDICATES INTO UNION INLINE VIEW) but that’s reported as fixed in 12c, with a couple of patches for 11.2.0.2/3. However, setting “_fix_control”=’4887636:off’, does bypass the problem. (The fix control, introduced in 11.1.0.6 has description: “remove restriction from first K row optimization”)