Inside the Oracle Optimizer

Subscribe to Inside the Oracle Optimizer feed
Updated: 2 hours 32 min ago

Upgrading from Oracle Database 9i to 10g: What to expect from the Optimizer

Tue, 2008-02-26 15:34
One of the most daunting activities a DBA can undertake is upgrading the database to a new version. Having to comprehend all of the new features and to deal with potential plan changes can be overwhelming. In order to help DBA's upgrade from Oracle Database 9i to 10g a new whitepaper called "Upgrading from Oracle Database 9i to 10g: What to expect from the Optimizer" has recently been posted on Oracle Technology Network (OTN). This paper aims to explain in detail what to expect from the CBO when you upgrade from Oracle database 9i to 10g and describes what steps you should take before and after the upgrade to minimize any potential SQL regressions. This is a must read for any DBA planning on upgrading from 9i to 10g in the near future!
Categories: DBA Blogs, Development

Displaying and reading the execution plans for a SQL statement

Thu, 2008-02-07 14:37
Generating and displaying the execution plan of a SQL statement is a common task for most DBAs, SQL developers, and preformance experts as it provides them information on the performance characteristics of a SQL statement. An execution plan shows the detailed steps necessary to execute a SQL statement. These steps are expressed as a set of database operators that consumes and produces rows. The order of the operators and their implentation is decided by the query optimizer using a combination of query transformations and physical optimization techniques.

While the display is commonly shown in a tabular format, the plan is in fact tree-shaped. For example, consider the following query based on the SH schema (Sales History):



select prod_category, avg(amount_sold)
from sales s, products p
where p.prod_id = s.prod_id
group by prod_category;


The tabular representation of this query's plan is:



------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------

While the tree-shaped representation of the plan is:

GROUP BY
|
JOIN
_____|_______
| |
ACCESS ACCESS
(PRODUCTS) (SALES)




When you read a plan tree you should start from the bottom up. In the above example begin by looking at the access operators (or the leaves of the tree). In this case the access operators are implemented using full table scans. The rows produced by these tables scans will be consumed by the join operator. Here the join operator is a hash-join (other alternatives include nested-loop or sort-merge join). Finally the group-by operator implemented here using hash (alternative would be sort) consumes rows produced by the join-opertor.

The execution plan generated for a SQL statement is just one of the many alternative execution plans considered by the query optimizer. The query optimizer selects the execution plan with the lowest cost. Cost is a proxy for performance, the lower is the cost the better is the performance. The cost model used by the query optimizer accounts for the IO, CPU, and network usage in the query.

There are two different methods you can use to look at the execution plan of a SQL statement:


  1. EXPLAIN PLAN command - This displays an execution plan for a SQL statement without actually executing the statement.

  2. V$SQL_PLAN - A dictionary view introduced in Oracle 9i that shows the execution plan for a SQL statement that has been compiled into a cursor in the cursor cache.


Under certain conditions the plan shown when using EXPLAIN PLAN can be different from the plan shown using V$SQL_PLAN. For example, when the SQL statement contains bind variables the plan shown from using EXPLAIN PLAN ignores the bind variable values while the plan shown in V$SQL_PLAN takes the bind variable values into account in the plan generation process.

Displaying an execution plan has been made easier after the introduction of the dbms_xplan package in Oracle 9i and by the enhancements made to it in subsequent releases. This packages provides several PL/SQL procedures to display the plan from different sources:


  1. EXPLAIN PLAN command

  2. V$SQL_PLAN

  3. Automatic Workload Repository (AWR)

  4. SQL Tuning Set (STS)

  5. SQL Plan Baseline (SPM)


The following examples illustrate how to generate and display an execution plan for our original SQL statement using the different functions provided in the dbms_xplan package.

Example 1 Uses the EXPLAIN PLAN command and the dbms_xplan.display function.


SQL> EXPLAIN PLAN FOR
2 select prod_category, avg(amount_sold)
3 from sales s, products p
4 where p.prod_id = s.prod_id
5 group by prod_category;

Explained.



SQL> select plan_table_output
2 from table(dbms_xplan.display('plan_table',null,'basic'));

------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------


The arguments are for dbms_xplan.display are:


  • plan table name (default 'PLAN_TABLE'),

  • statement_id (default null),

  • format (default 'TYPICAL')


More details can be found in $ORACLE_HOME/rdbms/admin/dbmsxpln.sql.

Example 2 Generating and displaying the execution plan for the last SQL statement executed in a session:



SQL> select prod_category, avg(amount_sold)
2 from sales s, products p
3 where p.prod_id = s.prod_id
4 group by prod_category;

no rows selected


SQL> select plan_table_output
2 from table(dbms_xplan.display_cursor(null,null,'basic'));

------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------


The arguments used by dbms_xplay.dispay_cursor are:


  • SQL ID (default null, null means the last SQL statement executed in this session),

  • child number (default 0),

  • format (default 'TYPICAL')


The details are in $ORACLE_HOME/rdbms/admin/dbmsxpln.sql.


Example 3 Displaying the execution plan for any other statement requires the SQL ID to be provided, either directly or indirectly:

  1. Directly:

    SQL> select plan_table_output from
    2 table(dbms_xplan.display_cursor('fnrtqw9c233tt',null,'basic'));


  2. Indirectly:

    SQL> select plan_table_output
    2 from v$sql s,
    3 table(dbms_xplan.display_cursor(s.sql_id,
    4 s.child_number, 'basic')) t
    5 where s.sql_text like 'select PROD_CATEGORY%';


Example 4 - Displaying an execution plan corresponding to a SQL Plan Baseline. SQL Plan Baselines have been introduced in Oracle 11g to support the SQL Plan Management feature (SPM). In order to illustrate such a case we need to create a SQL Plan Baseline first.


SQL> alter session set optimizer_capture_sql_plan_baselines=true;

Session altered.

SQL> select prod_category, avg(amount_sold)
2 from sales s, products p
3 where p.prod_id = s.prod_id
4 group by prod_category;

no rows selected

If the above statement has been executed more than once, a SQL Plan Baseline will be created for it and you can verified this using the follows query:


SQL> select SQL_HANDLE, PLAN_NAME, ACCEPTED
2 from dba_sql_plan_baselines
3 where sql_text like 'select prod_category%';

SQL_HANDLE PLAN_NAME ACC
------------------------------ ------------------------------ ---
SYS_SQL_1899bb9331ed7772 SYS_SQL_PLAN_31ed7772f2c7a4c2 YES


The execution plan for the SQL Plan Baseline created above can be displayed either directly or indirectly:

  1. Directly
    select t.* from
    table(dbms_xplan.display_sql_plan_baseline('SYS_SQL_1899bb9331ed7772',
    format => 'basic')) t


  2. Indirectly
    select t.*
    from (select distinct sql_handle
    from dba_sql_plan_baselines
    where sql_text like 'select prod_category%') pb,
    table(dbms_xplan.display_sql_plan_baseline(pb.sql_handle,
    null,'basic')) t;



The output of either of these two statements is:



----------------------------------------------------------------------------
SQL handle: SYS_SQL_1899bb9331ed7772
SQL text: select prod_category, avg(amount_sold) from sales s, products p
where p.prod_id = s.prod_id group by prod_category
----------------------------------------------------------------------------

----------------------------------------------------------------------------
Plan name: SYS_SQL_PLAN_31ed7772f2c7a4c2
Enabled: YES Fixed: NO Accepted: YES Origin: AUTO-CAPTURE
----------------------------------------------------------------------------

Plan hash value: 4073170114

---------------------------------------------------------
Id Operation Name
---------------------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 VIEW index$_join$_002
4 HASH JOIN
5 INDEX FAST FULL SCAN PRODUCTS_PK
6 INDEX FAST FULL SCAN PRODUCTS_PROD_CAT_IX
7 PARTITION RANGE ALL
8 TABLE ACCESS FULL SALES
---------------------------------------------------------


Formatting


The format argument is highly customizable and allows you to see as little (high-level) or as much (low-level) details as you need / want in the plan output. The high-level options are:

  1. Basic
    The plan includes the operation, options, and the object name (table, index, MV, etc)
  2. Typical
    It includes the information shown in BASIC plus additional optimizer-related internal information such as cost, size, cardinality, etc. These information are shown for every operation in the plan and represents what the optimizer thinks is the operation cost, the number of rows produced, etc. It also shows the predicates evaluation by the operation. There are two types of predicates: ACCESS and FILTER. The ACCESS predicates for an index are used to fetch the relevant blocks because they apply to the search columns. The FILTER predicates are evaluated after the blocks have been fetched.
  3. All
    It includes the information shown in TYPICAL plus the lists of expressions (columns) produced by every operation, the hint alias and query block names where the operation belongs. The last two pieces of information can be used as arguments to add hints to the statement.
The low-level options allow the inclusion or exclusion of find details, such as predicates and cost.
For example,


select plan_table_output
from table(dbms_xplan.display('plan_table',null,'basic +predicate +cost'));

-------------------------------------------------------
Id Operation Name Cost (%CPU)
-------------------------------------------------------
0 SELECT STATEMENT 17 (18)
1 HASH GROUP BY 17 (18)
* 2 HASH JOIN 15 (7)
3 TABLE ACCESS FULL PRODUCTS 9 (0)
4 PARTITION RANGE ALL 5 (0)
5 TABLE ACCESS FULL SALES 5 (0)
-------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PROD_ID"="S"."PROD_ID")



select plan_table_output from
table(dbms_xplan.display('plan_table',null,'typical -cost -bytes'));

----------------------------------------------------------------------------
Id Operation Name Rows Time Pstart Pstop
----------------------------------------------------------------------------
0 SELECT STATEMENT 4 00:00:01
1 HASH GROUP BY 4 00:00:01
* 2 HASH JOIN 960 00:00:01
3 TABLE ACCESS FULL PRODUCTS 766 00:00:01
4 PARTITION RANGE ALL 960 00:00:01 1 16
5 TABLE ACCESS FULL SALES 960 00:00:01 1 16
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PROD_ID"="S"."PROD_ID")

Note Section


In addition to the plan, the package displays notes in the NOTE section, such as that dynamic sampling was used during query optimization or that star transformation was applied to the query.
For example, if the table SALES did not have statistics then the optimizer will use dynamic sampling and the plan display will report it as follows (see '+note' detail in the query):


select plan_table_output
from table(dbms_xplan.display('plan_table',null,'basic +note'));

------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------

Note
-----
- dynamic sampling used for this statement


Bind peeking



The query optimizer takes into account the values of bind variable values when generation an execution plan. It does what is generally called bind peeking. See the first post in this blog about the concept of bind peeking and its impact on the plans and the performance of SQL statements.
As stated earlier the plan shown in V$SQL_PLAN takes into account the values of bind variables while the one shown from using EXPLAIN PLAN does not. Starting with 10gR2, the dbms_xplan package allows the display of the bind variable values used to generate a particular cursor/plan. This is done by adding '+peeked_binds' to the format argument when using display_cursor().
This is illustrated with the following example:



variable pcat varchar2(50)
exec :pcat := 'Women'

select PROD_CATEGORY, avg(amount_sold)
from sales s, products p
where p.PROD_ID = s.PROD_ID
and prod_category != :pcat
group by PROD_CATEGORY;

select plan_table_output
from table(dbms_xplan.display_cursor(null,null,'basic +PEEKED_BINDS'));

------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :PCAT (VARCHAR2(30), CSID=2): 'Women'
Categories: DBA Blogs, Development

Improvement of AUTO sampling statistics gathering feature in Oracle 11g

Tue, 2008-01-22 17:34
Optimizer statistics in Oracle are managed via a pl/sql package, dbms_stats. It provides several pl/sql procedures to gather statistics for a table, schema, or a database. For example, gather_table_statistics is used to gather statistics on a table. This procedure has an estimate_percent parameter, which specifies the sampling percentage of the statistics gathering. The users can specify any number between 0 ~ 100 for this parameter. For example, suppose you have a table BIGT, you can specify a 1% sampling percentage as follows:

exec dbms_stats.gather_table_stats(null, 'BIGT', 
estimate_percent => 1);


It is not always easy for users to pick the appropriate sampling percentage. If the specified sampling percentage is too high, it can take longer to gather statistics. On the contray, if the data is skewed and the specified sampling percentage is too low, the resulting statistics can be inaccurate.

For this reason, Oracle introduced the AUTO value for the estimate_percent parameter. For example, you can gather statistics on BIGT as follows:

exec dbms_stats.gather_table_stats(null, 'BIGT', 
estimate_percent => dbms_stats.auto_sample_size);


The advantage of using AUTO sample size over a fixed number is two-folds. First, when AUTO sample size is specified, the system automatically determines the appropriate sampling percentage. Second, AUTO sample size is more flexible than a fixed sampling percentage. A fixed sampling percentage size that was good at some point in time may not be appropriate after the data distribution in the table has changed. On the other hand when AUTO value is used Oracle will adjust the sample size when the data distribution changes.

When AUTO is used Oracle picks a sample size where the statistics quality is good enough. However, it does not work very well under cases of extreme skew in the data. In Oracle 11g, we improved the behavior when the AUTO value is used. First, AUTO sampling now generates deterministic statistics. Second, and more importantly, AUTO sampling generates statistics that are almost as accurate as 100% sampling but takes much less time than 100% sampling. To illustrate these merits, we compare the performance of using a fixed sampling percentage, AUTO sample size in Oracle 10g and AUTO sample size in Oracle 11g.

We used the standard TPC-D data generator to generate a Lineitem table. The Lineitem table is about 230G and contains 1.8 million rows with 16 columns. The schema of the lineitem table is as follows:

column namecolumn typel_shipdatedatel_orderkeynumberl_discountnumberl_extendedpricenumberl_suppkeynumberl_quantitynumberl_returnflagvarchar2l_partkeynumberl_linestatusvarchar2l_taxnumberl_commitdatedatel_receiptdatedatel_shipmodevarchar2l_linenumbernumberl_shipinstructvarchar2l_commentvarchar2
Table 1 gives the elapsed time of gathering statistics on the Lineitem table by different sampling percentages.

Sampling PercentageElapsed Time (sec) 1% sampling 797100% sampling (Compute)18772Auto sampling in Oracle 10g 2935Auto sampling in Oracle 11g 1908

Table 1: Statistics gathering time on 230G TPC-D Lineitem Table Using Different Estimate Percentages

We also compare the quality of the statistics gathered using different estimate percentages. Among all the statistics of a column, number of distinct values (NDV) is the one whose accuracy used to be an issue. We define the accuracy rate of NDV of a column as follows:


accuracy rate = 1 - (|estimated NDV - actual NDV|)/actual NDV.


The accuracy rate ranges from 0% to 100%. The higher the accuracy rate is, the more accurate the gathered statistics are. Since 100% sampling always lead to an accuracy rate of 100%, we do not report it. We focus on the columns which has at least one statistics accuracy rate below 99.9% when using different estimate percentages. Table 2 illustrates the accurate rates of the columns.

Column NameActual NDV Auto Sampling in Oracle 11g 1% Sampling orderkey 450,000,000 98.0% 50% comment 181,122,127 98.60% 4.60% partkey 60,000,000 99.20% 98.20% suppkey 3,000,000 99.60%
99.90% extendedprice 3,791,320 99.60% 94.30%

Table 2: Accuracy Rate of Gathering NDV LineItem Using Different Estimate Percentages

In short, the elapsed time of ''AUTO sampling in Oracle 11g'' is 10 times faster than 100% sampling but yields the statistics of similar quality (accuracy rate close to 100%).


Categories: DBA Blogs, Development

Outerjoins in Oracle

Tue, 2007-12-11 15:00
Since release 6, Oracle has supported a restricted form of left outerjoin, which uses Oracle-specific syntax. In 9i, Oracle introduced support for ANSI SQL 92/99 syntax for inner joins and various types of outerjoin. The Oracle syntax for left outerjoin and that of ANSI SQL 92/99 are not equivalent, as the latter is more expressive.

There appears to be some confusion about equivalence between ANSI outer join and Oracle outer join syntaxes. The following examples explain the equivalences and inequivalences of these two syntaxes.

Oracle-Specific Syntax

Consider query A, which expresses a left outerjoin in the Oracle syntax. Here T1 is the left table whose non-joining rows will be retained and all non-joining rows of T2 will be null appended.


A.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+);


ANSI Left Outerjoin

In the ANSI outer join syntax, query A can be expressed as query B.


B.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x);


Equivalence

Consider the following queries. In the Oracle semantics, the presence of (+) on the filter predicate (e.g., T2.y (+) > 5 in query C) indicates that this filter must be applied to the table T2 before the outer join takes place.


C.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.y (+) > 5;


The ANSI left outer join query D is equivalent to C. Applying the filter on the right table in the left outer join is the same as combining the filter with the join condition.


D.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x and T2.y > 5);


Similarly, the presence of (+) on the filter predicate, T2.y (+) IS NULL, in query E indicates that this filter must be applied to the table T2 before the outer join takes place.


E.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.y (+) IS NULL;


The ANSI left outer join query F is equivalent to E.


F.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x and T2.y IS NULL);


Consider query G. Oracle will apply the filter, T2.y IS NULL, in query G after the outer join has taken place. G will return only those rows of T2 that failed to join with T1 or those whose T2.y values happen to be null.


G.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.y IS NULL;


The ANSI left outer join query H is equivalent to G, as the WHERE clause in H is applied after the left outer join is performed based on the condition specified in the ON clause.


H.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x)
WHERE T2.y IS NULL;


Consider query I, where the filter on the left table is applied before or after the outer join takes place.


I.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T1.Z > 4;


The ANSI left outer join query J is equivalent to query I.


J.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x)
WHERE T1.Z > 4;


Lateral Views

In Oracle, ANSI left and right outerjoins are internally expressed in terms of left outerjoined lateral views. In many cases, a left outerjoined lateral view can be merged and the ANSI left (or right) outerjoin can be expressed entirely in terms of Oracle native left outerjoin operator. (A lateral view is an inline view that contains correlation referring to other tables that precede it in the FROM clause.)

Consider the ANSI left outer join query K, which is first represented internally as L.


K.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x and T2.k = 5);

L.
SELECT T1.d, LV.c
FROM T1,
LATERAL (SELECT T2.C
FROM T2
WHERE T1.x = T2.x and T2.k = 5)(+) LV;


The lateral view in query L is merged to yield query M.


M.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.k (+)= 5;


Consider query N, which expresses a left outerjoin in the ANSI join syntax. Currently query N cannot be expressed using the Oracle native left outer join operator.


N.
SELECT T1.m, T2.n
FROM T1 LEFT OUTER JOIN T2
ON (T1.h = 11 and T1.y = T2.y)
WHERE T1.q > 3;


The query N is converted into query O with a left outer-joined lateral view. The lateral view in O cannot be merged, since the filter on the left table specified in the ON clause must be part of the left outerjoin condition.


O.
SELECT T1.m, LV.n
FROM T1,
LATERAL(SELECT T2.n
FROM T2
WHERE T1.h = 11 and T1.y = T2.y)(+) LV
WHERE T1.q > 3;


Consider query P with a disjunction in the outer join condition. Currently N cannot be expressed using the Oracle native left outer join operator.


P.
SELECT T1.A, T2.B
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x OR T1.Z = T2.Z);


The query P will be converted into Q with a left outer-joined lateral view containing the disjunctive join condition.


Q.
SELECT T1.A, LV.B
FROM T1,
LATERAL (SELECT T2.B
FROM T2
WHERE T1.x = T2.x OR T1.Z = T2.Z) (+) LV;


ANSI Full Outerjoin

Before Oracle 11gR1 all ANSI full outerjoins were converted into a UNION ALL query with two branches, where one branch contained a left outerjoined lateral view and the other branch contained a NOT EXISTS subquery. A native support for hash full outerjoin was introduced in 11gR1 to overcome this problem. When the native full outerjoin, cannot be used, Oracle reverts to the pre-11gR1 strategy.

Consider query R, which specifies an ANSI full outerjoin.


R.
SELECT T1.c, T2.d
FROM T1 FULL OUTER JOIN T2
ON T1.x = T2.y;


Before 11gR1, Oracle would internally convert query R into S.


S.
SELECT T1.c, T2.d
FROM T1, T2
WHERE T1.x = T2.y (+)
UNION ALL
SELECT NULL, T2.d
FROM T2
WHERE NOT EXISTS
(SELECT 1 FROM T1 WHERE T1.x = T2.y);


With the native support of hash full outerjoin, R will be simply expressed as query T, where the view, VFOJ, is considered unmergeable.


T.
SELECT VFOJ.c, VFOJ.d
FROM (SELECT T1.c, T2.d
FROM T1, T2
WHERE T1.x F=F T2.y) VFOJ;


Conversion of Outerjoin into Inner Join

Consider query U. Here the filter predicate on the outer-joined table T2 does not contain the outerjoin operator (+); thus it will be applied after the left outerjoin has taken place. This will result in the elimination of all null appended rows of T2. Hence, Oracle converts the outer join into an inner join.


U.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.y > 5;


The ANSI left outer join query V is equivalent to query U, as the WHERE clause in V is applied after the left outer join is performed based on the condition specified in the ON clause.


V.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x)
WHERE T2.y > 5;


Oracle converts the queries U and V into query W with an inner join.


W.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x and T2.y > 5;



Q&A

Q1: I do not understand the queries N and O. What is the difference between
a filter appearing in the ON Clause or a filter appearing in the WHERE
clause?

A1: Consider two tables T11 and T22.

T11:
A | B
1 | 2
2 | 3
3 | 5

T22:
X | Y
7 | 2
8 | 4
9 | 4

The following ANSI left outer join query N’ involving
the tables T11 and T22 will return three rows, since
the filter, which always fails, is part of the join
condition. Although this join condition, which
comprises both the predicates in the ON clause,
always evaluates to FALSE, all the rows of the left
table T11 are retained in the result.

N’.
SELECT *
FROM T11 LEFT OUTER JOIN T22
ON (T11.A > 9 and T11.B = T22.Y);

A B X Y
------ ---------- ---------- ---------
1 2
2 3
3 5

However, if the filter, T11.A > 9, is moved to the WHERE clause,
the query will return zero rows.


Q2: Is the outer to inner join conversion a new feature?

A2: No. This feature has been avaliable since Release 7.


Q3: Has native full outer join been made available in
versions prior to 11gR1?

A3: Yes. It is available in 10.2.0.3 and 10.2.0.4, but not by
default.
Categories: DBA Blogs, Development

Why are there more cursors in 11g for my query containing bind variables?

Mon, 2007-12-03 17:05
Oracle introduced a new feature, adaptive cursor sharing, in 11g, to improve the plans that are selected for queries containing bind variables. This feature can result in more cursors for the same query containing bind variables. We'll explain why in this article. Before we get into the details, let's review a little history.

Oracle introduced the bind peeking feature in Oracle 9i. With bind peeking, the Optimizer peeks at the values of user-defined bind variables on the first invocation of a cursor. This allows the optimizer to determine the selectivity of any WHERE clause condition as if literals have been used instead of bind variables, thus improving the quality of the execution plan generated for statements using bind variables.

However, there was a problem with this approach, when the column used in the WHERE clause with the bind contained a data skew. If there is data skew in the column, it is likely that a histogram has been created on this column during statistics gathering. When the optimizer peeks at the value of the user-defined bind variable and chooses a plan, it is not guaranteed that this plan will be good for all possible values for the bind variable. In other words, the plan is optimized for the peeked value of the bind variable, but not for all possible values.

In 11g, the optimizer has been enhanced to allow multiple execution plans to be used for a single statement that uses bind variables. This ensures that the best execution plan will be used depending on the bind value. Let's look at an example to see exactly how this works.

Assume I have simple table emp which has 100,000 rows and has one index called emp_i1 on deptno column.

SQL> desc emp

Name Null? Type
---------------------- -------- ----------------------------------
ENAME VARCHAR2(20)
EMPNO NUMBER
PHONE VARCHAR2(20)
DEPTNO NUMBER

There is a data skew in the deptno column, so when I gathered statistics on the emp table, Oracle automatically created a histogram on the deptno column.

SQL> select table_name, column_name, histogram from user_tab_cols;

TABLE_NAME COLUMN_NAME HISTOGRAM
------------------ ------------------ ---------------
EMP DEPTNO HEIGHT BALANCED
EMP EMPNO NONE
EMP ENAME NONE
EMP PHONE NONE


Now I will execute a simple select on my emp table, which has a single WHERE
clause predicate on the deptno column. The predicate contains a bind variable. We will begin by using the value 9 for this bind variable. The value 9 occurs 10 times in the table, i.e. in 0.0001% of the rows.

SQL> exec :deptno := 9

SQL> select /*ACS_1*/ count(*), max(empno)
2 from emp
3 where deptno = :deptno;


COUNT(*) MAX(EMPNO)
---------- ----------
10 99
Given how selective the value 9 is, we should expect to get an index range scan for this query. Lets check the execution plan.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 272gr4hapc9w1, child number 0
------------------------------------------------------------------------
select /*ACS_1*/ count(*), max(empno) from emp where deptno = :deptno

Plan hash value: 3184478295
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |Cost (%CPU)|
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)|
| 1 | SORT AGGREGATE | | 1| 16 | |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1| 16 | 2 (0)|
| 3 | INDEX RANGE SCAN | EMP_I1| 1| | 1 (0)|
------------------------------------------------------------------------


So we got the index range scan that we expected. Now let's look at the execution statistics for this statement

SQL> select child_number, executions, buffer_gets,
2 is_bind_sensitive, is_bind_aware
3 from v$sql
4 where sql_text like 'select /*ACS_1%';

CHILD_NUMBER EXECUTIONS BUFFER_GETS IS_BIND_SENSITIVE IS_BIND_AWARE
------------ ---------- ----------- ----------------- -------------
0 1 53 Y N

You can see we have one child cursor that has been executed once and has a small number of buffer gets. We also see that the cursor has been marked bind sensitive. A cursor is marked bind sensitive if the optimizer believes the optimal plan may depend on the value of the bind variable. When a cursor is marked bind sensitive, Oracle monitors the behavior of the cursor using different bind values, to determine if a different plan for different bind values is called for. This cursor was marked bind sensitive because the histogram on the deptno column was used to compute the selectivity of the predicate "where deptno = :deptno". Since the presence of the histogram indicates that the column is skewed, different values of the bind variable may call for different plans.

Now let's change the value of the bind variable to 10, which is the most popular value for the deptno column. It occurs 99900 times in the table, i.e in 99.9% of the rows.

SQL>  exec :deptno := 10

SQL> select /*ACS_1*/ count(*), max(empno)
2 from emp
3 where deptno = :deptno;

COUNT(*) MAX(EMPNO)
---------- ----------
99900 100000

We expect to get the same plan as before for this execution because Oracle initially assumes it can be shared. Let's check:

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
SQL_ID 272gr4hapc9w1, child number 0
------------------------------------------------------------------------
select /*ACS_1*/ count(*), max(empno) from emp where deptno = :deptno

Plan hash value: 3184478295
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |Cost (%CPU)|
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)|
| 1 | SORT AGGREGATE | | 1| 16 | |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1| 16 | 2 (0)|
| 3 | INDEX RANGE SCAN | EMP_I1| 1| | 1 (0)|
------------------------------------------------------------------------

The plan is still an index range scan as before, but if we look at the execution statistics, we should see two executions and a big jump in the number of buffer gets from what we saw before.

SQL> select child_number, executions, buffer_gets,
2 is_bind_sensitive, is_bind_aware
3 from v$sql
4 where sql_text like 'select /*ACS_1%';

CHILD_NUMBER EXECUTIONS BUFFER_GETS IS_BIND_SENSITIVE IS_BIND_AWARE
------------ ---------- ----------- ----------------- -------------
0 2 1007 Y N

You should also note that the cursor is still only marked bind sensitive and not bind aware at this point. So let's re-execute the statement using the same popular value, 10.

SQL> exec :deptno := 10

SQL> select /*ACS_1*/ count(*), max(empno)
2 from emp
3 where deptno = :deptno;

COUNT(*) MAX(EMPNO)
---------- -----------
99900 100000

Behind the scenes during the first two executions, Oracle was monitoring the behavior of the queries, and determined that the different bind values caused the data volumes manipulated by the query to be significantly different. Based on this difference, Oracle "adapts" its behavior so that the same plan is not always shared for this query. Hence a new plan is generated based on the current bind value, 10.

Let's check what the new plan is.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
SQL_ID 272gr4hapc9w1, child number 1
--------------------------------------------------------------------
select /*ACS_1*/ count(*), max(empno) from emp where deptno = :deptno

Plan hash value: 2083865914
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 240 (100)|
| 1 | SORT AGGREGATE | | 1 | 16 | |
|* 2 | TABLE ACCESS FULL | EMP | 95000 | 1484K | 240 (1)|
--------------------------------------------------------------------


Given how unselective the value 10 is in the table, it's not surprising that the new plan is a full table scan. Now if we display the execution statistics we should see an additional child cursor (#1) has been created. Cursor #1 should show a number of buffers gets lower than cursor #0 and it is marked both bind sensitive and bind aware. A bind aware cursor may use different plans for different bind values, depending on how selective the predicates containing the bind variable are.

Looking at the execution statistics:

SQL> select child_number, executions, buffer_gets,
2 is_bind_sensitive, is_bind_aware
3 from v$sql
4 where sql_text like 'select /*ACS_1%';

CHILD_NUMBER EXECUTIONS BUFFER_GETS IS_BIND_SENSITIVE IS_BIND_AWARE
------------ ---------- ----------- ----------------- -------------
0 2 1007 Y N
1 1 821 Y Y

we see that there is a new cursor, which represents the plan which uses a table scan. But if we execute the query again with a more selective bind value, we should use the index plan:

SQL> exec :deptno := 9

SQL> select /*ACS_1*/ count(*), max(empno)
2 from emp
3 where deptno = :deptno;

COUNT(*) MAX(EMPNO)
---------- ----------
10 99

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
SQL_ID 272gr4hapc9w1, child number 2
------------------------------------------------------------------------
select /*ACS_1*/ count(*), max(empno) from emp where deptno = :deptno

Plan hash value: 3184478295
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |Cost (%CPU)|
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)|
| 1 | SORT AGGREGATE | | 1| 16 | |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1| 16 | 2 (0)|
| 3 | INDEX RANGE SCAN | EMP_I1| 1| | 1 (0)|
------------------------------------------------------------------------

The proper plan was chosen, based on the selectivity produced by the current bind value.

There is one last interesting thing to note about this. If we look at the execution statistics again, there are three cursors now:


SQL> select child_number, executions, buffer_gets,
2 is_bind_sensitive, is_bind_aware, is_shareable
3 from v$sql
4 where sql_text like 'select /*ACS_1%';

CHILD_NUMBER EXECUTIONS BUFFER_GETS IS_B_SENS IS_B_AWAR IS_SHAR
------------ ---------- ----------- --------- --------- ----------
0 2 957 Y N N
1 1 765 Y Y Y
2 2 6 Y Y Y

The original cursor was discarded when the cursor switched to bind aware mode. This is a one-time overhead. Note that the cursor is marked as not shareable (is_shareable is "N"), which means that this cursor will be among the first to be aged out of the cursor cache, and that it will no longer be used. In other words, it is just waiting to be garbage collected.

There is one other reason that you may notice additional cursors for such a query in 11g. When a new bind value is used, the optimizer tries to find a cursor that it thinks will be a good fit, based on similarity in the bind value's selectivity. If it cannot find such a cursor, it will create a new one (like above, when one (#1) was created for unselective "10" and one (#2) was created for highly-selective "9"). If the plan for the new cursor is the same as one of the existing cursors, the two cursors will be merged, to save space in the cursor cache. This will result in one being left behind that is in a not shareable state. This cursor will be aged out first if there is crowding in the cursor cache, and will not be used for future executions.

Q & A
Instead of answering the questions in your comments one by one, I am going to summarize the questions and provide my answers here.

Q: Is this behavior managed by 11g optimizer automatically and we don't need cursor_sharing anymore?
A: We have not changed the behavior of the cursor_sharing parameter yet, for backwards compatibility purposes. So if you set it to similar, adaptive cursor sharing will only kick in for queries where the literals are replace with binds. We hope that in the future, this feature will persuade people to set cursor_sharing to force.

Q: Would it have any impact like holding library cache latches for longer time to search for appropriate child cursor.
A: Any additional overhead in matching a cursor is always a concern, and we strive to minimize the impact. There is of course some increase in the code path to match a bind-aware cursor, since it requires more intelligent checks. This feature should not, however, impact cursors which are not yet marked bind-aware.

Q: What triggers a cursor to be marked "bind sensitive"?
A: Our goal is to consider many types of predicates where the selectivity can change when the bind value changes.
In this first version of the feature, we only handle equality predicates where a histogram exists on the column and range predicates (with or without histogram). We do not currently consider LIKE predicates, but it is on the top of our list for future work.

Q: Also it sounds like the optimizer is using the number of rows returned to decided that it's time for a new plan...
A: I am not going to go into the details of the "special sauce" for how we decide to mark a cursor bind-aware. The number of rows processed is one input.

Q: Are you planning a hint to mark statements as bind-aware ?
A: Yes, we plan to add this in the future. This will allow users to bypass the startup cost of automatically determining that a query is a good candidate for bind-aware cursor sharing.
Categories: DBA Blogs, Development

Pages