Dominic Brooks

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)

Developing for the Oracle Database

Wed, 2018-05-09 09:49

1 Coding Standards
What this document does not claim to offer is a prescriptive guide on the minutiae of coding standards.

Coding standards and naming conventions, particularly in SQL and PL/SQL, are an emotive and divisive subject.

This is largely the fault of a historical absence of mature IDEs and a lack of any rigorous, generally accepted standards as seen with other languages.

Many developers still hand-craft scripts and routines and the various tools available often have differences, subtle or otherwise, in what built-in formatting they can offer.

Good developers can adapt, good standards can adapt.

The most important objectives for coding standards are to:
• Make development faster and debugging easier
• Make easier the understanding of other people’s code
• Limit bugs

The following expectations support these objectives:
• Where possible, use SQL before using PL/SQL
• Code will be reasonably formatted and legible, preferably with a consistent style within the module at least.
• It is preferred but not mandated that Oracle keywords be capitalized – e.g. CREATE OR REPLACE PACKAGE, SELECT, UPDATE, DELETE, FROM – and lowercase used for objects and columns, etc
• In SELECT statements, tables should be aliased – this provides a very small benefit to the optimizer when parsing but also prevents bugs particularly in subqueries.
• Procedures, cursors and variables, etc should be properly scoped – e.g. public vs private, global vs local, parameter scope, etc
• Meaningful names will be given for code items
• Reasonable naming might include some sort of prefixed or suffixed indicator of purpose, e.g. k_ for constants, l_ or v_ for local variables, g_ for global variables, p_ for procedure, f_ for function, _pkg for package, i_ for in parameters, o_ for out parameters, io_ for in out parameters (see the sketch after this list).
• Package and procedure level comments should indicate why/when a particular program was changed, but SVN, or other code repositories, are the appropriate mechanism for code control.
• Code comments should be used when they add value.
• Excessive commenting and stating the obvious should be avoided – these are often more effective when refactoring the code concerned into a private routine (procedure/function) which is named appropriately (e.g. function f_format_swift_string).
• CamelCase is not considered appropriate for the database as all code is stored in the db as uppercase.
• Package headers and bodies should be checked into separate files for clarity and to prevent unnecessary recompilation of unchanged code and dependencies (version dependent)
• Performance should be built in and evidence of such documented.
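By way of illustration only, a minimal sketch of the sort of naming conventions above (all object and parameter names here are invented):

CREATE OR REPLACE PACKAGE payments_pkg
AS
  k_max_retries  CONSTANT PLS_INTEGER := 3;            -- k_ constant
  g_batch_id     NUMBER;                               -- g_ global (package) variable

  -- p_ procedure; i_ / o_ / io_ parameter modes
  PROCEDURE p_format_payment (i_payment_id  IN     NUMBER,
                              o_swift_text     OUT VARCHAR2,
                              io_status     IN OUT VARCHAR2);

  -- f_ function (l_ or v_ local variables would appear in the body)
  FUNCTION f_format_swift_string (i_raw_text IN VARCHAR2) RETURN VARCHAR2;
END payments_pkg;
/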

2 Writing Optimal SQL

2.1 Key points
Writing Optimal SQL should be relatively simple but many people struggle particularly when making the transition from an object/attribute language to a set-based language like SQL.

The key tenets of performant database code in Oracle are:
• Think in sets.
• Think about how the database might be able to process and apply logic to a set of data with great efficiency.
• Ask the right question in the best way.
• Know your data.

In support, when thinking about database code and SQL operations:

• If the query needs to be long/big, make it long/big.
• Bulk operations are critical; row-by-row operations are a cardinal performance sin (see the sketch after this list).
• Eliminate data at the earliest opportunity.
• Sort on the smallest possible set – if possible avoid aggregations, sorting and distinct operations on the largest sets of data.
• Use bind variables when you require shareable SQL and when bind variables make sense
• Use literals when literals make sense.
• Use a mix of binds and literals if appropriate.
• Avoid PL/SQL in SQL.
• Be careful of applying functions (TRUNC, etc) to columns in the WHERE clause.
• User-defined functions which are called from SQL and which themselves contain SQL are, almost without exception, unacceptable.
• Never rely on implicit datatype conversion. Use the correct datatypes for parameters and where possible convert parameters NOT columns.
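To illustrate the set-based point, a rough sketch (table and column names are made up). Both blocks do the same work, but the single UPDATE lets the database apply the change to the whole set in one operation rather than issuing one statement per row:

-- Row-by-row: a cursor loop issuing one UPDATE per row - the cardinal sin
begin
  for r in (select order_id from orders where status = 'NEW')
  loop
    update orders set status = 'QUEUED' where order_id = r.order_id;
  end loop;
end;
/

-- Set-based: one statement, one pass
update orders set status = 'QUEUED' where status = 'NEW';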

2.2 Thinking in Sets

Crucial.

For further reading on thinking in sets, see:
http://explainextended.com/2009/07/12/double-thinking-in-sql/

2.3 What’s the question?

When writing SQL, focus on the question being asked by the SQL statement.

If you put the question into words as a comment before a complex SQL statement, then this can often add value to the next developer.

Often the most performant version of a SQL statement is the one which asks the question at hand in the most natural way.

To this end, proper consideration needs to be given to:
• Subqueries – EXISTS / IN / NOT EXISTS / NOT IN
• Set-operators – MINUS, UNION, UNION ALL, INTERSECT
• Use of DISTINCT is often an indication of a wrong SQL statement or poor design
• Common Table Expressions:

Often it can help to use Common Table Expressions (CTEs), aka the WITH clause, to separate the main logic of the query from the subsequent fetching of additional data/attributes, e.g.

WITH main_logic AS
(SELECT …
FROM …
WHERE …)
SELECT ml.*, x.this, y.that, z.something_else
FROM main_logic ml
, …
WHERE …;

2.4 To ANSI or Not To ANSI

Another divisive subject is ANSI SQL vs Oracle Join syntax.

Again, this guide should not seek to be prescriptive on the preference of one over the other.

The bottom line should be that if a developer finds it easier to write a correct and optimal SQL statement using one rather than the other, then that is what matters most.

There are some SQL statement constructs which are more conveniently written in ANSI – the FULL OUTER JOIN for example.
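For example, a FULL OUTER JOIN is straightforward to express in ANSI syntax (t1 and t2 here are just illustrative):

select t1.col1, t2.col1
from   t1
full outer join t2
on     t1.col1 = t2.col1;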

It is also true that the optimizer always transforms ANSI SQL to the equivalent Oracle syntax and there are some limitations to the optimizer’s other complex query transformations when using ANSI SQL.

And unfortunately there are bugs in both.

2.5 Eliminate Early

Where there are predicates (WHERE clauses) which can significantly reduce the dataset early, check that they are being applied early enough in the execution plan (more information to follow). If they are not, consider whether the SQL statement might be rephrased or reconstructed (CTE/WITH) to make sure they are applied at an appropriate stage.

2.6 Sort / Aggregate on the smallest possible dataset

Similar to eliminate early. Sorting and aggregating requires memory and under certain conditions can spill to expensive (unscalable) disk operations.
Wherever possible, do the sort or aggregation on the smallest set of rows (not necessarily applicable to the order by clause of a query).
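As a sketch of the idea (hypothetical tables), aggregate the large detail set down first and then join the result to the wider data, rather than joining everything and aggregating the lot:

WITH order_totals AS
  (SELECT order_id, SUM(line_amount) total_amount
   FROM   order_lines
   GROUP BY order_id)
SELECT o.order_id, o.customer_id, ot.total_amount
FROM   orders o
,      order_totals ot
WHERE  ot.order_id = o.order_id;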

2.7 What’s the big deal with PL/SQL functions called from SQL?

The bottom line is that it’s about performance.

We could get into a whole argument about reusability vs performance, but performance wins in the end.

Often the correct mechanism for reusability in the Oracle database is not a function but a view, joined to appropriately in the main SQL.

Functions cause a relatively expensive context switch between SQL and PLSQL engines.

In the name of reusability, functions encourage row-by-row operations and discourage thinking in sets.

If the function itself contains SQL, then this SQL will not be part of the read consistency mechanism of the calling statements which can be potentially problematic.

If you absolutely have to, have to, have to use functions in SQL, then think again.

If you really, really do, then please look at deterministic functions and consider wrapping the function call in a scalar subquery (select from dual) to expose the potential benefits of scalar subquery caching for functions called with repeated parameters.
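A sketch of that wrapping technique, using the illustrative f_format_swift_string function mentioned earlier (payments is a made-up table):

-- Called directly, the function runs for every row:
SELECT f_format_swift_string(p.payment_text)
FROM   payments p;

-- Wrapped in a scalar subquery against dual, repeated input values
-- can benefit from scalar subquery caching:
SELECT (SELECT f_format_swift_string(p.payment_text) FROM dual)
FROM   payments p;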

2.8 What about simple functions like TRUNC in the WHERE clause?

Using functions on columns in the WHERE clause can prevent the optimizer from using an index or from pruning a partition unless a function-based index is in place on the column.

For this reason, it is often best to avoid this sort of construct:
WHERE TRUNC(some_date_column) = TO_DATE('01-NOV-2013','DD-MON-YYYY')

In favour of this:
WHERE some_date_column >= TO_DATE('01-NOV-2013','DD-MON-YYYY')
AND   some_date_column <  TO_DATE('02-NOV-2013','DD-MON-YYYY')

2.9 Using the correct datatype, be explicit

Performance problems related to using the incorrect datatypes are common.

The optimizer will implicitly add functions to make sure the datatypes on both sides of the predicate match.

Always convert date-like parameters to DATEs where the column datatype is also DATE.

Never rely on implicit datatype conversion.
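For example, suppose account_no is an indexed VARCHAR2 column and the parameter arrives as a NUMBER (names invented). With the datatypes mismatched, Oracle effectively applies TO_NUMBER() to the column and a normal index on it cannot be used; converting the parameter instead leaves the column, and the index, untouched:

-- Implicit conversion: effectively TO_NUMBER(account_no) = :p_acct
WHERE account_no = :p_acct            -- :p_acct bound as a NUMBER

-- Explicit conversion of the parameter, not the column
WHERE account_no = TO_CHAR(:p_acct)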

3 Execution Plans & Metrics – What we want, why we want it and how to get it
We have flown through some aspects of how to have a better chance of writing an optimal SQL statement.

3.1 How can we tell if it’s optimal?

Run it.

Run it twice to rule out the effect of uncontrollable factors like OS caching, SAN caching, etc.

Run it on representative data.

Run it on current volumes.

Run it on expected future volumes.

Then what?

In order to validate that our SQL statement is likely to have effective performance, what we want is the actual execution plan used and preferably the actual rowsource metrics.

3.2 How?

3.2.1 Serial (i.e. non-parallel) Execution Plans

In general, the following is usually a good approach across a variety of tools – SQL Developer, Toad and SQL*Plus for example:
Alter session set statistics_level = all;
-- bind setup
Var bind1 number
Exec :bind1 := …;

-- run target sql statement
select ….
-- fetch execution plan and metrics
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

Then run as a script.

Firstly, for getting actual execution metrics we can do one of two things prior to running the SQL statement concerned:
1. Add the /*+ gather_plan_statistics */ hint to the SQL (see the sketch after this list), or
2. In the same session, run alter session set statistics_level = all;
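For example, the hint-based variant might look like this (the query itself is just a placeholder):

select /*+ gather_plan_statistics */ e.*
from   employees e
where  e.department_id = :dept_id;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));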

Then run the target SQL and immediately afterwards run this select:

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

This is a convenient wrapper to get the execution plan and metrics from V$SQL_PLAN and V$SQL_PLAN_STATISTICS_ALL.

The first parameter is SQL_ID and by passing in NULL, we default to the previous SQL_ID run in this session.

The second parameter is CURSOR_CHILD_NO and this should be the child number for that previous SQL_ID.
The third parameter is the FORMAT and the 'ALLSTATS LAST' format says to get all statistics for the last execution.

If this works, it should produce output of the sort examined in more detail in section 4 and Appendix A.

3.2.2 What if this doesn’t work?

If you find you don't have the privileges to run these commands (you need access to V$SESSION, for example, to use DBMS_XPLAN.DISPLAY_CURSOR) then request them. There is no good reason for these privileges not to be granted.

Otherwise the approach above is effective 90% of the time.

For parallel execution plans, see section 3.2.3 below.

However, specifically in SQL Developer, the tool sometimes runs recursive internal commands of its own, so that by the time our DBMS_XPLAN statement runs, the previous SQL_ID in the session is no longer our target SQL statement.

There is one such example in SQL Developer 3 related to timestamp columns which affects the test script when running everything as a script (F5). In this case, there are two alternatives. Firstly, run the individual commands in SQL Developer as Run Statement (F9 / Ctrl + Enter). Alternatively, just comment out the timestamp columns in the SELECT part of the statement, for the purposes of this exercise.

Furthermore, in SQL Developer 4 there are further changes to recursive operations which seem to affect some statements.

In all such cases, if the output of the DBMS_XPLAN.DISPLAY_CURSOR is not the execution plan of the statement being profiled then the approach should be to identify the SQL statement in the shared pool (Look for matching SQL_TEXT in V$SQL) and plug the specific SQL_ID into the first argument of the DBMS_XPLAN call (no need to rerun the target SQL statement).
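A sketch of that lookup (the LIKE pattern is whatever identifies your statement text):

select sql_id, child_number, plan_hash_value, sql_text
from   v$sql
where  sql_text like 'select /* my_tag */%';

select * from table(dbms_xplan.display_cursor('&sql_id', null, 'allstats last'));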

3.2.3 Parallel Execution Plans

For parallel execution plans, the approach of using DBMS_XPLAN.DISPLAY_CURSOR with the format of 'ALLSTATS LAST' is not appropriate because it fetches the execution metrics from the last execution of the statement, which for a parallel query is the Query Coordinator (QC), and so does not include the metrics of the parallel slaves.

A better approach for parallel execution plans is to use real time sql monitoring and the easiest way to do this is to run the following and capture the output report:
select dbms_sqltune.report_sql_monitor('<sql_id>') from dual;

This requires you to identify the SQL_ID of the target SQL statement from V$SQL (seek matching text in the SQL_TEXT/SQL_FULLTEXT columns).

It may also require you to add the /*+ monitor */ hint to your SQL statement as, by default, monitoring only kicks in automatically for executions which last longer than a default number of seconds (5) and only for statements whose execution plans are under a certain number of lines.
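A sketch of the whole sequence (table name and SQL_ID are placeholders):

select /*+ monitor */ count(*)
from   big_table;

-- identify the SQL_ID in V$SQL, then:
select dbms_sqltune.report_sql_monitor(sql_id => '&sql_id', type => 'TEXT', report_level => 'ALL')
from   dual;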

3.2.4 When all else fails

Fall back on SQL Trace.
Alter session set sql_trace = true;
SELECT….
Alter session set sql_trace = false;

This produces a trace file on the database server and the trace file name can be identified by:
select * from v$diag_info where name = 'Default Trace File';

This can be run through TKPROF to get the execution metrics but TKPROF can also lie about the execution plan so this should be double checked in V$SQL_PLAN or by using DBMS_XPLAN.
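For reference, a typical TKPROF invocation from the OS prompt might look like this (file names are examples; sys=no excludes recursive SYS statements and the sort options put the most expensive statements first):

tkprof ORCL_ora_12345.trc ORCL_ora_12345.txt sys=no sort=prsela,exeela,fchela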

In rare circumstances and if all the above alternatives are unavailable or impractical for some reason, only then might EXPLAIN PLAN or AUTOTRACE be acceptable.

For example, in any modern version of Oracle, you can do the following:
explain plan for select…;
select * from table(dbms_xplan.display);

Now this is not useless but, for numerous reasons, EXPLAIN PLAN cannot be trusted and is not sufficient for our purposes.

AUTOTRACE also does not tell the truth (because it itself relies on EXPLAIN PLAN).
EXPLAIN PLAN is an estimate of what the execution plan for a SQL statement will be.

It doesn’t peek at binds.

It assumes all binds are VARCHAR2.

3.3 Why are we doing this?

We want this information documented as part of the change, attached to the Jira ticket or whatever tool is used for change management, and included in any code review.

The most effective mechanism for tuning SQL is “Tuning by Cardinality Feedback”: http://www.centrexcc.com/Tuning%20by%20Cardinality%20Feedback.pdf

This follows the principle that:
“if an access plan is not optimal it is because the cardinality estimate for one or more of the row sources is grossly incorrect”
and
“the cbo (cost-based optimizer) does an excellent job of finding the best access plan for a given sql provided it is able to accurately estimate the cardinalities of the row sources in the plan”

By gathering the actual execution plan and the actual execution metrics, we can show whether the optimizer was accurate in its estimates. If it was, then, from a developer perspective and for the purposes of most code reviews, there is a good likelihood that the SQL is written well enough for the optimizer to do a good job with it.

4 Interpretation of Execution Plans and Execution Metrics

If we've been lucky we should have the actual execution plan and the execution metrics.

4.1 What are we looking for? How do we interpret it?

Providing a thorough guide on how to interpret most variations of execution plans is beyond the scope of this guide, although we will provide a basic guide in Appendix A.

Essentially, what we want to see in the execution metrics is that the optimizer’s estimates are broadly accurate.

How accurate?

In general, we shouldn’t necessarily be overly concerned until we get to a factor of 10x or even more.

Estimated 100K rows, Actual 1M rows – probably not too bothered.

Estimate of 1 row, Actual 10000 rows – likely to be significant inefficiencies in either join order, join mechanism and/or access path.

And when we are looking at estimates vs actual, we need to consider the “Starts” so what we are looking for is that “Starts * E-rows” is in the right ballpark compared to “A-rows”. For more information, please see Appendix A.

Here are a couple of examples:
SQL_ID fst03j2p1czpb, child number 0
-------------------------------------
select * from t1 , t2 where t1.col1 = t2.col1

Plan hash value: 1838229974

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |  10000 |00:00:00.03 |    1172 |
|*  1 |  HASH JOIN         |      |      1 |  10000 |  10000 |00:00:00.03 |    1172 |
|   2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |     576 |
|   3 |   TABLE ACCESS FULL| T2   |      1 |  10000 |  10000 |00:00:00.01 |     596 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."COL1"="T2"."COL1")

In the above, the estimates are accurate so there is a very good chance that this is a good plan.

Here’s another, this time not so good because the estimate of rows in T1 was 1 whereas the actual was 10000.

This led the optimizer to choose an index access path over a full table scan and a NESTED LOOP rather than a HASH JOIN.
SQL_ID 9quvuvkf8tzwj, child number 0
-------------------------------------
select /*+ cardinality(t1 1) */ * from t1 , t2 where t1.col1 =
t2.col1

Plan hash value: 931243032

---------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |               |      1 |        |  10000 |00:00:00.04 |   12640 |
|   1 |  NESTED LOOPS                 |               |      1 |        |  10000 |00:00:00.04 |   12640 |
|   2 |   NESTED LOOPS                |               |      1 |      1 |  10000 |00:00:00.02 |    2640 |
|   3 |    TABLE ACCESS FULL          | T1            |      1 |      1 |  10000 |00:00:00.01 |     596 |
|*  4 |    INDEX UNIQUE SCAN          | SYS_C00446778 |  10000 |      1 |  10000 |00:00:00.01 |    2044 |
|   5 |   TABLE ACCESS BY INDEX ROWID | T2            |  10000 |      1 |  10000 |00:00:00.02 |   10000 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."COL1"="T2"."COL1")

4.2 Do I need to worry about things like NESTED LOOPS vs HASH JOINS?

For the purposes of this exercise, no, but the more knowledge the better.

Accuracy of estimates should be sufficient.

The remainder of the information should be attached to the change tool for review.

5 Appendix A: Basic guide to reading an execution plan

Using the following execution plan from a two table join:
SQL_ID 9quvuvkf8tzwj, child number 0
-------------------------------------
select /*+ cardinality(t1 1) */ * from t1 , t2 where t1.col1 =
t2.col1

Plan hash value: 931243032

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |  10000 |00:00:00.04 |
|   1 |  NESTED LOOPS                 |         |      1 |        |  10000 |00:00:00.04 |
|   2 |   NESTED LOOPS                |         |      1 |      1 |  10000 |00:00:00.02 |
|   3 |    TABLE ACCESS FULL          | T1      |      1 |      1 |  10000 |00:00:00.01 |
|*  4 |    INDEX UNIQUE SCAN          | SYS_C.. |  10000 |      1 |  10000 |00:00:00.01 |
|   5 |   TABLE ACCESS BY INDEX ROWID | T2      |  10000 |      1 |  10000 |00:00:00.02 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."COL1"="T2"."COL1")

There are four key elements:
• The SQL statement
• The SQL ID – a hash value of the sql statement, usually consistent between databases and even across versions
• The execution plan
• The predicate section – not to be overlooked. Can highlight issues with implicit functions and datatype conversions amongst other things

For the execution plan itself there are a number of elements to be concerned with:
• Optimizer – all modern versions of Oracle use the Cost-based Optimizer (CBO). This uses statistics and cost calculations to choose a best-cost plan for execution.

• Cost – Cost is an estimated indicator of time which the optimizer uses to compare execution plan possibilities, usually choosing the lowest cost plan. However, to all intents and purposes, developers should ignore it.

• Cardinality – An estimate of the number of rows for a particular rowsource, for a particular join, etc. Exposed in the execution plan as E-Rows for estimates and A-Rows for actuals. When comparing E-Rows to A-Rows it is important to take Starts into account, i.e. to compare “Starts * E-Rows” to A-Rows. Nested loop operations, for example, will have multiple starts for the inner/probed rowsource.

• Parent:child operations – An execution plan is generally a succession of parent:child operations – follow and match the indentation. A join mechanism should have two children.

• Join mechanism – A join mechanism joins two rowsources. There are a variety of mechanisms but in general there are two main methods depending on the cardinalities:

o NESTED LOOP – Essentially a FOR LOOP – For each row in the outer/driving rowsource, probe the inner/probed rowsource. Generally used for low cardinality rowsources.

o HASH JOIN – Hash all the outer/driving rowsource based on the join key(s) then hash all the inner rowsource. Generally used for high cardinality rowsources. If the cardinality estimate is too low, the work area sizes used for hashing may be too small and spill to temp space on disk – slow/unscalable

• Join order – Depending on the cardinalities, the optimizer can choose to join T1 to T2 or T2 to T1. The number of permutations for join order is N! where N is the number of tables being joined. The optimizer will limit itself to a maximum number of permutations to evaluate.

• Access path – how the data is fetched from the table, i.e. by index via various different index access mechanisms or by tablescan, etc.

• Buffers – A measure of logical IO. See below.

5.1 What happens first in the execution plan?
-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |  10000 |00:00:00.04 |
|   1 |  NESTED LOOPS                 |         |      1 |        |  10000 |00:00:00.04 |
|   2 |   NESTED LOOPS                |         |      1 |      1 |  10000 |00:00:00.02 |
|   3 |    TABLE ACCESS FULL          | T1      |      1 |      1 |  10000 |00:00:00.01 |
|*  4 |    INDEX UNIQUE SCAN          | SYS_C.. |  10000 |      1 |  10000 |00:00:00.01 |
|   5 |   TABLE ACCESS BY INDEX ROWID | T2      |  10000 |      1 |  10000 |00:00:00.02 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."COL1"="T2"."COL1")

There are a couple of exceptions but in general the execution plan starts at the first operation without a child.

So, following the indentation, the first operation without a child is:
| 3 | TABLE ACCESS FULL | T1 | 1 | 1 | 10000 |00:00:00.01 |

This is considered to be the outer/driving rowsource of the parent operation at:
| 2 | NESTED LOOPS | | 1 | 1 | 10000 |00:00:00.02 |

For each row in this rowsource, we probe the inner rowsource:
|* 4 | INDEX UNIQUE SCAN | SYS_C.. | 10000 | 1 | 10000 |00:00:00.01 |

Which is actually an index lookup on the primary key using the predicate:
4 - access("T1"."COL1"="T2"."COL1")

The data produced by this join is then used in the parent operation:
| 1 | NESTED LOOPS | | 1 | | 10000 |00:00:00.04 |

Which uses the rowids from the unique index/primary key for table T2 to get the actual table data from T2:
| 5 | TABLE ACCESS BY INDEX ROWID| T2 | 10000 | 1 | 10000 |00:00:00.02 |

5.2 Red flags?

• Row estimates of 1:
o The minimum row estimate is 1 and in some cases this actually means 0.
o If this is not a primary key access and the real number of rows is not 0/1, then are there any statistics for this object?
o Row estimates of 0/1 where the actual number of rows is significantly more than 1 can cause significant performance problems

• MERGE JOIN CARTESIAN + BUFFER SORT – particularly where the estimate is 1. Can be particularly detrimental if the actual rows are greater than 1. Rarely a good operation but can be symptomatic of a missing join.

• Implicit datatype conversions

• Nested loop operations where the inner/probed table/rowsource is a FULL segment scan.

• VIEW operations – symptomatic of a non-mergeable view which may or may not be a problem

• FILTER operations where the row-by-row operation is significant

5.3 Is there anything else to look out for?

Yes, that buffers column is a measure of logical IO.

When comparing different ways of doing things, when tuning SQL, one of the key measures that should be targeted is a reduction in logical IO.

If one approach uses significantly less logical IO compared to another approach then that is significant. The statement with the lower IO is likely to be better, is more likely to benefit from having more of the data it’s interested in cached and is less likely to impact other queries and the caching of other data.

There should probably be a rule of thumb about the ratio of logical IO to rows fetched. The difficulty is picking the right indicators.

If a query selects 100 rows from 100 million buffer gets and those all-important estimates are reasonably accurate, this should be a strong signal that perhaps the indexes are not optimal for that particular query.

As a rule of thumb, a ratio of a couple of consistent gets or less per row is damn good. 100,000s or millions may well be an indicator of significant inefficiencies.

But, as always, it depends.

It also significantly depends on whether the query itself is fast enough for the business requirement and whether it has the potential to impact other users of the database.

Furthermore, one lone query is unlikely to justify a new index but that is beyond the scope of this guide.

5.4 Further Reading

A 10053 Trace provides a very detailed walkthrough of the optimizer’s process of coming up with the execution plan. Not for the faint-hearted.
http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-explain-the-explain-plan-052011-393674.pdf
Jonathan Lewis: Cost-based Oracle Fundamentals: http://www.apress.com/9781590596364

6 Appendix B Interacting with the database
The Oracle database is a very powerful piece of software.

It’s also likely to be one of the most expensive pieces of software in an application’s tech stack.

The keys to a performant Oracle database also differ significantly from those of other vendors.

How you best approach something in Sybase or SQL Server is not necessarily how you should approach it in Oracle.

One classic example is the use of temporary tables.

Developers should know how to get the best out of a particular database.

To treat it like a bit bucket or a “slow, dumb backup datastore” is to waste money and resources.

6.1 Vendor-specific Database features

Application developers should not be overly wary of using a feature particular to the Oracle database. Some tools can make it difficult to use vendor-specific features or optimizations but an investment in time and effort to do so can reap significant performance benefits.

Whilst this attitude might be relevant for third-party product developers who have to write software that can be installed on different database vendors, this is largely not true of enterprises writing internal software systems.

It is unlikely that the Oracle database on a particular system will be replaced by another vendor database.

It is far more likely that a Java component interacting with the database will eventually be replaced by a C# component or that the usage of the Oracle database will be deprecated in favour of caching and NOSQL technologies, so if you’re going to use SQL, use Oracle-specific features where they offer benefit.

6.2 Round-tripping

The default fetch size for Oracle JDBC is 10 and the default SQL*Plus arraysize is 15. These defaults are almost never appropriate for general usage as many SQL statements can be expected to fetch significantly more than 10 rows and therefore significant gains can be made by increasing the setting beyond the default.

The issue is not only about roundtrips across the network, it’s also related to the logical IO that a query needs to do. If you ask for just 10 rows, the database will do all the IO it needs to do to fetch the first ten rows. When you ask for the next 10 rows, the server process on the database might well have to do a logical read of some of the same blocks as the previous fetch which can lead to significant inefficiencies compared to a larger fetchsize.
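As a sketch in SQL*Plus (the same principle applies to the JDBC fetch size, which can be set per statement or on the connection):

show arraysize
set arraysize 500
-- multi-row fetches now pull up to 500 rows per round trip
select owner, object_name from all_objects;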

6.3 Abstraction & Interfacing with the database

Abstraction is a principle that is put on a pedestal in the middle tier and yet often abandoned when interacting with the database.

Put simply, if SQL is embedded in Java code, then this introduces unnecessary dependencies on the database model and limits the ability to make subtle changes to the SQL or to the model without making a change to the application server code and doing an app server release.

Views, procedures and packages can all provide an interface to the database and the data model.

6.4 It’s all about the data.

Interacting with data appropriately, regardless of database vendor, is crucial.

Think in Sets.

Also consider the success of Engineered Systems like Oracle Exadata.

One of the things that Exadata focuses on, for example, is being able to eliminate redundant data as early as possible.

This means that the logic in the storage cells can eliminate the data before it even gets to the traditional database memory, before it goes anywhere near the network, long before it goes up to the application.

And it can do this with significant degrees of parallelism, usually with far more efficiency than similar processing in application threads.

Why is this relevant?

Eliminate early.

Let the database do the work it was designed to do.

Applications should let the database give them the smallest set of data that they need and should not bring excessive amounts of data into the middle tier for elimination and aggregation there.

Volumes of data are exploding. The best chances of scaling efficiently to deal with these volumes of data are to interact with the data appropriately.

Blocking Prepared XA Transaction

Sat, 2018-01-20 06:20

There was an oracle-l thread last month about blocking sessions which could not be identified.

I replied back about a very niche scenario which I’d seen a few times before. And it’s just happened again so I thought it would be useful to just post some evidence on this.

Here are a couple of articles posted previously on XA transactions:

First, a reminder, from the posts above, that a transaction doesn’t have to have a session.

With normal “vanilla” sessions, there’s a 1:1 relationship. But with an XA transaction, a session can attach and detach, though only one session can be attached to a transaction at any one time.

And a bit of context about these XA transactions.

This is a complex Java application with multiple resources participating in the XA transaction. There was HornetQ involved and HornetQ was apparently timing out and was bounced, the taking down probably being done with a kill -9.

And an XA transaction does what is called a 2-phase commit where the first phase involves the transaction co-ordinator instructing the participating resources to prepare to commit and subsequently the second phase is to commit.

What I have seen many times on this application is that something happens to the transaction co-ordinator in between the PREPARE and the COMMIT and we are left with an orphaned transaction.

This transaction will eventually time-out – depending on the properties set on the transaction by the co-ordinator – at which point it will become an in-doubt transaction and be visible in dba_2pc_pending.

Back to reality… there is nothing in DBA_2PC_PENDING:

select * from dba_2pc_pending;

no rows selected

But we can see there is a transaction in prepared state in V$TRANSACTION, a transaction which started yesterday afternoon:

select * from v$transaction where status = 'PREPARED';
ADDR                 XIDUSN    XIDSLOT     XIDSQN     UBAFIL     UBABLK     UBASQN     UBAREC STATUS           START_TIME           START_SCNB START_SCNW START_UEXT START_UBAFIL START_UBABLK START_UBASQN START_UBAREC SES_ADDR               FLAG SPACE RECURSIVE NOUNDO PTX NAME                                                                                                                                                                                                                                                             PRV_XIDUSN PRV_XIDSLT PRV_XIDSQN PTX_XIDUSN PTX_XIDSLT PTX_XIDSQN     DSCN-B     DSCN-W  USED_UBLK  USED_UREC     LOG_IO     PHY_IO     CR_GET  CR_CHANGE START_DATE            DSCN_BASE  DSCN_WRAP  START_SCN DEPENDENT_SCN XID              PRV_XID          PTX_XID        
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- -------------------- ---------- ---------- ---------- ------------ ------------ ------------ ------------ ---------------- ---------- ----- --------- ------ --- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------------- ---------- ---------- ---------- ------------- ---------------- ---------------- ----------------
0000004A833D6868        691         32     315541         22    1085802     -28624         18 PREPARED         01/19/18 15:39:45    3454176189       2913         13           22      1085802       -28624            9 0000004B82E584A0    4201987 NO    NO        NO     NO                                                                                                                                                                                                                                                                            0          0          0          0          0          0          0          0          1         12     -40287       -163  -66597824      -1385 19-JAN-2018 15:39:45          0          0    1.3E+13             0 B302200095D00400 0000000000000000 0000000000000000 


And this is the script which I mentioned on the oracle-l thread, one I use to see what locks transactions are holding, when the transaction started and what sessions are attached:

select s.machine
,      lo.inst_id
,      lo.object_id
,      lo.session_id
,      lo.os_user_name
,      lo.process
,      ob.owner
,      ob.object_name
,      ob.subobject_name
,      tx.addr
,      tx.start_time txn_start_time
,      tx.status
,      tx.xid
,      s.*
from   gv$locked_object lo
,      dba_objects      ob
,      gv$transaction    tx
,      gv$session        s
where  ob.object_id = lo.object_id
and    tx.xidusn    (+) = lo.xidusn
and    tx.xidslot   (+) = lo.xidslot
and    tx.xidsqn    (+) = lo.xidsqn
and    s.taddr      (+) = tx.addr
order by txn_start_time, session_id, object_name;

For privacy reasons and as this is a real-world situation and not an isolated test case, I won’t share the output of the script.

But it shows that:

  • the transaction is holding mode 6 exclusive TX row locks on a number of objects
  • that the transaction is in PREPARED state
  • and that there is no session attached to the transaction (although v$locked_object does tell us what the SID was when it was there)

Now, from the application perspective, something has apparently rolled back a message. HornetQ has been bounced, everything is back up and running, and it seems like the message that resulted in our orphaned transaction is being reattempted and is being blocked by the exclusive locks still being held.

From an ACTIVE SESSION HISTORY perspective, this is what we see from this script; I've removed some columns from the output for brevity and privacy but left them commented in the script so you can see what I run normally:

select count(*) over (partition by h.sample_time) sess_cnt
--,      h.user_id
--,      (select username from dba_users u where u.user_id = h.user_id) u, h.service_hash
,      xid--, sample_id
, sample_time, session_state, session_id, session_serial#,--sql_id,
sql_exec_id, sql_exec_start, event, --p1,
mod(p1,16), blocking_session,blocking_session_serial#--, current_obj#
--,      (select object_name||' - '||subobject_name from dba_objects where object_id = current_obj#) obj
--,      (select sql_fulltext from v$sql s where s.sql_id = h.sql_id and rownum = 1) sqltxt
--,      (select sql_text from dba_hist_sqltext s where s.sql_id = h.sql_id and rownum = 1) sqltxt
--, h.*
from   v$active_session_history h
where event = 'enq: TX - row lock contention'
order by h.sample_id desc;
XID              SESSION_STATE SESSION_ID SESSION_SERIAL# EVENT                                                            MOD(P1,16) BLOCKING_SESSION BLOCKING_SESSION_SERIAL#
---------------- ------------- ---------- --------------- ---------------------------------------------------------------- ---------- ---------------- ------------------------
4F021700A3C00600 WAITING              232           53035 enq: TX - row lock contention                                             6
FC010B00196E0A00 WAITING              471            5205 enq: TX - row lock contention                                             6
670320004FA50300 WAITING             2652           11791 enq: TX - row lock contention                                             6
640204005BA40500 WAITING             4300           49665 enq: TX - row lock contention                                             6

So, you can see we have four sessions being blocked on exclusive mode 6 row level locks and that the blocking session is null. If I showed you the full history then you would see that these sessions have been repeatedly trying for many hours.

BTW, this is RAC but all these sessions are intentionally on the same instance so there’s none of that jiggery-pokery involved.

I imagine at some point there was an internal conversation in Oracle about whether to report blocking session or blocking transaction.

At this point, it’s just a normal lock held by a “normal” transaction which hasn’t committed yet; the application just keeps retrying the transaction, waiting for 1 minute each time until it hits the default distributed transaction timeout:

ORA-02049: timeout: distributed transaction waiting for lock

which will be logged somewhere obscure in the application logs – and there’ll be a brief pause and then it starts all over again.

Anyway at this point what to do?

At this point, the transaction hasn’t timed out.

In a few hours, the transaction will time out and become an in-doubt transaction.

Once it does, the application will receive a message:

ORA-01591: lock held by in-doubt distributed transaction

At which time, it can be manually resolved.

Typically, this is what we usually do:

begin
  for txn in (select local_tran_id from dba_2pc_pending)
  loop
    -- if txn is not already forced rollback
    execute immediate 'rollback force '''||txn.local_tran_id||'''';
    commit;
    dbms_transaction.purge_lost_db_entry(txn.local_tran_id);
    commit;
  end loop;
end;
/

If we were going to intervene right now and resolve it, what could we do?

We could bounce the database. Then the transaction would become in-doubt (see above).

But bouncing the database is quite drastic.

A lot of the advice in the manual is about what to do once it is IN-DOUBT. Some of it might work now – not sure.

What I advise is that, if we know we want to roll back or commit this transaction now and we don't want to (or can't) wait for it to become IN-DOUBT, then programmatically we can attach to the transaction using DBMS_XA and do something with it.

First of all, we need some information about the transaction.
CAVEAT DBA!!! This isn’t exactly well documented but I have found that what we tend to need is to identify the transactions in V$GLOBAL_TRANSACTION which are in COLLECTING state:

select state
,      UTL_RAW.CAST_TO_BINARY_INTEGER (globalid)
,      UTL_RAW.CAST_TO_BINARY_INTEGER (branchid)
,      t.* 
from v$global_transaction t where state = 'COLLECTING';
STATE                                  UTL_RAW.CAST_TO_BINARY_INTEGER(GLOBALID) UTL_RAW.CAST_TO_BINARY_INTEGER(BRANCHID)   FORMATID GLOBALID                                                                                                                         BRANCHID                                                                                                                           BRANCHES   REFCOUNT PREPARECOUNT STATE                                       FLAGS COUPLING      
-------------------------------------- ---------------------------------------- ---------------------------------------- ---------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ---------- ---------- ------------ -------------------------------------- ---------- ---------------
COLLECTING                                                                   49                                       45     131075 312D2D35363832376566363A393236643A35613562363664363A633738353036                                                                 2D35363832376566363A393236643A35613562363664363A633738353065                                                                              1          1            1 COLLECTING                                      1 TIGHTLY COUPLED 


Then we can replace the formatid, global id and branch id in the script below. Whether you require numbers or raw depends on the signature to DBMS_XA_XID – see documentation.

set serveroutput on
DECLARE
  l_xid     DBMS_XA_XID :=
            DBMS_XA_XID(131075,
                        '312D2D35363832376566363A393236643A35613562363664363A633738353036',
                        '2D35363832376566363A393236643A35613562363664363A633738353065');
  l_return  PLS_INTEGER;
BEGIN
  l_return := SYS.dbms_xa.xa_rollback(xid => l_xid);
  dbms_output.put_line(l_return);
END;
/

This approach above comes with no guarantees.
But it has worked for me several times in the past.

Internal Audit Policies

Tue, 2017-06-27 13:23

The Oracle RDBMS market has long been a-shrinking.

A future as an Oracle specialist definitely looks like being past its shelf-life.

Of course, having spent many years as such you will have many transferable skills anyway, transferable to other products and languages within administration, build, performance, troubleshooting or design and development. And if you're lucky, to a number of these. It's highly likely that you've not been a monogamous technologist anyway.

However, what frustrates me, I mean really, really frustrates me are internal audit policies.

Oracle is filled with plenty of fabulous features.

But when I can’t use a significant number of those then it makes me mad… to the extent that I see this all coming to an end prematurely. Unnecessarily.

I can’t use DBMS_PARALLEL_EXECUTE.

Fabulous feature.

From http://docs.oracle.com/database/122/ARPLS/DBMS_PARALLEL_EXECUTE.htm#ARPLS67331 :

To execute chunks in parallel, the user must have CREATE JOB system privilege.

Well, the application schema isn’t allowed CREATE JOB (or access to DBMS_SCHEDULER) because it’s against the internal audit policy.

(And I also hate it when I’m told that “anyway DBMS_JOB is deprecated” when it’s not, at least not until 12.2, and even then the deprecation is dubious because of transactionality, or rather the continued lack thereof in the replacement.)

If I’m granted CREATE JOB then apparently I’m, or rather my application is, likely to suddenly go bonkers creating jobs left, right and centre when I should be using standard job infrastructure like Autosys or Control+M or whatever you use. (Even though I wouldn’t because I’m responsible for what I and my application do. And if I do irresponsible things, in general I expect sanctions and consequences)

I can make a very reasonable and well-reasoned argument as to why I should be using DBMS_PARALLEL_EXECUTE but ultimately I’m arguing with a checklist, even if there is a Jobsworth holding it.

Bottom line is that if I want to do multi-threaded work like this via DBMS_PARALLEL_EXECUTE, I can’t. But if I want to achieve the same thing in one thousand Java threads, that is perfectly fine. This is just not a level playing field!!!
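For context, this is roughly the sort of thing being ruled out: a minimal DBMS_PARALLEL_EXECUTE sketch (task, schema, table and statement are invented for illustration).

begin
  dbms_parallel_execute.create_task(task_name => 'UPDATE_ORDERS');

  dbms_parallel_execute.create_chunks_by_rowid(task_name   => 'UPDATE_ORDERS',
                                               table_owner => 'APP',
                                               table_name  => 'ORDERS',
                                               by_row      => true,
                                               chunk_size  => 10000);

  -- each chunk runs as a scheduler job, hence the CREATE JOB requirement
  dbms_parallel_execute.run_task(task_name      => 'UPDATE_ORDERS',
                                 sql_stmt       => 'update app.orders set status = ''PROCESSED''
                                                    where rowid between :start_id and :end_id',
                                 language_flag  => dbms_sql.native,
                                 parallel_level => 8);

  dbms_parallel_execute.drop_task(task_name => 'UPDATE_ORDERS');
end;
/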

My reasonable response is ok but can’t the DBAs provide a controlled developer toolkit of wrappers around such policied functionality?

To which the answer is also no. Because then the DBAs would potentially be responsible for any issues arising out of that usage.

The internal audit policy is largely driven by external policy coming from generic regulators and auditors (so I am told regularly although I don’t have direct access to the external policy).

The bank is being subjected to severe regulation and those regulations cannot be put on hold whilst alternative solutions are sought. Regulators and auditors have no concern when it comes to the lower level problems that are caused when they see high level gaps that need to be closed.

Yeah… so the application has to deal with the consequences even when the policy is wrong. Sorry. It’s wrong. Or it’s poorly implemented.

The list of what I can and can’t do (not personally, remember: this is the main application schema, not me or a representation of me) grows all the time.

This DBMS_PARALLEL_EXECUTE and CREATE JOB is actually an old wound re-opened by refusal to be allowed to use an application context / SYS_CONTEXT.

The application needs CREATE ANY CONTEXT.

No, this is an elevated privilege. Not allowed.

Why is it an elevated privilege? Oh you mean the ANY word? That’s sort of misleading.

Checklist says no.

It’s not the same as CREATE ANY TABLE which allows you to create a table in ANY schema. Contexts are global. Security for actually setting the context value is via a named schema.package.

Checklist says no.

So the application can’t use VPD or any other worthy use case of using applications contexts.

Checklist says no.

So, can the DBA create the context for the application? Or provide some wrapper or toolkit to create it.

Checklist says no.

Um… ok. So that’s it then.

It’s just plain bonkers.

These are just two examples of many.

I’ve had many a battle over the “can’t grant a privilege directly to a schema, has to be via a role” conversation. Which then leads to the explanation to the DBA of how PL/SQL works. Oh so tedious.

I’m not 100% sure to what extent this is just local law enforcement or whether this is more widespread.

I’m pretty sure the latter but there are bound to be pockets of sanity and common sense out there somewhere.

Bottom line is that this will accelerate the shrinking Oracle RDBMS market.

People whose “speciality” are Oracle checklists are contributing to the demise and they will hopefully be able to transfer their checklist enforcement skills to other markets.

And shouldn’t product owners within Oracle surely be aware and be working with regulators and auditors to address these issues and in turn work with their customers to make sure that what they have discussed and agreed is carried down to the lower level? Also asking too much.

Bonkers. Bonkers. Bonkers. And been getting worse for years.

Working with Oracle is becoming unworkable.

My first article in nearly a year… and not a technical article… just a moan… and a reflection that I’m doing less and less Oracle work.
(But I’m not writing regularly on my other learnings either: http://thereplmc.wordpress.com)

It’s just a shame.


Outer Join with OR and Lateral View Decorrelation

Wed, 2016-07-06 11:33

Use of ANSI SQL is a personal thing.

Historically I have not been a fan apart from where it makes things easier/possible.

This reticence was mainly due to optimizer bugs and limitations in the earlier days.

Recently I have been using it much more because I find that the developers I interact with prefer it / understand it better.

You might/should be aware that Oracle will rewrite ANSI SQL to an Oracle syntax representation, this transformation being listed in the optimizer trace file.

You might/should also be aware that Oracle outer join syntax does not allow OR or IN:

drop table t1;
drop table t2;

create table t1
as
select floor((rownum+1)/2) col1
,      case when mod(rownum,2) = 0 then 1 else 2 end col2
,      10 col3
from   dual
connect by rownum <= 20;

create table t2
as
select rownum col1
,      case when mod(rownum,2) = 0 then 2 else 1 end col3
from   dual
connect by rownum <= 10;

select *
from   t1
,      t2
where  t1.col1 = t2.col1 (+) 
and  ((t1.col2 = 1
and    t2.col3 (+) > t1.col3)
or    (t1.col2 = 2
and    t2.col3 (+) < t1.col3));

ORA-01719: outer join operator (+) not allowed in operand of OR or IN

ANSI SQL remedies this:

alter session set tracefile_identifier = 'domlg1';
alter session set events 'trace[rdbms.SQL_Optimizer.*]';
select *
from   t1
left join t2
on    t1.col1 = t2.col1
and ((t1.col2 = 1
and   t2.col3 > t1.col3)
or   (t1.col2 = 2
and   t2.col3 < t1.col3));

alter session set events 'trace off';

But it comes at a price.

Note the execution plan:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    20 |  1300 |    42   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER |      |    20 |  1300 |    42   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T1   |    20 |   780 |     2   (0)| 00:00:01 |
|   3 |   VIEW              |      |     1 |    26 |     2   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |     1 |    26 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("T1"."COL1"="T2"."COL1" AND ("T1"."COL2"=1 AND
              "T2"."COL3">"T1"."COL3" OR "T1"."COL2"=2 AND "T2"."COL3"<"T1"."COL3"))   

Now, maybe you will have better luck than me but no matter what I try I cannot change the NESTED LOOPS OUTER operation.
So, if that lateral view involves some full table scans or other significant operations, they might be very expensive when executed repeatedly as the inner operation of a nested loop.

The reason is in the optimizer trace.

Query after View Removal
******* UNPARSED QUERY IS ********
SELECT "T1"."COL1" "COL1", "T1"."COL2" "COL2", "T1"."COL3" "COL3", "VW_LAT_AE9E49E8"."ITEM_1_0" "COL1", "VW_LAT_AE9E49E8"."ITEM_2_1" "COL3" FROM "DOM"."T1" "T1", LATERAL( (SELECT "T2"."COL1" "ITEM_1_0", "T2"."COL3" "ITEM_2_1" FROM "DOM"."T2" "T2" WHERE "T1"."COL1"="T2"."COL1" AND ("T1"."COL2"=1 AND "T2"."COL3">"T1"."COL3" OR "T1"."COL2"=2 AND "T2"."COL3"<"T1"."COL3"))) (+) "VW_LAT_AE9E49E8"
DCL:Checking validity of lateral view decorrelation SEL$BCD4421C (#1)
DCL: Bypassed: view has non-well-formed predicate
DCL: Failed decorrelation validity for lateral view block SEL$BCD4421C (#1)

The OR prevents the decorrelation which seems to mean that we’re stuck with a NESTED LOOP for now.

Further Reading on ANSI:
Oracle Optimizer Blog
Jonathan Lewis on ANSI Outer
Jonathan Lewis on ANSI


INSERT ALL caveat

Thu, 2016-05-26 11:33

Why you might want to think twice about using INSERT ALL.

One of those things I knew and then forgot.

So, let’s say you’ve got three tables or a partitioned table or something like that.

Let’s use regional tables for simplicity.

drop table t1_r1;
drop table t1_r2;
drop table t1_r3;

create table t1_r1
(col1 varchar2(2) not null
,col2 number not null
,check( col1 in ('R1')));

create table t1_r2
(col1 varchar2(2) not null
,col2 number not null
,check( col1 in ('R2')));

create table t1_r3
(col1 varchar2(2) not null
,col2 number not null
,check( col1 in ('R3')));

insert into t1_r1 values ('R1',1);
insert into t1_r2 values ('R2',1);
insert into t1_r3 values ('R3',1);

commit;

And you want a routine that will insert into one of those tables depending on region.

And you’re a simple fellow, so you go with an IF statement:

create or replace procedure p1 (
  col1 in varchar2, 
  col2 in number
)
as
begin
  if col1 = 'R1'
  then
      insert into t1_r1 values(col1,col2);
  elsif col1 = 'R2'
  then
      insert into t1_r2 values(col1,col2);
  else 
      insert into t1_r3 values(col1,col2);
  end if;
end p1;
/

Procedure P1 compiled

And then in the same session you run this uncommitted:

exec p1('R1',2);

PL/SQL procedure successfully completed.

And then in another session you decide to truncate table T1_R3:

truncate table t1_r3;

Table T1_R3 truncated.

No problem.
None was expected.

However…

Let’s say that we decide to tidy up that procedure and get rid of some of the repetition by using an INSERT ALL statement.
I will use a standalone sql statement just to demonstrate a further minor aspect rather than using a procedure with a bound parameter.

insert all
when col1 = 'R1' then into t1_r1
when col1 = 'R2' then into t1_r2
when col1 = 'R3' then into t1_r3
select 'R1' col1,2 col2
from dual;

1 row inserted.

Let’s revisit the truncate:

truncate table t1_r3;

SQL Error: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
00054. 00000 -  "resource busy and acquire with NOWAIT specified or timeout expired"
*Cause:    Interested resource is busy.
*Action:   Retry if necessary or increase timeout.

TM share locks from the INSERT ALL on all three possible targets prevent the TRUNCATE.
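You can see those locks from the session that ran the INSERT ALL with something like this (a sketch); all three target tables show up, not just the one actually inserted into:

select o.object_name, l.locked_mode
from   v$locked_object l
,      dba_objects     o
where  o.object_id  = l.object_id
and    l.session_id = sys_context('userenv','sid');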

So, a simple/simplistic illustration of why you might want to think twice about whether INSERT ALL is the best feature for your use case, based on a real life problem.


Side-effect of ALTER SESSION SET CURRENT_SCHEMA

Mon, 2016-04-18 11:11

For most of our database set-ups we use a different TEMP space for application users than for end-user/support/developer/reporting usage.
The intention is to minimise the potential impact of a rogue ad-hoc query on the main application.

However, turns out this is ineffective IF you use:

ALTER SESSION SET CURRENT_SCHEMA = main_app_schema;

This is documented but I wasn’t previously aware.
No surprise that this learning opportunity was presented as a direct result of a rogue query on a prod DB blowing out the main TEMP space.

Deduction quickly suggested that the above must be the case and it was proven by a test case and also supported by the documentation:

“the database uses the temporary tablespace of the specified schema for sorts, joins, and storage of temporary database objects”

Obviously… not using ALTER SESSION and using fully qualified object names prevents this “vulnerability”… but that’s not good enough unfortunately.
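For reference, a sketch of the sort of test case that demonstrates it (user, schema and table names invented): from a support/reporting account, switch CURRENT_SCHEMA, run something that sorts heavily, and watch which temporary tablespace the segments land in from another session.

-- session 1: a support user whose own temporary tablespace is TEMP_SUPPORT
alter session set current_schema = main_app_schema;
select *
from   some_big_table
order  by 1, 2, 3;

-- session 2: the temp segments appear in the application schema's TEMP, not TEMP_SUPPORT
select username, tablespace, segtype, blocks
from   v$tempseg_usage;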


Did it really fix it 2: Plan flip, unprofileable, baseline OFE round 2 and “stability”

Mon, 2015-11-16 12:31

Snappy title, huh?

Aka: Why a sql plan baseline may be no guarantee of stability.

The other day, a problematic plan flip…

Tangent 1:
Cue much discussion about plan flips etc.
My thoughts on stability are that the priority for most application owners is stability and predictability but that does not tally with the default CBO behaviour and potentially you have to turn off a helluva lot to even get close.

Tangent 2:
I have commented before about the common sensitivity of many databases to concurrent direct path reads (or rather the sensitivity is on the underlying SAN).

Here is one such illustration caused by this particular plan flip.

See that our single block read times go out from a “normal” range of 5 – 20ms to an average of 50+ ms. At this point anything doing physical IO starts to notice.
(Needless to say, I’m not a fan of these “normal” IO times – I think they’re awful but the SLA of the tier 2 (non flash) SAN storage is 20ms so we don’t alert until we breach that.)
[Chart: Average Synchronous Single-Block Read Latency over time, rising from the normal 5-20ms range to 50+ ms]

Source:

select snap_id, begin_time, end_time, round(average,2)
from   dba_hist_sysmetric_summary
where  metric_name     = 'Average Synchronous Single-Block Read Latency'
order by snap_id;

Back to reality
Anyway, standard immediate-term fix for a plan flip is to hunt down a better plan from AWR and fix it with a profile or a baseline.

Not a problem to find one for this SQL as it nearly always executed with the same decent plan.
(AWR history is much longer than this but it gives the picture)

select to_char(trunc(sn.end_interval_time),'DD-MON-YYYY') dt
,      st.sql_id
,      st.plan_hash_value
,      sum(rows_processed_delta) rws
,      sum(executions_delta)     execs
,      round(sum(elapsed_time_delta)/1000/1000)   elp
,      round(sum(elapsed_time_delta)/1000/1000/nvl(nullif(sum(executions_delta),0),1),2)   elpe
,      round(sum(iowait_delta)/1000/1000)         io
,      round(sum(cpu_time_delta)/1000/1000)       cpu
,      sum(buffer_gets_delta)    gets
,      round(sum(disk_reads_delta))         disk_reads
from   dba_hist_snapshot sn
,      dba_hist_sqlstat  st
where  st.snap_id            = sn.snap_id
and    sn.instance_number = st.instance_number
and    st.sql_id             IN ('6xrx006fmqbq2')
group by trunc(sn.end_interval_time), st.sql_id, st.plan_hash_value
order by trunc(sn.end_interval_time) desc, elp desc; 
DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

Problem plan is obvious, right?

Started during the day on the 9th, continued overnight until eventually noticed.

PHV 3871506300 does a FTS and gets direct path reads which causes our IO stress.

So, as mentioned, no problem, hint it with a sql profile via COE_XFR_SQL_PROFILE.sql (see Metalink note 215187.1)

But it didn’t work.

Not properly.

See top line.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

We didn’t get “good” plan PHV 3803735407, we avoided “nasty” plan PHV 3871506300 but we got not-quite-so-nasty-but-still-not-nice PHV 3353364157.

The SQL Profile was not able to reproduce the desired PHV.

But at least PHV 3353364157 was not doing the problematic direct path reads!

I took the hints from PHV 3803735407 via

select * from table(dbms_xplan.display_awr('6xrx006fmqbq2',3803735407,format=>'+OUTLINE'));

I double checked they matched the hints in the SQL Profile – they did.

I tried using the hints manually in a SQL statement and the result was consistent with the SQL Profile – PHV 3353364157.

At this point, the DBA suggested using a SQL Plan Baseline.

I resisted but eventually gave in.

Why would it make any difference?

When a plan cannot be reproduced then there are good reasons why a baseline is much, much safer than a profile.

A baseline must reproduce the desired PHV or it will be rejected completely.

A profile has no knowledge of PHV so it is possible that under certain circumstances the hints in a profile may be only partially applied.

Profile was dropped and baseline was created.

This was the immediate aftermath (top line)… Success!

And V$SQL was showing that the baseline was being used.
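Something like this, for example – SQL_PLAN_BASELINE (and SQL_PROFILE) in V$SQL are populated when the cursor is using one:

select sql_id, child_number, plan_hash_value, sql_plan_baseline, sql_profile
from   v$sql
where  sql_id = '6xrx006fmqbq2';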

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10-NOV-2015 6xrx006fmqbq2      3803735407          4       1866        120        .06          1        116   17401768         30 
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

I maintained that this didn’t make sense but the ticket was closed so I couldn’t get an SPM trace.

At this point a copy of the statement without profile and without baseline and without the manual fullset of hints was returning the original PHV 3803735407.

I went away that night and on the train I had a thought: “what if it wasn’t round 1 of the baseline which worked but round 2”.

Reminder of the baseline reproduction system

Round 2 is OFE only.

Doesn’t matter whether it is round 1 or round 2, if it’s reproduced it’s reproduced.

But if it was under OFE only then that was no guarantee of stability.

Then the following night, late night call, the baseline was no longer reproducing PHV 3803735407 but was back to the big problem PHV 3871506300!

Although I didn’t have an SPM trace, I’m adamant that this behaviour verified my theory about it being round 2 only.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
11-NOV-2015 6xrx006fmqbq2      3871506300          0        471      80513     170.94      77936       2543  193096137  188992511 
11-NOV-2015 6xrx006fmqbq2      3803735407        212       9933        690        .07         14        586   97253038        154 
10-NOV-2015 6xrx006fmqbq2      3803735407          4       1866        120        .06          1        116   17401768         30 
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

Note that at times of change, no database structures were being changed.
Stats will have been gathered, sure – that was the agent of change no doubt.
But no index maintenance was happening, nothing was going unusable, no partition maintenance was happening.
Nothing was changing which should mean that a particular plan was impossible.

This means that the set of hints for this particular plan was not deterministic – or rather the hintset was surely incomplete in some way, which meant that the optimizer was free to do certain transformations, which in turn meant that the hintset was then invalid? Not 100% sure, still to be looked at in more detail…

At this point we dropped the baseline and went with a cut-down sql profile which applied only two hints – to politely request index scans for the two problematic FTS in each of PHV 3871506300 and 3353364157.
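For reference, one common way to build that sort of cut-down, hint-only profile (not necessarily how it was done here) is DBMS_SQLTUNE.IMPORT_SQL_PROFILE – a minimal sketch, where the hint text, query block names and profile name below are illustrative only:

declare
  l_hints sys.sqlprof_attr := sys.sqlprof_attr(
            'INDEX(@"SEL$1" "T1"@"SEL$1")',   -- illustrative hints, not the real ones
            'INDEX(@"SEL$1" "T4"@"SEL$1")');
begin
  dbms_sqltune.import_sql_profile(
    sql_text    => 'SELECT ...',              -- paste the exact statement text here
    profile     => l_hints,
    name        => 'manual_profile_6xrx006fmqbq2',
    force_match => TRUE,
    replace     => TRUE);
end;
/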

And this worked and has been ok since (so far) but it will not be the longer term solution.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
16-NOV-2015 6xrx006fmqbq2      3803735407         80       6887        348        .05         11        310   44574494         86 
13-NOV-2015 6xrx006fmqbq2      3803735407        114      13054        842        .06         19        713  117260543        482 
12-NOV-2015 6xrx006fmqbq2      3803735407        585      28074       1854        .07         12       1823  321890748       1983 
11-NOV-2015 6xrx006fmqbq2      3871506300          0        471      80513     170.94      77936       2543  193096137  188992511 
11-NOV-2015 6xrx006fmqbq2      3803735407        212       9933        690        .07         14        586   97253038        154 
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216 
10-NOV-2015 6xrx006fmqbq2      3803735407          4       1866        120        .06          1        116   17401768         30 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465

Never before seen an execution plan which had quite this problem interacting with SPM.

A test case to get to the root of the problem is still on the TODO list.

Something to do with a statement involving a WITH subquery and NO_MERGE
e.g.

WITH x AS (SELECT /*+ no_merge...)
SELECT 
FROM ...
WHERE EXISTS (SELECT
              FROM   x
              ...
              WHERE...);

Did it really fix it 1: OFE & nasty FIRST_ROWS plan on 11.2.0.4

Mon, 2015-11-16 10:50

Plan degradations on upgrade are normal.

This one’s no different.

On further investigation, turned out application was setting optimizer_mode = first_rows somewhere.

First point about this is that first_rows really shouldn’t be used either as a hint or as an optimizer_mode.

What does FIRST_ROWS mean?
From 11g doco:

The optimizer uses a mix of costs and heuristics to find a best plan for fast delivery of the first few rows.

If any sort of FIRST_ROWS type intervention is required, then it might be better to be using a more modern, fully-costed FIRST_ROWS_n (where n = 1 | 10 | 100 | 1000).
If it’s even required.
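For illustration, the fully-costed variants look like this (n=10 picked arbitrarily):

alter session set optimizer_mode = first_rows_10;
-- or at statement level: /*+ first_rows(10) */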

The 12c documentation is clearer about this:

FIRST_ROWS is available for backward compatibility and plan stability; use FIRST_ROWS_n instead.

But I don’t think we should attach too much significance to the fact that that statement is not in the 11g documentation.

Here’s the problem plan.

Pre upgrade, snippet of the plan looked as per below.
Please note position of T1.

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   1 |  SORT ORDER BY                              |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   2 |   NESTED LOOPS OUTER                        |                             |    34 | 13668 |  3002   (1)| 00:00:22 |       |       |
|   3 |    NESTED LOOPS OUTER                       |                             |    34 | 12648 |  2968   (1)| 00:00:22 |       |       |
|   4 |     NESTED LOOPS OUTER                      |                             |    34 | 12274 |  2935   (1)| 00:00:22 |       |       |
|   5 |      NESTED LOOPS OUTER                     |                             |    34 | 11900 |  2867   (1)| 00:00:21 |       |       |
|   6 |       NESTED LOOPS OUTER                    |                             |    34 | 11628 |  2833   (1)| 00:00:21 |       |       |
|   7 |        NESTED LOOPS OUTER                   |                             |    34 | 10404 |  2799   (1)| 00:00:21 |       |       |
|   8 |         NESTED LOOPS OUTER                  |                             |    34 |  8228 |  2765   (1)| 00:00:20 |       |       |
|   9 |          NESTED LOOPS                       |                             |    34 |  6732 |  2731   (1)| 00:00:20 |       |       |
|  10 |           NESTED LOOPS                      |                             |    69 | 12282 |  2524   (1)| 00:00:19 |       |       |
|  11 |            NESTED LOOPS                     |                             |    69 |  9591 |  2386   (1)| 00:00:18 |       |       |
|* 12 |             TABLE ACCESS FULL               | T1                          |  1658 |   213K|   725   (3)| 00:00:06 |       |       |
|* 13 |             TABLE ACCESS BY INDEX ROWID     | T2                          |     1 |     7 |     1   (0)| 00:00:01 |       |       |
|* 14 |              INDEX UNIQUE SCAN              | T2_U1                       |     1 |       |     0   (0)| 00:00:01 |       |       |
|  15 |            TABLE ACCESS BY INDEX ROWID      | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 16 |             INDEX UNIQUE SCAN               | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|  17 |           TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |     1 |    20 |     3   (0)| 00:00:01 | ROWID | ROWID |
|* 18 |            INDEX RANGE SCAN                 | T4_N1                       |     1 |       |     2   (0)| 00:00:01 |       |       |

Post upgrade, plan has degraded to:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS OUTER                          |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS OUTER                         |                             |    69 | 26979 |   508M  (3)|999:59:59 |       |       |
|   3 |    NESTED LOOPS OUTER                        |                             |    69 | 26220 |   508M  (3)|999:59:59 |       |       |
|   4 |     NESTED LOOPS OUTER                       |                             |    69 | 23184 |   508M  (3)|999:59:59 |       |       |
|   5 |      NESTED LOOPS OUTER                      |                             |    69 | 22632 |   508M  (3)|999:59:59 |       |       |
|   6 |       NESTED LOOPS OUTER                     |                             |    69 | 20562 |   508M  (3)|999:59:59 |       |       |
|   7 |        NESTED LOOPS OUTER                    |                             |    69 | 16146 |   508M  (3)|999:59:59 |       |       |
|   8 |         NESTED LOOPS                         |                             |    69 | 13662 |   508M  (3)|999:59:59 |       |       |
|   9 |          NESTED LOOPS                        |                             |  1675 |   312K|   508M  (3)|999:59:59 |       |       |
|  10 |           NESTED LOOPS                       |                             |   703K|    39M|  3476K  (1)| 06:54:55 |       |       |
|* 11 |            TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |   703K|    13M|  2068K  (1)| 04:06:54 | ROWID | ROWID |
|  12 |             INDEX FULL SCAN                  | T4_N2                       |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|  13 |            TABLE ACCESS BY INDEX ROWID       | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 14 |             INDEX UNIQUE SCAN                | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|* 15 |           TABLE ACCESS FULL                  | T1                          |     1 |   132 |   718   (3)| 00:00:06 |       |       |

In the post-upgrade plan, the most pertinent part was step 15.

A FULL TABLE SCAN as the inner rowsource of a NESTED LOOP – in other words, for every row in the driving rowsource, do a full table scan.

Standard reaction to such a degradation is to test the SQL with OPTIMIZER_FEATURES_ENABLE (OFE).

With OFE set either via OPT_PARAM hint or via ALTER SESSION then the SQL reverted to pre-upgrade plan.
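i.e. something along these lines (the version value here is illustrative – whatever the pre-upgrade release was):

alter session set optimizer_features_enable = '11.2.0.3';
-- or per statement: /*+ opt_param('optimizer_features_enable' '11.2.0.3') */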

This is good!

But… how much confidence can we have that this actually fixed the problem?

It would be great if we could find a bug which was definitively responsible for this change in execution plan.

OFE is an umbrella but it doesn’t turn off ALL fixes/optimizer features in a release.

Many of those fixes covered by the OFE umbrella are represented in V$SYSTEM_FIX_CONTROL

So, I went looking for anything which sounded related, went back to default 11.2.0.4 OFE and started turning off the BUGNOs one-by-one – no difference, still a bad plan.

Then still at OFE 11.2.0.4, I turned off all the BUGNOs…. no difference, still a bad plan.
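For reference, this sort of fiddling looks like the following (the bug number in the ALTER SESSION is purely illustrative):

select bugno, value, description
from   v$system_fix_control
where  optimizer_feature_enable = '11.2.0.4';

alter session set "_fix_control" = '12345678:0';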

At this point, my confidence in OFE as a fix for this problem was starting to waver.

So, I tried to reduce the problem SQL to its smallest form that still represented the problem.

alter session set optimizer_mode = first_rows;
alter session set optimizer_features_enable = '11.2.0.4';

 explain plan for
 SELECT *
 FROM   T1
 ,      T3
 ,      T4
 WHERE  T1.COL1             = T3.COL1
 AND    T3.COL2             = T4.COL2
 AND    T1.DATE_COL          < :dt;

select * from table(dbms_xplan.display(format=>'+OUTLINE'));

This distilled statement reproduced the issue at the heart of the original statement:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS                        |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS                       |         |  2237K|   795M|  6551K  (1)| 13:01:53 |       |       |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID| T4      |  2240K|   188M|  2068K  (1)| 04:06:53 | ROWID | ROWID |
|   4 |     INDEX FULL SCAN                  | T4_N2   |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|   5 |    TABLE ACCESS BY INDEX ROWID       | T3      |     1 |   285 |     2   (0)| 00:00:01 |       |       |
|*  6 |     INDEX UNIQUE SCAN                | T3_U1   |     1 |       |     1   (0)| 00:00:01 |       |       |
|*  7 |   TABLE ACCESS FULL                  | T1      |     1 |   397 |   718   (3)| 00:00:06 |       |       |
----------------------------------------------------------------------------------------------------------------

Is this test case a 100% indication that OFE wasn’t applicable to the original problem?
Not sure.
But at this point I had little confidence that OFE provided any sort of reliable fix to the original SQL statement.

I don’t know what the internal heuristics are/were for FIRST_ROWS.

But that is a stupid place for a full table scan regardless of the number of rows expected to return.

If we remove one of the T1 predicates above and the various row estimates go up then the FTS moves to a driving position.

I could/should get a 10053 trace but FIRST_ROWS shouldn’t be in use in the first place.

Solution – hunt down where the app code is doing the FIRST_ROWS thing and exterminate.
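A sketch of one way to spot where it is coming from – look for cursors parsed under that mode:

select sql_id, optimizer_mode, parsing_schema_name, module
from   v$sql
where  optimizer_mode = 'FIRST_ROWS';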

All the other possible OPTIMIZER_MODE modes gave a plan for the original SQL which started like this:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   1 |  SORT ORDER BY                              |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   2 |   NESTED LOOPS OUTER                        |      |    35 | 14105 |  1385   (2)| 00:00:10 |       |       |
|*  3 |    HASH JOIN OUTER                          |      |    35 | 12565 |  1350   (2)| 00:00:10 |       |       |
|*  4 |     HASH JOIN OUTER                         |      |    35 | 11305 |  1315   (2)| 00:00:10 |       |       |
|   5 |      NESTED LOOPS OUTER                     |      |    35 |  9065 |  1280   (2)| 00:00:10 |       |       |
|   6 |       NESTED LOOPS OUTER                    |      |    35 |  8015 |  1245   (2)| 00:00:09 |       |       |
|   7 |        NESTED LOOPS OUTER                   |      |    35 |  7735 |  1210   (2)| 00:00:09 |       |       |
|   8 |         NESTED LOOPS OUTER                  |      |    35 |  7350 |  1176   (2)| 00:00:09 |       |       |
|   9 |          NESTED LOOPS                       |      |    35 |  6965 |  1106   (2)| 00:00:08 |       |       |
|  10 |           NESTED LOOPS                      |      |    69 | 12351 |   899   (3)| 00:00:07 |       |       |
|* 11 |            HASH JOIN                        |      |    69 |  9660 |   761   (3)| 00:00:06 |       |       |
|* 12 |             TABLE ACCESS FULL               | T2   |   237 |  1659 |    36   (6)| 00:00:01 |       |       |
|* 13 |             TABLE ACCESS FULL               | T1   |  1671 |   217K|   725   (3)| 00:00:06 |       |       |

So…. when optimizer upgrade problems strike, you should always try OFE.

But it’s also worth further investigation to confirm that OFE really is a reliable fix for the problem.


SQL Tuning: Thinking in Sets / How and When to be Bushy

Thu, 2015-11-05 11:48

Below is a SQL statement from a performance problem I was looking at the other day.

This is a real-world bit of SQL which has been slightly simplified and sanitised but, I hope, without losing the real-worldliness of it and the points driving this article.

You don’t really need to be familiar with the data or table structures (I wasn’t) as this is a commentary on SQL structure and why sometimes a rewrite is the best option.

SELECT bd.trade_id
,      bdp.portfolio_id
,      bd.deal_id 
,      bd.book_id
,      pd.deal_status   prev_deal_status
FROM   deals            bd
,      portfolios       bdp
,      deals            pd
,      portfolios       pdp
-- today's data
WHERE  bd.business_date         = :t_date
AND    bd.src_bus_date          < :t_date
AND    bd.type                  = 'Trade'
AND    bdp.ref_portfolio_id     = bd.book_id
-- yesterday's data
AND    pd.business_date         = :y_date
AND    pd.type                  = 'Trade'
AND    pdp.ref_portfolio_id     = pd.book_id
-- some join columns
AND    bd.trade_id              = pd.trade_id
AND    bdp.portfolio_id         = pdp.portfolio_id;

There is no particular problem with how the SQL statement is written per se.

It is written in what seems to be a standard developer way.

Call it the “lay everything on the table” approach.

This is a common developer attitude:

“Let’s just write a flat SQL structure and let Oracle figure the best way out.”

Hmmm… Let’s look at why this can be a problem.

First, what is the essential requirement of the SQL?

Compare information (deal status) that we had yesterday for a subset of deals/trades

Something like that anyway…

So … What is the problem?

The Optimizer tends to rewrite and transform any SQL we give it anyway and tries to flatten it out.

The SQL above is already flat so isn’t that a good thing? Shouldn’t there be less work for the optimizer to do?

No, not necessarily. Flat SQL immediately restricts our permutations.

The problem comes with how Oracle can take this flat SQL and join the relevant row sources to efficiently get to the relevant data.

Driving Rowsource

Let’s assume that we should drive from today’s deal statuses (where we actually drive from will depend on what the optimizer estimates / costs).

SELECT ...
FROM   deals            bd
,      portfolio        bdp
...
-- today's data
WHERE  bd.business_date         = :t_date
AND    bd.src_bus_date          < :t_date
AND    bd.type                  = 'Trade'
AND    bdp.ref_portfolio_id     = bd.book_id
....
Where do we go from here?

We want to join from today’s deals to yesterdays deals.

But each of the two sets of deals data is established via a two-table join (DEALS & PORTFOLIOS).

[image: Venn-style diagram of today’s deals/portfolios set joined to yesterday’s deals/portfolios set]

We want to join on TRADE_ID which comes from the two DEALS tables and PORTFOLIO_ID which comes from the two PORTFOLIOS tables.

SELECT ...
FROM   ...
,      deals            pd
,      portfolios       pdp
WHERE  ...
-- yesterday's data
AND    pd.business_date         = :y_date
AND    pd.type                  = 'Trade'
AND    pdp.ref_portfolio_id     = pd.book_id
...

And joined to via:

AND    bd.trade_id              = pd.trade_id
AND    bdp.portfolio_id         = pdp.portfolio_id

So from our starting point of today’s business deals, we can either go to PD or to PDP, but not to both at the same time.

Hang on? What do you mean not to both at the same time?

For any multi-table join involving more than two tables, the Optimizer evaluates the different join tree permutations.

Left-Deep Tree

Oracle has a tendency to choose what is called a left-deep tree.

If you think about a join between two rowsources (left and right), a left-deep tree is one where the second child (the right input) is always a table.

NESTED LOOPS are always left-deep.

HASH JOINS can be left-deep or right-deep (normally left-deep as already mentioned).

Zigzags are also possible, a mixture of left-deep and right-deep.

Below is an image of a left-deep tree based on the four table join above.

[image: left-deep join tree for the four-table join]

Here is an execution plan which that left-deep tree might represent:

---------------------------------------------------------------
| Id  | Operation                               | Name        |
---------------------------------------------------------------
|   0 | SELECT STATEMENT                      |               |
|   1 |  NESTED LOOPS                         |               |
|   2 |   NESTED LOOPS                        |               |
|   3 |    NESTED LOOPS                       |               |
|   4 |     NESTED LOOPS                      |               |
|*  5 |      TABLE ACCESS BY ROWID            | DEALS         |
|*  6 |       INDEX RANGE SCAN                | DEALS_IDX01   |
|   7 |      TABLE ACCESS BY INDEX ROWID      | PORTFOLIOS    |
|*  8 |       INDEX UNIQUE SCAN               | PK_PORTFOLIOS |
|*  9 |     TABLE ACCESS BY INDEX ROWID       | DEALS         |
|* 10 |      INDEX RANGE SCAN                 | DEALS_IDX01   |
|* 11 |    INDEX UNIQUE SCAN                  | PK_PORTFOLIOS |
|* 12 |   TABLE ACCESS BY INDEX ROWID         | PORTFOLIOS    |
---------------------------------------------------------------

Predicate Information (identified by operation id): 
--------------------------------------------------- 
 
   5 - filter("BD"."TYPE"='Trade' AND "BD"."SRC_BUS_DATE"<:t_date) 
   6 - access("BD"."BUSINESS_DATE"=:t_date) 
   8 - access("BD"."BOOK_ID"="BDP"."REF_PORTFOLIO_ID") 
   9 - filter(("BD"."TYPE"='Trade' AND "BD"."TRADE_ID"="PD"."TRADE_ID")) 
  10 - access("PD"."BUSINESS_DATE"=:y_date) 
  11 - access("PD"."BOOK_ID"="PDP"."REF_PORTFOLIO_ID") 
  12 - filter("BDP"."PORTFOLIO_ID"="PDP"."PORTFOLIO_ID")

Right-Deep Tree

A right-deep tree is one where the first child, the left input, is a table.

Illustration not specific to the SQL above:

[image: right-deep join tree]

Bushy Tree

For this particular SQL, this is more what we are looking for:

[image: bushy join tree]

The essence of the problem is that we cannot get what is called a bushy join, not with the original flat SQL.

The Optimizer cannot do this by default. And this isn’t an approach that we can get at by hinting (nor would we want to if we could, of course!).

Rewrite Required

To get this bushy plan, we need to rewrite our SQL to be more explicit around the set-based approach required.

WITH subq_curr_deal AS
     (SELECT /*+ no_merge */
             bd.trade_id
      ,      bd.deal_id
      ,      bd.book_id
      ,      bdp.portfolio_id
      FROM   deals      bd
      ,      portfolios bdp
      WHERE  bd.business_date         = :t_date
      AND    bd.src_bus_date          < :t_date
      AND    bd.type                  = 'Trade'
      AND    bdp.ref_portfolio_id     = bd.book_id)
,    subq_prev_deal AS
     (SELECT /*+ no_merge */
             pd.trade_id 
      ,      pd.deal_status
      ,      pdp.portfolio_id
      FROM   deals      pd
      ,      portfolios pdp
      WHERE  pd.business_date         = :y_date
      AND    pd.type                  = 'Trade'
      AND    pdp.ref_portfolio_id     = pd.book_id)
SELECT cd.trade_id
,      cd.portfolio_id
,      cd.deal_id
,      cd.book_id 
,      pd.deal_status prev_deal_status
FROM   subq_curr_deal cd
,      subq_prev_deal pd
WHERE  cd.trade_id             = pd.trade_id
AND    cd.portfolio_id         = pd.portfolio_id;

How exactly does the rewrite help?

By writing the SQL deliberately with this structure, by using WITH to create subqueries in conjunction with no_merge, we are deliberately forcing the bushy join.

This is an example execution plan that this bushy tree might represent.

----------------------------------------------------------
| Id  | Operation                        | Name          |
----------------------------------------------------------
|   0 | SELECT STATEMENT                 |               |
|*  1 |  HASH JOIN                       |               |
|   2 |   VIEW                           |               |
|   3 |    NESTED LOOPS                  |               |
|   4 |     NESTED LOOPS                 |               |
|*  5 |      TABLE ACCESS BY INDEX ROWID | DEALS         |
|*  6 |       INDEX RANGE SCAN           | DEALS_IDX01   |
|*  7 |      INDEX UNIQUE SCAN           | PK_PORTFOLIOS |
|   8 |     TABLE ACCESS BY INDEX ROWID  | PORTFOLIOS    |
|   9 |   VIEW                           |               |
|  10 |    NESTED LOOPS                  |               |
|  11 |     NESTED LOOPS                 |               |
|* 12 |      TABLE ACCESS BY INDEX ROWID | DEALS         |
|* 13 |       INDEX RANGE SCAN           | DEALS_IDX01   |
|* 14 |      INDEX UNIQUE SCAN           | PK_PORTFOLIOS |
|  15 |     TABLE ACCESS BY INDEX ROWID  | PORTFOLIOS    |
----------------------------------------------------------
 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
                                 
   1 - access("CD"."TRADE_ID"="PD"."TRADE_ID" AND "CD"."PORTFOLIO_ID"="PD"."PORTFOLIO_ID")
   5 - filter(("BD"."TYPE"='Trade' AND "BD"."SRC_BUS_DATE"<:t_date)) 
   6 - access("BD"."BUSINESS_DATE"=:t_date ) 
   7 - access("BD"."BOOK_ID"="BDP"."REF_PORTFOLIO_ID") 
  12 - filter("PD"."TYPE"='Trade') 
  13 - access("PD"."BUSINESS_DATE"=:y_date) 
  14 - access("PDP"."REF_PORTFOLIO_ID"="PD"."BOOK_ID")

Is this a recommendation to go use WITH everywhere?

No.

What about the no_merge hint?

No.
The no_merge hint is a tricky one. This is not necessarily a recommendation but its usage here prevents the Optimizer from flattening. I often find it goes hand-in-hand with this sort of deliberately structured SQL for that reason, and similar goes for push_pred.

Do developers need to know about left deep, right deep and bushy?

No, not at all.

Takeaways?

It helps to think in sets and about what sets of data you are joining and recognise when SQL should be deliberately structured.

Further Reading

http://www.oaktable.net/content/right-deep-left-deep-and-bushy-joins
https://tonyhasler.wordpress.com/2008/12/27/bushy-joins/
https://www.toadworld.com/platforms/oracle/b/weblog/archive/2014/06/16/hitchhikers-guide-to-explain-plan-8
https://jonathanlewis.wordpress.com/2007/01/24/left-deep-trees/
https://antognini.ch/top/


What AWR isn’t telling you

Fri, 2015-10-16 14:08

It is well-known that AWR, and Statspack before it, takes snapshots of V$ views (or rather the underlying objects) to produce the data in AWR.

It is also well-known that, when considering sql and its statistics in the shared pool, if something big hitting happens but the big-hitter is no longer in the shared pool by the time of the snapshot, then it can’t be recorded in your AWR picture of activity.

But like many things supposedly well-known, it can still come back and remind you that you’ve forgotten or overlooked this behaviour.

Here is a little illustration which happened earlier this week.

This week I’ve been looking closely at activity which happens on a particular DB between 16:00 and 17:00 because the IO subsystem is showing signs of stress and the timings for ‘Average Synchronous Single-Block Read Latency’ (V$SYSMETRIC_SUMMARY / DBA_HIST_SYSMETRIC_SUMMARY) have pushed above our amber warning levels.

Don’t get me started but our amber level for a single block read is 20ms!
Way too high for my liking for such sustained average but that is apparently the SLA from the SAN.

Why do we have such an alert?
The purpose is two-fold.

  1. It can sometimes be a warning that some of our IO-sensitive batch work might show some performance degradations, threatening SLAs
  2. In the past, it has been an effective warning that a sql statement executed concurrently in multiple threads from the applications has switched to a "bad" execution plan whose impact is system-wide (normally heavy concurrent direct path reads).

So… I was hovering around this system during the relevant timeframe looking at what was going on.

And I observed some heavy hitting queries running in parallel originating from another system (i.e. not my application).

There were two executions of one sql id (0cu2sm062jutc) executing concurrently with a single execution of another sql statement.

The top-level Real-Time SQL Monitoring (RTSM) execution metrics for these three executions were all very similar to this:

Executing for about 10 minutes, in parallel doing a lot of physical direct path reads (38GB) and accumulating over 1 hour in parallel processing time.

 Status              :  DONE (ALL ROWS)          
 Instance ID         :  2                        
 SQL ID              :  0cu2sm062jutc            
 SQL Execution ID    :  33554432                 
 Execution Started   :  10/13/2015 16:03:35      
 First Refresh Time  :  10/13/2015 16:03:37      
 Last Refresh Time   :  10/13/2015 16:13:19      
 Duration            :  584s                     
 Module/Action       :  JDBC Thin Client/-       
 Program             :  JDBC Thin Client         
 Fetch Calls         :  257                      

Global Stats
===================================================================================================================================
| Elapsed |   Cpu   |    IO    | Application | Concurrency | Cluster  |  Other   | Fetch | Buffer | Read  | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Waits(s)   | Waits(s) | Waits(s) | Calls |  Gets  | Reqs  | Bytes | Reqs  | Bytes |
===================================================================================================================================
|    4041 |     252 |     3782 |        0.06 |        1.75 |     0.97 |     3.95 |   257 |     6M | 75051 |  38GB |  1070 | 125MB |
===================================================================================================================================

And then whilst I was fiddling with various queries against V$SQL the queries in question disappeared from V$SQL.

No big surprise of course, stuff is liable to be released from the library cache whenever it sees fit.

But then the remembrance penny hit home and I was left thinking that it was no wonder that I couldn’t see this regularly when looking retrospectively at this timeframe via AWR.

But this cumulative execution time above is larger than the number 3 SQL statement in AWR by elapsed time … so I’m not making this up. It’s significant.

This got me thinking – how can we try to identify heavy-hitting SQL which is being missed by AWR – ASH surely? – and what can we do to help – COLORED SQL?

Here is my first attempt at such an ASH query, trying to find queries which have taken longer than 5 minutes in a single execution in a single snapshot period but which aren’t in AWR.

SELECT * 
FROM   (SELECT h.dbid
        ,      h.instance_number
        ,      h.snap_id
        ,      h.sql_id
        ,      h.sql_exec_id
        ,      h.sql_exec_start
        ,      h.qc_instance_id||'::'||h.qc_session_id||'::'||h.qc_session_serial# px_details
        ,      COUNT(*)
        ,      COUNT(DISTINCT h.session_id)
        --,      MIN(h.snap_id) min_snap
        --,      MAX(h.snap_id) max_snap
        ,      MIN(h.sample_time)
        ,      MAX(h.sample_time)
        ,    ((EXTRACT(DAY    FROM (MAX(h.sample_time) - MIN(h.sample_time)))*86400)
            + (EXTRACT(HOUR   FROM (MAX(h.sample_time) - MIN(h.sample_time)))*3600)
            + (EXTRACT(MINUTE FROM (MAX(h.sample_time) - MIN(h.sample_time)))*60)
            + (EXTRACT(SECOND FROM (MAX(h.sample_time) - MIN(h.sample_time))))) duration
        FROM   dba_hist_active_sess_history h
        ,      dba_hist_snapshot            s
        WHERE  h.dbid            = s.dbid
        AND    h.snap_id         = s.snap_id
        AND    h.instance_number = s.instance_number
        AND    sql_id           IS NOT NULL
        AND    sql_exec_id      IS NOT NULL
        AND    NOT EXISTS (SELECT 1
                           FROM   dba_hist_sqlstat st
                           WHERE  st.dbid            = h.dbid
                           AND    st.snap_id         = h.snap_id --BETWEEN xxx.min_snap AND xxx.max_snap
                           AND    st.instance_number = h.instance_number
                           AND    st.sql_id          = h.sql_id)
        GROUP BY 
               h.dbid
        --
        ,      h.snap_id
        --
        ,      h.instance_number
        ,      h.sql_id
        ,      h.sql_exec_id
        ,      h.sql_exec_start
        ,      h.qc_instance_id
        ,      h.qc_session_id
        ,      h.qc_session_serial#) xxx
WHERE  duration > 600
ORDER BY snap_id DESC, duration DESC;

The main problem with this query is that DURATION does not tell us how long the query has been active in the database.

We could have a query which has been active all that time but was doing relatively little in the DB.

I had a quick look (again) at TM_DELTA_TIME and TM_DELTA_DB_TIME but they are still unreliable – in some of the rows it looks ok, in others not.

Let’s forget that and pretend that the SQL above works ok.

Perhaps we can add a limit WHERE the COUNT(*) > threshold because otherwise this is currently giving me 100s of statements in my production database.

That gives me a token 166 statements over the past 30 days.

And that’s a problem because my intention was to use this to identify SQL statements to add to AWR via ADD_COLORED_SQL.
And that has a limit of 100 statements.
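For reference, colouring a statement is just a one-liner (using the sql_id from earlier as an example):

exec dbms_workload_repository.add_colored_sql(sql_id => '0cu2sm062jutc');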

Ah… but no, the other penny has just dropped whilst writing this…

If I add this SQL as colored SQL, that will make no difference. That just means AWR will record that SQL if it’s there.

But if it’s not in the shared pool at snapshot time, it’s not in the shared pool so it’s not in AWR.

Duh!

It’s almost as if we need a process which, if a sql statement has breached a certain threshold, won’t let it be flushed until MMON has written it to AWR.

No, that’s not right. That would present other problems.

I could shorten the AWR snapshot interval but I’m not convinced.
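If I did, it would just be something like this (interval in minutes):

exec dbms_workload_repository.modify_snapshot_settings(interval => 30);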

I think this is an oft-overlooked yet significant shortcoming.

So, for the moment at least, I’m left with what I think is a problem and I can’t think of a good solution… can you?

Or are you thinking mountain vs molehill?


Quick Cross-Period AWR Comparison

Thu, 2015-10-15 04:43

Here’s a query which I find useful in order to have a very quick comparison across AWR snapshots of the high level time model statistics.
The numbers should match those in the associated section in the AWR report.

If you are obsessive, compulsive even, about tuning then you may see some blips here and there which then encourage you to dive into the AWR detail for that snapshot.

Or quite often I get in in the morning and there might be an email about slow overnight processes and generic “database has been slow” enquiries and before I start hunting around to prove or disprove the allegations, this can be a useful frame.

WITH subq_snaps AS
(SELECT dbid                dbid
 ,      instance_number     inst
 ,      snap_id             e_snap
 ,      lag(snap_id) over (partition by instance_number, startup_time order by snap_id) b_snap
 ,      TO_CHAR(begin_interval_time,'D') b_day
 ,      TO_CHAR(begin_interval_time,'DD-MON-YYYY HH24:MI') b_time
 ,      TO_CHAR(end_interval_time,'HH24:MI')   e_time
 ,    ((extract(day    from (end_interval_time - begin_interval_time))*86400)
     + (extract(hour   from (end_interval_time - begin_interval_time))*3600)
     + (extract(minute from (end_interval_time - begin_interval_time))*60)
     + (extract(second from (end_interval_time - begin_interval_time)))) duration
 FROM   dba_hist_snapshot)
SELECT ss.inst
,      ss.b_snap
,      ss.e_snap
,      ss.b_time
,      ss.e_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END)/1000000/60,2),'999999990.99')                                  db_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END)/(ss.duration*1000000),1),'999999990.99')        aas
,      (SELECT round(average,2)
        FROM   dba_hist_sysmetric_summary sm
        WHERE  sm.dbid            = ss.dbid
        AND    sm.snap_id         = ss.e_snap
        AND    sm.instance_number = ss.inst
        AND    sm.metric_name     = 'Average Synchronous Single-Block Read Latency'
        AND    sm.group_id        = 2)                                                                                                                   assbl
,      (SELECT round(average,2)
        FROM   dba_hist_sysmetric_summary sm
        WHERE  sm.dbid            = ss.dbid
        AND    sm.snap_id         = ss.e_snap
        AND    sm.instance_number = ss.inst
        AND    sm.metric_name     = 'Host CPU Utilization (%)'
        AND    sm.group_id        = 2)                                                                                                                   cpu_util
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'DB CPU' THEN em.value - bm.value END)/1000000,2),'999999990.99')                                      db_cpu
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'sql execute elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')                    sql_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'PL/SQL execution elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')               plsql_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'parse time elapsed' THEN em.value - bm.value END)/1000000,2),'999999990.00')                          parse_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'failed parse elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')                   failed_parse
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'hard parse (sharing criteria) elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')  hard_parse_sharing
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'RMAN cpu time (backup/restore)' THEN em.value - bm.value END)/1000000,2),'999999990.99')              rman_cpu
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'connection management call elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')     connection_mgmt
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'sequence load elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')                  sequence_load
,      TO_CHAR(ROUND(100*MAX(CASE WHEN bm.stat_name = 'DB CPU' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           db_cpu_perc
,      TO_CHAR(ROUND(100*MAX(CASE WHEN bm.stat_name = 'sql execute elapsed time' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           sql_time_perc
,      TO_CHAR(ROUND(100*MAX(CASE WHEN bm.stat_name = 'PL/SQL execution elapsed time' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           plsql_time_perc
,      TO_CHAR(ROUND(100*MAX(CASE WHEN bm.stat_name = 'parse time elapsed' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           parse_time_perc
,      TO_CHAR(ROUND(100*MAX(CASE WHEN bm.stat_name = 'failed parse elapsed time' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           failed_parse_perc
,      TO_CHAR(ROUND(100*MAX(CASE WHEN bm.stat_name = 'hard parse (sharing criteria) elapsed time' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           hard_parse_sharing_perc
,      TO_CHAR(ROUND(100*MAX(CASE WHEN bm.stat_name = 'RMAN cpu time (backup/restore)' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           rman_cpu_perc
FROM  subq_snaps              ss
,     dba_hist_sys_time_model em
,     dba_hist_sys_time_model bm
WHERE bm.dbid                   = ss.dbid
AND   bm.snap_id                = ss.b_snap
AND   bm.instance_number        = ss.inst
AND   em.dbid                   = ss.dbid
AND   em.snap_id                = ss.e_snap
AND   em.instance_number        = ss.inst
AND   bm.stat_id                = em.stat_id
GROUP BY
       ss.dbid
,      ss.inst
,      ss.b_day
,      ss.b_snap
,      ss.e_snap
,      ss.b_time
,      ss.e_time
,      ss.duration
--HAVING b_day NOT IN (6,7)
--AND    inst = 2
--AND b_snap = 18673
--AND    e_time = '17:00'
ORDER BY b_snap DESC;

Also, similar query for comparing the different IO metrics.
Again should match numbers in AWR.

WITH subq_snaps AS
(SELECT dbid                dbid
 ,      instance_number     inst
 ,      snap_id             e_snap
 ,      lag(snap_id) over (partition by instance_number, startup_time order by snap_id) b_snap
 ,      TO_CHAR(begin_interval_time,'D') b_day
 ,      TO_CHAR(begin_interval_time,'DD-MON-YYYY HH24:MI') b_time
 ,      TO_CHAR(end_interval_time,'HH24:MI')   e_time
 ,    ((extract(day    from (end_interval_time - begin_interval_time))*86400)
     + (extract(hour   from (end_interval_time - begin_interval_time))*3600)
     + (extract(minute from (end_interval_time - begin_interval_time))*60)
     + (extract(second from (end_interval_time - begin_interval_time)))) duration
 FROM   dba_hist_snapshot)
,    io_stats AS
(SELECT ss.*
 ,      bv.event_name
 ,      ev.time_waited_micro_fg - bv.time_waited_micro_fg time_waited_micro
 ,      ev.total_waits_fg       - bv.total_waits_fg       waits
 FROM   subq_snaps            ss
 ,      dba_hist_system_event bv
 ,      dba_hist_system_event ev
 WHERE  bv.dbid                   = ss.dbid
 AND    bv.snap_id                = ss.b_snap
 AND    bv.instance_number        = ss.inst
 AND    bv.event_name            IN ('db file sequential read','direct path read','direct path read temp','db file scattered read','db file parallel read')
 AND    ev.dbid                   = ss.dbid
 AND    ev.snap_id                = ss.e_snap
 AND    ev.instance_number        = ss.inst
 AND    ev.event_id               = bv.event_id)
SELECT io.dbid
,      io.inst
,      io.b_snap
,      io.e_snap
,      io.b_time
,      io.e_time
,      (SELECT ROUND(average,2)
        FROM   dba_hist_sysmetric_summary sm
        WHERE  sm.dbid            = io.dbid
        AND    sm.snap_id         = io.e_snap
        AND    sm.instance_number = io.inst
        AND    sm.metric_name     = 'Average Synchronous Single-Block Read Latency'
        AND    sm.group_id        = 2) assbl
,      MAX(CASE WHEN event_name = 'db file sequential read' THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END) END) single_waits
,      MAX(CASE WHEN event_name = 'db file scattered read'  THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END) END) multi_waits
,      MAX(CASE WHEN event_name = 'db file parallel read'   THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END) END) prefch_wait
,      MAX(CASE WHEN event_name = 'direct path read'        THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END) END) direct_waits
,      MAX(CASE WHEN event_name = 'direct path read temp'   THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END) END) temp_waits
,      MAX(CASE WHEN event_name = 'db file sequential read' THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END/duration) END) iops_single
,      MAX(CASE WHEN event_name = 'db file sequential read' THEN ROUND(CASE WHEN time_waited_micro/1000/1000 < 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) single_secs_total
,      MAX(CASE WHEN event_name = 'db file sequential read' THEN ROUND((time_waited_micro/1000)/NULLIF(waits,0)) END) single_avg
,      MAX(CASE WHEN event_name = 'db file scattered read'  THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END/duration) END) iops_multi
,      MAX(CASE WHEN event_name = 'db file scattered read'  THEN ROUND(CASE WHEN time_waited_micro/1000/1000 < 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) multi_secs_total
,      MAX(CASE WHEN event_name = 'db file scattered read'  THEN ROUND((time_waited_micro/1000)/NULLIF(waits,0)) END) multi_avg
,      MAX(CASE WHEN event_name = 'db file parallel read'   THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END/duration) END) iops_prefch
,      MAX(CASE WHEN event_name = 'db file parallel read'   THEN ROUND(CASE WHEN time_waited_micro/1000/1000 < 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) prefch_secs_total
,      MAX(CASE WHEN event_name = 'db file parallel read'   THEN ROUND((time_waited_micro/1000)/NULLIF(waits,0)) END) prefch_avg
,      MAX(CASE WHEN event_name = 'direct path read'        THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END/duration) END) iops_direct
,      MAX(CASE WHEN event_name = 'direct path read'        THEN ROUND(CASE WHEN time_waited_micro/1000/1000 < 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) direct_secs_total
,      MAX(CASE WHEN event_name = 'direct path read'        THEN ROUND((time_waited_micro/1000)/NULLIF(waits,0)) END) direct_avg
,      MAX(CASE WHEN event_name = 'direct path read temp'   THEN ROUND(CASE WHEN waits < 0 THEN NULL ELSE waits END/duration) END) iops_temp
,      MAX(CASE WHEN event_name = 'direct path read temp'   THEN ROUND(CASE WHEN time_waited_micro/1000/1000 < 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) temp_secs_total
,      MAX(CASE WHEN event_name = 'direct path read temp'   THEN ROUND((time_waited_micro/1000)/NULLIF(waits,0)) END) temp_avg
FROM   io_stats io
GROUP BY
       io.dbid
,      io.inst
,      io.b_day
,      io.b_snap
,      io.e_snap
,      io.b_time
,      io.e_time
,      io.duration
HAVING b_day NOT IN (6,7)
AND    inst = 2
--AND b_snap = 18673
--AND    e_time = '17:00'
ORDER BY b_snap DESC;

Finally, one of the queries I used the most for quick checks – comparing top SQL in AWR across periods:

SELECT x.*, (SELECT sql_text from dba_hist_sqltext t where t.sql_id = x.sql_id and rownum = 1) txt
FROM (
SELECT sn.snap_id
,      TO_CHAR(sn.end_interval_time,'DD-MON-YYYY HH24:MI') dt
,      st.sql_id
,      st.instance_number
,      st.parsing_schema_name
,      st.plan_hash_value
,      SUM(st.fetches_delta) fch
,      SUM(rows_processed_delta) rws
,      SUM(executions_delta)     execs
,      ROUND(SUM(elapsed_time_delta)/1000/1000)   elp
,      ROUND(SUM(elapsed_time_delta)/1000/1000/NVL(NULLIF(SUM(executions_delta),0),1),2)   elpe
,      ROUND(SUM(cpu_time_delta)/1000/1000)       cpu
,      SUM(buffer_gets_delta)    gets
,      ROUND(SUM(iowait_delta)/1000/1000)         io
,      ROUND(SUM(clwait_delta)/1000/1000)         cl
,      ROUND(SUM(ccwait_delta)/1000/1000)         cc
,      ROUND(SUM(apwait_delta)/1000/1000)         ap
,      ROUND(SUM(plsexec_time_delta)/1000/1000)   pl
,      ROUND(SUM(disk_reads_delta))         disk_reads
,      ROUND(SUM(direct_writes_delta))        direct_writes
,      ROW_NUMBER() over (PARTITION BY sn.dbid, sn.snap_id, st.instance_number
                          ORDER BY SUM(elapsed_time_delta) desc) rn
FROM   dba_hist_snapshot sn
,      dba_hist_sqlstat  st
WHERE  st.dbid            = sn.dbid
AND    st.snap_id         = sn.snap_id
AND    sn.instance_number = st.instance_number
GROUP BY
       sn.dbid
,      sn.snap_id
,      sn.end_interval_time
,      st.sql_id
,      st.instance_number
,      st.parsing_schema_name
,      st.plan_hash_value
) x
WHERE rn <= 5
ORDER BY snap_id DESC, instance_number, rn;
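
When chasing a specific statement or time window, it can help to restrict the inner query. A minimal sketch of extra predicates (the sql_id is just a placeholder and the 7-day window is arbitrary):

AND    st.sql_id              = 'placeholder_sqlid'  -- placeholder sql_id of interest
AND    sn.begin_interval_time > SYSDATE - 7          -- last 7 days only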

Let me know if you spot any errors, anomalies, or obvious improvements/additions.


SQL Plan Management Choices

Tue, 2015-08-11 04:02

My thoughts on SQL plan management decision points are summarised in the flowchart above. SPM SQL Patches are also available, primarily to avoid a specific problem rather than to enforce a particular plan, and are not covered in the flowchart.
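
For reference, the manual route of pinning a known-good plan is usually a single DBMS_SPM call. A minimal sketch, where the sql_id and plan_hash_value are placeholders rather than values from a real system:

DECLARE
  l_plans PLS_INTEGER;
BEGIN
  -- Load the nominated cursor-cache plan as a FIXED baseline
  -- so the optimizer prefers it over newly captured plans.
  l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
               sql_id          => 'placeholder_sqlid',   -- placeholder
               plan_hash_value => 1234567890,            -- placeholder
               fixed           => 'YES');
  DBMS_OUTPUT.PUT_LINE(l_plans||' plan(s) loaded');
END;
/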


Epoch

Fri, 2015-06-26 06:15

Note to self, because it’s one of those date/timezone-related topics which just doesn’t seem to stick…

Epoch/Unix time – See https://en.wikipedia.org/wiki/Unix_time

Unix time (also known as POSIX time or erroneously as Epoch time) is a system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970…
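
As a quick worked example, 00:00:00 UTC on 1 January 2015 is 16,436 days after the epoch, i.e. 16,436 × 86,400 = 1,420,070,400 seconds, or 1,420,070,400,000 in milliseconds (the millisecond form is what the examples below produce).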

Firstly when converting from Oracle date or timestamp – we need to work from UTC not local time.

select systimestamp
,      ((extract( day    from systimestamp - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*24*60*60)
      + (extract( hour   from systimestamp - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60*60)
      + (extract( minute from systimestamp - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60)
      + (round(extract( second from systimestamp - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY')))))*1000
        now_epoch
from dual;

SYSTIMESTAMP                         NOW_EPOCH
----------------------------------- -------------
26-JUN-15 11.57.09.634813000 +01:00 1435319830000 

If we plug that epoch into somewhere like http://www.epochconverter.com/, we can see that it’s wrong, 1 hour ahead of where it should be:

Assuming that this timestamp is in milliseconds:
GMT: Fri, 26 Jun 2015 11:57:10 GMT
Your time zone: 26 June 2015 12:57:10 GMT+1:00 DST

That’s because we need to work in UTC using SYS_EXTRACT_UTC, i.e.

with now as
(select systimestamp                  now_ts
 ,      sys_extract_utc(systimestamp) now_utc
 ,      ((extract( day    from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*24*60*60)
       + (extract( hour   from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60*60)
       + (extract( minute from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60)
       + (round(extract( second from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY')))))*1000
         now_epoch
 from dual)
select *
from   now;

NOW_TS                              NOW_UTC             NOW_EPOCH
----------------------------------- ------------------ -------------
26-JUN-15 12.03.35.231688000 +01:00 26-JUN-15 11.03.35 1435316626000 

Better!
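
An equivalent, more compact sketch (assuming the fractional seconds can simply be discarded) is to cast the UTC timestamp to a DATE and use date arithmetic:

select round((cast(sys_extract_utc(systimestamp) as date) - date '1970-01-01') * 86400) * 1000 now_epoch
from   dual;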

This came about because there is a table storing epoch/unix time values originating from Java code; the developer said that the conversion was losing 1 hour and speculated that the DB might be “unsafe” when dealing with epoch time.

So, let’s convert this number back to a date or a timestamp and crush that notion.

with now as
(select systimestamp                  now_ts
 ,      sys_extract_utc(systimestamp) now_utc
 ,      ((extract( day    from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*24*60*60)
       + (extract( hour   from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60*60)
       + (extract( minute from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60)
       + (round(extract( second from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY')))))*1000
         now_epoch
 from dual)
select now_ts
,      now_utc
,      now_epoch
,      TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND') 
       epoch_back_to_ts
from   now;

NOW_TS                              NOW_UTC             NOW_EPOCH    EPOCH_BACK_TO_TS 
----------------------------------- ------------------ ------------- ------------------
26-JUN-15 12.09.45.671605000 +01:00 26-JUN-15 11.09.45 1435316986000 26-JUN-15 11.09.46 

Our conversion back is still in UTC, so we need to convert to local time, and there are numerous ways we might do that:

with now as
(select systimestamp                  now_ts
 ,      sys_extract_utc(systimestamp) now_utc
 ,      ((extract( day    from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*24*60*60)
       + (extract( hour   from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60*60)
       + (extract( minute from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60)
       + (round(extract( second from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY')))))*1000
         now_epoch
 from dual)
select now_ts
,      now_utc
,      now_epoch
,      TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND') 
       epoch_back_to_utc
,      CAST(
       FROM_TZ(TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND'),'UTC')
            AT TIME ZONE 'Europe/London' AS TIMESTAMP) 
       back_to_name
,      CAST(
       FROM_TZ(TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND'),'UTC')
            AT LOCAL AS TIMESTAMP) 
       back_to_local
,      CAST(
       FROM_TZ(TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND'),'UTC')
            AT TIME ZONE DBTIMEZONE AS TIMESTAMP) 
       back_to_dblocal
FROM   now;

NOW_TS                              NOW_UTC             NOW_EPOCH     EPOCH_BACK_TO_UTC  BACK_TO_NAME       BACK_TO_LOCAL      BACK_TO_DBLOCAL  
----------------------------------- ------------------ -------------- ------------------ ------------------ ------------------ ------------------
26-JUN-15 12.12.23.936868000 +01:00 26-JUN-15 11.12.23 1435317144000  26-JUN-15 11.12.24 26-JUN-15 12.12.24 26-JUN-15 12.12.24 26-JUN-15 12.12.24
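
Applied to the original scenario, converting a stored millisecond epoch column back to a local timestamp is just a combination of the same pieces. A minimal sketch, where the table and column names (epoch_demo, epoch_ms) are made up for illustration:

select t.epoch_ms
,      cast(
       from_tz(timestamp '1970-01-01 00:00:00' + numtodsinterval(t.epoch_ms/1000,'SECOND'),'UTC')
            at time zone 'Europe/London' as timestamp) local_ts
from   epoch_demo t;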