
Yann Neuhaus


SQL Server 2014: buffer pool extension & corruption

Wed, 2014-11-19 01:49

I had the opportunity to attend Paul Randal's session on advanced data recovery techniques at the PASS Summit. During this session, one attendee asked Paul whether a page that has just been corrupted can remain in the buffer pool extension (BPE). As you probably know, the BPE only deals with clean pages. Paul hesitated and asked us to test it, and this is exactly what I will do in this blog post.

First, let’s start by limiting the maximum memory that can be used by the buffer pool:

 

-- Configure SQL Server max memory to 1024 MB
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 1024;
GO
RECONFIGURE;
GO

 

Then we can enable the buffer pool extension feature:

 

ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
(
       -- change the path if necessary
       FILENAME = N'E:\SQLSERVER\MSSQLSERVER\DATA\ssdbuffer_pool.bpe',
       SIZE = 4096 MB
);

 

I configured the buffer pool extension size to 4x the max server memory value configured for the buffer cache.

At this point, I need a sufficiently large database in order to have a chance of finding some data pages in the buffer pool extension. My AdventureWorks2012 database fits this purpose:

 

USE AdventureWorks2012;
GO

EXEC sp_spaceused;

 

blog_23_-_1_-_adventureworks2012_size

 

I also have 3 big tables in this database: dbo.bigTransactionHistory_rs1 (2.2 GB), dbo.bigTransactionHistory_rs2 (2.1 GB) and bigTransactionHistory (1.2 GB).

 

blog_23_-_2_-_adventureworks2012_top_tables_size

 

I have a good chance of finding some pages related to these tables in the BPE if I perform a big operation like a DBCC CHECKDB on the AdventureWorks2012 database.
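For reference, such a check could simply be the following (my own sketch; the exact statements run for this post are not shown):

-- a complete integrity check reads every allocated page of the database
DBCC CHECKDB('AdventureWorks2012') WITH NO_INFOMSGS;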

After performing a complete integrity check of this database and executing some queries as well, here is the picture of my buffer pool:

 

SELECT
       CASE is_in_bpool_extension
             WHEN 1 THEN 'SSD'
             ELSE 'RAM'
       END AS location,
       COUNT(*) AS nb_pages,
       COUNT(*) * 8 / 1024 AS size_in_mb,
       COUNT(*) * 100. / (SELECT COUNT(*) FROM sys.dm_os_buffer_descriptors (nolock)) AS percent_
FROM sys.dm_os_buffer_descriptors (nolock)
GROUP BY is_in_bpool_extension
ORDER BY location;

 

blog_23_-_21_-_buffer_pool_overview

 

Is it possible to find some pages belonging to the bigTransactionHistory_rs1 table in the buffer pool extension?

 

SELECT
       bd.page_id, da.page_type, bd.is_modified
FROM sys.dm_os_buffer_descriptors AS bd
       JOIN sys.dm_db_database_page_allocations(DB_ID('AdventureWorks2012'), OBJECT_ID('dbo.bigTransactionHistory_rs1'), NULL, NULL, DEFAULT) AS da
             ON bd.database_id = da.database_id
                    AND bd.file_id = da.allocated_page_file_id
                    AND bd.page_id = da.allocated_page_page_id
WHERE bd.database_id = DB_ID('AdventureWorks2012')
       AND bd.is_in_bpool_extension = 1
       AND da.page_type IS NULL

 

blog_23_-_3_-_bigTransactionHistory_rs1_pages

 

I chose the first page, 195426, and corrupted it:

 

DBCC WRITEPAGE(AdventureWorks2012, 1, 195426, 0, 2, 0x0000);

 

blog_23_-_4_-_corrupt_page

 

Then, let's take a look at the page with ID 195426 to see if it still remains in the BPE:

 

SELECT
       page_id,
       is_in_bpool_extension,
       is_modified
FROM sys.dm_os_buffer_descriptors AS bd
WHERE bd.page_id = 195426

 

blog_23_-_5_-_check_location_page_after_corruption

 

Fortunately, it does not :-) However, we can notice that the page has not been tagged as "modified" in the sys.dm_os_buffer_descriptors DMV. My guess at this point was that DBCC WRITEPAGE is not the usual way of modifying a page, but in fact the mechanism used by the BPE is simply not what we might imagine at first sight.

Indeed, whether a page is moved out of the BPE is almost orthogonal to whether it is dirty: the buffer manager moves a page back into memory because it becomes hot when it is accessed, and modifying a page requires accessing it first (a particular thanks to Evgeny Krivosheev - SQL Server Program Manager - for this clarification).
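To illustrate this (my own sketch, not a step from the original post): simply reading the page - for instance with the undocumented DBCC PAGE command - should be enough to make it hot and pull it out of the BPE:

DBCC TRACEON(3604);
DBCC PAGE('AdventureWorks2012', 1, 195426, 0);
GO

-- re-check the location: an accessed page is expected to be back in RAM (is_in_bpool_extension = 0)
SELECT page_id, is_in_bpool_extension, is_modified
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID('AdventureWorks2012')
      AND page_id = 195426;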

We can verify if the page with ID 195426 is really corrupted (remember this page belongs to the bigTransactionHistory_rs1 table):

 

DBCC CHECKTABLE(bigTransactionHistory_rs1) WITH NO_INFOMSGS, ALL_ERRORMSGS, TABLERESULTS;

 

blog_23_-_6_-_check_corruption_for_data_page

 

You may notice some other corruptions in the output, but they don't matter in this context: I performed some other corruption tests in this database :-)

So the next question could be the following: can a corrupted page be moved from the buffer pool in memory into the buffer pool extension? … The following test will give us the answer:

 

CHECKPOINT
GO
DBCC DROPCLEANBUFFERS;
GO
-- Perform queries in order to fill the buffer cache and its extension

 

We flush the dirty pages to disk and then we clean the buffer cache. Afterwards, I run some other queries in order to populate the buffer cache (memory and BPE) with database pages. At this point we have only clean pages. A quick look at the buffer cache with the sys.dm_os_buffer_descriptors DMV gives us the following picture (I recorded a row into a temporary table each time I found the page ID 195426 in the buffer cache, either in memory or in the BPE):

 

blog_23_-_7_-_find_out_a_corrupted_page_in_the_BPE
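The recording mentioned above could be done with something like this (a minimal sketch based on my own assumptions - the original temporary table definition is not shown in the post):

IF OBJECT_ID('tempdb..#page_location') IS NULL
       CREATE TABLE #page_location
       (
              capture_time DATETIME2 DEFAULT SYSDATETIME(),
              page_id INT,
              is_in_bpool_extension BIT,
              is_modified BIT
       );
GO

-- run this repeatedly while the workload populates the buffer cache and its extension
INSERT INTO #page_location (page_id, is_in_bpool_extension, is_modified)
SELECT bd.page_id, bd.is_in_bpool_extension, bd.is_modified
FROM sys.dm_os_buffer_descriptors AS bd
WHERE bd.database_id = DB_ID('AdventureWorks2012')
       AND bd.page_id = 195426;

SELECT * FROM #page_location ORDER BY capture_time;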

 

We can notice that a corrupted page can be part of the buffer pool extension. This is expected behavior, because at this point the page ID 195426 is neither dirty nor modified - it is only corrupted.

Enjoy!

Oracle locks: Identifiying blocking sessions

Mon, 2014-11-17 09:17

When you have sessions blocked on locks, you probably have all the information about the waiters (they call you, and anyway their waiting session is visible in V$SESSION or ASH). But you usually need more information to identify the blocker.

Here is a query I use to get that quickly, based on V$WAIT_CHAINS.

Here is the result I want to get:

session                 wait event                                      minutes USER PRO
----------------------- ----------------------------------------------- ------- ---- ---
 ABCLBP1 '831,54109@1'  SQL*Net message from client                        13.5 SYS  sql
  ABCLBP4 '395,21891@4' enq: TX - row lock contention on TABLE             13.2 SYS  SQL
                          "SYS"."TEST_FRANCK" on rowid AAC0aCAAnAAABSCAAA

I have information about the blocking session, the waiting session, and the type of lock (here TX - row lock), and because it is a row lock I also want to know the ROWID of the locked row.

Here is the query I used to get it:

column "wait event" format a50 word_wrap
column "session" format a25
column "minutes" format 9999D9
column CHAIN_ID noprint
column N noprint
column l noprint
with w as (
select
 chain_id,rownum n,level l
 ,lpad(' ',level,' ')||(select instance_name from gv$instance where inst_id=w.instance)||' '''||w.sid||','||w.sess_serial#||'@'||w.instance||'''' "session"
 ,lpad(' ',level,' ')||w.wait_event_text ||
   case
   when w.wait_event_text like 'enq: TM%' then
    ' mode '||decode(w.p1 ,1414332418,'Row-S' ,1414332419,'Row-X' ,1414332420,'Share' ,1414332421,'Share RX' ,1414332422,'eXclusive')
     ||( select ' on '||object_type||' "'||owner||'"."'||object_name||'" ' from all_objects where object_id=w.p2 )
   when w.wait_event_text like 'enq: TX%' then
    (
     select ' on '||object_type||' "'||owner||'"."'||object_name||'" on rowid '
     ||dbms_rowid.rowid_create(1,data_object_id,relative_fno,w.row_wait_block#,w.row_wait_row#)
     from all_objects ,dba_data_files where object_id=w.row_wait_obj# and w.row_wait_file#=file_id
    )
   end "wait event"
 , w.in_wait_secs/60 "minutes"
 , s.username , s.program
 from v$wait_chains w join gv$session s on (s.sid=w.sid and s.serial#=w.sess_serial# and s.inst_id=w.instance)
 connect by prior w.sid=w.blocker_sid and prior w.sess_serial#=w.blocker_sess_serial# and prior w.instance = w.blocker_instance
 start with w.blocker_sid is null
)
select * from w where chain_id in (select chain_id from w group by chain_id having max("minutes") >= 1 and max(l)>1 )
order by n
/

This query retrieves the wait chains where a session has been waiting for more than one minute on a table lock (TM) or a row lock (TX).

When it is a table lock (TM), I get the locked object_id from the P2 parameter, in order to know the table name.

When it is a row lock, I get the table and rowid from V$SESSION. Note that I have to join with dba_data_files in order to convert the absolute file_id to a relative one, and to join to dba_objects in order to convert the object_id to the data_object_id - all of this in order to build the ROWID.
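As a standalone illustration of that ROWID construction (my own sketch; :waiting_sid is a hypothetical bind for the blocked session):

select dbms_rowid.rowid_create(
         1,                    -- extended ROWID format
         o.data_object_id,     -- object_id converted to data_object_id
         f.relative_fno,       -- absolute file# converted to a relative file number
         s.row_wait_block#,
         s.row_wait_row#
       ) as locked_rowid
from v$session s
join all_objects o on o.object_id = s.row_wait_obj#
join dba_data_files f on f.file_id = s.row_wait_file#
where s.sid = :waiting_sid;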

More information about ROWID, relative file number and data object id in my previous post: From 8.0 extended rowid to 12c pluggable db: Why Oracle Database is still a great software

 

My planning for DOAG 2014

Mon, 2014-11-17 01:53

I'm quickly checking the DOAG schedule for these 3 days, and here is the list of sessions I would like to attend.

There are a lot of interesting ones, even if my choice is limited by the language (I would like to understand German, but I'm limited to sessions in English). And I still have some concurrency issues to solve, because I cannot be in two places at the same time.

12c SQL Plan Directive: state has changed between 12.1.0.1 and 12.1.0.2

Sun, 2014-11-16 09:38

SQL Plan Directives have been introduced in 12c. If you have scripts that check their states (and I have them, as I prefer to closely monitor new features), you have probably seen a difference when going from 12.1.0.1 - the first release of 12c and the only one available yet in Standard Edition - to 12.1.0.2 - the first patchset. I'll explain here what the SQL Plan Directive states are and how they have changed.

When a SQL Plan Directive is created, its state is NEW. It means that a misestimate has been encountered, but the reason has not yet been determined. Remember that the directive is created only at execution time, so very limited information is available, except the fact that A-Rows is different from E-Rows.

On a subsequent optimization (of the same or a different statement that uses the same table and the same column predicates), the optimizer sees the SQL Plan Directive and can update it with more information. If the reason for the misestimation is that some statistics are missing, then the state of the directive is changed from NEW to MISSING_STATS.

MISSING_STATS directives lead to short term and long term solutions:

  • each new query will compensate for the missing stats by gathering more statistics with Dynamic Sampling
  • the next dbms_stats gathering will gather extended statistics to definitively fix the issue

Once the latter is done, we can expect to have good estimations without Dynamic Sampling anymore, thanks to the extended statistics. But that has to be checked. So the next query optimization will check it and update the SQL plan Directive accordingly:

  • HAS_STATS to show that statistics (extended statistics here) are now sufficient to get correct estimations
  • PERMANENT to show that the misestimation is still there and extended statistics have to be ignored because they were calculated for an issue that they do not solve.
Finally, the HAS_STATS SQL Plan Directives are purged after a few weeks, as they are not needed anymore - the issue being definitively solved.

So what has changed in 12.1.0.2? There are only two states now:
  • 'USABLE', which covers 'NEW', 'MISSING_STATS' and 'PERMANENT', and means that a directive is there to be evaluated by the optimizer, but the issue is not solved yet.
  • 'SUPERSEDED' when it has been solved (the former 'HAS_STATS') or when it is redundant with another directive, which means that the issue is solved somewhere else.
This is a simplification, but if you want the same level of detail that you had in 12.1.0.1, then you can get it from the 'internal state', which is exposed as XML in the NOTES column.

Here is an example of two USABLE states:
SQL> select directive_id,type,state,reason,notes from dba_sql_plan_directives where directive_id in(select directive_id from dba_sql_plan_dir_objects where owner='DEMO' );

        DIRECTIVE_ID TYPE              STATE         REASON
-------------------- ---------------- ---------- ------------------------------------
NOTES
--------------------------------------------------------------------------------
 1350980939868665098 DYNAMIC_SAMPLING USABLE         SINGLE TABLE CARDINALITY MISESTIMATE

  NEW
  NO
  {EC(DEMO.DEMO_TABLE)[A, B, C, D]}

which is 'NEW' and  
        DIRECTIVE_ID TYPE              STATE         REASON
-------------------- ---------------- ---------- ------------------------------------
NOTES
--------------------------------------------------------------------------------
 1350980939868665098 DYNAMIC_SAMPLING USABLE         SINGLE TABLE CARDINALITY MISESTIMATE

  MISSING_STATS
  NO
  {EC(DEMO.DEMO_TABLE)[A, B, C, D]}

which is 'MISSING_STATS'.

And here is a 'SUPERSEDED' one, once dbms_stats has run:
        DIRECTIVE_ID TYPE              STATE         REASON
-------------------- ---------------- ---------- ------------------------------------
NOTES
--------------------------------------------------------------------------------
 1350980939868665098 DYNAMIC_SAMPLING SUPERSEDED SINGLE TABLE CARDINALITY MISESTIMATE

  HAS_STATS
  NO
  {EC(DEMO.DEMO_TABLE)[A, B, C, D]}

which is 'HAS_STATS'.

Note that the XML tags were eaten by my blog editor. They are: internal_state, redundant and spd_text.
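If you want to query those tags directly, here is a minimal sketch (assuming the NOTES column is an XMLType with an spd_note root element, which is what the tag names above suggest):

select d.directive_id, d.state,
       extractvalue(d.notes, '/spd_note/internal_state') as internal_state,
       extractvalue(d.notes, '/spd_note/redundant')      as redundant
from dba_sql_plan_directives d
where d.directive_id in (select directive_id from dba_sql_plan_dir_objects where owner = 'DEMO');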
We do full demos of SQL Plan Directives in our 'Oracle 12c new features workshop' and 'Oracle performance tuning workshop'. It's a great feature that brings the CBO to another level of intelligence. And there are some misconceptions about them. Some people think that they store statistics, but that's wrong. Statistics come from cardinality feedback, dynamic sampling, or object statistics. There is no need for another component to store them. The only thing stored by a SQL Plan Directive is its state, which makes the state a very important piece of information - and the reason for this blog post.

 

 

Oracle AVDF post-installation configuration

Thu, 2014-11-13 04:47

In one of my last blog posts, named "Oracle Audit Vault and Database Firewall (AVDF) 12.1 - installation on VirtualBox", I explained how to install AVDF on VirtualBox. Since some of you asked for a blog on how to configure AVDF, I decided to write this post on the AVDF post-installation configuration. It only covers the post-installation phase; a third blog will be dedicated to practical cases concerning the configuration of Database Firewall policies.

Specifying the Audit Vault Server Certificate and IP Address

You must associate each Database Firewall with an Audit Vault Server by specifying the server's certificate and IP address, so that the Audit Vault Server can manage the firewall. If you are using a resilient pair of Audit Vault Servers for high availability, you must associate the firewall to both servers.


1. Log in to the Audit Vault Server as an administrator, and then click the Settings tab.

2. In the Security menu, click Certificate. The server's certificate is displayed.

avdf001.png
  
3. Copy the server's certificate.

4. Log in to the Database Firewall administration console

5. In the System menu, click Audit Vault Server.

6. Enter the IP Address of the Audit Vault Server.

7. Paste the Audit Vault Server's Certificate in the next field.

avdf002.png


8.  Click Apply.

Registering Oracle Secured Targets

Ensure That Auditing Is Enabled in the Oracle Secured Target Database

oracle@vmtest12c:/home/oracle/ [DUMMY] SOUK
******** dbi services Ltd. ********
STATUS         : OPEN
DB_UNIQUE_NAME : SOUK
OPEN_MODE      : READ WRITE
LOG_MODE       : NOARCHIVELOG
DATABASE_ROLE  : PRIMARY
FLASHBACK_ON   : NO
FORCE_LOGGING  : NO
VERSION        : 11.2.0.3.0
***********************************
oracle@vmtest12c:/home/oracle/ [SOUK] sqlplus "/as sysdba"

SQL*Plus: Release 11.2.0.3.0 Production on Sun Sep 15 22:35:49 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

SQL> SHOW PARAMETER AUDIT_TRAIL

NAME                                 TYPE        VALUE
------------------------------------ ----------- ---------------------------
audit_trail                          string      DB

 

If the output of the SHOW PARAMETER command is NONE, or if it is an auditing value that you want to change, then you can change the setting as follows. For example, if you want to change to XML and you are using a server parameter file, you would enter the following:

SQL> ALTER SYSTEM SET AUDIT_TRAIL=XML SCOPE=SPFILE;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP;

Registering Hosts in the Audit Vault Server

1. Log in to the Audit Vault Server console as an administrator.  

2. Click the Hosts tab. A list of the registered hosts, if present, appears in the Hosts page. To control the view of this list see "Working With Lists of Objects in the UI".

3. Click Register.  

4. Enter the Host Name and Host IP address.  

avdf003.png

 

5. Click Save.

Deploying and Activating the Audit Vault Agent on Secured Target Hosts

1. Log in to the Audit Vault Server console as an administrator.

2. Click the Hosts tab, and then from the Hosts menu, click Agent.

3. Click “Download Agent” and save the agent.jar file to a location of your choice.

avdf004.png

 

4. Using an OS user account, copy the agent.jar file to the secured target's host computer.

5. On the host machine, set JAVA_HOME to the installation directory of the jdk1.6 (or higher version), and make sure the java executable corresponds to this JAVA_HOME setting.

avdf005.png

 

6. Start a command prompt with Run as Administrator. In the directory where you placed the agent.jar file, extract it by running:

java -jar agent.jar -d Agent_Home

avdf006.png

 

Request agent Activation

To request activation of the Audit Vault Agent:

1. On the secured target host computer, go to the following directory:

Agent_Home/bin

2. Agent_Home is the directory into which you extracted agent.jar in the previous section. Run the following command:

./agentctl activate

avdf007.png

Activate and Start the Agent

In this step, you approve the agent activation request in the Audit Vault Server, then start the agent on the secured target host machine. To activate and start the agent:

1. Log in to the Audit Vault Server console as an administrator.

2. Click the Hosts tab.

3. Select the host you want to activate, and then click Activate.

avdf008.png

 

This will generate a new activation key under the Agent Activation Key column. You can only activate a host if you have completed the procedures in Step 1: Deploy the Audit Vault Agent on the Host Machine. Otherwise, the Agent Activation Status for that host will be No Request.

4. Change directory as follows:

cd Agent_Home/bin

Agent_Home is the directory into which you extracted agent.jar in the previous section.

5. On the secured target host machine, run the following command and provide the activation key from Step 3:

./agentctl start -k key

Note: the -k argument is not needed after the initial agentctl start command.

avdf009.png

avdf010.png

Stopping and Starting the Audit Vault Agent

To stop or start the Audit Vault Agent after initial activation and start, run one of the following commands from the Agent_Home/bin directory on the secured target host machine:

./agentctl stop

./agentctl start

Changing the Logging Level for the Audit Vault Agent

The logging level you set affects the amount of information written to the log files. You may need to take this into account because of disk space limitations. The following logging levels are listed in order of the amount of information written to the log files, with debug providing the most information:

  • error - Writes only error messages
  • warn - (Default) Writes warning and error messages
  • info - Writes informational, warning, and error messages
  • debug - Writes detailed messages for debugging purposes

To change the logging level for an Audit Vault Agent:  

1. Ensure that you are logged into AVCLI on the Audit Vault Server.

2. Run the ALTER HOST command. The syntax is as follows:

ALTER HOST host_name SET LOGLEVEL=av.agent:log_level     

In this specification:

  •  host_name: The name of the host where the Audit Vault Agent is deployed.
  •  log_level: Enter a value of info, warn, debug, or error.
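For example, to switch an agent to debug logging (the host name below is only an illustration, reusing the host shown earlier in this post):

ALTER HOST vmtest12c SET LOGLEVEL=av.agent:debug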
Registering Secured Targets

1. If you will collect audit data from a secured target, do stored procedure auditing (SPA), entitlements auditing, or enable database interrogation, create a user account on the secured target, with the appropriate privileges to allow Oracle AVDF to access the required data.     

Setup scripts: Scripts are available to configure user account privileges for these secured target types:

  • "Oracle Database Setup Scripts"

  • "Sybase ASE Setup Scripts"
  • "Microsoft SQL Server Setup Scripts"
  • "IBM DB2 for LUW Setup Scripts"
  • "MySQL Setup Scripts"
  • "Sybase SQL Anywhere Setup Scripts"     

Linux secured targets: Assign the Oracle AVDF user to the log_group parameter in the Linux /etc/audit/auditd.conf configuration file. This user must have execute permission on the folder that contains the audit.log file (default folder is /var/log/audit).
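For example, the corresponding auditd.conf entries could look like this ('avagent' being a hypothetical OS user created for the Oracle AVDF agent, not a name from the original post):

# excerpt from /etc/audit/auditd.conf
log_file = /var/log/audit/audit.log
log_group = avagent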

Other types of secured targets: You must create a user that has the appropriate privileges to access the audit trail required. For example, for a Windows secured target, this user must have administrative permissions in order to read the security log.

Note: Oracle AVDF does not accept user names with quotation marks. For example, "JSmith" would not be a valid user name for an Audit Vault and Database Firewall user account on secured targets.

avdf011.png

 

2. Log in to the Audit Vault Server console as an administrator.

3. Click the Secured Targets tab. The Secured Targets page lists the configured secured targets to which you have access. You can sort or filter the list of targets. See "Working With Lists of Objects in the UI".

4. Click Register, and in the Register Secured Target page, enter a name and description for the new target.

5. In the Secured Target Location field, enter the connect string for the secured target.      See "Secured Target Locations (Connect Strings)" for the connect string format for a specific secured target type. For example, for Oracle Database, the string might look like the following:     

jdbc:oracle:thin:@//HOSTNAME:PORT/SERVICE_NAME

6. In the Secured Target Type field, select the secured target type, for example, Oracle Database.

7. In the User Name, Password, and Re-enter Password fields, enter the credentials for the secured target user account you created in Step 1.

8. If you will monitor this secured target with a Database Firewall, in the Add Secured Target Addresses area, for each available connection of this database enter the following information, and then click Add.

  • IP Address (or Host Name)
  • Port Number
  • Service Name (Oracle Database only)

9. If required, enter values for Attribute Name and Attribute Value at the bottom of the page, and click Add. Collection attributes may be required by the Audit Vault Agent depending on the secured target type. See "Collection Attributes" to look up requirements for a specific secured target type.

10. If you will monitor this secured target with a Database Firewall, you can increase the processing resource for this secured target by adding the following Collection Attribute:

  • Attribute Name: MAXIMUM_ENFORCEMENT_POINT_THREADS
  • Attribute Value: A number between 1 - 16 (default is 1)     

This defines the maximum number of Database Firewall processes (1 - 16) that may be used for the enforcement point associated with this secured target. You should consider defining this if the number of secured targets you are monitoring is less than the number of processing cores available on the system running the Database Firewall. Setting a value when it is not appropriate wastes resources.

avdf012.png

 

11. Click Save.


Configuring an Audit Trail in the Audit Vault Server

In order to start collecting audit data, you must configure an audit trail for each secured target in the Audit Vault Server, and then start the audit trail collection manually. Before configuring an audit trail for any secured target, you must:

  • Add the secured target in the Audit Vault Server. See "Registering or Removing Secured Targets in the Audit Vault Server" for details.
  • Register the secured target host machine and deploy and activate the agent on that machine. See "Registering Hosts".

This procedure assumes that the Audit Vault Agent is installed on the same computer as the secured target.

To configure an audit trail for a secured target:

1. Log in to the Audit Vault Server console as an administrator.

2. Click the Secured Targets tab.

3. Under Monitoring, click Audit Trails. The Audit Trails page appears, listing the configured audit trails and their status.

4. In the Audit Trails page, click Add.

5. From the Collection Host drop-down list, select the host computer of the secured target.  

6. From the Secured Target Name drop-down list, select the secured target's name.  

7. From the Audit Trail Type drop-down list, select one of the following:

  • CUSTOM
  • DIRECTORY
  • EVENT LOG
  • NETWORK
  • SYSLOG
  • TABLE
  • TRANSACTION LOG     

See Table B-13 for details on which type(s) of audit trails can be collected for a specific secured target type, and "Data Collected for Each Audit Trail Type" for descriptions of data collected.  

8. In the Trail Location field, enter the location of the audit trail on the secured target computer, for example, sys.aud$.      The trail location depends on the type of secured target. See "Audit Trail Locations" for supported trail locations.      Note: If you selected DIRECTORY for Audit Trail Type, the Trail Location must be a directory mask.

9. If you have deployed plug-ins for this type of secured target, select the plug-in in the Collection Plug-in drop-down list. For more information on plug-ins, see "About Agent Plug-ins".

avdf013.png

 

10. Click Save.

Starting and Stopping Audit Trails in the Audit Vault Server

To start or stop audit trail collection for a secured target:

1. Log in to the Audit Vault Server console as an administrator.

2. Click the Secured Targets tab.

3. Click Audit Trails.

4. Select the audit trail(s) you want to start or stop, and then click Start or Stop.

avdf014.png

 

I do hope that this blog will help you deploy AVDF. Do not hesitate to post comments if you have any questions.

 

 

AWR does not store explain plan predicates

Tue, 2014-11-11 14:49

What is the most important part of an execution plan? It's probably the predicate section. It helps to see implicit conversions. It helps to understand why an index is used or not. Or to see additional predicates coming from constraints. When you get an execution plan from the shared pool, dbms_xplan.display_cursor() shows the predicates. But when you retrieve a plan from the past, dbms_xplan.display_awr() does not show them.

Statspack

With Statspack, you can store execution plans when you take a level 7 snapshot (which I often set as the default). But you don't have the predicate section, for the simple reason that the predicates are not collected.

Here is what you can find in spcpkg.sql:


insert into stats$sql_plan(
 plan_hash_value, ...
 access_predicates ,
 filter_predicates...) ... select ...
 new_plan.plan_hash_value, ...
 0 -- should be max(sp.access_predicates) (2254299),
 0 -- should be max(sp.filter_predicates)

 

AWR

AWR is no better. Finding the code is a bit more difficult: it's compiled and run from the binaries. Let's find it:

$ strings $ORACLE_HOME/bin/oracle | grep -i "v[$]sql_plan"

 

SELECT /*+ leading(S) use_nl(P) PARAM('_cursor_plan_unparse_enabled','FALSE') */ remarks, operation, options, object_node, object_owner, object_name, object_alias, object_type, optimizer, id, parent_id, depth, position, cost, cardinality, bytes, other_tag, partition_start, partition_stop, partition_id, distribution, cpu_cost, io_cost, temp_space, access_predicates, filter_predicates,projection, time, qblock_name FROM v$sql_plan_statistics_all P, v$sqlarea_plan_hash S WHERE P.sql_id = :1 AND P.plan_hash_value = :2 AND P.con_dbid = :3 AND S.sql_id = :4 AND S.plan_hash_value = :5 AND S.con_dbid = :6 AND P.child_address = s.last_active_child_address

 

I've highlighted what is interesting here. The parameter description for _cursor_plan_unparse_enabled is 'enables/disables using unparse to build projection/predicates', which is true by default but is set to false for this query. Thus, access_predicates and filter_predicates are null, as in Statspack.

Why?

If you tried to use those access_predicates and filter_predicates columns in the days of 9i, you probably remember that it ended with an ORA-7445 most of the time. Those columns are a bit special. The predicates are not stored as-is: they are 'unparsed' from the execution plan code. And because of a few bugs, Oracle has probably chosen to avoid automatic collection on them.

I think there were still some bugs until 11.1, but I query those columns very frequently (directly or through dbms_xplan.display_cursor) and I don't think I've seen any problem in current versions. I hope that one day this limitation will be lifted.

Workaround

When I have a plan coming from AWR, the first thing I do is try to find it in the shared pool with dbms_xplan.display_cursor - with the same sql_id and the same plan_hash_value. Then I'll have the predicate section.

If it is not present anymore, then I'll try to reproduce it with an EXPLAIN PLAN. In order to get the same plan, I retrieve the outlines ('+OUTLINE' format) - and the bind variable values may help as well ('+PEEKED_BINDS' format) - then I reproduce it and check that I get the same plan_hash_value.

 

Let's see an example.

I have the following plan from AWR

SQL> select * from table(dbms_xplan.display_awr('7ws837zynp1zv',3722429161,format=>'basic +predicate'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 7ws837zynp1zv
--------------------
SELECT CARD_ID, CUSTOMER_ID, CARD_TYPE, CARD_NUMBER, EXPIRY_DATE,
IS_VALID, SECURITY_CODE FROM CARD_DETAILS WHERE CUSTOMER_ID = :B2 AND
ROWNUM < :B1

Plan hash value: 3722429161

--------------------------------------------------------------------
| Id  | Operation                            | Name                |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                     |
|   1 |  COUNT STOPKEY                       |                     |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| CARD_DETAILS        |
|   3 |    INDEX RANGE SCAN                  | CARDDETAILS_CUST_IX |
--------------------------------------------------------------------

17 rows selected.

There are no predicates here.

Then I try to get the plan with explain plan:

SQL> alter session set current_schema=SOE;

Session altered.

SQL> explain plan for
  2  SELECT CARD_ID, CUSTOMER_ID, CARD_TYPE, CARD_NUMBER, EXPIRY_DATE,
  3  IS_VALID, SECURITY_CODE FROM CARD_DETAILS WHERE CUSTOMER_ID = :B2 AND
  4  ROWNUM < :B1;

Explained.

SQL> select * from table(dbms_xplan.display(format=>'basic +predicate'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2597291669

-------------------------------------------
| Id  | Operation          | Name         |
-------------------------------------------
|   0 | SELECT STATEMENT   |              |
|*  1 |  COUNT STOPKEY     |              |
|*  2 |   TABLE ACCESS FULL| CARD_DETAILS |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM < TO_NUMBER(:B1))
   2 - filter("CUSTOMER_ID"=TO_NUMBER(:B2))

15 rows selected.

The problem is that I don't have the same plan. I want the predicates for the index access plan (plan hash value 3722429161).

So I get the maximum information from the AWR plan, with +OUTLINE and +PEEKED_BINDS:

SQL> select * from table(dbms_xplan.display_awr('7ws837zynp1zv',3722429161,format=>'basic +outline +peeked_binds'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 7ws837zynp1zv
--------------------
SELECT CARD_ID, CUSTOMER_ID, CARD_TYPE, CARD_NUMBER, EXPIRY_DATE,
IS_VALID, SECURITY_CODE FROM CARD_DETAILS WHERE CUSTOMER_ID = :B2 AND
ROWNUM < :B1

Plan hash value: 3722429161

--------------------------------------------------------------------
| Id  | Operation                            | Name                |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                     |
|   1 |  COUNT STOPKEY                       |                     |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| CARD_DETAILS        |
|   3 |    INDEX RANGE SCAN                  | CARDDETAILS_CUST_IX |
--------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "CARD_DETAILS"@"SEL$1"
              ("CARD_DETAILS"."CUSTOMER_ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "CARD_DETAILS"@"SEL$1")
      END_OUTLINE_DATA
  */

Peeked Binds (identified by position):
--------------------------------------

   1 - :B2 (NUMBER): 315821
   2 - :B1 (NUMBER): 15

39 rows selected.

And I can now do the explain plan with the hints coming from the outlines (I can also replace the variables with the binds if I want to, as they are the values that were peeked to optimize the statement):

SQL> explain plan for
  2  SELECT
  3    /*+
  4        OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
  5        DB_VERSION('12.1.0.2')
  6        ALL_ROWS
  7        OUTLINE_LEAF(@"SEL$1")
  8        INDEX_RS_ASC(@"SEL$1" "CARD_DETAILS"@"SEL$1"
  9                ("CARD_DETAILS"."CUSTOMER_ID"))
 10        BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "CARD_DETAILS"@"SEL$1")
 11    */
 12  CARD_ID, CUSTOMER_ID, CARD_TYPE, CARD_NUMBER, EXPIRY_DATE,
 13  IS_VALID, SECURITY_CODE FROM CARD_DETAILS WHERE CUSTOMER_ID = :B2 AND
 14  ROWNUM < :B1;

Explained.

SQL> select * from table(dbms_xplan.display(format=>'basic +predicate'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3722429161

--------------------------------------------------------------------
| Id  | Operation                            | Name                |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                     |
|*  1 |  COUNT STOPKEY                       |                     |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| CARD_DETAILS        |
|*  3 |    INDEX RANGE SCAN                  | CARDDETAILS_CUST_IX |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM < TO_NUMBER(:B1))
   3 - access("CUSTOMER_ID"=TO_NUMBER(:B2))

16 rows selected.

Bingo, I now have the predicates.

 

Time for a Wish

I wish that one day Oracle will lift that limitation so that we can get predicate information from AWR (when in EE + Diagnostic Pack) and Statspack (SE, and EE without the option).

I've posted the idea on OTN. Please vote for the idea here.

12.1.0.2 CDB views are now using CONTAINERS()

Tue, 2014-11-11 07:41

I've blogged about multitenant internals at the time when 12.1.0.1 had just been released. Something has changed in 12.1.0.2, and blogging about it has been on my todo list for a long time. The occasion to do it has now been triggered by a question on Oracle Forums about CDB_DATA_FILES not showing PDB$SEED datafiles, because there is an important change we must be aware of.

In the previous blog, I described how you can query PDB information using 'container data objects', like the CDB_ views that are created by catcdbviews.sql, defined with the CDB$VIEW clause, which looks like a table function (but is not). It was not documented, and it was implemented internally with parallel query, partitioning and a fixed table. There are lots of internal details from Laurent Leturgez here.

12.1.0.2 has introduced the CONTAINERS() table function which does the same but is documented.

And the CDB_ views now use the CONTAINERS() clause instead of the CDB$VIEW() one.
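For illustration, from the root this is roughly what such a view now does under the hood (a simplified sketch, not the exact dictionary text):

select con_id, file_id, file_name
from CONTAINERS("SYS"."DBA_DATA_FILES")
order by con_id, file_id;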

However, there is something different. By default the PDB$SEED is not queried. This comes from the initialization parameter exclude_seed_cdb_view which is true by default:

SQL> show parameter exclude_seed_cdb_view

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
exclude_seed_cdb_view                boolean     TRUE

SQL> select * from containers(dual);

D     CON_ID
- ----------
X          3
X          1

You don't see PDB$SEED here, which is container ID 2.

We can get back to the 12.1.0.1 behaviour by setting the parameter to false:

SQL> alter session set exclude_seed_cdb_view=false;

Session altered.

SQL> select * from containers(dual);

D     CON_ID
- ----------
X          2
X          1
X          3

 

So, what's the point? If you are used to listing the database datafiles by selecting from DBA_DATA_FILES, then you probably query CDB_DATA_FILES from the root in multitenant. And you expect to get all the files. You add the control files and log file members and you have an exhaustive list of your database files. Maybe you use that in a backup or maintenance script.

But if you do that in 12.1.0.2, you will miss the PDB$SEED files - unless you set exclude_seed_cdb_view to false. You can't rely on CDB_DATA_FILES alone, and that's the important point raised in the Oracle Forums post.
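So if a script relies on CDB_DATA_FILES, a minimal workaround (my suggestion, to be used knowingly) is to switch the session back to the 12.1.0.1 behaviour before listing the files:

alter session set exclude_seed_cdb_view=false;

select con_id, file_id, file_name
from cdb_data_files
order by con_id, file_id;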

I usually prefer to use RMAN for that, and RMAN gets it right:

RMAN> connect target /

connected to target database: CDB1 (DBID=836194344)

RMAN> report schema;

using target database control file instead of recovery catalog
Report of database schema for database with db_unique_name CDB1

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    790      SYSTEM               YES     /ZFS01/oradata/CDB1/system01.dbf
3    610      SYSAUX               NO      /ZFS01/oradata/CDB1/sysaux01.dbf
4    160      UNDOTBS1             YES     /ZFS01/oradata/CDB1/undotbs01.dbf
5    250      PDB$SEED:SYSTEM      NO      /ZFS01/oradata/CDB1/pdbseed/system01.dbf
6    5        USERS                NO      /ZFS01/oradata/CDB1/users01.dbf
7    490      PDB$SEED:SYSAUX      NO      /ZFS01/oradata/CDB1/pdbseed/sysaux01.dbf
8    270      PDB1:SYSTEM          NO      /ZFS01/oradata/CDB1/PDB1/system01.dbf
9    530      PDB1:SYSAUX          NO      /ZFS01/oradata/CDB1/PDB1/sysaux01.dbf
10   40       PDB1:USERS           NO      /ZFS01/oradata/CDB1/PDB1/SAMPLE_SCHEMA_users01.dbf
11   1345     PDB1:EXAMPLE         NO      /ZFS01/oradata/CDB1/PDB1/example01.dbf

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    275      TEMP                 32767       /ZFS01/oradata/CDB1/temp01.dbf
2    20       PDB$SEED:TEMP        32767       /ZFS01/oradata/CDB1/pdbseed/pdbseed_temp012014-11-01_08-57-07-AM.dbf
3    20       PDB1:TEMP            32767       /ZFS01/oradata/CDB1/PDB1/PDB1_temp012014-11-01_09-10-16-AM.dbf

Whereas CDB_DATA_FILES by default shows only:

SQL> select con_id,file_id,file_name from cdb_data_files;

    CON_ID    FILE_ID FILE_NAME
---------- ---------- ----------------------------------------------------
         1          1 /ZFS01/oradata/CDB1/system01.dbf
         1          3 /ZFS01/oradata/CDB1/sysaux01.dbf
         1          6 /ZFS01/oradata/CDB1/users01.dbf
         1          4 /ZFS01/oradata/CDB1/undotbs01.dbf
         3          8 /ZFS01/oradata/CDB1/PDB1/system01.dbf
         3          9 /ZFS01/oradata/CDB1/PDB1/sysaux01.dbf
         3         10 /ZFS01/oradata/CDB1/PDB1/SAMPLE_SCHEMA_users01.dbf
         3         11 /ZFS01/oradata/CDB1/PDB1/example01.dbf
More internals...

Now, what has changed between CONTAINERS and CDB$VIEW?

SQL> set autotrace trace explain
SQL> select * from cdb_data_files;

Execution Plan
----------------------------------------------------------
Plan hash value: 1439328272

--------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Pstart| Pstop |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          | 20004 |       |       |
|   1 |  PX COORDINATOR         |          |       |       |       |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 | 20004 |       |       |
|   3 |    PX PARTITION LIST ALL|          | 20004 |     1 |   254 |
|   4 |     FIXED TABLE FULL    | X$CDBVW$ | 20004 |       |       |
--------------------------------------------------------------------

It's still using parallel processes on a partitioned fixed table.

SQL> alter session set "_px_cdb_view_enabled"=FALSE;

Session altered.

SQL> select * from cdb_data_files;

Execution Plan
----------------------------------------------------------
Plan hash value: 2351439557

---------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Pstart| Pstop |
---------------------------------------------------------------
|   0 | SELECT STATEMENT   |          | 20004 |       |       |
|   1 |  PARTITION LIST ALL|          | 20004 |     1 |   254 |
|   2 |   FIXED TABLE FULL | X$CDBVW$ | 20004 |       |       |
---------------------------------------------------------------
SQL> alter session set "_partition_cdb_view_enabled"=FALSE;

Session altered.

SQL> select * from cdb_data_files;

Execution Plan
----------------------------------------------------------
Plan hash value: 1784620524

-----------------------------------------------------
| Id  | Operation        | Name     | Rows  | Bytes |
-----------------------------------------------------
|   0 | SELECT STATEMENT |          | 20004 |  8185K|
|   1 |  FIXED TABLE FULL| X$CDBVW$ | 20004 |  8185K|
-----------------------------------------------------

And the undocumented parameters _px_cdb_view_enabled and _partition_cdb_view_enabled still control it.

Each partition is estimated to return 10000 rows (that's hardcoded) so the estimation for 2 partitions (CDB$ROOT and PDB1) is about 20000.

However, that was after I tried to gather statistics for cdb views:

SQL> select dbms_pdb.update_cdbvw_stats from dual;

and I got statistics:

SQL> select * from  cdbvw_stats$ where objname='DBA_DATA_FILES';

OBJNAME                        TIMESTAMP      FLAGS     ROWCNT     SPARE1
------------------------------ --------- ---------- ---------- ----------
DBA_DATA_FILES                 11-NOV-14          1          4

which is the number of datafiles in my CDB$ROOT. So I expected the estimation to be about 8. But that will probably be for another blog post...

What is important to know for the moment is that, by default, CDB_DATA_FILES doesn't show all your database files.

Pass summit 2014: My favorite sessions of this great event

Mon, 2014-11-10 08:19

The Pass Summit 2014 is now over and it’s time for us to go back home. I want to share my favorite sessions with you in this post.

blog_22_-_1_-_sqlpass_over

 

The PASS Summit was really an amazing event, regardless of the expertise area (BI or SQL Server engine). It was also a good opportunity to meet SQL Server guys from other countries as well as the French SQL Server community. We attended a lot of interesting sessions, and I admit it was often difficult to choose between two or more sessions at the same time. Nevertheless, here is my list of favorite sessions I was able to attend:

 

blog_22_-_2_-_AlwaysOn

SQL Server 2014 AlwaysOn (high availability and disaster recovery) and troubleshooting – Kevin Farlee & Trayce Jordan

These sessions were a good reminder of what SQL Server AlwaysOn actually means. Indeed, AlwaysOn is only a label that covers two main technologies: SQL Server FCI and availability groups. Kevin Farlee described some improvements provided by SQL Server 2014 for availability groups, like diagnostics, availability of the read-only replicas, network resilience that reduces node evictions, the usage of cluster shared volumes with SQL Server 2014 FCI, and others. That reminds me I have some blog posts on my todo list. We also had interesting feedback from the support team concerning different availability group issues and how they resolved them. It was very funny when Trayce Jordan asked us for our feedback concerning availability group issues. In my case, I know I have already had to face some issues, and it seems some other people in the room have too :-)

 

Latches, locks, free structures and SQL Server IO – Bob Ward & Klaus Aschenbrenner

I have to admit my brain was sometimes burning during both sessions … this is what we can expect from a 500-level session :-). To summarise what I took away from these two sessions: on one side, we learned about latches and spinlocks, how they can be problematic for critical workloads, and how we can analyse and resolve them. On the other side, we learned about the low-level Windows APIs and structures used by SQL Server for IO. We saw interesting demos with DMVs, extended events and the Windows debugger. Personally, I'm a fan of these topics and I hope to give some sessions on this subject in the future.

blog_22_-_3_-_SQLIO

 

blog_22_-_4_-_corruption

Advanced data recovery techniques – Paul Randal

A very interesting session by Paul Randal in which he described different corruption cases and how to resolve them. Most of the session was dedicated to demos, and I was able to see that some corruptions are relatively simple to repair while others are not, as they require strong skills concerning the internal structure of SQL Server pages. A powerful command that Paul used during the session: DBCC WRITEPAGE - in order to corrupt database pages. I occasionally use this command in my blog. Be careful not to use it in production! Probably the most important lesson to learn from this session: Practice! Practice! Practice!

 

Query Tuning Mastery: Manhandling Parallelism, 2014 Edition – Adam Machanic

A very impressive session on how to deal with parallelism in SQL Server. Adam already gave a session on this subject in 2012, and the new edition is on the same level as the previous one. This is the type of session you have to listen to more than once to hope to understand all the information. The same applies to the sessions by Bob Ward and Paul Randal. A bit of work in sight...

blog_22_-_5_-_parallelism

 

blog_22_-_6_-_high_performance_sql

High Performance Infrastructure for SQL Server 2014 – Michael Frandson

My last session of this summit and a good surprise for me! I didn't know Michael before, and this guy specializes in storage, scalability, and virtualization. He has contributed to a lot of projects with Microsoft and SQLCat. Michael discussed the new storage features in Windows Server 2012 / R2 and how they relate to SQL Server. We had an overview of InfiniBand, SMB, multi-path IO, RDMA, RoCE and NAND flash at a steady pace. Then we discussed SQL Server features related to storage, like the buffer pool extension, in-memory tables and the usage of local disks with SQL Server FCI. Finally, Michael finished his session with a pretty smart reminder: SQL Server performance is achieved both at the application level (developers) and at the infrastructure level (DBAs / architects).

 

Before leaving Seattle, we had our ceremonial breakfast at the Seattle marketplace and our traditional burger party at Johnny Rockets ...

 

blog_22_-_8_-_public_market blog_22_-_7_-_Johnny_Rockets_logo

 

See you at the next Pass summit!

Oracle SQL Profile: why multiple OPT_ESTIMATE?

Sat, 2014-11-08 15:48

In a previous blog, I've shared my script to retrieve the OPT_ESTIMATE hints from a SQL Profile. In the example I used, I had two lines for each table:

--- PROFILE HINTS from dbiInSite (1) statement 4fz1vtn0w8aak:
/*+
OPT_ESTIMATE(@"SEL$2CBA5DDD", TABLE, "EMPLOYEES"@"SEL$1", SCALE_ROWS=2)
OPT_ESTIMATE(@"SEL$58A6D7F6", TABLE, "EMPLOYEES"@"SEL$1", SCALE_ROWS=2)
OPT_ESTIMATE(@"SEL$6AE97DF7", TABLE, "DEPARTMENTS"@"SEL$1", SCALE_ROWS=5.185185185)
OPT_ESTIMATE(@"SEL$58A6D7F6", TABLE, "DEPARTMENTS"@"SEL$1", SCALE_ROWS=5.185185185)
*/

The reason is that when the optimizer applies some transformations to the query, the query block identifiers can change. And when you adjust a cardinality estimation, you must do it for all transformations, or you will completely mess up the optimizer's choice.

When I do an explain plan which shows the query blocks, I have only the SEL$58A6D7F6 one:

SQL> explain plan for
  2  select distinct DEPARTMENT_NAME  from DEPARTMENTS join EMPLOYEES
  3  using(DEPARTMENT_ID)  where DEPARTMENT_NAME like '%ing' and SALARY>20000 ;

Explained.

SQL> select * from table(dbms_xplan.display(format=>'basic +alias'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------
Plan hash value: 3041748347
-------------------------------------------------------------------
| Id  | Operation                             | Name              |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                   |
|   1 |  HASH UNIQUE                          |                   |
|   2 |   NESTED LOOPS SEMI                   |                   |
|   3 |    TABLE ACCESS FULL                  | DEPARTMENTS       |
|   4 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES         |
|   5 |     INDEX RANGE SCAN                  | EMP_DEPARTMENT_IX |
-------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$58A6D7F6
   3 - SEL$58A6D7F6 / DEPARTMENTS@SEL$1
   4 - SEL$58A6D7F6 / EMPLOYEES@SEL$1
   5 - SEL$58A6D7F6 / EMPLOYEES@SEL$1

In order to confirm that the duplicate OPT_ESTIMATE hints are coming from different transformations, I've generated a 10053 trace and searched for SEL$6AE97DF7:

Registered qb: SEL$6AE97DF7 0x851d8eb8 (DISTINCT PLACEMENT SEL$58A6D7F6; SEL$58A6D7F6; "EMPLOYEES"@"SEL$1")
---------------------
QUERY BLOCK SIGNATURE
---------------------
  signature (): qb_name=SEL$6AE97DF7 nbfros=2 flg=0
    fro(0): flg=0 objn=92595 hint_alias="DEPARTMENTS"@"SEL$1"
    fro(1): flg=1 objn=0 hint_alias="VW_DTP_43B5398E"@"SEL$43B5398E"

That's the Distinct Placement transformation.

Let's try the PLACE_DISTINCT hint:

SQL> explain plan for
  2  select /*+ PLACE_DISTINCT(EMPLOYEES) */ distinct DEPARTMENT_NAME  from DEPARTMENTS join EMPLOYEES
  3  using(DEPARTMENT_ID)  where DEPARTMENT_NAME like '%ing' and SALARY>20000 ;

Explained.

SQL> select * from table(dbms_xplan.display(format=>'basic +alias'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
Plan hash value: 2901355344

--------------------------------------------------------------------
| Id  | Operation                              | Name              |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                   |
|   1 |  HASH UNIQUE                           |                   |
|   2 |   NESTED LOOPS SEMI                    |                   |
|   3 |    TABLE ACCESS FULL                   | DEPARTMENTS       |
|   4 |    VIEW PUSHED PREDICATE               | VW_DTP_43B5398E   |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES         |
|   6 |      INDEX RANGE SCAN                  | EMP_DEPARTMENT_IX |
--------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$6AE97DF7
   3 - SEL$6AE97DF7 / DEPARTMENTS@SEL$1
   4 - SEL$9B757045 / VW_DTP_43B5398E@SEL$43B5398E
   5 - SEL$9B757045 / EMPLOYEES@SEL$1
   6 - SEL$9B757045 / EMPLOYEES@SEL$1

Here is where the

OPT_ESTIMATE(@"SEL$6AE97DF7", TABLE, "DEPARTMENTS"@"SEL$1", SCALE_ROWS=5.185185185)
makes sense. The same cardinality adjustment must be done for each transformation that the optimizer is evaluating.

That observation brings me to the following question: what happens to your profiles when you upgrade to a version that brings new optimizer transformations? The optimizer will compare plans with adjusted cardinalities against plans with non-adjusted cardinalities. And that is probably not a good idea.

In my opinion, SQL Profiles are just like hints: a short term workaround that must be documented and re-evaluated at each upgrade.

Oracle SQL Profiles: Check what they do before accepting them blindly

Fri, 2014-11-07 16:28

People often ask me if I use the SQL Tuning Advisor. The answer is no, at least not all the way to the end of it. But don't get me wrong: the SQL Tuning Advisor is a really nice feature. If you like the CBO, then you must love the Tuning Advisor, because it is just the CBO with more time to optimize, and more ways to optimize. The fact is that when a customer calls me, they expect a bit more than running an advisor and implementing the result. I'm there to fix the root cause, not just to work around some missing stats. And this is why I use it very rarely.

However, when I have a big query with a plan that covers several pages, it takes a lot of time to find what is wrong. The method is often based on comparing the estimated cardinalities with the actual ones. But if I have access to the Tuning Pack, then the SQL Tuning Advisor can help to find very quickly where the estimations are going wrong.

The SQL Tuning Advisor proposes a SQL Profile to adjust the estimations. Then I just have to check the biggest adjustment and I can focus on where the estimations are wrong. However, that information is not exposed: the SQL Tuning Advisor report shows the new plan, but not the way it gets to it.

The goal of this post is to give you the query I use to show exactly what the profile will implement when you accept it.

Even if you're going to accept the profile, it's a good idea to check it first. It will help you choose whether you need to enable 'force matching' or not, and whether it is a good fix or whether there are more sustainable ways to achieve the same result.

You probably know that a profile implements the estimation adjustment with OPT_ESTIMATE hints, which adjust it with a 'scale_rows' factor that can apply to tables, index selectivity or joins. They are very well explained on the Pythian blog.
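For reference, here is one commonly circulated way to list the hints stored for a named profile. This is not necessarily the exact script mentioned in my previous post, and it relies on the internal SQL management base tables sys.sqlobj$ and sys.sqlobj$data, so take it as a sketch:

select extractvalue(value(h), '.') as hint
from sys.sqlobj$data od, sys.sqlobj$ so,
     table(xmlsequence(extract(xmltype(od.comp_data), '/outline_data/hint'))) h
where so.name = '&profile_name'
  and so.signature = od.signature
  and so.category  = od.category
  and so.obj_type  = od.obj_type
  and so.plan_id   = od.plan_id;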

So, let's take an example. My favorite query to show bad estimations on the HR schema is:

alter session set current_schema=HR;
select distinct DEPARTMENT_NAME
 from DEPARTMENTS join EMPLOYEES using(DEPARTMENT_ID)
 where DEPARTMENT_NAME like '%ing' and SALARY>20000;
For the simple reason that I know that a lot of the department names end with 'ing' (Marketing, Purchasing, ...), but the CBO doesn't know that. And I can't give that information through column statistics or histograms:
SQL> select * from table( dbms_xplan.display_cursor(format=>'rowstats last ') );

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------
SQL_ID  4fz1vtn0w8aak, child number 0
-------------------------------------
select distinct DEPARTMENT_NAME  from DEPARTMENTS join EMPLOYEES
using(DEPARTMENT_ID)  where DEPARTMENT_NAME like '%ing' and SALARY>20000

Plan hash value: 3041748347

----------------------------------------------------------------------------------------------
| Id  | Operation                             | Name              | Starts | E-Rows | A-Rows |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                   |      1 |        |      0 |
|   1 |  HASH UNIQUE                          |                   |      1 |      1 |      0 |
|   2 |   NESTED LOOPS SEMI                   |                   |      1 |      1 |      0 |
|*  3 |    TABLE ACCESS FULL                  | DEPARTMENTS       |      1 |      1 |      7 |
|*  4 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES         |      7 |      1 |      0 |
|*  5 |     INDEX RANGE SCAN                  | EMP_DEPARTMENT_IX |      7 |     10 |     55 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("DEPARTMENTS"."DEPARTMENT_NAME" LIKE '%ing')
   4 - filter("EMPLOYEES"."SALARY">20000)
   5 - access("DEPARTMENTS"."DEPARTMENT_ID"="EMPLOYEES"."DEPARTMENT_ID")

Note
-----
   - this is an adaptive plan

Look at E-Rows and A-Rows: I have 7 departments ending with 'ing' but the optimizer thinks that there is only one. In 11g you need a profile to help the optimizer, or you're going into an expensive nested loop. This example has few rows, but imagine the consequence when a nested loop is chosen but must execute on millions of rows. In 12c - and if you are in Enterprise Edition - the adaptive plan will help to avoid that situation: as soon as a threshold is reached, another plan will be executed.

But even with adaptive plan, there may be a better plan that is possible only with accurate estimations. Let's see what the SQL Tuning Advisor will find.

Running SQL Tuning Advisor

I create and execute the tuning task:

begin
 dbms_output.put_line('task id: '||
 dbms_sqltune.create_tuning_task(
  task_name=>'dbiInSite',
  description=>'dbi InSite workshop Oracle Tuning',
  scope=>dbms_sqltune.scope_comprehensive,
  time_limit=>30,
  sql_id=>'4fz1vtn0w8aak'
  )
 );
 dbms_sqltune.execute_tuning_task('dbiInSite');
end;
/

And show the report:

SQL> set long 1000000 longc 1000000
SQL> select dbms_sqltune.report_tuning_task('dbiInSite') FROM dual;

DBMS_SQLTUNE.REPORT_TUNING_TASK('DBIINSITE')
-------------------------------------------------------------------------------
GENERAL INFORMATION SECTION
-------------------------------------------------------------------------------
Tuning Task Name   : dbiInSite
Tuning Task Owner  : SYS
Workload Type      : Single SQL Statement
Scope              : COMPREHENSIVE
Time Limit(seconds): 30
Completion Status  : COMPLETED
Started at         : 11/08/2014 00:03:22
Completed at       : 11/08/2014 00:03:23

-------------------------------------------------------------------------------
Schema Name: HR
SQL ID     : 4fz1vtn0w8aak
SQL Text   : select distinct DEPARTMENT_NAME
              from DEPARTMENTS join EMPLOYEES using(DEPARTMENT_ID)
              where DEPARTMENT_NAME like '%ing' and SALARY>20000

-------------------------------------------------------------------------------
FINDINGS SECTION (1 finding)
-------------------------------------------------------------------------------

1- SQL Profile Finding (see explain plans section below)
--------------------------------------------------------
  A potentially better execution plan was found for this statement.

  Recommendation (estimated benefit: 33.67%)
  ------------------------------------------
  - Consider accepting the recommended SQL profile.
    execute dbms_sqltune.accept_sql_profile(task_name => 'dbiInSite',
            task_owner => 'SYS', replace => TRUE);

  Validation results
  ------------------
  The SQL profile was tested by executing both its plan and the original plan
  and measuring their respective execution statistics. A plan may have been
  only partially executed if the other could be run to completion in less time.

                           Original Plan  With SQL Profile  % Improved
                           -------------  ----------------  ----------
  Completion Status:            COMPLETE          COMPLETE
  Elapsed Time (s):             .005964           .000177      97.03 %
  CPU Time (s):                 .005999             .0002      96.66 %
  User I/O Time (s):                  0                 0
  Buffer Gets:                       13                 9      30.76 %
  Physical Read Requests:             0                 0
  Physical Write Requests:            0                 0
  Physical Read Bytes:                0                 0
  Physical Write Bytes:               0                 0
  Rows Processed:                     0                 0
  Fetches:                            0                 0
  Executions:                         1                 1

  Notes
  -----
  1. Statistics for the original plan were averaged over 10 executions.
  2. Statistics for the SQL profile plan were averaged over 10 executions.

-------------------------------------------------------------------------------
EXPLAIN PLANS SECTION
-------------------------------------------------------------------------------

1- Original With Adjusted Cost
------------------------------
Plan hash value: 3041748347

-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name              | Rows  | Bytes | Cost (%CPU)| 
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                   |     1 |    23 |     7  (15)| 
|   1 |  HASH UNIQUE                          |                   |     1 |    23 |     7  (15)| 
|   2 |   NESTED LOOPS SEMI                   |                   |     1 |    23 |     6   (0)| 
|*  3 |    TABLE ACCESS FULL                  | DEPARTMENTS       |     7 |   112 |     3   (0)| 
|*  4 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES         |     1 |     7 |     1   (0)| 
|*  5 |     INDEX RANGE SCAN                  | EMP_DEPARTMENT_IX |    10 |       |     0   (0)| 
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("DEPARTMENTS"."DEPARTMENT_NAME" LIKE '%ing')
   4 - filter("EMPLOYEES"."SALARY">20000)
   5 - access("DEPARTMENTS"."DEPARTMENT_ID"="EMPLOYEES"."DEPARTMENT_ID")

2- Using SQL Profile
--------------------
Plan hash value: 2473492969

---------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |     1 |    23 |     5  (20)| 00:00:01 |
|   1 |  HASH UNIQUE                  |             |     1 |    23 |     5  (20)| 00:00:01 |
|   2 |   NESTED LOOPS                |             |     1 |    23 |     4   (0)| 00:00:01 |
|   3 |    NESTED LOOPS               |             |     1 |    23 |     4   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL         | EMPLOYEES   |     1 |     7 |     3   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | DEPT_ID_PK  |     1 |       |     0   (0)| 00:00:01 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |     1 |    16 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("EMPLOYEES"."SALARY">20000)
   5 - access("DEPARTMENTS"."DEPARTMENT_ID"="EMPLOYEES"."DEPARTMENT_ID")
   6 - filter("DEPARTMENTS"."DEPARTMENT_NAME" LIKE '%ing')

-------------------------------------------------------------------------------

So even with an adaptive plan, a better plan is possible. When the optimizer knows that there are more rows coming from DEPARTMENTS than from EMPLOYEES, it's better to start with EMPLOYEES, for a nested loop as well as for a hash join.

Showing the OPT_ESTIMATE hints

I don't want to accept the profile yet, but I want to see all those OPT_ESTIMATE hints that have been determined by the SQL Tuning Advisor and that will be added to the query when the profile is accepted. Here is my script. Note that this script is for 11g and 12c. In 10g the information was stored elsewhere; you can go to Jonathan Lewis' post for the 10g query.

set serveroutput on echo off
declare
  -- input variables
  input_task_owner dba_advisor_tasks.owner%type:='SYS';
  input_task_name dba_advisor_tasks.task_name%type:='dbiInSite';
  input_show_outline boolean:=false;
  -- local variables
  task_id  dba_advisor_tasks.task_id%type;
  outline_data xmltype;
  benefit number;
begin
  for o in ( select * from dba_advisor_objects where owner=input_task_owner and task_name=input_task_name and type='SQL')
  loop
          -- get the profile hints (opt_estimate)
          dbms_output.put_line('--- PROFILE HINTS from '||o.task_name||' ('||o.object_id||') statement '||o.attr1||':');
          dbms_output.put_line('/*+');
          for r in (
            select hint,benefit from (
             select case when attr5 like 'OPT_ESTIMATE%' then cast(attr5 as varchar2(4000)) when attr1 like 'OPT_ESTIMATE%' then attr1 end hint,benefit
             from dba_advisor_recommendations t join dba_advisor_rationale r using (task_id,rec_id)
             where t.owner=o.owner and t.task_name = o.task_name and r.object_id=o.object_id and t.type='SQL PROFILE'
             and r.message='This attribute adjusts optimizer estimates.'
            ) order by to_number(regexp_replace(hint,'^.*=([0-9.]+)[^0-9].*$','\1'))
          ) loop
           dbms_output.put_line('   '||r.hint); benefit:=to_number(r.benefit)/100;
          end loop;
          dbms_output.put_line('*/');
          -- get the outline hints
          begin
          select outline_data into outline_data from (
              select case when other_xml is not null then extract(xmltype(other_xml),'/*/outline_data/hint') end outline_data
              from dba_advisor_tasks t join dba_sqltune_plans p using (task_id)
              where t.owner=o.owner and t.task_name = o.task_name and p.object_id=o.object_id  and t.advisor_name='SQL Tuning Advisor' --11gonly-- and execution_type='TUNE SQL'
              and p.attribute='Using SQL profile'
          ) where outline_data is not null;
          exception when no_data_found then null;
          end;
          exit when not input_show_outline;
          dbms_output.put_line('--- OUTLINE HINTS from '||o.task_name||' ('||o.object_id||') statement '||o.attr1||':');
          dbms_output.put_line('/*+');
          for r in (
              select (extractvalue(value(d), '/hint')) hint from table(xmlsequence(extract( outline_data , '/'))) d
          ) loop
           dbms_output.put_line('   '||r.hint);
          end loop;
          dbms_output.put_line('*/');
          dbms_output.put_line('--- Benefit: '||to_char(to_number(benefit),'FM99.99')||'%');
  end loop;
  dbms_output.put_line('');
end;
/


And here is the output:

--- PROFILE HINTS from dbiInSite (1) statement 4fz1vtn0w8aak:
/*+
OPT_ESTIMATE(@"SEL$2CBA5DDD", TABLE, "EMPLOYEES"@"SEL$1", SCALE_ROWS=2)
OPT_ESTIMATE(@"SEL$58A6D7F6", TABLE, "EMPLOYEES"@"SEL$1", SCALE_ROWS=2)
OPT_ESTIMATE(@"SEL$6AE97DF7", TABLE, "DEPARTMENTS"@"SEL$1", SCALE_ROWS=5.185185185)
OPT_ESTIMATE(@"SEL$58A6D7F6", TABLE, "DEPARTMENTS"@"SEL$1", SCALE_ROWS=5.185185185)
*/

PL/SQL procedure successfully completed.

This is very interesting information. It says that the actual number of employees in this query (with that specific where clause) is 2 times what is estimated from statistics, and that the actual number of departments is about 5 times what is estimated.

So what is it used for?

That gives me several ways to improve, even without implementing the profile.

First, the estimation for the 'SALARY>20000' predicate on EMPLOYEES can be improved with histograms. That is better than a profile because it will improve all queries that filter on employee salary.
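For example, a histogram on SALARY could be gathered like this (a minimal sketch, not part of the tuning task; the method_opt value is just an illustration):

-- gather a histogram on EMPLOYEES.SALARY so that the CBO estimates SALARY>20000 better
exec dbms_stats.gather_table_stats('HR','EMPLOYEES',method_opt=>'FOR COLUMNS SALARY SIZE 254');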

Then, for DEPARTMENTS, histograms will not help because the bad estimation comes from the LIKE '%ing' predicate and I've no way to give that information with statistics. Ok, let's go for the profile. If you want to implement the profile, will you choose 'force matching' or not? Of course not: the estimation adjustment makes sense only with our '%ing' values. This is where looking at the OPT_ESTIMATE hints is very important; without them you can't make the right choice.
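If you do accept it, the force matching choice is just a parameter of accept_sql_profile. A hedged sketch (the profile name is my own, hypothetical):

begin
 dbms_sqltune.accept_sql_profile(
  task_name   => 'dbiInSite',
  task_owner  => 'SYS',
  name        => 'dbiInSite_profile',  -- hypothetical profile name
  force_match => false,                -- keep it specific to the '%ing' literal
  replace     => true
 );
end;
/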

And you may choose something other than a profile. Dynamic sampling makes sense in this case. Another solution may be to add OPT_ESTIMATE or CARDINALITY hints in the query.
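For illustration only (the hint levels and values below are mine, not the advisor's), those alternatives could look like this:

-- statement-level dynamic sampling:
select /*+ dynamic_sampling(4) */ distinct DEPARTMENT_NAME
 from DEPARTMENTS join EMPLOYEES using(DEPARTMENT_ID)
 where DEPARTMENT_NAME like '%ing' and SALARY>20000;
-- or the (undocumented) cardinality hint, telling the CBO to expect 7 departments:
select /*+ cardinality(DEPARTMENTS 7) */ distinct DEPARTMENT_NAME
 from DEPARTMENTS join EMPLOYEES using(DEPARTMENT_ID)
 where DEPARTMENT_NAME like '%ing' and SALARY>20000;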

Conclusion

SQL Tuning Advisor is powerful, when used intelligently. It gives ideas about what is wrong and proposes a way to fix it. But you can get more by retrieving the internal hints that the profile generates: a better-informed choice about implementing the profile, or alternative solutions. As usual, if you see something wrong or something to improve in my query, please comment.

Quizz

The question is: in the above output, I have two OPT_ESTIMATE rows for each table. What is the reason for that?
The answer is left for another post, but if you have an idea, please don't hesitate to comment.

Pass Summit 2014: the beginning

Fri, 2014-11-07 08:05

This is our first time with Stéphane Haby at this amazing SQL Server community event. At 7:30 am, we were already in the conference center at the registration desk. They gave us a funny badge :-)

 

blog_20_-_1_-_badge

blog_20_-_11_-_badge

 

So, after breakfast time, we went to the keynote session.

 

blog20_2_passsummit

blog20__3_keynote2

 

During this session we listened to a speech by SQL PASS President Thomas Larock. The SQL PASS community is growing over time. The following picture gives you a good idea of the importance of this community this year:

 

blog20_-_4_-_keynote

 

Then, it was the turn of Microsoft to lay out their vision of the future data market. I won't hide from you that Microsoft will focus on Microsoft Azure, which will be a main component of the Microsoft data platform.

 

blog20_-_5_-_keynote3

 

We had a presentation of different services like Azure DocumentDB (NoSQL), Azure HDInsight (with Hadoop), Polybase (bringing the relational world and Hadoop together), Azure Search (search as a service in Azure) and the PowerBI stack. So, among all this, what is the news about the SQL Server engine? Microsoft has announced some interesting features like stretch databases and the possibility to add nonclustered columnstore indexes to in-memory tables. The former consists of extending on-premise databases to Azure DBs by storing hot business data on-premise and cold business data in Azure DBs. The latter will provide the ability to run analytics queries concurrently with the OLTP workload in the same database.

This keynote was a good reminder of the importance of cloud capabilities and the hybrid perspective in future database architectures. Personally, I don't think we can reject it any longer, but maybe I should get a good prediction from Azure Machine Learning :-)

See you soon for the next sessions!

Using tmux for semi-interactive demos

Thu, 2014-11-06 14:36

You are a speaker and like to show some demos. Which kind of demo do you do?

  • you script everything so that you just have to press 'enter' during the demo
  • you prepare everything in a text file and copy/paste during the demo
  • you type everything live?

The future of SQL*Plus and some oddities

Wed, 2014-11-05 13:19

Rumors are coming after OOW14. Oracle is working on a new SQL*Plus version. Not only a new version: it's a completely new development. Here is what we know currently about the future of sqlplus and sqldeveloper. But I start with some very old stuff.

SQL*Plus

sqlplus is an old tool, with many features that look like oddities today.

You want to see some oddities? Good idea, because they can be error-prone. Without knowing them, an error when commenting out a line can be very problematic:

SQL> create table MY_DUAL ( DUMMY varchar2(1) );
Table created.

SQL> insert into MY_DUAL values ('X');
1 row created.

SQL> /*I've inserted only one row, right?*/
1 row created.

SQL> select count(*) from MY_DUAL;

  COUNT(*)
----------
         2

Yes, my comment without a space after the '/*' executed the previous command again, as does the '/'

Another one? Imagine that by mistake you comment out a line using the Linux comment character '#':

SQL> # drop table MY_DUAL;
Table dropped.

The command was executed. The '#' executes the command. What is the reason for that? Imagine I'm typing a long pl/sql and I want to run a describe in order to check table columns. I'm in 1985 so I just can't open a new window.

SQL> declare
  2   cursor c is select * from SCOTT.EMP where
  3  # -- what are the columns of EMP ?
  3  # -- ok let's get them:
  3  # desc SCOTT.EMP
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMPNO                                     NOT NULL NUMBER(4)
 ENAME                                              VARCHAR2(10)
 JOB                                                VARCHAR2(9)
 MGR                                                NUMBER(4)
 HIREDATE                                           DATE
 SAL                                                NUMBER(7,2)
 COMM                                               NUMBER(7,2)
 DEPTNO                                             NUMBER(2)

  3    mgr is null;
  4  begin
  5   null;
  6  end;
  7  .
SQL> l
  1  declare
  2   cursor c is select * from SCOTT.EMP where
  3    mgr is null;
  4  begin
  5   null;
  6* end;
SQL>

Great, I was able to run a command without disturbing the statement I was writing. Cool feature in 1985. Risky feature in 2014. Still there for compatibility. Up to you to disable it with:

SQL> set sqlprefix off
SQL> #drop table MY_DUAL;
SP2-0734: unknown command beginning "#drop tabl..." - rest of line ignored.
SQL Developer

SQL Developer was a nice evolution. I use it a lot, but I don't know all its possibilities well enough. For example, I only discovered very recently that there is command-line access to SQL Developer, which has a few features:
F:\ora\sqldeveloper\sqldeveloper\bin>sdcli
Available features:
cart: Database Cart Batch Tasks
dba: Basic Batch DBA Tasks
format: SQL Format Task
migration: Database Migration Tasks
reports: Basic Batch Reporting Tasks
unittest: Unit Testing Batch Tasks

More info about it on Jeff Smith's blog.

SQL Developer 4.1

At Oracle Open World, Jeff Smith (Product Manager of SQL Developer) not only organized the Golden Gate Run, but also unveiled a few SQL Developer 4.1 new features. It's not only a tool for developers: you can do many DBA or monitoring tasks from it.

His presentation is here

A new SQL*Plus?

If you follow Kris Rice's tweets you have probably seen a few teasers about a new sqlplus coming. Yes, they are completely rewriting sqlplus. Here is an example:

A little more of a preview of what we're building. http://t.co/Xfx01EexRf

— krisrice (@krisrice) October 31, 2014

What can we expect?

  • Auto completion (proposes table names in the same way as linux bash proposes file names)
  • Statement history in the same way we do with our DMK
  • Simple commands to get the create statements, such as 'DDL EMP' to show the 'CREATE TABLE EMP ...' statement

Another teaser screenshot:

How about this one? What else should be there? #orclapex pic.twitter.com/J7oLzMGaHd

— krisrice (@krisrice) October 31, 2014

Because, as a consultant, I'm often going to new environments where I have to adapt to the customer's tools, I prefer to use the tools that are easily available. sqlplus is always there. sqldeveloper is easy to install (just unzip and it connects through jdbc). SQL Developer has been shipped in the ORACLE_HOME since 11g. It's not the latest version, but you will be happy to find it if you're not allowed to install anything. And I'm not talking only about customer environments: if you have to take a hands-on certification exam, you will also appreciate it ;)

A small update:

It's not a rewrite of sqlplus, but rather something that uses the SQL Developer sqlplus engine.

How to disable all database links

Tue, 2014-11-04 13:36

A frequent scenario: you refresh test from production with an RMAN duplicate. Once the duplicate is done, you probably change the dblinks so that they address the test environment instead of the production one. But are you sure that nobody will connect in between and risk accessing production from the test environment? You want to disable all db links until you have finished your post-duplicate tasks.

I know two solutions for that. The first one is for 12c only: you can add the NOOPEN option to the DUPLICATE command. The duplicate then leaves the database in MOUNT and you can open it in restricted mode and do anything you want before opening it to your users.
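As a quick sketch of that 12c option (the database name and the way I restrict the session are just illustrations, not from the original post):

RMAN> duplicate target database to TESTDB from active database noopen;
-- the duplicate is left in MOUNT, so you can restrict it before opening:
SQL> alter system enable restricted session;
SQL> alter database open resetlogs;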

But if you're still in 11g, you want to be able to disable all database links before the open. That can be done at the instance level, by setting the open_links parameter to zero in your spfile.

Let's see an example:

SQL> alter system set open_links=0 scope=spfile;
System altered.

I restart my instance:

startup force
ORACLE instance started.
Total System Global Area  943718400 bytes
Fixed Size                  2931136 bytes
Variable Size             641730112 bytes
Database Buffers          188743680 bytes
Redo Buffers                5455872 bytes
In-Memory Area            104857600 bytes
Database mounted.
Database opened.

And here is the result:

SQL> select * from dual@LOOPBACK_DB_LINK;
select * from dual@LOOPBACK_DB_LINK
                   *
ERROR at line 1:
ORA-02020: too many database links in use

With that you prevent any connection through database links until you change them to address the test environment. Then:

SQL> alter system reset open_links;
System altered.

SQL> shutdown immediate;
SQL> startup

and then:

SQL> show parameter open_links

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
open_links                           integer     4
open_links_per_instance              integer     4

SQL> set autotrace on explain
SQL> select * from dual@LOOPBACK_DB_LINK;

D
-
X


Execution Plan
----------------------------------------------------------
Plan hash value: 272002086

----------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE|      |     1 |     2 |     2   (0)| 00:00:01 |        |
|   1 |  TABLE ACCESS FULL     | DUAL |     1 |     2 |     2   (0)| 00:00:01 |    DB1 |
----------------------------------------------------------------------------------------

Note
-----
   - fully remote statement

SQL> set autotrace off
SQL> select * from V$DBLINK;

DB_LINK
----------------------------------------------------------------------------------------------------
  OWNER_ID LOG HET PROTOC OPEN_CURSORS IN_ UPD COMMIT_POINT_STRENGTH     CON_ID
---------- --- --- ------ ------------ --- --- --------------------- ----------
LOOPBACK_DB_LINK
         0 YES YES UNKN              0 YES YES                     1          0


Yes, that was also the opportunity to see how to check dblink usage from the execution plan and from V$DBLINK.

Oracle cloud control / SQL Details / Statistics

Wed, 2014-10-29 06:06

A question that I had several times: in Enterprise Manager, in the screen about one SQL statement, the 'Statistics' tab shows the number of executions, elapsed time, etc. The question is: which time window does it cover? There is a one-hour chart above, and two timestamps displayed as 'First Load Time' and 'Last Load Time', and we don't know which one is related to the execution statistics numbers. I'll explain it clearly with an example.

I'll check a query I have on my system which has several cursors, with two different execution plans. And I check from V$SQL because that is where the most detailed information is, and the columns are well documented.

From the documentation:

  • FIRST_LOAD_TIME is the Timestamp of the parent creation time
  • LAST_LOAD_TIME is the Time at which the query plan was loaded into the library cache

It's clear that, because V$SQL shows information about child cursors, the FIRST_LOAD_TIME will be the same for all children.

SQL> select sql_id,plan_hash_value,executions,first_load_time,last_load_time,last_active_time from v$sql where sql_id='dcstr36r0vz0d' order by child_number

SQL_ID        PLAN_HASH_VALUE EXECUTIONS FIRST_LOAD_TIME     LAST_LOAD_TIME      LAST_ACTIVE_TIME
------------- --------------- ---------- ------------------- ------------------- -------------------
dcstr36r0vz0d        17720163         60 2014-10-29/07:01:59 2014-10-29/07:01:59 2014-10-29/13:01:25
dcstr36r0vz0d      3798950322        102 2014-10-29/07:01:59 2014-10-29/07:03:49 2014-10-29/13:05:54
dcstr36r0vz0d      3798950322         24 2014-10-29/07:01:59 2014-10-29/07:05:55 2014-10-29/13:05:54
dcstr36r0vz0d      3798950322          1 2014-10-29/07:01:59 2014-10-29/08:11:19 2014-10-29/08:11:19
dcstr36r0vz0d      3798950322          1 2014-10-29/07:01:59 2014-10-29/08:29:34 2014-10-29/08:29:34

The plan with hash value 17720163 has been executed 60 times since 07:01:59. It was the first child cursor (child_number=0) for that parent, so this is why FIRST_LOAD_TIME=LAST_LOAD_TIME

And, the plan with hash value 3798950322 has been executed 128 times since 07:03:49 by cursors that are not shared but have come to the same plan anyway

Two remarks:

  • FIRST_LOAD_TIME is the same for all children because it is a parent information
  • LAST_LOAD_TIME is different for each child, and that's important because Enterprise Manager doesn't show that detail, aggregating together the children with the same execution plan.
Time to look at the Enterprise Manager screen. I'm talking about the 'Real Time' statistics:

EMLastLoadTime1.png

and I've selected the plan hash value 17720163:

EMLastLoadTime2.png

Ok. So we have 60 executions here. This matches the line in V$SQL. And we know that it is 60 executions since 07:01:59 because both timestamps are the same. No doubt here.

Then, let's select the other plan hash value from the popup:

EMLastLoadTime3.png

128 executions for this plan. This is what we had when summing the lines from V$SQL. And look at the Shared Cursor Statistics: the number of 'Child Cursors' is 4, which is what we know. The 'First Load Time' is the one of the parent.

However, what is the 'Last Load Time' when we know that there are 4 different values for it in V$SQL? Look, they chose the latest one, 08:29:34, and that's a good choice according to the name: it's the last load time.

But what I want to know is the time from which the 128 executions are counted. And that should be the earliest one. In my example, we know from V$SQL that we had 128 executions since 07:03:49, but that timestamp is not displayed here.

If you want a date, you should take the 'First Load Time', because it's true that there were 128 executions of cursors with that plan hash value since 07:01:59.

Sometimes the first load time is very old and it would be better to have the MIN(LAST_LOAD_TIME). But anyway, if we want better time detail, we can choose the 'Historical' view instead of the 'Real Time' one and we have the numbers related to the AWR snapshots.

Here is an example for the cursor with plan hash value 17720163:

EMLastLoadTime4.png

From the historical view, we select a timestamp and we see the begin and end timestamps. Here I have 10 executions per hour.

Everything looks good there, except that 'Child Cursors' is 5, which is for the whole statement and not only for the cursors selected by the plan hash value.

Then I have two conclusions:
  • 'Last Load Time' is not useful to know the time window covered by the Real Time statistics. Use 'First Load Time' instead.
  • In case of any doubt, fall back to the V$ views, which are much better documented and give more detail (see the query sketch below).
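For reference, here is the kind of query I would run to see, per plan hash value, the earliest LAST_LOAD_TIME among its children, reusing the sql_id from this example:

select plan_hash_value,
       sum(executions) as executions,
       min(last_load_time) as min_last_load_time,
       max(last_active_time) as last_active_time
  from v$sql
 where sql_id='dcstr36r0vz0d'
 group by plan_hash_value;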

Oracle Grid Infrastructure: fixing the PRVG-10122 error during installation

Tue, 2014-10-28 02:22

Making errors is human, and when you configure a new Oracle Grid Infrastructure environment (especially one with a large number of nodes), mistakes can happen when configuring ASMLIB on all nodes. If you get an error looking like "PRVG-10122 : ASMLib configuration...does not match with cluster nodes", there is a simple solution to fix it.

When you are installing Grid Infrastructure, the following error can occur in the cluvfy output or directly in the pre-requisites check step of the OUI:

 

ERROR:
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_UID" on the node "srvora01" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_GID" on the node "srvora01" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_ENABLED" on the node "srvora01" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_UID" on the node "srvora02" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_GID" on the node "srvora02" does not match with cluster nodes
PRVG-10122 : ASMLib configuration value set to configuration parameter "ORACLEASM_ENABLED" on the node "srvora02" does not match with cluster nodes
Result: Check for ASMLib configuration failed.

 

The three parameters ORACLEASM_UID, ORACLEASM_GID and ORACLEASM_ENABLED displayed in the error should be defined when configuring ASMLIB on the system (see Grid Infrastructure 12c pre-requisites for installation, configure ASM step). To check if the configuration is consistent between the nodes specified in the error above, run the following command as root on all concerned nodes. In my case, srvora01 and srvora02 are the involved servers:

 

On srvora01

[root@srvora01 ~]# oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

 

On srvora02

[root@srvora02 ~]# oracleasm configure
ORACLEASM_ENABLED=false
ORACLEASM_UID=..
ORACLEASM_GID=..
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

 

As we can see, it seems that ASMLIB has not been configured on srvora02: ORACLEASM_ENABLED is false, and no UID or GID are provided. These are the default values! The parameters are different between the two nodes.

To solve the issue, simply reconfigure ASMLIB on the second node by running the following command with the right parameters:

 

On srvora02

 

[root@srvora02 ~]# oracleasm configure -i

 

Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting <ENTER> without typing an
answer will keep that current value.  Ctrl-C will abort.

 

Default user to own the driver interface []: grid
Default group to own the driver interface []: asmadmin
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done

 

Now, the parameters are the same between the two nodes:

 

[root@srvora02 ~]# oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

 

The Grid Infrastructure installation can now continue. I know that the error message was quite explicit, but after having spent hours configuring all the nodes of my cluster, it took me some time to understand my big mistake. I hope this will help you.

SQL Server & memory leak: Are you sure?

Mon, 2014-10-27 22:02

I have recently come across an issue with one of my customers who told me that their SQL Server 2008 R2 instance had a memory leak. These are strong words! The server has 64 GB of RAM and the concerned SQL Server instance is limited to 54 GB according to the max server memory (GB) server option. However, he noticed that SQL Server used 60 GB of memory, which did not correspond to the configured max server memory option. What's going on? Let's investigate what is causing this memory overhead.

I admit that I now work mainly on SQL Server 2012 because most of my customers have upgraded to this version, so I had to revisit my skills about memory architecture with SQL Server 2008 R2. Smile

Let’s start by the following question: Is it possible for SQL Server to consume more memory than a configured memory limit? For SQL Server versions older than 2012 the answer is yes and this is directly related to the memory architecture design. From this point on, we will talk exclusively about the SQL Server 2005 / 2008 versions. So, as you certainly know, SQL Server caps only the buffer pool memory area that mainly concerns the single page allocator mechanism.

Let’s have a little deep dive on SQL Server memory architecture here to understand where the single page allocator comes from. My goal is not to provide a complete explanation of the SQL Server memory management with SQLOS but just the concept to understand how to troubleshoot the issue presented in this article.

First of all, threads in SQL Server cannot directly interface with memory. They must go through memory allocator routines from a memory node that knows which Windows APIs to use to honor a memory request. The memory node is a hidden component and provides locality of allocation. Page allocators are one of the memory allocator types and the most commonly used with the SQLOS memory manager, because they allocate memory in multiples of SQLOS's page size (payload = 8KB). Because the memory node is a hidden component, threads cannot use it directly: they must create a memory object that has its own memory clerk depending on its type.

A memory clerk is another component that provides data caching, memory control (in case of memory pressure, for instance) and memory usage statistics tracking capabilities, and it supports the same types of memory allocators as a memory node. There are several types of memory allocators, like the single page allocator that can only provide one page at a time, the multi-page allocator that provides a set of pages at a time, and others.

To summarize, when a request needs memory to create a memory object, it goes to the concerned memory clerk, which in turn requests the concerned memory allocator. Here is a simple representation of a memory allocation:

 

{Thread} -> {Memory object} --> {Memory Clerk} --> {Memory allocator in Memory Node}

 

The most interesting part here is that the buffer pool in SQL Server versions older than 2012 acts as both a memory clerk and a consumer. It means that it can provide single pages to other consumers and track its own memory consumption. You can read the detailed explanations in the excellent articles from Slava Oks here.

This is why we only have control of the buffer pool size by configuring the min / max server memory options: the buffer pool provides single page allocations, unlike other memory clerks with multi-page allocations. Note that SQL Server 2012 memory management is completely different and, fortunately, we have better control of the memory limit there.

So, now that we know exactly what is capped by the 'max server memory' server-level option, let's go back to my problem. Unfortunately I cannot reveal the real context of my customer here, but no problem: I am able to simulate the same issue.

In my scenario, I have a SQL Server 2008 R2 instance capped to 2560 MB and the total memory on my server is 8192 MB. If I take a look at the task manager panel like my customer did, I can see that SQL Server uses more than my configured value of 2560 MB:

 

Blog_19_-_1_-_task_manager_

 

A quick calculation tells us that SQL Server is using 3.4 GB rather than 2.5 GB. Well, where then does the remaining part come from? If we take a look at the DMV sys.dm_os_sys_info, we can confirm that the buffer pool is capped to 2560 MB as expected (single page allocations):

 

SELECT
       physical_memory_in_bytes / 1024 / 1024 AS physical_memory_MB,
       bpool_committed / 128 AS bpool_MB,
       bpool_commit_target / 128 AS bpool_target_MB,
       bpool_visible / 128 AS bpool_visible_MB
FROM sys.dm_os_sys_info (nolock)

 

 

Blog_19_-_2_-_sys_dm_os_sys_info_

 

Go ahead and take a look at the multi-page allocator statistics related to the memory clerks by using the DMV sys.dm_os_memory_clerks (remember that memory clerks have memory usage tracking capabilities):

 

SELECT
       name AS clerk_name,
       memory_node_id,
       SUM(single_pages_kb) / 1024 AS single_page_total_size_mb,
       SUM(multi_pages_kb) / 1024 AS multi_page_total_size_mb,
       SUM(awe_allocated_kb) / 1024 AS awe_allocated_size_MB
FROM sys.dm_os_memory_clerks (nolock)
WHERE memory_node_id <> 64
GROUP BY memory_node_id, name
HAVING SUM(multi_pages_kb) > 0
ORDER BY SUM(single_pages_kb) + SUM(multi_pages_kb) + SUM(awe_allocated_kb) DESC;

 

Blog_19_-_3_-_sys_dm_os_memory_clerks_

 

Note that in my case (the same as for my customer, but with a different order of magnitude), we have a particular memory clerk with a multi-page allocator size greater than the others. A quick calculation (2560 + 863 = 3423 MB) accounts for most of the memory overhead in this case. So, at this point we can claim that SQL Server does not suffer from any memory leak. This is a normal behavior 'by design'.

Finally, let's go back to the root cause of my customer's issue: the memory clerk related to the TokenAndPermUserStore cache, which grows over time (approximately 6 GB of memory overhead). What exactly is the TokenAndPermUserStore cache?

Well, this is a security cache that maintains the different security token types like LoginToken, TokenPerm, UserToken, SecContextToken, and TokenAccessResult generated when a user executes a query. The problem can occur when this cache store grows: the time to search for existing security entries to reuse increases, potentially causing slower query times, because access to this cache is controlled by only one thread (please refer to the Microsoft KB 927396). In my demo we can also notice a lot of different entries related to the TokenAndPermUserStore cache by using the following query:

 

WITH memory_cache_entries AS (
       SELECT
             name AS entry_name,
             [type],
             in_use_count,
             pages_allocated_count,
             CAST(entry_data AS XML) AS entry_data
       FROM sys.dm_os_memory_cache_entries (nolock)
       WHERE type = 'USERSTORE_TOKENPERM'
),
memory_cache_entries_details AS (
       SELECT
             entry_data.value('(/entry/@class)[1]', 'bigint') AS class,
             entry_data.value('(/entry/@subclass)[1]', 'int') AS subclass,
             entry_data.value('(/entry/@name)[1]', 'varchar(100)') AS token_name,
             pages_allocated_count,
             in_use_count
       FROM memory_cache_entries
)
SELECT
       class,
       subclass,
       token_name,
       COUNT(*) AS nb_entries
FROM memory_cache_entries_details
GROUP BY token_name, class, subclass
ORDER BY nb_entries DESC;

 

 

Blog_19_-_4_-_TokenAndPermUserStoreDetails_

 

The situation above was very similar to my customer's issue, but we didn't find any related performance problem, only a large number of SQL Server logins and ad-hoc queries used by the concerned application.

The fix consisted of flushing entries from the TokenAndPermUserStore cache according to the workaround provided by Microsoft in KB 927396. I hope this is only a temporary solution until the application code can be investigated, but that is another story!
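To my knowledge, that workaround essentially boils down to clearing this specific cache store, something along these lines:

-- flush only the TokenAndPermUserStore cache (can be scheduled while the application code is investigated)
DBCC FREESYSTEMCACHE ('TokenAndPermUserStore');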

However, here is the moral of my story: check carefully how SQL Server uses memory before saying that it suffers from a memory leak Smile

Oracle: an unexpected lock behavior with rollback

Sun, 2014-10-26 15:20

Here is an odd Oracle behavior I observed when a transaction that acquired a lock is rolled back. Note that this is related to a specific combination of locks that should not be encountered in production. So it's not a bug. Just something unexpected.

 

First session

In my first session I lock the DEPT table in share mode (RS)

20:56:56 SQL1> lock table dept in row share mode;
Table(s) Locked.

My first session (SID=53) has acquired a TM lock in mode 2 (row share):

20:56:56 SQL1> select object_name,session_id,LOCKED_MODE from v$locked_object join dba_objects using(object_id);

OBJECT_NAME SESSION_ID LOCKED_MODE
----------- ---------- -----------
DEPT                53           2

 

Second session

In my second session I lock the DEPT table in share + row exclusive mode (SRX). This is compatible with the RS.

20:56:59 SQL2> lock table dept in share row exclusive mode;
Table(s) Locked.

My second session (SID=59) has acquired a TM lock in mode 5 (share + row exclusive):

20:56:59 SQL2> select object_name,session_id,LOCKED_MODE from v$locked_object join dba_objects using(object_id);

OBJECT_NAME SESSION_ID LOCKED_MODE
----------- ---------- -----------
DEPT                59           5
DEPT                53           2

Then I rollback my transaction

20:56:59 SQL2> rollback;

My lock was released and I've only the one from Session 1 (SID=53):

OBJECT_NAME SESSION_ID LOCKED_MODE
----------- ---------- -----------
DEPT                53           2

Now comes the funny part. I request exactly the same SRX lock, which was immediately acquired before:

21:14:30 SQL2> lock table dept in share row exclusive mode wait 5 ;

But now it hangs. Let's see the blocking tree with utllockt.sql:

WAITING_SESSION   LOCK_TYPE         MODE_REQUESTED MODE_HELD      LOCK_ID1          LOCK_ID2
----------------- ----------------- -------------- -------------- ----------------- -----------------
53                None
   59             DML               Exclusive      Row-S (SS)     94228             0

What? My session wants to acquire an Exclusive lock? I've never requested that.
And finally it fails because my Session 1 has a RS lock which prevents exclusive locks.

lock table dept in share row exclusive mode wait 5
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

 

Third session

But there is worse. I disconnect my second session and try to do the same from a third one. And I even try only a RS lock:

21:15:20 SQL3> lock table dept in row share mode wait 5 ;

And I'm blocked again. It seems that when a session has acquired an SRX lock and rolled it back while another session held an RS one, any new transaction that wants to acquire any lock must first acquire an exclusive one temporarily.

 

Event 10704 "Print out information about what enqueues are being obtained"

In order to get further, I traced the locks with event 10704 (see how in a previous blog).
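As a reminder, the event can be set at session level like this (level 3 is the level I would use to trace lock gets, converts and releases):

alter session set events '10704 trace name context forever, level 3';
-- ... run the lock table statements to trace ...
alter session set events '10704 trace name context off';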

Here is the trace when it failed, filtering on '00017014' which is the object_id of DEPT in hexadecimal:

ksqgtl *** TM-00017014-00000000-00000000-00000000 mode=2 flags=0x400 timeout=5 ***
ksqrcl: TM-00017014-00000000-00000000-00000000
ksqgtl *** TM-00017014-00000000-00000000-00000000 mode=6 flags=0x400 timeout=5 ***
ksqcmi: TM-00017014-00000000-00000000-00000000 mode=6 timeout=5
ksqcmi: returns 51
ksqgtl: RETURNS 51
ksqrcl: returns 0

My 'lock table dept in row share mode' acquired a mode 2 lock (which is the 'row share' mode), then released it and tried to acquire a mode 6 lock (which is exclusive).

And then here is a trace when it was successful, after the transaction in Session 1 has been committed:

ksqgtl *** TM-00017014-00000000-00000000-00000000 mode=2 flags=0x400 timeout=5 ***
ksqrcl: TM-00017014-00000000-00000000-00000000
ksqgtl *** TM-00017014-00000000-00000000-00000000 mode=6 flags=0x400 timeout=5 ***
ksqcmi: TM-00017014-00000000-00000000-00000000 mode=6 timeout=5
ksqcmi: returns 0
ksqgtl: RETURNS 0
ksqgtl *** TM-00017014-00000001-00000000-00000000 mode=3 flags=0x400 timeout=5 ***
ksqgtl: RETURNS 0
ksqcnv: TM-00017014-00000000-00000000-00000000 mode=2 timeout=5
ksqcmi: TM-00017014-00000000-00000000-00000000 mode=2 timeout=5
ksqcmi: returns 0
ksqrcl: TM-00017014-00000001-00000000-00000000
ksqrcl: returns 0

So it did the same, but this time the mode 6 lock can be acquired. Then we see a conversion to mode 2, which is the RS we wanted.

Finally, I tried all the other combinations of locks, but it seems that only that one (RS, then a rolled back SRX) shows that behavior. I also tried with DML instead of the 'lock table' statement, but the Share lock acquired by DML (the non-indexed foreign key case) is released immediately, so I cannot roll it back. And a failure in the statement does not trigger the same behaviour.

Final note: the only reference I've found for that behavior is this post on oracle-l.

Documentum Multiple ADTS: Ratio of rendition creations between instances (Part 1)

Fri, 2014-10-24 04:34

I recently realized that there is very little information about installing and managing multiple ADTS instances. And we have a customer that runs two ADTS instances for one docbase. I therefore decided to write a couple of blog posts on "Multiple ADTS", since it is likely to be helpful for someone in the same situation.

 

Rendition processing & ADTS instance

This first blog post will tell you how to find out which rendition has been processed by which ADTS instance. Hence, we will be able to calculate the ratio of processing between two (or more) servers. The blog post will be split into two parts, as I know two ways of getting the same result.

The best architecture is to have 2 dedicated servers for the two different ADTS instances. Let's assume we have SERVER1 and SERVER2, each one hosting an instance of ADTS with the same version and patch level. We will see in another blog how to configure both servers to allow a kind of load balancing.

You have to find some audit or log information in the docbase to figure out when a rendition was requested. When a rendition is requested, it is posted in the dmi_queue_items queue, but you may already be aware of this. If not, you can take a short look at my other blog post (http://www.dbi-services.com/index.php/blog/entry/documentum-useful-queries-dqliapi) which presents some DQL queries to find what is pending in the rendition queue.
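As a minimal DQL sketch (assuming the default render queue user dm_autorender_win31), pending rendition requests can be listed like this:

select item_id, item_name, date_sent, event
 from dmi_queue_items
 where name = 'dm_autorender_win31'
 order by date_sent;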

But unfortunately, when the rendition is done, the queued item is deleted and all your hope of getting some information vanishes with it. We can get a little hint if the rendition failed as it is kept in this queue, but with the attribute event set to DTS (go to my other post for more info). But still, it doesn't show which server failed it.

 

CTS Report

In this part I will talk about the CTS report method. In the next part of this blog post, I will introduce the dmr_content method.

If you have heard about Digital Asset Manager, you may be on the right track. It is an EMC tool which can be used to generate reports of ADTS activities. If you can get it, just use it, as it is exactly what you need. But in this post, I will show you how to do without it. So, just like Digital Asset Manager, you will have to enable a specific option in the DA which is not enabled by default. It is some kind of audit of ADTS activities and it will populate a table with rendition information and, as you may guess, with information about which server processed each rendition.

Login to your DA as installation owner and navigate to Content Transformation Services and then to CTS Reporting Configuration. You should see the following form:

MA-RR-01_config_report.PNG

Now set Reporting Configuration to "On", select the Archiving Configuration mode and put 7 for the Archiving Interval. You just configured your repository to write an entry each time a rendition is done by whichever server, with a seven-day period before archiving. It means that each time a rendition is done, an entry will be stored in the dm_cts_response table, and after seven days (starting now) all entries will be stored in dmc_cts_request_archiving. With DQL, you can look at these tables, which should be empty because no rendition has been done yet.

 

select * from dm_cts_response;
select * from dmc_cts_request_archiving;

 

After a period of time, let's say 5 days, some entries have been added to the first table (only if you have rendition activity, of course). And you can check which server is involved in the process thanks to the following query:

 

select cts_server_name, trans_completed_time from dm_cts_response order by trans_completed_time desc;

MA-RR-01_config_report2.PNG

Conclusion

These tables are used by the EMC tool Digital Asset Manager, and if you know a little DQL, you can build some stats without buying the tool. However, there is a drawback to this method: you enable some kind of audit trace which will populate 2 tables in the repository. Hence, if you have a lot of rendering processes, the tables could grow fast and they will need space. In addition, you will not be able to trace older renditions, as everything is stored only from the moment you activated the report.

But it's quite useful as you can get the rendition ratio in the following way:

 

select cts_server_name, count(*) as val from dm_cts_response group by cts_server_name;

From 8.0 extended rowid to 12c pluggable db: Why Oracle Database is still a great software

Fri, 2014-10-24 03:31

Do you know why Oracle Database has been a leader in the database market for 30 years? Do you know any other software that is still the leading solution after decades? I think it is because Oracle has been good software from the get-go. Many early decisions in the software architecture later revealed themselves to be the right ones: for example, the C language that made it portable to all platforms that were relevant during those 30 years, or the parallel server that brought RAC when standalone servers were not able to scale anymore. Here, I will illustrate a decision made 15 years ago that has made the whole 12c pluggable databases stuff possible.

 

Oracle 7 rowid

The ROWID is the physical address of a row in the database. Physically, a database is just a set of files where data is stored as rows in blocks. So, at the time of Oracle 7, the ROWID was just:

  • the file number
  • the block number in that file
  • the row number in that block
That was printed in hexadecimal when we selected the ROWID pseudo-column:  

CaptureOracle7Rowid.PNG

 

Here my rows are numbered from 0 to 3 and are all in file 2 at block offset 58 (0x3a).

At that time, the maximum number of datafiles in a database was 1022. That was enough until the datawarehouse days brought the need for Very Large DataBases.

 

Relative file number

In Oracle 8 they wanted to increase the possible number of datafiles without changing the rowid (which would have involved updating all blocks during an upgrade). And here is how they did it:

1. they introduced the 'relative file number'. The file identification part is no longer unique within the database, but only within the tablespace. That means that you can have 1022 datafiles per tablespace instead of 1022 datafiles per database.

2. they kept the relative_fno equal to the absolute file_id for the first 1022 datafiles, so that an Oracle 7 datafile is still compatible: the number that was the absolute file_id is now the relative_fno.

3. when going over the 1022 datafiles per database, the file_id can go beyond (it is not stored in the rowid) but the relative_fno just recycles to 1.

The consequence is that a rowid is not unique anymore. When we have to look for a row by its rowid, we must know in which tablespace it is, because the rowid identification is relative to the tablespace.
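You can see both numbers for each datafile in DBA_DATA_FILES, for example with a query like this one:

select tablespace_name, file_id, relative_fno, file_name
 from dba_data_files
 order by file_id;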

No problem. Where are rowids used? In chained rows, to get the other piece. A table cannot store parts of its rows in different tablespaces, so we know that if we are in tablespace USERS and have a pointer to a chained row, it has to be in the USERS tablespace as well. They are used in indexes as well. And that's the same: if we unique scan the index EMP_PK to get the rowid of an EMP row, we know that it is in the USERS tablespace because we know that the EMP table is in the USERS tablespace.

However, a new feature appeared at that time: a table can be partitioned. We cannot say anymore that EMP rows are in the USERS tablespace, because EMP may be partitioned over several tablespaces. That does not concern chained rows. That does not concern non-partitioned tables. And with partitioned tables, that does not concern local indexes, because local index partitions always address rows in one table partition.

This is where Oracle introduced an extended rowid, for global indexes on partitioned tables, which is larger and is able to give the tablespace information in addition to the relative file number. It is called 'extended' as opposed to the 'restricted' rowid, which is restricted to cases where we know the tablespace.

They could have chosen to store only the tablespace number. But they chose to store the object id instead, which - from the dictionary - can be used to know the tablespace. People were puzzled by that choice at the time, but it later revealed itself to be a great software architecture decision, because this is what allows us to have pluggable databases 15 years later.

But before that, I have to introduce another 8.0 change, which is the data object id.

 

Data object id

In Oracle 7, there is only one object_id, because there is a one-to-one relation between the logical object (the table) and the physical object (the segment). That changed with the introduction of partitioning, where one table may have several partitions.

Partitions are also logical objects, and each of them has a unique object_id. But once again, the software architects made a very good decision by separating the identification of the physical object: they introduced the data_object_id. When you create a table, the data_object_id of the segment is equal to the object_id of the table. But that can change.

For example, what happens when you truncate a table? You just reset the high water mark without going to all the blocks. And you can insert new rows that may go into the same place. But how are you sure that concurrent users will not mix old blocks with new ones? Remember that readers do not lock anything, so they did not notice your truncate. The truncate changes the data_object_id as if it were a new segment, while it is still related to the same logical table.

And with partitioning you can exchange partition: the logical object_id changes but the physical data_object_id is still the same because it is still the same segment.

It's not always easy to know if we are dealing with an object_id or a data_object_id. Basically, things that are logical (for example lock table or lock partition) show the object_id and they are - depending on the place - called OBJECT_ID, OBJ# or OBJN. When it is related to the physical segment (blocks in buffer cache, block corruption) we see DATA_OBJECT_ID, DATAOBJ# or OBJD or even OBJ or OBJECT_NUMBER.

When I'm not sure about which one to use in my scripts, I test it on a table that has been truncated at least once, so that both values are different (a quick check is sketched below). I advise you to always test your scripts on a table that has been truncated and on a tablespace that has been transported.
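Here is the kind of quick check I mean, assuming a scratch table you can create and truncate (DEMO_TRUNC is just a throwaway name):

create table DEMO_TRUNC as select * from dual;
select object_id, data_object_id from user_objects where object_name='DEMO_TRUNC';
truncate table DEMO_TRUNC;
-- the object_id stays the same but the data_object_id has changed
select object_id, data_object_id from user_objects where object_name='DEMO_TRUNC';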

 

Oracle 8 extended rowid

So I said that the extended rowid does not store the tablespace id. It stores the data_object_id, which is associated with one and only one tablespace. The format of the extended rowid is:

  • the data object id (which gives the tablespace from the data dictionary)
  • the relative file number (relative to the tablespace)
  • the block number in that file
  • the row number in that block

and it is needed only when we don't navigate from an object which can be used to define exactly which tablespace can contain the segment.

Now let's understand why the Oracle software architects decided to store the data_object_id instead of the tablespace number. By doing that, they made the tablespaces physically independent of the database.

 

Pluggable tablespaces

Yes, I know it's called transportable tablespaces. But it was the premise of pluggable databases. Anyway, pluggable databases are just transportable tablespaces that include the system tablespace (which contains the metadata for the other tablespaces).

You can transport a tablespace from another database and plug it into your database. Of course, the absolute file number will change, as it is assigned by your database. The tablespace number will change. But the relative file numbers - relative to the tablespace - do not have to change.

And of course the object_id will change: a new one is assigned when importing the metadata. But the data_object_id does not have to change. The reason is that the data_object_id is not expected to be unique in the database. It must be unique only within the object (two partitions of the same table cannot have the same data_object_id).
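So after a transport the same data_object_id can legitimately show up under several objects. A quick sanity-check query, just a sketch:

select owner, object_name, subobject_name, object_id, data_object_id
  from dba_objects
 where data_object_id in (
         select data_object_id
           from dba_objects
          where data_object_id is not null
          group by data_object_id
         having count(*) > 1 )
 order by data_object_id, owner, object_name;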

 

Oracle 8 extended rowid was designed for pluggable databases

And this is where those early decisions show their full meaning. You can plug in a tablespace and the rowids of the rows in that tablespace do not have to be updated. This is what makes it a quick operation: only the dictionary and the file headers have to be updated. The time depends on the number of objects, but not on the volume of data. The agility brought by pluggable databases in 2013 was actually designed in 1997.

 

rowid is not unique

So I said that the rowid is not unique. Let's prove it. I create a table DEMO2 in tablespace DEMO1, export that tablespace, rename the table DEMO2 to DEMO1, and import that tablespace as DEMO2, so that I end up with two tables DEMO1 and DEMO2, respectively in tablespaces DEMO1 and DEMO2.
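The setup itself is not shown here, so this is only a sketch of what it could look like (the datafile path and the fact that the data is the classic SCOTT.DEPT content are taken from the output below, everything else is assumed):

create tablespace DEMO1 datafile '/tmp/demo1.dbf' size 10M;

create table DEMO2 tablespace DEMO1 as select * from scott.dept;

-- the tablespace must be read only before the transportable export
alter tablespace DEMO1 read only;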

Here is my table:

 

SQL> select dname,rowid
  2   ,dbms_rowid.rowid_object(rowid) data_object_id
  3   ,dbms_rowid.rowid_relative_fno(rowid) relative_fno
  4   ,dbms_rowid.rowid_block_number(rowid) block_id
  5   ,dbms_rowid.rowid_row_number(rowid) row_number
  6   from DEMO2
  7  /

DNAME          ROWID              DATA_OBJECT_ID RELATIVE_FNO   BLOCK_ID ROW_NUMBER
-------------- ------------------ -------------- ------------ ---------- ----------
ACCOUNTING     AAAX7uAACAAAACEAAA          98030            2        132          0
RESEARCH       AAAX7uAACAAAACEAAB          98030            2        132          1
SALES          AAAX7uAACAAAACEAAC          98030            2        132          2
OPERATIONS     AAAX7uAACAAAACEAAD          98030            2        132          3

 

I export the tablespace with transportable tablespaces:

 

SQL> host expdp '"/ as sysdba"' transport_tablespaces=DEMO1
Export: Release 12.1.0.2.0 - Production on Fri Oct 24 15:33:35 2014
Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Starting "SYS"."SYS_EXPORT_TRANSPORTABLE_01":  "/******** AS SYSDBA" transport_tablespaces=DEMO1
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
Processing object type TRANSPORTABLE_EXPORT/TABLE_STATISTICS
Processing object type TRANSPORTABLE_EXPORT/STATISTICS/MARKER
Processing object type TRANSPORTABLE_EXPORT/POST_INSTANCE/PLUGTS_BLK
Master table "SYS"."SYS_EXPORT_TRANSPORTABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TRANSPORTABLE_01 is:
  /u02/app/oracle/admin/DEMO/dpdump/expdat.dmp
******************************************************************************
Datafiles required for transportable tablespace DEMO1:
  /tmp/demo1.dbf
Job "SYS"."SYS_EXPORT_TRANSPORTABLE_01" successfully completed at Fri Oct 24 15:34:35 2014 elapsed 0 00:00:59
SQL> host cp '/tmp/demo1.dbf' '/tmp/demo2.dbf'
SQL> alter tablespace DEMO1 read write;
Tablespace altered.

Then I import it into the DEMO2 tablespace (after renaming my previous table):

 

SQL> rename DEMO2 to DEMO1;
SQL> host impdp '"/ as sysdba"' transport_datafiles='/tmp/demo2.dbf ' remap_tablespace='DEMO1:DEMO2';
Import: Release 12.1.0.2.0 - Production on Fri Oct 24 15:34:35 2014

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Master table "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_TRANSPORTABLE_01":  "/******** AS SYSDBA" transport_datafiles=/tmp/demo2.dbf  remap_tablespace=DEMO1:DEMO2
Processing object type TRANSPORTABLE_EXPORT/PLUGTS_BLK
Processing object type TRANSPORTABLE_EXPORT/TABLE
Processing object type TRANSPORTABLE_EXPORT/TABLE_STATISTICS
Processing object type TRANSPORTABLE_EXPORT/STATISTICS/MARKER
Processing object type TRANSPORTABLE_EXPORT/POST_INSTANCE/PLUGTS_BLK
Job "SYS"."SYS_IMPORT_TRANSPORTABLE_01" successfully completed at Fri Oct 24 15:34:49 2014 elapsed 0 00:00:13
SQL> alter tablespace DEMO2 read write;
Tablespace altered.

 

Now I have 2 tables with different object_id but same data_object_id:

 

SQL> select object_name,object_id,data_object_id from user_objects where object_name like 'DEMO_';

OBJECT_NAM  OBJECT_ID DATA_OBJECT_ID
---------- ---------- --------------
DEMO2           98076          98030
DEMO1           98029          98030

 

And 2 segments in different files (file_id) but same relative_fno:

 

SQL> select segment_name,tablespace_name,file_id,relative_fno,block_id from dba_extents where segment_name like 'DEMO_';

SEGMENT_NAME    TABLESPACE    FILE_ID RELATIVE_FNO   BLOCK_ID
--------------- ---------- ---------- ------------ ----------
DEMO1           DEMO1               2            2        128
DEMO2           DEMO2               4            2        128

 

I update the rows so that I can be sure I am selecting from two different tables:

 

SQL> update DEMO1 set dname=upper(dname);
4 rows updated.
SQL> update DEMO2 set dname=lower(dname);
4 rows updated.

 

And now let's display the ROWID from both tables:

 

SQL> select dname,rowid
  2   ,dbms_rowid.rowid_object(rowid) data_object_id
  3   ,dbms_rowid.rowid_relative_fno(rowid) relative_fno
  4   ,dbms_rowid.rowid_block_number(rowid) block_id
  5   ,dbms_rowid.rowid_row_number(rowid) row_number
  6   from DEMO1
  7  /

DNAME          ROWID              DATA_OBJECT_ID RELATIVE_FNO   BLOCK_ID ROW_NUMBER
-------------- ------------------ -------------- ------------ ---------- ----------
ACCOUNTING     AAAX7uAACAAAACEAAA          98030            2        132          0
RESEARCH       AAAX7uAACAAAACEAAB          98030            2        132          1
SALES          AAAX7uAACAAAACEAAC          98030            2        132          2
OPERATIONS     AAAX7uAACAAAACEAAD          98030            2        132          3

SQL> select dname,rowid
  2   ,dbms_rowid.rowid_object(rowid) data_object_id
  3   ,dbms_rowid.rowid_relative_fno(rowid) relative_fno
  4   ,dbms_rowid.rowid_block_number(rowid) block_id
  5   ,dbms_rowid.rowid_row_number(rowid) row_number
  6   from DEMO2
  7  /

DNAME          ROWID              DATA_OBJECT_ID RELATIVE_FNO   BLOCK_ID ROW_NUMBER
-------------- ------------------ -------------- ------------ ---------- ----------
accounting     AAAX7uAACAAAACEAAA          98030            2        132          0
research       AAAX7uAACAAAACEAAB          98030            2        132          1
sales          AAAX7uAACAAAACEAAC          98030            2        132          2
operations     AAAX7uAACAAAACEAAD          98030            2        132          3
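The same rowid value can even be used to fetch a row from either table; given the output above, the first query should return ACCOUNTING and the second one accounting (just the queries, as a quick check):

select dname from DEMO1 where rowid = chartorowid('AAAX7uAACAAAACEAAA');
select dname from DEMO2 where rowid = chartorowid('AAAX7uAACAAAACEAAA');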

 

Conclusion: I have two different tables in my database with the same rowids, because one is a physical copy of the other. Only the data dictionary makes the difference.