Yann Neuhaus

dbi services technical blog

PostgreSQL 11: Procedures and transaction control

Tue, 2018-03-20 11:58

Up to PostgreSQL 10 it was not possible to create procedures in PostgreSQL. Of course you can create functions which do not return anything, but the possibility to create a procedure was not there. That will probably change in PostgreSQL 11, provided nothing happens which leads to the removal of that commit. On top of that there was another commit which enables transaction control inside procedures, which can be quite interesting. Let's see how that works.

Up to PostgreSQL 10 the only way to have something like a procedure was to create a function returning void:

postgres=# select version();
                                                          version                                                           
----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)

postgres=# CREATE FUNCTION dummy_func (id int) RETURNS VOID AS $$
postgres$# DECLARE 
postgres$# BEGIN
postgres$#   RAISE NOTICE 'id is %', id;
postgres$# END;
postgres$# $$  LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# set client_min_messages = 'NOTICE';
SET
postgres=# select dummy_func(1);
NOTICE:  id is 1
 dummy_func 
------------
 
(1 row)

When you tried to create a procedure instead, it was not possible:

postgres=# CREATE PROCEDURE dummy_proc (id int) AS $$
postgres$# DECLARE 
postgres$# BEGIN
postgres$#   raise notice 'id is %', id;
postgres$# END;
postgres$# $$  LANGUAGE plpgsql;
ERROR:  syntax error at or near "PROCEDURE"
LINE 1: CREATE PROCEDURE dummy_proc (id int) AS $$

But now in the PostgreSQL development version you can do it (CREATE PROCEDURE):

postgres=# select version();
                                                  version                                                   
------------------------------------------------------------------------------------------------------------
 PostgreSQL 11devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)

postgres=# CREATE PROCEDURE dummy_proc (id int) AS $$
postgres$# DECLARE 
postgres$# BEGIN
postgres$#   raise notice 'id is %', id;
postgres$# END;
postgres$# $$  LANGUAGE plpgsql;
CREATE PROCEDURE
postgres=# call dummy_proc(1);
NOTICE:  id is 1
CALL

Also notice that you need to use CALL to execute a procedure. Using SELECT as you would for a function will not work:

postgres=# select dummy_proc(1);
ERROR:  dummy_proc(integer) is a procedure
LINE 1: select dummy_proc(1);
               ^
HINT:  To call a procedure, use CALL.

On top of the commit which introduced procedures there was another one (see the beginning of this post) which introduced transaction control for procedures. As the name implies this can be useful when you want to control transactions inside the procedure, e.g.:

postgres=# create table t1 ( a int primary key );
CREATE TABLE
postgres=# CREATE OR REPLACE PROCEDURE dummy_proc2 (id int) AS $$
postgres$# DECLARE 
postgres$#   l_id int := id;
postgres$# BEGIN
postgres$#   raise notice 'id is %', l_id;
postgres$#   insert into t1 (a) values (l_id);
postgres$#   commit;
postgres$#   l_id := l_id + 1;
postgres$#   raise notice 'id is %', l_id;
postgres$#   insert into t1 (a) values (l_id);
postgres$#   commit;
postgres$#   raise notice 'id is %', l_id;
postgres$#   insert into t1 (a) values (l_id);
postgres$# END;
postgres$# $$  LANGUAGE plpgsql;

When you execute this it will violate the primary key with the third insert:

postgres=# call dummy_proc2(1);
NOTICE:  id is 1
NOTICE:  id is 2
NOTICE:  id is 2
ERROR:  duplicate key value violates unique constraint "t1_pkey"
DETAIL:  Key (a)=(2) already exists.
CONTEXT:  SQL statement "insert into t1 (a) values (l_id)"
PL/pgSQL function dummy_proc2(integer) line 13 at SQL statement

… but because we can now commit (or rollback) in between we did not lose all the records:

postgres=# select * from t1;
 a 
---
 1
 2
(2 rows)
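
Transaction control in procedures also works in the other direction: a ROLLBACK discards everything done since the last COMMIT. Here is a minimal sketch along the same lines (the table t2 and the procedure dummy_proc3 are made up for this illustration and were not part of the test above):

CREATE TABLE t2 ( a int );

CREATE OR REPLACE PROCEDURE dummy_proc3 (id int) AS $$
BEGIN
  INSERT INTO t2 (a) VALUES (id);
  COMMIT;                            -- this row is kept
  INSERT INTO t2 (a) VALUES (id + 1);
  ROLLBACK;                          -- this row is discarded
END;
$$ LANGUAGE plpgsql;

CALL dummy_proc3(1);
-- a following "SELECT * FROM t2" should return only the single row with value 1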

Trying to do the same with a function in PostgreSQL 10 will not work:

postgres=# create table t1 ( a int primary key );
CREATE TABLE
postgres=# CREATE FUNCTION dummy_func2 (id int) RETURNS VOID AS $$
postgres$# DECLARE 
postgres$#   l_id int := id;
postgres$# BEGIN
postgres$#   raise notice 'id is %', l_id;
postgres$#   insert into t1 (a) values (l_id);
postgres$#   commit;
postgres$#   l_id := l_id + 1;
postgres$#   raise notice 'id is %', l_id;
postgres$#   insert into t1 (a) values (l_id);
postgres$#   commit;
postgres$#   raise notice 'id is %', l_id;
postgres$#   insert into t1 (a) values (l_id);
postgres$# END;
postgres$# $$  LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# select dummy_func2(1);
NOTICE:  id is 1
ERROR:  cannot begin/end transactions in PL/pgSQL
HINT:  Use a BEGIN block with an EXCEPTION clause instead.
CONTEXT:  PL/pgSQL function dummy_func2(integer) line 7 at SQL statement
postgres=# select * from t1;
 a 
---
(0 rows)

Nice features …

 


redhat event Ansible Automates Zurich

Tue, 2018-03-20 10:08

Here are some fresh personal impressions from the “Ansible Automates Zurich” half-day event organized by redhat.
There was an overwhelming interest in this event. About 200 people filled the room and there were many more on the waiting list.
The event started with a networking lunch. My impression was that the visitors were not only interested in the food, but also in gaining some practical knowledge about Ansible. Most of them are already using Ansible.

Eduard Modalek from redhat opened the event. He was very impressed with the interest in it.
Sacha Dubois, who is a great speaker with a solid technical background, gave an introduction about Ansible and its impact on your team and enterprise. “Ansible loves the repetitive work your people hate.”

Ansible community downloads

The strength of Ansible is the community. Enjoy the  statistics. Ansible has

  • more than 27’000 stars on GitHub
  • over 1250 modules and
  • more than 500’000 downloads a month

 

Christian Sanabria and Marc Frauenfelder from ipt – Innovation Process Technology showed how they do Continuous Delivery by combining Jenkins Pipeline with the Ansible Tower API for provisioning. They published the code on GitHub:
https://github.com/mfrauenfelder/ansible/blob/master/Jenkinsfile

After the break, Rob McMahon, redhat Director Cloud EMEA, who had already presented a day before at the Ansible Automates Netherlands event, shared his vision of empowering customers with Ansible. The future is automated organizations, and Ansible is the right tool to automate everything and to help overcome the silo character of cross-functional teams.

Ansible Automates Organizations
Another customer case was presented by Nicolas Christener and Lukas Grossar from Adfinis SyGroup AG. They fully automated the rollout of desktop clients.

Roland Wolters from redhat revealed what is coming in March 2018 with Ansible 2.5:

  • Networking connections are now first-level connection plugins – credentials work just like with any other connection
  • Role versioning
  • become on Windows supports NT AUTHORITY\System and other accounts
  • new loop mechanism (no with_* loops needed anymore)
  • fact namespaces

He also explained the differences between Ansible Tower and the open source project AWX.

The highlight at the end was a live demo which deployed a WordPress environment in the Microsoft Azure cloud. Roland Wolters and Sacha Dubois demonstrated it successfully. Congratulations! If you have a valid subscription to Microsoft Azure you can try it out: https://github.com/sa-ch/ansible-azure-deployment

There was time for drinks and networking afterwards.
I spoke with Eduard Modalek. The organization was a big challenge: so many people wanted to attend this event, and this was the biggest room they could get at such short notice in Zürich. He said that in the future there will be more events like this, and perhaps some workshops with time for the audience to exchange ideas.

 


SQL Server 2017: Can I run a “standard” rebuild on the index after a Resumable Online Index Rebuild is suspended?

Tue, 2018-03-20 07:18

This article is a follow-up to my previous one, “SQL Server 2017: What happens with an interruption during a Resumable Online Index Rebuild?”.

The goal is to see whether, after I suspend the rebuild, I can run a traditional rebuild without the RESUME option.

 

I start an index rebuild with the option Resumable:

ALTER INDEX [NCI_dbo_Transaction] ON [dbo].[Transaction]
REBUILD WITH (RESUMABLE = ON, ONLINE = ON)
GO

 

In a new query window, I stopped the rebuild:

USE [WideWorldImportersDW]
GO
ALTER INDEX [NCI_dbo_Transaction] ON [dbo].[Transaction] 
PAUSE
GO

 

As you can see, the index rebuild is stopped:
riro2_00

 

I have a look at the index state with the DMV sys.index_resumable_operations:
riro2_01

 

To restart, I run the rebuild without the RESUME or ABORT options:
riro2_02

As you can see, the query does not work and you have this error message:
Msg 10637, Level 16, State 1, Line 14
Cannot perform this operation on ‘object’ with ID 430624577 as one or more indexes are currently in resumable index rebuild state. Please refer to sys.index_resumable_operations for more details.

A standard rebuild cannot reset the state of the resumable rebuild and start the rebuild again from the beginning.
Finally, the only way to rebuild the index is to use the RESUME option and continue from where it stopped.
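
A minimal sketch of the two remaining choices for the suspended operation (same index as above; you would run only one of the two, and the ABORT variant is my assumption of how you would give up the resumable operation so that a normal rebuild becomes possible again):

USE [WideWorldImportersDW]
GO
-- continue the rebuild from where it was paused
ALTER INDEX [NCI_dbo_Transaction] ON [dbo].[Transaction]
RESUME
GO
-- or give up the resumable operation entirely
-- ALTER INDEX [NCI_dbo_Transaction] ON [dbo].[Transaction]
-- ABORT
-- GO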

 


SQL Server 2017: What happens with an interruption during a Resumable Online Index Rebuild?

Tue, 2018-03-20 06:14

Last week, I presented this option in our internal event, the dbi xChange.

My colleague David Barbarin asked me this question: what happens if an interruption occurs (reboot, SQL Server service stopped, …) during a Resumable Online Index Rebuild?

I was not able to answer because I forgot to do this test.

The new enhancement for an index rebuild provides the capability to suspend, resume or abort an online index operation (msdn reference). I had always used the PAUSE option to see how it works, but never an actual interruption.

Now I do the test to have the answer. It’s never too late! ;-)

To do my demo, I downloaded the Microsoft example database WideWorldImportersDW from GitHub.

I chose the index NCI_dbo_Transaction in the Transaction table.

riro_00

As you can see with this query, the index is big enough to leave time to interrupt the rebuild.

I run the rebuild with this command:

ALTER INDEX [NCI_dbo_Transaction] ON [dbo].[Transaction] 
REBUILD WITH (RESUMABLE = ON, ONLINE = ON)
GO

After killing the SQL Server task in the task manager, I have this message in SSMS:
riro_01
Of course, the server is no longer available…

I start the SQL Server service again and have a look at the index state with the DMV sys.index_resumable_operations:
riro_02
As you can see, the index is in the PAUSE state with 19% completed.
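
The screenshot above comes from that DMV; a minimal sketch of such a check could look like this (only the columns used here are listed – please verify the exact column names in the documentation):

USE [WideWorldImportersDW]
GO
SELECT name, state_desc, percent_complete, last_pause_time
FROM sys.index_resumable_operations
GO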

To restart, I run the rebuild with the RESUME option:
riro_03

If the index rebuild is interrupted, you don’t need to restart from scratch.
The rebuild is left in a PAUSE state and you can then decide whether or not to continue it.
Very clever option, isn’t it?

Don’t hesitate to go on David’s blog to read how this option works with AlwaysOn:
https://blog.dbi-services.com/sql-server-2017-high-availability-and-resumable-online-indexes/

 


Finally you will be able to use exit and quit in psql

Tue, 2018-03-20 05:39

When giving the PostgreSQL DBA Essentials workshop, one of the main issues people have is how to exit psql. Even on Stack Overflow this is a popular topic. The good news for people who still don’t like to use “\q”: here is the commit that adds additional options to quit/exit from psql.

Up to PostgreSQL 10 what you can usually see is something like this:

postgres@pgbox:/home/postgres/ [PG10] psql -X postgres
psql (10.0)
Type "help" for help.

postgres=# exit
postgres-# exit
postgres-# quit
postgres-# let me out, what do I need to do?
postgres-# 

Starting with PostgreSQL 11 you can either use “quit”:

postgres@pgbox:/home/postgres/ [PGDEV] psql -X postgres
psql (11devel)
Type "help" for help.

postgres=# quit
postgres@pgbox:/home/postgres/ [PGDEV] 

… or “exit”:

postgres@pgbox:/home/postgres/ [PGDEV] psql -X postgres
psql (11devel)
Type "help" for help.

postgres=# exit
postgres@pgbox:/home/postgres/ [PGDEV] 

I am pretty sure MacBook users will love that :)

 


Result cache invalidation caused by DML locks

Sun, 2018-03-18 16:03

In a previous post about Result Cache, I was using ‘set autotrace on’ in my session to see when the result cache hit saves the logical reads. I got a comment, by e-mail and with a full test case, mentioning that the invalidation I had when doing some DML occurs only when autotrace is set to on. I reproduced the problem, on 18c because I use my cloud services to do my tests, but it works the same way in 12c.

I’ll focus here on DML which does not modify any rows from the table the result depends on, because this is the case that depends on the session transaction context. When rows are modified, the result is always invalidated.

For each test here I’ll flush the result cache:

SQL> exec if dbms_result_cache.flush then dbms_output.put_line('Flushed.'); end if;
PL/SQL procedure successfully completed.
 
SQL> select id,type,status,name,cache_id,invalidations from v$result_cache_objects order by id
no rows selected

And I begin each test with the result cache loaded by a query.
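
The query that loads it – you will see it later in the NAME column of V$RESULT_CACHE_OBJECTS – is simply:

SQL> select /*+ result_cache */ count(*) from DEPT;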

DML on the dependent table

The result of the query is loaded into the cache, with a dependency on the SCOTT.DEPT table:

SQL> select id,type,status,name,cache_id,invalidations from v$result_cache_objects order by id
 
ID TYPE STATUS NAME CACHE_ID INVALIDATIONS
-- ---- ------ ---- -------- -------------
0 Dependency Published SCOTT.DEPT SCOTT.DEPT 0
1 Result Published select /*+ result_cache */ count(*) from DEPT 50vtzqa5u0xcy4bnh6z600nj1u 0

Now, I’m executing some DML on this DEPT table:

SQL> connect scott/tiger@//localhost/PDB1
Connected.
SQL> delete from DEPT where DEPTNO>40;
0 rows deleted.

This delete does not touch any row, but declares the intention to modify the table with a Row eXclusive lock (TM mode=3):

SQL> select * from v$locked_object natural join (select object_id,object_name from user_objects)
 
OBJECT_ID XIDUSN XIDSLOT XIDSQN SESSION_ID ORACLE_USERNAME OS_USER_NAME PROCESS LOCKED_MODE CON_ID OBJECT_NAME
--------- ------ ------- ------ ---------- --------------- ------------ ------- ----------- ------ -----------
73376 0 0 0 21 SCOTT oracle 7772 3 5 DEPT

Note that the transaction ID is all zeros. Logically, we are in a transaction, as we have a lock that will be released only at the end of the transaction (commit or rollback). But, as we didn’t modify any block yet, there is no entry in the transaction table:

SQL> select xidusn,xidslot,xidsqn,start_time,used_urec from v$transaction
no rows selected

I commit this transaction:

SQL> commit;
Commit complete.

and check the result cache objects:

SQL> select id,type,status,name,cache_id,invalidations from v$result_cache_objects order by id
 
ID TYPE STATUS NAME CACHE_ID INVALIDATIONS
-- ---- ------ ---- -------- -------------
0 Dependency Published SCOTT.DEPT SCOTT.DEPT 0
1 Result Published select /*+ result_cache */ count(*) from DEPT 50vtzqa5u0xcy4bnh6z600nj1u 0

This looks good: nothing was modified, so there is no reason to invalidate the result. However, in the previous post I referenced, a similar test invalidated the cache even when no rows were touched. And a reader remarked that this occurred only when I had previously run a query with ‘autotrace on’. Without autotrace, the behaviour is like what I show here above: no invalidation when no rows are modified.

Same in an already started transaction

The difference is that the ‘autotrace on’ runs an ‘explain plan’, filling the plan table, which means that a transaction was already started. Here I run autotrace on a completely different query:

SQL> set autotrace on explain
Autotrace Enabled
Displays the execution plan only.
SQL> select * from dual;
DUMMY
-----
X
 
Explain Plan
-----------------------------------------------------------
PLAN_TABLE_OUTPUT
Plan hash value: 272002086
 
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| DUAL | 1 | 2 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
 
SQL> set autotrace off;
Autotrace Disabled

Following this, I have no locked objects, but a transaction has been initiated:

SQL> select * from v$locked_object natural join (select object_id,object_name from user_objects)
no rows selected
 
SQL> select xidusn,xidslot,xidsqn,start_time,used_urec from v$transaction
XIDUSN XIDSLOT XIDSQN START_TIME USED_UREC
------ ------- ------ ---------- ---------
9 18 709 03/18/18 20:30:09 2

Here is the big difference: I have a transaction ID here.
Now doing the same as before, a delete touching no rows:

SQL> delete from DEPT where DEPTNO>40;
0 rows deleted.

When I query the lock objects, they are now associated to a non-zero transaction ID:

SQL> select * from v$locked_object natural join (select object_id,object_name from user_objects)
 
OBJECT_ID XIDUSN XIDSLOT XIDSQN SESSION_ID ORACLE_USERNAME OS_USER_NAME PROCESS LOCKED_MODE CON_ID OBJECT_NAME
--------- ------ ------- ------ ---------- --------------- ------------ ------- ----------- ------ -----------
73376 9 18 709 21 SCOTT oracle 7772 3 5 DEPT

Here is the difference, now the result cache has been invalidated:

SQL> commit;
Commit complete.
 
SQL> select id,type,status,name,cache_id,invalidations from v$result_cache_objects order by id
 
ID TYPE STATUS NAME CACHE_ID INVALIDATIONS
-- ---- ------ ---- -------- -------------
0 Dependency Published SCOTT.DEPT SCOTT.DEPT 1
1 Result Invalid select /*+ result_cache */ count(*) from DEPT 50vtzqa5u0xcy4bnh6z600nj1u 0

DML on another table – RS through referential integrity

This gives the idea that the invalidation is not really triggered by actual modifications, but at commit time from the DML locks when they are associated with a transaction.
And some DML on tables may acquire a Row-S or Row-X lock on the tables linked by referential integrity. This has changed a lot through versions – look at the slide (CaptureLocks) on the right.

I start in the same situation, with the result cache loaded, no locked objects, but a transaction that has started:

SQL> select id,type,status,name,cache_id,invalidations from v$result_cache_objects order by id
 
ID TYPE STATUS NAME CACHE_ID INVALIDATIONS
-- ---- ------ ---- -------- -------------
0 Dependency Published SCOTT.DEPT SCOTT.DEPT 0
1 Result Published select /*+ result_cache */ count(*) from DEPT 50vtzqa5u0xcy4bnh6z600nj1u 0
 
SQL> select * from v$locked_object natural join (select object_id,object_name from user_objects)
no rows selected
 
SQL> select xidusn,xidslot,xidsqn,start_time,used_urec from v$transaction
 
XIDUSN XIDSLOT XIDSQN START_TIME USED_UREC
------ ------- ------ ---------- ---------
8 31 766 03/18/18 20:30:10 2

I delete from the EMP table and if you are familiar with the SCOTT schema, you know that it has a foreign key to DEPT.

SQL> delete from EMP where DEPTNO>=40;
0 rows deleted.

The delete acquires a Row eXclusive lock on EMP even when there are no rows deleted (DML locks are about the intention to modify rows). And from the table above, it acquires a Row Share (mode=2) on the parent table:

SQL> select * from v$locked_object natural join (select object_id,object_name from user_objects)
 
OBJECT_ID XIDUSN XIDSLOT XIDSQN SESSION_ID ORACLE_USERNAME OS_USER_NAME PROCESS LOCKED_MODE CON_ID OBJECT_NAME
--------- ------ ------- ------ ---------- --------------- ------------ ------- ----------- ------ -----------
73376 8 31 766 21 SCOTT oracle 7772 2 5 DEPT
73378 8 31 766 21 SCOTT oracle 7772 3 5 EMP

I commit and check the result cache:

SQL> commit;
Commit complete.
 
SQL> select id,type,status,name,cache_id,invalidations from v$result_cache_objects order by id
 
ID TYPE STATUS NAME CACHE_ID INVALIDATIONS
-- ---- ------ ---- -------- -------------
0 Dependency Published SCOTT.DEPT SCOTT.DEPT 0
1 Result Published select /*+ result_cache */ count(*) from DEPT 50vtzqa5u0xcy4bnh6z600nj1u 0

No invalidation here, which makes sense because Row Share is not an intention to write.

DML on another table – RX through referential integrity

I do the same here, but with an insert on EMP which acquires a Row eXclusive on the parent table.

SQL> insert into EMP(EMPNO,DEPTNO)values(9999,40);
1 row inserted.
 
SQL> select * from v$locked_object natural join (select object_id,object_name from user_objects)
 
OBJECT_ID XIDUSN XIDSLOT XIDSQN SESSION_ID ORACLE_USERNAME OS_USER_NAME PROCESS LOCKED_MODE CON_ID OBJECT_NAME
--------- ------ ------- ------ ---------- --------------- ------------ ------- ----------- ------ -----------
73376 10 32 560 21 SCOTT oracle 7772 3 5 DEPT
73378 10 32 560 21 SCOTT oracle 7772 3 5 EMP
 
SQL> select xidusn,xidslot,xidsqn,start_time,used_urec from v$transaction
 
XIDUSN XIDSLOT XIDSQN START_TIME USED_UREC
------ ------- ------ ---------- ---------
10 32 560 03/18/18 20:30:10 4
 
SQL> commit;
Commit complete.
 
SQL> select id,type,status,name,cache_id,invalidations from v$result_cache_objects order by id
ID TYPE STATUS NAME CACHE_ID INVALIDATIONS
-- ---- ------ ---- -------- -------------
0 Dependency Published SCOTT.DEPT SCOTT.DEPT 1
1 Result Invalid select /*+ result_cache */ count(*) from DEPT 50vtzqa5u0xcy4bnh6z600nj1u 0

Here, DEPT has been invalidated after the insert on EMP. There were no modifications on DEPT, but the result cache does not directly track the modifications, rather the intention of modification. And then, the implementation of the result cache invalidation tracks, at commit, the Row eXclusive locks when they are related to a known transaction. You can have the same result if, from a transaction that has already started, you run a simple:

SQL> lock table DEPT in row exclusive mode;
Lock succeeded.

This means that there are many reasons why the result cache may be invalid even when the objects in the ‘Dependency’ are not modified. Be careful, invalidations and cache misses are expensive and do not scale on high load. And because of locks through referential integrity, this can happen even on static tables. Let’s take an example: in an order entry system, you may think that the ‘products’ table is a good candidate for the result cache – updated twice a year but read all the time. But now that you know that inserts on child tables, such as the orders table, will invalidate this cache, you may think again about it.
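
Here is a minimal sketch of that order entry scenario (the PRODUCTS and ORDERS tables are made up for the illustration; the parent-table lock on insert is the behaviour observed above in this version):

create table products ( product_id number primary key, name varchar2(30) );
create table orders ( order_id number primary key, product_id number references products );
insert into products values (1, 'WIDGET');
commit;
 
-- load the result cache with a result that depends on PRODUCTS
select /*+ result_cache */ count(*) from products;
 
-- an insert into the child table takes a Row eXclusive lock on PRODUCTS and,
-- at commit, invalidates the cached result even though PRODUCTS was not modified
insert into orders values (1, 1);
commit;
 
-- V$RESULT_CACHE_OBJECTS should now show the result on PRODUCTS as Invalid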

 


Automatic Block Media Recovery in a DataGuard

Sat, 2018-03-17 13:39

With Oracle 12.2, in a Data Guard environment corrupted data blocks can be automatically replaced with uncorrupted copies of those blocks.
There are just some requirements:
• The physical standby database must be operating in real-time query mode, which requires an Oracle Active Data Guard license.
• The physical standby database must be running real-time apply.
Automatic block media recovery works in both directions, depending on whether the corrupted blocks are encountered on the primary or on the standby database.
In this blog we are going to see how this feature works.
Below is the configuration we are using:

DGMGRL> show configuration;

Configuration - ORCL_DR

  Protection Mode: MaxAvailability
  Members:
  ORCL_SITE  - Primary database
    ORCL_SITE1 - Physical standby database
    ORCL_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 19 seconds ago)

DGMGRL>

We can verify the status of our databases:

DGMGRL> show database  'ORCL_SITE' ;

Database - ORCL_SITE

  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    ORCL

Database Status:
SUCCESS

DGMGRL> show database  'ORCL_SITE1' ;

Database - ORCL_SITE1

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 5.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    ORCL

Database Status:
SUCCESS

DGMGRL> show database  'ORCL_SITE2' ;

Database - ORCL_SITE2

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 4.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    ORCL

Database Status:
SUCCESS

DGMGRL>

The feature works for any protection mode, but in our case the tests are done with the MaxAvailability mode.

If corrupt data blocks are on the primary database, then the primary automatically searches for good copies of those blocks on a standby and, if they are found, has them shipped back to the primary.
The only requirement is that the primary has a LOG_ARCHIVE_DEST_n pointing to the standby. That should already be the case in a Data Guard environment.

If corrupted blocks are located on the standby, the standby will automatically request uncorrupted copies of those blocks from the primary. The condition for this mechanism to work is one of the following (a quick check is sketched after the list):
• The LOG_ARCHIVE_CONFIG parameter is configured with a DG_CONFIG list and a LOG_ARCHIVE_DEST_n parameter is configured for the primary database.
or
• The FAL_SERVER parameter is configured and its value contains an Oracle Net service name for the primary database.
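
A quick way to check this on the standby is to query the parameter values, for example:

SQL> select name, value from v$parameter where name in ('log_archive_config','fal_server');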

In this demonstration we will simulate data corruption in the primary database. But the scenario is the same for corrupted blocks on the standby side and works the same way.
From the primary we can verify that LOG_ARCHIVE_DEST_n is set:

SQL> select dest_name,DESTINATION,status from v$archive_dest where destination is not null;

DEST_NAME            DESTINATION                    STATUS
-------------------- ------------------------------ ---------
LOG_ARCHIVE_DEST_1   USE_DB_RECOVERY_FILE_DEST      VALID
LOG_ARCHIVE_DEST_2   ORCL_SITE1                     VALID
LOG_ARCHIVE_DEST_3   ORCL_SITE2                     VALID

For the demonstration let’s consider a table of user SCOTT in a tablespace mytab

SQL> select table_name,tablespace_name from dba_tables where owner='SCOTT' and table_name='EMPBIS';

TABLE_NAME      TABLESPACE_NAME
--------------- ------------------------------
EMPBIS          MYTAB

SQL>

Now let’s identify blocks for the table on the primary and let’s corrupt them.

SQL> select * from (select distinct dbms_rowid.rowid_block_number(rowid)  from scott.empbis);

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------
                                 131

SQL>

And then let’s run the following command to corrupt the corresponding block.

[oracle@primaserver ORCL]$ dd of=/u01/app/oracle/oradata/ORCL/mytab01.dbf bs=8192 seek=131 conv=notrunc count=1 if=/dev/zero
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000315116 s, 26.0 MB/s
[oracle@primaserver ORCL]$

In a normal environment, accessing corrupted data with a SELECT would return errors.
But in our case, on the primary, if we flush the buffer cache and do a select on the table, rows are returned without errors.

13:41:18 SQL> alter system flush buffer_cache;

System altered.

13:41:22 SQL> select * from scott.empbis;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM
---------- ---------- --------- ---------- --------- ---------- ----------
    DEPTNO
----------
      7369 SMITH      CLERK           7902 17-DEC-80        800
        20

      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300
        30

      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500
        30


     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM
---------- ---------- --------- ---------- --------- ---------- ----------
    DEPTNO
----------
      7566 JONES      MANAGER         7839 02-APR-81       2975
        20

      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400
        30

      7698 BLAKE      MANAGER         7839 01-MAY-81       2850
        30


     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM
---------- ---------- --------- ---------- --------- ---------- ----------
    DEPTNO
----------
      7782 CLARK      MANAGER         7839 09-JUN-81       2450
        10

      7788 SCOTT      ANALYST         7566 19-APR-87       3000
        20

      7839 KING       PRESIDENT            17-NOV-81       5000
        10


     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM
---------- ---------- --------- ---------- --------- ---------- ----------
    DEPTNO
----------
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0
        30

      7876 ADAMS      CLERK           7788 23-MAY-87       1100
        20

      7900 JAMES      CLERK           7698 03-DEC-81        950
        30


     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM
---------- ---------- --------- ---------- --------- ---------- ----------
    DEPTNO
----------
      7902 FORD       ANALYST         7566 03-DEC-81       3000
        20

      7934 MILLER     CLERK           7782 23-JAN-82       1300
        10


14 rows selected.

13:41:27 SQL>

Indeed Oracle automatically recovered the corrupted block. And in the primary alert log, at the same time, we can see that a recovery was done.

ALTER SYSTEM: Flushing buffer cache inst=0 container=0 global
2018-01-26T13:41:26.540640+01:00
Hex dump of (file 2, block 131) in trace file /u01/app/oracle/diag/rdbms/orcl_site/ORCL/trace/ORCL_ora_3359.trc

Corrupt block relative dba: 0x00800083 (file 2, block 131)
Completely zero block found during multiblock buffer read

Reading datafile '/u01/app/oracle/oradata/ORCL/mytab01.dbf' for corruption at rdba: 0x00800083 (file 2, block 131)
Reread (file 2, block 131) found same corrupt data (no logical check)
Automatic block media recovery requested for (file# 2, block# 131)
2018-01-26T13:41:26.545798+01:00
Corrupt Block Found
         TIME STAMP (GMT) = 01/26/2018 13:41:25
         CONT = 0, TSN = 7, TSNAME = MYTAB
         RFN = 2, BLK = 131, RDBA = 8388739
         OBJN = 74352, OBJD = 74352, OBJECT = EMPBIS, SUBOBJECT =
         SEGMENT OWNER = SCOTT, SEGMENT TYPE = Table Segment
2018-01-26T13:41:27.002934+01:00
Automatic block media recovery successful for (file# 2, block# 131)
2018-01-26T13:41:27.005015+01:00
Automatic block media recovery successful for (file# 2, block# 131)

Conclusion:
We have seen that an Active Data Guard environment can help with corrupted data blocks.
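
As an additional check, not shown in the demonstration above, you could validate the datafile with RMAN and then verify that no corruption is reported anymore, for example:

SQL> select file#, block#, blocks, corruption_type from v$database_block_corruption;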

 


Connect-times to the DB suddenly become very slow using sqlcl

Mon, 2018-03-12 06:26

I recently wrote a couple of SQL scripts which had to run on all of my customer’s DBs. The scripts had to be started from a Linux client which does not have any Oracle client software installed. So I thought of using sqlcl (see http://www.oracle.com/technetwork/developer-tools/sqlcl/downloads/index.html), because there is no need to “install” anything then. All I needed was an installed JRE on the Linux machine. Fortunately that was available. So I downloaded the newest version of sqlcl and unzipped it. Initially I had an issue with the timezone when connecting:


ORA-00604: error occurred at recursive SQL level 1
ORA-01882: timezone region not found

I could workaround this by adding “AddVMOption -Doracle.jdbc.timezoneAsRegion=false” in <sqlcl-home>/bin/sql:


#
# set up the main arguments for java.
#
function setupArgs {
...
AddVMOption -Doracle.jdbc.timezoneAsRegion=false
....

When looping over the databases the connect times suddenly became very slow. I had to wait for up to 2 minutes to set up a connection to the database. After doing strace on the sqlcl connects I saw the reason for the slow connects:

The Oracle JDBC driver requires a random number for encrypting the connect string. By default this random number is generated from /dev/random. If the number of entropy entries (cat /proc/sys/kernel/random/entropy_avail) becomes too low, access to /dev/random is blocked until enough entropy entries are available again. The number of entropy entries is generated through some random noise on the machine (drivers for keyboard, mouse, network, etc. generate the entries). If there is no driver activity then the entropy pool may become empty.

The workaround is to
– either artificially generate some driver load (examples on how to do it are in referenced links at the end of the blog)
– or use /dev/urandom instead of /dev/random

The first workaround was not feasible in my case. So I had to use the less secure workaround of using /dev/urandom, i.e. I updated <sqlcl-home>/bin/sql again and added “AddVMOption -Djava.security.egd=file:///dev/urandom”:


#
# set up the main arguments for java.
#
function setupArgs {
...
AddVMOption -Djava.security.egd=file:///dev/urandom
....

Afterwards my sqlcl-connects were fast again.

See the following links for more info:

http://www.usn-it.de/index.php/2009/02/20/oracle-11g-jdbc-driver-hangs-blocked-by-devrandom-entropy-pool-empty
https://stackoverflow.com/questions/2327220/oracle-jdbc-intermittent-connection-issue
https://oraganism.wordpress.com/2015/09/20/slow-jdbc-connections-strace-and-random-numbers
http://www.oracle.com/technetwork/database/database-appliance/documentation/oda-eval-comparing-performance-1895230.pdf –> see page 7: “Specify non-blocking random number generators”

Remark: I updated the Oracle Community entry https://community.oracle.com/thread/4073170 with that information as well.

 


18c PDB switchover

Sun, 2018-03-11 08:58

In multitenant, the recovery and availability are at CDB level. But customers asked for a switchover at PDB level so Oracle has done that in 18c, based on refreshable PDBs.

For this test I have two multitenant databases on an Oracle Cloud service in 18c: CDB1 and CDB2. The only special thing I did was disable the mandatory TDE encryption, because I was not able to get the switchover working with it. With TDE encryption, I got “ORA-46697: Keystore password required”, but there is no ‘keystore identified by’ option in ‘alter pluggable database’. So if you came upon this post from a search on this error, I have no solution yet (SR 3-17001228251 opened on the Oracle Cloud Support).

Creating the source PDB

In CDB1 I create CDB1DEMO pluggable database:

21:06:06 //localhost/CDB1 SQL>create pluggable database CDB1DEMO admin user admin identified by oracle file_name_convert=('/pdbseed/','/CDB1DEMO/');
Pluggable database CDB1DEMO created.

I could use ‘show pdb’ but I use SQLcl and the current version (17.4) does not recognize Oracle 18c as a container database, so I’ve created my own alias to run a query on V$PDBS and DBA_PDBS:

21:06:12 //localhost/CDB1 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right outer join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id;
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:43:46 NONE 944121613 1524290
3 CDB1DEMO MOUNTED NEW 944121613 2 21:06:06 NONE 944121613 1524290

CDB1DEMO is in mount state in V$PDBS and its status is NEW because it was just created. You may wonder why I ‘right join’ here, as PDBs known by the database should always be known by the instance. But I said that I opened an SR for the switchover with TDE, and there I got an inconsistency between V$PDBS and DBA_PDBS.


21:06:12 //localhost/CDB1 SQL>alter pluggable database CDB1DEMO open;
Pluggable database CDB1DEMO altered.
21:06:14 //localhost/CDB1 SQL>select * from pdb_plug_in_violations;
no rows selected
 
21:06:15 //localhost/CDB1 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:43:46 NONE 944121613 1524741
3 CDB1DEMO READ WRITE NO NORMAL 944121613 2 21:06:06 NONE 944121613 1524741

The CDB1DEMO PDB is opened READ WRITE. As my goal is to show refreshes, I need to have a database with updates. To do it autonomously ;) I create a materialized view refreshing its timestamp every second.


21:06:15 //localhost/CDB1 SQL>create materialized view DEMO refresh complete start with (sysdate) next (sysdate+1/24/60/60) as select current_timestamp "refresh timestamp",current_scn "refresh scn" from v$database;
Materialized view DEMO created.

Here is how I’ll query this autonomous ;) materialized view, comparing the timestamp and SCN at the time of refresh with the current ones:

21:06:16 //localhost/CDB1 SQL>select demo.*,ora_rowscn,current_timestamp,(select current_scn from v$database) current_scn from demo;
 
refresh timestamp refresh scn ORA_ROWSCN CURRENT_TIMESTAMP CURRENT_SCN
------- --------- ------- --- ---------- ----------------- -----------
10-MAR-18 09.06.16.010529000 PM EUROPE/ZURICH 1524747 1524749 10-MAR-18 09.06.16.560146000 PM EUROPE/ZURICH 1524848

Refreshable PDB refreshing every minute

On CDB1 I create the user I’ll use for the remote clone: C##SYSOPER which needs either SYSOPER or CREATE PLUGGABLE DATABASE privilege.

21:06:17 //localhost/CDB1 SQL>connect sys/oracle@//localhost/CDB1 as sysdba
Connected.
21:06:17 //localhost/CDB1 SQL>grant create session, sysoper to C##SYSOPER identified by oracle1 container=all;
Grant succeeded.
 
21:06:18 //localhost/CDB1 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:43:46 NONE 944121613 1524907
3 CDB1DEMO READ WRITE NO NORMAL 944121613 2 21:06:06 NONE 944121613 1524907

On CDB2 I create a database link to connect to this CDB1 user.

21:06:18 //localhost/CDB1 SQL>connect sys/oracle@//localhost/CDB2 as sysdba
Connected.
21:06:18 //localhost/CDB2 SQL>create database link CDB1@SYSOPER connect to C##SYSOPER identified by oracle1 using '//localhost/CDB1'
Database link CDB1@SYSOPER created.
 
21:06:18 //localhost/CDB2 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:53:21 NONE 717451787 1522015

Here is the remote clone creating CDB2DEMO from CDB1DEMO, as a refreshable PDB, automatically refreshed every 1 minute (when it is in MOUNT):

21:06:18 //localhost/CDB2 SQL>create pluggable database CDB2DEMO from CDB1DEMO@CDB1@SYSOPER file_name_convert=('/CDB1/','/CDB2/','/CDB1DEMO/','/CDB2DEMO/') refresh mode every 1 minutes;
Pluggable database CDB2DEMO created.

This is not new, we got it in 12cR2, and if you go to Collaborate 18 in Las Vegas next month I’ll demo it: https://app.attendcollaborate.com/event/member/448410 , with all the transportable tablespace and pluggable database data movement.

From the alert.log you can see the clone and one first refresh:

21:06:28 //localhost/CDB2 SQL>host ssh oracle@localhost 'ORACLE_HOME=/u01/app/oracle/product/18.0.0/dbhome_1 /u01/app/oracle/product/18.0.0/dbhome_1/bin/adrci exec="set home diag/rdbms/cdb2/CDB2; show alert -tail 30" '| grep --color -E "(^Completed:|^alter pluggable.*|^create pluggable.*|Media Recovery|onlined Undo|KILL|^Pluggable.*|^)"
2018-03-10 21:06:18.317000 +01:00
create pluggable database CDB2DEMO from CDB1DEMO@CDB1@SYSOPER file_name_convert=('/CDB1/','/CDB2/','/CDB1DEMO/','/CDB2DEMO/') refresh mode every 1 minutes keystore identified by *
Opatch validation is skipped for PDB CDB2DEMO (con_id=4)
2018-03-10 21:06:25.942000 +01:00
Endian type of dictionary set to little
****************************************************************
Pluggable Database CDB2DEMO with pdb id - 4 is created as UNUSABLE.
If any errors are encountered before the pdb is marked as NEW,
then the pdb must be dropped
local undo-1, localundoscn-0x00000000000000fb
****************************************************************
2018-03-10 21:06:27.413000 +01:00
Applying media recovery for pdb-4099 from SCN 1524926 to SCN 1525064
Remote log information: count-1
thr-1, seq-2, logfile-/u01/fast_recovery_area/CDB1/foreign_archivelog/CDB1DEMO/2018_03_10/o1_mf_1_2_212315018_.arc, los-1497297, nxs-18446744073709551615
Media Recovery Start
Serial Media Recovery started
max_pdb is 4
Media Recovery Log /u01/fast_recovery_area/CDB1/foreign_archivelog/CDB1DEMO/2018_03_10/o1_mf_1_2_212315018_.arc
Incomplete Recovery applied until change 1525064 time 03/10/2018 21:06:26
Media Recovery Complete (CDB2)
Completed: create pluggable database CDB2DEMO from CDB1DEMO@CDB1@SYSOPER file_name_convert=('/CDB1/','/CDB2/','/CDB1DEMO/','/CDB2DEMO/') refresh mode every 1 minutes keystore identified by *
alter pluggable database refresh

You can’t open this one in READ WRITE as it is refreshed with the redo from the source, as you see in the alert.log, but you can open it in READ ONLY to query it or to clone it further:

21:06:28 //localhost/CDB2 SQL>alter pluggable database CDB2DEMO open read only;
Pluggable database CDB2DEMO altered.
 
21:06:32 //localhost/CDB2 SQL>alter session set container=CDB2DEMO;
Session altered.
 
21:06:32 //localhost/CDB2 SQL>select demo.*,ora_rowscn,current_timestamp,(select current_scn from v$database) current_scn from demo;
 
refresh timestamp refresh scn ORA_ROWSCN CURRENT_TIMESTAMP CURRENT_SCN
------- --------- ------- --- ---------- ----------------- -----------
10-MAR-18 09.06.27.140229000 PM +01:00 1525071 1525072 10-MAR-18 09.06.32.565600000 PM EUROPE/ZURICH 1525100

If you look at the timestamps, you can see that it is in sync with the source as of the time of the end of creation. The alert.log shows that a refresh happened just after the creation completed.

21:06:39 //localhost/CDB2 SQL>host ssh oracle@localhost 'ORACLE_HOME=/u01/app/oracle/product/18.0.0/dbhome_1 /u01/app/oracle/product/18.0.0/dbhome_1/bin/adrci exec="set home diag/rdbms/cdb2/CDB2; show alert -tail 30" '| grep --color -E "(^Completed:|^alter pluggable.*|^create pluggable.*|Media Recovery|onlined Undo|KILL|^Pluggable.*|^)"
2018-03-10 21:06:28.674000 +01:00
alter pluggable database CDB2DEMO open read only
Applying media recovery for pdb-4099 from SCN 1525064 to SCN 1525083
Remote log information: count-1
thr-1, seq-2, logfile-/u01/fast_recovery_area/CDB1/foreign_archivelog/CDB1DEMO/2018_03_10/o1_mf_1_2_212315018_.arc, los-1497297, nxs-18446744073709551615
Media Recovery Start
Serial Media Recovery started
2018-03-10 21:06:29.721000 +01:00
max_pdb is 4
Media Recovery Log /u01/fast_recovery_area/CDB1/foreign_archivelog/CDB1DEMO/2018_03_10/o1_mf_1_2_212315018_.arc
Incomplete Recovery applied until change 1525083 time 03/10/2018 21:06:28
Media Recovery Complete (CDB2)
Completed: alter pluggable database refresh
Autotune of undo retention is turned on.
2018-03-10 21:06:30.880000 +01:00
Undo initialization finished serial:0 start:4386360 end:4386360 diff:0 ms (0.0 seconds)
Database Characterset for CDB2DEMO is AL32UTF8
2018-03-10 21:06:32.305000 +01:00
Opening pdb with no Resource Manager plan active
Pluggable database CDB2DEMO opened read only
Completed: alter pluggable database CDB2DEMO open read only

The refresh can happen only when the PDB is in MOUNT. If it was opened (READ ONLY) for a long time, it will have to retrieve some archive logs. This is why you can see FOREIGN ARCHIVED LOG in your recovery area in 12cR2.

So, basically, here you have a kind of standby database at PDB level refreshed asynchronously, that you can open when you want:

21:06:32 //localhost/CDB2 SQL>alter session set container=CDB$ROOT;
Session altered.
21:06:32 //localhost/CDB2 SQL>alter pluggable database CDB2DEMO close;
Pluggable database CDB2DEMO altered.
21:06:39 //localhost/CDB2 SQL>alter pluggable database CDB2DEMO open read only;
Pluggable database CDB2DEMO altered.
21:06:40 //localhost/CDB2 SQL>alter session set container=CDB2DEMO;
Session altered.
21:06:40 //localhost/CDB2 SQL>select demo.*,ora_rowscn,current_timestamp,(select current_scn from v$database) current_scn from demo;
 
refresh timestamp refresh scn ORA_ROWSCN CURRENT_TIMESTAMP CURRENT_SCN
------- --------- ------- --- ---------- ----------------- -----------
10-MAR-18 09.06.27.140229000 PM +01:00 1525071 1525072 10-MAR-18 09.06.40.159432000 PM EUROPE/ZURICH 1525117
 
21:06:40 //localhost/CDB2 SQL>alter session set container=CDB$ROOT;
Session altered.

You will probably do a last refresh before opening it, but I left it on purpose to show that the switchover will ensure that they are in sync.

In 12cR2 if you want to ‘switchover’, you need to stop modifications to CDB1DEMO, alter CDB2DEMO to REFRESH MODE NONE, then open it READ WRITE to be the new ‘primary’ after a last refresh, and drop CDB1DEMO to re-create it as a refreshable clone from CDB2DEMO. And this is where the new 18c PDB switchover comes in: one command to do all this (a sketch of the manual sequence follows).
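
For reference, here is a sketch of that manual 12cR2 sequence with the names used in this demo (not run here, and file_name_convert omitted):

-- on CDB2: last refresh, stop being a refreshable clone, open as the new 'primary'
alter pluggable database CDB2DEMO refresh;
alter pluggable database CDB2DEMO refresh mode none;
alter pluggable database CDB2DEMO open read write;
-- on CDB1: drop the old 'primary' and re-create it as a refreshable clone of CDB2DEMO
drop pluggable database CDB1DEMO including datafiles;
create pluggable database CDB1DEMO from CDB2DEMO@CDB2@SYSOPER refresh mode every 1 minutes;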

Note that I leave CDB2DEMO opened (READ ONLY), or the switchover will fail with ORA-17628: Oracle error 65036 returned by remote Oracle server, ORA-65036: pluggable database not open in required mode.


21:06:40 //localhost/CDB2 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:53:21 NONE 717451787 1525131
4 CDB2DEMO READ ONLY NO REFRESHING 944121613 3 21:06:18 AUTO 1 1525083 717451787 1525131

PDB Switchover

As the goal is to change roles, I need a SYSOPER user and a database link on the other side as well:

A user to connect to CDB2 which is the source we will refresh from after the switchover:

21:06:40 //localhost/CDB2 SQL>connect sys/oracle@//localhost/CDB2 as sysdba
Connected.
21:06:40 //localhost/CDB2 SQL>grant create session, sysoper to C##SYSOPER identified by oracle2 container=all;
Grant succeeded.

A database link from the CDB1 which will become the refreshing side:

21:06:41 //localhost/CDB2 SQL>connect sys/oracle@//localhost/CDB1 as sysdba
Connected.
21:06:41 //localhost/CDB1 SQL> create database link CDB2@SYSOPER connect to C##SYSOPER identified by oracle2 using '//localhost/CDB2'
Database link CDB2@SYSOPER created.

For the moment this side is in READ WRITE as it is the current ‘primary’

21:06:41 //localhost/CDB1 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:43:46 NONE 944121613 1525456
3 CDB1DEMO READ WRITE NO NORMAL 944121613 2 21:06:06 NONE 944121613 1525456

Here is the one-command switchover in 18c. We alter CDB1DEMO to be refreshable from CDB2DEMO, and we add ‘switchover’ to stop refreshing the remote PDB as it will now be the ‘primary’.

21:06:41 //localhost/CDB1 SQL>alter pluggable database CDB1DEMO refresh mode every 1 minutes from CDB2DEMO@CDB2@SYSOPER switchover;
Pluggable database CDB1DEMO altered.

The alert.log here from CDB1 shows ‘Deleted file’ as in a DROP PLUGGABLE DATABASE, then ‘created as UNUSABLE’ as in CREATE PLUGGABLE DATABASE, then ‘Applying media recovery’ as in refreshable clone:

21:06:59 //localhost/CDB1 SQL>host ssh oracle@localhost 'ORACLE_HOME=/u01/app/oracle/product/18.0.0/dbhome_1 /u01/app/oracle/product/18.0.0/dbhome_1/bin/adrci exec="set home diag/rdbms/cdb1/CDB1; show alert -tail 30" '| grep --color -E "(^Completed:|^alter pluggable.*|^create pluggable.*|Media Recovery|onlined Undo|KILL|^Pluggable.*|^)"
2018-03-10 21:06:41.354000 +01:00
alter pluggable database CDB1DEMO refresh mode every 1 minutes from CDB2DEMO@CDB2@SYSOPER switchover
JIT: pid 11896 requesting stop
Buffer Cache flush started: 3
Buffer Cache flush finished: 3
While transitioning the pdb 3 to clean state, clearing all its abort bits in the control file.
Pluggable database CDB1DEMO closed
2018-03-10 21:06:45.734000 +01:00
Deleted file /u01/oradata/CDB1/CDB1DEMO/temp012018-03-10_12-43-46-436-PM.dbf
Deleted file /u01/oradata/CDB1/CDB1DEMO/undotbs01.dbf
Deleted file /u01/oradata/CDB1/CDB1DEMO/sysaux01.dbf
Deleted file /u01/oradata/CDB1/CDB1DEMO/system01.dbf
2018-03-10 21:06:48.199000 +01:00
Opatch validation is skipped for PDB CDB1DEMO (con_id=3)
2018-03-10 21:06:55.321000 +01:00
Endian type of dictionary set to little
****************************************************************
Pluggable Database CDB1DEMO with pdb id - 3 is created as UNUSABLE.
If any errors are encountered before the pdb is marked as NEW,
then the pdb must be dropped
local undo-1, localundoscn-0x00000000000000fb
****************************************************************
2018-03-10 21:06:59.142000 +01:00
Applying media recovery for pdb-4099 from SCN 1526441 to SCN 1526451
Remote log information: count-1
thr-1, seq-2, logfile-/u01/fast_recovery_area/CDB2/foreign_archivelog/CDB2DEMO/2018_03_10/o1_mf_1_2_2195948769_.arc, los-1497207, nxs-18446744073709551615
Media Recovery Start
Serial Media Recovery started
max_pdb is 4
Media Recovery Log /u01/fast_recovery_area/CDB2/foreign_archivelog/CDB2DEMO/2018_03_10/o1_mf_1_2_2195948769_.arc
Incomplete Recovery applied until change 1526451 time 03/10/2018 21:06:55
Media Recovery Complete (CDB1)
Completed: alter pluggable database CDB1DEMO refresh mode every 1 minutes from CDB2DEMO@CDB2@SYSOPER switchover
alter pluggable database refresh

The CDB1DEMO which was the ‘primary’ in READ WRITE is now the ‘standby’ in MOUNT, automatically refreshing every minute as mentioned in the switchover command:

21:07:00 //localhost/CDB1 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:43:46 NONE 944121613 1526489
3 CDB1DEMO MOUNTED REFRESHING 717451787 4 21:06:06 AUTO 1 1526451 944121613 1526489

As with any refreshable clone, I can open it READ ONLY and query it:

21:07:00 //localhost/CDB1 SQL>alter pluggable database CDB1DEMO open read only;
Pluggable database CDB1DEMO altered.
 
21:07:02 //localhost/CDB1 SQL>alter session set container=CDB1DEMO;
Session altered.
 
21:07:02 //localhost/CDB1 SQL>select demo.*,ora_rowscn,current_timestamp,(select current_scn from v$database) current_scn from demo;
 
refresh timestamp refresh scn ORA_ROWSCN CURRENT_TIMESTAMP CURRENT_SCN
------- --------- ------- --- ---------- ----------------- -----------
10-MAR-18 09.06.41.175918000 PM +01:00 1525436 1525594 10-MAR-18 09.07.02.875782000 PM EUROPE/ZURICH 1526520

Look at the timestamp: the data is fresh as of the switchover. No data is lost: the transactions that were committed on the source at the time of switchover are applied on the clone.

Another switchover

I’m now doing a switchover in the opposite direction. Same as before: the destination is READ ONLY and the source is READ WRITE:

21:07:20 //localhost/CDB1 SQL>connect sys/oracle@//localhost/CDB2 as sysdba
Connected.
21:07:20 //localhost/CDB2 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:53:21 NONE 717451787 1526851
4 CDB2DEMO READ WRITE NO NORMAL 944121613 3 21:06:18 NONE 717451787 1526851

Here is the switchover

21:07:20 //localhost/CDB2 SQL>alter pluggable database CDB2DEMO refresh mode every 1 minutes from CDB1DEMO@CDB1@SYSOPER switchover;
Pluggable database CDB2DEMO altered.

The alert.log shows an error at the end, for the last refresh after the switchover:

21:07:33 //localhost/CDB2 SQL>host ssh oracle@localhost 'ORACLE_HOME=/u01/app/oracle/product/18.0.0/dbhome_1 /u01/app/oracle/product/18.0.0/dbhome_1/bin/adrci exec="set home diag/rdbms/cdb2/CDB2; show alert -tail 30" '| grep --color -E "(^Completed:|^alter pluggable.*|^create pluggable.*|Media Recovery|onlined Undo|KILL|^Pluggable.*|^)"
2018-03-10 21:07:32.707000 +01:00
Incomplete Recovery applied until change 1527253 time 03/10/2018 21:07:31
Media Recovery Complete (CDB2)
Completed: alter pluggable database CDB2DEMO refresh mode every 1 minutes from CDB1DEMO@CDB1@SYSOPER switchover
alter pluggable database refresh
ORA-65376 signalled during: alter pluggable database refresh...
Errors in file /u01/app/oracle/diag/rdbms/cdb2/CDB2/trace/CDB2_j000_12081.trc:
ORA-12012: error on auto execute of job "SYS"."CDB2DEMO_510111146_REFRESH"
ORA-65376: unable to refresh the PDB
ORA-06512: at "SYS.DBMS_SQL", line 2995
ORA-06512: at line 1

The content of the trace file doesn’t tell much more:

ORA-12012: error on auto execute of job "SYS"."CDB2DEMO_510111146_REFRESH"
at 0x7ffd3c59af38 placed jslv.c@1659
ORA-65376: unable to refresh the PDB
ORA-06512: at "SYS.DBMS_SQL", line 2995
ORA-06512: at line 1

However, the switchover was ok, so nothing was lost and I’ll be able to run new refreshes later.
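
If a refresh is needed before the next scheduled run, it can also be triggered manually while the PDB is in MOUNT. This is only a sketch, assuming the same container name and that the source CDB is reachable:

alter pluggable database CDB2DEMO refresh;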

The CDB2DEMO is now the ‘standby’ again:

21:07:33 //localhost/CDB2 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:53:21 NONE 717451787 1527275
4 CDB2DEMO MOUNTED REFRESHING 944121613 3 21:06:18 AUTO 1 1527253 717451787 1527275

The CDB1DEMO is now the ‘primary’ in READ WRITE:

21:07:34 //localhost/CDB2 SQL>connect sys/oracle@//localhost/CDB1 as sysdba
Connected.
21:07:34 //localhost/CDB1 SQL>pdbs
21:07:34 //localhost/CDB1 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:43:46 NONE 944121613 1527257
3 CDB1DEMO READ WRITE NO NORMAL 717451787 4 21:06:06 NONE 944121613 1527257

The standby is refreshing, containing the data as of the time of switchover, until the next refresh:

21:07:34 //localhost/CDB1 SQL>connect sys/oracle@//localhost/CDB2 as sysdba
Connected.
21:07:35 //localhost/CDB2 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:53:21 NONE 717451787 1527276
4 CDB2DEMO MOUNTED REFRESHING 944121613 3 21:06:18 AUTO 1 1527253 717451787 1527276
 
21:07:35 //localhost/CDB2 SQL>alter session set container=CDB2DEMO;
Session altered.
 
21:07:36 //localhost/CDB2 SQL>select demo.*,ora_rowscn,current_timestamp,(select current_scn from v$database) current_scn from demo;
 
refresh timestamp refresh scn ORA_ROWSCN CURRENT_TIMESTAMP CURRENT_SCN
------- --------- ------- --- ---------- ----------------- -----------
10-MAR-18 09.07.20.108426000 PM +01:00 1526838 1526839 10-MAR-18 09.07.36.114424000 PM EUROPE/ZURICH 1527282

I’ve checked the state the next day. The ‘primary’ PDB had its materialized view still refreshing every second:

12:44:48 //localhost/CDB2 SQL>connect sys/oracle@//localhost/CDB1 as sysdba
Connected.
12:44:48 //localhost/CDB1 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:43:46 NONE 944121613 1767409
4 CDB1DEMO READ WRITE NO NORMAL 717451787 3 12:43:31 NONE 944121613 1767409
 
12:44:48 //localhost/CDB1 SQL>alter session set container=CDB1DEMO;
Session altered.
 
12:44:49 //localhost/CDB1 SQL>select demo.*,ora_rowscn,current_timestamp,(select current_scn from v$database) current_scn from demo;
 
refresh timestamp refresh scn ORA_ROWSCN CURRENT_TIMESTAMP CURRENT_SCN
------- --------- ------- --- ---------- ----------------- -----------
11-MAR-18 12.44.33.078430000 PM +01:00 1766975 1766976 11-MAR-18 12.44.49.085200000 PM EUROPE/ZURICH 1767410

And the ‘standby’ PDB is updating every minute:

12:44:49 //localhost/CDB1 SQL>connect sys/oracle@//localhost/CDB2 as sysdba
Connected.
 
12:44:49 //localhost/CDB2 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:53:21 NONE 717451787 1767422
3 CDB2DEMO MOUNTED REFRESHING 944121613 4 12:43:42 AUTO 1 1767399 717451787 1767422
 
12:44:49 //localhost/CDB2 SQL>alter session set container=CDB2DEMO;
Session altered.
 
12:44:49 //localhost/CDB2 SQL>alter pluggable database open read only;
Pluggable database OPEN altered.
 
12:44:50 //localhost/CDB2 SQL>select demo.*,ora_rowscn,current_timestamp,(select current_scn from v$database) current_scn from demo;
 
refresh timestamp refresh scn ORA_ROWSCN CURRENT_TIMESTAMP CURRENT_SCN
------- --------- ------- --- ---------- ----------------- -----------
11-MAR-18 12.44.33.078430000 PM +01:00 1766975 1766976 11-MAR-18 12.44.50.205050000 PM EUROPE/ZURICH 1767432

But no failover

When we are talking about a ‘standby’ at PDB level, we want to be able to do a failover. Of course, we accept losing some transactions as the refresh happens only every minute, but a 1-minute RPO is still an interesting solution for a feature that is available in all editions.

However, as in 12.2.0.1, this doesn’t work because you cannot alter the ‘standby’ PDB to REFRESH MODE NONE when the ‘primary’ is not available.

Here is my ‘standby’ PDB CDB2DEMO:

12:44:50 //localhost/CDB2 SQL>connect sys/oracle@//localhost/CDB2 as sysdba
Connected.
 
12:44:50 //localhost/CDB2 SQL>select con_id,name con_name,open_mode,restricted,status,foreign_cdb_dbid,foreign_pdb_id,to_char(creation_time, 'hh24:mi:ss') creation_time,refresh_mode,refresh_interval,last_refresh_scn,cdb_dbid,current_scn from v$pdbs right join dba_pdbs using(con_id,creation_time,dbid) cross join (select dbid cdb_dbid,current_scn from v$database) order by pdb_id
 
CON_ID CON_NAME OPEN_MODE RESTRICTED STATUS FOREIGN_CDB_DBID FOREIGN_PDB_ID CREATION_TIME REFRESH_MODE REFRESH_INTERVAL LAST_REFRESH_SCN CDB_DBID CURRENT_SCN
------ -------- --------- ---------- ------ ---------------- -------------- ------------- ------------ ---------------- ---------------- -------- -----------
2 PDB$SEED READ ONLY NO NORMAL 1214140 2 12:53:21 NONE 717451787 1767433
3 CDB2DEMO READ ONLY NO REFRESHING 944121613 4 12:43:42 AUTO 1 1767405 717451787 1767433

I stop the listener which listens for the ‘primary’ CDB1DEMO:

12:44:50 //localhost/CDB2 SQL>host lsnrctl stop
 
LSNRCTL for Linux: Version 18.0.0.0.0 - Production on 11-MAR-2018 12:44:50
 
Copyright (c) 1991, 2017, Oracle. All rights reserved.
 
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
The command completed successfully

You can see in the alert.log that the refresh fails (it is running from a scheduler job):

Errors in file /u01/app/oracle/diag/rdbms/cdb2/CDB2/trace/CDB2_j000_25443.trc:
ORA-12012: error on auto execute of job "SYS"."CDB2DEMO_1159316120_REFRESH"
ORA-17627: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-17629: Cannot connect to the remote database server
ORA-06512: at "SYS.DBMS_SQL", line 2995
ORA-06512: at line 1

Then, an attempt to stop the refreshing mode of the ‘standby’ fails with ‘Cannot connect to the remote database server':

12:44:50 //localhost/CDB2 SQL>alter pluggable database CDB2DEMO close;
Pluggable database CDB2DEMO altered.
 
12:44:50 //localhost/CDB2 SQL>alter pluggable database CDB2DEMO refresh mode none;
 
Error starting at line : 180 File @ ~/PDB-switchover.sql
In command -
alter pluggable database CDB2DEMO refresh mode none
Error report -
ORA-17627: ORA-12541: TNS:no listener
ORA-17629: Cannot connect to the remote database server
17627. 00000 - "%s"
*Cause: An error returned by OCI while sending/receiving message from remote instance
*Action: Look at error message and take appropriate action or contact Oracle Support Services for further assistance

So nothing new here about failover. I already explained how to do something like a failover by cloning the standby, which can be a snapshot clone to be faster: https://blog.dbi-services.com/12cr2-pdb-refresh-as-a-poor-man-standby/.
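
The idea there is to clone the refreshable PDB and open the clone read-write. Only a sketch of that workaround, with a hypothetical clone name (CDB2DEMO_FO); see the linked post for the full procedure, including the snapshot copy variant:

alter pluggable database CDB2DEMO open read only;
create pluggable database CDB2DEMO_FO from CDB2DEMO;
alter pluggable database CDB2DEMO_FO open;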

Note that this new feature is limited to specific platforms only – Oracle Cloud PaaS and Oracle Exadata machines – so most Oracle customers paying for software updates will not be able to use it. However, don’t worry, you can do the same with a few commands, as in 12cR2.

 

Cet article 18c PDB switchover est apparu en premier sur Blog dbi services.

Enabled, Accepted, Fixed SQL Plan Baselines

Fri, 2018-03-09 15:44

When the documentation is not clear, I prefer to build a test case to be sure about the behavior in different contexts and different versions. Here is a test on SQL Plan Management to show which plan is chosen among the different states of SQL Plan Baselines: Enabled, Accepted, Fixed. Thanks to the Oracle ACE program, I have some Oracle Cloud credits to quickly provision a database, so I tested this on Oracle 18c.

For this test, I’ve created a table:

create table DEMO as select rownum n from xmltable('1 to 10000');

with 8 indexes:

exec for i in 1..8 loop execute immediate 'create index DEMO'||i||' on DEMO(n,'||i||')'; end loop;

and a procedure to query it several times, setting random costs for the indexes, with only one of them being the cheapest:

create or replace procedure runplans(n number) as
dummy number;
begin
-- run all this 30 times
for k in 1..30 loop
-- run for each index DEMO1 to DEMOn, with one of them cheaper each time
for t in 1..n loop
-- set random cost for all indexes
for i in 1..8 loop dbms_stats.set_index_stats(user,'DEMO'||i,indlevel=>round(dbms_random.value(10,100)),no_invalidate=>true); end loop;
-- set cheap cost for index DEMOt
dbms_stats.set_index_stats(user,'DEMO'||t,indlevel=>1,no_invalidate=>true);
-- change some parameters to parse new child
execute immediate 'alter session set optimizer_index_caching='||(t*8+k);
-- query with an index hint but not specifying the index so the cheaper is chosen
select /*+ index(DEMO) */ n into dummy from DEMO where n=1;
end loop;
end loop;
end;
/

So you understand the idea: have 8 possible execution plans, with one cheaper than the others. The goal is to see which one is chosen depending on the state of the SQL Plan Baseline.

I will play with the baselines and will display the cursor execution with the following SQLcl alias:

SQL> alias sqlpb=select sql_id,plan_hash_value,sum(executions),sql_plan_baseline,id,object_name from v$sql join v$sql_plan using(sql_id,plan_hash_value,child_number) where sql_text like 'SELECT % FROM DEMO%' and id=1 group by sql_id,plan_hash_value,sql_plan_baseline,id,object_name order by 3;

So, I call my procedure to run the query 30 times when index DEMO1 is the cheapest:

SQL> exec runplans(1)
PL/SQL procedure successfully completed.

Here is my alias to show the 30 executions using DEMO1 (object_name from v$sql_plan):

SQL> sqlpb
 
SQL_ID PLAN_HASH_VALUE SUM(EXECUTIONS) SQL_PLAN_BASELINE ID OBJECT_NAME
------------- --------------- --------------- -------------------- -- -----------
gqnkn2d17zjvv 3739632713 30 1 DEMO1

I load this to SPM

SQL> exec dbms_output.put_line(dbms_spm.load_plans_from_cursor_cache(sql_id=>'gqnkn2d17zjvv'));
PL/SQL procedure successfully completed.

Here is my SQL Plan Baseline, enabled and accepted:

SQL> select plan_name,enabled,accepted,fixed,executions from dba_sql_plan_baselines;
 
PLAN_NAME ENABLED ACCEPTED FIXED EXECUTIONS
------------------------------ ------- -------- ----- ----------
SQL_PLAN_dcc9d14j7k1vu97e16a35 YES YES NO 30

Now I run my procedure to execute the cursor 30 times, and for each of these, 8 times with a different index being the cheapest each time:

SQL> exec runplans(8)
PL/SQL procedure successfully completed.

So all executions have used the only SQL Plan Baseline which is enabled and accepted:

SQL> sqlpb
 
SQL_ID PLAN_HASH_VALUE SUM(EXECUTIONS) SQL_PLAN_BASELINE ID OBJECT_NAME
------------- --------------- --------------- --------------------------------- -- -----------
gqnkn2d17zjvv 3739632713 232 SQL_PLAN_dcc9d14j7k1vu97e16a35 1 DEMO1

And the other plans (because only 8 of them had this DEMO1 plan being the cheapest) were loaded, enabled but not accepted:

SQL> select plan_name,enabled,accepted,fixed,executions from dba_sql_plan_baselines;
 
PLAN_NAME ENABLED ACCEPTED FIXED EXECUTIONS
------------------------------ ------- -------- ----- ----------
SQL_PLAN_dcc9d14j7k1vu287d1344 YES NO NO 0
SQL_PLAN_dcc9d14j7k1vu452ab005 YES NO NO 0
SQL_PLAN_dcc9d14j7k1vu4564f9cd YES NO NO 0
SQL_PLAN_dcc9d14j7k1vu4cdc9ee7 YES NO NO 0
SQL_PLAN_dcc9d14j7k1vu5353a77e YES NO NO 0
SQL_PLAN_dcc9d14j7k1vu97e16a35 YES YES NO 30
SQL_PLAN_dcc9d14j7k1vuc6a3f323 YES NO NO 0
SQL_PLAN_dcc9d14j7k1vufb8f9e5a YES NO NO 0

Now, I change the status of the baselines to get all combinations of enabled, fixed and accepted, and in addition to that I change the plan name to the plan line which differs (the index access):

SQL> begin
2 for i in (select rownum n,plan_name from dba_sql_plan_baselines) loop
3 if i.n in (2,4,6,8) then dbms_output.put_line(dbms_spm.evolve_sql_plan_baseline(plan_name=>i.plan_name,verify=>'no',commit=>'yes')); end if;
4 if i.n in (1,2,5,6) then dbms_output.put_line(dbms_spm.alter_sql_plan_baseline(plan_name=>i.plan_name,attribute_name=>'enabled',attribute_value=>'no')); end if;
5 if i.n in (5,6,7,8) then dbms_output.put_line(dbms_spm.alter_sql_plan_baseline(plan_name=>i.plan_name,attribute_name=>'fixed',attribute_value=>'yes')); end if;
6 for p in ( select plan_table_output from dbms_xplan.display_sql_plan_baseline(plan_name=>i.plan_name,format=>'basic') where plan_table_output like '%| DEMO%' ) loop
7 dbms_output.put_line(dbms_spm.alter_sql_plan_baseline(plan_name=>i.plan_name,attribute_name=>'plan_name',attribute_value=>p.plan_table_output));
8 end loop;
9 end loop;
10 end;
11 /

So here they are, with their new name:

SQL> select plan_name,enabled,accepted,fixed,executions from dba_sql_plan_baselines;
 
PLAN_NAME ENABLED ACCEPTED FIXED EXECUTIONS
---------------------------------- ------- -------- ----- ----------
| 1 | INDEX RANGE SCAN| DEMO4 | NO NO NO 0
| 1 | INDEX RANGE SCAN| DEMO3 | NO YES NO 0
| 1 | INDEX RANGE SCAN| DEMO2 | YES NO NO 0
| 1 | INDEX RANGE SCAN| DEMO6 | YES YES NO 0
| 1 | INDEX RANGE SCAN| DEMO5 | NO NO YES 0
| 1 | INDEX RANGE SCAN| DEMO1 | NO YES YES 30
| 1 | INDEX RANGE SCAN| DEMO8 | YES NO YES 0
| 1 | INDEX RANGE SCAN| DEMO7 | YES YES YES 0

Fixed plans have priority

I flush the shared pool and run my 240 statements again:

SQL> alter system flush shared_pool;
System FLUSH altered
SQL> exec runplans(8)
PL/SQL procedure successfully completed.

Here is the result in V$SQL, only one plan used for all those executions:

SQL> sqlpb
 
SQL_ID PLAN_HASH_VALUE SUM(EXECUTIONS) SQL_PLAN_BASELINE ID OBJECT_NAME
------------- --------------- --------------- ------------------------------------- -- -----------
gqnkn2d17zjvv 1698325646 240 | 1 | INDEX RANGE SCAN| DEMO7 | 1 DEMO7

The only possible plans are those that are enabled and accepted, which are the DEMO6 and DEMO7 ones. However, fixed plans have priority, so even when the CBO came up with the DEMO6 plan it was not used. When there are fixed, enabled and accepted SQL Plan Baselines, those are the only ones considered.
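
A quick way to double-check which baseline a cursor actually used is the Note section of dbms_xplan; only a sketch here, reusing the sql_id from above:

SQL> select * from table(dbms_xplan.display_cursor('gqnkn2d17zjvv',null,'basic +note'));

The Note section of this output mentions the name of the SQL plan baseline used for the statement.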

Enabled and Accepted are used

Now setting all fixed attribute to no:

SQL> begin
2 for i in (select rownum n,plan_name from dba_sql_plan_baselines) loop
3 dbms_output.put_line(dbms_spm.alter_sql_plan_baseline(plan_name=>i.plan_name,attribute_name=>'fixed',attribute_value=>'no'));
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
 
SQL> select plan_name,enabled,accepted,fixed,executions from dba_sql_plan_baselines;
PLAN_NAME ENABLED ACCEPTED FIXED EXECUTIONS
---------------------------------- ------- -------- ----- ----------
| 1 | INDEX RANGE SCAN| DEMO4 | NO NO NO 0
| 1 | INDEX RANGE SCAN| DEMO3 | NO YES NO 0
| 1 | INDEX RANGE SCAN| DEMO2 | YES NO NO 0
| 1 | INDEX RANGE SCAN| DEMO6 | YES YES NO 0
| 1 | INDEX RANGE SCAN| DEMO5 | NO NO NO 0
| 1 | INDEX RANGE SCAN| DEMO1 | NO YES NO 30
| 1 | INDEX RANGE SCAN| DEMO8 | YES NO NO 0
| 1 | INDEX RANGE SCAN| DEMO7 | YES YES NO 0

Here is another run:

SQL> alter system flush shared_pool;
System FLUSH altered.
SQL> exec runplans(8)
PL/SQL procedure successfully completed.
 
SQL> sqlpb
 
SQL_ID PLAN_HASH_VALUE SUM(EXECUTIONS) SQL_PLAN_BASELINE ID OBJECT_NAME
------------- --------------- --------------- ------------------------------------- -- -----------
gqnkn2d17zjvv 1698325646 95 | 1 | INDEX RANGE SCAN| DEMO7 | 1 DEMO7
gqnkn2d17zjvv 3449379882 145 | 1 | INDEX RANGE SCAN| DEMO6 | 1 DEMO6

Now that there are no fixed plans taking priority, all enabled and accepted plans are possible, but only them.

All possible plans in the baseline but none enabled and accepted

Then what happens when all possible plans are in the SQL Plan Baseline but none of them are both enabled and accepted?

SQL> begin
2 for i in (select rownum n,plan_name from dba_sql_plan_baselines where accepted='YES') loop
3 dbms_output.put_line(dbms_spm.alter_sql_plan_baseline(plan_name=>i.plan_name,attribute_name=>'enabled',attribute_value=>'no'));
4 end loop;
5 end;
6 /
 
SQL> select plan_name,enabled,accepted,fixed,executions from dba_sql_plan_baselines;
 
PLAN_NAME ENABLED ACCEPTED FIXED EXECUTIONS
---------------------------------- ------- -------- ----- ----------
| 1 | INDEX RANGE SCAN| DEMO4 | NO NO NO 0
| 1 | INDEX RANGE SCAN| DEMO3 | NO YES NO 0
| 1 | INDEX RANGE SCAN| DEMO2 | YES NO NO 0
| 1 | INDEX RANGE SCAN| DEMO6 | NO YES NO 0
| 1 | INDEX RANGE SCAN| DEMO5 | NO NO NO 0
| 1 | INDEX RANGE SCAN| DEMO1 | NO YES NO 30
| 1 | INDEX RANGE SCAN| DEMO8 | YES NO NO 0
| 1 | INDEX RANGE SCAN| DEMO7 | NO YES NO 0

So all combinations of indexes are there (and my query forces index access with a hint) but none are accepted and enabled.

SQL> alter system flush shared_pool;
System FLUSH altered.
SQL> exec runplans(8)
PL/SQL procedure successfully completed.
 
SQL> sqlpb
SQL_ID PLAN_HASH_VALUE SUM(EXECUTIONS) SQL_PLAN_BASELINE ID OBJECT_NAME
------------- --------------- --------------- -------------------- -- -----------
gqnkn2d17zjvv 3739632713 8 1 DEMO1
gqnkn2d17zjvv 4234411015 16 1 DEMO2
gqnkn2d17zjvv 2199479965 24 1 DEMO3
gqnkn2d17zjvv 1698325646 30 1 DEMO7
gqnkn2d17zjvv 3449379882 30 1 DEMO6
gqnkn2d17zjvv 2144220082 30 1 DEMO5
gqnkn2d17zjvv 918903766 30 1 DEMO4
gqnkn2d17zjvv 39208404 72 1 DEMO8

When there are no enabled and accepted plans, then anything is possible and each execution keeps the plan the CBO came up with.

None enabled and accepted, but new plan possible

Now, in order to have a new plan possible I’ll still run the same query but after dropping all indexes.

SQL> exec for i in 1..8 loop execute immediate 'drop index DEMO'||i; end loop;
PL/SQL procedure successfully completed.

I’ve run the same as before but without the dbms_stats calls.

Here all executions have run with the only possible plan: a full table scan:

SQL> select sql_id,plan_hash_value,sum(executions),sql_plan_baseline,id,object_name from v$sql join v$sql_plan using(sql_id,plan_hash_value,child_number) where sql_text like 'SELECT % FROM DEMO%' and id=1 group by sql_id,plan_hash_value,sql_plan_baseline,id,object_name order by 3
 
SQL_ID PLAN_HASH_VALUE SUM(EXECUTIONS) SQL_PLAN_BASELINE ID OBJECT_NAME
------------- --------------- --------------- -------------------- -- -----------
gqnkn2d17zjvv 4000794843 29 1 DEMO

This plan has been added to the SQL Plan Baseline, enabled but not accepted:

SQL> select plan_name,enabled,accepted,fixed,executions from dba_sql_plan_baselines;
 
PLAN_NAME ENABLED ACCEPTED FIXED EXECUTIONS
---------------------------------- ------- -------- ----- ----------
| 1 | INDEX RANGE SCAN| DEMO4 | NO NO NO 0
| 1 | INDEX RANGE SCAN| DEMO3 | NO YES NO 0
| 1 | INDEX RANGE SCAN| DEMO2 | YES NO NO 0
| 1 | INDEX RANGE SCAN| DEMO6 | NO YES NO 0
| 1 | INDEX RANGE SCAN| DEMO5 | NO NO NO 0
SQL_PLAN_dcc9d14j7k1vu838f84a8 YES NO NO 0
| 1 | INDEX RANGE SCAN| DEMO1 | NO YES NO 30
| 1 | INDEX RANGE SCAN| DEMO8 | YES NO NO 0
| 1 | INDEX RANGE SCAN| DEMO7 | NO YES NO 0

Not accepted means that it cannot be used. But as there is no other plan possible, it will be used anyway.
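
If this new plan turns out to be the right one, it can be accepted later with an evolve; only a sketch, using the plan name listed above and the same call as earlier in this post:

SQL> exec dbms_output.put_line(dbms_spm.evolve_sql_plan_baseline(plan_name=>'SQL_PLAN_dcc9d14j7k1vu838f84a8',verify=>'no',commit=>'yes'));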

In summary:
Fixed plans are like telling the optimizer: You must use one of these.
Enabled accepted plans are like telling the optimizer: You should use one of these.
Disabled or non-accepted plans are like telling the optimizer: Try to find another plan.
The optimizer will always come up with a plan, so if the rules cannot be applied, the optimizer's best-estimated plan is used. It may be a non-enabled or non-accepted plan. Or it can be a new plan, which will then be added as not accepted.

 

Cet article Enabled, Accepted, Fixed SQL Plan Baselines est apparu en premier sur Blog dbi services.

Parallel pg_dump is slow by default?

Fri, 2018-03-09 00:25

Short answer: Yes, it is. Being at a customer site the last few days, we wanted to run a parallel pg_dump of a 2TB database. We were quite surprised that it was rather slow and it was not immediately clear why. Well, the answer is in the documentation: when you go for parallel dumps you need to use the directory format, and this comes with: "This format is compressed by default and also supports parallel dumps.". Compression takes time, so the question was whether we could disable it, which was not clear from that statement: does "compressed by default" mean that it is always compressed and you cannot change that, or does it just mean compression is the default, but you can change it?

As always, lets setup a short test case:

postgres=# create table dmp1 as 
           select a,a::varchar b,now() c 
             from generate_series ( 1, 1000000) a;
SELECT 1000000
postgres=# create table dmp2 as select * from dmp1;
SELECT 1000000
postgres=# create table dmp3 as select * from dmp1;
SELECT 1000000
postgres=# create table dmp4 as select * from dmp1;
SELECT 1000000
postgres=# \d dmp*
                        Table "public.dmp1"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

                        Table "public.dmp2"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

                        Table "public.dmp3"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

                        Table "public.dmp4"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 b      | character varying        |           |          | 
 c      | timestamp with time zone |           |          | 

We have four tables each containing 1’000’000 rows. When we use pg_dump in parallel with the default it looks like this:

postgres@pgbox:/home/postgres/ [PG10] mkdir /var/tmp/dmp
postgres@pgbox:/home/postgres/ [PG10] time pg_dump --format=d --jobs=4 --file=/var/tmp/dmp/ postgres

real	0m2.788s
user	0m2.459s
sys	0m0.597s
postgres@pgbox:/home/postgres/ [PG10] ls -la /var/tmp/dmp/
total 19528
drwxr-xr-x. 2 postgres postgres    4096 Mar  9 07:16 .
drwxrwxrwt. 4 root     root          51 Mar  9 07:15 ..
-rw-r--r--. 1 postgres postgres      25 Mar  9 07:16 3113.dat.gz
-rw-r--r--. 1 postgres postgres      25 Mar  9 07:16 3114.dat.gz
-rw-r--r--. 1 postgres postgres      25 Mar  9 07:16 3115.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3116.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3117.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3118.dat.gz
-rw-r--r--. 1 postgres postgres 4991138 Mar  9 07:16 3119.dat.gz
-rw-r--r--. 1 postgres postgres    5819 Mar  9 07:16 toc.dat

As stated in the documentation the result is compressed. When speed is more important than the size on disk you can however disable the compression:

postgres@pgbox:/home/postgres/ [PG10] rm -rf /var/tmp/dmp/*
postgres@pgbox:/home/postgres/ [PG10] time pg_dump --format=d --jobs=4 --file=/var/tmp/dmp/ --compress=0 postgres

real	0m5.357s
user	0m0.065s
sys	0m0.460s
postgres@pgbox:/home/postgres/ [PG10] ls -la /var/tmp/dmp/
total 171040
drwxr-xr-x. 2 postgres postgres     4096 Mar  9 07:18 .
drwxrwxrwt. 4 root     root           51 Mar  9 07:15 ..
-rw-r--r--. 1 postgres postgres        5 Mar  9 07:18 3113.dat
-rw-r--r--. 1 postgres postgres        5 Mar  9 07:18 3114.dat
-rw-r--r--. 1 postgres postgres        5 Mar  9 07:18 3115.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3116.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3117.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3118.dat
-rw-r--r--. 1 postgres postgres 43777797 Mar  9 07:18 3119.dat
-rw-r--r--. 1 postgres postgres     5819 Mar  9 07:18 toc.dat

In my case it got slower than the compressed dump, but this is because I do not really have fast disks on my little VM. When you have a good storage solution, disabling compression should bring you more speed.
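
For completeness: a directory-format dump like this can also be restored in parallel. Only a sketch, reusing the paths from above and a hypothetical target database:

postgres@pgbox:/home/postgres/ [PG10] createdb targetdb
postgres@pgbox:/home/postgres/ [PG10] pg_restore --jobs=4 --dbname=targetdb /var/tmp/dmp/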

 

Cet article Parallel pg_dump is slow by default? est apparu en premier sur Blog dbi services.

MySQL – Foreign keys and “Cannot delete or update a parent row: a foreign key constraint fails” error

Thu, 2018-03-08 08:55

As you know, foreign keys establish a sort of relationship between 2 tables. MySQL requires the InnoDB storage engine to support foreign keys.
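
Before looking at the example tables, here is a minimal, generic sketch of how such a relationship is declared (table and constraint names are hypothetical):

CREATE TABLE parent (id INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE child (
  id INT PRIMARY KEY,
  parent_id INT,
  CONSTRAINT fk_child_parent FOREIGN KEY (parent_id)
    REFERENCES parent (id) ON UPDATE CASCADE
) ENGINE=InnoDB;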

In our example, we have the following parent table in a MySQL 5.7.21 server:

mysqld3-(root@localhost) [sakila]> show create table actor\G
*************************** 1. row ***************************
       Table: actor
Create Table: CREATE TABLE `actor` (
  `actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`)
) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

 

and a foreign key is defined on the child table by using the “FOREIGN KEY… REFERENCES” syntax:

mysqld3-(root@localhost) [sakila]> show create table film_actor\G
*************************** 1. row ***************************
       Table: film_actor
Create Table: CREATE TABLE `film_actor` (
  `actor_id` smallint(5) unsigned NOT NULL,
  `film_id` smallint(5) unsigned NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`actor_id`,`film_id`),
  KEY `idx_fk_film_id` (`film_id`),
  CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`actor_id`) ON UPDATE CASCADE,
  CONSTRAINT `fk_film_actor_film` FOREIGN KEY (`film_id`) REFERENCES `film` (`film_id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

 

The “ON UPDATE CASCADE” clause means that if we update values in the parent table (‘actor’ in our example),

mysqld3-(root@localhost) [sakila]> select * from actor where actor_id=1;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|        1 | PENELOPE   | GUINESS   | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)
mysqld3-(root@localhost) [sakila]> select * from film_actor where actor_id=1;
+----------+---------+---------------------+
| actor_id | film_id | last_update         |
+----------+---------+---------------------+
|        1 |       1 | 2006-02-15 05:05:03 |
|        1 |      23 | 2006-02-15 05:05:03 |
|        1 |      25 | 2006-02-15 05:05:03 |
|        1 |     106 | 2006-02-15 05:05:03 |
|        1 |     140 | 2006-02-15 05:05:03 |
|        1 |     166 | 2006-02-15 05:05:03 |
|        1 |     277 | 2006-02-15 05:05:03 |
|        1 |     361 | 2006-02-15 05:05:03 |
|        1 |     438 | 2006-02-15 05:05:03 |
|        1 |     499 | 2006-02-15 05:05:03 |
|        1 |     506 | 2006-02-15 05:05:03 |
|        1 |     509 | 2006-02-15 05:05:03 |
|        1 |     605 | 2006-02-15 05:05:03 |
|        1 |     635 | 2006-02-15 05:05:03 |
|        1 |     749 | 2006-02-15 05:05:03 |
|        1 |     832 | 2006-02-15 05:05:03 |
|        1 |     939 | 2006-02-15 05:05:03 |
|        1 |     970 | 2006-02-15 05:05:03 |
|        1 |     980 | 2006-02-15 05:05:03 |
+----------+---------+---------------------+
19 rows in set (0.00 sec)
mysqld3-(root@localhost) [sakila]> update actor set actor_id=300 where actor_id=1;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

 

then this will cause an automatic update of the matching rows in the child table (‘film_actor’):

mysqld3-(root@localhost) [sakila]> select * from actor where actor_id=1;
Empty set (0.00 sec)
mysqld3-(root@localhost) [sakila]> select * from film_actor where actor_id=1;
Empty set (0.00 sec)
mysqld3-(root@localhost) [sakila]> select * from actor where actor_id=300;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|      300 | PENELOPE   | GUINESS   | 2018-02-07 15:41:45 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)
mysqld3-(root@localhost) [sakila]> select * from film_actor where actor_id=300;
+----------+---------+---------------------+
| actor_id | film_id | last_update         |
+----------+---------+---------------------+
|      300 |       1 | 2006-02-15 05:05:03 |
|      300 |      23 | 2006-02-15 05:05:03 |
|      300 |      25 | 2006-02-15 05:05:03 |
|      300 |     106 | 2006-02-15 05:05:03 |
|      300 |     140 | 2006-02-15 05:05:03 |
|      300 |     166 | 2006-02-15 05:05:03 |
|      300 |     277 | 2006-02-15 05:05:03 |
|      300 |     361 | 2006-02-15 05:05:03 |
|      300 |     438 | 2006-02-15 05:05:03 |
|      300 |     499 | 2006-02-15 05:05:03 |
|      300 |     506 | 2006-02-15 05:05:03 |
|      300 |     509 | 2006-02-15 05:05:03 |
|      300 |     605 | 2006-02-15 05:05:03 |
|      300 |     635 | 2006-02-15 05:05:03 |
|      300 |     749 | 2006-02-15 05:05:03 |
|      300 |     832 | 2006-02-15 05:05:03 |
|      300 |     939 | 2006-02-15 05:05:03 |
|      300 |     970 | 2006-02-15 05:05:03 |
|      300 |     980 | 2006-02-15 05:05:03 |
+----------+---------+---------------------+

 

Sometimes, when we must drop an InnoDB table in MySQL, we could encounter the following error due to foreign keys:

mysqld3-(root@localhost) [sakila]> drop table actor;
ERROR 1217 (23000): Cannot delete or update a parent row: a foreign key constraint fails

 

In this example, the ‘actor’ table is referenced by the ‘film_actor’ one:

mysqld3-(root@localhost) [sakila]> SELECT TABLE_NAME,COLUMN_NAME,CONSTRAINT_NAME, REFERENCED_TABLE_NAME,REFERENCED_COLUMN_NAME
    -> FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
    -> WHERE REFERENCED_TABLE_SCHEMA = 'sakila'
    -> AND REFERENCED_TABLE_NAME = 'actor';
+------------+-------------+---------------------+-----------------------+------------------------+
| TABLE_NAME | COLUMN_NAME | CONSTRAINT_NAME     | REFERENCED_TABLE_NAME | REFERENCED_COLUMN_NAME |
+------------+-------------+---------------------+-----------------------+------------------------+
| film_actor | actor_id    | fk_film_actor_actor | actor                 | actor_id               |
+------------+-------------+---------------------+-----------------------+------------------------+
1 row in set (0.01 sec)

 

This foreign key constraint keeps data consistent across the different tables, and that’s also the reason why we cannot drop the parent table.
We can find the same information and the cause of the error by displaying the state of the InnoDB storage engine through the “SHOW ENGINE INNODB STATUS” command:

mysqld3-(root@localhost) [sakila]> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2018-02-07 15:44:34 0x7fb734174700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 46 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 84 srv_active, 0 srv_shutdown, 85720 srv_idle
srv_master_thread log flush and writes: 85803
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1607
OS WAIT ARRAY INFO: signal count 1552
RW-shared spins 0, rounds 757, OS waits 384
RW-excl spins 0, rounds 342, OS waits 11
RW-sx spins 2, rounds 60, OS waits 1
Spin rounds per wait: 757.00 RW-shared, 342.00 RW-excl, 30.00 RW-sx
------------------------
LATEST FOREIGN KEY ERROR
------------------------
2018-02-07 15:42:45 0x7fb734174700  Cannot drop table `sakila`.`actor`
because it is referenced by `sakila`.`film_actor`
...

 

To avoid these constraint errors during table deletion, there are different solutions:
– Drop the tables in the correct order (child table first, parent table last)
– In case of a loop in the foreign keys, remove this loop and redefine the table structures before dropping the tables
– You can also temporarily set “FOREIGN_KEY_CHECKS=0”, drop the table and then set “FOREIGN_KEY_CHECKS=1” again, as shown below, but I don’t recommend using this method (especially in a production environment!)
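
Only a sketch of this last, discouraged workaround, run in the same session:

SET FOREIGN_KEY_CHECKS=0;
DROP TABLE actor;
SET FOREIGN_KEY_CHECKS=1;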

 

Cet article MySQL – Foreign keys and “Cannot delete or update a parent row: a foreign key constraint fails” error est apparu en premier sur Blog dbi services.

EDB Failover Manager 3.0 and PostgreSQL 10.1

Wed, 2018-03-07 10:31

Edb Failover Manager edb-efm30 is now released and supports PostgreSQL 10.1.
Efm is an EDB tool which can be used to perform a switchover and a failover in a PostgreSQL standby environment.
In this blog we are going to see how we can install and configure efm in a standby environment. We suppose that the standby is already configured and running. We also have not configured any VIP. I just use 3 virtual machines to test. Below is the configuration we are using: 3 servers with Oracle Linux 7.2

pgservertools: 192.168.56.30 which is the witness
pgserver1: 192.168.56.36 which is the primary server
pgserver2: 192.168.56.37 which is the standby
EFM must be installed on all 3 nodes. To install EDB EFM, I used the rpm provided by EnterpriseDB. Note that you will need an EDB account. We will show the installation on only one node, but it is the same for all nodes.

[root@host tmp]# wget http://yum.enterprisedb.com/edbrepos/edb-repo-latest.noarch.rpm
[root@host tmp]# yum localinstall -y edb-repo-latest.noarch.rpm

After that we have to enable the corresponding EDB repository (you will need a login and password):

[root@pgservertools yum.repos.d]# cat /etc/yum.repos.d/edb.repo
[edbas10]
name=EnterpriseDB Advanced Server 10 $releasever - $basearch
baseurl=http://login:password@yum.enterprisedb.com/10/redhat/rhel-$releasever-$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/ENTERPRISEDB-GPG-KEY

And then we can search for the package

[root@pgservertools ~]# yum search efm
Loaded plugins: langpacks, ulninfo
=============================== N/S matched: efm ===============================
edb-efm30.x86_64 : EnterpriseDB Failover Manager
efm20.x86_64 : EnterpriseDB Failover Manager
efm21.x86_64 : EnterpriseDB Failover Manager

And then install it (outputs are truncated):

[root@pgservertools ~]# yum install edb-efm30.x86_64
Loaded plugins: langpacks, ulninfo
Resolving Dependencies
--> Running transaction check
---> Package edb-efm30.x86_64 0:3.0.0-1.rhel7 will be installed
--> Finished Dependency Resolution
…
…
Installed:
  edb-efm30.x86_64 0:3.0.0-1.rhel7
Complete!

efm also requires openjdk, so we installed openjdk 1.8 on all nodes:

root@host tmp]# yum install java-1.8.0-openjdk.x86_64

To manage efm we create a dedicated user in the primary cluster

postgres=# create user efm with login password 'root' superuser;
CREATE ROLE
postgres=#

We have to add the following entries to the pg_hba.conf of all database clusters to allow connections for the efm user.

##for efm
host  postgres    efm  192.168.56.36/32     md5
host  postgres    efm  192.168.56.37/32     md5
host  postgres    efm  192.168.56.30/32     md5

The configuration of efm consists of editing 2 main configuration files: efm.nodes and efm.properties. In my case these files are located in /etc/edb/efm-3.0. There are already two sample files that we can copy and then edit.
First we need to encrypt the password of the efm user, and then we have to configure efm.nodes and efm.properties on all nodes.

[root@pgserver1 efm-3.0]# /usr/edb/efm-3.0/bin/efm encrypt efm
This utility will generate an encrypted password for you to place in your
EFM cluster property file.
Please enter the password and hit enter:
Please enter the password again to confirm:
The encrypted password is: ff7f041651e5e864013c1102d26a5e08
Please paste this into your cluster properties file.
        db.password.encrypted= ff7f041651e5e864013c1102d26a5e08

Below we show the contents of our two configuration files
On pgserver1 which is the primary
efm.nodes

root@pgserver1 101]# cat /etc/edb/efm-3.0/efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.56.30:9998 192.168.56.37:9998
[root@pgserver1 101]#

efm.properties

[root@pgserver1 101]# cat /etc/edb/efm-3.0/efm.properties | grep -v ^#
db.user=efm
db.password.encrypted=ff7f041651e5e864013c1102d26a5e08
db.port=5436
db.database=postgres
db.service.owner=postgres
db.service.name=
db.bin=/u01/app/postgres/product/10/db_1/bin
db.recovery.conf.dir=/u90/mydata/101
jdbc.sslmode=disable
user.email=xxx@xxx.fr
script.notification=
bind.address=192.168.56.36:9998
admin.port=9999
is.witness=false
local.period=10
local.timeout=60
local.timeout.final=10
remote.timeout=10
node.timeout=50
stop.isolated.master=false
pingServerIp=192.168.56.30
pingServerCommand=/bin/ping -q -c3 -w5
auto.allow.hosts=false
db.reuse.connection.count=0
auto.failover=true
auto.reconfigure=true
promotable=true
minimum.standbys=0
recovery.check.period=2
auto.resume.period=0
script.fence=
script.post.promotion=
script.resumed=
script.db.failure=
script.master.isolated=
script.remote.pre.promotion=
script.remote.post.promotion=
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=
sudo.command=sudo
sudo.user.command=sudo -u %u
log.dir=/var/log/efm-3.0
jgroups.loglevel=INFO
efm.loglevel=INFO
jvm.options=-Xmx32m
[root@pgserver1 101]#

On pgserver2 which is the standby
efm.nodes

[root@pgserver2 tmp]# cat /etc/edb/efm-3.0/efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.56.30:9998 192.168.56.36:9998
[root@pgserver2 tmp]#

efm.properties

[root@pgserver2 tmp]# cat /etc/edb/efm-3.0/efm.properties | grep -v ^#
db.user=efm
db.password.encrypted=ff7f041651e5e864013c1102d26a5e08
db.port=5436
db.database=postgres
db.service.owner=postgres
db.service.name=
db.bin=/u01/app/postgres/product/10/db_1/bin
db.recovery.conf.dir=/u90/mydata/101
jdbc.sslmode=disable
user.email=xxx@xxx.fr
script.notification=
bind.address=192.168.56.37:9998
admin.port=9999
is.witness=false
local.period=10
local.timeout=60
local.timeout.final=10
remote.timeout=10
node.timeout=50
stop.isolated.master=false
pingServerIp=192.168.56.30
pingServerCommand=/bin/ping -q -c3 -w5
auto.allow.hosts=true
db.reuse.connection.count=0
auto.failover=true
auto.reconfigure=true
promotable=true
minimum.standbys=0
recovery.check.period=2
auto.resume.period=0
script.fence=
script.post.promotion=
script.resumed=
script.db.failure=
script.master.isolated=
script.remote.pre.promotion=
script.remote.post.promotion=
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=
sudo.command=sudo
sudo.user.command=sudo -u %u
log.dir=/var/log/efm-3.0
jgroups.loglevel=INFO
efm.loglevel=INFO
jvm.options=-Xmx32m
[root@pgserver2 tmp]#

On pgservertools which is the witness
efm.nodes

[root@pgservertools efm-3.0]# cat /etc/edb/efm-3.0/efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.56.36:9998 192.168.56.37:9998
[root@pgservertools efm-3.0]#

efm.properties

[root@pgservertools efm-3.0]# cat /etc/edb/efm-3.0/efm.properties | grep -v ^#
db.user=efm
db.password.encrypted=ff7f041651e5e864013c1102d26a5e08
db.port=5436
db.database=postgres
db.service.owner=postgres
db.service.name=
db.bin=/u01/app/postgres/product/10/db_1/bin
db.recovery.conf.dir=/u90/mydata/101
jdbc.sslmode=disable
user.email=xxx@xxx.fr
script.notification=
bind.address=192.168.56.30:9998
admin.port=9999
is.witness=true
local.period=10
local.timeout=60
local.timeout.final=10
remote.timeout=10
node.timeout=50
stop.isolated.master=false
pingServerIp=192.168.56.30
pingServerCommand=/bin/ping -q -c3 -w5
auto.allow.hosts=false
db.reuse.connection.count=0
auto.failover=true
auto.reconfigure=true
promotable=true
minimum.standbys=0
recovery.check.period=2
auto.resume.period=0
script.fence=
script.post.promotion=
script.resumed=
script.db.failure=
script.master.isolated=
script.remote.pre.promotion=
script.remote.post.promotion=
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=
sudo.command=sudo
sudo.user.command=sudo -u %u
log.dir=/var/log/efm-3.0
jgroups.loglevel=INFO
efm.loglevel=INFO
jvm.options=-Xmx32m
[root@pgservertools efm-3.0]#

Now let’s start efm on all nodes. If there is any error, check the logs in /var/log/efm-3.0/.
I started in the following order: pgserver1, pgserver2 and pgservertools. The service can be configured to start automatically when the server boots, as sketched below.
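
Only a sketch, assuming the systemd unit name shown in the status output further down (to be run on each node):

systemctl enable efm-3.0
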
Below is an example on pgservertools:

[root@pgservertools efm-3.0]# service efm-3.0 start
.
[root@pgservertools efm-3.0]# service efm-3.0 status
Redirecting to /bin/systemctl status  efm-3.0.service
● efm-3.0.service - EnterpriseDB Failover Manager 3.0
   Loaded: loaded (/usr/lib/systemd/system/efm-3.0.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-03-06 15:58:51 CET; 1h 46min ago
  Process: 22260 ExecStart=/bin/bash -c /usr/edb/efm-3.0/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
 Main PID: 22321 (java)
   CGroup: /system.slice/efm-3.0.service
           └─22321 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/bin/java -cp /usr/edb/efm-3.0/lib/EFM-3.0.0.jar -Xmx32m com.enterprisedb.efm.main.ServiceCommand __int_start /e...
Mar 06 15:58:45 pgservertools.localdomain systemd[1]: Starting EnterpriseDB Failover Manager 3.0...
Mar 06 15:58:51 pgservertools.localdomain systemd[1]: Started EnterpriseDB Failover Manager 3.0.
[root@pgservertools efm-3.0]#

Once started, we can verify the status of our cluster from any node:

[root@pgservertools efm-3.0]# /usr/edb/efm-3.0/bin/efm cluster-status efm
Cluster Status: efm
VIP:
        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Witness     192.168.56.30        UP     N/A
        Master      192.168.56.36        UP     UP
        Standby     192.168.56.37        UP     UP
Allowed node host list:
        192.168.56.36 192.168.56.37 192.168.56.30
Membership coordinator: 192.168.56.30
Standby priority host list:
        192.168.56.37
Promote Status:
        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Master      192.168.56.36        0/430001A8
        Standby     192.168.56.37        0/430001A8
        Standby database(s) in sync with master. It is safe to promote.
[root@pgservertools efm-3.0]#

Now that everything is ok, let’s do a switchover

[root@pgservertools efm-3.0]# /usr/edb/efm-3.0/bin/efm promote efm -switchover
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original master. Run the 'cluster-status' command for information about the new cluster state.

And if we run the cluster-status command during the switchover

[root@pgservertools efm-3.0]# /usr/edb/efm-3.0/bin/efm cluster-status efm
Cluster Status: efm
VIP:
        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Witness     192.168.56.30        UP     N/A
        Idle        192.168.56.36        UP     UNKNOWN
        Standby     192.168.56.37        UP     UP
Allowed node host list:
        192.168.56.36 192.168.56.37 192.168.56.30
Membership coordinator: 192.168.56.30
Standby priority host list:
        192.168.56.37
Promote Status:
        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Standby     192.168.56.37        0/44000098
        No master database was found.
Idle Node Status (idle nodes ignored in XLog location comparisons):
        Address              XLog Loc         Info
        --------------------------------------------------------------
        192.168.56.36        UNKNOWN          Connection to 192.168.56.36:5436 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

And we can see the promoting phase

[root@pgservertools efm-3.0]# /usr/edb/efm-3.0/bin/efm cluster-status efm
Cluster Status: efm
VIP:
        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Witness     192.168.56.30        UP     N/A
        Idle        192.168.56.36        UP     UNKNOWN
        Promoting   192.168.56.37        UP     UP
Allowed node host list:
        192.168.56.36 192.168.56.37 192.168.56.30
Membership coordinator: 192.168.56.30
Standby priority host list:
        (List is empty.)
Promote Status:
        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Master      192.168.56.37        0/44000170
        No standby databases were found.
Idle Node Status (idle nodes ignored in XLog location comparisons):
        Address              XLog Loc         Info
        --------------------------------------------------------------
        192.168.56.36        UNKNOWN          Connection to 192.168.56.36:5436 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

And after a short time we can see that the new master is on pgserver2:

[root@pgservertools efm-3.0]# /usr/edb/efm-3.0/bin/efm cluster-status efm
Cluster Status: efm
VIP:
        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Witness     192.168.56.30        UP     N/A
        Standby     192.168.56.36        UP     UP
        Master      192.168.56.37        UP     UP
Allowed node host list:
        192.168.56.36 192.168.56.37 192.168.56.30
Membership coordinator: 192.168.56.30
Standby priority host list:
        192.168.56.36
Promote Status:
        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Master      192.168.56.37        0/440001A8
        Standby     192.168.56.36        0/440001A8
        Standby database(s) in sync with master. It is safe to promote.
[root@pgservertools efm-3.0]#

The purpose of the witness is to do an automatic failover when the primary is down. Let’s simulate a crash of our primary database by killing the corresponding process.

[root@pgserver2 tmp]# ps -ef | grep postgres
...
...
root     17529 14103  0 16:45 pts/1    00:00:00 tail -f /u90/mydata/101/log/postgresql-2018-03-06.log
postgres 20612     1  0 17:56 ?        00:00:00 /u01/app/postgres/product/10/db_1/bin/postgres -D /u90/mydata/101
postgres 20613 20612  0 17:56 ?        00:00:00 postgres: logger process
postgres 20615 20612  0 17:56 ?        00:00:00 postgres: checkpointer process
postgres 20616 20612  0 17:56 ?        00:00:00 postgres: writer process
postgres 20617 20612  0 17:56 ?        00:00:00 postgres: stats collector process
postgres 20819 20612  0 18:00 ?        00:00:00 postgres: wal writer process
postgres 20820 20612  0 18:00 ?        00:00:00 postgres: autovacuum launcher process
postgres 20821 20612  0 18:00 ?        00:00:00 postgres: archiver process   last was 00000008.history
postgres 20822 20612  0 18:00 ?        00:00:00 postgres: bgworker: logical replication launcher
postgres 20832 20612  0 18:00 ?        00:00:00 postgres: wal sender process repliuser 192.168.56.36(45827) streaming 0/440001A8
root     21481 16868  0 18:16 pts/0    00:00:00 grep --color=auto postgres
[root@pgserver2 tmp]#

And let’s execute the kill command

[root@pgserver2 tmp]# kill -9 20612

If we check the cluster status from the witness server, we can see that the master is in an UNKNOWN status

[root@pgservertools efm-3.0]# /usr/edb/efm-3.0/bin/efm cluster-status efm
Cluster Status: efm
VIP:
        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Witness     192.168.56.30        UP     N/A
        Standby     192.168.56.36        UP     UP
        Idle        192.168.56.37        UP     UNKNOWN
Allowed node host list:
        192.168.56.36 192.168.56.37 192.168.56.30
Membership coordinator: 192.168.56.30
Standby priority host list:
        192.168.56.36
Promote Status:
        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Standby     192.168.56.36        0/440001A8
        No master database was found.
Idle Node Status (idle nodes ignored in XLog location comparisons):
        Address              XLog Loc         Info
        --------------------------------------------------------------
        192.168.56.37        UNKNOWN          Connection to 192.168.56.37:5436 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
[root@pgservertools efm-3.0]#

In the alert log on our standby server pgserver1, we can see that the database is converted to a primary one a short time later.

2018-03-06 18:17:49.381 CET [18384] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
2018-03-06 18:17:49.382 CET [18380] LOG:  invalid record length at 0/440001A8: wanted 24, got 0
2018-03-06 18:17:49.387 CET [19049] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "192.168.56.37" and accepting
                TCP/IP connections on port 5436?
2018-03-06 18:17:54.404 CET [19055] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "192.168.56.37" and accepting
                TCP/IP connections on port 5436?
2018-03-06 18:17:59.406 CET [19107] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "192.168.56.37" and accepting
…
….
….
                TCP/IP connections on port 5436?
                TCP/IP connections on port 5436?
2018-03-06 18:18:34.450 CET [19128] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "192.168.56.37" and accepting
                TCP/IP connections on port 5436?
2018-03-06 18:18:39.451 CET [19134] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "192.168.56.37" and accepting
                TCP/IP connections on port 5436?
2018-03-06 18:18:44.462 CET [19135] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "192.168.56.37" and accepting
                TCP/IP connections on port 5436?
2018-03-06 18:18:49.456 CET [18380] LOG:  trigger file found: /tmp/postgresql.trigger
2018-03-06 18:18:49.456 CET [18380] LOG:  redo done at 0/44000170
2018-03-06 18:18:49.479 CET [18380] LOG:  selected new timeline ID: 9
2018-03-06 18:18:50.128 CET [18380] LOG:  archive recovery complete
2018-03-06 18:18:50.229 CET [18378] LOG:  database system is ready to accept connections

We can confirm this by querying the cluster status:

[root@pgservertools efm-3.0]# /usr/edb/efm-3.0/bin/efm cluster-status efm
Cluster Status: efm
VIP:
        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Witness     192.168.56.30        UP     N/A
        Master      192.168.56.36        UP     UP
        Idle        192.168.56.37        UP     UNKNOWN
Allowed node host list:
        192.168.56.36 192.168.56.37 192.168.56.30
Membership coordinator: 192.168.56.30
Standby priority host list:
        (List is empty.)
Promote Status:
        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Master      192.168.56.36        0/440002B8
        No standby databases were found.
Idle Node Status (idle nodes ignored in XLog location comparisons):
        Address              XLog Loc         Info
        --------------------------------------------------------------
        192.168.56.37        UNKNOWN          Connection to 192.168.56.37:5436 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
[root@pgservertools efm-3.0]#

and

[root@pgservertools efm-3.0]# /usr/edb/efm-3.0/bin/efm cluster-status efm
Cluster Status: efm
VIP:
        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Witness     192.168.56.30        UP     N/A
        Master      192.168.56.36        UP     UP
Allowed node host list:
        192.168.56.36 192.168.56.37 192.168.56.30
Membership coordinator: 192.168.56.30
Standby priority host list:
        (List is empty.)
Promote Status:
        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Master      192.168.56.36        0/440002B8
        No standby databases were found.
[root@pgservertools efm-3.0]#

On the old primary pgserver2, we can see the contents of the recovery.conf file automatically created by EDB Failover Manager:

[postgres@pgserver2 101]$ cat recovery.conf
# EDB Failover Manager
# This generated recovery.conf file prevents the db server from accidentally
# being restarted as a master since a failover or promotion has occurred
standby_mode = on
restore_command = 'echo 2>"recovery suspended on failed server node"; exit 1'
[postgres@pgserver2 101]$

To rebuild our standby database, we have to edit this recovery.conf file so that the old primary can follow the new primary (a minimal sketch is shown below).
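
As a minimal sketch, assuming the new primary is pgserver1 (192.168.56.36, port 5436) and that a replication user named repuser exists (the user and application_name are assumptions), the edited recovery.conf could look like this; depending on how far the old primary diverged, a pg_rewind or a fresh base backup may be needed before restarting it as a standby:

standby_mode = 'on'
primary_conninfo = 'host=192.168.56.36 port=5436 user=repuser application_name=pgserver2'
recovery_target_timeline = 'latest'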
Conclusion
We have seen in this blog how to configure edb-efm30. Note that a virtual IP can also be configured. The official documentation can help.

 

Cet article EDB Failover Manager 3.0 and PostgreSQL 10.1 est apparu en premier sur Blog dbi services.

ODA – Reducing number of cores : Getting error DCS-10045

Sun, 2018-03-04 13:27

To perform some tests in order to determine the accurate number of cores to be used by the database application, I had to increase and decrease the number of CPU cores on an ODA using the DCS stack. While increasing is never a problem, decreasing the number of CPU cores fails with a DCS-10045 error.

The ODA was initialized with 6 CPU cores, and the purpose was to reduce it to 4 CPU cores (of course for testing ;-) ).

[root@BApdl006060 ~]# odacli list-cpucores

Node  Cores  Modified                       Job Status
----- ------ ------------------------------ ---------------
0     36     February 23, 2018 12:22:12 AM CET Configured
0     6      February 26, 2018 11:41:06 AM CET Configured

 

I ran the command below to reduce the number of CPU cores:

[root@BApdl006060 ~]# odacli update-cpucore -c 4

 

Unfortunately this failed with the following error:

DCS-10045:Validation error encountered: reduction in number of cores is not supported

 

In this case, the tip is to use the “force” option.

[root@BApdl006060 ~]# odacli update-cpucore -c 4 --force
{
  "jobId" : "5d343d01-6eb1-47f1-af2b-7df3f8adab84",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "February 28, 2018 08:49:21 AM CET",
  "resourceList" : [ ],
  "description" : "CPU cores  service update",
  "updatedTime" : "February 28, 2018 08:49:21 AM CET"
}

 

Once the job is completed successfully, the ODA is running the expected 4 CPU cores.

[root@BApdl006060 ~]# odacli list-cpucores

Node  Cores  Modified                       Job Status
----- ------ ------------------------------ ---------------
0     36     February 23, 2018 12:22:12 AM CET Configured
0     6      February 26, 2018 11:41:06 AM CET Configured
0     4      February 28, 2018 8:49:29 AM CET Configured

[root@BApdl006060 ~]# odacli describe-cpucore

Node  Cores  Modified                       Job Status
----- ------ ------------------------------ ---------------
0     4      February 28, 2018 8:49:29 AM CET Configured

 

Of course, knowing that reducing the number of CPU cores is not supported, the recommendation is to use the force option only in test environments and not in production.

 

 

 

Cet article ODA – Reducing number of cores : Getting error DCS-10045 est apparu en premier sur Blog dbi services.

18c new Lost Write Protection

Sat, 2018-03-03 16:26

There are many layers between the Oracle Database pwrite() calls and the physical sector written on disk: filesystem, logical volume, SAN or NAS, with a lot of smart software running for Virtualisation, Compression, Snapshotting, Synchronisation… Are you sure that the changes you made to your data are actually persisted on disk, completely and without any corruption? In case of a bug or crash in the storage layer, it may happen that only part of the changes was written. In the case of a crash, Oracle ensures that the datafile headers are written at the end, so that recovery can kick in after the crash. Then, a partially written block can be detected and restored. With the different checksum settings, you can also check block integrity while writing or reading. But that protects only against fractured blocks. What if a block write simply did not occur? An old version of the block remains, and it is then perfectly correct for checksums, RMAN, and DBV.

You may be 100% sure that you have never experienced lost writes. But then I’ll ask you: how do you know? You don’t. Except if you enable Lost Write Protection.

In 11g Oracle introduced this feature for Data Guard configurations. Data Guard is the best protection as the synchronization is done at the highest level: the change vector, generated before any I/O and block modification occurred. Do not use SAN synchronization for your database. Data Guard is less expensive (no option needed), more efficient (only the persistent change information is shipped), and protects over all layers. It protects from lost writes because blocks are written on both sites by a different server, instance, storage. And Data Guard can detect lost writes by shipping the block SCN for each read to compare it with the standby.

However, this has an overhead: redo generation for reads. Oracle 18c comes with a new solution with no need for a standby database: a new LOST WRITE PROTECTION tablespace is created to store the SCN of each block modified.

Do you have lost writes?

First I’ll show what happens without this feature. I create a table filled with number, timestamp and SCN:

SQL> create table DEMO.DEMO pctfree 99 as select rownum id,1 n, current_timestamp ts , (select current_scn from v$database) scn from xmltable('1 to 10');
Table DEMO.DEMO created.
 
SQL> select owner,segment_name,segment_type,block_id,blocks,sum(blocks)over(partition by owner,segment_name,segment_type) from dba_extents where owner='DEMO' and segment_name='DEMO' order by 1,2,3,4;
 
OWNER SEGMENT_NAME SEGMENT_TYPE BLOCK_ID BLOCKS SUM(BLOCKS)OVER(PARTITIONBYOWNER,SEGMENT_NAME,SEGMENT_TYPE)
----- ------------ ------------ -------- ------ -----------------------------------------------------------
DEMO DEMO TABLE 3128 8 8
 
SQL> column block_id new_value last_block_id
SQL> select dbms_rowid.rowid_block_number(rowid) block_id,DEMO.* from DEMO.DEMO;
 
BLOCK_ID ID N TS SCN
-------- -- - -- ---
3131 1 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3131 2 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3132 3 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3132 4 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3133 5 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3133 6 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3134 7 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3134 8 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3135 9 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3135 10 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
 
SQL> column block_id clear

I’ve recorded the last BLOCK_ID in &last_block_id and ensured that all those modifications are written to disk:

SQL> alter system checkpoint;
System CHECKPOINT altered.
 
SQL> alter system flush buffer_cache;
System FLUSH altered.

I save the block 3133 to keep an old version of it. This will be my way to simulate a lost write.

SQL> host dd if=/u01/oradata/CDB1/PDB1/users01.dbf of=/var/tmp/lwp.blk skip=$(( &last_block_id - 2 )) bs=8k count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000263416 s, 31.1 MB/s

Now, I update all rows: new timestamp, new SCN, and the number changed from 1 to 3.

SQL> update DEMO.DEMO set n=2+1, ts=current_timestamp, scn=(select current_scn from v$database);
10 rows updated.
SQL> commit;
Commit complete.
 
SQL> alter system checkpoint;
System CHECKPOINT altered.
SQL> alter system flush buffer_cache;
System FLUSH altered.

Here is how I reproduce lost writes. All blocks changed were written to disk, but I restore the old version of block 3133 as if this one was not written:

SQL> host dd if=/var/tmp/lwp.blk of=/u01/oradata/CDB1/PDB1/users01.dbf seek=$(( &last_block_id - 2 )) bs=8k count=1 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000111582 s, 73.4 MB/s

This is what you can see if one of your storage layers missed a write and nevertheless acknowledged the I/O call.
SQL> select dbms_rowid.rowid_block_number(rowid) block_id,DEMO.* from DEMO.DEMO;
BLOCK_ID ID N TS SCN
-------- -- - --------------------------------------------- -------
3131 1 3 03-MAR-18 04.51.39.255582000 PM EUROPE/ZURICH 4619806
3131 2 3 03-MAR-18 04.51.39.255582000 PM EUROPE/ZURICH 4619806
3132 3 3 03-MAR-18 04.51.39.255582000 PM EUROPE/ZURICH 4619806
3132 4 3 03-MAR-18 04.51.39.255582000 PM EUROPE/ZURICH 4619806
3133 5 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3133 6 1 03-MAR-18 04.51.37.838991000 PM EUROPE/ZURICH 4619734
3134 7 3 03-MAR-18 04.51.39.255582000 PM EUROPE/ZURICH 4619806
3134 8 3 03-MAR-18 04.51.39.255582000 PM EUROPE/ZURICH 4619806
3135 9 3 03-MAR-18 04.51.39.255582000 PM EUROPE/ZURICH 4619806
3135 10 3 03-MAR-18 04.51.39.255582000 PM EUROPE/ZURICH 4619806

No errors. No corruption. Just old values for the rows in this block. And there’s no way to detect it. Physical reads, RMAN, and DBV will all see a perfectly correct block. Only if those changes are inconsistent with other tables (through constraints) or indexes may you detect a logical corruption.

The probability that this problem occurs is very low (a storage bug on exactly one 8k block or a multiple of it). But it is critical because it cannot be detected. To detect it, you need to compare the full block, or a checksum, or simply the SCN, with another copy, such as the one in the standby database. Or, with this new feature, store the SCN of each data block in a new structure: the 18c LOST WRITE DETECTION.

Enabling Lost Write Protection

You need to create a tablespace to store those SCNs. You have no choice here: it must be a bigfile tablespace, but you can create multiple small ones if you want.

SQL> create bigfile tablespace SHADOW
2 datafile '/u01/oradata/CDB1/PDB1/shadow.dbf'
3 size 5M
4 lost write protection;
 
Tablespace SHADOW created.
 
SQL> select tablespace_name,status,bigfile,contents,logging,allocation_type,encrypted,lost_write_protect,chunk_tablespace from dba_tablespaces;
TABLESPACE_NAME STATUS BIGFILE CONTENTS LOGGING ALLOCATION_TYPE ENCRYPTED LOST_WRITE_PROTECT CHUNK_TABLESPACE
--------------- ------ ------- -------- ------- --------------- --------- ------------------ ----------------
SYSTEM ONLINE NO PERMANENT LOGGING SYSTEM NO OFF N
SYSAUX ONLINE NO PERMANENT LOGGING SYSTEM NO OFF N
UNDOTBS1 ONLINE NO UNDO LOGGING SYSTEM NO OFF N
TEMP ONLINE NO TEMPORARY NOLOGGING UNIFORM NO OFF N
SHADOW ONLINE YES LOST WRITE PROTECTION LOGGING SYSTEM NO OFF N
USERS ONLINE NO PERMANENT LOGGING SYSTEM NO OFF N
 
SQL> select * from dba_data_files;
 
FILE_NAME FILE_ID TABLESPACE_NAME BYTES BLOCKS STATUS RELATIVE_FNO AUTOEXTENSIBLE MAXBYTES MAXBLOCKS INCREMENT_BY USER_BYTES USER_BLOCKS ONLINE_STATUS LOST_WRITE_PROTECT
--------- ------- --------------- ----- ------ ------ ------------ -------------- -------- --------- ------------ ---------- ----------- ------------- ------------------
/u01/oradata/CDB1/PDB1/undotbs01.dbf 164 UNDOTBS1 104857600 12800 AVAILABLE 9 YES 34359721984 4194302 640 103809024 12672 ONLINE OFF
/u01/oradata/CDB1/PDB1/sysaux01.dbf 163 SYSAUX 408944640 49920 AVAILABLE 4 YES 34359721984 4194302 1280 407896064 49792 ONLINE OFF
/u01/oradata/CDB1/PDB1/system01.dbf 162 SYSTEM 272629760 33280 AVAILABLE 1 YES 34359721984 4194302 1280 271581184 33152 SYSTEM OFF
/u01/oradata/CDB1/PDB1/users01.dbf 169 USERS 104857600 12800 AVAILABLE 169 NO 0 0 0 103809024 12672 ONLINE OFF
/u01/oradata/CDB1/PDB1/shadow.dbf 58 SHADOW 5242880 640 AVAILABLE 1024 NO 0 0 0 4194304 512 ONLINE OFF

Then you enable this feature for the database (or pluggable database) and for the tablespaces you want to protect:

SQL> alter pluggable database enable lost write protection;
Pluggable database ENABLE altered.
 
SQL> alter tablespace USERS enable lost write protection;
Tablespace USERS altered.

Here are the new columns in DBA_TABLESPACES and DBA_DATA_FILES:

SQL> select tablespace_name,status,bigfile,contents,logging,allocation_type,encrypted,lost_write_protect,chunk_tablespace from dba_tablespaces;
 
TABLESPACE_NAME STATUS BIGFILE CONTENTS LOGGING ALLOCATION_TYPE ENCRYPTED LOST_WRITE_PROTECT CHUNK_TABLESPACE
--------------- ------ ------- -------- ------- --------------- --------- ------------------ ----------------
SYSTEM ONLINE NO PERMANENT LOGGING SYSTEM NO OFF N
SYSAUX ONLINE NO PERMANENT LOGGING SYSTEM NO OFF N
UNDOTBS1 ONLINE NO UNDO LOGGING SYSTEM NO OFF N
TEMP ONLINE NO TEMPORARY NOLOGGING UNIFORM NO OFF N
SHADOW ONLINE YES LOST WRITE PROTECTION LOGGING SYSTEM NO OFF N
USERS ONLINE NO PERMANENT LOGGING SYSTEM NO ENABLED N
 
SQL> select * from dba_data_files;
 
FILE_NAME FILE_ID TABLESPACE_NAME BYTES BLOCKS STATUS RELATIVE_FNO AUTOEXTENSIBLE MAXBYTES MAXBLOCKS INCREMENT_BY USER_BYTES USER_BLOCKS ONLINE_STATUS LOST_WRITE_PROTECT
--------- ------- --------------- ----- ------ ------ ------------ -------------- -------- --------- ------------ ---------- ----------- ------------- ------------------
/u01/oradata/CDB1/PDB1/undotbs01.dbf 164 UNDOTBS1 104857600 12800 AVAILABLE 9 YES 34359721984 4194302 640 103809024 12672 ONLINE OFF
/u01/oradata/CDB1/PDB1/sysaux01.dbf 163 SYSAUX 408944640 49920 AVAILABLE 4 YES 34359721984 4194302 1280 407896064 49792 ONLINE OFF
/u01/oradata/CDB1/PDB1/system01.dbf 162 SYSTEM 272629760 33280 AVAILABLE 1 YES 34359721984 4194302 1280 271581184 33152 SYSTEM OFF
/u01/oradata/CDB1/PDB1/users01.dbf 169 USERS 104857600 12800 AVAILABLE 169 NO 0 0 0 103809024 12672 ONLINE ENABLED
/u01/oradata/CDB1/PDB1/shadow.dbf 58 SHADOW 5242880 640 AVAILABLE 1024 NO 0 0 0 4194304 512 ONLINE OFF

Note that we are on the Oracle Cloud here and all tablespaces must be encrypted. This is also the case with the LOST WRITE PROTECTION one. Here you can see ENCRYPTION at none only because I decrypted them to look at what is inside the files.

Here are some internal tables giving some information about the storage. Note that tablespace number TSID=7 here is USERS, the one protected, and the TSID=6 one is the LOST WRITE PROTECTION one.

SQL> select * from new_lost_write_datafiles$;
 
DATAFILE_TSID_TRACKED DATAFILE_RFN_TRACKED SHADOW_DATAFILE_TSID SHADOW_DATAFILE_RFN SHADOW_DATAFILE_OFFSET NUMBER_OF_BLOCKS_ALLOCATED DATAFILE_CURRENT_STATUS
--------------------- -------------------- -------------------- ------------------- ---------------------- -------------------------- -----------------------
7 169 6 1024 128 184 enabled
 
SQL> select * from new_lost_write_extents$;
 
EXTENT_DATAFILE_RFN EXTENT_DATAFILE_TSID EXTENT_START EXTENT_LENGTH_BLOCKS_2K EXTENT_NEXT_BLOCK
------------------- -------------------- ------------ ----------------------- -----------------
1024 6 312 1312 641
 
SQL> select * from new_lost_write_shadows$;
 
SHADOW_DATAFILE_RFN SHADOW_DATAFILE_TSID SHADOW_NUMBER_BLOCKS_ALLOC SHADOW_FIRST_FREE_BLOCK SHADOW_BLOCK_SIZE_BYTES SHADOW_RECS_PER_BLOCK
------------------- -------------------- -------------------------- ----------------------- ----------------------- ---------------------
1024 6 640 128 8192 136

Because there was already a ‘lost write’ feature, this one is called ‘new lost write’.

NEW_LOST_WRITE_DATAFILES$ lists all protected (aka tracked) datafiles with the LOST WRITE PROTECTION (aka shadow) tablespace protecting them. The status can be ‘enabled’ or ‘suspended’. The row is deleted if the protection is removed.

NEW_LOST_WRITE_SHADOWS$ lists the LOST WRITE PROTECTION tablespaces. It contains a 1MB bitmap in the first 128 blocks, and extents starts after this.

NEW_LOST_WRITE_EXTENTS$ maps the free extents in the shadow tablespace. Here, the lost write protection for my USERS tablespace (100MB) takes 312 – 128 = 184 blocks (1.4 MB). The extent is 4MB. The 1312 EXTENT_LENGTH_BLOCKS_2K are the remaining free space in the extent, in 2KB blocks. 4MB-1.4MB=2.6MB which is 1312 x 2k blocks.
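
As a quick check of this arithmetic (a sketch run against DUAL, with the 8k database block size used here):

select 312 - 128                        as tracking_blocks_8k,   -- 184 blocks used to track USERS
       (312 - 128) * 8 / 1024           as tracking_mb,          -- 1.4375 MB, the "1.4 MB" above
       (4 * 1024 - (312 - 128) * 8) / 2 as remaining_2k_blocks   -- 1312, matching EXTENT_LENGTH_BLOCKS_2K
from dual;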

Simulate Lost Write

As I did before, I create a DEMO table

SQL> create table DEMO.DEMO pctfree 99 as select rownum id,1 n, current_timestamp ts , (select current_scn from v$database) scn from xmltable('1 to 10');
Table DEMO.DEMO created.
 
SQL> column block_id new_value last_block_id
SQL> select dbms_rowid.rowid_block_number(rowid) block_id,DEMO.* from DEMO.DEMO;
BLOCK_ID ID N TS SCN
-------- -- - --------------------------------------------- -------
3123 1 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079
3123 2 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079
3124 3 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079
3124 4 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079
3125 5 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079
3125 6 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079
3126 7 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079
3126 8 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079
3127 9 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079
3127 10 1 03-MAR-18 04.21.00.797975000 PM EUROPE/ZURICH 4578079

I save one block:

SQL> column block_id clear
SQL> alter system checkpoint;
System CHECKPOINT altered.
SQL> alter system flush buffer_cache;
System FLUSH altered.
SQL> host dd if=/u01/oradata/CDB1/PDB1/users01.dbf of=/var/tmp/lwp.blk skip=$(( &last_block_id - 2 )) bs=8k count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000325309 s, 25.2 MB/s

Update the table:

SQL> update DEMO.DEMO set n=2+1, ts=current_timestamp, scn=(select current_scn from v$database);
10 rows updated.
SQL> commit;
Commit complete.
SQL> alter system checkpoint;
System CHECKPOINT altered.
SQL> alter system flush buffer_cache;
System FLUSH altered.

Write back the block I saved, to simulate a lost write:

SQL> host dd if=/var/tmp/lwp.blk of=/u01/oradata/CDB1/PDB1/users01.dbf seek=$(( &last_block_id - 2 )) bs=8k count=1 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000104103 s, 78.7 MB/s

Lost Write detection

Now, when I query the table, the lost write is detected and an error is raised:

SQL> select dbms_rowid.rowid_block_number(rowid) block_id,DEMO.* from DEMO.DEMO;
 
Error starting at line : 93 File @ /media/sf_share/18c/lost_write_protection.sql
In command -
select dbms_rowid.rowid_block_number(rowid) block_id,DEMO.* from DEMO.DEMO
Error report -
ORA-65478: shadow lost write protection - found lost write

In the alert.log I can see the block that failed:

2018-03-03 16:21:06.842000 +01:00
ERROR - I/O type:buffered I/O found lost write in block with file#:169 rdba:0x2a400c35, Expected SCN:0x000000000045db54 SCN in block:0x000000000045db23, approx current SCN:0x000000000045dbbb, RAC instance:1 pdb:5
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************

The block defined by the RDBA in alert.log is the one I have manually corrupted:

SQL> select dbms_utility.data_block_address_file(to_number('2a400c35','XXXXXXXX'))file#,dbms_utility.data_block_address_block( to_number('2a400c35','XXXXXXXX'))block# from dual;
 
FILE# BLOCK#
---------- ----------
169 3125

As mentioned in the alert.log the session has dumped the redo, as found in the session trace file:

ALTER SYSTEM DUMP REDO DBA MIN 169 3125 DBA MAX 169 3125 SCN MIN 4578132 SCN MAX 4578235 CON_ID 5

This SCN 4578132 is the commit SCN for my update. And the 4578235 is the current one. I can see the change that was lost here:

CHANGE #10 CON_ID:5 TYP:0 CLS:1 AFN:169 DBA:0x2a400c35 OBJ:73527 SCN:0x000000000045db23 SEQ:2 OP:11.4 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x01 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F xid: 0x0008.019.000002d6 uba: 0x02401257.01a2.24
KDO Op code: LKR row dependencies Disabled
xtype: XA flags: 0x00000000 bdba: 0x2a400c35 hdba: 0x2a400c32
itli: 2 ispac: 0 maxfr: 4858
tabn: 0 slot: 0 to: 2
CHANGE #11 CON_ID:5 TYP:0 CLS:1 AFN:169 DBA:0x2a400c35 OBJ:73527 SCN:0x000000000045db54 SEQ:1 OP:11.5 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x02 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C uba: 0x02401257.01a2.25
KDO Op code: URP row dependencies Disabled
xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x2a400c35 hdba: 0x2a400c32
itli: 2 ispac: 0 maxfr: 4858
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 2 ckix: 0
ncol: 4 nnew: 3 size: 0
Vector content:
col 1: [ 2] c1 04
col 2: [13] 78 76 03 03 10 16 02 0d 73 cd 48 86 58
col 3: [ 5] c4 05 3a 52 20

Then we need to recover…

However, unfortunately, this block is not marked as corrupt:

SQL> select * from V$DATABASE_BLOCK_CORRUPTION;
no rows selected

This means that I cannot use RMAN block recovery:

SQL> host rman target / <<
Starting recover at 03-MAR-18
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=152 device type=DISK
allocated channel: ORA_DISK_2
 
starting media recovery
media recovery complete, elapsed time: 00:00:00
 
Finished recover at 03-MAR-18

And the RMAN recovery advisor is not aware of the problem:

RMAN> list failure;
 
using target database control file instead of recovery catalog
Database Role: PRIMARY
 
no failures found that match specification

So the solution is to mark it as corrupt or restore the whole datafile (or a section of it). And we should probably first make sure that the write was lost on the data block, and not on the lost write tracking block itself; the redo dump may help for that. A generic sketch of the datafile restore follows.
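
As a sketch of the generic datafile restore and recover with RMAN, assuming a valid backup exists and that we are connected to the right container for file#169 (users01.dbf, as reported in the alert.log); nothing here is specific to lost write protection:

run {
  sql 'alter database datafile 169 offline';
  restore datafile 169;
  recover datafile 169;
  sql 'alter database datafile 169 online';
}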

strace: additional I/Os

I traced the server process to see what files are read during my query:

SQL> column spid new_value pid
SQL> select spid from v$process join v$session on v$session.paddr=v$process.addr where sid=sys_context('userenv','sid');
SPID
9360
SQL> column spid clear
SQL> define bg=&:
SQL> host strace -p &pid -o strace.txt &bg

This small awk script filters the I/O calls on users01.dbf and shadow.dbf and displays the system calls on the corresponding file handles:

awk '/open[(]["].*(shadow.dbf|users01.dbf).*["],/{l=$0;gsub(/[(,)]/," ");h[$NF]=$2" "$NF;print $l;$0=$l}/^[a-zA-Z0-9]+[(][0-9]+[,)]/{l=$0;gsub(/[(,)]/," "); if ( $1 == "close" ) h[$2]="" ; if ( h[$2]!="" ) printf "%-130s \t%80s\n",l,h[$2]}/F_DUPFD/{if ( h[$2]!="" ) h[$NF]=h[$2]" "$NF;h[$2]=""}' strace.txt | grep --color=auto -E "^|^.*users01.*"

Here are the open() and pread() calls:

open "/u01/oradata/CDB1/PDB1/shadow.dbf" O_RDWR|O_DSYNC = 8
fcntl(8, F_SETFD, FD_CLOEXEC) = 0 "/u01/oradata/CDB1/PDB1/shadow.dbf" 8
fcntl(8, F_DUPFD, 256) = 256 "/u01/oradata/CDB1/PDB1/shadow.dbf" 8
fcntl(256, F_SETFD, FD_CLOEXEC) = 0 "/u01/oradata/CDB1/PDB1/shadow.dbf" 8 256
pread64(256, "X\242\226V\333E\1\f\372\366"..., 8192, 1056768) = 8192 "/u01/oradata/CDB1/PDB1/shadow.dbf" 8 256
open "/u01/oradata/CDB1/PDB1/users01.dbf" O_RDWR|O_DSYNC = 8
fcntl(8, F_SETFD, FD_CLOEXEC) = 0 "/u01/oradata/CDB1/PDB1/users01.dbf" 8
fcntl(8, F_DUPFD, 256) = 257 "/u01/oradata/CDB1/PDB1/users01.dbf" 8
fcntl(257, F_SETFD, FD_CLOEXEC) = 0 "/u01/oradata/CDB1/PDB1/users01.dbf" 8 257
pread64(257, "#\242002\f@*%\333E\1\4\370\264"..., 8192, 25575424) = 8192 "/u01/oradata/CDB1/PDB1/users01.dbf" 8 257
pread64(257, "\6\242003\f@*T\333E\5\6\317H\1007\37\1!\333E"..., 40960, 25583616) = 40960 "/u01/oradata/CDB1/PDB1/users01.dbf" 8 257
pread64(257, "\6\242005\f@*#\333E\2\4zo\1007\37\1!\333E"..., 8192, 25600000) = 8192 "/u01/oradata/CDB1/PDB1/users01.dbf" 8 257
pread64(257, "\6\242003\f@*T\333E\5\6\317H\1007\37\1!\333E"..., 40960, 25583616) = 40960 "/u01/oradata/CDB1/PDB1/users01.dbf" 8 257
pread64(257, "\6\242005\f@*#\333E\2\4zo\1007\37\1!\333E"..., 8192, 25600000) = 8192 "/u01/oradata/CDB1/PDB1/users01.dbf" 8 257
pread64(257, "\6\242003\f@*T\333E\5\6\317H\1007\37\1!\333E"..., 40960, 25583616) = 40960 "/u01/oradata/CDB1/PDB1/users01.dbf" 8 257
pread64(257, "\6\242005\f@*#\333E\2\4zo\1007\37\1!\333E"..., 8192, 25600000) = 8192 "/u01/oradata/CDB1/PDB1/users01.dbf" 8 257

We can see the lost write protection file being read first (1 block at offset 1056768, which is block 129, the first one after the 1MB header), and the SCNs for my 5-block table are all there. Then the table blocks are read. Note that all those blocks (lost write protection and data) go into the buffer cache and then do not have to be re-read each time. Here, I’ve run my failing select 3 times and only the first one had to read the shadow datafile.

X$BH: additional buffer gets

As those blocks are read through the buffer cache during the consistent reads, I checked the buffer cache headers for the 3 times I’ve run the queries. I’ve identified them from the function that reads them: kcbr_lost_get_lost_write_scns

SQL> select dbms_rowid.rowid_block_number(rowid) block_id,DEMO.* from DEMO.DEMO;
ORA-65478: shadow lost write protection - found lost write
 
SQL> select obj,state,tch,fp_whr from x$bh where fp_whr like 'kr_gcur_4: kcbr_lost_get_lost_w%';
 
OBJ STATE TCH FP_WHR
--- ----- --- ------
4294967295 1 1 kr_gcur_4: kcbr_lost_get_lost_w
 
 
SQL> select dbms_rowid.rowid_block_number(rowid) block_id,DEMO.* from DEMO.DEMO;
ORA-65478: shadow lost write protection - found lost write
 
SQL> select obj,state,tch,fp_whr from x$bh where fp_whr like 'kr_gcur_4: kcbr_lost_get_lost_w%';
 
OBJ STATE TCH FP_WHR
--- ----- --- ------
4294967295 1 2 kr_gcur_4: kcbr_lost_get_lost_w
 
 
SQL> select dbms_rowid.rowid_block_number(rowid) block_id,DEMO.* from DEMO.DEMO;
ORA-65478: shadow lost write protection - found lost write
 
SQL> select obj,state,tch,fp_whr from x$bh where fp_whr like 'kr_gcur_4: kcbr_lost_get_lost_w%';
 
OBJ STATE TCH FP_WHR
--- ----- --- ------
4294967295 1 3 kr_gcur_4: kcbr_lost_get_lost_w

Here we can see the touch count increasing. It seems that for each query the kcbr_lost_get_lost_write_scns is called, even when there was no modification and no new read from disk.

While we’re there, let’s set a breakpoint on this function to see when it is called:

(gdb) break kcbr_lost_get_lost_write_scns
Breakpoint 1 at 0x85a9140
(gdb) c
Continuing.
 
Breakpoint 1, 0x00000000085a9140 in kcbr_lost_get_lost_write_scns ()
(gdb) bt
#0 0x00000000085a9140 in kcbr_lost_get_lost_write_scns ()
#1 0x0000000001cf9c01 in kcbzibmlt ()
#2 0x0000000001ce2f29 in kcbzib ()
#3 0x0000000011e7f6e9 in kcbgtcr ()
#4 0x0000000011e366bd in ktrget2 ()
#5 0x00000000121d5ca7 in kdst_fetch0 ()
#6 0x00000000121e4f5a in kdstf000110100000000km ()
#7 0x00000000121d398e in kdsttgr ()
#8 0x000000001224d28f in qertbFetch ()
#9 0x00000000120340ef in opifch2 ()
#10 0x0000000002d8d033 in kpoal8 ()
#11 0x000000001203af9c in opiodr ()
#12 0x000000001230acf7 in ttcpip ()
#13 0x00000000026a5667 in opitsk ()
#14 0x00000000026aa27d in opiino ()
#15 0x000000001203af9c in opiodr ()
#16 0x00000000026a10a3 in opidrv ()
#17 0x00000000032a58af in sou2o ()
#18 0x0000000000d68047 in opimai_real ()
#19 0x00000000032b2667 in ssthrdmain ()
#20 0x0000000000d67e53 in main ()

Look at Frits Hoogland’s annotations for the meaning of these functions and you will see that this is called during consistent reads -> input buffer.

So what?

This feature is interesting. Of course, we need to measure the overhead of this detection, but since this additional storage of the SCNs is implemented as regular data blocks, it benefits from all their efficiency: buffer cache, background writes by dbwr, protection by redo, backups,… These days, I see more and more databases installed on storage with fancy features, and admins playing with snapshots without really knowing whether they are consistent or not. This is the opposite of the ‘reliable’ and ‘keep it simple’ properties that we want for our data. For these environments, when I cannot convince the storage admins to forget about those features and rely on Data Guard on top of the simplest storage, then at least we have a way to protect ourselves from failures in those layers.

 

Cet article 18c new Lost Write Protection est apparu en premier sur Blog dbi services.

Error accessing D2: Error while fetching repositories

Thu, 2018-03-01 06:55

d2_logo_failed

After patching our D2 we ran into an error we never had before. After all, it’s only a patch. But as you know, patching D2 means recreating the war file and deploying it again, as well as setting up again all the customizations you made…

Well, this time we had a popup when accessing the D2 interface, which said “Error while fetching repositories”. That means D2 wasn’t even able to load the list of available repositories. And here’s what we had in the logs:

[ERROR] [http-bio-8080-exec-3] - c.e.d.d.w.s.config.D2X3ConfigService    : Error obtaining the default offline skin java.lang.NoClassDefFoundError: Could not initialize class com.documentum.fc.client.DfClient

I didn’t bring the whole stack trace but it was as “precise” as this line…

I found the solution in the official documentation, but not for my Tomcat version: we were using Tomcat 7 whereas the documented fix is for Tomcat 8.

The solution:

Running Tomcat 8.0.5 with D7.1SP1 and DFS7.1.1 requires modifications to catalina.properties:
jnlp.com.rsa.cryptoj.fips140loader=true
File location: <CATALINA_HOME>/conf/catalina.properties

Running Tomcat 8.0.5 with D7.1 CS and DFS7.1 requires copying the cryptojce.jar, cryptojcommon.jar and jcmFIPS.jar files to the /D2/WEB-INF/lib and /D2-Config/WEB-INF/lib folders. Also, modify the catalina.properties file as mentioned above.
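
As a sketch of those two steps in shell, assuming the D2 and D2-Config webapps are exploded under $CATALINA_HOME/webapps and that the three jars are taken from a DFS SDK installation (the SDK path is an assumption):

# add the FIPS loader property (documented above) to catalina.properties
echo "jnlp.com.rsa.cryptoj.fips140loader=true" >> $CATALINA_HOME/conf/catalina.properties

# copy the RSA crypto jars into both webapps; the source folder is an assumption
for app in D2 D2-Config; do
  cp /path/to/emc-dfs-sdk/lib/java/dfc/cryptojce.jar \
     /path/to/emc-dfs-sdk/lib/java/dfc/cryptojcommon.jar \
     /path/to/emc-dfs-sdk/lib/java/dfc/jcmFIPS.jar \
     "$CATALINA_HOME/webapps/$app/WEB-INF/lib/"
done

# restart Tomcat so the changes are taken into account
$CATALINA_HOME/bin/shutdown.sh && $CATALINA_HOME/bin/startup.sh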

 

Cet article Error accessing D2: Error while fetching repositories est apparu en premier sur Blog dbi services.

Error starting D2 4.5 on Tomcat 8 No class found: JSAFE_SecureRandom

Thu, 2018-03-01 06:29

d2_logo_failed

When installing D2 and launching the Tomcat server you can get an error about a missing class: JSAFE_SecureRandom.

Here’s the stack trace you can get:

2018-01-17 17:32:19 [DEBUG] [localhost-startStop-1] - c.e.c.d.contents.D2FileCleaningTracker : Init D2FileCleaningTracker with /pkgs/dms-tomcat/D2_TOMCAT_HOME/temp/D2FileCleaningTracker_D2.ser
2018-01-17 17:32:20 [ERROR] [localhost-startStop-1] - c.emc.d2fs.dctm.servlets.D2HttpServlet : LoadOnStartup - java.lang.NoClassDefFoundError: com/rsa/jsafe/JSAFE_SecureRandom
2018-01-17 17:32:20 [ERROR] [localhost-startStop-1] - c.emc.d2fs.dctm.servlets.D2HttpServlet : {}
java.lang.NoClassDefFoundError: com/rsa/jsafe/JSAFE_SecureRandom
at com.documentum.fc.client.security.impl.InitializeKeystoreForDfc.execute(InitializeKeystoreForDfc.java:46) [dfc.jar:na]
at com.documentum.fc.client.security.internal.KeystoreMgr.initForDfcWithDefaultPolicy(KeystoreMgr.java:71) [dfc.jar:na]
at com.documentum.fc.client.security.impl.IdentityManager.initFromKeystore(IdentityManager.java:139) [dfc.jar:na]
at com.documentum.fc.client.security.impl.IdentityManager.<init>(IdentityManager.java:41) [dfc.jar:na]
at com.documentum.fc.impl.RuntimeContext.<init>(RuntimeContext.java:52) [dfc.jar:na]
at com.documentum.fc.impl.RuntimeContext.<clinit>(RuntimeContext.java:185) [dfc.jar:na]
at com.documentum.fc.client.DfClient.<clinit>(DfClient.java:772) [dfc.jar:na]
at com.emc.d2fs.dctm.servlets.init.SessionIterator.<init>(SessionIterator.java:77) [D2FS4DCTM-WEB-4.5.0.jar:na]
at com.emc.d2fs.dctm.servlets.init.LoadOnStartup.getSessions(LoadOnStartup.java:441) [D2FS4DCTM-WEB-4.5.0.jar:na]
at com.emc.d2fs.dctm.servlets.init.LoadOnStartup.processRequest(LoadOnStartup.java:164) [D2FS4DCTM-WEB-4.5.0.jar:na]
at com.emc.d2fs.dctm.servlets.D2HttpServlet.execute(D2HttpServlet.java:243) [D2FS4DCTM-API-4.5.0.jar:na]
at com.emc.d2fs.dctm.servlets.D2HttpServlet.doGetAndPost(D2HttpServlet.java:506) [D2FS4DCTM-API-4.5.0.jar:na]
at com.emc.d2fs.dctm.servlets.D2HttpServlet.doGet(D2HttpServlet.java:116) [D2FS4DCTM-API-4.5.0.jar:na]
at com.emc.d2fs.dctm.servlets.init.LoadOnStartup.init(LoadOnStartup.java:116) [D2FS4DCTM-WEB-4.5.0.jar:na]
at javax.servlet.GenericServlet.init(GenericServlet.java:158) [servlet-api.jar:3.0.FR]
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1284) [catalina.jar:7.0.53]
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1197) [catalina.jar:7.0.53]
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1087) [catalina.jar:7.0.53]
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:5210) [catalina.jar:7.0.53]
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5493) [catalina.jar:7.0.53]
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) [catalina.jar:7.0.53]
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901) [catalina.jar:7.0.53]
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877) [catalina.jar:7.0.53]
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:632) [catalina.jar:7.0.53]
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1073) [catalina.jar:7.0.53]
at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1857) [catalina.jar:7.0.53]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_152]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_152]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]
Caused by: java.lang.ClassNotFoundException: com.rsa.jsafe.JSAFE_SecureRandom
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720) [catalina.jar:7.0.53]
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571) [catalina.jar:7.0.53]
... 31 common frames omitted

What’s the cause of this?

When you generate the D2 war file, some libraries are needed, for example from the DFS SDK. If you look into the build logs you can see what has been added to the war file, or at least what is missing. Well, with the SDK 7.x there are 2 libs missing when generating the D2 war file for version 4.5:

[copy] Warning: Could not find file /D24.5P29/emc-dfs-sdk-7.3/lib/java/dfc/certjFIPS.jar to copy.
[copy] Warning: Could not find file /D24.5P29/emc-dfs-sdk-7.3/lib/java/dfc/jsafeFIPS.jar to copy.
Resolution

To resolve the issue, copy these 2 libraries from the 6.7SP2 dfs sdk and paste them into the lib folder of D2.
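
As a sketch, assuming the 6.7SP2 SDK has been unpacked somewhere on the build machine (its location below is an assumption; the destination is the path from the warnings above):

cp /path/to/emc-dfs-sdk-6.7SP2/lib/java/dfc/certjFIPS.jar \
   /path/to/emc-dfs-sdk-6.7SP2/lib/java/dfc/jsafeFIPS.jar \
   /D24.5P29/emc-dfs-sdk-7.3/lib/java/dfc/
# then regenerate and redeploy the D2 war file so both jars end up in WEB-INF/lib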

 

Cet article Error starting D2 4.5 on Tomcat 8 No class found: JSAFE_SecureRandom est apparu en premier sur Blog dbi services.

18c dbms_xplan note about failed SQL Plan Baseline

Tue, 2018-02-27 15:17

SQL Plan Baselines is a great feature for plan stability: you capture the plans that you accept. However, if the data model changes and the accepted plans cannot be reproduced, the optimizer will come up with a new plan. In 18c we have a note from DBMS_XPLAN when the optimizer ‘failed to use SQL plan baseline for this statement’.

I create a table, with an index, and run a query on it using this index:

SQL> create table DEMO as select rownum n from xmltable('1 to 1000');
Table DEMO created.
 
SQL> create index DEMO_N on DEMO(n);
Index DEMO_N created.
 
SQL> select * from DEMO where n=1;
 
N
-
1

The execution plan is correct, using the index:

SQL> select * from dbms_xplan.display_cursor();
 
PLAN_TABLE_OUTPUT
-----------------
SQL_ID 4mcr18aqntpkq, child number 0
-------------------------------------
select * from DEMO where n=1
 
Plan hash value: 217077817
 
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
|* 1 | INDEX RANGE SCAN| DEMO_N | 1 | 4 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
1 - access("N"=1)

I’m happy with this plan, and I capture it as an accepted SQL Plan Baseline for this statement:

SQL> exec dbms_output.put_line( dbms_spm.load_plans_from_cursor_cache('4mcr18aqntpkq') )
PL/SQL procedure successfully completed.
 
SQL> select * from dba_sql_plan_baselines;
 
SIGNATURE SQL_HANDLE SQL_TEXT PLAN_NAME CREATOR ORIGIN PARSING_SCHEMA_NAME DESCRIPTION VERSION CREATED LAST_MODIFIED LAST_EXECUTED LAST_VERIFIED ENABLED ACCEPTED FIXED REPRODUCED AUTOPURGE ADAPTIVE OPTIMIZER_COST MODULE ACTION EXECUTIONS ELAPSED_TIME CPU_TIME BUFFER_GETS DISK_READS DIRECT_WRITES ROWS_PROCESSED FETCHES END_OF_FETCH_COUNT
--------- ---------- -------- --------- ------- ------ ------------------- ----------- ------- ------- ------------- ------------- ------------- ------- -------- ----- ---------- --------- -------- -------------- ------ ------ ---------- ------------ -------- ----------- ---------- ------------- -------------- ------- ------------------
5689790730784434204 SQL_4ef632861ab7681c select * from DEMO where n=1 SQL_PLAN_4xxjkhsdbfu0wd5d62705 DEMO MANUAL-LOAD-FROM-CURSOR-CACHE DEMO 18.0.0.0.0 27-FEB-18 09.37.55.000000000 PM 27-FEB-18 09.37.55.000000000 PM YES YES NO YES YES NO 1 java@VM181 (TNS V1-V3) 3 5771 4824 82 1 0 3 6

When I run the statement again, this plan is used and DBMS_XPLAN mentions the SQL Plan Baseline that was used:

SQL> select * from DEMO where n=1;
 
N
-
1
 
SQL> select * from dbms_xplan.display_cursor();
 
PLAN_TABLE_OUTPUT
-----------------
SQL_ID 4mcr18aqntpkq, child number 0
-------------------------------------
select * from DEMO where n=1
 
Plan hash value: 217077817
 
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
|* 1 | INDEX RANGE SCAN| DEMO_N | 1 | 4 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
1 - access("N"=1)
 
Note
-----
- SQL plan baseline SQL_PLAN_4xxjkhsdbfu0wd5d62705 used for this statement

Now, if I drop the index, the accepted plan cannot be used:

SQL> drop index DEMO_N;
Index DEMO_N dropped.
 
SQL> select * from DEMO where n=1;
 
N
-
1
 
SQL> select * from dbms_xplan.display_cursor();
 
PLAN_TABLE_OUTPUT
-----------------
SQL_ID 4mcr18aqntpkq, child number 0
-------------------------------------
select * from DEMO where n=1
 
Plan hash value: 4000794843
 
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
|* 1 | TABLE ACCESS FULL| DEMO | 1 | 4 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
1 - filter("N"=1)
 
Note
-----
- Failed to use SQL plan baseline for this statement

So the new note in 18c explains that there is a SQL Plan Baseline that cannot be used. Unfortunately, it does not identify which SQL Plan Baseline failed to reproduce.


SQL> select * from dba_sql_plan_baselines;
 
SIGNATURE SQL_HANDLE SQL_TEXT PLAN_NAME CREATOR ORIGIN PARSING_SCHEMA_NAME DESCRIPTION VERSION CREATED LAST_MODIFIED LAST_EXECUTED LAST_VERIFIED ENABLED ACCEPTED FIXED REPRODUCED AUTOPURGE ADAPTIVE OPTIMIZER_COST MODULE ACTION EXECUTIONS ELAPSED_TIME CPU_TIME BUFFER_GETS DISK_READS DIRECT_WRITES ROWS_PROCESSED FETCHES END_OF_FETCH_COUNT
--------- ---------- -------- --------- ------- ------ ------------------- ----------- ------- ------- ------------- ------------- ------------- ------- -------- ----- ---------- --------- -------- -------------- ------ ------ ---------- ------------ -------- ----------- ---------- ------------- -------------- ------- ------------------
5689790730784434204 SQL_4ef632861ab7681c select * from DEMO where n=1 SQL_PLAN_4xxjkhsdbfu0w838f84a8 DEMO AUTO-CAPTURE DEMO 18.0.0.0.0 27-FEB-18 09.37.56.000000000 PM 27-FEB-18 09.37.56.000000000 PM YES NO NO YES YES NO 2 java@VM181 (TNS V1-V3) 0 0 0 0 0 0 0 0
5689790730784434204 SQL_4ef632861ab7681c select * from DEMO where n=1 SQL_PLAN_4xxjkhsdbfu0wd5d62705 DEMO MANUAL-LOAD-FROM-CURSOR-CACHE DEMO 18.0.0.0.0 27-FEB-18 09.37.55.000000000 PM 27-FEB-18 09.37.55.000000000 PM 27-FEB-18 09.37.55.000000000 PM YES YES NO YES YES NO 1 java@VM181 (TNS V1-V3) 3 5771 4824 82 1 0 3 6

So, because the accepted plan was not able to reproduce after the index was dropped, the new plan was captured but not accepted.

Note that if I re-create the index but with a different name, then the accepted SQL Plan cannot be used either:

SQL> create index DEMO_XXX on DEMO(n);
Index DEMO_XXX created.
 
SQL> select * from DEMO where n=1;
 
N
-
1
 
SQL> select * from dbms_xplan.display_cursor();
 
PLAN_TABLE_OUTPUT
-----------------
SQL_ID 4mcr18aqntpkq, child number 0
-------------------------------------
select * from DEMO where n=1
 
Plan hash value: 1306684165
 
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
|* 1 | INDEX RANGE SCAN| DEMO_XXX | 1 | 4 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
1 - access("N"=1)
 
Note
-----
- Failed to use SQL plan baseline for this statement

So, it is not a bad idea to monitor the SQL Plan Baselines which did not reproduce. We can find them from the ‘baseline_repro_fail’ mention in OTHER_XML:

SQL> select sql_id,other_xml from v$sql_plan where other_xml like '%baseline_repro_fail%';
 
SQL_ID OTHER_XML
------ ---------
4mcr18aqntpkq <other_xml><info type="db_version">18.0.0.0</info><info type="parse_schema"><![CDATA["DEMO"]]></info><info type="plan_hash_full">211349514</info><info type="plan_hash">1306684165</info><info type="plan_hash_2">211349514</info><info type="baseline_repro_fail" note="y">yes</info><outline_data><hint><![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]></hint><hint><![CDATA[OPTIMIZER_FEATURES_ENABLE('18.1.0')]]></hint><hint><![CDATA[DB_VERSION('18.1.0')]]></hint><hint><![CDATA[ALL_ROWS]]></hint><hint><![CDATA[OUTLINE_LEAF(@"SEL$1")]]></hint><hint><![CDATA[INDEX(@"SEL$1" "DEMO"@"SEL$1" ("DEMO"."N"))]]></hint></outline_data></other_xml>

From the SQL_ID I can get the SIGNATURE used by SQL Plan Management:

SQL> select sql_id,exact_matching_signature from v$sql where sql_id='4mcr18aqntpkq';
 
SQL_ID EXACT_MATCHING_SIGNATURE
------ ------------------------
4mcr18aqntpkq 5689790730784434204

And then the SQL Plan Baselines:

SQL> select * from dba_sql_plan_baselines where signature=5689790730784434204;
 
SIGNATURE SQL_HANDLE SQL_TEXT PLAN_NAME CREATOR ORIGIN PARSING_SCHEMA_NAME DESCRIPTION VERSION CREATED LAST_MODIFIED LAST_EXECUTED LAST_VERIFIED ENABLED ACCEPTED FIXED REPRODUCED AUTOPURGE ADAPTIVE OPTIMIZER_COST MODULE ACTION EXECUTIONS ELAPSED_TIME CPU_TIME BUFFER_GETS DISK_READS DIRECT_WRITES ROWS_PROCESSED FETCHES END_OF_FETCH_COUNT
--------- ---------- -------- --------- ------- ------ ------------------- ----------- ------- ------- ------------- ------------- ------------- ------- -------- ----- ---------- --------- -------- -------------- ------ ------ ---------- ------------ -------- ----------- ---------- ------------- -------------- ------- ------------------
5689790730784434204 SQL_4ef632861ab7681c select * from DEMO where n=1 SQL_PLAN_4xxjkhsdbfu0w0c98f00a DEMO AUTO-CAPTURE DEMO 18.0.0.0.0 27-FEB-18 10.02.07.000000000 PM 27-FEB-18 10.02.07.000000000 PM YES NO NO YES YES NO 1 java@VM181 (TNS V1-V3) 0 0 0 0 0 0 0 0
5689790730784434204 SQL_4ef632861ab7681c select * from DEMO where n=1 SQL_PLAN_4xxjkhsdbfu0w838f84a8 DEMO AUTO-CAPTURE DEMO 18.0.0.0.0 27-FEB-18 10.02.07.000000000 PM 27-FEB-18 10.02.07.000000000 PM YES NO NO YES YES NO 2 java@VM181 (TNS V1-V3) 0 0 0 0 0 0 0 0
5689790730784434204 SQL_4ef632861ab7681c select * from DEMO where n=1 SQL_PLAN_4xxjkhsdbfu0wd5d62705 DEMO MANUAL-LOAD-FROM-CURSOR-CACHE DEMO 18.0.0.0.0 27-FEB-18 10.02.06.000000000 PM 27-FEB-18 10.02.06.000000000 PM 27-FEB-18 10.02.06.000000000 PM YES YES NO YES YES NO 1 java@VM181 (TNS V1-V3) 3 4634 4210 75 1 0 3 6

Now it is easy to look at the SPM baselines to understand why the accepted plan did not reproduce:

SQL> select * from dbms_xplan.display_sql_plan_baseline('SQL_4ef632861ab7681c',format=>'basic');
 
PLAN_TABLE_OUTPUT
-----------------
 
--------------------------------------------------------------------------------
SQL handle: SQL_4ef632861ab7681c
SQL text: select * from DEMO where n=1
--------------------------------------------------------------------------------
 
--------------------------------------------------------------------------------
Plan name: SQL_PLAN_4xxjkhsdbfu0w0c98f00a Plan id: 211349514
Enabled: YES Fixed: NO Accepted: NO Origin: AUTO-CAPTURE
Plan rows: From dictionary
--------------------------------------------------------------------------------
 
Plan hash value: 1306684165
 
-------------------------------------
| Id | Operation | Name |
-------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | INDEX RANGE SCAN| DEMO_XXX |
-------------------------------------
 
--------------------------------------------------------------------------------
Plan name: SQL_PLAN_4xxjkhsdbfu0w838f84a8 Plan id: 2207220904
Enabled: YES Fixed: NO Accepted: NO Origin: AUTO-CAPTURE
Plan rows: From dictionary
--------------------------------------------------------------------------------
 
Plan hash value: 4000794843
 
----------------------------------
| Id | Operation | Name |
----------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| DEMO |
----------------------------------
 
--------------------------------------------------------------------------------
Plan name: SQL_PLAN_4xxjkhsdbfu0wd5d62705 Plan id: 3587581701
Enabled: YES Fixed: NO Accepted: YES Origin: MANUAL-LOAD-FROM-CURSOR-CACHE
Plan rows: From dictionary
--------------------------------------------------------------------------------
 
Plan hash value: 217077817
 
-----------------------------------
| Id | Operation | Name |
-----------------------------------
| 0 | SELECT STATEMENT | |
| 1 | INDEX RANGE SCAN| DEMO_N |
-----------------------------------
 

Now we can see the unaccepted plan using index DEMO_XXX and the accepted one using index DEMO_N. If we add format=>’+outline’ we can even see that they index the same column. Ready to accept the new plan and remove the old one.

In conclusion: check your V$SQL_PLAN.OTHER_XML for info type=”baseline_repro_fail” in 18c and you can do some housekeeping on the SQL Plan Baselines which do not reproduce. Because if you rely on an accepted plan which cannot be reproduced, you may have a problem, and it is better to address this pro-actively.
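
As a sketch of such a housekeeping check, the two lookups above can be combined into a single query (assuming the failing cursors are still in the shared pool):

select s.sql_id, b.plan_name, b.origin, b.accepted, b.last_modified
from v$sql s
join v$sql_plan p
  on p.sql_id = s.sql_id and p.child_number = s.child_number
join dba_sql_plan_baselines b
  on b.signature = s.exact_matching_signature
where p.other_xml like '%baseline_repro_fail%';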

 

Cet article 18c dbms_xplan note about failed SQL Plan Baseline est apparu en premier sur Blog dbi services.

ODA migration challenges: Non-OMF to OMF + 11.2.0.3 to 11.2.0.4

Tue, 2018-02-27 09:59

To do some application and performance tests, I had to copy a database from a third-party Linux server to an ODA X7-2M. This looks pretty simple on paper, but 2 small challenges came into play. The first was that, of course, the source database was in Non-OMF while ODA works fully in OMF. The second was that the source database was running 11.2.0.3, which is not supported and cannot be installed on the ODA “lite”. Therefore I had to find a way to copy the database onto 11.2.0.4 binaries and get the upgrade done before opening it.

My first idea was of course to do a duplicate of the source database to the ODA.  To get everything ready on ODA side (folders, instance…), I simply created an 11.2.0.4 database using ODACLI CREATE-DATABASE and then shut it down to delete all data files, redo logs and control files.

As a duplicate from active database wasn’t possible, I checked the backup of the source database and looked for the best SCN to get to. Once I had defined this, I could start preparing my duplicate as follows:

RMAN> run {
2> set until scn XXXXXXXXX;
3> allocate channel t1 type disk;
4> allocate channel t2 type disk;
5> allocate channel t3 type disk;
6> allocate channel t4 type disk;
7> allocate auxiliary channel a1 type disk;
8> allocate auxiliary channel a2 type disk;
9> allocate auxiliary channel a3 type disk;
10> allocate auxiliary channel a4 type disk;
11> duplicate target database to 'DBTST1';
12> }

As explained above, the first little challenge here was that my source database (the RMAN target) is in Non-OMF and I wanted to make it “proper” on the ODA, which means an OMF based structure.

Usually in a duplicate you would use db_file_name_convert and log_file_name_convert to change the path of the files. The issue with this solution is that it will not rename the files unless you do it file per file.

The second option is to use the RMAN command SET NEWNAME FOR DATAFILE. Here, same “issue”: I have to do it file per file and I had more than 180 files. Of course I could easily script it with SQL*Plus, but the list would be awful and not easy to crosscheck if I’m missing anything. In addition, doing the SET NEWNAME requires some precaution as the file name still needs to look OMF-generated. This can be handled by providing the following string for the file name: o1_mf_<dbname>_%u.dbf

However I still wanted a more “elegant” way. The solution was simply to use SET NEWNAME FOR DATABASE in conjunction with the TO NEW option. This automatically generates a new OMF file name for all database files. The condition is that the following parameters are properly set on the auxiliary database (a minimal sketch of setting them follows the list):

  • db_create_file_dest
  • db_create_online_log_dest_n
    Configure from 1 to 5 depending on the number of members you want per redo log group
  • control_files
    Should be reset as new control file(s) name(s) will be generated
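
Here is a minimal sketch of setting these on the auxiliary instance before starting the duplicate; the directories below are the ones appearing in the file names later in this post, but they remain assumptions about this particular setup:

alter system set db_create_file_dest='/u02/app/oracle/oradata/DBTST1_SITE1' scope=spfile;
alter system set db_create_online_log_dest_1='/u03/app/oracle/redo' scope=spfile;
-- reset control_files so that new OMF control file names are generated
alter system reset control_files scope=spfile sid='*';
-- then restart the auxiliary instance in NOMOUNT before starting the duplicate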

So I got finally the following RMAN script to run the duplicate:

RMAN> run {
2> set until scn XXXXXXXXX;
3> set newname for database to new;
4> allocate channel t1 type disk;
5> allocate channel t2 type disk;
6> allocate channel t3 type disk;
7> allocate channel t4 type disk;
8> allocate auxiliary channel a1 type disk;
9> allocate auxiliary channel a2 type disk;
10> allocate auxiliary channel a3 type disk;
11> allocate auxiliary channel a4 type disk;
12> duplicate target database to 'DBTST1';
13> }

At this point I solved the Non-OMF to OMF conversion issue and almost got a copy of my database on the ODA.

Why almost? Simply because the duplicate failed :-)

Indeed this is fully “normal” and part of the process. As you know, the last step of a duplicate is an ALTER CLONE DATABASE OPEN RESETLOGS on the auxiliary database. However the database was still in 11.2.0.3 while the binaries on the ODA are 11.2.0.4. The result was the duplicate crashing on the last step as the binaries are not compatible.

This didn’t really matter as the restore and recover operations worked, meaning that my database was at a consistent point in time. Unfortunately, simply opening the database with ALTER DATABASE OPEN RESETLOGS UPGRADE did not work, claiming that the database needs media recovery:

SQL> alter database open resetlogs upgrade;
alter database open resetlogs upgrade
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1:
'/u02/app/oracle/oradata/DBTST1_SITE1/DBTST1_SITE1/datafile/o1_mf_system_f98
0sv9j_.dbf'

My first idea here was to try a RECOVER DATABASE UNTIL CANCEL and then try again, but I had nothing other than the last archive logs already applied during the duplicate to recover :-(

Another situation where you have to open a database with RESETLOGS is when you have restored the control files. So I chose to re-create the control file with a SQL script.

CREATE CONTROLFILE REUSE DATABASE "DBTST1" RESETLOGS ARCHIVELOG
MAXLOGFILES 202
MAXLOGMEMBERS 5
MAXDATAFILES 200
MAXINSTANCES 1
MAXLOGHISTORY 33012
LOGFILE
GROUP 1 SIZE 1G BLOCKSIZE 512,
GROUP 2 SIZE 1G BLOCKSIZE 512,
GROUP 3 SIZE 1G BLOCKSIZE 512
DATAFILE
'/u02/app/oracle/oradata/DBTST1_SITE1/DBTST1_SITE1/datafile/o1_mf_system_f980sv9j_.dbf',
'/u02/app/oracle/oradata/DBTST1_SITE1/DBTST1_SITE1/datafile/o1_mf_sysaux_f97x0l8m_.dbf',
'/u02/app/oracle/oradata/DBTST1_SITE1/DBTST1_SITE1/datafile/o1_mf_undotbs1_f97w67k2_.dbf',
'/u02/app/oracle/oradata/DBTST1_SITE1/DBTST1_SITE1/datafile/o1_mf_users_f97w67md_.dbf',
...
...
CHARACTER SET AL32UTF8;

The question here was where to find the different pieces of information for my script, as BACKUP CONTROLFILE TO TRACE does not work in MOUNT status.

I used the following statements:

SQL> select type,RECORDS_TOTAL from v$controlfile_record_section;

TYPE			     RECORDS_TOTAL
---------------------------- -------------
DATABASE				 1		==> MAXINSTANCE (is obvious as I'm in single instance :-) )
CKPT PROGRESS				11
REDO THREAD				 8
REDO LOG			       202		==> MAXREDOLOG
DATAFILE			       200		==> MAXDATAFILE
FILENAME			      3056
TABLESPACE			       200
TEMPORARY FILENAME		       200
RMAN CONFIGURATION			50
LOG HISTORY			     33012		==> MAXLOGHISTORY
OFFLINE RANGE			       245
...
...

 

SQL> select group#,members,bytes/1024/1024,blocksize from v$log;

    GROUP#    MEMBERS BYTES/1024/1024  BLOCKSIZE
---------- ---------- --------------- ----------
	 1	    2		 1024	     512
	 2	    2		 1024	     512
	 3	    2		 1024	     512

 

SQL> select '''' || name || ''''||',' from v$datafile order by file# asc;

''''||NAME||''''||','
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
'/u02/app/oracle/oradata/DBTST1_SITE1/DBTST1_SITE1/datafile/o1_mf_system_f980sv9j_.dbf',
'/u02/app/oracle/oradata/DBTST1_SITE1/DBTST1_SITE1/datafile/o1_mf_sysaux_f97x0l8m_.dbf',
'/u02/app/oracle/oradata/DBTST1_SITE1/DBTST1_SITE1/datafile/o1_mf_undotbs1_f97w67k2_.dbf',
...
...

 

Once the control file was re-created the OPEN RESETLOGS was still failing with an ORA-01194. Hmm.. same issue.
Then I finally tried a recovery using the only files I had, the new redo logs:

SQL> RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
ORA-00279: change 20833391792 generated at 02/26/2018 14:36:55 needed for
thread 1
ORA-00289: suggestion :
/u03/app/oracle/fast_recovery_area/DBTST1_SITE1/archivelog/2018_02_26/o1_mf_1_
1_%u_.arc
ORA-00280: change 20833391792 for thread 1 is in sequence #1


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u03/app/oracle/redo/DBTST1_SITE1/onlinelog/o1_mf_1_f983clsn_.log
Log applied.
Media recovery complete.

Successful media recovery, great!

Finally I got my database open with RESETLOGS!

SQL> alter database open resetlogs upgrade;

Database altered.

At this point I just had to follow the traditional upgrade process from 11.2.0.3 to 11.2.0.4. The last trap was not to forget to add the temp files back to the temporary tablespace, since the re-created control file does not contain them:

SQL> alter tablespace TEMP add tempfile size 30G;

Tablespace altered.

SQL> alter tablespace TEMP add tempfile size 30G;

Tablespace altered.
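
A quick way to double-check the result, as a minimal sketch (assuming the dictionary views are already usable at this point):

SQL> select file_name, bytes/1024/1024/1024 as size_gb from dba_temp_files;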


Then the upgrade process is quite easy (a sketch of the corresponding SQL*Plus commands follows the list):

  1. Run utlu112i.sql as the pre-upgrade script
  2. Run catupgrd.sql to perform the upgrade
  3. Restart the database
  4. Run utlu112s.sql as the post-upgrade status script and make sure no error is shown and all components are valid
  5. Run catuppst.sql to finalize the upgrade
  6. Run utlrp.sql to re-compile the invalid objects
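
For reference, here is a minimal sketch of that sequence in SQL*Plus. It assumes the scripts are taken from their default location under $ORACLE_HOME/rdbms/admin (the ? shortcut) and that the database is already in UPGRADE mode, as it is after the OPEN RESETLOGS UPGRADE above:

-- pre-upgrade information tool
SQL> @?/rdbms/admin/utlu112i.sql
-- the upgrade itself; in 11.2 catupgrd.sql shuts the instance down when it finishes
SQL> @?/rdbms/admin/catupgrd.sql
SQL> connect / as sysdba
SQL> startup
-- post-upgrade status tool: expect no errors and all components VALID
SQL> @?/rdbms/admin/utlu112s.sql
-- post-upgrade fix-ups
SQL> @?/rdbms/admin/catuppst.sql
-- recompile invalid objects
SQL> @?/rdbms/admin/utlrp.sql
-- final check of the component status
SQL> select comp_name, version, status from dba_registry;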

Should you forget to add the temp files to the temporary tablespace, you will get multiple ORA-25153 “Temporary Tablespace Is Empty” errors (see note 843899.1). Basically the only thing to do in such a case is to add the temp files and re-run catupgrd.sql.

Cheers!


This article ODA migration challenges: Non-OMF to OMF + 11.2.0.3 to 11.2.0.4 appeared first on Blog dbi services.

ORA-12547 TNS lost contact on SLES12 and Oracle 11.2.0.x

Tue, 2018-02-27 08:16

On a client’s site, I had successfully migrated Oracle 12c databases from Red Hat to SLES 12 without any problem. However, I ran into an issue doing the same work with Oracle 11.2.0.3 and 11.2.0.4 on SLES 12.

Once I had moved the database data files, redo logs, control files and spfile to the new server and tried to start the database, I received the following error message:

oracle@server:/u00/app/oracle/diag/ [db1] sq
SQL*Plus: Release 11.2.0.3.0 Production on Thu Feb 22 09:31:18 2018
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
SQL> startup
ERROR:
ORA-12547: TNS:lost contact

In the alert.log file, the following error is reported:

Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x7F09646224A0, __lll_unlock_elision()+48] [flags: 0x0, count: 1]
ORA-12547 : TNS lost contact

I remembered that some months ago I had had quite a similar problem while installing Oracle Clusterware on SLES 12. After a quick search on Metalink I found note 2297117.1, which explains that the glibc shipped with SuSE 12 makes use of Hardware Lock Elision (HLE), a feature available on newer Intel processors.
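
As a quick side check, not part of the note but a small sketch assuming the standard Linux CPU flags, you can verify whether the processor actually advertises HLE:

grep -m1 -ow hle /proc/cpuinfo     # prints "hle" once if the feature flag is exposed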

The solution is equivalent to the one proposed in Metalink note 2270947.1 for the Oracle Clusterware problem with SLES 12:

The fix consists of two steps: add /lib64/noelision to /etc/ld.so.conf, and create a symbolic link so that the Oracle home uses the noelision variant of libpthread (ln -s /lib64/noelision/libpthread-xxx $ORACLE_HOME/lib/libpthread.so.0, where xxx stands for the glibc version installed on the system).

So we edit /etc/ld.so.conf and add the necessary line, which gives:

/lib64/noelision
/usr/local/lib64
/usr/local/lib
include /etc/ld.so.conf.d/*.conf

And we create the symbolic link, using the libpthread version actually present in /lib64/noelision:

ln -s /lib64/noelision/libpthread-2.22.so $ORACLE_HOME/lib/libpthread.so.0

We can then check with ldd that sqlplus now resolves libpthread.so.0 from the Oracle home:

oracle@server:/u00/app/oracle/product/11.2.0.3/dbhome_1/bin/ [db1] ldd sqlplus
        linux-vdso.so.1 (0x00007ffdf8513000)
        libsqlplus.so => /u00/app/oracle/product/11.2.0.3/dbhome_1/lib/libsqlplus.so (0x00007fb948303000)
        libclntsh.so.11.1 => /u00/app/oracle/product/11.2.0.3/dbhome_1/lib/libclntsh.so.11.1 (0x00007fb945944000)
        libnnz11.so => /u00/app/oracle/product/11.2.0.3/dbhome_1/lib/libnnz11.so (0x00007fb945577000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fb945357000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fb945059000)
        libpthread.so.0 => /u00/app/oracle/product/11.2.0.3/dbhome_1/lib/libpthread.so.0 (0x00007fb944e3c000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fb944c24000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fb944880000)
        libaio.so.1 => /lib64/libaio.so.1 (0x00007fb94467e000)
        /lib64/ld-linux-x86-64.so.2 (0x000055c70f7fc000)
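
A quick look at the link itself (just an additional sanity check, using the same paths as above) confirms where it points:

ls -l $ORACLE_HOME/lib/libpthread.so.0
# expected output ends with: libpthread.so.0 -> /lib64/noelision/libpthread-2.22.so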

Normally a simple reconnection should have solved the problem, but in my case some semaphores and shared memory segments were still allocated.

I had to run sysresv and ipcrm to completely solve my problem:=)

The first connection did not solve the problem:

oracle@server:/u00/app/oracle/ [db1] sq
SQL*Plus: Release 11.2.0.3.0 Production on Thu Feb 22 09:45:41 2018
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected.
SQL> startup
ERROR:
ORA-12547: TNS:lost contact

I had to use the sysresv utility, which reports the instance status and the IPC resources for a specified ORACLE_SID. It makes it possible to identify the resources to be deleted, especially if the instance is detected to be dead.

In my case the instance had crashed and its memory segments and semaphores were still present.

So running sysresv:

oracle@server:/u00/app/oracle/  [db1] sysresv
IPC Resources for ORACLE_SID "db1" :
Shared Memory:
ID              KEY
1278378019      0x00000000
1278410788      0x00000000
1278443557      0x186a9d18
Semaphores:
ID              KEY
5931036         0x7e57b354
Unable to determine if Oracle instance alive for sid "db1"

sysresv reports the shared memory segment and semaphore IDs. To delete the memory segments or the semaphore sets, we use ipcrm -m or ipcrm -s:

In my case I removed the memory segments:

ipcrm -m 1278378019
ipcrm -m 1278410788
ipcrm -m 1278443557
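
Had the semaphore set needed to be removed as well, the equivalent command would have been (a sketch using the semaphore ID reported by sysresv above):

ipcrm -s 5931036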

Finally after performing the listed actions, I was able to start the database without any problem :=)


This article ORA-12547 TNS lost contact on SLES12 and Oracle 11.2.0.x appeared first on Blog dbi services.
