
EDB BART 2.0 – How to upgrade and block level incremental backups

Mon, 2017-03-13 06:37

We already have some posts on how you can use EDB BART to back up and restore your PostgreSQL instances from one central server (EnterpriseDB Backup and Recovery Tool (BART), getting started with postgres plus advanced server (2) – setting up a backup and recovery server). The current version you can download from the EnterpriseDB website is 1.1, but version 2.0 is in beta and can be tested already. The main new feature is that BART 2.0 allows you to perform block level incremental backups of your PostgreSQL instances starting with PostgreSQL 9.5. In this post we’ll look at that feature and upgrade from BART 1.1. Let’s go …

In my test environment I have BART 1.1 configured and working against an EDB Postgres Plus 9.6 instance:

postgres@edbbart:/home/postgres/ [pg950] bart -v
bart (EnterpriseDB) 1.1.1
postgres@edbbart:/home/postgres/ [pg950] bart SHOW-BACKUPS -s pg3
 SERVER NAME   BACKUP ID       BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                         
 pg3           1489078376562   2017-03-09 17:53:01 CET   61.93 MB      16.00 MB      1           active  

What do I need to do to upgrade to BART 2.0? It is quite easy: either you have access to the EDB yum repositories (this requires a username and a password) or you install the rpm which you can download here. I will install BART 2.0 from the EDB repositories:

postgres@edbbart:/home/postgres/ [pg950] sudo yum install edb-bart20

That’s it. The first thing to highlight is that the location where BART gets installed has changed. Starting with version 2.0, BART is installed under “/usr/edb/” instead of “/usr/edb-bart-***”.

postgres@edbbart:/home/postgres/ [pg950] ls -la /usr/edb/bart2.0/
total 16
drwxr-xr-x. 5 root root    59 Mar  9 17:56 .
drwxr-xr-x. 4 root root    32 Mar  9 17:56 ..
-rw-r--r--. 1 root root 15272 Feb 21 10:00 bart_license.txt
drwxr-xr-x. 2 root root    36 Mar  9 17:56 bin
drwxr-xr-x. 2 root root    21 Mar  9 17:56 etc
drwxr-xr-x. 2 root root    56 Mar  9 17:56 lib

We always recommend working with a symbolic link that points to the current BART installation, to make life easier. If you followed that rule, all you have to do is:

postgres@edbbart:/home/postgres/ [pg950] sudo rm /usr/edb-bart
postgres@edbbart:/home/postgres/ [pg950] sudo ln -s /usr/edb/bart2.0/ /usr/edb-bart
postgres@edbbart:/home/postgres/ [pg950] bart -v
bart (EnterpriseDB) 2.0.0

… and you point to the new binaries. Of course you also want to copy your BART 1.1 configuration over to the new location:

postgres@edbbart:/home/postgres/ [pg950] sudo cp /usr/edb-bart-1.1/etc/bart.cfg /usr/edb/bart2.0/etc/

From now on you are working with BART 2.0:

postgres@edbbart:/home/postgres/ [pg950] bart show-servers
SERVER NAME         : pg3
BACKUP FRIENDLY NAME: PG3_%year-%month-%dayT%hour:%minute
HOST NAME           : 192.168.22.37
USER NAME           : backupuser
PORT                : 4445
REMOTE HOST         : postgres@192.168.22.37
RETENTION POLICY    : 2016-12-09 18:02:23 CET
DISK UTILIZATION    : 189.93 MB
NUMBER OF ARCHIVES  : 8
ARCHIVE PATH        : /u90/pgdata/backup/pg3/archived_wals
ARCHIVE COMMAND     : scp %p postgres@edbbart:/u90/pgdata/backup/pg3/archived_wals/%f
XLOG METHOD         : fetch
WAL COMPRESSION     : enabled
TABLESPACE PATH(s)  : 
INCREMENTAL BACKUP  : DISABLED      <============================ NEW
DESCRIPTION         : "postgres PG3"

You’ll immediately notice that there is a new line in the output: “INCREMENTAL BACKUP”. Let’s keep that for later. What I am most interested in right now is: can I still back up my PostgreSQL instances with the new version of BART?

postgres@edbbart:/home/postgres/ [pg950] bart backup -s pg3
INFO:  creating backup for server 'pg3'
INFO:  backup identifier: '1489078978132'
63413/63413 kB (100%), 1/1 tablespace

INFO:  backup completed successfully
WARNING: log_timezone is not set in the server, using the local timezone information
INFO:  backup checksum: d318744e42819f76b137edf197a0b59b of base.tar
INFO:  
BACKUP DETAILS:
BACKUP STATUS: active
BACKUP IDENTIFIER: 1489078978132
BACKUP NAME: PG3_2017-03-09T18:02
BACKUP PARENT: none
BACKUP LOCATION: /u90/pgdata/backup/pg3/1489078978132
BACKUP SIZE: 61.93 MB
BACKUP FORMAT: tar
XLOG METHOD: fetch
BACKUP CHECKSUM(s): 1
 ChkSum                             File      
 d318744e42819f76b137edf197a0b59b   base.tar  
TABLESPACE(s): 0
START WAL LOCATION: 00000001000000000000000D
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-03-09 18:03:00 CET
STOP TIME: 2017-03-09 18:02:59 CET
TOTAL DURATION: 0 sec(s)

Looks fine. I did not need to change anything in the configuration file and can just start using BART 2.0. So, now I have two backups:

postgres@edbbart:/home/postgres/ [pg950] bart show-backups -s PG3
 SERVER NAME   BACKUP ID       BACKUP NAME            PARENT ID   BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                                                            
 pg3           1489078978132   PG3_2017-03-09T18:02   none        2017-03-09 18:02:59 CET   61.93 MB      16.00 MB      1           active  
 pg3           1489078376562   PG3_2017-03-09T17:52   none        2017-03-09 17:53:01 CET   61.93 MB      32.00 MB      2           active  

Time to do my first incremental backup. What do I need to do? First of all we need to disable WAL compression, as it is not supported with incremental backups:

postgres@edbbart:/home/postgres/ [pg950] cat /usr/edb-bart/etc/bart.cfg | grep -i COMPRE
wal_compression = disabled

The next thing to do is obviously to enable incremental backups:

postgres@edbbart:/home/postgres/ [pg950] cat /usr/edb-bart/etc/bart.cfg | grep increme
allow_incremental_backups = enabled

My complete BART configuration then looks like this:

[BART]
bart-host = postgres@edbbart
backup_path = /u90/pgdata/backup
pg_basebackup_path = /usr/edb/as9.6/bin/pg_basebackup
xlog-method = fetch
retention_policy = 3 MONTHS
logfile = /var/log/bart.logs

[PG3]
host = 192.168.22.37
port = 4445
user = backupuser
backup-name = PG3_%year-%month-%dayT%hour:%minute
remote-host = postgres@192.168.22.37
description = "postgres PG3"
wal_compression = disabled
allow_incremental_backups = enabled
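If you prefer to script those two changes rather than editing the file by hand, a minimal sketch could look like this (assuming bart.cfg lives under /usr/edb-bart/etc as above and that both parameters are already present in the server section; adjust to your setup):

CFG=/usr/edb-bart/etc/bart.cfg
sudo sed -i 's/^wal_compression *=.*/wal_compression = disabled/' "$CFG"
sudo sed -i 's/^allow_incremental_backups *=.*/allow_incremental_backups = enabled/' "$CFG"
grep -E 'wal_compression|allow_incremental_backups' "$CFG"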

Make sure that the show-servers command displays what you expect:

postgres@edbbart:/home/postgres/ [pg950] bart show-servers
SERVER NAME         : pg3
BACKUP FRIENDLY NAME: PG3_%year-%month-%dayT%hour:%minute
HOST NAME           : 192.168.22.37
USER NAME           : backupuser
PORT                : 4445
REMOTE HOST         : postgres@192.168.22.37
RETENTION POLICY    : 2016-12-10 16:53:05 CET
DISK UTILIZATION    : 283.86 MB
NUMBER OF ARCHIVES  : 10
ARCHIVE PATH        : /u90/pgdata/backup/pg3/archived_wals
ARCHIVE COMMAND     : scp %p postgres@edbbart:/u90/pgdata/backup/pg3/archived_wals/%f
XLOG METHOD         : fetch
WAL COMPRESSION     : disabled
TABLESPACE PATH(s)  : 
INCREMENTAL BACKUP  : ENABLED
DESCRIPTION         : "postgres PG3"

With BART 2.0 there is a new component called the “WAL scanner”. The binary is at the same location as BART itself:

postgres@edbbart:/home/postgres/ [pg950] ls -la /usr/edb/bart2.0/bin/bart-scanner 
-rwxr-xr-x. 1 root root 603832 Feb 21 10:01 /usr/edb/bart2.0/bin/bart-scanner

What does it do? The WAL scanner (as the name implies) scans the WALs that are archived from your PostgreSQL instances to the BART host for changes and then writes a so-called “modified block map (MBM)” (this is the reason why WAL compression is not supported). As this should happen as soon as the WALs arrive on the BART host, the WAL scanner should run all the time. To start it in daemon mode (make sure the BART bin directory is included in your PATH):

postgres@edbbart:/home/postgres/ [pg950] /usr/edb-bart/bin/bart-scanner --daemon
postgres@edbbart:/home/postgres/ [pg950] ps -ef | grep scanner
postgres  2473     1  0 16:58 ?        00:00:00 /usr/edb-bart/bin/bart-scanner --daemon
postgres  2474  2473  0 16:58 ?        00:00:00 /usr/edb-bart/bin/bart-scanner --daemon
postgres  2476  2185  0 16:58 pts/0    00:00:00 grep --color=auto scanner

This will fork one wal scanner process for each PostgreSQL instance configured for incremental backups.
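Started like this, the scanner will not survive a reboot of the BART host. One simple way to take care of that (just a sketch, not something from the BART documentation) is a cron @reboot entry for the postgres user:

( crontab -l 2>/dev/null ; echo '@reboot /usr/edb-bart/bin/bart-scanner --daemon' ) | crontab -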

Now we need a new full backup (the scanner was not running when we did the previous backups and the previous WAL files were compressed, so they cannot be used):

postgres@edbbart:/home/postgres/ [pg950] bart backup -s PG3 --backup-name full0
INFO:  creating backup for server 'pg3'
INFO:  backup identifier: '1489161554590'
63416/63416 kB (100%), 1/1 tablespace

INFO:  backup completed successfully
WARNING: log_timezone is not set in the server, using the local timezone information
INFO:  backup checksum: f1c917edd0734c155ddace77bfbc3a17 of base.tar
INFO:  
BACKUP DETAILS:
BACKUP STATUS: active
BACKUP IDENTIFIER: 1489161554590
BACKUP NAME: full0
BACKUP PARENT: none
BACKUP LOCATION: /u90/pgdata/backup/pg3/1489161554590
BACKUP SIZE: 61.93 MB
BACKUP FORMAT: tar
XLOG METHOD: fetch
BACKUP CHECKSUM(s): 1
 ChkSum                             File      
 f1c917edd0734c155ddace77bfbc3a17   base.tar  

TABLESPACE(s): 0
START WAL LOCATION: 00000001000000000000000F
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-03-10 16:59:14 CET
STOP TIME: 2017-03-10 16:59:18 CET
TOTAL DURATION: 4 sec(s)

postgres@edbbart:/home/postgres/ [pg950] bart show-backups -s pg3
 SERVER NAME   BACKUP ID       BACKUP NAME            PARENT ID   BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                                                            
 pg3           1489161554590   full0                  none        2017-03-10 16:59:18 CET   61.93 MB      16.00 MB      1           active  
 pg3           1489078978132   PG3_2017-03-09T18:02   none        2017-03-09 18:02:59 CET   61.93 MB      32.00 MB      2           active  
 pg3           1489078376562   PG3_2017-03-09T17:52   none        2017-03-09 17:53:01 CET   61.93 MB      32.00 MB      2           active  

Let’s do some work on the PostgreSQL instance we are backing up to generate some WAL:

(postgres@[local]:4445) [postgres] > create table t1 ( a int );
CREATE TABLE
Time: 2.972 ms
(postgres@[local]:4445) [postgres] > insert into t1 (a) values (generate_series(1,1000000));
INSERT 0 1000000
Time: 512.806 ms
(postgres@[local]:4445) [postgres] > select * from pg_switch_xlog();
 pg_switch_xlog 
----------------
 0/10C19D98
(1 row)

Do the first incremental backup based on the full backup from above:

postgres@edbbart:/home/postgres/ [pg950] bart backup -s pg3 -F p --parent 1489161554590 --backup-name incr1
INFO:  creating incremental backup for server 'pg3'
INFO:  checking mbm files /u90/pgdata/backup/pg3/archived_wals
INFO:  new backup id generated 1489161760522
INFO:  reading directory /u90/pgdata/backup/pg3/archived_wals
INFO:  all files processed
WARNING: log_timezone is not set in the server, using the local timezone information
INFO:  server has been successfully backed up

Now we can do another incremental backup, this time based on the incremental backup taken above instead of the full backup:

postgres@edbbart:/home/postgres/ [pg950] bart BACKUP -s pg3 -F p --parent 1489161760522 --backup-name incr2
INFO:  creating incremental backup for server 'pg3'
INFO:  checking mbm files /u90/pgdata/backup/pg3/archived_wals
INFO:  new backup id generated 1489162048588
INFO:  reading directory /u90/pgdata/backup/pg3/archived_wals
INFO:  all files processed
WARNING: log_timezone is not set in the server, using the local timezone information
INFO:  server has been successfully backed up

So, what do we have now:

postgres@edbbart:/home/postgres/ [pg950] bart show-backups
 SERVER NAME   BACKUP ID       BACKUP NAME            PARENT ID       BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                                                                
 pg3           1489162048588   incr2                  1489161760522   2017-03-10 17:09:51 CET   45.98 MB                                active  
 pg3           1489161760522   incr1                  1489161554590   2017-03-10 17:06:10 CET   67.35 MB                                active  
 pg3           1489161554590   full0                  none            2017-03-10 16:59:18 CET   61.93 MB      112.00 MB     7           active  
 pg3           1489078978132   PG3_2017-03-09T18:02   none            2017-03-09 18:02:59 CET   61.93 MB      32.00 MB      2           active  
 pg3           1489078376562   PG3_2017-03-09T17:52   none            2017-03-09 17:53:01 CET   61.93 MB      32.00 MB      2           active  

Hmm, this does not really look like an improvement. The first incremental backup is even larger than the full backup it is based on. The second one is a bit smaller, but as I did not change anything on the source database between the two incremental backups my expectation was that at least the second incremental backup would use far less space. Let’s check it on disk:

postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] pwd
/u90/pgdata/backup/pg3
postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] du -sh 1489161554590
62M	1489161554590
postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] du -sh 1489161760522
68M	1489161760522
postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] du -sh 1489162048588
47M	1489162048588

At least this seems to be consistent. Let’s do another one:

postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] bart backup -s pg3 -F p --parent 1489162048588 --backup-name incr3
INFO:  creating incremental backup for server 'pg3'
INFO:  checking mbm files /u90/pgdata/backup/pg3/archived_wals
INFO:  new backup id generated 1489224698357
INFO:  reading directory /u90/pgdata/backup/pg3/archived_wals
INFO:  all files processed
WARNING: log_timezone is not set in the server, using the local timezone information

postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] bart show-backups
 SERVER NAME   BACKUP ID       BACKUP NAME            PARENT ID       BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                                                                
 pg3           1489224698357   incr3                  1489162048588   2017-03-11 10:31:41 CET   16.58 MB                                active  
 pg3           1489162048588   incr2                  1489161760522   2017-03-10 17:09:51 CET   45.98 MB                                active  
 pg3           1489161760522   incr1                  1489161554590   2017-03-10 17:06:10 CET   67.35 MB                                active  
 pg3           1489161554590   full0                  none            2017-03-10 16:59:18 CET   61.93 MB      160.00 MB     10          active  
 pg3           1489078978132   PG3_2017-03-09T18:02   none            2017-03-09 18:02:59 CET   61.93 MB      32.00 MB      2           active  
 pg3           1489078376562   PG3_2017-03-09T17:52   none            2017-03-09 17:53:01 CET   61.93 MB      32.00 MB      2           active  

Ok, now we can see a real improvement (I am not sure why there is no improvement for the first ones, this needs more testing). Restores should work as well (I’ll restore the last incremental backup):

postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart restore -s pg3 -i incr3 -p /var/tmp/restore_test/  -r postgres@localhost
INFO:  restoring incremental backup 'incr3' of server 'pg3'
INFO:  base backup restored
ERROR: failed to overlay modified blocks
command failed with exit code 127
bash: bart: command not found

Hm, what’s that? Why does bart not find bart? Can I restore the full backup?

postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart restore -s pg3 -i full0 -p /var/tmp/restore_test/  -r postgres@localhost
INFO:  restoring backup 'full0' of server 'pg3'
INFO:  base backup restored
INFO:  archiving is disabled
postgres@edbbart:/home/postgres/ [pg950] ls /var/tmp/restore_test/
backup_label  global        pg_dynshmem    pg_log        pg_notify    pg_snapshots  pg_subtrans  PG_VERSION            postgresql.conf
base          pg_clog       pg_hba.conf    pg_logical    pg_replslot  pg_stat       pg_tblspc    pg_xlog               tablespace_map
dbms_pipe     pg_commit_ts  pg_ident.conf  pg_multixact  pg_serial    pg_stat_tmp   pg_twophase  postgresql.auto.conf

Looks fine. What about the first incremental backup?

postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart restore -s pg3 -i incr1 -p /var/tmp/restore_test/  -r postgres@localhost
INFO:  restoring incremental backup 'incr1' of server 'pg3'
INFO:  base backup restored
ERROR: failed to overlay modified blocks
command failed with exit code 127
bash: bart: command not found

No. Time for the "--debug" mode:

postgres@edbbart:/home/postgres/ [pg950] rm -rf /var/tmp/restore_test/*
postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart --debug restore -s pg3 -i incr1 -p /var/tmp/restore_test/  -r postgres@localhost
DEBUG: Server: Global, Now: 2017-03-13 12:12:24 CET, RetentionWindow: 7776000 (secs) ==> 2160 hour(s)
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost exit
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost test -d /var/tmp/restore_test && echo "exists"
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost touch /var/tmp/restore_test/tmp-incr1 && echo "exists"
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost rm -f /var/tmp/restore_test/tmp-incr1
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost ls -A /var/tmp/restore_test
INFO:  restoring incremental backup 'incr1' of server 'pg3'
DEBUG: restoring backup: 1489161554590
DEBUG: restoring backup to /var/tmp/restore_test
DEBUG: restore command: cat /u90/pgdata/backup/pg3/1489161554590/base.tar | ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost " tar -C /var/tmp/restore_test -xf - "
DEBUG: Exec Command: cat /u90/pgdata/backup/pg3/1489161554590/base.tar | ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost " tar -C /var/tmp/restore_test -xf - "
INFO:  base backup restored
DEBUG: backup '1489161554590' restored to '/var/tmp/restore_test'
DEBUG: restoring backup: 1489161760522
DEBUG: Exec Command: cd /u90/pgdata/backup/pg3/1489161760522/base && tar -cf - * | ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost " tar -C /var/tmp/restore_test -xf - "
DEBUG: Exec Command: cd . && scp -o BatchMode=yes -o PasswordAuthentication=no  -r /u90/pgdata/backup/pg3/1489161760522/base/../1489161760522.cbm postgres@localhost:/var/tmp/restore_test
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost "bart --debug apply-incremental --source /var/tmp/restore_test --tsmap '' --backup 1489161760522"
ERROR: failed to overlay modified blocks
command failed with exit code 127
bash: bart: command not found

The command that does not seem to work is this one:

DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost "bart --debug apply-incremental --source /var/tmp/restore_test --tsmap '' --backup 1489161760522"

Let’s do that manually:

postgres@edbbart:/home/postgres/ [pg950] bart --debug apply-incremental --source /var/tmp/restore_test --tsmap '' --backup 1489161760522
DEBUG: loading '/var/tmp/restore_test/1489161760522.cbm' MBM/CBM file 
DEBUG: mbm chksum: old f60a435d4d3709302e5b7acc3f8d8ecb, new f60a435d4d3709302e5b7acc3f8d8ecb
DEBUG: applying incremental 1489161760522 (pid 3686)
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/1247.blk to /var/tmp/restore_test/base/15184/1247
src size: 16384, dst size: 139264
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/1247.blk, to /var/tmp/restore_test/base/15184/1247
src size: 16384, dst size: 139264
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/1249.blk to /var/tmp/restore_test/base/15184/1249
src size: 8192, dst size: 753664
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/1249.blk, to /var/tmp/restore_test/base/15184/1249
src size: 8192, dst size: 753664
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/1259.blk to /var/tmp/restore_test/base/15184/1259
src size: 8192, dst size: 139264
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/1259.blk, to /var/tmp/restore_test/base/15184/1259
src size: 8192, dst size: 139264
DEBUG: copying relation node files from (src): /var/tmp/restore_test/base/15184/16386.all, to (dst): /var/tmp/restore_test/base/15184/16386
src size: 36249600, dst size: 0
DEBUG: Exec Command: cp --preserve /var/tmp/restore_test/base/15184/16386.all /var/tmp/restore_test/base/15184/16386
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2608.blk to /var/tmp/restore_test/base/15184/2608
src size: 16384, dst size: 663552
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2608.blk, to /var/tmp/restore_test/base/15184/2608
src size: 16384, dst size: 663552
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2658.blk to /var/tmp/restore_test/base/15184/2658
src size: 8192, dst size: 204800
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2658.blk, to /var/tmp/restore_test/base/15184/2658
src size: 8192, dst size: 204800
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2659.blk to /var/tmp/restore_test/base/15184/2659
src size: 8192, dst size: 139264
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2659.blk, to /var/tmp/restore_test/base/15184/2659
src size: 8192, dst size: 139264
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2662.blk to /var/tmp/restore_test/base/15184/2662
src size: 8192, dst size: 32768
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2662.blk, to /var/tmp/restore_test/base/15184/2662
src size: 8192, dst size: 32768
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2663.blk to /var/tmp/restore_test/base/15184/2663
src size: 8192, dst size: 57344
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2663.blk, to /var/tmp/restore_test/base/15184/2663
src size: 8192, dst size: 57344
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2673.blk to /var/tmp/restore_test/base/15184/2673
src size: 16384, dst size: 540672
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2673.blk, to /var/tmp/restore_test/base/15184/2673
src size: 16384, dst size: 540672
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2674.blk to /var/tmp/restore_test/base/15184/2674
src size: 24576, dst size: 557056
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2674.blk, to /var/tmp/restore_test/base/15184/2674
src size: 24576, dst size: 557056
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2703.blk to /var/tmp/restore_test/base/15184/2703
src size: 8192, dst size: 40960
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2703.blk, to /var/tmp/restore_test/base/15184/2703
src size: 8192, dst size: 40960
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2704.blk to /var/tmp/restore_test/base/15184/2704
src size: 16384, dst size: 57344
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2704.blk, to /var/tmp/restore_test/base/15184/2704
src size: 16384, dst size: 57344
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/3455.blk to /var/tmp/restore_test/base/15184/3455
src size: 8192, dst size: 49152
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/3455.blk, to /var/tmp/restore_test/base/15184/3455
src size: 8192, dst size: 49152

.. and that works. So, next test:

postgres@edbbart:/home/postgres/ [pg950] ssh postgres@localhost "bart -version"
bash: bart: command not found
postgres@edbbart:/home/postgres/ [pg950] ssh postgres@localhost "which bart"
which: no bart in (/usr/local/bin:/usr/bin)

Here we have the issue: we do not get a login shell for these types of ssh commands, so the bart binary is not in the PATH. Let’s fix that:

postgres@edbbart:/home/postgres/ [pg950] echo "PATH=\$PATH:/usr/edb-bart/bin/" >> ~/.bashrc
postgres@edbbart:/home/postgres/ [pg950] echo "export PATH" >> ~/.bashrc
postgres@edbbart:/home/postgres/ [pg950] ssh postgres@localhost "which bart"
/usr/edb-bart/bin/bart

Try again:

postgres@edbbart:/home/postgres/ [pg950] rm -rf /var/tmp/restore_test/*
postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart restore -s pg3 -i incr3 -p /var/tmp/restore_test/  -r postgres@localhost
INFO:  restoring incremental backup 'incr3' of server 'pg3'
INFO:  base backup restored
INFO:  archiving is disabled

… and it works. But this means that you have to install BART on all the hosts running a PostgreSQL instance if you want to restore to the same host where the instance is running. I am not sure I really like that (or I completely missed something) …
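In practice this means repeating on each database host the two steps we did above on the BART server; a minimal sketch (package name and paths as used earlier in this post, adjust them to your installation):

sudo yum install edb-bart20
echo 'PATH=$PATH:/usr/edb/bart2.0/bin/' >> ~/.bashrc
echo 'export PATH' >> ~/.bashrc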

 

The article EDB BART 2.0 – How to upgrade and block level incremental backups first appeared on Blog dbi services.

Google Cloud Platform instances and Oracle database performance

Sun, 2017-03-12 13:00

When it comes to choosing a cloud instance to run Oracle Database, you want to be able to run your workload on the minimum number of CPU cores. This is why in a previous post I measured how many logical reads per second can be achieved with a SLOB workload on AWS, which is often the first cloud considered, and I will probably do the same on Azure in the future. I did the same on the Oracle Cloud, which is the only one where Oracle makes it easy to run and license the Oracle Database.
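For context, a cached SLOB run like the ones below boils down to something like the following sketch (assuming a SLOB 2.x kit that is already installed and loaded; the parameters are the ones used later in this post, everything else is left at its default, and the exact runit.sh arguments may differ between SLOB versions):

# relevant slob.conf parameters
#   UPDATE_PCT=0  RUN_TIME=300  SCALE=80M  WORK_UNIT=64
cd ~/SLOB
./runit.sh 1   # run with 1 session, then repeat with 2, 3, 4, ... sessions
# the logical reads per second come from the "Load Profile" section of the AWR report produced by the run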

Today, I’ve seen that the Google Cloud offers a 1 year trial (1 year instead of 3 months) with $300 of free credits, where you are sure that your credit card will not be debited before you accept it. Of course I wanted to play with it and tested the IaaS instance types with the same method: SLOB.

Licencing No-Go

I said ‘play with it’ because you cannot consider Google Cloud as a platform for Oracle Database. Not because of the Google Cloud: it is in the top 3 cloud vendors with AWS and Azure. But just because Oracle doesn’t want it:

  • It is not an ‘authorized cloud’ where Oracle accepts the virtual cores as a metric
  • It is not running with an hypervisor where Oracle accepts the virtual cores as a metric
  • It is not the Oracle Cloud where Oracle accepts to count the virtual cores, and even apply the core factor

So, the sad truth is that if you want to run an Oracle Database on the Google Cloud, you may have to pay Oracle licenses to cover the whole physical infrastructure of the Google data center… This is clearly a No-Go for processor metric licenses. You may think about NUP+ licensing where the metric is not the processors but the number of users. This was possible for Standard Edition in 11g (and for 12.1.0.1, but this is out of support now). But with 12.1.0.2 they changed the Standard Edition rules and, even when the metric is the number of users, you have to count the number of servers. This is again a No-Go for a public cloud environment.

So let’s play in the hope that one day the rules will change. For the moment, they think that this strategy will push the current Oracle Database users to the Oracle Cloud. They may realize one day that it increases the counterflow of users going to Open Source databases to run away from those arbitrary licensing rules.

Machine types

There are 3 types of processors available for Europe (data center in Brussels): Sandy Bridge, Ivy Bridge and Haswell. The regions are detailed in the regions-zones documentation and there is more detail in machine_types.

Ivy Bridge

Here I chose europe-west1-c and created a VM running RHEL7 on 4 vCPUs of the Ivy Bridge type:

[Screenshot: creating the 4 vCPU Ivy Bridge VM]

Note that the price is the same for all kinds of processors within the same region.

SLOB

Here is cached SLOB (UPDATE_PCT=0 RUN_TIME=300 SCALE=80M WORK_UNIT=64)

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 30.3 0.00 4.74
DB CPU(s): 1.0 29.4 0.00 4.59
Redo size (bytes): 14,254.9 433,831.2
Logical read (blocks): 674,052.6 20,513,983.3

This is 674 kLIOPS per vCPU.

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 25.0 0.00 7.24
DB CPU(s): 2.0 24.6 0.00 7.11
Logical read (blocks): 1,377,553.4 17,267,173.0

This is 689 kLIOPS per vCPU. We are running on two cores.

Now, let’s see if those 4 vCPUs are threads or cores.

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 3.0 1.6 0.00 12.10
DB CPU(s): 2.9 1.6 0.00 11.93
Logical read (blocks): 1,493,775.4 815,084.0

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 54.6 0.00 13.80
DB CPU(s): 4.0 54.0 0.00 13.64
Logical read (blocks): 1,780,041.3 24,329,604.1

This is 498 kLIOPS per vCPU with 3 threads and 445 kLIOPS per vCPU with 4 threads. We are obviously running on 2 hyper-threaded cores, where hyper-threading here gives only 30% additional LIOPS.
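(As a quick check of those numbers from the load profiles above: 1,493,775 / 3 ≈ 498,000 and 1,780,041 / 4 ≈ 445,000 logical reads per second and per vCPU, and 1,780,041 / 1,377,553 ≈ 1.29, hence the roughly 30% gain from hyper-threading.)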

When going further, the performance will decrease:

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 5.0 48.5 0.00 15.81
DB CPU(s): 3.0 29.0 0.00 9.46
Logical read (blocks): 1,460,029.2 14,233,024.3

It is interesting to see that when running 5 sessions on 4 vCPUs, only 3 threads are actually used on average.

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 6.0 72.1 0.00 16.24
DB CPU(s): 3.0 36.5 0.00 8.22
Logical read (blocks): 1,458,749.6 17,651,628.4

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 7.0 72.5 0.00 20.42
DB CPU(s): 3.0 31.5 0.00 8.86
Logical read (blocks): 1,445,423.3 15,073,622.1

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 8.0 82.9 0.00 18.93
DB CPU(s): 3.0 31.4 0.00 7.17
Logical read (blocks): 1,436,355.2 14,986,038.9

Those 8 runs are visible on the CPU usage graph of the VM dashboard, where the maximum CPU usage is reached when running 4 sessions on those 4 threads.

[Screenshot: CPU usage graph of the 8 runs from the VM dashboard]

Here is the CPU description from lscpu and /proc/cpuinfo, where we can see that we are virtualized with KVM:

[Screenshot: lscpu and /proc/cpuinfo output in the browser ssh window]

For tty output I usually prefer to paste the text rather than a screenshot, but I wanted to show the ssh screen that you get in your browser with a simple click. This is very convenient. We are really talking about easy and fast provisioning here.
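For reference, the commands behind that screenshot are simply the following (a sketch; the output will of course vary with the machine type):

lscpu | grep -E 'Model name|Thread|Core|Socket|Hypervisor'
grep -m1 'model name' /proc/cpuinfo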

Haswell

Just to compare, here is the same running on the Haswell machine type.

[Screenshot: creating the Haswell VM]

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 25.4 0.00 5.53
DB CPU(s): 1.0 24.4 0.00 5.32
Logical read (blocks): 598,274.2 15,275,586.5
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 27.3 0.00 8.46
DB CPU(s): 2.0 26.8 0.00 8.29
Logical read (blocks): 1,155,681.9 15,814,666.6
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 3.0 34.7 0.00 12.34
DB CPU(s): 3.0 34.2 0.00 12.18
Logical read (blocks): 1,300,659.1 15,052,978.1
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 54.6 0.00 13.80
DB CPU(s): 3.9 53.8 0.00 13.61
Logical read (blocks): 1,541,843.0 21,098,158.6

Sandy Bridge

Here is the same running on the Sandy Bridge machine type.

[Screenshot: creating the Sandy Bridge VM]


Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 25.2 0.00 2.16
DB CPU(s): 1.0 25.0 0.00 2.14
Logical read (blocks): 668,393.8 16,935,651.8
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 23.1 0.00 11.13
DB CPU(s): 2.0 22.6 0.00 10.86
Logical read (blocks): 1,204,487.9 13,938,751.4
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 3.0 37.5 0.00 13.65
DB CPU(s): 3.0 36.9 0.00 13.43
Logical read (blocks): 1,383,602.0 17,334,975.8
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 50.0 0.00 16.23
DB CPU(s): 3.9 49.3 0.00 15.98
Logical read (blocks): 1,656,216.2 20,782,477.0

So what?

Whether you like it or not, you cannot run Oracle Database on the Google Cloud because the Oracle licensing rules are set to make life difficult for running Oracle Database outside of the Oracle Cloud. For performance, it is still the Oracle IaaS that shows the best results for this test. But of course, there are a lot of other points to consider. You don’t run only the database; the application should be located in the same data center.

The Google Cloud Platform looks good and I’ll probably use my Google Cloud trial to test Spanner, and maybe the new PostgreSQL service. I love the Oracle Database technology. But customers going to a public cloud will tend to migrate to database systems which can run on any virtualized environment without certification, support and licensing risks.

 

The article Google Cloud Platform instances and Oracle database performance first appeared on Blog dbi services.

Is it the right time to move to MariaDB now?

Sat, 2017-03-11 03:06

Do you think about adding MariaDB to your database landscape, or even about replacing other database systems with MariaDB? Then you should register here. We will be doing a webinar with MariaDB on Thursday, the 23rd of March. The title is “Why it is a good time to move to MariaDB now” and, after a short introduction of dbi services and what we do in the open source area, Bruno Šimić (Sales Engineer, MariaDB Corporation) will highlight why the time to do so is now.

[Image: MariaDB webinar announcement]

Hope to see you registered.

 

The article Is it the right time to move to MariaDB now? first appeared on Blog dbi services.

12cR2: TNS_ADMIN in env.ora

Fri, 2017-03-10 04:37

The network files (sqlnet.ora, tnsnames.ora, listener.ora) are read by default from ORACLE_HOME/network/admin, but you may have several Oracle installations and want only one place for those files. Then you can use the TNS_ADMIN environment variable. But are you sure that it is always set when starting the database? The listener? The client? They must be consistent (see https://blog.dbi-services.com/oracle-12cr2-plsql-new-feature-tnsping-from-the-database/). So what do we do for the cases where TNS_ADMIN is not set? Define symbolic links from the ORACLE_HOME to the common location. It would be better to just change the default location, and this is what can be done in 12.2 with env.ora.
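For completeness, the pre-12.2 symbolic link workaround mentioned above looks roughly like this (a sketch; the common location /u01/app/oracle/network/admin is only an example):

TNS_COMMON=/u01/app/oracle/network/admin
for f in sqlnet.ora tnsnames.ora listener.ora; do
  ln -s $TNS_COMMON/$f $ORACLE_HOME/network/admin/$f
done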

By default, the $ORACLE_HOME/env.ora is empty. There are only comments.

If you run any Oracle 12.2 client, the $ORACLE_HOME/env.ora will be read. If nothing is set there, then the default $ORACLE_HOME/network/admin location is read.

$ORACLE_HOME/network/admin

[oracle@VM104 tmp]$ strace -e trace=file tnsping localhost 2>&1 | grep -E "[.]ora"
lstat("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", {st_mode=S_IFREG|0644, st_size=852, ...}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", O_RDONLY) = 3
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/sqlnet.ora", F_OK) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/sqlnet.ora", O_RDONLY) = 3
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/sqlnet.ora", F_OK) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/sqlnet.ora", O_RDONLY) = 3
access("/home/oracle/.tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)

Here I have a sqlnet.ora but no tnsnames.ora, so the next locations that are searched are ~/.tnsnames.ora and /etc/tnsnames.ora.

TNS_ADMIN environment variable

If I set the environment variable TNS_ADMIN to /tmp then

[oracle@VM104 tmp]$ TNS_ADMIN=/tmp strace -e trace=file tnsping localhost 2>&1 | grep -E "[.]ora"
lstat("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", {st_mode=S_IFREG|0644, st_size=852, ...}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", O_RDONLY) = 3
access("/tmp/sqlnet.ora", F_OK) = 0
open("/tmp/sqlnet.ora", O_RDONLY) = 3
access("/tmp/sqlnet.ora", F_OK) = 0
open("/tmp/sqlnet.ora", O_RDONLY) = 3
access("/home/oracle/.tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/tmp/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)

The directory defined in TNS_ADMIN is searched first

TNS_ADMIN in env.ora

I have added the TNS_ADMIN=/tmp in the env.ora:

[oracle@VM104 tmp]$ tail -3 $ORACLE_HOME/env.ora
# Default: $ORACLE_HOME/network/admin
#
TNS_ADMIN=/tmp

When I run tnsping without setting any environment variable, I have exactly the same as before:


[oracle@VM104 tmp]$ strace -e trace=file tnsping localhost 2>&1 | grep -E "[.]ora"
lstat("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", {st_mode=S_IFREG|0644, st_size=867, ...}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", O_RDONLY) = 3
access("/tmp/sqlnet.ora", F_OK) = 0
open("/tmp/sqlnet.ora", O_RDONLY) = 3
access("/tmp/sqlnet.ora", F_OK) = 0
open("/tmp/sqlnet.ora", O_RDONLY) = 3
access("/home/oracle/.tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/tmp/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)

The good thing about it is that the setting is centralized for all binaries running from this ORACLE_HOME.

Both

However, the setting in the environment has priority over the one in env.ora:

[oracle@VM104 tmp]$ TNS_ADMIN=/var/tmp strace -e trace=file tnsping localhost 2>&1 | grep -E "[.]ora"
lstat("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", {st_mode=S_IFREG|0644, st_size=867, ...}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", O_RDONLY) = 3
access("/var/tmp/sqlnet.ora", F_OK) = 0
open("/var/tmp/sqlnet.ora", O_RDONLY) = 3
access("/var/tmp/sqlnet.ora", F_OK) = 0
open("/var/tmp/sqlnet.ora", O_RDONLY) = 3
access("/home/oracle/.tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/var/tmp/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)

So the recommendation, if you want to use env.ora, is not to set TNS_ADMIN, especially when starting the listener or the database, to be sure that the same environment is always used. Final note: I’ve not seen this in the documentation, so if you rely on it for a critical environment, better validate with support.

 

The article 12cR2: TNS_ADMIN in env.ora first appeared on Blog dbi services.

Oracle 12cR2, RAC, Cloud, ACFS, PDB thin clones and asmadmin

Thu, 2017-03-09 15:21

In the Oracle Public Cloud, fast provisioning gets all its meaning when creating a RAC database service: in one hour you can get an operational highly available multitenant database. You can even create it in Data Guard for Disaster Recovery. Now, Oracle is pushing ACFS to store the datafiles rather than direct ASM. Especially in multitenant because a great feature is thin cloning: CREATE PLUGGABLE DATABASE AS SNAPSHOT COPY. However, I encountered an error when I tried it for the first time.

TDE keystore

SQL> create pluggable database pdb2 from pdb1 snapshot copy;
create pluggable database pdb2 from pdb1 snapshot copy
*
ERROR at line 1:
ORA-28357: password required to open the wallet

Oh yes, in the cloud all tablespaces are encrypted. In 12.2 we can put the keystore password in the command:

ORA-17517

SQL> create pluggable database pdb2 from pdb1 snapshot copy keystore identified by "Ach1z0#d" ;
 
create pluggable database pdb2 from pdb1 snapshot copy keystore identified by "Ach1z0#d"
*
ERROR at line 1:
ORA-65169: error encountered while attempting to copy file
/u02/app/oracle/oradata/CDB1/49FC0C40FCF62C05E053BAF6C40A9DD6/datafile/o1_mf_users_dcr220sd_.dbf
ORA-17517: Database cloning using storage snapshot failed on file
8:/u02/app/oracle/oradata/CDB1/49FC0C40FCF62C05E053BAF6C40A9DD6/datafile/o1_mf_users_dcr220sd_.dbf

Here we are. The call to the storage snapshot feature has failed. Usually errors coming from OS calls are accompanied by additional information, but not here.

alert.log and trace

In alert.log, the error is displayed with reference to some other trace files:

2017-03-05 16:24:38.935000 +00:00
create pluggable database pdb2 from pdb1 snapshot copy keystore identified by *
AUDSYS.AUD$UNIFIED (SQL_TEXT) - CLOB populated
WARNING: Detected that PDB needs to import keys from source. PDB can only open in restricted mode until import.
2017-03-05 16:24:40.447000 +00:00
Errors in file /u01/app/oracle/diag/rdbms/cdb1/cdb11/trace/cdb11_p000_8910.trc:
Errors in file /u01/app/oracle/diag/rdbms/cdb1/cdb11/trace/cdb11_p002_8918.trc:
Errors in file /u01/app/oracle/diag/rdbms/cdb1/cdb11/trace/cdb11_p001_8914.trc:
Errors in file /u01/app/oracle/diag/rdbms/cdb1/cdb11/trace/cdb11_p003_8922.trc:
**************************************************************
Undo Create of Pluggable Database PDB2 with pdb id - 4.
**************************************************************
ORA-65169 signalled during: create pluggable database pdb2 from pdb1 snapshot copy keystore identified by * ...

And those trace files have the following information:
ksfdsscre_clone: create snapshot failed error(-1) errmsg(OS dependent failure) voltag(49FF372094256196E053BAF6C40AEB9D) parent_voltag() mntbuf(/u02)

This is not very helpful by itself. We see the snapshot name (voltag) and the parent name (parent_voltag). The underlying error turns out to be EPERM, which is ‘operation not permitted’. What I did to be sure was to try to create the snapshot myself:

[oracle@rac1 cdb11]$ acfsutil snap create -w 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap create: CLSU-00100: operating system function: ioctl failed with error data: 1
acfsutil snap create: CLSU-00101: operating system error message: Operation not permitted
acfsutil snap create: CLSU-00103: error location: OI_0
acfsutil snap create: ACFS-03046: unable to perform snapshot operation on /u02

EPERM

This is clearer, and I also strace’d it to see where the error comes from:

open("/u02", O_RDONLY) = 41
ioctl(41, RTC_UIE_ON, 0x7fff17ae17a0) = 0
ioctl(41, 0xffffffffc1287021, 0x7fff17ae0e90) = -1 EPERM (Operation not permitted)

I’m running that with the oracle user, as the instance does when creating a PDB:
uid=1001(oracle) gid=1001(oinstall) groups=1001(oinstall),1002(dba),1003(racoper),1004(asmdba)

grid

When connecting as grid, I am able to create the snapshot

[grid@rac1 ~]$ acfsutil snap create -w 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap create: Snapshot operation is complete.
[grid@rac1 ~]$ acfsutil snap delete 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap delete: Snapshot operation is complete.

Grid has the following permissions:
uid=1000(grid) gid=1001(oinstall) groups=1001(oinstall),1003(racoper),1004(asmdba),1005(asmoper),1006(asmadmin)

asmadmin

This is what /etc/group looks like:

opc:x:54323:
oinstall:x:1001:
dba:x:1002:oracle
racoper:x:1003:oracle,grid
asmdba:x:1004:oracle,grid
asmoper:x:1005:grid
asmadmin:x:1006:grid

This is what the Oracle Public Cloud defines at RAC DBaaS service creation, and asmadmin is not mentioned in the documentation.

So, to solve (or work around) the issue, I’ve added oracle to the asmadmin group:

asmadmin:x:1006:grid,oracle

and now, I’m able to create a snapshot when logging as oracle:

[oracle@rac1 ~]$ acfsutil snap create -w 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap create: Snapshot operation is complete.
[oracle@rac1 ~]$ acfsutil snap delete 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap delete: Snapshot operation is complete.
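Instead of editing /etc/group by hand, the same change can be done with usermod; a minimal sketch, to be run as root:

usermod -a -G asmadmin oracle
id oracle   # verify that asmadmin is now listed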

restart

I thought that restarting the instance would be sufficient, but it is not. I had to restart the cluster. And this is also something easy in the Oracle Public Cloud:

[Screenshot: restarting the RAC nodes from the Oracle Public Cloud console]

A simple click restarts the first node and then, once it is up again, restarts the second node. A rolling reboot ensures that the service is always up.

Thin clone

Here it is. The instance is now able to create a snapshot.

Connected to:
Oracle Database 12c EE Extreme Perf Release 12.2.0.1.0 - 64bit Production
 
SQL> set echo on
SQL> create pluggable database pdb2 from pdb1 snapshot copy keystore identified by "Ach1z0#d" ;
Pluggable database created.
 
Elapsed: 00:00:30.36

So what?

In my opinion, the configurations that store CDB datafiles on ACFS should give the right to create snapshots to the user running the database. The cloud interface is very simple, but the technology behind it is complex. The consequence of this gap is that using the cloud is easy when everything goes as expected, but any exception brings us into troubleshooting.

 

The article Oracle 12cR2, RAC, Cloud, ACFS, PDB thin clones and asmadmin first appeared on Blog dbi services.

Oracle 12cR2: changes for login.sql

Tue, 2017-03-07 15:58

If you use a login.sql script to set the SQL*Plus environment from your current working directory, you will see that it does not run anymore in 12.2. This is a security feature, and a good occasion to explain how sqlplus finds the scripts to run, on Linux.

For my test I have login.sql, LOGIN.SQL and script.sql in the following directories:

$ tree /tmp/mytest/
/tmp/mytest/
├── a
│   ├── login.sql
│   ├── LOGIN.SQL
│   └── script.sql
├── b
│   ├── login.sql
│   ├── LOGIN.SQL
│   └── script.sql
├── login.sql
├── LOGIN.SQL
└── script.sql

I’m going to the parent directory
cd /tmp/mytest

The scripts display their name:

+ head login.sql LOGIN.SQL script.sql
==> login.sql <==
prompt Hello from /tmp/mytest/login.sql

==> LOGIN.SQL <==
prompt Hello from /tmp/mytest/LOGIN.SQL

==> script.sql <==
prompt Hello from /tmp/mytest/script.sql
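If you want to reproduce the layout, here is a minimal sketch (each script simply prompts its own path, as shown above):

mkdir -p /tmp/mytest/a /tmp/mytest/b
for d in /tmp/mytest /tmp/mytest/a /tmp/mytest/b; do
  for f in login.sql LOGIN.SQL script.sql; do
    echo "prompt Hello from $d/$f" > "$d/$f"
  done
done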

I’ll run commands from bash -x so that they are displayed, and environment variables are set only for the command to be run.

login.sql

+ sqlplus -s /nolog

Nothing is displayed here. This is what has changed in 12.2: for security reasons, the login.sql in the current working directory is not run anymore.

+ sqlplus -s /nolog @ login.sql

This is probably a side effect of the implementation of this new security feature: even when I explicitly want to run the login.sql script, it is ignored.

+ sqlplus -s /nolog @ login
Hello from /tmp/mytest/login.sql

Here, I rely on the implicit ‘.sql’ being added and the script is run. Probably the implementation of the security feature is done before this implicit extension.

+ sqlplus -s /nolog @ /tmp/mytest/login.sql
Hello from /tmp/mytest/login.sql

With full path, the script is run.

Actually, the only way to get the current directory login.sql run implicitly when starting sqlplus or connecting is to put the current directory in ORACLE_PATH:

+ ORACLE_PATH=.
+ sqlplus -s /nolog
Hello from /tmp/mytest/login.sql

Note that this defeats the security feature, in the same way it is not recommended to add ‘.’ to your shell PATH. It is better to put only absolute paths in the PATH, with directories where you know nobody can add a trojan script.

LOGIN.SQL

+ sqlplus -s /nolog @ LOGIN.SQL

The implementation of this new feature is case insensitive. LOGIN.SQL is ignored even when specified in the command line.

+ sqlplus -s /nolog @ ./LOGIN.SQL
Hello from /tmp/mytest/LOGIN.SQL

Only when the name is specified differently, with more or fewer characters (here ‘./LOGIN.SQL’), is it used.

Note that the implicit login.sql is case sensitive on Linux:
+ rm login.sql
+ ORACLE_PATH=.
+ sqlplus -s /nolog

Even with ORACLE_PATH it is not found.

SQLPATH

+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog

SQLPATH is not a solution to find login.sql, unlike ORACLE_PATH.

Note that the documentation at http://docs.oracle.com/database/122/SQPUG/configuring-SQL-Plus.htm#SQPUG012 says very different things.

script.sql

Now, because SQLPATH and ORACLE_PATH were already a mess, I’ll try with a script that is not login.sql.

+ sqlplus -s /nolog @ script.sql
Hello from /tmp/mytest/script.sql

Current directory is still searched for non-login scripts

+ sqlplus -s /nolog @ /tmp/mytest/script.sql
Hello from /tmp/mytest/script.sql

An absolute path can be used, or we can set a path to search:

+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/script.sql

Unlike login.sql, SQLPATH can be used to find a script in another directory

+ cd /tmp/mytest/a
+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/a/script.sql

But current directory is still the first one that is searched

+ rm script.sql
+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/script.sql

Only when the script is not in the current directory is it searched in SQLPATH.

+ rm /tmp/mytest/script.sql
+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ script
SP2-0310: unable to open file "script.sql"

Again, the documentation is wrong. Only the specified directories are looked up, not their sub-directories. But I can specify the subdirectory relative to SQLPATH (I am still in /tmp/mytest/a where there is no script.sql):

+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ b/script
Hello from /tmp/mytest/b/script.sql

The b/script was resolved from the SQLPATH=/tmp/mytest

In SQLPATH, we can add multiple paths

+ SQLPATH=/tmp/mytest:/tmp/mytest/a:/tmp/mytest/b
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/b/script.sql

Here I have a script.sql only in the ‘b’ sub-directory and sqlplus finds it when this directory is listed in SQLPATH

ORACLE_PATH

Running the same with ORACLE_PATH instead of SQLPATH

+ ORACLE_PATH=/tmp/mytest:/tmp/mytest/a:/tmp/mytest/b
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/a/login.sql
Hello from /tmp/mytest/b/script.sql

We can also have multiple paths in ORACLE_PATH (this is not in the documentation) and it acts like SQLPATH, but there are a few differences.

First, you have seen that the login.sql script is run.

Then, if I have the script in my current directory, but not in ORACLE_PATH

+ cd /tmp
+ echo 'prompt Hello from /tmp' > script.sql
+ ORACLE_PATH=/tmp/mytest:/tmp/mytest/a:/tmp/mytest/b
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/a/login.sql
Hello from /tmp/mytest/b/script.sql

The ORACLE_PATH one is used first

+ ORACLE_PATH=/tmp/mytest:/tmp/mytest/a
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/a/login.sql
Hello from /tmp

The current directory is considered only when the script is not found in ORACLE_PATH.

+ ORACLE_PATH=/tmp/mytest:/tmp/mytest/a
+ sqlplus -s /nolog @ b/script
Hello from /tmp/mytest/a/login.sql
Hello from /tmp/mytest/b/script.sql

As with SQLPATH, subdirectory is accessible if mentioned.

Both, in order

If you don’t want to keep it simple, you can specify both ORACLE_PATH and SQLPATH

+ cd mytest
+ ORACLE_PATH=/tmp
+ SQLPATH=/tmp/mytest/b
+ sqlplus -s /nolog @ script
Hello from /tmp

and ORACLE_PATH is read first. So the order is:

  1. ORACLE_PATH, in order of paths specified
  2. Current directory (except for login.sql)
  3. SQLPATH (except for login.sql) in order of paths specified
strace

Better than documentation or testing all combinations, in Linux we can trace the system calls when sqlplus is looking for the script.

I’ve set non-existing paths /ORACLE_PATH1 and /ORACLE_PATH2 for ORACLE_PATH, and /SQLPATH1 and /SQLPATH2 for SQLPATH, and run ‘script’ without the extension:

ORACLE_PATH=/ORACLE_PATH1:/ORACLE_PATH2 SQLPATH=/SQLPATH1:/SQLPATH2 strace -e trace=file sqlplus -s /nolog @ script

This traces all system calls with a file name:


access("/ORACLE_PATH1/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script.sql", F_OK) = -1 ENOENT (No such file or directory)
stat("script.sql", 0x7fff01921400) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script.sql", F_OK) = -1 ENOENT (No such file or directory)
stat("/SQLPATH1/script.sql", 0x7fff0191b430) = -1 ENOENT (No such file or directory)
stat("/SQLPATH2/script.sql", 0x7fff0191b430) = -1 ENOENT (No such file or directory)
SP2-0310: unable to open file "script.sql"
access("/ORACLE_PATH1/login.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/login.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/login.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/login.sql", F_OK) = -1 ENOENT (No such file or directory)
stat("/u01/app/oracle/product/12.2.0/dbhome_1/sqlplus/admin/glogin.sql", {st_mode=S_IFREG|0644, st_size=342, ...}) = 0
access("/u01/app/oracle/product/12.2.0/dbhome_1/sqlplus/admin/glogin.sql", F_OK) = 0
statfs("/u01/app/oracle/product/12.2.0/dbhome_1/sqlplus/admin/glogin.sql", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=6676009, f_bfree=2866104, f_bavail=2521221, f_files=1703936, f_ffree=1663469, f_fsid={-1731931108, 1057261682}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/sqlplus/admin/glogin.sql", O_RDONLY) = 9

This is very interesting. First we see that the paths are searched multiple times, and I don’t know why. Second, when passing a name without extension (i.e. without a dot in the name) the exact name is used first for ORACLE_PATH, but the lookup in the current directory and in SQLPATH automatically adds ‘.sql’. The system calls are also different: ORACLE_PATH has no stat() call before access(), which is different from the current directory and SQLPATH. Finally, login.sql is read from ORACLE_PATH only and glogin.sql from ORACLE_HOME/sqlplus/admin.

Change occurred between 12.2.0.1 and 12.2.0.1

As a comparison, sqlplus 12.1.0.2 and even the 12.2.0.1 DBaaS version (built in October) have the following additional calls to look for login.sql in the current path and in SQLPATH:
stat("login.sql", 0x7fffc14d5490) = -1 ENOENT (No such file or directory)
stat("/SQLPATH1/login.sql", 0x7fffc14cf4c0) = -1 ENOENT (No such file or directory)
stat("/SQLPATH2/login.sql", 0x7fffc14cf4c0) = -1 ENOENT (No such file or directory)

This has disappeared in the 12.2.0.1 on-premises version (built in January).

So what?

WTF login.sql is not run, even when explicitly started
Except when mentioning it as 'login' or as './login.sql' pic.twitter.com/ClfvAUizO6

— Franck Pachot (@FranckPachot) March 7, 2017

Big thanks to the SQL Developer team who gave me the solution approximately 3 seconds after my tweet.

This behavior changed and, as far as I know, it is not documented, and the MOS note about it is not published. It makes sense, for security reasons, to prevent running scripts from the current directory without explicitly allowing it. However, login.sql is often used for formatting only. It seems that SQLcl will implement this in a finer way, running only the formatting commands when it is picked up implicitly.
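As the tweet above mentions, the file can still be run explicitly from the SQL prompt by giving its name or a relative path (a trivial illustration, not taken from the original strace session):

SQL> @login
SQL> @./login.sql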

Be careful when moving between the Oracle Cloud and your premises because you don't run exactly the same version…

 

The article Oracle 12cR2: changes for login.sql appeared first on Blog dbi services.

Misleading wait event names clarified in V$EVENT_NAME

Mon, 2017-03-06 14:15

The Oracle wait event names were originally implemented for the Oracle RDBMS developers and are now used by database users to troubleshoot performance issues. The consequence is that the names may be misleading because they carry a meaning from the internal point of view. Here is some clarification about them.

In 12c the clarification is easy because we have a new DISPLAY_NAME column in the V$EVENT_NAME view:


SQL> select wait_class,name, display_name from v$event_name where display_name != name order by 1,2;
WAIT_CLASS     NAME                                DISPLAY_NAME
-------------- ----------------------------------- ----------------------------------------------
Administrative concurrent I/O completion           online move datafile IO completion
Administrative datafile copy range completion      online move datafile copy range completion
Administrative wait for possible quiesce finish    quiesce database completion
Commit         log file sync                       commit: log file sync
Configuration  log buffer space                    log buffer full - LGWR bottleneck
Idle           LGWR real time apply sync           standby apply advance notification
Other          DFS db file lock                    quiesce for datafile offline
Other          Image redo gen delay                redo resource management
Other          datafile move cleanup during resize online move datafile resize cleanup
System I/O     control file sequential read        control file read
System I/O     control file single write           control file write
System I/O     db file parallel write              db list of blocks write
System I/O     log file parallel write             log file redo write
System I/O     log file sequential read            log file multiblock read
System I/O     log file single write               log file header write
User I/O       db file parallel read               db list of blocks read
User I/O       db file scattered read              db multiblock read
User I/O       db file sequential read             db single block read
User I/O       db file single write                db single block write

We have long known the misleading 'db file sequential read', which is what we call 'random reads' from the storage point of view, and 'db file scattered read', which is what we call 'sequential reads' from the storage point of view. The DISPLAY_NAME clarifies everything: single block reads vs. multiblock reads.

'db file parallel read' is a batch of random reads, used by prefetching for example, which reads multiple non-contiguous blocks.
‘db file parallel write’ is similar, for DBWR to write a batch of blocks. The DISPLAY_NAME clarifies everything: ‘db list of blocks’.

‘log file parallel write’ is ‘parallel’ only because you can have multiplexed files. DISPLAY_NAME is less misleading with ‘log file redo write’.
The ‘log buffer space’ has a DISPLAY_NAME that is more focused on the cause: ‘log buffer full – LGWR bottleneck’

You can look at the others where DISPLAY_NAME is very clear about the operation: 'online move' for some operations on files, 'commit' for the well-known log file sync…
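As a hypothetical illustration (not part of the original post), the display names can also be joined to the cumulative wait statistics for a more readable overview:

select n.wait_class, n.display_name, e.total_waits,
       round(e.time_waited_micro/1e6) as seconds_waited
from   v$system_event e
join   v$event_name n on n.name = e.event
where  n.display_name != n.name
order by e.time_waited_micro desc;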

Of course they are also described in the Database Reference documentation.

 

The article Misleading wait event names clarified in V$EVENT_NAME appeared first on Blog dbi services.

ODA – 32GB template but got a database with 16GB SGA???

Mon, 2017-03-06 11:40

I got an interesting question today from a customer who created a database on ODA. He selected the template odb-04, which shows 32GB memory, but got a database with a 16GB SGA… Is it due to the PGA size, a limitation in the system, the huge pages limit being reached, or even a bug?

In fact, the answer is simpler and funnier. If you look at the menu shown by OAKCLI while creating a database, you get something like this:

 Please select one of the following for Database Class  [1 .. 6]:

1    => odb-01s  (   1 cores ,     4 GB memory)
2    =>  odb-01  (   1 cores ,     8 GB memory)
3    =>  odb-02  (   2 cores ,    16 GB memory)
4    =>  odb-04  (   4 cores ,    32 GB memory)
5    =>  odb-06  (   6 cores ,    48 GB memory)
6    =>  odb-12  (  12 cores ,    96 GB memory)

4

Selected value is : odb-04  (   4 cores ,    32 GB memory)

 

So using the template odb-04 seems to allocate 32GB memory for the newly created database. However, looking at what OAKCLI really does shows that the reality is a bit different. Following all files/scripts which are called by OAKCLI at execution time, we come to the following file:

/opt/oracle/oak/lib/oakutilslib/DbSizingValues.pm

 

This script contains the definition of the DBCA template used, including the memory definition:

my $sga_size = $memory * $dbTypes{$dbtypeid}{sga_factor};
my $pga_size = $memory * $dbTypes{$dbtypeid}{pga_factor};

So the memory value is multiplied by a factor depending on the database type. Looking in the same script, we find both pieces of information:

my  %dbTemplates  =
    (  1  => { name => 'odb-01s', cpus => 1,  sfactor  => 0.5},
       2  => { name => 'odb-01',  cpus => 1  },
       3  => { name => 'odb-02',  cpus => 2  },
       4  => { name => 'odb-04',  cpus => 4  },
       5  => { name => 'odb-06',  cpus => 6  },
       6  => { name => 'odb-12',  cpus => 12 },
       7  => { name => 'odb-16',  cpus => 16 },
       8  => { name => 'odb-24',  cpus => 24 },
       9  => { name => 'odb-32',  cpus => 32 },
       10 => { name => 'odb-36',  cpus => 36 }
    );

my  %dbTypes  =
    ( 1 => { name => 'OLTP', template_name => 'OAK_oltp.dbt',     sga_factor => 0.5,  pga_factor => 0.25 },
      2 => { name => 'DSS',  template_name => 'OAK_dss.dbt',      sga_factor => 0.25, pga_factor => 0.50 },
      3 => { name => 'In-Memory', template_name => 'OAK_oltp.dbt',sga_factor => 0.25, pga_factor => 0.25, in_mem_factor=>0.25, only12c=> 1}
    );

 

This means that if you create an OLTP database with the odb-04 template, it takes 32GB as the basis and multiplies it by 0.5.
Here we go, we have our 16GB!!
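To verify what the new database actually got, one could simply check the memory parameters of the created instance (a hypothetical verification query, not shown in the original case):

select name, display_value
from   v$parameter
where  name in ('sga_target','pga_aggregate_target');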

In conclusion, the memory information shown by OAKCLI CREATE DATABASE is the base memory used for the calculation and not the one assigned to the SGA. I must admit that this is quite confusing for the end users, as the base memory has no real significance on its own…

To be fully fair, I have to mention that the correct information about the SGA size per template is available in the documentation, Appendix B:

http://docs.oracle.com/cd/E83239_01/doc.121/e83201/database-templates-oracle-database-appliance.htm#CMTAR269

Enjoy!

David

 

The article ODA – 32GB template but got a database with 16GB SGA??? appeared first on Blog dbi services.

Purging Unified Audit Trail in 12cR1

Sat, 2017-03-04 11:24

When you want to empty a table you have two methods: delete and truncate. If, for any reason (see the previous post), the Unified Audit Trail has become too big, you cannot directly delete or truncate the table. You must call the dbms_audit_mgmt.clean_audit_trail procedure. But then you want to know whether it will do slow deletes or quick truncates. Let's trace it.
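The post does not show how the trace below was obtained; one possible way (an assumption on my side) is to enable extended SQL trace for the session running the purge:

-- hypothetical: enable extended SQL trace before calling the purge procedure
alter session set events '10046 trace name context forever, level 12';
-- ... run dbms_audit_mgmt.clean_audit_trail here, then ...
alter session set events '10046 trace name context off';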

I have filled my Unified Audit Trail with hundreds of thousands of failed logins:
SQL> select unified_audit_policies,action_name,count(*) from unified_audit_trail group by unified_audit_policies,action_name;
 
UNIFIED_AUDIT_POLICIES ACTION_NAME COUNT(*)
---------------------------------------- -------------------- ----------
EXECUTE 2
ORA_LOGON_FAILURES LOGON 255799

We have two methods to purge: purge records older than a timestamp or purge all.

Purge old

Auditing is different from logging. It's a security feature. The goal is not to keep only recent information by specifying a retention. The goal is to read, process and archive the records, and then set a timestamp as the high water mark of what has been processed. Then a background job will delete what is before this timestamp.

I set the timestamp to 6 hours before now

SQL> exec dbms_audit_mgmt.set_last_archive_timestamp(audit_trail_type=>dbms_audit_mgmt.audit_trail_unified
,last_archive_time=>sysdate-6/24);
PL/SQL procedure successfully completed.

And call the clean procedure:

SQL> exec dbms_audit_mgmt.clean_audit_trail(audit_trail_type=>dbms_audit_mgmt.audit_trail_unified
,use_last_arch_timestamp=>TRUE);
PL/SQL procedure successfully completed.

This was fast, but let's look at the tkprof. Besides some selects, I see a delete on the CLI_SWP$ table that stores the Unified Audit Trail in SecureFile LOBs:

delete from "CLI_SWP$2f516430$1$1" partition("HIGH_PART")
where
max_time < :1
 
call     count      cpu    elapsed       disk      query    current       rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1     0.00       0.00          0          0          0          0
Execute      1     0.47       1.82         20        650      47548       6279
Fetch        0     0.00       0.00          0          0          0          0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        2     0.47       1.82         20        650      47548       6279
 
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 7 (recursive depth: 1)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 DELETE CLI_SWP$2f516430$1$1 (cr=650 pr=20 pw=0 time=1827790 us)
6279 6279 6279 PARTITION RANGE SINGLE PARTITION: 1 1 (cr=248 pr=0 pw=0 time=15469 us cost=5 size=18020 card=530)
6279 6279 6279 TABLE ACCESS FULL CLI_SWP$2f516430$1$1 PARTITION: 1 1 (cr=248 pr=0 pw=0 time=10068 us cost=5 size=18020 card=530)

I will not go into the details here. This delete may be optimized (120000 audit trail records were actually deleted behind those 6000 rows). This table is partitioned, and we could expect old partitions to be truncated, but there are many bugs with that. On a lot of environments we see all rows in HIGH_PART.
This is improved in 12cR2 and will be the subject of a future post. If you have a huge audit trail to purge, then a conventional delete is not optimal.

Purge all

I still have a lot of rows remaining.

SQL> select unified_audit_policies,action_name,count(*) from unified_audit_trail group by unified_audit_policies,action_name;
 
UNIFIED_AUDIT_POLICIES ACTION_NAME COUNT(*)
---------------------------------------- -------------------- ----------
EXECUTE 4
ORA_LOGON_FAILURES LOGON 136149

When purging all without setting a timestamp, I expect a truncate which is faster than deletes. Let’s try it and trace it.

SQL> exec dbms_audit_mgmt.clean_audit_trail(audit_trail_type=>dbms_audit_mgmt.audit_trail_unified
,use_last_arch_timestamp=>FALSE);
PL/SQL procedure successfully completed.

First, there seems to be an internal lock acquired:
SELECT LOCKID FROM DBMS_LOCK_ALLOCATED WHERE NAME = :B1 FOR UPDATE
UPDATE DBMS_LOCK_ALLOCATED SET EXPIRATION = SYSDATE + (:B1 /86400) WHERE ROWID = :B2

Then a partition split:
alter table "CLI_SWP$2f516430$1$1" split partition high_part at (3013381) into (partition "PART_6", partition high_part lob(log_piece) store as securefile (cache logging tablespace SYSAUX) tablespace "SYSAUX")

The split point is the current timestamp SCN:

SQL> select scn_to_timestamp(3013381) from dual;
 
SCN_TO_TIMESTAMP(3013381)
---------------------------------------------------------------------------
02-MAR-17 05.59.06.000000000 PM

This is the time when I ran the purge, and it is probably used to 'truncate' all previous partitions but keep the ongoing one.

Then, there is no TRUNCATE in the trace, but something similar: some segments are dropped:

delete from seg$
where
ts#=:1 and file#=:2 and block#=:3
 
call     count      cpu    elapsed       disk      query    current       rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1     0.00       0.00          0          0          0          0
Execute      6     0.00       0.00          0         18         12          6
Fetch        0     0.00       0.00          0          0          0          0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        7     0.00       0.00          0         18         12          6

There is finally a delete, but with no rows to delete as the rows were in the dropped segments:

delete from "CLI_SWP$2f516430$1$1" partition("HIGH_PART")
where
max_time < :1
 
call     count      cpu    elapsed       disk      query    current       rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1     0.00       0.00          0          0          0          0
Execute      1     0.00       0.00          0          3          0          0
Fetch        0     0.00       0.00          0          0          0          0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        2     0.00       0.00          0          3          0          0
 
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 7 (recursive depth: 1)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 DELETE CLI_SWP$2f516430$1$1 (cr=3 pr=0 pw=0 time=61 us)
0 0 0 PARTITION RANGE SINGLE PARTITION: 1 1 (cr=3 pr=0 pw=0 time=57 us cost=5 size=2310 card=33)
0 0 0 TABLE ACCESS FULL CLI_SWP$2f516430$1$1 PARTITION: 1 1 (cr=3 pr=0 pw=0 time=48 us cost=5 size=2310 card=33)

So what?

Cleaning the Unified Audit Trail is done with internal statements, but it looks like a delete when use_last_arch_timestamp=>TRUE and like a truncate when use_last_arch_timestamp=>FALSE. This means that we can use this procedure when AUDSYS has grown too much. However, there are a few bugs with this internal table, which is partitioned even when partitioning is not allowed. The implementation has changed in 12.2, so the next blog post will show the same test on 12cR2.

 

The article Purging Unified Audit Trail in 12cR1 appeared first on Blog dbi services.

Is your DBA_FEATURE_USAGE_STATISTICS up-to-date?

Sat, 2017-03-04 05:35

The other day we were doing a licensing review for a client. As many DBAs may already know, this requires executing some Oracle scripts at OS level and database level.
Among these scripts we have the script options_packs_usage_statistics.sql (docId 1317265.1), which is an official Oracle script to check the usage of separately licensed Oracle Database Options/Management Packs.
This script uses the DBA_FEATURE_USAGE_STATISTICS view to retrieve its information, and it may happen that the data in this view is not recent.
One important thing is that DBA_FEATURE_USAGE_STATISTICS is based on the most recent sample, shown in the column LAST_SAMPLE_DATE. In our case we got the following results (outputs are truncated).

SYSDATE |
-------------------|
2017.02.17_13.36.44|


PRODUCT |LAST_SAMPLE_DATE |
-------------------------------|-------------------|
Active Data Guard |2014.01.02_13.37.53|
Advanced Analytics |2014.01.02_13.37.53|
Advanced Compression |2014.01.02_13.37.53|
Advanced Security |2014.01.02_13.37.53|
Database Vault |2014.01.02_13.37.53|
Diagnostics Pack |2014.01.02_13.37.53|
Label Security |2014.01.02_13.37.53|
OLAP |2014.01.02_13.37.53|
Partitioning |2014.01.02_13.37.53|
Real Application Clusters |2014.01.02_13.37.53|
Real Application Testing |2014.01.02_13.37.53|
Tuning Pack |2014.01.02_13.37.53|
.Exadata |2014.01.02_13.37.53|

If we compare SYSDATE with the LAST_SAMPLE_DATE, we can see that we have to manually refresh our DBA_FEATURE_USAGE_STATISTICS data.
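A quick staleness check could look like this (a hypothetical helper query, not part of the official script):

select sysdate,
       max(last_sample_date) as last_sample,
       round(sysdate - max(last_sample_date)) as days_behind
from   dba_feature_usage_statistics;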
One way to do this is to run the procedure

exec dbms_feature_usage_internal.exec_db_usage_sampling(SYSDATE);

In our case the procedure did not refresh our data, despite the fact that there was no error and we received the message that the procedure was successfully executed.

SQL> exec dbms_feature_usage_internal.exec_db_usage_sampling(SYSDATE);
PL/SQL procedure successfully completed.


SQL> select max(last_sample_date) from dba_feature_usage_statistics order by 1;
MAX(LAST_
---------
02-JAN-14

Following Oracle document 1629485.1, we were able to refresh the last_sample_date using this ALTER SESSION:

SQL> alter session set "_SWRF_TEST_ACTION"=53;
Session altered.


SQL> alter session set NLS_DATE_FORMAT='DD/MM/YYYY HH24:MI:SS';
Session altered.


SQL> select MAX(LAST_SAMPLE_DATE) from dba_feature_usage_statistics;
MAX(LAST_SAMPLE_DAT
-------------------
16/02/2017 13:44:46

Hope this article helps.

 

The article Is your DBA_FEATURE_USAGE_STATISTICS up-to-date? appeared first on Blog dbi services.

Sharding with Oracle 12c R2 Part I

Sat, 2017-03-04 05:33

Oracle 12.2 comes with many new features. In this article we are going to talk about sharding. It is a database scaling technique based on horizontal partitioning of data across multiple Oracle databases (shards) that together form a sharded database (SDB). Each shard contains the tables with the same columns but a different subset of rows. Sharding can be represented like this:
[sharding overview diagram]
For the DBA, an SDB is in fact multiple databases that can be managed collectively or individually.
There are 3 methods of sharding: system-managed sharding, user-defined sharding and composite sharding.
In this article we are using system-managed sharding, where data is automatically distributed across shards using partitioning by consistent hash. It is the most commonly used method. We will just demonstrate how it is possible to create shards using Oracle. In the next articles we will show how we can connect to these shards and how it is possible to add new shards.
What do we need?
Oracle Database 12c Release 2 : linuxx64_12201_database.zip
Oracle Database 12c Release 2 Global Service Manager : linuxx64_12201_gsm.zip

In this demo we use the following configuration to create sharded databases on sharddemo2 and sharddemo3.
VM sharddemo1:  catalog
VM sharddemo2: shard
VM sharddemo3: shard

Oracle 12.2 should be installed on all servers. We will not show the Oracle software installation.
The GSM software should be installed on the catalog server sharddemo1.

After unzipping the file, just launch the installer (runInstaller):
[oracle@sharddemo1 gsm122]$ ./runInstaller

[GSM installer screenshots (gsm1–gsm5) omitted]


[root@sharddemo1 oracle122]# /u01/app/oracle/product/12.2.0.1/gsmhome_1/root.sh
Performing root user operation.
The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/oracle/product/12.2.0.1/gsmhome_1
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
[root@sharddemo1 oracle122]#


The second step is to create the catalog database on sharddemo1. We will name it ORCLCAT (NON-CDB). Some database parameters need to be configured for ORCLCAT. Database creation is not shown here.

SQL> alter system set db_create_file_dest='/u01/app/oracle/oradata/';
System altered.
SQL> alter system set open_links=16 scope=spfile;
System altered.
SQL> alter system set open_links_per_instance=16 scope=spfile;
System altered.

An Oracle 12.2 database comes with a gsmcatuser schema. This schema is used by the shard director when connecting to the shard catalog database. It is locked by default, so we have to unlock it.

SQL> alter user gsmcatuser account unlock;
User altered.
SQL> alter user gsmcatuser identified by root;
User altered.

We also have to create the GSM administrator schema (mygdsadmin in our case) and grant it the required privileges:

SQL> create user mygdsadmin identified by root;
User created.
SQL> grant connect, create session, gsmadmin_role to mygdsadmin;
Grant succeeded.
SQL> grant inherit privileges on user SYS to GSMADMIN_INTERNAL;
Grant succeeded.
The next step is to configure the scheduler on the shard catalog by setting the remote scheduler's HTTP port and the agent registration password on the shard catalog database ORCLCAT:

SQL> execute dbms_xdb.sethttpport(8080);
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> @?/rdbms/admin/prvtrsch.plb


SQL> exec DBMS_SCHEDULER.SET_AGENT_REGISTRATION_PASS('welcome');
PL/SQL procedure successfully completed.
SQL>

We now have to register the sharddemo2 and sharddemo3 agents in the scheduler. The executable schagent in $ORACLE_HOME/bin is used, and the agents should be started.
Below the registration for sharddemo2:

[oracle@sharddemo2 ~]$ schagent -start
Scheduler agent started using port 21440
[oracle@sharddemo2 ~]$ echo welcome | schagent -registerdatabase sharddemo1 8080
Agent Registration Password ?
Oracle Scheduler Agent Registration for 12.2.0.1.2 Agent
Agent Registration Successful!
[oracle@sharddemo2 ~]$

After agent registration, the corresponding database directories must be created on sharddemo2 and sharddemo3:

[oracle@sharddemo2 ~]$ mkdir /u01/app/oracle/oradata
[oracle@sharddemo2 ~]$ mkdir /u01/app/oracle/fast_recovery_area

Now it's time to launch the Global Data Services Control Utility (GDSCTL) on sharddemo1. GDSCTL is in $GSM_HOME/bin, in our case /u01/app/oracle/product/12.2.0.1/gsmhome_1/bin:

[oracle@sharddemo1 ~]$ gdsctl
GDSCTL: Version 12.2.0.1.0 - Production on Thu Mar 02 13:53:50 CET 2017
Copyright (c) 2011, 2016, Oracle. All rights reserved.
Welcome to GDSCTL, type "help" for information.
Warning: current GSM name is not set automatically because gsm.ora contains zero or several GSM entries. Use "set gsm" command to set GSM for the session.
Current GSM is set to GSMORA
GDSCTL>

And to create the shardcatalog

GDSCTL>create shardcatalog -database sharddemo1:1521:ORCLCAT -chunks 12 -user mygdsadmin/root -sdb cust_sdb -region region1
Catalog is created
GDSCTL>

Now let’s create and start the shard director. The listener of the gsm should use a free port. In our case the port is 1571

GDSCTL>add gsm -gsm region1_director -listener 1571 -pwd root -catalog sharddemo1:1521:ORCLCAT -region region1
GSM successfully added


GDSCTL>start gsm -gsm region1_director
GSM is started successfully
GDSCTL>


GDSCTL>status gsm
Alias REGION1_DIRECTOR
Version 12.2.0.1.0
Start Date 02-MAR-2017 14:03:36
Trace Level off
Listener Log File /u01/app/oracle/diag/gsm/sharddemo1/region1_director/alert/log.xml
Listener Trace File /u01/app/oracle/diag/gsm/sharddemo1/region1_director/trace/ora_21814_140615026692480.trc
Endpoint summary (ADDRESS=(HOST=sharddemo1.localdomain)(PORT=1571)(PROTOCOL=tcp))
GSMOCI Version 2.2.1
Mastership Y
Connected to GDS catalog Y
Process Id 21818
Number of reconnections 0
Pending tasks. Total 0
Tasks in process. Total 0
Regional Mastership TRUE
Total messages published 0
Time Zone +01:00
Orphaned Buddy Regions:
None
GDS region region1
GDSCTL>

We also have to set the scheduler agent password to "welcome" in gdsctl:

GDSCTL>modify catalog -agent_password welcome
The operation completed successfully
GDSCTL>

The OS credential for the user "oracle" must be defined. We are using the same OS credential for all the shards:

GDSCTL>add credential -credential oracle_cred -osaccount oracle -ospassword root
The operation completed successfully
GDSCTL>

Before deploying the shards we have to define metadata for them.

GDSCTL>set gsm -gsm region1_director
GDSCTL>connect mygdsadmin/root
Catalog connection is established
GDSCTL>


GDSCTL>add shardgroup -shardgroup shgrp1 -deploy_as primary -region region1
The operation completed successfully
GDSCTL>


GDSCTL>add invitednode sharddemo2


GDSCTL>create shard -shardgroup shgrp1 -destination sharddemo2 -credential oracle_cred
The operation completed successfully
DB Unique Name: sh1
GDSCTL>


GDSCTL>add invitednode sharddemo3


GDSCTL>create shard -shardgroup shgrp1 -destination sharddemo3 -credential oracle_cred
The operation completed successfully
DB Unique Name: sh21
GDSCTL>

We can then verify the status of our configuration

GDSCTL>config shard
Name  Shard Group  Status  State     Region   Availability
----  -----------  ------  -----     ------   ------------
sh1   shgrp1       U       none      region1  -
sh21  shgrp1       U       none      region1  -

If there is no error, it's time to deploy our shards. Deployment is the last step before creating the schema we will use for system-managed sharding.

GDSCTL>deploy
deploy: examining configuration...
deploy: deploying primary shard 'sh1' ...
deploy: network listener configuration successful at destination 'sharddemo2'
deploy: starting DBCA at destination 'sharddemo2' to create primary shard 'sh1' ...
deploy: deploying primary shard 'sh21' ...
deploy: network listener configuration successful at destination 'sharddemo3'
deploy: starting DBCA at destination 'sharddemo3' to create primary shard 'sh21' ...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: DBCA primary creation job succeeded at destination 'sharddemo3' for shard 'sh21'
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: DBCA primary creation job succeeded at destination 'sharddemo2' for shard 'sh1'
deploy: requesting Data Guard configuration on shards via GSM
deploy: shards configured successfully
The operation completed successfully
GDSCTL>

The command may take some time. Once it is done, running the config command again should return:

GDSCTL>config shard
Name  Shard Group  Status  State     Region   Availability
----  -----------  ------  -----     ------   ------------
sh1   shgrp1       Ok      Deployed  region1  ONLINE
sh21  shgrp1       Ok      Deployed  region1  ONLINE

We should have two instances running on sharddemo2 and sharddemo3: sh1 and sh21.
Now that the shards are deployed, let's create in the shard catalog database ORCLCAT the schema we will use for sharding. Here the user is called user_shard.

SQL> show parameter instance_name
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_name string ORCLCAT
SQL>


SQL>alter session enable shard ddl;
SQL>create user user_shard identified by user_shard;
SQL>grant connect, resource, alter session to user_shard;
SQL>grant execute on dbms_crypto to user_shard;
SQL>grant create table, create procedure, create tablespace, create materialized view to user_shard;
SQL>grant unlimited tablespace to user_shard;
SQL>grant select_catalog_role to user_shard;
SQL>grant all privileges to user_shard;
SQL>grant gsmadmin_role to user_shard;
SQL>grant dba to user_shard;

In a sharding environment, we have two types of tables:
  • Sharded tables: data is distributed across the different shards
  • Duplicated tables: data is duplicated on the different shards
Let's create tablespaces for each type of table.

SQL> CREATE TABLESPACE SET TAB_PRIMA_SET using template (datafile size 100m autoextend on next 10M maxsize unlimited extent management local segment space management auto );
Tablespace created.


SQL> CREATE TABLESPACE TAB_PRODUCT datafile size 100m autoextend on next 10M maxsize unlimited extent management local uniform size 1m;
Tablespace created

Now under user_shard schema let’s create Sharded and Duplicated tables in ORCLCAT

CREATE SHARDED TABLE Customers
(
CustId VARCHAR2(60) NOT NULL,
FirstName VARCHAR2(60),
LastName VARCHAR2(60),
Class VARCHAR2(10),
Geo VARCHAR2(8),
CustProfile VARCHAR2(4000),
CONSTRAINT pk_customers PRIMARY KEY (CustId),
CONSTRAINT json_customers CHECK (CustProfile IS JSON)
) TABLESPACE SET TAB_PRIMA_SET PARTITION BY CONSISTENT HASH (CustId) PARTITIONS AUTO;
Table created.


CREATE SHARDED TABLE Orders
(
OrderId INTEGER NOT NULL,
CustId VARCHAR2(60) NOT NULL,
OrderDate TIMESTAMP NOT NULL,
SumTotal NUMBER(19,4),
Status CHAR(4),
constraint pk_orders primary key (CustId, OrderId),
constraint fk_orders_parent foreign key (CustId)
references Customers on delete cascade
) partition by reference (fk_orders_parent);


CREATE DUPLICATED TABLE Products
(
ProductId INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
Name VARCHAR2(128),
DescrUri VARCHAR2(128),
LastPrice NUMBER(19,4)
) TABLESPACE TAB_PRODUCT;
Table created.

Some checks can be done on the different instances (ORCLCAT, sh1, sh21) to verify, for example, that the tablespaces, sharded tables and duplicated tables were created.

SQL> select name from v$database;
NAME
---------
ORCLCAT

SQL> select TABLESPACE_NAME, BYTES/1024/1024 MB from sys.dba_data_files order by tablespace_name;
TABLESPACE_NAME MB
------------------------------ ----------
SYSAUX 660
SYSTEM 890
TAB_PRIMA_SET 100
TAB_PRODUCT 100
UNDOTBS1 110
USERS 5

on sh1 and sh21

select TABLESPACE_NAME, BYTES/1024/1024 MB from sys.dba_data_files order by tablespace_name;
C001TAB_PRIMA_SET 100
C002TAB_PRIMA_SET 100
C003TAB_PRIMA_SET 100
C004TAB_PRIMA_SET 100
C005TAB_PRIMA_SET 100
C006TAB_PRIMA_SET 100
SYSAUX 660
SYSTEM 890
SYS_SHARD_TS 100
TAB_PRIMA_SET 100
TAB_PRODUCT 100
UNDOTBS1 115
USERS 5

On sharddemo2 (sh1), for example, verify that the chunks and chunk tablespaces were created:

SQL> select table_name, partition_name, tablespace_name from dba_tab_partitions where tablespace_name like 'C%TAB_PRIMA_SET' order by tablespace_name;
CUSTOMERS CUSTOMERS_P1 C001TAB_PRIMA_SET
ORDERS CUSTOMERS_P1 C001TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P2 C002TAB_PRIMA_SET
ORDERS CUSTOMERS_P2 C002TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P3 C003TAB_PRIMA_SET
ORDERS CUSTOMERS_P3 C003TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P4 C004TAB_PRIMA_SET
ORDERS CUSTOMERS_P4 C004TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P5 C005TAB_PRIMA_SET
ORDERS CUSTOMERS_P5 C005TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P6 C006TAB_PRIMA_SET
ORDERS CUSTOMERS_P6 C006TAB_PRIMA_SET

On ORCLCAT

SQL> select table_name from user_tables;
TABLE_NAME
---------------------------------------------------
MLOG$_PRODUCTS
PRODUCTS
CUSTOMERS
ORDERS
RUPD$_PRODUCTS

On sh1

SQL> select table_name from user_tables;
TABLE_NAME
------------------------------------------------------------
PRODUCTS
CUSTOMERS
ORDERS

on sh21

SQL> select table_name from user_tables;
TABLE_NAME
--------------------------------------------------------------
PRODUCTS
CUSTOMERS
ORDERS

Using gdsctl on sharddemo1, we can also see the DDL executed, by using the show ddl command on the GSM interface:

GDSCTL>show ddl
id DDL Text Failed shards
-- -------- -------------
6 grant select_catalog_role to user_shard;
7 grant all privileges to user_shard;
8 grant gsmadmin_role to user_shard;
9 grant dba to user_shard;
10 CREATE TABLESPACE SET TAB_PRIMA_SET u...
11 CREATE TABLESPACE TAB_PRODUCT datafil...
12 CREATE SHARDED TABLE Customers ( Cust...
13 CREATE SHARDED TABLE Orders ( OrderId...
14 create database link "PRODUCTSDBLINK@...
15 CREATE MATERIALIZED VIEW "PRODUCTS" ...

And now our sharding should work. After inserting some data, we can see that for duplicated tables the whole data set is replicated to the different shards.
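The insert statements are not shown here; purely as an illustration, rows of the following shape (hypothetical values) could be loaded into the demo tables:

-- hypothetical sample rows, only to illustrate the table structures
insert into products (name, descruri, lastprice) values ('Product 1', '/products/1', 10);
insert into customers (custid, firstname, lastname, class, geo, custprofile)
  values ('cust1@mail.com', 'John', 'Doe', 'Gold', 'EU', '{"interests":["books"]}');
insert into orders (orderid, custid, orderdate, sumtotal, status)
  values (1, 'cust1@mail.com', systimestamp, 100, 'ok');
commit;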
On ORCLCAT, for example, the number of rows in table products is 9:

SQL> select count(*) from products;
COUNT(*)
----------
9

On sh1, as products is a duplicated table, the number of rows should also be 9:

SQL> select count(*) from products;
COUNT(*)
----------
9

Same for table products on sh21:

SQL> select count(*) from products;
COUNT(*)
----------
9

For sharded tables, we can see that rows are distributed

On ORCLCAT

SQL> select count(*) from customers;
COUNT(*)
----------
14

On sh1

SQL> select count(*) from customers;
COUNT(*)
----------
6

On sh21 the number of rows in customers should be 8:

SQL> select count(*) from customers;
COUNT(*)
----------
8

Conclusion
In this first part we talked about the sharding configuration. We have seen how, using Oracle Global Data Services, we can create shards. In a second part we will see how to connect to the shards and how scalability is possible in a sharded environment.

 

The article Sharding with Oracle 12c R2 Part I appeared first on Blog dbi services.

Speaking at the next SQL Nexus at Copenhagen 2017

Fri, 2017-03-03 06:40

On May 2nd, I will have the chance to speak at the next SQL Nexus event in Copenhagen (1-3 May) about SQL Server 2016 and availability groups, and if I have enough time, you will see what is coming with SQL Server vNext.


This is also a good opportunity to attend other sessions held by well-known people in the industry like David Klee, Edwin M Sarmiento, Wolfgang Strasser and Uwe Ricken, to name a few.

I’m looking forward to share and learn with the SQL Server community.

Hope to see you there!

 

The article Speaking at the next SQL Nexus at Copenhagen 2017 appeared first on Blog dbi services.

Oracle 12c – Is VKTM always your top process?

Fri, 2017-03-03 05:16

If VKTM is always your top CPU consuming process, then this blog might be something for you. Especially in virtual environments, I have often seen the VKTM process as the top process, even if the VM was idle. So, I am burning CPU without any obvious benefit. What is the reason for the high CPU consumption? Well … it can be a combination of many things, like NTP not working correctly, missing VMware Tools, but first and foremost Oracle bugs. I really don't know why, but quite a lot of issues have been raised regarding the VKTM process, like the following.

  • Bug 20693049 – 12C VKTM CONSUMING MORE CPU THAN IN 11GR2
  • Bug 20552573 – VKTM PROCESSES FROM ASM AND A CDB DATABASE CONSUME 12% CPU PERMANENTLY.
  • BUG 12883034 – CHANGE THE INTERVAL WHEN VKTM PROCESS WAKES UP
  • Bug 20542107 – WARNING: VKTM DETECTED A TIME DRIFT
  • Bug 20138957 – VKTM PROCESS CONSUMING HIGH CPU EVEN AFTER PATCH 18499306
  • Bug 11837095 – “TIME DRIFT DETECTED” APPEARS INTERMITTENTLY IN ALERT LOG, THO’ EVENT 10795 SET.

If you search around in MOS, you will probably find even more. Usually VKTM and VKRM issues come together, at least when you are using the resource manager. VKTM is the Virtual Keeper of Time process. VKTM acts as a time publisher for an Oracle instance. VKTM publishes two sets of time: a wall clock time using a seconds interval and a higher resolution time (which is not wall clock time) for interval measurements. The VKTM process is available in ASM instances and RDBMS instances, so if you see issues with the VKTM process, it usually pops up on both. VKTM usage is affected mainly by two hidden parameters, _timer_precision and _disable_highres_ticks, so tuning these parameters can bring down VKTM CPU consumption.

The VKRM process is the Virtual Scheduler for Resource Manager process, and it serves as the centralized scheduler for Resource Manager activity. As there is no resource manager on an ASM instance, you will see this process only on RDBMS instances.

Ok .. my test environment is OEL 6.8 with 12.1.0.2 ASM and a 12.1.0.2 database on a virtual guest. To be more precise, it is a database with PSU 12.1.0.2.161018 and Grid Infrastructure PSU 12.1.0.2.161018. The patch level plays quite an important role. For an 11gR2 database, you might need patch 20531619.

So let's start fixing the ASM VKTM issue first. It warns me all the time that it has detected a time drift.

$ cat alert_+ASM.log | grep -i "Warning: VKTM detected a time drift."
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
...
...
Warning: VKTM detected a time drift.

Before continuing, it is important that you really don’t have a time drift. On VMware, you might want to consult the following knowledge base article and the time keeping pdf. Both are very good resources. KB Article: 1006427 Timekeeping best practices for Linux guests and http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

Next, check that your ntp is in sync.

$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+aquila.init7.ne 162.23.41.10     2 u   16   64  377   16.375   25.656   3.873
*ntp0.as34288.ne 85.158.25.74     2 u   17   64  377   12.987   27.874   4.045

If your NTP is not in sync, you should stop right here and correct it, because in that case the warning message from Oracle is correct: VKTM has detected a time drift. Ok. Let's continue with checking the ASM instance. The 12.1.0.2 ASM defaults regarding VKTM are the following:

Parameter                           Session_Value  Instance_Value Description
----------------------------------- -------------- -------------- --------------------------------------------
_disable_highres_ticks              FALSE          FALSE          disable high-res tick counter
_high_priority_processes            LMS*           LMS*           High Priority Process Name Mask
_highest_priority_processes         VKTM           VKTM           Highest Priority Process Name Mask
_timer_precision                    10             10             VKTM timer precision in milli-sec
_vkrm_schedule_interval             10             10             VKRM scheduling interval
_vktm_assert_thresh                 30             30             soft assert threshold VKTM timer drift
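For reference, such hidden parameters can be listed as SYS with a query along these lines (a hypothetical helper, not necessarily how the listing above was produced):

select i.ksppinm  as parameter,
       v.ksppstvl as value,
       i.ksppdesc as description
from   x$ksppi  i
join   x$ksppcv v on v.indx = i.indx
where  i.ksppinm in ('_disable_highres_ticks','_timer_precision',
                     '_high_priority_processes','_highest_priority_processes',
                     '_vkrm_schedule_interval','_vktm_assert_thresh');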

Because I don't need high-resolution ticks on my ASM instance, I am going to disable them, and besides that I am going to disable the excessive tracing by the VKTM process, which is done with the 10795 event.

SQL> alter system set "_disable_highres_ticks"=true scope=spfile;

System altered.

SQL> alter system set event="10795 trace name context forever, level 2" scope=spfile;

System altered.

Unfortunately, these changes can’t be done online, and so I have to bounce my ASM instance.

$ srvctl stop asm -f
$ srvctl start asm

My alert.log does not report time drift issues anymore and the VKTM process from my ASM instance disappeared from my top process list. As soon as the ASM VKTM process went away, the one from the database popped up. :-)

oracle@oel001:/home/oracle/ [OCM121] top -c
top - 11:17:18 up  1:21,  2 users,  load average: 0.69, 0.77, 0.98
Tasks: 229 total,   2 running, 227 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.4%sy,  0.0%ni, 98.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  10021556k total,  3816024k used,  6205532k free,   209916k buffers
Swap:  6160380k total,        0k used,  6160380k free,   672856k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5696 oracle    -2   0 1880m  46m  44m S 22.7  0.5   0:23.49 ora_vktm_OCM121

Ok. Let’s fix that one as well. The RDBMS defaults regarding VKTM with 12.1.0.2 are the same as with ASM.

Parameter                           Session_Value  Instance_Value Description
----------------------------------- -------------- -------------- --------------------------------------------
_disable_highres_ticks              FALSE          FALSE          disable high-res tick counter
_high_priority_processes            LMS*           LMS*           High Priority Process Name Mask
_highest_priority_processes         VKTM           VKTM           Highest Priority Process Name Mask
_timer_precision                    10             10             VKTM timer precision in milli-sec
_vkrm_schedule_interval             10             10             VKRM scheduling interval
_vktm_assert_thresh                 30             30             soft assert threshold VKTM timer drift

Without any changes, tracing for the VKTM and VKRM background processes is enabled, and quite a lot of information goes into these trace files.

Tracing for the VKRM process can be disabled via the following event:

alter system set events '10720 trace name context forever, level 0x10000000';

Tracing for the VKTM process can be disabled via the following event:

alter system set events '10795 trace name context forever, level 2';

Because I don’t need any of those, I am going to disable both in one shot.

SQL> alter system set event='10720 trace name context forever, level 0x10000000','10795 trace name context forever, level 2' comment='Turn off VKRM tracing and turn off VKTM tracing' scope=spfile;

System altered.

And like on the ASM instance, I don’t need the high-resolution ticks here as well.

SQL> alter system set "_disable_highres_ticks"=true scope=spfile;

System altered.

After the restart of the database, the extensive tracing and CPU usage went away.

oracle@oel001:/home/oracle/ [OCM121] srvctl stop database -d OCM121
oracle@oel001:/home/oracle/ [OCM121] srvctl start database -d OCM121

I am not seeing the VKTM process in my top processes anymore. Beforehand, even on an idle system, the VKTM from the ASM and the one from the RDBMS instance had always been at the top.

oracle@oel001:/home/oracle/ [OCM121] top -c
top - 11:29:06 up  1:33,  2 users,  load average: 0.19, 0.69, 0.69
Tasks: 233 total,   2 running, 231 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  0.5%sy,  0.0%ni, 98.7%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  10021556k total,  3839340k used,  6182216k free,   211884k buffers
Swap:  6160380k total,        0k used,  6160380k free,   686380k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3643 grid      20   0 1559m  84m  39m S  1.3  0.9   0:55.64 /u00/app/grid/12.1.0.2/bin/oraagent.bin
 3660 grid      20   0  329m  28m  22m S  1.0  0.3   0:28.15 /u00/app/grid/12.1.0.2/bin/evmd.bin
 3517 grid      20   0 1507m  69m  48m S  0.7  0.7   0:34.74 /u00/app/grid/12.1.0.2/bin/ohasd.bin reboot
 3738 grid      20   0  262m  26m  21m S  0.7  0.3   0:28.03 /u00/app/grid/12.1.0.2/bin/evmlogger.bin -o /u00/app/grid/12.1
 3757 grid      20   0  791m  32m  23m S  0.7  0.3   0:31.30 /u00/app/grid/12.1.0.2/bin/cssdagent
Conclusion

Especially in virtual environments I have often seen quite high CPU usage by the VKTM process, so take care that your time keeping via NTP is set up correctly. Once NTP is running smoothly, you might want to disable the high-resolution ticks and the extensive tracing by the VKTM and VKRM processes. Obviously, this is not a general recommendation. You should test it yourself.

 

The article Oracle 12c – Is VKTM always your top process? appeared first on Blog dbi services.

Oracle 12c – How to Recover lost DataGuard Broker Configuration Files

Tue, 2017-02-28 07:53

If you are using RMAN to back up your database, you are already doing the right thing. However, RMAN does not take care of everything. For example, it does not back up the following things, to mention just a few:

  • Oracle Home
  • Grid Infrastructure Home
  • Data Guard broker files
  • Password File
  • SQL*Net file like ldap.ora, sqlnet.ora, tnsnames.ora and listener.ora
  • /etc/oratab
  • OS audit files
  • Wallets
  • /etc/sysctl.conf and limits.conf
  • OLR and OCR
  • Voting Disks
  • ASM Metadata
  • passwd, shadow, group
  • RMAN scripts itself ksh/rcv (some exceptions when using the RMAN catalog)
  • crontab

There are for sure many more, but in this article I would like to take a closer look at the Data Guard broker configuration files. The Data Guard broker configuration files are quite important files. They contain entries that describe the state and the properties of the Data Guard configuration, like the sites and databases that are part of the configuration, the roles and properties of each of the databases, and the state of each of the elements of the configuration. The broker configuration files are very small; in my case they have only 12K. Don't expect them to grow very big. They usually stay at this size.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 09:27 dr1DBIT121_SITE1.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 09:34 dr2DBIT121_SITE1.dat

You can have a maximum of two different copies, which can be configured via the DG_BROKER_CONFIG_FILEn (where n = 1, 2) parameters. Maybe maximum is not the correct word in this context, because if you don't specify the broker file location, Oracle still creates them in the default directory, which is OS dependent. On Linux they end up in $ORACLE_HOME/dbs/.
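As a reminder (a generic example, not a step of this article), the locations can be set explicitly, typically while the broker is not started (dg_broker_start=FALSE):

SQL> alter system set dg_broker_config_file1='/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat' scope=both;
SQL> alter system set dg_broker_config_file2='/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat' scope=both;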

However, unlike what you might expect, the broker configuration files are not a 1:1 mirror like e.g. redo log members or controlfiles. They are individual copies and they maintain the last known good state of the configuration. You can check it quite easily yourself by editing e.g. the FastStartFailoverThreshold and checking the time stamps of the broker files afterwards.

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
Property "faststartfailoverthreshold" updated

-- Primary
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -l dr*
-rw-r----- 1 oracle oinstall 12288 Feb 28 09:27 dr1DBIT121_SITE1.dat
-rw-r----- 1 oracle oinstall 12288 Feb 28 09:34 dr2DBIT121_SITE1.dat

-- Standby
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -l dr*
-rw-r----- 1 oracle oinstall 12288 Feb 28 09:34 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12288 Feb 28 09:27 dr2DBIT121_SITE2.dat

As you can see here, the broker configuration files have two different time stamps: one with the last known good state at 09:27 and one with the new state at 09:34. When the broker is started for the first time, it is expected that you see only one configuration file. But don't panic, the other will be created by the next update done by the broker.

The broker configuration file is a binary file; however, the readable contents of that file can be extracted via the strings command. It gives an idea of some of the contents of the broker file, e.g. it shows you that the broker config file is brought to you by the fine folks from NEDC, whoever they are. :-) With 12cR1 it looks like the following.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] file dr2DBIT121_SITE1.dat
dr2DBIT121_SITE1.dat: data

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] strings dr2DBIT121_SITE1.dat
}|{z
DBIT121
DBIT121_SITE1
DBIT121_SITE2
DBIT121_SITE2
Brought to you by the fine folks at NEDC.
TRUETRUEdbidg03ALLDBIT121FALSECONTINUE
DBIT121_SITE1DBIT121_SITE1FALSEFALSEFALSESYNCoptionalDISABLEON1,1948637,931101276,0*AUTO0,0,00,0,0AUTODBIT121_SITE2
dbidg01DBIT121DBIT121(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbidg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NA
ME=DBIT121_SITE1_DGMGRL)(INSTANCE_NAME=DBIT121)(SERVER=DEDICATED)))USE_DB_RECOVERY_FILE_DEST%t_%s_%r.dbf4
DBIT121_SITE2DBIT121_SITE2FALSEFALSEFALSESYNCoptionalDISABLEON1,1948637,931101276,0*DBIT121AUTO0,0,00,0,0AUTODBIT121_SITE1
dbidg02DBIT121DBIT121(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbidg02)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=DBIT121_SITE2_DGMGRL)(INSTANCE_NAME=DBIT121)(SERVER=DEDICATED)))USE_DB_RECOVERY_FILE_DEST%t_%s_%r.dbf

Things are changing with Oracle 12cR2. The broker config files are still binary files; however, the readable content via the strings command is very good now. You can get a lot of very useful information out of it. And even with Oracle 12cR2, it looks like it is still brought to you by the same fine folks from NEDC. :-) Looks like the Oracle developers have a sense of humor. :-)

oracle@dbidg01:/u01/app/oracle/admin/DBIT122/pfile/ [DBIT122] file dr2DBIT122_SITE1.dat
dr2DBIT122_SITE1.dat: data
oracle@dbidg01:/u01/app/oracle/admin/DBIT122/pfile/ [DBIT122] strings dr2DBIT122_SITE1.dat
}|{z
DBIT122
DBIT122_SITE1
Brought to you by the fine folks at NEDC.
<?xml version="1.0" encoding="UTF-8"?>
<DRC Version="12.2.0.1.0" Name="DBIT122" CurrentPath="True">
  <PlannedState>OFFLINE</PlannedState>
  <Status>
    <Severity>Success</Severity>
    <Error>0</Error>
  </Status>
  <DefaultState>ONLINE</DefaultState>
  <IntendedState>ONLINE</IntendedState>
  <MIV PropertyID="1">0</MIV>
  <PRIMARY_SITE_ID PropertyID="26">513</PRIMARY_SITE_ID>
  <DRC_UNIQUE_ID PropertyID="4">152596437</DRC_UNIQUE_ID>
  <DRC_UNIQUE_ID_SEQUENCE PropertyID="5">18</DRC_UNIQUE_ID_SEQUENCE>
  <EXT_COND PropertyID="29">7</EXT_COND>
  <OVERALL_PROTECTION_MODE PropertyID="30">2</OVERALL_PROTECTION_MODE>
  <FastStartFailoverMode PropertyID="32">0</FastStartFailoverMode>
  <FSFO_MIV PropertyID="33">11</FSFO_MIV>
  <FastStartFailoverOBID1 PropertyID="119">470173189</FastStartFailoverOBID1>
  <FastStartFailoverOBID2 PropertyID="120">470173183</FastStartFailoverOBID2>
  <FastStartFailoverOBID3 PropertyID="121">470173184</FastStartFailoverOBID3>
  <ObserverVersion1 PropertyID="133">0</ObserverVersion1>
  <Configuration_Name PropertyID="31">DBIT122</Configuration_Name>
  <ObserverName1 PropertyID="129">dbidg03</ObserverName1>
  <ConfigurationWideServiceName PropertyID="132">DBIT122_CFG</ConfigurationWideServiceName>
  <RoleChangeHistory PropertyID="106">
    <RoleChangeRecord>
      <Event>PhysicalFailover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>931098450</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalFailover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</NewPrimary>
      <Status>0</Status>
      <Timestamp>931098812</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalFailover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>932306689</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</NewPrimary>
      <Status>0</Status>
      <Timestamp>932307856</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>932377455</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</NewPrimary>
      <Status>0</Status>
      <Timestamp>932381717</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>932382294</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</NewPrimary>
      <Status>0</Status>
      <Timestamp>932383387</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>934017954</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</New
Primary>
      <Status>0</Status>
      <Timestamp>934018548</Timestamp>
    </RoleChangeRecord>
  </RoleChangeHistory>
  <Member MemberID="1" Name="DBIT122_SITE1" CurrentPath="True" Enabled="True" MultiInstanced="True">
    <PlannedState>STANDBY</PlannedState>
    <StandbyRole>PhysicalStandby</StandbyRole>
    <Status>
      <Severity>Success</Severity>
      <Error>0</Error>
    </Status>
    <DefaultState>PRIMARY</DefaultState>
    <IntendedState>PRIMARY</IntendedState>
    <ResourceType>Database</ResourceType>
    <CurrentState>PRIMARY</CurrentState>
    <Role>
      <ConditionState>PRIMARY</ConditionState>
      <DefaultState>READ-WRITE-XPTON</DefaultState>
      <IntendedState>READ-WRITE-XPTON</IntendedState>
    </Role>
    <Role>
      <ConditionState>STANDBY</ConditionState>
      <DefaultState>PHYSICAL-APPLY-ON</DefaultState>
      <IntendedState>OFFLINE</IntendedState>
    </Role>
    <DB_Unique_Name PropertyID="23">DBIT122_SITE1</DB_Unique_Name>
    <DGConnectIdentifier PropertyID="6">DBIT122_SITE1</DGConnectIdentifier>
    <DbDomain PropertyID="37"/>
    <ClusterDatabase PropertyID="42">FALSE</ClusterDatabase>
    <DbChangeCritical PropertyID="8">FALSE</DbChangeCritical>
    <DbIsCritical PropertyID="9">FALSE</DbIsCritical>
    <LogXptMode PropertyID="40">SYNC</LogXptMode>
    <IncarnationTable PropertyID="57">6,2568637,932306696,5*5,2514031,931098817,4#4,2513489,931098453,2#2,1396169,929894741,1#1,1,924281211,0#</IncarnationTable>
    <SRLStatus PropertyID="58">0</SRLStatus>
    <ActualApplyInstance PropertyID="7"/>
    <StandbyFileManagement PropertyID="72">AUTO</StandbyFileManagement>
    <ArchiveLagTarget PropertyID="73">0</ArchiveLagTarget>
    <LogArchiveMaxProcesses PropertyID="74">4</LogArchiveMaxProcesses>
    <LogArchiveMinSucceedDest PropertyID="75">1</LogArchiveMinSucceedDest>
    <DataGuardSyncLatency PropertyID="138">0</DataGuardSyncLatency>
    <DbFileNameConvert PropertyID="76"/>
    <LogFileNameConvert PropertyID="77"/>
    <FastStartFailoverTarget PropertyID="38">DBIT122_SITE2</FastStartFailoverTarget>
    <ReinstateContextArray PropertyID="39"/>
    <Instance InstanceID="1" Name="DBIT122" CurrentPath="True" Enabled="True" MultiInstanced="True" DefaultWriteOnce="True">
      <PlannedState>OFFLINE</PlannedState>
      <HostName PropertyID="2" Default="True">dbidg01</HostName>
      <SidName PropertyID="3">DBIT122</SidName>
      <InstanceName PropertyID="36">DBIT122</InstanceName>
      <StaticConnectIdentifier PropertyID="25">(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbidg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=DBIT122_SITE1_DGMGRL)(UR=A)(INSTANCE_NAME=DBIT122)(SERVER=DEDICATED)))</StaticConnectIdentifier>
      <StandbyArchiveLocation PropertyID="96" Default="True">USE_DB_RECOVERY_FILE_DEST</StandbyArchiveLocation>
      <LogArchiveTrace PropertyID="98">0</LogArchiveTrace>
      <LogArchiveFormat PropertyID="99">%t_%s_%r.dbf</LogArchiveFormat>
    </Instance>
  </Member>
  <Member MemberID="2" Name="DBIT122_SITE2" CurrentPath="True" Enabled="True" MultiInstanced="True">
    <PlannedState>STANDBY</PlannedState>
    <StandbyRole>PhysicalStandby</StandbyRole>
    <Status>
      <Severity>Success</Severity>
      <Error>0</Error>
    </Status>
    <DefaultState>STANDBY</DefaultState>
    <IntendedState>STANDBY</IntendedState>
    <ResourceType>Database</ResourceType>
    <CurrentState>STANDBY</CurrentState>
    <Role>
      <ConditionState>PRIMARY</ConditionState>
      <DefaultState>READ-WRITE-XPTON</DefaultState>
      <IntendedState>OFFLINE</IntendedState>
    </Role>
    <Role>
      <ConditionState>STANDBY</ConditionState>
      <DefaultState>PHYSICAL-APPLY-ON</DefaultState>
      <IntendedState>PHYSICAL-APPLY-ON</IntendedState>
    </Role>
    <DB_Unique_Name PropertyID="23">DBIT122_SITE2</DB_Unique_Name>
    <DGConnectIdentifier PropertyID="6">DBIT122_SITE2</DGConnectIdentifier>
    <DbDomain PropertyID="37"/>
    <ClusterDatabase PropertyID="42">FALSE</ClusterDatabase>
    <DbChangeCritical PropertyID="8">FALSE</DbChangeCritical>
    <DbIsCritical PropertyID="9">FALSE</DbIsC
ritical>
    <LogXptMode PropertyID="40">SYNC</LogXptMode>
    <IncarnationTable PropertyID="57">8,2568637,932306696,4*4,2514031,931098817,3#3,2513489,931098453,2#2,1396169,929894741,1#1,1,924281211,0#</IncarnationTable>
    <SRLStatus PropertyID="58">0</SRLStatus>
    <ActualApplyInstance PropertyID="7">DBIT122</ActualApplyInstance>
    <StandbyFileManagement PropertyID="72">AUTO</StandbyFileManagement>
    <ArchiveLagTarget PropertyID="73">0</ArchiveLagTarget>
    <LogArchiveMaxProcesses PropertyID="74">4</LogArchiveMaxProcesses>
    <LogArchiveMinSucceedDest PropertyID="75">1</LogArchiveMinSucceedDest>
    <DataGuardSyncLatency PropertyID="138">0</DataGuardSyncLatency>
    <DbFileNameConvert PropertyID="76"/>
    <LogFileNameConvert PropertyID="77"/>
    <FastStartFailoverTarget PropertyID="38">DBIT122_SITE1</FastStartFailoverTarget>
    <ReinstateContextArray PropertyID="39"/>
    <Instance InstanceID="1" Name="DBIT122" CurrentPath="True" Enabled="True" MultiInstanced="True" DefaultWriteOnce="True">
      <PlannedState>OFFLINE</PlannedState>
      <HostName PropertyID="2" Default="True">dbidg02</HostName>
      <SidName PropertyID="3">DBIT122</SidName>
      <InstanceName PropertyID="36">DBIT122</InstanceName>
      <StaticConnectIdentifier PropertyID="25">(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbidg02)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=DBIT122_SITE2_DGMGRL)(UR=A)(INSTANCE_NAME=DBIT122)(SERVER=DEDICATED)))</StaticConnectIdentifier>
      <StandbyArchiveLocation PropertyID="96" Default="True">USE_DB_RECOVERY_FILE_DEST</StandbyArchiveLocation>
      <LogArchiveTrace PropertyID="98">0</LogArchiveTrace>
      <LogArchiveFormat PropertyID="99">%t_%s_%r.dbf</LogArchiveFormat>
    </Instance>
  </Member>
</DRC>

In 12cR2, Oracle puts more information into the broker files, which is why they are getting bigger. With a standard config they are now 16k instead of the 12k they used to be with 12cR1. Ok .. it looks like I am drifting away. Let's get back to the original question: how do I recover the broker configuration files in case they get lost? As always … it depends. You could lose the old copy on the standby, the new copy on the standby, or both copies on the standby, and you could lose them while the standby is up and running or while it is shut down. The same applies to the primary. It might get even more complicated in case you have a far sync instance in between, or more standbys.

Without making it too complex, in the end we might have to deal with 6 different recovery scenarios.

1.) We lose the old copy of the broker config file on either the primary or the standby
2.) We lose both copies of the broker config file on the standby while it is up and running
3.) We lose both copies of the broker config file on the standby while it is shut down
4.) We lose both copies of the broker config file on the primary while it is up and running
5.) We lose both copies of the broker config file on the primary while it is shut down
6.) We lose both copies of the broker config files on the primary and the standby (quite unlikely to happen)

My demos are built on a simple primary/standby setup with 12cR1. The primary is on host dbidg01 and the standby is currently on host dbidg02.

DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 2 seconds ago)

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:03 dr1DBIT121_SITE1.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 11:04 dr2DBIT121_SITE1.dat

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:04 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 11:03 dr2DBIT121_SITE2.dat
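
As a side note, these dr*.dat files simply live wherever the dg_broker_config_file1/2 parameters point to, so a quick check on any system (generic command, output omitted) is:

SQL> show parameter dg_broker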
Scenario 1.) We lose the old copy of the broker config file on either the primary or the standby

There is absolutely no need to panic in this case. I am deleting the old copies on the primary and the standby at the same time.

-- primary
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr1DBIT121_SITE1.dat
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

-- standby
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr2DBIT121_SITE2.dat
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

The broker does not care at all.

DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 57 seconds ago)

The second copy will be created automatically as soon as I change a property, e.g. setting the FastStartFailoverThreshold to the same value it had before.

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
Property "faststartfailoverthreshold" updated

The broker log has no information about the newly created file and the alert.log does not say anything either, but automagically we end up with two broker config files after the update is done. So .. nothing to do here. Oracle handles this itself.

oracle@dbidg01:/u01/app/oracle/diag/rdbms/dbit121_site1/DBIT121/trace/ [DBIT121] tail -20f drcDBIT121.log
...
02/28/2017 11:10:25
EDIT CONFIGURATION SET PROPERTY faststartfailoverthreshold = 40
FSFO threshold value set to 40 seconds

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -l dr*
-rw-r----- 1 oracle oinstall 12288 Feb 28 11:10 dr1DBIT121_SITE1.dat
-rw-r----- 1 oracle oinstall 12288 Feb 28 11:04 dr2DBIT121_SITE1.dat

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:04 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 11:10 dr2DBIT121_SITE2.dat

 

2.) We lose both copies of the broker config file on the standby while it is up and running

But what happens if I lose both copies on the standby while it is up and running?

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*.dat
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
ls: cannot access dr*: No such file or directory

I am logging in to the standby database and checking the configuration. The broker does not care at all whether the files are there or not. It seems like nothing has happened.

oracle@dbidg02:/home/oracle/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager@DBIT121_SITE2
[W000 02/28 11:17:00.64] Connecting to database using DBIT121_SITE2.
[W000 02/28 11:17:01.72] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 11:17:01.72] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 8 seconds ago)

The broker is reading the information from memory, probably for performance reasons. The current state is also reflected in x$drc.

SQL> desc x$drc
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ADDR                                               RAW(8)
 INDX                                               NUMBER
 INST_ID                                            NUMBER
 CON_ID                                             NUMBER
 OBJECT_ID                                          NUMBER
 ATTRIBUTE                                          VARCHAR2(30)
 VALUE                                              VARCHAR2(512)
 PARENT_ID                                          VARCHAR2(15)
 STATUS                                             VARCHAR2(30)
 MESSAGE                                            VARCHAR2(256)
 ERRNUM                                             NUMBER
 VALUE_RAW                                          RAW(512)
 ERRTIME                                            NUMBER
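
If you want to see what the broker currently keeps in memory, a minimal query against this fixed table is enough (the same query is used again in scenario 3):

SQL> select attribute, value from x$drc;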

So … how do we get the broker files back?  By simply editing any property. You can choose any property you want.

oracle@dbidg02:/home/oracle/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager@DBIT121_SITE2
[W000 02/28 11:27:58.23] Connecting to database using DBIT121_SITE2.
[W000 02/28 11:27:59.33] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 11:27:59.33] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
<DO_CONFIGURE_DRC version="12.1"><EDIT_DRC><PROPERTY name="faststartfailoverthreshold" value="40" context="0"/></EDIT_DRC></DO_CONFIGURE_DRC>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
Property "faststartfailoverthreshold" updated

As soon as you have edited your configuration, a new broker file appears on the standby.

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:28 dr1DBIT121_SITE2.dat

If you run the same command again, the second one appears as well.

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:28 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 11:30 dr2DBIT121_SITE2.dat

 

3.) We lose both copies of the broker config file on the standby while it is shut down

What happens if we lose both copies while the standby is shut down? Does the standby come up correctly? Is there anything in the drc.log or alert.log?

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*.dat
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

SQL> startup mount
ORACLE instance started.

Total System Global Area 1325400064 bytes
Fixed Size                  2924112 bytes
Variable Size             352321968 bytes
Database Buffers          956301312 bytes
Redo Buffers               13852672 bytes
Database mounted.

The standby comes up with no entries in the alert.log; however, the drc.log shows that the broker files are missing.

02/28/2017 11:35:26
>> Starting Data Guard Broker bootstrap <<
Broker Configuration File Locations:
      dg_broker_config_file1 = "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat"
      dg_broker_config_file2 = "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE2.dat"
2017-02-28 11:35:26.313                      DMON: Attach state object
2017-02-28 11:35:26.314                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat", retrying
2017-02-28 11:35:27.334                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat"
2017-02-28 11:35:27.334                        ORA-27037: unable to obtain file status
2017-02-28 11:35:27.335                        Linux-x86_64 Error: 2: No such file or directory
2017-02-28 11:35:27.335                        Additional information: 3
2017-02-28 11:35:27.335                      DMON: Error opening "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat", error = ORA-16572
2017-02-28 11:35:27.335                      DMON: Establishing "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE2.dat" as the more current file
2017-02-28 11:35:27.335                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE2.dat", retrying
2017-02-28 11:35:28.355                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE2.dat"
...
2017-02-28 11:35:42.893                      Configuration does not exist, Data Guard broker ready
2017-02-28 11:35:42.894 7fffffff           0 DMON: Entered rfm_release_chief_lock() for CTL_BOOTSTRAP


DGMGRL> connect sys/manager@DBIT121_SITE2
[W000 02/28 11:38:50.58] Connecting to database using DBIT121_SITE2.
[W000 02/28 11:38:51.67] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 11:38:51.68] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;
ORA-16532: Oracle Data Guard broker configuration does not exist

Configuration details cannot be determined by DGMGRL


SQL> select attribute, value from x$drc;
select attribute, value from x$drc
                             *
ERROR at line 1:
ORA-16532: Oracle Data Guard broker configuration does not exist

If we log on to the primary, we can see that the standby DBIT121_SITE2 was disabled.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager@DBIT121_SITE1
[W000 02/28 11:45:40.56] Connecting to database using DBIT121_SITE1.
[W000 02/28 11:45:41.68] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 11:45:41.68] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    Warning: ORA-16629: database reports a different protection level from the protection mode

    DBIT121_SITE2 - Physical standby database (disabled)

Fast-Start Failover: DISABLED

Configuration Status:
WARNING   (status updated 7 seconds ago)

However, after we enable the standby, we can see in the drc.log that the standby receives the metadata from the primary and creates the broker config file.

DGMGRL> enable database 'DBIT121_SITE2';
<DO_CONTROL version="12.1"><DO_COMMAND type="Enable" object_id="33554432"/></DO_CONTROL>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
Enabled.
DGMGRL>


drc.log

2017-02-28 11:46:03.352                      DRCX: Start receiving metadata file: "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat"
2017-02-28 11:46:03.356                      DRCX: Receiving block #1 (containing Seq.MIV = 2.20), 2 blocks
2017-02-28 11:46:03.358                      DRCX: End receiving metadata file: opcode CTL_ENABLE (1.1.631286030)
2017-02-28 11:46:03.360                      DMON: Entered rfm_get_chief_lock() for CTL_ENABLE, reason 1
2017-02-28 11:46:03.360 02001000   631286030 DMON: start task execution: for metadata resynchronization
2017-02-28 11:46:03.360 02001000   631286030 DMON: status from posting standby instances for RESYNCH = ORA-00000
2017-02-28 11:46:03.360                      INSV: Received message for inter-instance publication
2017-02-28 11:46:03.361                            req ID 1.1.631286030, opcode CTL_ENABLE, phase RESYNCH, flags 8005
2017-02-28 11:46:03.361 02001000   631286030 DMON: Metadata available (1.1.631286030), loading from "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat"
2017-02-28 11:46:03.361 02001000   631286030       Opcode = CTL_ENABLE, Chief Instance I_ID = 1
2017-02-28 11:46:03.364                      DMON Registering service DBIT121_SITE2_DGB with listener(s)
2017-02-28 11:46:03.364                      DMON: Executing SQL [ALTER SYSTEM REGISTER]
2017-02-28 11:46:03.365                      SQL [ALTER SYSTEM REGISTER] Executed successfully
02/28/2017 11:46:06
Creating process RSM0


oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:46 dr1DBIT121_SITE2.dat

Again, as soon as you make any property change, the second broker file will be created.
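For example, re-issuing the property change used earlier is enough:

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
Property "faststartfailoverthreshold" updated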

4.) We lose both copies of the broker config file on the primary while it is up and running

There is not much difference from the scenario we have seen on the standby. The broker just reads from memory, and as soon as any update to the config is done, the broker file is created again. There will be no entries in the drc.log or the alert.log; the broker file is just silently recreated.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
ls: cannot access dr*: No such file or directory

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager
[W000 02/28 12:28:45.07] Connecting to database using .
[W000 02/28 12:28:45.10] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 12:28:45.10] Oracle database version is '12.1.0.2.0'
Connected as SYSDG.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 9 seconds ago)


DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
<DO_CONFIGURE_DRC version="12.1"><EDIT_DRC><PROPERTY name="faststartfailoverthreshold" value="40" context="0"/></EDIT_DRC></DO_CONFIGURE_DRC>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
Property "faststartfailoverthreshold" updated


oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 12:30 dr2DBIT121_SITE1.dat

 

5.) We lose both copies of the broker config file on the primary while it is shut down

Let’s do the whole thing again while the primary DB is shutdown.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
ls: cannot access dr*: No such file or directory

In the drc.log file you can immediately see that something is wrong.

>> Starting Data Guard Broker bootstrap <<
Broker Configuration File Locations:
      dg_broker_config_file1 = "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat"
      dg_broker_config_file2 = "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat"
2017-02-28 12:34:45.866                      DMON: Attach state object
2017-02-28 12:34:45.866                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat", retrying
2017-02-28 12:34:46.876                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat"
2017-02-28 12:34:46.880                        ORA-27037: unable to obtain file status
2017-02-28 12:34:46.881                        Linux-x86_64 Error: 2: No such file or directory
2017-02-28 12:34:46.881                        Additional information: 3
2017-02-28 12:34:46.881                      DMON: Error opening "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat", error = ORA-16572
2017-02-28 12:34:46.881                      DMON: Establishing "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat" as the more current file
2017-02-28 12:34:46.882                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat", retrying
2017-02-28 12:34:47.899                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat"
...
2017-02-28 12:35:02.058 7fffffff           0 DMON: Entered rfm_release_chief_lock() for CTL_BOOTSTRAP
2017-02-28 12:35:02.424                      Fore: Continuing with primary evaluation, rfmsp.drc_status_rfmp = ORA-16532
2017-02-28 12:35:03.507                      Fore: Initiating post-open tasks
2017-02-28 12:35:09.192                      DMON: Initiating post-open tasks
2017-02-28 12:35:22.242 00000000  1934847279 DMON: GET_FSFO will be retired
2017-02-28 12:35:22.242 00000000  1934847279       severity = ORA-16501, status = ORA-16532
2017-02-28 12:35:22.242 00000000  1934847279 DMON: GET_FSFO operation completed
2017-02-28 12:35:52.250 00000000  1934847280 DMON: GET_FSFO will be retired
2017-02-28 12:35:52.250 00000000  1934847280       severity = ORA-16501, status = ORA-16532
2017-02-28 12:35:52.251 00000000  1934847280 DMON: GET_FSFO operation completed

If you take a look at the alert.log, everything seems fine. But as soon as we connect to the primary with DGMGRL, the broker tells us that the configuration is gone:

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager@DBIT121_SITE1
[W000 02/28 12:39:01.18] Connecting to database using DBIT121_SITE1.
[W000 02/28 12:39:02.28] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 12:39:02.28] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;
ORA-16532: Oracle Data Guard broker configuration does not exist

Configuration details cannot be determined by DGMGRL

When connected to the standby, you will see a clear error message against the primary: ORA-16532: Oracle Data Guard broker configuration does not exist.

DGMGRL> connect sys/manager@DBIT121_SITE2
[W000 02/28 12:39:18.81] Connecting to database using DBIT121_SITE2.
[W000 02/28 12:39:19.90] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 12:39:19.90] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    Error: ORA-16532: Oracle Data Guard broker configuration does not exist

    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ERROR   (status updated 0 seconds ago)

Taking a close look at the error message, it does not give you a hint on how to correct the issue.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] oerr ora 16532
16532, 00000, "Oracle Data Guard broker configuration does not exist"
// *Cause:  A broker operation was requested that required a broker
//          configuration to already be created.
// *Action: Create a Data Guard broker configuration prior to performing
//          other broker operations. If only one instance of a RAC
//          database is reporting this error, ensure that the
//          DG_BROKER_CONFIG_FILE[1|2] initialization parameters are
//          set to file locations that are shared by all instances of
//          the RAC database.

Let's try the same trick as we did on the standby, by simply enabling the database again.

DGMGRL> enable database 'DBIT121_SITE1';
<DO_CONTROL version="12.1"><DO_COMMAND type="Enable" object_id="16777216"/></DO_CONTROL>
<RESULT ><MESSAGE ><FAILURE  error_num="16532" error_prefix="ORA"><ERROR_TEXT >ORA-16532: Oracle Data Guard broker configuration does not exist
</ERROR_TEXT></FAILURE></MESSAGE><MESSAGE ><FAILURE  error_num="16625" error_prefix="ORA" error_tag1="DBIT121_SITE1"><ERROR_TEXT >ORA-16625: cannot reach database &quot;DBIT121_SITE1&quot;
</ERROR_TEXT></FAILURE></MESSAGE></RESULT>
<RESULT ><MESSAGE ><FAILURE  error_num="16532" error_prefix="ORA"><ERROR_TEXT >ORA-16532: Oracle Data Guard broker configuration does not exist
</ERROR_TEXT></FAILURE></MESSAGE><MESSAGE ><FAILURE  error_num="16625" error_prefix="ORA" error_tag1="DBIT121_SITE1"><ERROR_TEXT >ORA-16625: cannot reach database &quot;DBIT121_SITE1&quot;
</ERROR_TEXT></FAILURE></MESSAGE></RESULT>
Failed.

Hmmmm … does not look good. To recover from that situation, we need to figure out the latest broker file version on the standby, which is dr2DBIT121_SITE2.dat in my case, and copy it over to the primary. Before doing that, we need to stop the broker on the primary.

SQL> alter system set dg_broker_start=false;

System altered.

Now we can copy the latest version over.

-- standby
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -rlth dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 12:30 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 12:33 dr2DBIT121_SITE2.dat

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] scp -p dr2DBIT121_SITE2.dat oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat
dr2DBIT121_SITE2.dat                                                                          100%   12KB  12.0KB/s   00:00

And optionally create the second broker file on the primary.

-- primary
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] cp -p dr1DBIT121_SITE1.dat dr2DBIT121_SITE1.dat
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

Last but not least, we need to enable the broker again.

SQL> alter system set dg_broker_start=true;

System altered.

DGMGRL> connect sys/manager@DBIT121_SITE1
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 7 seconds ago)
6.) We lose both copies of the broker config files on the primary and the standby (quite unlikely to happen)

This scenario is quite unlikely to happen, but if it happens it is good to have your Data Guard configuration as a script in place.

-- shutdown primary
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

-- shutdown standby
SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

Remove the broker files on the primary and the standby

-- primary
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

-- standby
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

Now, after we have lost everything, meaning all broker config files, the only chance we have is to recreate the configuration from scratch. It is quite cool if you have the configuration script in place, because in the blink of an eye you can recreate the whole configuration.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/create/ [DBIT121] cat broker.cfg
CONNECT SYS/manager@DBIT121_SITE1
REMOVE CONFIGURATION;
sql "alter system set log_archive_dest_2=''";
CREATE CONFIGURATION 'DBIT121' AS
PRIMARY DATABASE IS 'DBIT121_SITE1'
CONNECT IDENTIFIER IS 'DBIT121_SITE1';
ADD DATABASE 'DBIT121_SITE2' AS
CONNECT IDENTIFIER IS 'DBIT121_SITE2';
EDIT DATABASE 'DBIT121_SITE1' SET PROPERTY StandbyFileManagement='AUTO';
EDIT DATABASE 'DBIT121_SITE2' SET PROPERTY StandbyFileManagement='AUTO';
EDIT DATABASE 'DBIT121_SITE1' SET PROPERTY LogXptMode='SYNC';
EDIT DATABASE 'DBIT121_SITE2' SET PROPERTY LogXptMode='SYNC';
ENABLE CONFIGURATION;
EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
EDIT DATABASE 'DBIT121_SITE1' SET PROPERTY FastStartFailoverTarget='DBIT121_SITE2';
EDIT DATABASE 'DBIT121_SITE2' SET PROPERTY FastStartFailoverTarget='DBIT121_SITE1';
sql "alter system archive log current";
sql "alter system register";
SHOW CONFIGURATION;

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/create/ [DBIT121] dgmgrl -debug -xml < broker.cfg > broker.log 2>&1
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/create/ [DBIT121]

DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 2 seconds ago)
Conclusion

Losing broker configuration files is not the end of the world. As you have seen in this blog, it makes quite a difference whether you lose the broker files while the DB is up and running or while it is shut down. In case you haven't lost all of them, you can recover them in most situations. However, the alert.log alone is not a good candidate for checking whether the broker files are missing. So better scan the alert.log and the drc.log with your preferred monitoring tool to get the whole picture. Besides that, it is good practice to adjust your broker.cfg file after you have made changes, because it makes it very easy to rebuild the whole broker config from scratch.
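A trivial check that could be wired into such a monitoring script might look like this (the path and log name are from my demo environment, adapt them to yours):

oracle@dbidg01:/home/oracle/ [DBIT121] grep -c "cannot open configuration file" /u01/app/oracle/diag/rdbms/dbit121_site1/DBIT121/trace/drcDBIT121.log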

 

The article Oracle 12c – How to Recover lost DataGuard Broker Configuration Files appeared first on Blog dbi services.

Introducing AlwaysOn availability groups on Linux

Tue, 2017-02-28 06:13

A couple of days ago, Microsoft rolled out SQL Server vNext CTP 1.3, which includes very nice features, and the one that interested me is the support of AlwaysOn availability groups on Linux. That is definitely good news because we may benefit from new ways of architecting high availability. There are a lot of new cool enhancements, which we'll try to study in upcoming blog posts. In this first blog, let's look at the newly available configuration options.

First of all, let's say that configuring availability groups on Linux is not different from configuring them on Windows in workgroup mode (domainless configuration), from a SQL Server perspective.

Basically, the same steps remain (please refer to the Microsoft installation documentation); a minimal T-SQL sketch follows the list:

  • Creating a login and user on each replica
  • Creating a certificate on each replica and granting authorization to the corresponding user
  • Creating an endpoint for data mirroring and granting CONNECT permission to the login mapped to the corresponding certificate
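
For reference, a minimal T-SQL sketch of these steps could look like the following (the certificate, endpoint and login names, the password and the port are placeholders of mine, not values from this environment; the certificate backup has to be copied to and restored on the other replica):

-- minimal sketch, run on each replica (placeholder names and password)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword>';
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
BACKUP CERTIFICATE dbm_certificate TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
    WITH PRIVATE KEY (FILE = '/var/opt/mssql/data/dbm_certificate.pvk', ENCRYPTION BY PASSWORD = '<StrongPassword>');
-- copy/restore the certificate on the other replica, create a login/user mapped to it there, then:
CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = ALL, AUTHENTICATION = CERTIFICATE dbm_certificate, ENCRYPTION = REQUIRED ALGORITHM AES);
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [remote_replica_login];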

So let's just set the scene before moving forward with the other installation steps. For my demo I used two virtual machines on Hyper-V running CentOS 7 (LINUX02 and LINUX04). I also installed two SQL Server instances (CTP 1.3), one on each machine, which will run on top of the cluster infrastructure with Pacemaker and Corosync.

blog 119 - 00 - ag linux - demo infra

Obviously, this time the NFS server is not part of the infrastructure; instead I used symmetric storage on each virtual machine, with two mount points on two ext4 partitions (/sql_data and /sql_log respectively, to host my AdventureWorks2012 database files).

[mikedavem@linux02 ~]$ cat /etc/fstab

/dev/mapper/cl-root     /                       xfs     defaults        0 0
UUID=d6eb8d27-35c7-4f0f-b0c1-42e380ab2eca /boot                   xfs     defaults        0 0
/dev/mapper/cl-swap     swap                    swap    defaults        0 0
/dev/sdb1       /sql_data       ext4    defaults 0 0
/dev/sdc1       /sql_log        ext4    defaults 0 0

 

SELECT 
	name AS logical_name,
	physical_name
FROM sys.master_files
WHERE database_id = DB_ID('AdventureWorks2012')

 

blog 119 - 01 - ag linux - adventurworks2012 files

The first step to enable the AlwaysOn feature: we have to use the mssql-conf tool (which replaces the famous SQL Server Configuration Manager) with the new hadrenabled option.
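
For reference, that step looks roughly like this (the option path hadr.hadrenabled is what the CTP documentation describes; verify it against your build, and restart the service afterwards):

[root@linux02 ~]# /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
[root@linux02 ~]# systemctl restart mssql-server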

Then, if we use the Linux firewall, we have to open the corresponding TCP endpoint port on each machine.

[root@linux02 data]firewall-cmd --zone=public --add-port=5022/tcp --permanent
success
[root@linux02 data]firewall-cmd --reload
success
[root@linux02 data]firewall-cmd --permanent --zone=public --list-all
public
  target: default
  icmp-block-inversion: no
  interfaces:
  sources:
  services: dhcpv6-client high-availability mountd nfs rpc-bind ssh
  ports: 1433/tcp 5022/tcp
  protocols:
  masquerade: no
  forward-ports:
  sourceports:
  icmp-blocks:
  rich rules:

[root@linux04 ~]firewall-cmd --zone=public --add-port=5022/tcp --permanent
success
[root@linux04 ~]firewall-cmd --reload

 

No really new stuff so far … The most interesting part comes now. After installing SQL Server on Linux and completing the endpoint configuration, it's time to create the availability group. But wait, at this stage we didn't install any clustering part, right? And in fact, we don't have to. We are now able to create an availability group without any cluster dependencies by using the new T-SQL parameter CLUSTER_TYPE = NONE, as follows. Very interesting, because we may think about new scenarios where only read-scalability capabilities are needed on the DR site. In that case we don't have to set up additional cluster nodes, which would otherwise add manageability overhead.

We may also use the direct seeding feature, available since SQL Server 2016, to simplify the process of adding a database to the corresponding availability group (the AdventureWorks2012 database in my case).

:CONNECT LINUX02 -U sa -P Xxxxx

CREATE AVAILABILITY GROUP [agLinux]
WITH 
(
	DB_FAILOVER = ON, --> Trigger the failover of the entire AG if one DB fails 
	CLUSTER_TYPE = NONE --> SQL Server is not a member of a Windows Server Failover Cluster 
)
FOR REPLICA ON
N'LINUX02' 
WITH 
(
	ENDPOINT_URL = N'tcp://192.168.5.18:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = AUTOMATIC,
    SEEDING_MODE = AUTOMATIC, --> Use direct seeding mode
    SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
),
N'LINUX04' 
WITH 
( 
	ENDPOINT_URL = N'tcp://192.168.5.20:5022', 
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
	FAILOVER_MODE = AUTOMATIC,
	SEEDING_MODE = AUTOMATIC, --> Use direct seeding mode
    SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
);
GO

ALTER AVAILABILITY GROUP [agLinux] 
GRANT CREATE ANY DATABASE;
GO

:CONNECT LINUX04 -U sa -P Xxxxx
ALTER AVAILABILITY GROUP [agLinux] JOIN WITH (CLUSTER_TYPE = NONE)
ALTER AVAILABILITY GROUP [agLinux] GRANT CREATE ANY DATABASE;
GO


:CONNECT LINUX02 -U sa -P Xxxxx
ALTER DATABASE [AdventureWorks2012] SET RECOVERY FULL;
GO
BACKUP DATABASE [AdventureWorks2012] TO DISK = 'NUL';
GO
-- Add the AdventureWorks2012 database to the agLinux AG
ALTER AVAILABILITY GROUP [agLinux] ADD DATABASE [AdventureWorks2012];
GO

 

After configuring the agLinux availability group let’s have a look at the different DMVs I usually use in this case:

  • sys.dm_hadr_availability_group_states
  • sys.dm_hadr_availability_replica_states
  • sys.dm_hadr_database_replica_states
-- groups info
SELECT 
	g.name as ag_name,
	rgs.primary_replica, 
	rgs.primary_recovery_health_desc as recovery_health, 
	rgs.synchronization_health_desc as sync_health
FROM sys.dm_hadr_availability_group_states as rgs
JOIN sys.availability_groups AS g
				 ON rgs.group_id = g.group_id

-- replicas info
SELECT 
	g.name as ag_name,
	r.replica_server_name,
	rs.is_local,
	rs.role_desc as role,
	rs.operational_state_desc as op_state,
	rs.connected_state_desc as connect_state,
	rs.synchronization_health_desc as sync_state,
	rs.last_connect_error_number,
	rs.last_connect_error_description
FROM sys.dm_hadr_availability_replica_states AS rs
JOIN sys.availability_replicas AS r
	ON rs.replica_id = r.replica_id
JOIN sys.availability_groups AS g
	ON g.group_id = r.group_id

-- DB level          
SELECT 
	g.name as ag_name,
	r.replica_server_name,
	DB_NAME(drs.database_id) as [database_name],
	drs.is_local,
	drs.is_primary_replica,
	synchronization_state_desc as sync_state,
	synchronization_health_desc as sync_health,
	database_state_desc as db_state
FROM sys.dm_hadr_database_replica_states AS drs
		 JOIN sys.availability_replicas AS r
		  ON r.replica_id = drs.replica_id
		 JOIN sys.availability_groups AS g
		  ON g.group_id = drs.group_id
ORDER BY g.name, drs.is_primary_replica DESC;
GO

 

Here is the corresponding output:

blog 119 - 1 - ag linux - ag status

Ok, everything seems to be fine. So far we have configured an availability group which includes two replicas and synchronous replication. But maybe you have already noticed that we didn't create any listener, and the reason is pretty obvious: in previous versions of SQL Server, creating a listener requires creating a corresponding CAP on the WSFC side, and in our context no cluster exists at this stage. So let's go ahead and do it. You may refer to the Microsoft documentation to add the Pacemaker cluster to the existing infrastructure.

I didn't expect a big change compared to my last installation of SQL Server FCI on Linux as far as the basics are concerned, but the integration of the AG resource is another story. Indeed, the declaration of the resource (ocf:mssql:ag) is completely different from the SQL Server FCI one (ocf::sql:fci). In this case, we have to create a multi-state resource which allows us to start multiple instances of the AG resource and have one on each concerned replica. Multi-state resources are specialized clone resources which may be in one of two operating modes – master and slave. This matches the availability group concept exactly, because we use one primary replica (master) and at least one secondary (slave). According to the Microsoft documentation, we have to set meta-properties which limit the number of resource copies (on replicas) as well as the number of master resources (only one primary replica at a time in our case).

[mikedavem@linux02 ~]$ sudo pcs resource create sqllinuxaag ocf:mssql:ag ag_name=agLinux --master meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1

 

Creating a multi-state resource generates two separate resources according to their role. In my case, I get sqllinuxaag-master (for the resource enrolled as the Master) and sqllinuxaag-slave (for the resource(s) enrolled as Slave(s)). We have to define monitoring for both as follows:

[mikedavem@linux02 ~]$ sudo pcs resource op sqllinuxaag monitor interval=11s timeout=60s role=Master
[mikedavem@linux02 ~]$ sudo pcs resource op sqllinuxaag monitor interval=12s timeout=60s role=Slave

 

Finally, we have to create and associate a virtual IP address as follows. The virtual IP must run at the same location as the master resource in our case (INFINITY is mandatory here).

[mikedavem@linux02 ~]$ sudo pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=192.168.5.30
[mikedavem@linux02 ~]$ sudo pcs constraint colocation add virtualip sqllinuxaag-master INFINITY with-rsc-role=Master

 

Here is a recap of the existing constraints which concern my availability group resource:

[mikedavem@linux02 ~]$ sudo pcs constraint
Location Constraints:
Ordering Constraints:
  promote sqllinuxaag-master then start virtualip (kind:Mandatory)
Colocation Constraints:
  virtualip with sqllinuxaag-master (score:INFINITY) (with-rsc-role:Master)
Ticket Constraints:

 

We may notice the global state of the resources and their roles (Master/Slave)

[mikedavem@linux02 ~]$ sudo pcs status
Cluster name: clustlinuxag
Stack: corosync
Current DC: linux04.dbi-services.test (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Mon Feb 27 06:16:03 2017          Last change: Mon Feb 27 06:14:11 2017 by root via cibadmin on linux02.dbi-services.test
…

Full list of resources:

 Master/Slave Set: sqllinuxaag-master [sqllinuxaag]
     Masters: [ linux02.dbi-services.test ]
     Slaves: [ linux04.dbi-services.test ]
 virtualip      (ocf::heartbeat:IPaddr2):       Started linux02.dbi-services.test

 

So, now let's perform some failover tests. I used a basic PowerShell script to connect to my availability group by using the sqllinuxaag resource and to return a response (OK with the server name of the concerned replica, or KO). Note that the resource is not considered a listener by the availability group; no listener exists at this stage.

  • First test:

The first test consisted in manually switching the sqllinuxaag master over to the next available node (LINUX02). At this stage, we can use neither the wizard nor a T-SQL statement to trigger a failover event. This is a limitation (explained by Microsoft) and I expect it to disappear in the future. Keeping control of such an action from the SQL Server side will make more sense for DBAs.

[mikedavem@linux04 ~]$ sudo pcs resource move sqllinuxaag-master linux02.dbi-services.test --master

 

During the failover event, there was no way to reach the resource, but the situation went back to normal as expected.

blog 119 - 2 - ag linux - ag failover manual test

  • Second test

The second test consisted in simulating a "soft" failure by changing the state of the new active node (LINUX02) to standby in order to trigger a switchover to the next available cluster node (LINUX04). But before going down this path, let's configure stickiness to avoid an unexpected failback of the sqllinuxaag-master resource when the situation goes back to normal.

[mikedavem@linux04 ~]$ sudo pcs resource defaults resource-stickiness=INFINITY
[mikedavem@linux04 ~]$ sudo pcs cluster standby linux02.dbi-services.test

[mikedavem@linux04 ~]$ sudo pcs status
Cluster name: clustlinuxag
Stack: corosync
Current DC: linux02.dbi-services.test (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Sun Feb 26 18:55:01 2017          Last change: Sun Feb 26 18:54:42 2017 by root via crm_attribute on linux04.dbi-services.test

2 nodes and 3 resources configured

Node linux02.dbi-services.test: standby
Online: [ linux04.dbi-services.test ]

Full list of resources:

 Master/Slave Set: sqllinuxaag-master [sqllinuxaag]
     Masters: [ linux04.dbi-services.test ]
     Stopped: [ linux02.dbi-services.test ]
 virtualip      (ocf::heartbeat:IPaddr2):       Started linux04.dbi-services.test

 

Same result as previously: during the failover event the resource was unreachable, but after the situation went back to normal the application was able to connect again.

blog 119 - 3 - ag linux - ag failover with standby test

We may also confirm the state of the availability group by using the usual DMVs. The replica role has switched between replicas as expected and the synchronization state remained healthy.

blog 119 - 4- ag linux - ag failover with standby test

  • Third test

My third test consisted in simulating a network outage between my two cluster nodes; the infrastructure responded well and performed the necessary tasks to recover the situation.

This was a quick introduction to the new capabilities offered by SQL Server vNext in terms of HA and availability groups. Other scenarios and tests will come soon. In any event, the availability groups feature is very popular, as is Linux, in my area, and being able to mix the two will probably be a good argument for customer adoption. We will see in the near future!

See you

 

The article Introducing AlwaysOn availability groups on Linux appeared first on Blog dbi services.

Oracle 12cR2: Online tablespace encryption

Sun, 2017-02-26 13:57

By default, all data is visible in the datafiles. Transparent Data Encryption (TDE) tablespace encryption can be used to get it encrypted. It requires Enterprise Edition plus the Advanced Security Option, except in the Oracle Cloud where it is available – and mandatory – in all editions. And we can foresee that security policies will be enforced in the coming years, by law or because companies realize their files can be stolen. This means that a lot of databases will have to be encrypted, and this may take too long to do during a maintenance window. In 12.2 we can encrypt online. Online means that we can do it while our application is running, but of course there is a performance overhead on the system.

I’ve run a SLOB workload with reads and writes (PCT_UPDATE=25). Four times the same workload:

  • during the first one, I encrypted the tablespace online
  • the second one is running on the encrypted tablespace
  • during the third one, I decrypted the tablespace online
  • the fourth one is running on the decrypted tablespace

Here is the ASH visualized with Orachrome Lighty:

Online_encryption_ASH_Response_Time

The dark blue is 'db file sequential read', which is my 4 SLOB sessions' activity. Light blue is all the background activity (DBWR, LGWR) and the encrypt/decrypt (db file parallel write).
The green is CPU activity. The brown is 'free buffer waits': DBWR can't keep up with the rate of changes we are doing while encrypting the tablespace.

You may wonder how I was able to have un-encrypted tablespaces on 12.2, which is available only on the Oracle Cloud where encryption is mandatory. This is explained in Oracle Public Cloud 12cR2: TDE is not an option. This means that I created the SLOB database without encryption and then created the wallet.

Configure the TDE keystore

I’ve created the directory
mkdir -p /u01/app/oracle/admin/SLOB/tde_wallet

I declared it in sqlnet.ora
ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/u01/app/oracle/admin/SLOB/tde_wallet)))

I created the wallet
administer key management create keystore '/u01/app/oracle/admin/SLOB/tde_wallet' identified by oracle;

Opened it
administer key management set keystore open identified by oracle;

Created the master key
administer key management set key identified by oracle with backup;

Optionally created an auto-login wallet
administer key management create auto_login keystore from keystore '/u01/app/oracle/admin/SLOB/tde_wallet' identified by oracle;

Statspack

I'll show some statistics for the following runs: Initial (when not encrypted), Encryption (when encryption is running concurrently), Encrypted (when tablespace encryption has been completed), Decryption (when decrypt is running concurrently) and Decrypted (once decryption is completed).
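
For reference, the Statspack figures below are produced the usual way (a sketch; one snapshot before and one after each run, then the report between the two snap ids):

SQL> exec statspack.snap;
-- run the SLOB workload
SQL> exec statspack.snap;
SQL> @?/rdbms/admin/spreport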

Run on non-encrypted tablespace

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ------------------ ----------------- ----------- -----------
DB time(s): 4.0 0.1 0.02 6.39
DB CPU(s): 0.4 0.0 0.00 0.68
Redo size: 1,064,743.9 18,829.0
Logical reads: 15,635.7 276.5
Block changes: 7,293.4 129.0
Physical reads: 9,451.4 167.1
Physical writes: 3,303.2 58.4

Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read 817,591 1,000 1 48.9
log file parallel write 12,258 408 33 20.0
db file parallel write 18,284 217 12 10.6
CPU time 128 6.3
db file parallel read 46,263 121 3 5.9

Time Model System Stats DB/Inst: SLOB/SLOB Snaps: 26-27
-> Ordered by % of DB time desc, Statistic name
 
Statistic Time (s) % DB time
----------------------------------- -------------------- ---------
sql execute elapsed time 1,201.2 100.0
DB CPU 127.8 10.6
PL/SQL execution elapsed time 1.7 .1
parse time elapsed 0.0 .0
connection management call elapsed 0.0 .0
hard parse elapsed time 0.0 .0
Tablespace encryption elapsed time 0.0 .0
repeated bind elapsed time 0.0 .0
DB time 1,201.7

Run during tablespace encryption

As soon as I started this SLOB run, I started the encryption:
alter tablespace IOPS encryption encrypt;

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ------------------ ----------------- ----------- -----------
DB time(s): 4.8 0.1 0.03 8.27
DB CPU(s): 0.4 0.0 0.00 0.72
Redo size: 644,447.5 18,839.1
Logical reads: 9,441.5 276.0
Block changes: 4,412.8 129.0
Physical reads: 5,702.6 166.7
Physical writes: 1,952.0 57.1

Time Model System Stats DB/Inst: SLOB/SLOB Snaps: 28-29
-> Ordered by % of DB time desc, Statistic name
 
Statistic Time (s) % DB time
----------------------------------- -------------------- ---------
sql execute elapsed time 1,455.0 99.9
DB CPU 126.9 8.7
Tablespace encryption elapsed time 15.4 1.1
Tablespace encryption cpu time 12.1 .8


Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read 546,294 660 1 30.8
free buffer waits 30,704 325 11 15.2
log file parallel write 8,057 304 38 14.2
db file parallel write 9,929 260 26 12.1
db file async I/O submit 6,304 185 29 8.7

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
session logical reads 2,860,788 9,441.5 276.0
consistent gets 2,154,358 7,110.1 207.9
blocks decrypted 850,557 2,807.1 82.1
blocks encrypted 1,042,777 3,441.5 100.6

Run on encrypted tablespace

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ------------------ ----------------- ----------- -----------
DB time(s): 4.0 0.1 0.02 6.95
DB CPU(s): 0.5 0.0 0.00 0.95
Redo size: 1,057,446.8 18,806.5
Logical reads: 15,534.1 276.3
Block changes: 7,248.4 128.9
Physical reads: 9,415.8 167.5
Physical writes: 3,266.7 58.1

Time Model System Stats DB/Inst: SLOB/SLOB Snaps: 30-31
-> Ordered by % of DB time desc, Statistic name
 
Statistic Time (s) % DB time
----------------------------------- -------------------- ---------
sql execute elapsed time 1,201.1 99.9
DB CPU 164.7 13.7
Tablespace encryption elapsed time 19.0 1.6
Tablespace encryption cpu time 10.1 .8

Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read 824,329 958 1 47.3
log file parallel write 12,207 416 34 20.5
db file parallel write 17,405 202 12 10.0
CPU time 166 8.2
db file parallel read 46,394 113 2 5.6

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
session logical reads 4,706,832 15,534.1 276.3
consistent gets 3,546,519 11,704.7 208.2
blocks decrypted 2,852,666 9,414.7 167.4
blocks encrypted 989,254 3,264.9 58.1

Run during tablespace decryption

As soon as I started this SLOB run, I started the decryption:
alter tablespace IOPS encryption decrypt;

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ------------------ ----------------- ----------- -----------
DB time(s): 4.9 0.2 0.04 7.56
DB CPU(s): 0.4 0.0 0.00 0.61
Redo size: 606,680.3 19,111.0
Logical reads: 8,817.1 277.8
Block changes: 4,121.4 129.8
Physical reads: 5,294.9 166.8
Physical writes: 1,827.2 57.6

Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read 515,429 629 1 25.9
free buffer waits 34,335 362 11 14.9
log file parallel write 7,287 293 40 12.1
direct path write 7,703 275 36 11.3
db file parallel write 9,966 270 27 11.1

Time Model System Stats DB/Inst: SLOB/SLOB Snaps: 32-33
-> Ordered by % of DB time desc, Statistic name
 
Statistic Time (s) % DB time
----------------------------------- -------------------- ---------
sql execute elapsed time 1,464.6 99.9
DB CPU 117.9 8.0
Tablespace encryption elapsed time 9.4 .6
Tablespace encryption cpu time 4.6 .3

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
session logical reads 2,662,776 8,817.1 277.8
consistent gets 2,001,129 6,626.3 208.7
blocks decrypted 1,026,940 3,400.5 107.1
blocks encrypted 696,105 2,305.0 72.6

Run on decrypted tablespace

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ------------------ ----------------- ----------- -----------
DB time(s): 4.0 0.1 0.02 6.79
DB CPU(s): 0.4 0.0 0.00 0.72
Redo size: 1,060,856.5 18,876.7
Logical reads: 15,565.8 277.0
Block changes: 7,258.6 129.2
Physical reads: 9,418.8 167.6
Physical writes: 3,330.5 59.3

Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read 818,717 999 1 42.1
log file parallel write 11,799 421 36 17.8
direct path write 8,887 299 34 12.6
db file parallel write 18,817 222 12 9.4
CPU time 129 5.4

Time Model System Stats DB/Inst: SLOB/SLOB Snaps: 34-35
-> Ordered by % of DB time desc, Statistic name
 
Statistic Time (s) % DB time
----------------------------------- -------------------- ---------
sql execute elapsed time 1,201.1 100.0
DB CPU 127.1 10.6
PL/SQL execution elapsed time 1.7 .1
parse time elapsed 0.0 .0
connection management call elapsed 0.0 .0
Tablespace encryption cpu time 0.0 .0
Tablespace encryption elapsed time 0.0 .0

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
session logical reads 4,685,294 15,565.8 277.0
consistent gets 3,528,610 11,723.0 208.6
blocks decrypted 271 0.9 0.0
blocks encrypted 10 0.0 0.0

Observations

During encryption and decryption, we have contention on 'free buffer waits'. When running a workload that is I/O bound, and with updates, the DBWR cannot keep up when encryption/decryption is running in parallel. Online encryption works like an online datafile move: there is a double write, one to the current file and one to the encrypted one. Only when it is completed are the reads and writes directed to the new file, and the old one is removed.

The statistics 'blocks decrypted' and 'blocks encrypted' are related to reads and writes from an encrypted tablespace.

The Time Model 'Tablespace encryption' statistics show up only when the tablespace is encrypted, or during encryption/decryption. But the time is not so significant: about 1% of DB time. I'm not completely sure how to interpret it and it is not yet documented. From my test, it looks like it measures the overhead of reading from encrypted tablespaces.
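
If you want to look at these counters outside of Statspack, they are exposed in the usual views (a minimal sketch; the statistic names are the ones shown in the reports above):

select stat_name, value from v$sys_time_model where lower(stat_name) like 'tablespace encryption%';
select name, value from v$sysstat where name in ('blocks encrypted','blocks decrypted');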

But for sure, having the tablespaces encrypted is not a big overhead, and online encryption can be useful to avoid a large maintenance window (it can take a few hours to encrypt hundreds of GB), but don't run it at a time when you have a lot of modifications.

 

The article Oracle 12cR2: Online tablespace encryption appeared first on Blog dbi services.

12cR2: Recover nonlogged blocks after NOLOGGING in Data Guard

Sun, 2017-02-26 09:32

You can accept doing NOLOGGING operations for bulk loads or index builds provided that you take a backup just after, and that your recovery plan mentions how to load the data again in case of media recovery. With a standby database, we usually force logging because we want redo to be generated for all operations in order to ship it and apply it on the standby database. 12.2 brings a new solution: do nologging operations, without generating redo, and then ship the blocks to the standby. This is done on the standby by RMAN.

On primary ORCLA

I create the demo table
SQL> create table DEMO tablespace users pctfree 99 as select rownum n from xmltable('1 to 1000');
Table created.

put it in NOLOGGING
SQL> alter table DEMO nologging;
Table altered.

The database is not in force logging:
SQL> select force_logging from v$database;
FORCE_LOGGING
---------------------------------------
NO

Here is a direct-path insert
SQL> insert /*+ append */ into DEMO select rownum n from xmltable('1 to 100000');
100000 rows created.
 
SQL> commit;
Commit complete.

My rows are here:
SQL> select count(*) from DEMO;
COUNT(*)
----------
200000

This is a nologging operation. Media recovery is not possible. The datafile needs backup:
RMAN> report unrecoverable;
using target database control file instead of recovery catalog
Report of files that need backup due to unrecoverable operations
File Type of Backup Required Name
---- ----------------------- -----------------------------------
7 full or incremental /u01/oradata/ORCLA/users01.dbf

On ADG standby ORCLB

In Active Data Guard, I can query the table, but:
SQL> select count(*) from DEMO
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 7, block # 16966)
ORA-01110: data file 7: '/u01/oradata/ORCLB/datafile/o1_mf_users_dbvmwdqc_.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option

The blocks were not replicated because no redo was generated by the primary, and therefore nothing was shipped to and applied on the standby.

Note that this is not identified by RMAN on the standby:
RMAN> report unrecoverable;
using target database control file instead of recovery catalog
Report of files that need backup due to unrecoverable operations
File Type of Backup Required Name
---- ----------------------- -----------------------------------
 
RMAN>

recover nonlogged blocks

If I try some recovery here, I can't because I'm still in apply mode; here is the message I get if I try:
ORA-01153: an incompatible media recovery is active

Let’s stop the apply:
DGMGRL> edit database orclb set state=apply-off;
Succeeded.

In 12.1 I can recover the datafile from the primary with 'recover from service', but in 12.2 there is no need to ship the whole datafile. The list of nonlogged blocks has been shipped to the standby, recorded in the standby controlfile, and we can list them from v$nonlogged_block.
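
Before running the recovery, a quick look at the view shows the affected block ranges (a minimal query; the full column list appears in the switchover section below):

SQL> select file#, block#, blocks from v$nonlogged_block;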

And we can recover them with a simple command: RECOVER DATABASE NONLOGGED BLOCK

RMAN> recover database nonlogged block;
Starting recover at 25-FEB-17
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=58 device type=DISK
 
starting recovery of nonlogged blocks
List of Datafiles
=================
File Status Nonlogged Blocks Blocks Examined Blocks Skipped
---- ------ ---------------- --------------- --------------
1 OK 0 0 104959
2 OK 0 0 18783
3 OK 0 0 62719
4 OK 0 0 8959
7 OK 0 16731 18948
 
Details of nonlogged blocks can be queried from v$nonlogged_block view
 
recovery of nonlogged blocks complete, elapsed time: 00:00:03

Here it is, I can query the table now
SQL> select count(*) from DEMO;
COUNT(*)
----------
200000

I re-enable real-time apply
DGMGRL> edit database orclb set state=apply-on;
Succeeded.

Switchover

Now, what would happen if I do a switchover or a failover between the nologging operation and the nonlogged block recovery?
I did the same on the primary and then:
DGMGRL> switchover to orclb;
Performing switchover NOW, please wait...
Operation requires a connection to database "orclb"
Connecting ...
Connected to "ORCLB"
Connected as SYSDBA.
New primary database "orclb" is opening...
Operation requires start up of instance "ORCLA" on database "orcla"
Starting instance "ORCLA"...
ORACLE instance started.
Database mounted.
Database opened.
Connected to "ORCLA"
Switchover succeeded, new primary is "orclb"

I can query the list of nonlogged blocks that was shipped to the standby:

SQL> select * from v$nonlogged_block;
 
FILE# BLOCK# BLOCKS NONLOGGED_START_CHANGE# NONLOGGED
---------- ---------- ---------- ----------------------- ---------
NONLOGGED_END_CHANGE# NONLOGGED RESETLOGS_CHANGE# RESETLOGS
--------------------- --------- ----------------- ---------
OBJECT# REASON CON_ID
---------------------------------------- ------- ----------
7 307 16826 2197748
2197825 1396169 22-FEB-17
74006 UNKNOWN 0

But I cannot recover because the database (the old standby that became primary) is open:
ORA-01126: database must be mounted in this instance and not open in any instance
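The way out at this point would be a maintenance window on the new primary: mount it without opening it and run the nonlogged block recovery, roughly like this (a sketch only; whether the blocks can really be repaired then still depends on a good copy being available somewhere, which is exactly why the recovery should be done right after the load):

SQL> shutdown immediate
SQL> startup mount

RMAN> recover database nonlogged block;

SQL> alter database open;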

So what?

This new feature is acceptable if you recover the nonlogged blocks on the standby just after the nologging operation on the primary. This can be used automatically for data warehouse loads, but also manually when doing a reorganization or an application release that touches the data. Just don’t forget the recovery on the standby to avoid surprises later. It will not reduce the amount of data that is shipped to the standby, because shipping the blocks is roughly the same as shipping the redo for the direct-path writes. But on the primary you have the performance benefit of nologging operations.

 

The article 12cR2: Recover nonlogged blocks after NOLOGGING in Data Guard appeared first on Blog dbi services.

Oracle 12c – Recreating a Controlfile in a Data Guard environment with noresetlogs

Fri, 2017-02-24 04:44

Sometimes you might run into situations where the controlfile does not represent the backups and archivelogs correctly, because of a mismatch between the control_file_record_keep_time and the RMAN retention. The controlfile has non-circular and circular records. Non-circular records are e.g. database information, redo threads, datafiles and so on. These non-circular records don’t age out; however, they can be reused, e.g. when a tablespace is dropped. The circular records are e.g. the log history, archived logs, backupsets, datafile copies and so on. These records can age out. So, when you have a control_file_record_keep_time of 7 days and an RMAN recovery window of 14 days, then you obviously have a mismatch here. In 11gR2, Oracle stores 37 different record types in the control file, which can be checked with:

SELECT type FROM v$controlfile_record_section ORDER BY 1;

12cR1 stores 41 different record types, where AUXILIARY DATAFILE COPY, MULTI INSTANCE REDO APPLY, PDB RECORD and PDBINC RECORD were added. In 12cR2 there are even more. The TABLESPACE KEY HISTORY record type was added, so you end up with 42 different record types in 12cR2.
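To see how close the circular sections are to wrapping around, the same view also exposes usage counters. A simple sketch, restricted to two record types as an example:

SELECT type, records_total, records_used FROM v$controlfile_record_section WHERE type IN ('BACKUP SET','ARCHIVED LOG');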

If RMAN needs to add a new backup set or archive log record to the control file, any records that expired as per the control_file_record_keep_time parameter are overwritten. But coming back to my issue: my controlfile is out of sync with the recovery catalog, and in some situations you can’t correct it anymore, even with delete force commands or the like, and you end up with errors like the following:

ORA-19633: control file record 8857 is out of sync with recovery catalog
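For the record, the kind of cleanup that is usually tried first in that situation (and that did not help in my case) looks roughly like this:

RMAN> crosscheck backup;
RMAN> crosscheck archivelog all;
RMAN> delete force expired backup;
RMAN> delete force obsolete;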

There might be other solutions to fix it, but I want to have a clean control file, so I am recreating it manually. However, I don’t want to open the DB with resetlogs.

The high level steps to get this done are

  • Disable everything that might interfere with your action e.g. Fast Start Failover, Broker and so on
  • Adjust your control_file_record_keep_time to a higher value
  • Create the controlfile to trace
  • Unregister from RMAN catalog
  • Shutdown immediate and re-create the controlfile
  • Re-catalog your backups and archivelogs
  • Re-register into the RMAN catalog

Ok, let’s get started and disable fast start failover first. We don’t want that the observer to kick in and do any nasty stuff during my action.

DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - (*) Physical standby database

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS   (status updated 2 seconds ago)

DGMGRL> disable fast_start failover;
Disabled.

As a next step, I increase the control_file_record_keep_time to a much higher value. The formula is usually CONTROL_FILE_RECORD_KEEP_TIME = retention period + level 0 backup interval + 1. Meaning that with a retention period of 24 days and a weekly level 0 backup, it would be 24+7+1, so at least 32. But I don’t care whether my controlfile is 20MB or 30MB in size, so I set it directly to 72 days.

-- Primary

SQL> alter system set control_file_record_keep_time=72;

System altered.

-- Standby

SQL> alter system set control_file_record_keep_time=72;

System altered.

The next important step is to create a trace of the controlfile, which can be adjusted manually later on, depending on your needs. Beforehand, I specify a tracefile identifier, so that I easily spot my trace file in the DIAG destination.

SQL> alter session set tracefile_identifier='control';

Session altered.

SQL> alter database backup controlfile to trace noresetlogs;

Database altered.

oracle@dbidg01:/u01/app/oracle/diag/rdbms/dbit121_site1/DBIT121/trace/ [DBIT121] ls -rlt | grep control
-rw-r----- 1 oracle oinstall     101 Feb 24 09:10 DBIT121_ora_25050_control.trm
-rw-r----- 1 oracle oinstall    9398 Feb 24 09:10 DBIT121_ora_25050_control.trc

oracle@dbidg01:/u01/app/oracle/diag/rdbms/dbit121_site1/DBIT121/trace/ [DBIT121] mv DBIT121_ora_25050_control.trc /u01/app/oracle/admin/DBIT121/create/recreate_controlfile.sql

Let’s take a look at the control file trace which was created. It contains nearly everything that we need. Some parts might have to be adjusted, and some parts do not work at all or have to be done in a different way, as we will see later. But in general it is a very good starting point to get the job done.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/create/ [DBIT121] cat recreate_controlfile.sql
-- The following are current System-scope REDO Log Archival related
-- parameters and can be included in the database initialization file.
--
-- LOG_ARCHIVE_DEST=''
-- LOG_ARCHIVE_DUPLEX_DEST=''
--
-- LOG_ARCHIVE_FORMAT=%t_%s_%r.dbf
--
-- DB_UNIQUE_NAME="DBIT121_SITE1"
--
-- LOG_ARCHIVE_CONFIG='SEND, RECEIVE'
-- LOG_ARCHIVE_CONFIG='DG_CONFIG=("DBIT121_SITE2")'
-- LOG_ARCHIVE_MAX_PROCESSES=4
-- STANDBY_FILE_MANAGEMENT=AUTO
-- STANDBY_ARCHIVE_DEST=?/dbs/arch
-- FAL_CLIENT=''
-- FAL_SERVER=DBIT121_SITE2
--
-- LOG_ARCHIVE_DEST_2='SERVICE=DBIT121_SITE2'
-- LOG_ARCHIVE_DEST_2='OPTIONAL REOPEN=300 NODELAY'
-- LOG_ARCHIVE_DEST_2='LGWR AFFIRM NOVERIFY ASYNC=0'
-- LOG_ARCHIVE_DEST_2='REGISTER NOALTERNATE NODEPENDENCY'
-- LOG_ARCHIVE_DEST_2='NOMAX_FAILURE NOQUOTA_SIZE NOQUOTA_USED'
-- LOG_ARCHIVE_DEST_2='DB_UNIQUE_NAME=DBIT121_SITE2'
-- LOG_ARCHIVE_DEST_2='VALID_FOR=(STANDBY_LOGFILE,ONLINE_LOGFILES)'
-- LOG_ARCHIVE_DEST_STATE_2=ENABLE
--
-- LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST'
-- LOG_ARCHIVE_DEST_1='OPTIONAL REOPEN=300 NODELAY'
-- LOG_ARCHIVE_DEST_1='ARCH NOAFFIRM NOVERIFY SYNC'
-- LOG_ARCHIVE_DEST_1='REGISTER NOALTERNATE NODEPENDENCY'
-- LOG_ARCHIVE_DEST_1='NOMAX_FAILURE NOQUOTA_SIZE NOQUOTA_USED NODB_UNIQUE_NAME'
-- LOG_ARCHIVE_DEST_1='VALID_FOR=(PRIMARY_ROLE,ONLINE_LOGFILES)'
-- LOG_ARCHIVE_DEST_STATE_1=ENABLE
--
-- The following commands will create a new control file and use it
-- to open the database.
-- Data used by Recovery Manager will be lost.
-- Additional logs may be required for media recovery of offline
-- Use this only if the current versions of all online logs are
-- available.
-- After mounting the created controlfile, the following SQL
-- statement will place the database in the appropriate
-- protection mode:
--  ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "DBIT121" NORESETLOGS FORCE LOGGING ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 (
    '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_1_d4fpnop9_.log',
    '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_1_d4fpnq4o_.log'
  ) SIZE 50M BLOCKSIZE 512,
  GROUP 2 (
    '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_2_d4fpo42k_.log',
    '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_2_d4fpo43q_.log'
  ) SIZE 50M BLOCKSIZE 512,
  GROUP 3 (
    '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_3_d4fppn86_.log',
    '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_3_d4fppngb_.log'
  ) SIZE 50M BLOCKSIZE 512
-- STANDBY LOGFILE
--   GROUP 4 (
--     '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_4_dbx3t840_.log',
--     '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_4_dbx3t89m_.log'
--   ) SIZE 50M BLOCKSIZE 512,
--   GROUP 5 (
--     '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_5_dbx3tj3b_.log',
--     '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_5_dbx3tj8m_.log'
--   ) SIZE 50M BLOCKSIZE 512,
--   GROUP 6 (
--     '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_6_dbx3tp52_.log',
--     '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_6_dbx3tpb4_.log'
--   ) SIZE 50M BLOCKSIZE 512,
--   GROUP 7 (
--     '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_7_dbx3twdq_.log',
--     '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_7_dbx3twkt_.log'
--   ) SIZE 50M BLOCKSIZE 512
DATAFILE
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_system_d4fjt03j_.dbf',
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_sysaux_d4fjrlvs_.dbf',
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_undotbs1_d4fjvtd1_.dbf',
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_example_d4fjz1fz_.dbf',
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_users_d4fjvqb1_.dbf'
CHARACTER SET AL32UTF8
;
-- Configure RMAN configuration record 1
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('RETENTION POLICY','TO RECOVERY WINDOW OF 14 DAYS');
-- Configure RMAN configuration record 2
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('BACKUP OPTIMIZATION','ON');
-- Configure RMAN configuration record 3
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('ARCHIVELOG DELETION POLICY','TO SHIPPED TO ALL STANDBY BACKED UP 1 TIMES TO DISK');
-- Configure RMAN configuration record 4
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DEVICE TYPE','DISK PARALLELISM 2 BACKUP TYPE TO COMPRESSED BACKUPSET');
-- Configure RMAN configuration record 5
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('RMAN OUTPUT','TO KEEP FOR 32 DAYS');
-- Configure RMAN configuration record 6
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DB_UNIQUE_NAME','''DBIT121_SITE1'' CONNECT IDENTIFIER  ''DBIT121_SITE1''');
-- Configure RMAN configuration record 7
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DB_UNIQUE_NAME','''DBIT121_SITE2'' CONNECT IDENTIFIER  ''DBIT121_SITE2''');
-- Configure RMAN configuration record 8
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('CHANNEL','DEVICE TYPE ''SBT_TAPE'' PARMS  ''SBT_LIBRARY=oracle.disksbt,ENV=(BACKUP_DIR=/u99/backup/DBIT121)''');
-- Configure RMAN configuration record 9
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DEVICE TYPE','''SBT_TAPE'' PARALLELISM 2 BACKUP TYPE TO COMPRESSED BACKUPSET');
-- Configure RMAN configuration record 10
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DEFAULT DEVICE TYPE TO','DISK');
-- Commands to re-create incarnation table
-- Below log names MUST be changed to existing filenames on
-- disk. Any one log file from each branch can be used to
-- re-create incarnation records.
-- ALTER DATABASE REGISTER LOGFILE '/u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/o1_mf_1_1_%u_.arc';
-- Recovery is required if any of the datafiles are restored backups,
-- or if the last shutdown was not normal or immediate.
RECOVER DATABASE
-- Block change tracking was enabled, so re-enable it now.
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
USING FILE '/u02/oradata/DBIT121_SITE1/changetracking/o1_mf_dbx3wgqg_.chg' REUSE;
-- All logs need archiving and a log switch is needed.
ALTER SYSTEM ARCHIVE LOG ALL;
-- Database can now be opened normally.
ALTER DATABASE OPEN;
-- Commands to add tempfiles to temporary tablespaces.
-- Online tempfiles have complete space information.
-- Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP ADD TEMPFILE '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_temp_d4fjxn8l_.tmp'
     SIZE 206569472  REUSE AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;
-- End of tempfile additions.
--
--
--
----------------------------------------------------------
-- The following script can be used on the standby database
-- to re-populate entries for a standby controlfile created
-- on the primary and copied to the standby site.
----------------------------------------------------------
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_4_dbx3t840_.log'
 SIZE 50M BLOCKSIZE 512 REUSE;
ALTER DATABASE ADD STANDBY LOGFILE MEMBER '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_4_dbx3t89m_.log'
                                       TO '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_4_dbx3t840_.log';
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_5_dbx3tj3b_.log'
 SIZE 50M BLOCKSIZE 512 REUSE;
ALTER DATABASE ADD STANDBY LOGFILE MEMBER '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_5_dbx3tj8m_.log'
                                       TO '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_5_dbx3tj3b_.log';
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_6_dbx3tp52_.log'
 SIZE 50M BLOCKSIZE 512 REUSE;
ALTER DATABASE ADD STANDBY LOGFILE MEMBER '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_6_dbx3tpb4_.log'
                                       TO '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_6_dbx3tp52_.log';
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_7_dbx3twdq_.log'
 SIZE 50M BLOCKSIZE 512 REUSE;
ALTER DATABASE ADD STANDBY LOGFILE MEMBER '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_7_dbx3twkt_.log'
                                       TO '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_7_dbx3twdq_.log';

I am also stopping the broker to avoid any side effects, and afterwards I unregister the database from the RMAN catalog. I will re-register it later on with the clean entries.

-- primary

SQL> alter system set dg_broker_start=false;

System altered.

oracle@dbidg01:/home/oracle/ [DBIT121] rman target sys/manager catalog rman/rman@rman

Recovery Manager: Release 12.1.0.2.0 - Production on Fri Feb 24 09:16:17 2017

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DBIT121 (DBID=172831209)
connected to recovery catalog database
recovery catalog schema release 12.02.00.01. is newer than RMAN release

RMAN> unregister database;

database name is "DBIT121" and DBID is 172831209

Do you really want to unregister the database (enter YES or NO)? YES
database unregistered from the recovery catalog

RMAN>

The next step is very important. We need to shut down the DB cleanly, either with normal or immediate. Afterwards, I put the current controlfiles aside by renaming them. You never know, it is always good to have another fallback.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>

oracle@dbidg01:/home/oracle/ [DBIT121] cd /u02/oradata/DBIT121_SITE1/controlfile/
oracle@dbidg01:/u02/oradata/DBIT121_SITE1/controlfile/ [DBIT121] mv o1_mf_d4fjws55_.ctl o1_mf_d4fjws55_.ctl.old
oracle@dbidg01:/u02/oradata/DBIT121_SITE1/controlfile/ [DBIT121] cd /u03/fast_recovery_area/DBIT121_SITE1/controlfile/
oracle@dbidg01:/u03/fast_recovery_area/DBIT121_SITE1/controlfile/ [DBIT121] mv o1_mf_d4fjwsgr_.ctl o1_mf_d4fjwsgr_.ctl.old

Now we can startup nomount and recreate our controlfile from scratch. It is very important that you specify REUSE and NORESETLOGS here.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 1325400064 bytes
Fixed Size                  2924112 bytes
Variable Size             436208048 bytes
Database Buffers          872415232 bytes
Redo Buffers               13852672 bytes

SQL> CREATE CONTROLFILE REUSE DATABASE "DBIT121" NORESETLOGS FORCE LOGGING ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 (
    '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_1_d4fpnop9_.log',
    '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_1_d4fpnq4o_.log'
  ) SIZE 50M BLOCKSIZE 512,
  GROUP 2 (
    '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_2_d4fpo42k_.log',
    '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_2_d4fpo43q_.log'
  ) SIZE 50M BLOCKSIZE 512,
  GROUP 3 (
    '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_3_d4fppn86_.log',
    '/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_3_d4fppngb_.log'
 19    ) SIZE 50M BLOCKSIZE 512
DATAFILE
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_system_d4fjt03j_.dbf',
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_sysaux_d4fjrlvs_.dbf',
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_undotbs1_d4fjvtd1_.dbf',
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_example_d4fjz1fz_.dbf',
  '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_users_d4fjvqb1_.dbf'
CHARACTER SET AL32UTF8
 27  ;

Control file created.

SQL>

Now we can configure the RMAN persistent settings like retention and so on.

-- Configure RMAN configuration record 1
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('RETENTION POLICY','TO RECOVERY WINDOW OF 14 DAYS');
-- Configure RMAN configuration record 2
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('BACKUP OPTIMIZATION','ON');
-- Configure RMAN configuration record 3
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('ARCHIVELOG DELETION POLICY','TO SHIPPED TO ALL STANDBY BACKED UP 1 TIMES TO DISK');
-- Configure RMAN configuration record 4
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DEVICE TYPE','DISK PARALLELISM 2 BACKUP TYPE TO COMPRESSED BACKUPSET');
-- Configure RMAN configuration record 5
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('RMAN OUTPUT','TO KEEP FOR 32 DAYS');
-- Configure RMAN configuration record 6
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DB_UNIQUE_NAME','''DBIT121_SITE1'' CONNECT IDENTIFIER  ''DBIT121_SITE1''');
-- Configure RMAN configuration record 7
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DB_UNIQUE_NAME','''DBIT121_SITE2'' CONNECT IDENTIFIER  ''DBIT121_SITE2''');
-- Configure RMAN configuration record 8
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('CHANNEL','DEVICE TYPE ''SBT_TAPE'' PARMS  ''SBT_LIBRARY=oracle.disksbt,ENV=(BACKUP_DIR=/u99/backup/DBIT121)''');
-- Configure RMAN configuration record 9
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DEVICE TYPE','''SBT_TAPE'' PARALLELISM 2 BACKUP TYPE TO COMPRESSED BACKUPSET');
-- Configure RMAN configuration record 10
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('DEFAULT DEVICE TYPE TO','DISK');

The next step is to re-create the incarnation table. This might fail with a recursive SQL error if you use the SQL provided in the trace file. Just use REGISTER PHYSICAL LOGFILE instead of REGISTER LOGFILE and then it works.

-- Commands to re-create incarnation table
-- Below log names MUST be changed to existing filenames on
-- disk. Any one log file from each branch can be used to re-create incarnation records.
-- ALTER DATABASE REGISTER LOGFILE '/u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/o1_mf_1_1_%u_.arc';

SQL> ALTER DATABASE REGISTER LOGFILE '/u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/o1_mf_1_142_dbzv31hq_.arc';
ALTER DATABASE REGISTER LOGFILE '/u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/o1_mf_1_142_dbzv31hq_.arc'
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level


SQL> ALTER DATABASE REGISTER PHYSICAL LOGFILE '/u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/o1_mf_1_142_dbzv31hq_.arc';

Database altered.

Because I have shut down the database cleanly, there is no need to do any recovery, and I can continue to enable the block change tracking file, open the database, and add my tempfile back to the database.

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-00264: no recovery required

SQL> ALTER SYSTEM ARCHIVE LOG ALL;

System altered.

SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '/u02/oradata/DBIT121_SITE1/changetracking/o1_mf_dbx3wgqg_.chg' REUSE;

Database altered.

SQL> ALTER DATABASE OPEN;

Database altered.

SQL>

SQL> ALTER TABLESPACE TEMP ADD TEMPFILE '/u02/oradata/DBIT121_SITE1/datafile/o1_mf_temp_d4fjxn8l_.tmp'
  2  SIZE 206569472  REUSE AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;

Tablespace altered.

Regarding the standby redo logs, the easiest approach is to remove the old ones and simply recreate them afterwards, because you can’t add them back as long as they have Oracle Managed Files names.

SQL> select * from v$standby_log;

no rows selected

SQL> ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_4_dbx3t840_.log' SIZE 50M BLOCKSIZE 512 REUSE;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 '/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_4_dbx3t840_.log' SIZE 50M BLOCKSIZE 512 REUSE
*
ERROR at line 1:
ORA-01276: Cannot add file
/u02/oradata/DBIT121_SITE1/onlinelog/o1_mf_4_dbx3t840_.log.  File has an Oracle
Managed Files file name.

-- delete standby redo logs

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/create/ [DBIT121] cd /u02/oradata/DBIT121_SITE1/onlinelog/
oracle@dbidg01:/u02/oradata/DBIT121_SITE1/onlinelog/ [DBIT121] ls -l
total 358428
-rw-r----- 1 oracle oinstall 52429312 Feb 24 09:42 o1_mf_1_d4fpnop9_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 24 09:42 o1_mf_2_d4fpo42k_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 24 09:47 o1_mf_3_d4fppn86_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 23 09:55 o1_mf_4_dbx3t840_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 23 09:55 o1_mf_5_dbx3tj3b_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 23 09:55 o1_mf_6_dbx3tp52_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 23 09:55 o1_mf_7_dbx3twdq_.log
oracle@dbidg01:/u02/oradata/DBIT121_SITE1/onlinelog/ [DBIT121] rm o1_mf_4_dbx3t840_.log o1_mf_5_dbx3tj3b_.log o1_mf_6_dbx3tp52_.log o1_mf_7_dbx3twdq_.log
oracle@dbidg01:/u02/oradata/DBIT121_SITE1/onlinelog/ [DBIT121] cd /u03/fast_recovery_area/DBIT121_SITE1/onlinelog/
oracle@dbidg01:/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/ [DBIT121] ls -l
total 358428
-rw-r----- 1 oracle oinstall 52429312 Feb 24 09:42 o1_mf_1_d4fpnq4o_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 24 09:42 o1_mf_2_d4fpo43q_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 24 09:47 o1_mf_3_d4fppngb_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 23 09:55 o1_mf_4_dbx3t89m_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 23 09:55 o1_mf_5_dbx3tj8m_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 23 09:55 o1_mf_6_dbx3tpb4_.log
-rw-r----- 1 oracle oinstall 52429312 Feb 23 09:55 o1_mf_7_dbx3twkt_.log
oracle@dbidg01:/u03/fast_recovery_area/DBIT121_SITE1/onlinelog/ [DBIT121] rm o1_mf_4_dbx3t89m_.log o1_mf_5_dbx3tj8m_.log o1_mf_6_dbx3tpb4_.log o1_mf_7_dbx3twkt_.log

-- recreate standby redo logs

SQL> alter database add STANDBY LOGFILE THREAD 1 GROUP 4 SIZE 50M BLOCKSIZE 512;

Database altered.

SQL> alter database add STANDBY LOGFILE THREAD 1 GROUP 5 SIZE 50M BLOCKSIZE 512;

Database altered.

SQL> alter database add STANDBY LOGFILE THREAD 1 GROUP 6 SIZE 50M BLOCKSIZE 512;

Database altered.

SQL> alter database add STANDBY LOGFILE THREAD 1 GROUP 7 SIZE 50M BLOCKSIZE 512;

Database altered.

Don’t forget to enable Flashback as well, if your Data Guard is running in Max Availability mode.

SQL> alter database flashback on;

Database altered.

Now we need to recatalog all our backups and archivelogs again.

oracle@dbidg01:/u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/ [DBIT121] rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Fri Feb 24 09:50:16 2017

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DBIT121 (DBID=172831209)

RMAN> catalog recovery area;

using target database control file instead of recovery catalog
searching for all files in the recovery area

List of Files Unknown to the Database
=====================================
File Name: /u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/o1_mf_1_140_dbzswh06_.arc
File Name: /u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/o1_mf_1_141_dbzsxpv5_.arc
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dbx641px_.flb
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dbx642pf_.flb
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dby398lz_.flb
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dbymcg20_.flb
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dbyyg1r0_.flb
File Name: /u03/fast_recovery_area/DBIT121_SITE1/controlfile/o1_mf_d4fjwsgr_.ctl.old
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_annnn_TAG20170223T090854_dbx64pz6_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_annnn_TAG20170223T090854_dbx64q0b_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_nnndf_TAG20170223T090856_dbx64s0z_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_nnndf_TAG20170223T090856_dbx64s3n_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_nnsnf_TAG20170223T090856_dbx65kmx_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_ncnnf_TAG20170223T090856_dbx65lnt_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_annnn_TAG20170223T090923_dbx65mto_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_annnn_TAG20170224T080806_dbzpypdc_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_annnn_TAG20170224T080806_dbzpypfp_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_annnn_TAG20170224T080806_dbzpysqh_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_nnndf_TAG20170224T080812_dbzpyy2f_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_nnndf_TAG20170224T080812_dbzpyy56_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_nnsnf_TAG20170224T080812_dbzpzqnz_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_ncnnf_TAG20170224T080812_dbzpzqop_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_annnn_TAG20170224T080841_dbzpzskt_.bkp

Do you really want to catalog the above files (enter YES or NO)? YES
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/o1_mf_1_140_dbzswh06_.arc
File Name: /u03/fast_recovery_area/DBIT121_SITE1/archivelog/2017_02_24/o1_mf_1_141_dbzsxpv5_.arc
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_annnn_TAG20170223T090854_dbx64pz6_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_annnn_TAG20170223T090854_dbx64q0b_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_nnndf_TAG20170223T090856_dbx64s0z_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_nnndf_TAG20170223T090856_dbx64s3n_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_nnsnf_TAG20170223T090856_dbx65kmx_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_ncnnf_TAG20170223T090856_dbx65lnt_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_23/o1_mf_annnn_TAG20170223T090923_dbx65mto_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_annnn_TAG20170224T080806_dbzpypdc_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_annnn_TAG20170224T080806_dbzpypfp_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_annnn_TAG20170224T080806_dbzpysqh_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_nnndf_TAG20170224T080812_dbzpyy2f_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_nnndf_TAG20170224T080812_dbzpyy56_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_nnsnf_TAG20170224T080812_dbzpzqnz_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_ncnnf_TAG20170224T080812_dbzpzqop_.bkp
File Name: /u03/fast_recovery_area/DBIT121_SITE1/backupset/2017_02_24/o1_mf_annnn_TAG20170224T080841_dbzpzskt_.bkp

List of Files Which Were Not Cataloged
=======================================
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dbx641px_.flb
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dbx642pf_.flb
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dby398lz_.flb
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dbymcg20_.flb
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: /u03/fast_recovery_area/DBIT121_SITE1/flashback/o1_mf_dbyyg1r0_.flb
  RMAN-07529: Reason: catalog is not supported for this file type
File Name: /u03/fast_recovery_area/DBIT121_SITE1/controlfile/o1_mf_d4fjwsgr_.ctl.old
  RMAN-07519: Reason: Error while cataloging. See alert.log.

List of files in Recovery Area not managed by the database
==========================================================
File Name: /u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_4_dbzwt72f_.log
  RMAN-07527: Reason: File was not created using DB_RECOVERY_FILE_DEST initialization parameter
File Name: /u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_5_dbzwtgl3_.log
  RMAN-07527: Reason: File was not created using DB_RECOVERY_FILE_DEST initialization parameter
File Name: /u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_6_dbzwtn04_.log
  RMAN-07527: Reason: File was not created using DB_RECOVERY_FILE_DEST initialization parameter
File Name: /u03/fast_recovery_area/DBIT121_SITE1/onlinelog/o1_mf_7_dbzwtvc7_.log
  RMAN-07527: Reason: File was not created using DB_RECOVERY_FILE_DEST initialization parameter

number of files not managed by recovery area is 4, totaling 200.00MB

RMAN>

We are nearly done. We just need to enable the Data Guard broker again and enable fast start failover.

SQL> alter system set dg_broker_start=true;

System altered.

SQL> alter system archive log current;

System altered.

DGMGRL> enable fast_start failover;
Enabled.
DGMGRL> show configuration;

Configuration - DBIT121

 Protection Mode: MaxAvailability
 Members:
 DBIT121_SITE1 - Primary database
 DBIT121_SITE2 - (*) Physical standby database

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS (status updated 21 seconds ago)

DGMGRL> validate database 'DBIT121_SITE2';

 Database Role: Physical standby database
 Primary Database: DBIT121_SITE1

 Ready for Switchover: Yes
 Ready for Failover: Yes (Primary Running)

Re-register the database into the RMAN catalog.

oracle@dbidg01:/home/oracle/ [DBIT121] rman target sys/manager catalog rman/rman@rman

Recovery Manager: Release 12.1.0.2.0 - Production on Fri Feb 24 09:57:34 2017

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DBIT121 (DBID=172831209)
connected to recovery catalog database
recovery catalog schema release 12.02.00.01. is newer than RMAN release

RMAN> register database;

database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

RMAN>

Ready. That’s it. In case your standby controlfile is not correct as well (it was not in my case), you can now simply create a new standby controlfile on the primary and move it to the standby, as documented in the MOS note “Steps to recreate a Physical Standby Controlfile” (Doc ID 459411.1).
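In essence, that note boils down to creating a fresh standby controlfile on the primary and copying it over to the standby, something like this (a sketch only; the path is just a placeholder and the exact restart steps, especially with OMF, are described in the note):

SQL> alter database create standby controlfile as '/tmp/DBIT121_standby.ctl';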

Conclusion

As a last resort, it is sometimes necessary to recreate the controlfile manually, but as long as you have all the online redo logs and your datafiles in place, you can do it with noresetlogs. And take care that your control_file_record_keep_time is always higher than your RMAN retention.

 

The article Oracle 12c – Recreating a Controlfile in a Data Guard environment with noresetlogs appeared first on Blog dbi services.

12c Unified Auditing and AUDIT_TRAIL=DB in mixed mode

Thu, 2017-02-23 13:34

Oracle enables some auditing by default, and if you don’t do anything, the tables where it is stored will grow in SYSAUX. Don’t wait for an alert when it is too late. Everything that fills up automatically must also be archived or purged automatically. If not, one day you will have a problem.

Imagine that you have 5 features doing something similar but in a different way, because they were implemented one at a time. You want to stop this and have only 1 unified feature. That’s great. But you are also required to maintain compatibility with the previous versions, which means that you actually implemented a 5+1=6th feature :(

Unified Auditing

This exactly what happens with Unified Auditing. Because of this compatibility requirement, it is declined in two modes:

  • The ‘mixed mode’ that keeps all the compatibility, as in the 5+1 case of my example
  • The ‘pure mode’ that does not take care of the past and is actually the one that unifies everything. The real ‘Unified’ one.

You are in ‘mixed mode’ by default and you see it as if there is nothing new enabled:

SQL> select parameter,value from v$option where parameter='Unified Auditing';
 
PARAMETER VALUE
--------- -----
Unified Auditing FALSE

But there may be something enabled if the old auditing is enabled, because it is actually a mixed mode.

AUDIT_TRAIL=DB

Let me explain. I use the old auditing:

SQL> show parameter audit
NAME TYPE VALUE
---------------------------- ------- --------------------------------
audit_trail string DB

This means that I have the default audits (such as logon, logoff, ALTER/CREATE/DROP/GRANT ANY, and so on).
In addition to that, I enabled the audit of create table:

SQL> audit create table;
Audit succeeded.

I do some of this stuff and I can see the info in the old audit trail:
SQL> select action_name,sql_text from dba_audit_trail;
 
ACTION_NAME SQL_TEXT
----------- --------
CREATE TABLE
LOGON
SELECT
LOGON
LOGOFF

If you are in that case, you probably manage this trail. Our recommendation is either to disable auditing, or to manage the trail.

But once upgraded to 12c, did you think about managing the new unified audit trail?

SQL> select audit_type,unified_audit_policies,action_name,return_code,count(*) from unified_audit_trail group by audit_type,unified_audit_policies,action_name,return_code order by 1,2,3;
AUDIT_TYPE UNIFIED_AUDIT_POLICIES ACTION_NAME RETURN_CODE COUNT(*)
---------- ---------------------- ----------- ----------- --------
Standard ORA_LOGON_FAILURES LOGON 0 2
Standard ORA_LOGON_FAILURES LOGON 1017 1
Standard ORA_SECURECONFIG CREATE ROLE 0 1
Standard ORA_SECURECONFIG DROP ROLE 0 1
Standard EXECUTE 0 1

Even with Unified Auditing set to off, some operations are audited when AUDIT_TRAIL=DB. If you don’t want them you have to disable them:

noaudit policy ORA_SECURECONFIG;
noaudit policy ORA_LOGON_FAILURES;

As you see, in mixed mode the new unified auditing is enabled, and AUDIT_TRAIL is not ignored. This is the mode to use until you have migrated all your policies and audit trail queries to the new one. However, you can see that in mixed mode there is no double auditing but only the new default policies. The old policies are only logged to the old audit trail.
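A quick way to check which unified audit policies are currently enabled is the following query; on a fresh 12c database you should only see the default ones mentioned above:

SQL> select policy_name from audit_unified_enabled_policies;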

But if you don’t use auditing, then you don’t want the mixed mode.

uniaud_on

This is done with an instance shutdown and a relink on Linux, or renaming a DLL on Windows.


SQL> shutdown immediate;
ORACLE instance shut down.
SQL> host ( cd $ORACLE_HOME/rdbms/lib ; make -f ins_rdbms.mk uniaud_on ioracle ORACLE_HOME=$ORACLE_HOME )
/usr/bin/ar d /u01/app/oracle/product/12.2.0/dbhome_1/rdbms/lib/libknlopt.a kzanang.o
/usr/bin/ar cr /u01/app/oracle/product/12.2.0/dbhome_1/rdbms/lib/libknlopt.a /u01/app/oracle/product/12.2.0/dbhome_1/rdbms/lib/kzaiang.o
chmod 755 /u01/app/oracle/product/12.2.0/dbhome_1/bin
 
- Linking Oracle
...

And then you are in ‘pure mode':


SQL> select parameter,value from v$option where parameter='Unified Auditing';
 
PARAMETER VALUE
--------- -----
Unified Auditing TRUE

In that mode, AUDIT_TRAIL is ignored and you will never see new rows in the old AUD$:

SQL> select action_name,sql_text from dba_audit_trail;
 
no rows selected

However, as in the mixed mode, you will have to manage the new audit trail. My best recommendation is to keep it and add a purge job. One day you may want to have a look at unsuccessful logins of the past few days. But you still have the choice to disable the default policies, and then the only things you will see are the operations done on the trail itself:

AUDIT_TYPE UNIFIED_AUDIT_POLICIES ACTION_NAME SQL_TEXT
---------- ---------------------- ----------- --------
Standard EXECUTE BEGIN dbms_audit_mgmt.flush_unified_audit_trail; END;^@
Standard EXECUTE BEGIN dbms_audit_mgmt.clean_audit_trail(audit_trail_type => dbms_audit_mgmt.audi
Standard EXECUTE BEGIN dbms_audit_mgmt.flush_unified_audit_trail; END;^@

The reason is that if a hacker who gained super administrator rights has tried to wipe his traces, then at least this suspicious operation remains.

Test it

To validate this blog post, I’ve tested all scenarios on 12.2.0.1 with the combination of:

  • audit_trail=db or audit_trail=none
  • uniaud_on or uniaud_off
  • audit or noaudit policy for ORA_SECURECONFIG and ORA_LOGON_FAILURES

For each combination, I’ve purged both audit trails (AUD$ and AUD$UNIFIED) and run a few statements that are logged by default or by explicit audit.

So what?

Basically, the recommendation is still the same as before: either disable the audit or schedule a purge. There is no purge by default because auditing is different from logging. When your security policy is to audit some operations, they must not be purged before being archived, or processed.

When you upgrade to 12c:

  1. If you want to manage only the old audit, then you should disable ORA_LOGON_FAILURES and ORA_SECURECONFIG.
  2. If you want to manage both, then add a job to purge the unified audit trail (audit_trail_type=>dbms_audit_mgmt.audit_trail_unified).
  3. If you don’t use the old auditing, then enable the ‘pure mode’. But then, AUDIT_TRAIL=NONE is ignored, so:
  4. If you don’t use the new unified auditing, then disable ORA_LOGON_FAILURES and ORA_SECURECONFIG.
  5. Or use the new unified auditing and set a job to purge it regularly (see the sketch after this list).
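For points 2 and 5, here is a minimal sketch of such a purge job with DBMS_AUDIT_MGMT, assuming you want to keep, say, 31 days of unified audit records (the retention and the job name are only examples):

BEGIN
  -- purge everything older than the last archive timestamp;
  -- this timestamp must be advanced regularly (by your archiving procedure or another job)
  DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(
    audit_trail_type  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_UNIFIED,
    last_archive_time => SYSTIMESTAMP - INTERVAL '31' DAY);
  -- scheduler job that runs the purge every 24 hours
  DBMS_AUDIT_MGMT.CREATE_PURGE_JOB(
    audit_trail_type           => DBMS_AUDIT_MGMT.AUDIT_TRAIL_UNIFIED,
    audit_trail_purge_interval => 24,
    audit_trail_purge_name     => 'PURGE_UNIFIED_AUDIT_TRAIL',
    use_last_arch_timestamp    => TRUE);
END;
/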

And control the growth of SYSAUX:

SQL> select occupant_name,schema_name,occupant_desc,space_usage_kbytes from v$sysaux_occupants where occupant_name like 'AUD%';
 
OCCUPANT_NAME SCHEMA_NAME OCCUPANT_DESC SPACE_USAGE_KBYTES
------------- ----------- ------------- ------------------
AUDSYS AUDSYS AUDSYS schema objects 1280
AUDIT_TABLES SYS DB audit tables 0

SYS ‘DB audit tables’ is the old one, filled in ‘mixed mode’ only. AUDSYS ‘AUDSYS schema objects’ is the new unified one, filled in both modes.

But I have something to add. The default policies do not audit operations that you are supposed to run frequently, so it should take decades before the trail fills hundreds of MB.
If you get something like this for the last day:
SQL> select audit_type,unified_audit_policies,action_name,return_code,count(*)
2 from unified_audit_trail where event_timestamp>sysdate-1
3 group by audit_type,unified_audit_policies,action_name,return_code
4 order by count(*);
AUDIT_TYPE UNIFIED_AUDIT_POLICIES ACTION_NAME RETURN_CODE COUNT(*)
---------- ---------------------- ----------- ----------- --------
Standard AUDIT 0 2
Standard EXECUTE 0 4
Standard ORA_SECURECONFIG CREATE ROLE 0 9268
Standard ORA_LOGON_FAILURES LOGON 1017 348

then the problem is not auditing but an attack, either from a hacker or because of your application design connecting for each execution or running DDL all the time.

 

The article 12c Unified Auditing and AUDIT_TRAIL=DB in mixed mode appeared first on Blog dbi services.

BLOG Technoark 2017 Conference – Blockchain, beyond the bitcoin.

Thu, 2017-02-23 07:15

BLOG Technoark 2017 Conference – Blockchain, beyond the bitcoin.
and
How to apply Blockchain to Enterprise security and data management.

I post a brief personal impression of the TechnoArk Conference 2017 at the TechnoArk center in Sierre (Switzerland).
This TechnoArk Center is based on three essential parts:

• The Ark Accelerator leverages new technologies and guides them to a commercial application. It is also responsible, if necessary, for patenting them.

• The Ark Incubator offers professional coaching to start-up companies. It strengthens the chances of success and promotes their insertion in the economic fabric.

• The Innovation department offers proactive support in the consolidation of your achievements, but especially in the development of new business potential.

Sum-up of this event :

Born with Bitcoin in 2008 and created by Satoshi Nakamoto, Blockchain is a technological concept, not a piece of “software”. Rather, it is a new technology and a new decentralized organization.
A blockchain is at the same time a peer-to-peer network and a distributed database system, which makes the history of transactions between two parties impossible to falsify.
The aim of this event was to show that we can use blockchain in several domains like aeronautics, health, law, energy, eGovernment, gaming and so on.
One presentation per domain was carried out.
Keynotes were organized around case studies, among special interests:
• Fintech: it is seen as a foundation technology, but it will take time before it becomes a standard. There is no real application yet; everything is in beta. Smart contracts can be used: they could allow real-time transactions and immediate reconciliation between traders and sellers.
• eHealth: blockchain can be used with the Hyperledger software for oncology follow-up, as it is a long-term disease that involves multiple people: doctors, pharmacists, hospital executives…
• eGov: Mike Gault from Guardtime explained how the Estonian government already uses blockchain for Enterprise security and data management.
• Energy: Stephan Tual (founder of the startup slock.it) explained how blockchains could be used in the service of renewable energies, notably with regard to billing for the energy consumed.
• Aeronautics: Stephane Cheickh explained how blockchains could be useful in civil aviation, for supervising the use of drones by businesses and individuals, but also for baggage management.
• Other keynotes covered more specific fields (gaming, the legal one, and so on).

What we heard :
Blockchain is more than a Technology; It’s a strategy …
Blockchain is the most disruptive technology I have ever seen…

Technical part:

As it is very difficult to find precise technical information about blockchains, let me give you some interesting elements to keep in mind for future markets and contracts:

Definition:
A blockchain is a ledger of records arranged in data batches called blocks that use cryptographic validation to link themselves together.
Put simply, each block references and identifies the previous block by a hashing function, forming an unbroken chain, hence the name.

Source: Blockchain France ©

 

Blocks in detail:


Source : Blockchain France ©

The operation of a ledger can be described like this:

Source: Blockchain France ©

 

• We can immediately see that the blockchain ledger doesn’t store the “data” itself, but fingerprints only.
• The second point proves the existence of an object at a precise moment, but not its authenticity.

 

What is important to remember is that a blockchain is characterized by the following features:

Source: Blockchain France ©

• usually contains financial transactions
• is replicated across a number of systems in almost real-time
• usually exists over a peer-to-peer network
• uses cryptography and digital signatures to prove identity, authenticity and enforce read/write access rights
• can be written by certain participants
• can be read by certain participants, maybe a wider audience, and
• has mechanisms to make it hard to change historical records, or at least make it easy to detect when someone is trying to do so

 

Additional information:

A chain of blocks is a kind of independent, transparent and permanent database coexisting in several places and shared by a community. That is why it is sometimes called a Mutual Distributed Ledger (MDL) too.
There is nothing new about MDLs, the origins of which date back to the 1976 Diffie-Hellman research paper New Directions in Cryptography. But for a long time they were considered complicated and not quite safe.
It took the simpler blockchain implementation of Bitcoin to change things. The permanence, security and distributed nature of Bitcoin ensured that it was a currency maintained by a growing community but controlled by absolutely no one and incapable of being manipulated.
Throughout this event, it has been objectively demonstrated that this concept has many advantages.
On the other hand, it is more complicated regarding the Swiss law (and law in general), which is not yet ready to absorb the paradigm shifts induced by this new technology. The impacts on society are also not to be overlooked.

Conclusion:
Today big companies are cautiously launching into the use of this concept because they believe that the first to use blockchains will be ahead of the competition.
And if this concept were to become dangerous, the money invested would at least have served to better understand the danger.

Ludovic HAYE

dbi-services consultant

 

 

The article BLOG Technoark 2017 Conference – Blockchain, beyond the bitcoin. appeared first on Blog dbi services.
