Yann Neuhaus

dbi services technical blog

Vertically scale your PostgreSQL infrastructure with pgpool – 1 – Basic setup and watchdog configuration

Thu, 2017-03-16 12:21

In the past I have written some posts on how you can make your PostgreSQL deployment highly available by using PostgreSQL's streaming replication feature (1, 2). The main issue you'll have to resolve with such a setup is how the application can be made aware of the new master once a failover has happened. You could use EDB Failover Manager (1, 2, 3, 4) for that, because it provides the functionality to move a VIP from one host to another so the application can always connect to the very same IP address no matter where the current master is running (EDB EFM requires a subscription). You could also use Pacemaker and Corosync for that. But, and this is the scope of this post, you can also use pgpool, which is widely known in the PostgreSQL community. When you configure it the right way you can even spread your read operations over all hot standby servers in your configuration while only write operations go to the master. This allows you to vertically scale your PostgreSQL deployment by adding more standby nodes when you need more resources. Let's go …

Starting with a picture is always a good idea, so this is what we want to set up:

pgpool-architecture

We will have two nodes dedicated to pgpool (centos7_pgpool_m1/m2). pgpool will be running in a watchdog configuration so that one node can take over in case the other goes down. pgpool will provide a virtual IP address for the clients to connect to (which fails over to the surviving node in case a node goes down for any reason). In the background there are two nodes which host the PostgreSQL 9.6.2 primary and hot standby instances (centos7_pgpool_1/2). At the very beginning the master is running on centos7_pgpool_1 although that does not really matter once the whole setup is completed.

I will not describe the setup of the PostgreSQL master->standby configuration itself. If you need assistance there, take a look here, here, or search the web; there are many great howtos.

Let's start by installing pgpool on the hosts dedicated to pgpool (centos7_pgpool_m1/m2):

You can download pgpool here. As pgpool requires libpq we'll install the PostgreSQL binaries on the pgpool hosts as well before proceeding with the installation of pgpool. Of course these steps need to be done on both hosts (centos7_pgpool_m1/m2):

[root@centos7_pgpool_m1 ~]$ groupadd postgres
[root@centos7_pgpool_m1 ~]$ useradd -g postgres postgres
[root@centos7_pgpool_m1 ~]$ passwd postgres
[root@centos7_pgpool_m1 ~]$ mkdir -p /u01/app/postgres/software
[root@centos7_pgpool_m1 ~]$ chown -R postgres:postgres /u01/app/postgres
[root@centos7_pgpool_m1 ~]$ su - postgres
[postgres@centos7_pgpool_m1 ~]$ cd /u01/app/postgres/software/
[postgres@centos7_pgpool_m1 software]$ wget https://ftp.postgresql.org/pub/source/v9.6.2/postgresql-9.6.2.tar.bz2
[postgres@centos7_pgpool_m1 software]$ tar -axf postgresql-9.6.2.tar.bz2
[postgres@centos7_pgpool_m1 software]$ cd postgresql-9.6.2
[postgres@centos7_pgpool_m1 postgresql-9.6.2]$ PGHOME=/u01/app/postgres/product/96/db_2
[postgres@centos7_pgpool_m1 postgresql-9.6.2]$ SEGSIZE=2
[postgres@centos7_pgpool_m1 postgresql-9.6.2]$ BLOCKSIZE=8
[postgres@centos7_pgpool_m1 postgresql-9.6.2]$ WALSEGSIZE=16
[postgres@centos7_pgpool_m1 postgresql-9.6.2]$ ./configure --prefix=${PGHOME} \
>             --exec-prefix=${PGHOME} \
>             --bindir=${PGHOME}/bin \
>             --libdir=${PGHOME}/lib \
>             --sysconfdir=${PGHOME}/etc \
>             --includedir=${PGHOME}/include \
>             --datarootdir=${PGHOME}/share \
>             --datadir=${PGHOME}/share \
>             --with-pgport=5432 \
>             --with-perl \
>             --with-python \
>             --with-tcl \
>             --with-openssl \
>             --with-pam \
>             --with-ldap \
>             --with-libxml \
>             --with-libxslt \
>             --with-segsize=${SEGSIZE} \
>             --with-blocksize=${BLOCKSIZE} \
>             --with-wal-segsize=${WALSEGSIZE}  \
>             --with-extra-version=" dbi services build"
[postgres@centos7_pgpool_m1 postgresql-9.6.2]$ make world
[postgres@centos7_pgpool_m1 postgresql-9.6.2]$ make install-world
[postgres@centos7_pgpool_m1 postgresql-9.6.2]$ cd ..
[postgres@centos7_pgpool_m1 software]$ rm -rf postgresql-9.6.2*
### download pgpool
[postgres@centos7_pgpool_m1 software]$ ls
pgpool-II-3.6.1.tar.gz
[postgres@centos7_pgpool_m1 software]$ tar -axf pgpool-II-3.6.1.tar.gz 
[postgres@centos7_pgpool_m1 software]$ cd pgpool-II-3.6.1
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ export PATH=/u01/app/postgres/product/96/db_2/bin/:$PATH
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ ./configure --prefix=/u01/app/postgres/product/pgpool-II
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ make
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ make install
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ cd src/sql/pgpool-recovery/
[postgres@centos7_pgpool_m1 pgpool-recovery]$ make
[postgres@centos7_pgpool_m1 pgpool-recovery]$ make install
[postgres@centos7_pgpool_m1 pgpool-recovery]$ cd ../pgpool-regclass/
[postgres@centos7_pgpool_m1 pgpool-regclass]$ make
[postgres@centos7_pgpool_m1 pgpool-regclass]$ make install

Copy the generated extensions to the PostgreSQL master and standby servers:

[postgres@centos7_pgpool_m1 ~]$ cd /u01/app/postgres/software/pgpool-II-3.6.1
# master node
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool_recovery.control 192.168.22.34:/u01/app/postgres/product/96/db_2/share/extension/pgpool_recovery.control
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool_recovery--1.1.sql 192.168.22.34:/u01/app/postgres/product/96/db_2/share/extension/pgpool_recovery--1.1.sql
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool-recovery.sql 192.168.22.34:/u01/app/postgres/product/96/db_2/share/extension/pgpool-recovery.sql
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/lib/pgpool-recovery.so 192.168.22.34:/u01/app/postgres/product/96/db_2/lib/pgpool-recovery.so
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool_regclass--1.0.sql 192.168.22.34:/u01/app/postgres/product/96/db_2/share/extension/pgpool_regclass--1.0.sql
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool_regclass.control 192.168.22.34:/u01/app/postgres/product/96/db_2/share/extension/pgpool_regclass.control
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/lib/pgpool-regclass.so 192.168.22.34:/u01/app/postgres/product/96/db_2/lib/pgpool-regclass.so
# standby node
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool_recovery.control 192.168.22.35:/u01/app/postgres/product/96/db_2/share/extension/pgpool_recovery.control
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool_recovery--1.1.sql 192.168.22.35:/u01/app/postgres/product/96/db_2/share/extension/pgpool_recovery--1.1.sql
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool-recovery.sql 192.168.22.35:/u01/app/postgres/product/96/db_2/share/extension/pgpool-recovery.sql
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/lib/pgpool-recovery.so 192.168.22.35:/u01/app/postgres/product/96/db_2/lib/pgpool-recovery.so
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool_regclass--1.0.sql 192.168.22.35:/u01/app/postgres/product/96/db_2/share/extension/pgpool_regclass--1.0.sql
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/share/extension/pgpool_regclass.control 192.168.22.35:/u01/app/postgres/product/96/db_2/share/extension/pgpool_regclass.control
[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ scp /u01/app/postgres/product/96/db_2/lib/pgpool-regclass.so 192.168.22.35:/u01/app/postgres/product/96/db_2/lib/pgpool-regclass.so
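
If you prefer not to type all of these scp commands by hand, the same copy can be done with a small loop (just a sketch, using the file list and the two backend IPs from above):

[postgres@centos7_pgpool_m1 pgpool-II-3.6.1]$ for host in 192.168.22.34 192.168.22.35; do
>   for file in share/extension/pgpool_recovery.control \
>               share/extension/pgpool_recovery--1.1.sql \
>               share/extension/pgpool-recovery.sql \
>               share/extension/pgpool_regclass--1.0.sql \
>               share/extension/pgpool_regclass.control \
>               lib/pgpool-recovery.so \
>               lib/pgpool-regclass.so; do
>     scp /u01/app/postgres/product/96/db_2/${file} ${host}:/u01/app/postgres/product/96/db_2/${file}
>   done
> done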

Install the extensions on the master node only (this will be replicated to the standby node automatically as the PostgreSQL instances already operate in hot_standby mode):

postgres@pgpool1:/u01/app/postgres/product/96/db_2/ [PG1] psql template1
psql (9.6.2 dbi services build)
Type "help" for help.

(postgres@[local]:5432) [template1] > create extension pgpool_recovery;
CREATE EXTENSION
(postgres@[local]:5432) [template1] > create extension pgpool_regclass;
CREATE EXTENSION
(postgres@[local]:5432) [template1] > \dx
                                List of installed extensions
      Name       | Version |   Schema   |                    Description                     
-----------------+---------+------------+----------------------------------------------------
 pgpool_recovery | 1.1     | public     | recovery functions for pgpool-II for V3.4 or later
 pgpool_regclass | 1.0     | public     | replacement for regclass
 plpgsql         | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)

Create the pgpool.conf configuration file on both nodes. For node 1 (centos7_pgpool_m1):

echo "listen_addresses = '*'
port = 5432
socket_dir = '/tmp'
pcp_port = 9898
pcp_socket_dir = '/tmp'
backend_hostname0 = '192.168.22.34'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/u02/pgdata/PG1'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.22.35'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/u02/pgdata/PG1'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = off
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off
num_init_children = 32
max_pool = 4
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
log_destination = 'stderr'
print_timestamp = on
log_connections = off
log_hostname = off
log_statement = off
log_per_node_statement = off
log_standby_delay = 'none'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'
debug_level = 0
pid_file_name = '/tmp/pgpool.pid'
logdir = '/tmp/pgpool'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = on
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = off
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 0
sr_check_user = 'postgres'
sr_check_password = ''
delay_threshold = 0
follow_master_command = ''
parallel_mode = off
pgpool2_hostname = 'centos7_pgpool_m2'
system_db_hostname  = 'localhost'
system_db_port = 5432
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgpool'
system_db_password = ''
health_check_period = 20
health_check_timeout = 20
health_check_user = 'postgres'
health_check_password = ''
health_check_max_retries = 0
health_check_retry_delay = 1
failover_command = '/home/postgres/failover.sh %d "%h" %p %D %m %M "%H" %P'
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 10
recovery_user = 'postgres'
recovery_password = ''
recovery_1st_stage_command = 'resync_master.sh'
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
use_watchdog = on
trusted_servers = ''
ping_path = '/usr/bin'
wd_hostname = 'centos7_pgpool_m1'
wd_port = 9000
wd_authkey = ''
other_pgpool_hostname0 = 'centos7_pgpool_m2'
other_pgpool_port0 = 5432
other_wd_port0 = 9000
delegate_IP = '192.168.22.38'
ifconfig_path = '/usr/sbin'
if_up_cmd = 'ifconfig enp0s8:0 inet \$_IP_\$ netmask 255.255.255.0'
if_down_cmd = 'ifconfig enp0s8:0 down'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U \$_IP_\$ -w 1'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'host0_ip1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''
relcache_expire = 0
relcache_size = 256
check_temp_table = on
memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 67108864
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 409600
memqcache_cache_block_size = 1048576
memqcache_oiddir = '/var/log/pgpool/oiddir'
white_memqcache_table_list = ''
black_memqcache_table_list = ''
" > /u01/app/postgres/product/pgpool-II/etc/pgpool.conf

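Note that failover_command points to /home/postgres/failover.sh, a script you have to provide yourself (it is not shipped with pgpool; we'll look at failover handling in more detail in the next post). Just to give an idea, a very minimal sketch could look like this, assuming password-less ssh from the pgpool nodes to the database nodes and promotion via pg_ctl; the positional parameters correspond to the %-placeholders used in failover_command above:

#!/bin/bash
# minimal failover sketch: promote the standby when the primary went down
FAILED_NODE_ID=$1     # %d: id of the failed node
FAILED_HOST=$2        # %h: hostname of the failed node
FAILED_PORT=$3        # %p: port of the failed node
FAILED_PGDATA=$4      # %D: data directory of the failed node
NEW_MASTER_ID=$5      # %m: id of the node that becomes the new master
OLD_MASTER_ID=$6      # %M: id of the old master node
NEW_MASTER_HOST=$7    # %H: hostname of the new master node
OLD_PRIMARY_ID=$8     # %P: id of the old primary node

# only promote when the node that failed was the primary
if [ "${FAILED_NODE_ID}" = "${OLD_PRIMARY_ID}" ]; then
  ssh postgres@${NEW_MASTER_HOST} \
    "/u01/app/postgres/product/96/db_2/bin/pg_ctl -D /u02/pgdata/PG1 promote"
fi
exit 0
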
For node 2 (centos7_pgpool_m2):

echo "listen_addresses = '*'
port = 5432
socket_dir = '/tmp'
pcp_port = 9898
pcp_socket_dir = '/tmp'
backend_hostname0 = '192.168.22.34'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/u02/pgdata/PG1'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.22.35'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/u02/pgdata/PG1'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = off
pool_passwd = 'pool_passwd'
authentication_timeout = 60
ssl = off
num_init_children = 32
max_pool = 4
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
log_destination = 'stderr'
print_timestamp = on
log_connections = off
log_hostname = off
log_statement = off
log_per_node_statement = off
log_standby_delay = 'none'
syslog_facility = 'LOCAL0'
syslog_ident = 'pgpool'
debug_level = 0
pid_file_name = '/tmp/pgpool.pid'
logdir = '/tmp/pgpool'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
replication_mode = off
replicate_select = off
insert_lock = on
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = off
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'nextval,setval'
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 0
sr_check_user = 'postgres'
sr_check_password = ''
delay_threshold = 0
follow_master_command = ''
parallel_mode = off
pgpool2_hostname = 'centos7_pgpool_m2'
system_db_hostname  = 'localhost'
system_db_port = 5432
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgpool'
system_db_password = ''
health_check_period = 20
health_check_timeout = 20
health_check_user = 'postgres'
health_check_password = ''
health_check_max_retries = 0
health_check_retry_delay = 1
failover_command = '/home/postgres/failover.sh %d "%h" %p %D %m %M "%H" %P'
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 10
recovery_user = 'postgres'
recovery_password = ''
recovery_1st_stage_command = 'resync_master.sh'
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
use_watchdog = on
trusted_servers = ''
ping_path = '/usr/bin'
wd_hostname = 'centos7_pgpool_m2'
wd_port = 9000
wd_authkey = ''
other_pgpool_hostname0 = 'centos7_pgpool_m1'
other_pgpool_port0 = 5432
other_wd_port0 = 9000
delegate_IP = '192.168.22.38'
ifconfig_path = '/usr/sbin'
if_up_cmd = 'ifconfig enp0s8:0 inet \$_IP_\$ netmask 255.255.255.0'
if_down_cmd = 'ifconfig enp0s8:0 down'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U \$_IP_\$ -w 1'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'host0_ip1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''
relcache_expire = 0
relcache_size = 256
check_temp_table = on
memory_cache_enabled = off
memqcache_method = 'shmem'
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
memqcache_total_size = 67108864
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_auto_cache_invalidation = on
memqcache_maxcache = 409600
memqcache_cache_block_size = 1048576
memqcache_oiddir = '/var/log/pgpool/oiddir'
white_memqcache_table_list = ''
black_memqcache_table_list = ''
" > /u01/app/postgres/product/pgpool-II/etc/pgpool.conf

To switch the VIP from one host to another, pgpool must be able to bring the virtual interface up and down. You could use sudo for that or set the suid bit on the ifconfig and arping binaries:

[postgres@centos7_pgpool_m1 pgpool-II]$ sudo chmod u+s /usr/sbin/arping
[postgres@centos7_pgpool_m1 pgpool-II]$ sudo chmod u+s /sbin/ifconfig
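
If you prefer the sudo approach instead of the suid bit, one possible sketch (not used in this setup) is to allow the postgres user to run the two binaries without a password and to prefix the commands in pgpool.conf with sudo:

# e.g. in a file created with: visudo -f /etc/sudoers.d/pgpool
postgres ALL=(root) NOPASSWD: /sbin/ifconfig, /usr/sbin/arping

# and in pgpool.conf something like this (the *_path parameters would then
# need to point to the directory that contains the sudo binary, /usr/bin):
if_up_cmd = 'sudo ifconfig enp0s8:0 inet $_IP_$ netmask 255.255.255.0'
if_down_cmd = 'sudo ifconfig enp0s8:0 down'
arping_cmd = 'sudo arping -U $_IP_$ -w 1'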

The other important configuration file for pgpool is the pcp.conf file. This file holds the authentication for pgpool itself and requires a user name and an md5-hashed password. To generate the password you can use the pg_md5 utility which comes with the pgpool installation:

[postgres@centos7_pgpool_m1 ~]$ /u01/app/postgres/product/pgpool-II/bin/pg_md5 --prompt --username postgres
password: 
e8a48653851e28c69d0506508fb27fc5

Once you have the hashed password we can create the pcp.conf file (on both pgpool nodes of course):

[postgres@centos7_pgpool_m1 ~]$ echo "postgres:e8a48653851e28c69d0506508fb27fc5" > /u01/app/postgres/product/pgpool-II/etc/pcp.conf

Before doing anything else we need to allow connections from the pgpool nodes to the database nodes by adjusting the pg_hba.conf file for both PostgreSQL instances. On both nodes:

postgres@pgpool1:/home/postgres/ [PG1] echo "host    all             postgres        192.168.22.36/32         trust" >> /u02/pgdata/PG1/pg_hba.conf
postgres@pgpool1:/home/postgres/ [PG1] echo "host    all             postgres        192.168.22.37/32         trust" >> /u02/pgdata/PG1/pg_hba.conf
postgres@pgpool1:/home/postgres/ [PG1] pg_ctl -D /u02/pgdata/PG1/ reload

Before we start pgpool on both pgpool nodes let's take a look at the important watchdog parameters (shown here for node 2, centos7_pgpool_m2):

ping_path = '/usr/bin'
wd_hostname = 'centos7_pgpool_m2'
wd_port = 9000
wd_authkey = ''
other_pgpool_hostname0 = 'centos7_pgpool_m1'
other_pgpool_port0 = 5432
other_wd_port0 = 9000
delegate_IP = '192.168.22.38'
ifconfig_path = '/usr/sbin'
if_up_cmd = 'ifconfig enp0s8:0 inet $_IP_$ netmask 255.255.255.0'
if_down_cmd = 'ifconfig enp0s8:0 down'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1'

The various *_path variables are obvious: they tell pgpool where to find the binaries for ping, arping and ifconfig (you can also use the ip command instead). The other_*0 variables specify which other host runs a pgpool instance and on which pgpool and watchdog ports it listens; this is essential for the communication between the two pgpool hosts. Then there are the commands to bring the virtual interface up and down (if_up_cmd, if_down_cmd). In addition we need an address for the virtual interface, which is specified by the delegate_IP parameter.
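
As a side note: on systems that no longer ship ifconfig you can express the same with the ip command. This is a sketch I did not test in this setup; it assumes the enp0s8 interface and the /24 network used above, with /usr/sbin being the directory that contains the ip binary:

ifconfig_path = '/usr/sbin'
if_up_cmd = 'ip addr add $_IP_$/24 dev enp0s8 label enp0s8:0'
if_down_cmd = 'ip addr del $_IP_$/24 dev enp0s8'

Let's see if it works and start pgpool on both nodes: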

# node 1
[postgres@centos7_pgpool_m1 ~]$ /u01/app/postgres/product/pgpool-II/bin/pgpool
[postgres@centos7_pgpool_m1 ~]$ 
# node 2
[postgres@centos7_pgpool_m2 ~]$ /u01/app/postgres/product/pgpool-II/bin/pgpool
[postgres@centos7_pgpool_m2 ~]$ 

That does not look bad, as no issues are printed to the screen. When everything went fine we should see a new virtual IP (192.168.22.38) on one of the nodes (node 2 in my case):

[postgres@centos7_pgpool_m2 ~]$ ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:d6:95:ab brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 85216sec preferred_lft 85216sec
    inet6 fe80::a00:27ff:fed6:95ab/64 scope link 
       valid_lft forever preferred_lft forever
3: enp0s8:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:5c:b0:e5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.22.37/24 brd 192.168.22.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet 192.168.22.38/24 brd 192.168.22.255 scope global secondary enp0s8:0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe5c:b0e5/64 scope link tentative dadfailed 
       valid_lft forever preferred_lft forever

When we shut down pgpool on the node where the VIP is currently running, the VIP should switch to the other node automatically. So let's shut down pgpool on the node where it currently runs:

[postgres@centos7_pgpool_m2 ~]$ /u01/app/postgres/product/pgpool-II/bin/pgpool -m fast stop
2017-03-16 17:54:02: pid 2371: LOG:  stop request sent to pgpool. waiting for termination...
.done.

Check the other host for the VIP:

[postgres@centos7_pgpool_m1 ~]$ ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:d6:95:ab brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 85067sec preferred_lft 85067sec
    inet6 fe80::a00:27ff:fed6:95ab/64 scope link 
       valid_lft forever preferred_lft forever
3: enp0s8:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:5c:b0:e5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.22.36/24 brd 192.168.22.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet 192.168.22.38/24 brd 192.168.22.255 scope global secondary enp0s8:0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe5c:b0e5/64 scope link tentative dadfailed 
       valid_lft forever preferred_lft forever

Cool, now we have a VIP the application can connect to, and it switches between the pgpool hosts automatically in case the host it currently runs on experiences an issue or is shut down intentionally. There is a pcp command which shows you more details regarding the watchdog:

[postgres@centos7_pgpool_m1 ~]$ /u01/app/postgres/product/pgpool-II/bin/pcp_watchdog_info 
Password: 
2 YES centos7_pgpool_m1:5432 Linux centos7_pgpool_m1 centos7_pgpool_m1

centos7_pgpool_m1:5432 Linux centos7_pgpool_m1 centos7_pgpool_m1 5432 9000 4 MASTER
centos7_pgpool_m2:5432 Linux centos7_pgpool_m2 centos7_pgpool_m2 5432 9000 7 STANDBY
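
pcp_watchdog_info prompted for the password we hashed into pcp.conf earlier. To avoid the interactive prompt (e.g. when scripting), the pcp utilities can read the password from a ~/.pcppass file, which works like .pgpass. A sketch (replace MY_PCP_PASSWORD with the clear text password you used with pg_md5 above):

[postgres@centos7_pgpool_m1 ~]$ echo "*:9898:postgres:MY_PCP_PASSWORD" > ~/.pcppass
[postgres@centos7_pgpool_m1 ~]$ chmod 600 ~/.pcppass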

As we now have a VIP we should be able to connect to the PostgreSQL backends by connecting to this VIP:

[postgres@centos7_pgpool_m1 ~]$ psql -h 192.168.22.38
psql (9.6.2 dbi services build)
Type "help" for help.

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

Ok, that works as well. What do we see on the PostgreSQL instances? On the master:

(postgres@[local]:5432) [postgres] > select datname,client_addr,client_hostname from pg_stat_activity where client_addr is not null;
 datname  |  client_addr  | client_hostname 
----------+---------------+-----------------
 postgres | 192.168.22.36 | NULL
(1 row)

We see one connection from the first pgpool node. What do we see on the standby?

(postgres@[local]:5432) [postgres] > select datname,client_addr,client_hostname from pg_stat_activity where client_addr is not null;
 datname  |  client_addr  | client_hostname 
----------+---------------+-----------------
 postgres | 192.168.22.36 | NULL
(1 row)

One connection as well. Looks good.

When you connect to the PostgreSQL instances through pgpool there is a SQL-like syntax for displaying pgpool-specific information as well:

postgres=# show pool_nodes;
 node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.22.34 | 5432 | up     | 0.500000  | primary | 1          | true              | 0
 1       | 192.168.22.35 | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

To summarize: we now have a pgpool instance running on two nodes. Only one of these nodes holds the VIP, and the VIP switches to the other host in case there is an issue. From now on client connections can go to the VIP and pgpool will redirect each connection to one of the PostgreSQL nodes (depending on whether it is a write or a pure read operation).

In the next post we'll dig deeper into the pgpool configuration, how you can tell on which instance you actually landed, and how we can instruct pgpool to automatically promote a new master, disconnect the old master and rebuild it as a new standby that follows the new master.

 


Apache JMeter and Cross-Site Request Forgery (CSRF) token management

Thu, 2017-03-16 08:45

Introduction

In today's web technologies a common defensive mechanism against Cross-Site Request Forgery (CSRF) attacks is to use a synchronizer token. This token may be unique for each request and thus blocks us from using a recorded JMeter test session off the shelf.

This blog will describe how this CSRF feature can be handled in JMeter.

How to implement this feature

The solution is to identify and extract the CSRF token from the response data or header, depending on how it has been set.
The site I was load testing with JMeter uses a cookie to set the CSRF token and adds an X-CSRFToken header to the following HTTP requests.

The HTTP Response header contains something like:

Set-Cookie: csrftoken=sTrKh7qgnuKtuNTkbwlyCv45W2sqOaiY; expires=Sun, 21-Jan-2017 11:34:43 GMT; Max-Age=31449600; Path=/

To extract the CSRF token value from the HTTP response header, add a Regular Expression Extractor post processor globally.
This way, if the token value is reset to a new value at some point, the variable will be dynamically updated from the following response.

Now configure it as follows:

Apply to: Main sample only
Field to check: Response Headers
Reference Name: CSRF_TOKEN
Regular Expression: Set-Cookie: csrftoken=(.+?);
Template: $1$

Get the Response Cookie via the Regular Expression Extractor

DynCSRF_Regular_Expression

It is always better to have a user-defined variable attached to the extracted value so that it is kept during the complete load test run.
Select User Defined Variables and add a new variable with the same name as the reference name declared above in the Regular Expression Extractor.

DynCSRF_variable

The next step is to analyse each HTTP request recorded in the scenario and replace the hard-coded value of the X-CSRFToken header with the variable set by the post processor, as shown below:

DynCSRF_HTTP_Header

To avoid having to check every request's HTTP Header Manager as displayed above, which can take some time and might introduce errors, a pre-processor can be used that checks the headers and automatically replaces the hard-coded X-CSRFToken value with the variable set by the post processor. This kind of pre-processor can be time consuming and should be kept as simple as possible. Thus I decided not to check whether the X-CSRFToken header exists in the request and simply remove the header attribute and add the X-CSRFToken header to all requests. This worked fine for the site I was working on.

The pre-processor code used was the following:

import org.apache.jmeter.protocol.http.control.Header;

// remove the recorded, hard coded token header from the request ...
sampler.getHeaderManager().removeHeaderNamed("X-CSRFToken");
// ... and add it again with the value extracted by the post processor
// (CSRF_TOKEN is the reference name declared in the Regular Expression Extractor)
newValue = vars.get("CSRF_TOKEN");
sampler.getHeaderManager().add(new Header("X-CSRFToken", newValue));

DynCSRF_BeasnShell

 


10th CH Exadata Community Meeting

Thu, 2017-03-16 04:55

Yesterday (15th of March) the 10th Switzerland Exadata Community Meeting took place. It was not a meeting only about Exadata, but also about the Oracle Database Appliances, which have become more and more popular. It is interesting how many companies in Switzerland already use Exadatas or ODAs.

A big topic was the Exadata 12.2.1.1.0 software release, which includes over 30 unique software features and enhancements, like better analytics, better transaction processing, better consolidation, improved security, and faster and more robust upgrades, to mention just a few.

Attached are the slides from Gurmeet Goindi, the Master Product Manager for Exadata who presented the session about the Exadata 12.2 software.

http://www.oracle.com/technetwork/database/exadata/exadatasoftware-12211-3521134.pdf

Like in the Exadata Community Meetings before, patching is still one of the hot topics. Oracle puts a lot of effort into making it faster and more robust. Starting with 12.2, parallel firmware upgrades across components such as hard disks, flash, ILOM/BIOS and InfiniBand cards are supported, which can lead to a 5x speed-up in storage server software updates. Besides that, the number of reboots for software updates was reduced by using kexec where possible.
Making Exadata patching faster and more robust is in Oracle's own interest, because Oracle themselves are the biggest Exadata customer in the world. ;-)

Another very interesting session, from the Accenture Enkitec Group, was about the attractiveness of the Oracle Cloud regarding performance and also pricing in some situations.

https://www.accenture.com/t20161013T060358__w__/us-en/_acnmedia/PDF-34/Accenture-Oracle-Cloud-Performance-Test-October2016.pdf

I am very much looking forward to the next Exadata Community Meeting in Switzerland.

Cheers,
William

 


Postgres Barman and DMK

Tue, 2017-03-14 10:21

As PostgreSQL is more and more present in our clients' infrastructures, I wanted to describe the barman installation and configuration. Barman is the backup and recovery tool for PostgreSQL; I configured it using DMK, our tool for infrastructure administrators on Oracle, MySQL and PostgreSQL.

I used two virtual servers running under RedHat Enterprise Linux 7.1, one for the PostgreSQL database server (pg1) and the second for barman (pg2).

First, I install PostgreSQL 9.6 on both servers:

[root@pg1 ~]# wget https://download.postgresql.org/pub/repos/yum/9.6/redhat/
rhel-7-x86_64/pgdg-redhat96-9.6-3.noarch.rpm
--2017-02-06 15:08:05--  https://download.postgresql.org/pub/repos/yum/9.6/redhat
/rhel-7-x86_64/pgdg-redhat96-9.6-3.noarch.rpm
Resolving download.postgresql.org (download.postgresql.org)... 
217.196.149.55, 174.143.35.246, 87.238.57.227, ...
Connecting to download.postgresql.org (download.postgresql.org)|
217.196.149.55|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4816 (4.7K) [application/x-redhat-package-manager]
Saving to: 'pgdg-redhat96-9.6-3.noarch.rpm'
 
100%[======================================>] 4,816       
 
2017-02-06 15:08:05 (2.71 MB/s) - pgdg-redhat96-9.6-3.noarch.rpm saved 
 
[root@pg1 ~]# sudo yum localinstall -y pgdg-redhat96-9.6-3.noarch.rpm
Examining pgdg-redhat96-9.6-3.noarch.rpm: pgdg-redhat96-9.6-3.noarch
Marking pgdg-redhat96-9.6-3.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package pgdg-redhat96.noarch 0:9.6-3 will be installed
--> Finished Dependency Resolution
 
Dependencies Resolved
 
================================================================================
 Package           Arch       Version     Repository                       Size
================================================================================
Installing:
 pgdg-redhat96     noarch     9.6-3       /pgdg-redhat96-9.6-3.noarch     2.7 k
 
Transaction Summary
================================================================================
Install  1 Package
 
Total size: 2.7 k
Installed size: 2.7 k
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : pgdg-redhat96-9.6-3.noarch                                   1/1
  Verifying  : pgdg-redhat96-9.6-3.noarch                                   1/1
 
Installed:
  pgdg-redhat96.noarch 0:9.6-3
 
Complete!

I install barman on the barman server (pg2):

[root@pg2 ~]# sudo yum install barman
pgdg96                                                   | 4.1 kB     00:00
(1/2): pgdg96/7Server/x86_64/group_gz                      |  249 B   00:00
(2/2): pgdg96/7Server/x86_64/primary_db                    | 129 kB   00:02
Resolving Dependencies
--> Running transaction check
---> Package barman.noarch 0:2.1-1.rhel7 will be installed
--> Processing Dependency: python-psycopg2 >= 2.4.2 for package:
barman-2.1-1.rhel7.noarch
--> Processing Dependency: python-argh >= 0.21.2 for package: 
barman-2.1-1.rhel7.noarch
--> Processing Dependency: python-dateutil for package: 
barman-2.1-1.rhel7.noarch
--> Processing Dependency: python-argcomplete for package: 
barman-2.1-1.rhel7.noarch
--> Running transaction check
---> Package python-argcomplete.noarch 0:0.3.7-1.rhel7 will be installed
---> Package python-argh.noarch 0:0.23.0-1.rhel7 will be installed
---> Package python-dateutil.noarch 1:2.5.3-3.rhel7 will be installed
--> Processing Dependency: python-six for package: 1:
python-dateutil-2.5.3-3.rhel7.noarch
---> Package python-psycopg2.x86_64 0:2.6.2-3.rhel7 will be installed
--> Processing Dependency: postgresql96-libs for package: 
python-psycopg2-2.6.2-3.rhel7.x86_64
--> Running transaction check
---> Package postgresql96-libs.x86_64 0:9.6.1-1PGDG.rhel7 will be installed
---> Package python-six.noarch 0:1.9.0-2.el7 will be installed
--> Finished Dependency Resolution
 
Dependencies Resolved
 
================================================================================
 Package                Arch       Version                 Repository      Size
================================================================================
Installing:
 barman                 noarch     2.1-1.rhel7             pgdg96         248 k
Installing for dependencies:
 postgresql96-libs      x86_64     9.6.1-1PGDG.rhel7       pgdg96         308 k
 python-argcomplete     noarch     0.3.7-1.rhel7           pgdg96          23 k
 python-argh            noarch     0.23.0-1.rhel7          pgdg96          33 k
 python-dateutil        noarch     1:2.5.3-3.rhel7         pgdg96         241 k
 python-psycopg2        x86_64     2.6.2-3.rhel7           pgdg96         131 k
 python-six             noarch     1.9.0-2.el7             ol7_latest      28 k
 
Transaction Summary
================================================================================
Install  1 Package (+6 Dependent packages)
 
Total download size: 1.0 M
Installed size: 3.6 M
Is this ok [y/d/N]: y
Downloading packages:
(1/7): barman-2.1-1.rhel7.noarch.rpm                       | 248 kB   00:03
(2/7): python-argcomplete-0.3.7-1.rhel7.noarch.rpm         |  23 kB   00:00
(3/7): python-argh-0.23.0-1.rhel7.noarch.rpm               |  33 kB   00:00
(4/7): postgresql96-libs-9.6.1-1PGDG.rhel7.x86_64.rpm      | 308 kB   00:04
(5/7): python-six-1.9.0-2.el7.noarch.rpm                   |  28 kB   00:00
(6/7): python-dateutil-2.5.3-3.rhel7.noarch.rpm            | 241 kB   00:01
(7/7): python-psycopg2-2.6.2-3.rhel7.x86_64.rpm            | 131 kB   00:01
--------------------------------------------------------------------------------
Total                                              163 kB/s | 1.0 MB  00:06
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : python-argh-0.23.0-1.rhel7.noarch                            1/7
  Installing : postgresql96-libs-9.6.1-1PGDG.rhel7.x86_64                   2/7
  Installing : python-psycopg2-2.6.2-3.rhel7.x86_64                         3/7
  Installing : python-argcomplete-0.3.7-1.rhel7.noarch                      4/7
  Installing : python-six-1.9.0-2.el7.noarch                                5/7
  Installing : 1:python-dateutil-2.5.3-3.rhel7.noarch                       6/7
  Installing : barman-2.1-1.rhel7.noarch                                    7/7
  Verifying  : python-psycopg2-2.6.2-3.rhel7.x86_64                         1/7
  Verifying  : python-six-1.9.0-2.el7.noarch                                2/7
  Verifying  : python-argcomplete-0.3.7-1.rhel7.noarch                      3/7
  Verifying  : postgresql96-libs-9.6.1-1PGDG.rhel7.x86_64                   4/7
  Verifying  : python-argh-0.23.0-1.rhel7.noarch                            5/7
  Verifying  : barman-2.1-1.rhel7.noarch                                    6/7
  Verifying  : 1:python-dateutil-2.5.3-3.rhel7.noarch                       7/7
 
Installed:
  barman.noarch 0:2.1-1.rhel7
 
Dependency Installed:
  postgresql96-libs.x86_64 0:9.6.1-1PGDG.rhel7
  python-argcomplete.noarch 0:0.3.7-1.rhel7
  python-argh.noarch 0:0.23.0-1.rhel7
  python-dateutil.noarch 1:2.5.3-3.rhel7
  python-psycopg2.x86_64 0:2.6.2-3.rhel7
  python-six.noarch 0:1.9.0-2.el7
Complete!

Everything is installed on both servers :

– PostgreSQL 9.6

– DMK last version

– barman

Now we configure as follows:

The barman server is pg2 : 192.168.1.101

The database server is pg1 : 192.168.1.100

 

On the database server, we create a barman user:

postgres@:5432) [postgres] > create user barman superuser login encrypted password 
'barman';
CREATE ROLE

And a barman_streaming user:

postgres@: [postgres] > create user barman_streaming replication encrypted password 
'barman';
CREATE ROLE

We modify the following parameters: max_replication_slots (which specifies the maximum number of replication slots the server can support) and max_wal_senders (which specifies the maximum number of simultaneously running WAL sender processes):

postgres@:5432) [postgres] > alter system set max_replication_slots=10;
ALTER SYSTEM
postgres@:5432) [postgres] > alter system set max_wal_senders=10;
ALTER SYSTEM

As those parameters have been modified, we need to restart the database. We use pgrestart, which is a DMK alias for pg_ctl -D ${PGDATA} restart -m fast:

postgres@pg1:/home/postgres/ [PG1] pgrestart
waiting for server to shut down.... done
server stopped
server starting
postgres@pg1:/home/postgres/ [PG1] 2017-02-06 15:59:14.756 CET - 1 - 17008 -  
- @ LOG:  redirecting log output to logging collector process
2017-02-06 15:59:14.756 CET - 2 - 17008 -  - 
@ HINT:  Future log output will appear in directory 
"/u01/app/postgres/admin/PG1/pg_log".

We modify the pg_hba.conf on the database server in order to allow connections from the barman server as follows:

host    all             barman          192.168.1.101/24       md5
host    replication     barman_streaming 192.168.1.101/24      md5

We modify the .pgpass file on the barman server in order not to be asked for passwords:

postgres@pg2:/home/postgres/ [pg96] cat .pgpass
*:*:*:postgres:postgres
192.168.1.100:*:*:barman:barman
192.168.1.100:*:*:barman_streaming:barman
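
Keep in mind that libpq ignores a .pgpass file that is readable by group or others, so make sure the permissions are restrictive:

postgres@pg2:/home/postgres/ [pg96] chmod 600 ~/.pgpass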

Finally we test the connection from the barman server to the database server:

postgres@pg2:/home/postgres/ [pg96] psql -c 'select version()'
 -U barman -h 192.168.1.100 -p 5432 postgres
                                                 version
 
--------------------------------------------------------------------------------

 PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (
Red Hat 4.8.5-11), 64-bit
(1 row)
postgres@pg2:/home/postgres/ [pg96] psql -U barman_streaming -h 192.168.1.100 
-p 5432 -c "IDENTIFY_SYSTEM" replication=1
      systemid       | timeline |  xlogpos  | dbname
---------------------+----------+-----------+--------
 6384063115439945376 |        1 | 0/F0006F0 |
(1 row)

Now it's time to create a configuration file pg96.conf in $DMK_HOME/etc/barman.d on the barman server:

[pg96]
description =  "PostgreSQL 9.6 server"
conninfo = host=192.168.1.100 port=5432 user=barman dbname=postgres
backup_method = postgres
streaming_conninfo = host=192.168.1.100 port=5432 user=barman_streaming 
dbname=postgres
streaming_wals_directory = /u99/received_wal
streaming_archiver = on
slot_name = barman

We create a barman.conf file in $DMK_HOME/etc as follows, mainly defining the barman_user, the configuration file directory and the barman backup home, the barman lock directory and the log directory:

postgres@pg2:/u01/app/postgres/local/dmk/etc/ [pg96] cat barman.conf
; Barman, Backup and Recovery Manager for PostgreSQL
; http://www.pgbarman.org/ - http://www.2ndQuadrant.com/
;
; Main configuration file
 
[barman]
; System user
barman_user = postgres
 
; Directory of configuration files. Place your sections in separate files 
with .conf extension
; For example place the 'main' server section in /etc/barman.d/main.conf
configuration_files_directory = /u01/app/postgres/local/dmk/etc/barman.d
 
; Main directory
barman_home = /u99/backup
 
; Locks directory - default: %(barman_home)s
barman_lock_directory = /u01/app/postgres/local/dmk/etc/
 
; Log location
log_file = /u01/app/postgres/local/dmk/log/barman.log
 
; Log level (see https://docs.python.org/3/library/logging.html#levels)
log_level = DEBUG
 
; Default compression level: possible values are None (default), 
bzip2, gzip, pigz, pygzip or pybzip2
compression = gzip
 
; Pre/post backup hook scripts
;pre_backup_script = env | grep ^BARMAN
;pre_backup_retry_script = env | grep ^BARMAN
;post_backup_retry_script = env | grep ^BARMAN
;post_backup_script = env | grep ^BARMAN
 
; Pre/post archive hook scripts
;pre_archive_script = env | grep ^BARMAN
;pre_archive_retry_script = env | grep ^BARMAN
;post_archive_retry_script = env | grep ^BARMAN
;post_archive_script = env | grep ^BARMAN
 
; Global retention policy (REDUNDANCY or RECOVERY WINDOW) - default empty
retention_policy = RECOVERY WINDOW OF 4 WEEKS
 
; Global bandwidth limit in KBPS - default 0 (meaning no limit)
;bandwidth_limit = 4000
 
; Immediate checkpoint for backup command - default false
;immediate_checkpoint = false
 
; Enable network compression for data transfers - default false
;network_compression = false
 
; Number of retries of data copy during base backup after an error - default 0
;basebackup_retry_times = 0
 
; Number of seconds of wait after a failed copy, before retrying - default 30
;basebackup_retry_sleep = 30
 
; Maximum execution time, in seconds, per server
; for a barman check command - default 30
;check_timeout = 30
 
; Time frame that must contain the latest backup date.
; If the latest backup is older than the time frame, barman check
; command will report an error to the user.
; If empty, the latest backup is always considered valid.
; Syntax for this option is: "i (DAYS | WEEKS | MONTHS)" where i is an
; integer > 0 which identifies the number of days | weeks | months of
; validity of the latest backup for this check. Also known as 'smelly backup'.
;last_backup_maximum_age =
 
; Minimum number of required backups (redundancy)
;minimum_redundancy = 1

 

In order to enable streaming of transaction logs and to use replication slots, we run the following command on the barman server:

postgres@pg2:/u01/app/postgres/local/dmk/etc/ [pg96] barman receive-wal 
--create-slot pg96
Creating physical replication slot 'barman' on server 'pg96'
Replication slot 'barman' created

Then we can test:

We can force a log switch on the database server:

postgres@pg2:/u01/app/postgres/local/dmk/etc/ [pg96] barman switch-xlog 
--force pg96
The xlog file 00000001000000000000000F has been closed on server 'pg96'

 

We start receive wal:

postgres@pg2:/u99/received_wal/ [pg96] barman -c 
/u01/app/postgres/local/dmk/etc/barman.conf receive-wal pg96
Starting receive-wal for server pg96
pg96: pg_receivexlog: starting log streaming at 0/68000000 (timeline 3)
pg96: pg_receivexlog: finished segment at 0/69000000 (timeline 3)
pg96: pg_receivexlog: finished segment at 0/6A000000 (timeline 3)
pg96: pg_receivexlog: finished segment at 0/6B000000 (timeline 3)
pg96: pg_receivexlog: finished segment at 0/6C000000 (timeline 3)
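
Running receive-wal in the foreground like this is fine for a first test; in a real setup you would rather let barman start and supervise this process through its cron command, typically scheduled every minute for the barman operating system user. A sketch of a crontab entry, assuming the DMK configuration file location used above:

* * * * * /usr/bin/barman -c /u01/app/postgres/local/dmk/etc/barman.conf -q cron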

 

We can check the barman configuration:

postgres@pg2:/u99/restore_test/ [pg96] barman check pg96
Server pg96:
                    PostgreSQL: OK
                    superuser: OK
                    PostgreSQL streaming: OK
                    wal_level: OK
                    replication slot: OK
                    directories: OK
                    retention policy settings: OK
                    backup maximum age: OK (no last_backup_maximum_age provided)
                    compression settings: OK
                    failed backups: FAILED (there are 1 failed backups)
                    minimum redundancy requirements: OK (have 3 backups, 
                    expected at least 0)
                    pg_basebackup: OK
                    pg_basebackup compatible: OK
                    pg_basebackup supports tablespaces mapping: OK
                    pg_receivexlog: OK
                    pg_receivexlog compatible: OK
                    receive-wal running: OK
                    archiver errors: OK

We can run a barman archive-wal command:

postgres@pg2:/home/postgres/ [pg96] barman archive-wal pg96
Processing xlog segments from streaming for pg96
                    00000003.history
                    000000030000000000000067
                    000000030000000000000068

And finally you can run a backup with the command:

postgres@pg2:/home/postgres/ [pg96] barman backup pg96
Starting backup using postgres method for server pg96 in 
/u99/backup/pg96/base/20170214T103226
Backup start at xlog location: 0/69000060 (000000030000000000000069, 00000060)
Copying files.
Copy done.
Finalising the backup.
Backup size: 60.1 MiB
Backup end at xlog location: 0/6B000000 (00000003000000000000006A, 00000000)
Backup completed
Processing xlog segments from streaming for pg96
                    000000030000000000000069

We can list the backups:

postgres@pg2:/u02/pgdata/ [pg96] barman list-backup pg96
pg96 20170214T103226 - Tue Feb 14 09:32:27 2017 - Size: 60.2 MiB - WAL Size: 0 B 
(tablespaces: tab1:/u02/pgdata/PG1/mytab)
pg96 20170207T061338 - Tue Feb  7 06:19:38 2017 - Size: 29.0 MiB - WAL Size: 0 B
pg96 20170207T060633 - Tue Feb  7 06:12:33 2017 - Size: 29.0 MiB - WAL Size: 0 B

 

We have the possibility to test a restore, for example on the barman server:

postgres@pg2:/u02/pgdata/ [pg96] barman recover pg96 20170214T103226 
/u99/restore_test/
Starting local restore for server pg96 using backup 20170214T103226
Destination directory: /u99/restore_test/
                    24648, tab1, /u02/pgdata/PG1/mytab
Copying the base backup.
Copying required WAL segments.
Generating archive status files
Identify dangerous settings in destination directory.
 
IMPORTANT
These settings have been modified to prevent data losses
 
postgresql.conf line 71: archive_command = false
postgresql.auto.conf line 4: archive_command = false

Your PostgreSQL server has been successfully prepared for recovery; the /u99/restore_test directory contains:

postgres@pg2:/u99/restore_test/ [pg96] ll

total 64
-rw-------  1 postgres postgres  208 Feb 14 10:32 backup_label
-rw-------  1 postgres postgres  207 Feb 14 10:32 backup_label.old
drwx------ 10 postgres postgres   98 Feb 14 10:32 base
drwx------  2 postgres postgres 4096 Feb 14 10:32 global
drwx------  2 postgres postgres    6 Feb 14 10:32 mytab
drwx------  2 postgres postgres   17 Feb 14 10:32 pg_clog
drwx------  2 postgres postgres    6 Feb 14 10:32 pg_commit_ts
drwx------  2 postgres postgres    6 Feb 14 10:32 pg_dynshmem
-rw-------  1 postgres postgres 4416 Feb 14 10:32 pg_hba.conf
-rw-------  1 postgres postgres 4211 Feb 14 10:32 pg_hba.conf_conf
-rw-------  1 postgres postgres 1636 Feb 14 10:32 pg_ident.conf
drwx------  4 postgres postgres   65 Feb 14 10:32 pg_logical
drwx------  4 postgres postgres   34 Feb 14 10:32 pg_multixact
drwx------  2 postgres postgres   17 Feb 14 10:32 pg_notify
drwx------  2 postgres postgres    6 Feb 14 10:32 pg_replslot
drwx------  2 postgres postgres    6 Feb 14 10:32 pg_serial
drwx------  2 postgres postgres    6 Feb 14 10:32 pg_snapshots
drwx------  2 postgres postgres    6 Feb 14 10:32 pg_stat
drwx------  2 postgres postgres    6 Feb 14 10:32 pg_stat_tmp
drwx------  2 postgres postgres   17 Feb 14 10:32 pg_subtrans
drwx------  2 postgres postgres   18 Feb 14 10:32 pg_tblspc
drwx------  2 postgres postgres    6 Feb 14 10:32 pg_twophase
-rw-------  1 postgres postgres    4 Feb 14 10:32 PG_VERSION
drwx------  3 postgres postgres   81 Feb 14 10:39 pg_xlog
-rw-------  1 postgres postgres  391 Feb 14 10:39 postgresql.auto.conf
-rw-------  1 postgres postgres  358 Feb 14 10:32 postgresql.auto.conf.origin
-rw-------  1 postgres postgres 7144 Feb 14 10:39 postgresql.conf
-rw-------  1 postgres postgres 7111 Feb 14 10:32 postgresql.conf.origin
-rw-------  1 postgres postgres   56 Feb 14 10:32 recovery.done

If you need to restore your backup onto the original pg1 database server, you have to use the --remote-ssh-command option as follows (you specify the host where you want to restore and the target PGDATA directory):

postgres@pg2:/home/postgres/.ssh/ [pg96] barman recover --remote-ssh-command "ssh postgres@pg1" pg96 20170214T103226 /u02/pgdata/PG1
Starting remote restore for server pg96 using backup 20170214T103226
Destination directory: /u02/pgdata/PG1
       24648, tab1, /u02/pgdata/PG1/mytab
Copying the base backup.
Copying required WAL segments.
Generating archive status files
Identify dangerous settings in destination directory.
 
IMPORTANT
These settings have been modified to prevent data losses
 
postgresql.conf line 71: archive_command = false
postgresql.auto.conf line 4: archive_command = false
 
Your PostgreSQL server has been successfully prepared for recovery!
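
Note that this remote restore relies on password-less ssh from the barman server to pg1 for the postgres user. If that is not already in place, a quick sketch to set it up:

postgres@pg2:/home/postgres/ [pg96] ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
postgres@pg2:/home/postgres/ [pg96] ssh-copy-id postgres@pg1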

You also have the possibility to perform a point-in-time recovery.

In my PG1 database I create a table employes and insert some data:

postgres@[local]:5432) [blubb] > create table employes (name varchar(10));
CREATE TABLE
(postgres@[local]:5432) [blubb] > insert into employes values ('fiona');
INSERT 0 1
(postgres@[local]:5432) [blubb] > insert into employes values ('cathy');
INSERT 0 1
(postgres@[local]:5432) [blubb] > insert into employes values ('helene');
INSERT 0 1
(postgres@[local]:5432) [blubb] > select * from employes;
  name  
--------
 fiona
 cathy
 helene

A few minutes later I insert some more records in the employes table:

postgres@[local]:5432) [blubb] > insert into employes values ('larry');
INSERT 0 1
(postgres@[local]:5432) [blubb] > insert into employes values ('bill');
INSERT 0 1
(postgres@[local]:5432) [blubb] > insert into employes values ('steve');
INSERT 0 1
(postgres@[local]:5432) [blubb] > select * from employes;
  name  
--------
 fiona
 cathy
 helene
 larry
 bill
 steve

The first rows were created at 15:15; let's see if the barman PITR restore works correctly:

I stop the PG1 database :

postgres@pg1:/u02/pgdata/ [PG1] pgstop
waiting for server to shut down....... done
server stopped

I delete the PGDATA directory:

postgres@pg1:/u02/pgdata/ [PG1] rm -rf PG1

And from the barman server I run the PITR recovery command using the --target-time argument:

postgres@pg2:/home/postgres/ [pg96] barman recover --remote-ssh-command "ssh postgres@pg1" pg96 
--target-time "2017-02-14 15:15:48"  20170214T141055 /u02/pgdata/PG1 
Starting remote restore for server pg96 using backup 20170214T141055
Destination directory: /u02/pgdata/PG1
Doing PITR. Recovery target time: '2017-02-14 15:15:48'
       24648, tab1, /u02/pgdata/PG1/mytab
Copying the base backup.
Copying required WAL segments.
Generating recovery.conf
Identify dangerous settings in destination directory.
 
IMPORTANT
These settings have been modified to prevent data losses
 
postgresql.conf line 72: archive_command = false
postgresql.auto.conf line 4: archive_command = false
 
Your PostgreSQL server has been successfully prepared for recovery!

I restart my PG1 database and the data are correctly restored, to the point just before Larry, Bill and Steve were inserted into the employes table:

postgres@[local]:5432) [blubb] > select * from employes;
  name  
--------
 fiona
 cathy
 helene
(3 rows)

 

 


Oracle 12.2 and Transparent Data Encryption

Tue, 2017-03-14 10:20

Since the new Oracle 12.2.0 version has been released, I decided to test Transparent Data Encryption, as new features are available. The following tests have been made in a multitenant environment: a container database DB1 and two pluggable databases DB1PDB1 and DB1PDB2.

The first step consists in creating a software keystore. A software keystore is a container that stores the Transparent Data Encryption master encryption key. We define its location in the sqlnet.ora file:

In the sqlnet.ora file, we have to define the ENCRYPTION_WALLET_LOCATION parameter:

ENCRYPTION_WALLET_LOCATION=
 (SOURCE=
  (METHOD=FILE)
   (METHOD_DATA=
    (DIRECTORY=/u00/app/oracle/local/wallet)))
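
The directory referenced by ENCRYPTION_WALLET_LOCATION has to exist before the keystore can be created, so create it first if it is not already there (a quick sketch):

oracle@localhost:/home/oracle/ [db1] mkdir -p /u00/app/oracle/local/wallet
oracle@localhost:/home/oracle/ [db1] chmod 700 /u00/app/oracle/local/wallet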

We can verify in the view:

SQL> select * from v$encryption_wallet;

WRL_TYPE  WRL_PARAMETER                  STATUS	WALLET_TYPE	WALLET_OR   FULLY_BAC      CON_ID

FILE    /u00/app/oracle/local/wallet/     NOT_AVAILABLE		UNKNOWN      SINGLE       UNDEFINED

Then we create the software keystore using sqlplus. We must be connected with a user with the ADMINISTER KEY MANAGEMENT or SYSKM privilege:

SQL> connect c##sec_admin as syskm
Enter password: 
Connected.

SQL> administer key management create keystore '/u00/app/oracle/local/wallet' identified by manager; 

keystore altered.

Once the keystore is created the ewallet.p12 is generated in the keystore file location:

oracle@localhost:/u00/app/oracle/local/wallet/ [db1] ls
afiedt.buf  ewallet.p12

Therefore, depending on the type of keystore we have created, we must open the keystore manually. We can check in the v$encryption_wallet view whether the keystore is open.

If not, you have to run the following command:

oracle@localhost:/u00/app/oracle/local/wallet/ [db1] sqlplus c##sec_admin as syskm

SQL*Plus: Release 12.2.0.1.0 Production on Mon Mar 13 11:59:47 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Enter password: 

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> administer key management set keystore open identified by manager container = ALL;

keystore altered.

If we query the view:

SQL> select * from v$encryption_wallet;

WRL_TYPE.    WRL_PARAMETER                STATUS             WALLET_TYPE  WALLET_OR   FULLY_BAC   CON_ID

FILE     /u00/app/oracle/local/wallet/  OPEN_NO_MASTER_KEY    PASSWORD 	    SINGLE    UNDEFINED

Now that the keystore is open we must set the software TDE master encryption key. As we are in a multitenant environment, we have to specify CONTAINER=ALL in order to set the key in all the PDBs:

SQL> administer key management set keystore close identified by manager;

keystore altered.

SQL> administer key management set keystore open identified by manager  container =all;

keystore altered.

SQL> administer key management set key identified by manager with backup using 'kex_backup' container =ALL;

keystore altered.

Now the v$encryption_wallet view is up to date:

SQL> select * from v$encryption_wallet;

WRL_TYPE   WRL_PARAMETER.               STATUS  WALLET_TYPE	    WALLET_OR FULLY_BAC   CON_ID

FILE.   /u00/app/oracle/local/wallet/.   OPEN	 PASSWORD 	    SINGLE      NO          1

When you start up your CDB and your PDBs, you must do things in the right order:

You shut down and start up the database:

oracle@localhost:/u00/app/oracle/admin/db1/ [db1] sq

SQL*Plus: Release 12.2.0.1.0 Production on Tue Mar 14 13:53:09 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 3405774848 bytes
Fixed Size		    8798456 bytes
Variable Size		  805310216 bytes
Database Buffers	 2583691264 bytes
Redo Buffers		    7974912 bytes
Database mounted.
Database opened.

You open the wallet:

SQL> administer key management set keystore open identified by manager container = all;

keystore altered.

The pluggable databases are not yet opened:

SQL> connect sys/manager@db1pdb1
ERROR:
ORA-01033: ORACLE initialization or shutdown in progress
Process ID: 0
Session ID: 0 Serial number: 0


Warning: You are no longer connected to ORACLE.

You start the pluggable databases:

SQL> connect / as sysdba
Connected.
SQL> alter pluggable database all open;

Pluggable database altered.

The wallet is closed on the pluggable databases:

SQL> connect sys/manager@db1pdb1 as sysdba
Connected.
SQL> select status from v$encryption_wallet;

STATUS
------------------------------
CLOSED

You first have to close the wallet then to open it again:

SQL> connect / as sysdba
Connected.
SQL> administer key management set keystore open identified by manager container = all;
administer key management set keystore open identified by manager container = all
*
ERROR at line 1:
ORA-28354: Encryption wallet, auto login wallet, or HSM is already open


SQL> administer key management set keystore close identified by manager;

keystore altered.

SQL> administer key management set keystore open identified by manager container = all;

keystore altered.

The wallet is opened on every pluggable database:

SQL> connect sys/manager@db1pdb1 as sysdba
Connected.
SQL> select status from v$encryption_wallet;

STATUS
------------------------------
OPEN

SQL> connect sys/manager@db1pdb2 as sysdba
Connected.
SQL> select status from v$encryption_wallet;

STATUS
------------------------------
OPEN
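
If you do not want to open the password-based keystore manually after every restart, one option (just a sketch here, it is not part of the tests above, and it assumes the same keystore location and password as before) is to create an auto-login keystore from the existing one, so that it is opened automatically:

SQL> administer key management create auto_login keystore from keystore '/u00/app/oracle/local/wallet' identified by manager;

Keep in mind that an auto-login keystore opens without a password, so the keystore files must be protected at the operating system level.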

Once the software keystore is set, you can now encrypt your data. You can encrypt individual columns in tables, or encrypt entire tablespaces or databases.

Concerning columns in a table, you can encrypt many data types. Oracle recommends not to use TDE column encryption with transportable tablespaces or on columns used in foreign key constraints. The default TDE algorithm is AES192.

Let’s create the classical emp table and insert some values:

SQL> create table emp1 (name varchar2(30), salary number(7) encrypt);

Table created.


SQL> insert into emp1 values ('Larry', 1000000);

1 row created.

SQL> select * from emp1;

NAME				   SALARY
------------------------------ ----------
Larry				  1000000

If we now close the keystore, the encrypted column is not viewable anymore:

SQL> administer key management set keystore close identified by manager container = all;

keystore altered.

SQL> connect psi/psi@db1pdb1
Connected.
SQL> select * from emp1;
select * from emp1
*
ERROR at line 1:
ORA-28365: wallet is not open


SQL> select name from emp1;

NAME
------------------------------
Larry

SQL> select name, salary from emp1;
select name, salary from emp1
                         *
ERROR at line 1:
ORA-28365: wallet is not open

We can also use non-default algorithms such as 3DES168, AES128 or AES256, for example:

SQL> create table emp2 (
  2  name varchar2(30),
  3  salary number(7) encrypt using 'AES256');

Table created.

If your table has a high number of rows and encrypted columns, you can use the NOMAC parameter to bypass the TDE integrity check (MAC) and save some disk space:

SQL> create table emp3 (
  2  name varchar2(30),
  3  salary number (7) encrypt 'NOMAC');

Table created.

For existing tables, you can add encrypted columns with an ALTER TABLE ... ADD statement, or you can encrypt an existing column with an ALTER TABLE ... MODIFY statement:

SQL> create table emp4 (name varchar2(30));

Table created.

SQL> alter table emp4 add (salary number (7) encrypt);

Table altered.

SQL> create table emp5 (
  2  name varchar2(30),
  3  salary number(7));

Table created.

SQL> alter table emp5 modify (salary encrypt);

Table altered.

Eventually, you can turn off the encryption for a table:

SQL> alter table emp5 modify (salary decrypt);

Table altered.

One of the main 12.2 new features is tablespace encryption. You now have the possibility to encrypt new and existing tablespaces, and you can also encrypt the database, including the SYSTEM, SYSAUX, TEMP and UNDO tablespaces, in online mode.

In previous Oracle versions, you could only encrypt tablespaces when they were offline or when the database was in the mount state; in 12.2 we can do it online.

The encryption of the TEMP tablespace works the same way as in previous Oracle releases: you cannot convert the existing TEMP tablespace, but you can create a new encrypted temporary tablespace and make it the default temporary tablespace.

You can encrypt the UNDO tablespace, but Oracle recommends not to decrypt the tablespace once it has been encrypted.

First of all, the compatible parameter must be set to at least 11.2.0 when encrypting tablespaces, and to at least 12.2.0 when encrypting the SYSTEM, SYSAUX or UNDO tablespaces.
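
As a quick sketch of what that means (assuming an spfile is in use and a restart is acceptable), you would check and, if needed, raise the parameter like this; remember that raising COMPATIBLE cannot be undone:

SQL> show parameter compatible
SQL> alter system set compatible='12.2.0' scope=spfile;
SQL> shutdown immediate
SQL> startup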

SQL> create tablespace PSI_ENCRYPT
  2  datafile '/u01/oradata/db1/db1pdb1/psi_encrypt.dbf' size 10M
  3  encryption using 'AES128' encrypt;

Tablespace created.

We also have the possibility to perform an online conversion of existing tablespaces:

SQL> select file_name from dba_data_files where tablespace_name = 'PSI';

FILE_NAME
--------------------------------------------------------------------------------
/u01/oradata/db1/db1pdb1/psi.dbf

The compatible parameter is set to 12.2.0:

SQL> show parameter compatible

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
compatible			     string	 12.2.0

Now you can encrypt the datafile by using the following command; be sure that you have enough free space available:

SQL> alter tablespace PSI ENCRYPTION online using 'AES256' ENCRYPT FILE_NAME_CONVERT = ('psi.dbf', 'psi_encrypt.dbf');

Tablespace altered.
SQL> select file_name from dba_data_files where tablespace_name = 'PSI';

FILE_NAME
--------------------------------------------------------------------------------
/u01/oradata/db1/db1pdb1/psi_encrypt.dbf

You can also decrypt online a tablespace:

SQL> alter tablespace PSI ENCRYPTION ONLINE DECRYPT FILE_NAME_CONVERT = ('psi_encrypt.dbf', 'psi.dbf');

Tablespace altered.

SQL> select file_name from dba_data_files where tablespace_name = 'PSI';

FILE_NAME
--------------------------------------------------------------------------------
/u01/oradata/db1/db1pdb1/psi.dbf

Our PSI tablespace is not encrypted anymore. Let’s create a non-encrypted table, insert some values in it, encrypt the tablespace, then close the wallet and see what happens:

SQL> select file_name from dba_data_files where tablespace_name = 'PSI';

FILE_NAME
--------------------------------------------------------------------------------
/u01/oradata/db1/db1pdb1/psi.dbf

SQL> connect psi/psi@db1pdb1
Connected.
SQL> create table emp (name varchar2(30), salary number(7));

Table created.

SQL> insert into emp values ('Larry', 1000000);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from emp;

NAME				   SALARY
------------------------------ ----------
Larry				  1000000

SQL> select tablespace_name from user_tables where table_name = 'EMP';

TABLESPACE_NAME
------------------------------
PSI

SQL> alter tablespace PSI ENCRYPTION online using 'AES256' ENCRYPT FILE_NAME_CONVERT = ('psi.dbf', 'psi_encrypt.dbf');

Tablespace altered.

SQL> select * from emp;

NAME				   SALARY
------------------------------ ----------
Larry				  1000000

oracle@localhost:/u01/oradata/db1/ [db1] sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Mon Mar 13 16:11:18 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> administer key management set keystore close identified by manager container =all;

keystore altered.

SQL> connect psi/psi@db1pdb1
Connected.
SQL> select * from emp;
select * from emp
              *
ERROR at line 1:
ORA-28365: wallet is not open

It works fine: non-encrypted tables stored in a tablespace become encrypted when the tablespace is encrypted.

When the tablespace is encrypted, the strings command gives no result:

oracle@localhost:/u01/oradata/db1/db1pdb1/ [db1] strings psi_encrypt.dbf | grep -i Larry
oracle@localhost:/u01/oradata/db1/db1pdb1/ [db1]

When we open the wallet and decrypt the tablespace, we can find information in the datafile:

oracle@localhost:/u01/oradata/db1/db1pdb1/ [db1] strings psi.dbf | grep Larry
Larry

In Oracle 12.2 you can now convert the entire database online, i.e. the SYSTEM, SYSAUX, TEMP and UNDO tablespaces. The commands are the same as for a data tablespace, as seen previously, with the same precautions: have enough free space and the compatible parameter set to 12.2.0. There is just one small difference: you cannot specify an encryption key.

For example let’s encrypt the SYSTEM tablespace:

SQL> alter tablespace SYSTEM ENCRYPTION ONLINE ENCRYPT FILE_NAME_CONVERT = ('system01.dbf','system01_encrypt.dbf');

Tablespace altered.

SQL> select file_name from dba_data_files where tablespace_name = 'SYSTEM';

FILE_NAME
--------------------------------------------------------------------------------
/u01/oradata/db1/db1pdb1/system01_encrypt.dbf

For the temporary tablespace, we create a new encrypted temporary tablespace, make it the default temporary tablespace and drop the existing TEMP tablespace, as follows:

SQL> create temporary tablespace TEMP_ENCRYPT
  2  tempfile '/u01/oradata/db1/db1pdb1/temp_encrypt.dbf' size 100M
  3  ENCRYPTION ENCRYPT;

Tablespace created.

SQL> alter database default temporary tablespace TEMP_ENCRYPT;

Database altered.

SQL> drop tablespace TEMP;

Tablespace dropped.

For the undo tablespace:

SQL> alter tablespace UNDOTBS1 ENCRYPTION ONLINE ENCRYPT FILE_NAME_CONVERT = ('undotbs01.dbf','undotbs01_encrypt.dbf');

Tablespace altered.

SQL> connect sys/manager@db1pdb1 as sysdba
Connected.
SQL> administer key management set keystore close identified by manager;
administer key management set keystore close identified by manager
*
ERROR at line 1:
ORA-28439: cannot close wallet when SYSTEM, SYSAUX, UNDO, or TEMP tablespaces
are encrypted

On the pluggable database db1pdb2, as the tablespaces are not encrypted, the wallet can be closed:

SQL> connect sys/manager@db1pdb2 as sysdba
Connected.
SQL> administer key management set keystore close identified by manager;

keystore altered.

I also wanted to test the expdp and impdp behaviour between pluggable databases. As we are in a multitenant environment, we have to ensure the wallet is opened in the PDBs.

In order to export a table with encrypted columns, you have to add the ENCRYPTION parameter and, for security reasons, the ENCRYPTION_PWD_PROMPT parameter:

oracle@localhost:/home/oracle/ [DB1PDB1] expdp system@db1pdb1 tables=psi.emp directory=DATA_PUMP_DIR dumpfile=emp.dmp ENCRYPTION=ENCRYPTED_COLUMNS_ONLY ENCRYPTION_PWD_PROMPT=YES

Export: Release 12.2.0.1.0 - Production on Tue Mar 14 11:53:52 2017

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.
Password: 

Connected to: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

Encryption Password: 
Starting "SYSTEM"."SYS_EXPORT_TABLE_01":  system/********@db1pdb1 tables=psi.emp directory=DATA_PUMP_DIR dumpfile=emp.dmp ENCRYPTION=ENCRYPTED_COLUMNS_ONLY ENCRYPTION_PWD_PROMPT=YES 
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "PSI"."EMP"                                 5.523 KB       1 rows
Master table "SYSTEM"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYSTEM.SYS_EXPORT_TABLE_01 is:
  /u00/app/oracle/admin/db1/dpdump/4A3D428970DA5D68E055000000000001/emp.dmp
Job "SYSTEM"."SYS_EXPORT_TABLE_01" successfully completed at Tue Mar 14 11:54:16 2017 elapsed 0 00:00:21

In the same way, if we want to import the emp table into the second pluggable database, the wallet must be opened, otherwise it will not work:

oracle@localhost:/home/oracle/ [DB1PDB1] impdp system@db1pdb2 tables=psi.emp directory=DATA_PUMP_DIR dumpfile=emp.dmp ENCRYPTION_PWD_PROMPT=YES

Import: Release 12.2.0.1.0 - Production on Tue Mar 14 12:15:24 2017

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.
Password: 

Connected to: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

Encryption Password: 
ORA-39002: invalid operation
ORA-39180: unable to encrypt ENCRYPTION_PASSWORD
ORA-28365: wallet is not open

You open the wallet:

SQL> administer key management set keystore open identified by manager;

keystore altered.

The impdp command runs fine:

oracle@localhost:/home/oracle/ [DB1PDB1] impdp system@db1pdb2 tables=psi.emp directory=DATA_PUMP_DIR dumpfile=emp.dmp ENCRYPTION_PWD_PROMPT=YES

Import: Release 12.2.0.1.0 - Production on Tue Mar 14 12:21:47 2017

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.
Password: 

Connected to: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

Encryption Password: 
ORA-39175: Encryption password is not needed.
Master table "SYSTEM"."SYS_IMPORT_TABLE_01" successfully loaded/unloaded
Starting "SYSTEM"."SYS_IMPORT_TABLE_01":  system/********@db1pdb2 tables=psi.emp directory=DATA_PUMP_DIR dumpfile=emp.dmp ENCRYPTION_PWD_PROMPT=YES 
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Job "SYSTEM"."SYS_IMPORT_TABLE_01" completed with 1 error(s) at Tue Mar 14 12:21:55 2017 elapsed 0 00:00:05

But the generated dumpfile is not encrypted and you can find sensitive data in this file:

oracle@localhost:/u00/app/oracle/admin/db1/dpdump/ [db1] strings emp.dmp | grep -i Larry
Larry

Oracle offers a solution to encrypt the dump file: you can set the ENCRYPTION_MODE parameter to TRANSPARENT or DUAL in your expdp command. With TRANSPARENT, you do not need a password; the dump file is encrypted transparently, but the keystore must be present and open on the target database. With DUAL, you need a password and the dump file is encrypted using the TDE master encryption key.

oracle@localhost:/home/oracle/ [db1] expdp system@db1pdb1 tables=psi.emp directory=DATA_PUMP_DIR ENCRYPTION=ALL ENCRYPTION_PWD_PROMPT=YES ENCRYPTION_ALGORITHM=AES256 ENCRYPTION_MODE=DUAL dumpfile=emp_encrypt.dmp

Export: Release 12.2.0.1.0 - Production on Tue Mar 14 12:44:18 2017

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.
Password: 

Connected to: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

Encryption Password: 
Starting "SYSTEM"."SYS_EXPORT_TABLE_01":  system/********@db1pdb1 tables=psi.emp directory=DATA_PUMP_DIR ENCRYPTION=ALL ENCRYPTION_PWD_PROMPT=YES ENCRYPTION_ALGORITHM=AES256 ENCRYPTION_MODE=DUAL dumpfile=emp_encrypt.dmp 
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "PSI"."EMP"                                 5.531 KB       1 rows
Master table "SYSTEM"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYSTEM.SYS_EXPORT_TABLE_01 is:
  /u00/app/oracle/admin/db1/dpdump/4A3D428970DA5D68E055000000000001/emp_encrypt.dmp
Job "SYSTEM"."SYS_EXPORT_TABLE_01" successfully

And now we cannot retrieve sensitive data from the dump file:

oracle@localhost:/u00/app/oracle/admin/db1/dpdump/ [db1] strings emp_encrypt.dmp | grep -i Larry
oracle@localhost:/u00/app/oracle/admin/db1/dpdump/ [db1]
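
For completeness, the same export with ENCRYPTION_MODE=TRANSPARENT would look like the following (just a sketch, it was not run here, and the dump file name is only an example): no encryption password is prompted, an open keystore on the source and on the target database is enough.

expdp system@db1pdb1 tables=psi.emp directory=DATA_PUMP_DIR ENCRYPTION=ALL ENCRYPTION_MODE=TRANSPARENT dumpfile=emp_transparent.dmp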

 

Conclusion:

Concerning Transparent Data Encryption in the latest 12.2.0.1 Oracle version, what I will mainly retain is the SYSTEM, SYSAUX, UNDO and TEMP encryption, which gives more security for sensitive data. But be careful: even if this functionality is documented, Oracle also writes:

“Do not attempt to encrypt database internal objects such as SYSTEM, SYSAUX, UNDO or TEMP tablespaces using TDE tablespace encryption. You should focus TDE tablespace encryption on tablespaces that hold application data, not on these core components of the Oracle database.”


The post Oracle 12.2 and Transparent Data Encryption first appeared on Blog dbi services.

SQL Server 2016: Does Dynamic Data Masking work with Temporal Table?

Tue, 2017-03-14 06:14

At the last IT Tagen 2016, I presented Dynamic Data Masking (DDM) and how it works.
To add a little fun, I applied DDM to a temporal table to see if the history table also inherits the DDM rules.
In this blog, I explain all the different steps to reproduce my last demo.

Step 1: Create the table and the temporal table in the database DDM_TEST
USE [DDM_TEST]
GO

CREATE TABLE [dbo].[Confidential](
[ID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[Name] [nvarchar](70)NULL,
[CreditCard] [nvarchar](16)NULL,
[Salary] [int] NULL,
[Email] [nvarchar](60)NULL,
  [StartDate] datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
  [EndDate] datetime2 GENERATED ALWAYS AS ROW END NOT NULL  
   , PERIOD FOR SYSTEM_TIME (StartDate,EndDate)
)  WITH (SYSTEM_VERSIONING=ON(HISTORY_TABLE = [dbo].[ConfidentialHistory]))

The table has sensitive data like the Salary and the Credit Card number.
As you can see, I added a history table [dbo].[ConfidentialHistory].
I insert 6 rows into my table and select both tables.

insert into [dbo].[Confidential]([Name],[CreditCard],[Salary],[Email]) values (N'Stephane',N'3546748598467584',113459,N'sts@dbi-services.com')
insert into [dbo].[Confidential]([Name],[CreditCard],[Salary],[Email]) values (N'David',N'3546746598450989',143576,'dab@dbi-services.com')
insert into [dbo].[Confidential]([Name],[CreditCard],[Salary],[Email])  values (N'Nathan',N'3890098321457893',118900,'nac@dbi-services.com')
insert into [dbo].[Confidential]([Name],[CreditCard],[Salary],[Email])  values (N'Olivier',N'3564890234785612',98000,'olt@dbi-services.com')
insert into [dbo].[Confidential]([Name],[CreditCard],[Salary],[Email])  values (N'Alain',N'9897436900989342',85900,'ala@dbi-services.com')
insert into [dbo].[Confidential]([Name],[CreditCard],[Salary],[Email])  values (N'Fabrice',N'908323468902134',102345,'fad@dbi-services.com')

select * from [dbo].[Confidential]
select * from [dbo].[ConfidentialHistory]

DDM_TemporalTable01
With just inserts, you have no entries in the history table.
After an update of the Salary of Stephane (see the sketch below), you can now see the old value in the history table.
To see both tables at once I use the new SELECT option “FOR SYSTEM_TIME ALL”.
DDM_TemporalTable02
The context is in place. Now I will apply the DDM rules.
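
The UPDATE itself is not shown in the screenshots; it was simply something along these lines (a sketch — the new salary value is just an example):

UPDATE [dbo].[Confidential] SET [Salary] = 120000 WHERE [Name] = N'Stephane';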

Step 2: create the DDM rules

I apply masks on the sensitive columns of my table with different functions like default(), partial() or email().

Use DDM_TEST
ALTER Table Confidential
ALTER COLUMN NAME ADD MASKED WITH (FUNCTION='default()')
ALTER Table Confidential
ALTER COLUMN SALARY ADD MASKED WITH (FUNCTION='default()')
ALTER Table Confidential
ALTER COLUMN creditcard ADD MASKED WITH (FUNCTION='partial(1,"XXXX",2)')
ALTER Table Confidential
ALTER COLUMN email ADD MASKED WITH (FUNCTION='email()')

DDM_TemporalTable03
As you can see, if I read the table, nothing is masked because I’m sysadmin, of course!
Now, I begin the tests with a user who can just read the table.

Step 3: Test the case

The user that I create needs to have SELECT permissions on both tables (system-versioned and history):

USE DDM_TEST;
CREATE USER TestDemo WITHOUT LOGIN
GRANT SELECT ON Confidential TO TestDemo
GRANT SELECT ON ConfidentialHistory TO TestDemo

I execute all SELECT queries as this user:
EXECUTE AS USER='TestDemo'
SELECT * FROM [dbo].[Confidential] 
REVERT
EXECUTE AS USER='TestDemo'
SELECT * FROM [dbo].[ConfidentialHistory]
REVERT
EXECUTE AS USER='TestDemo'
select * from [dbo].[Confidential]  FOR SYSTEM_TIME ALL
REVERT

DDM_TemporalTable04
As you can see, all 3 SELECTs return masked data for this user. Nice, isn’t it?
Finally, Dynamic Data Masking works very well with temporal tables and can be used to mask all data, including historic data, from users.
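
If one particular user nevertheless needs to see the clear values, SQL Server provides the UNMASK permission for that; as a quick sketch (not part of the demo above):

USE DDM_TEST;
GRANT UNMASK TO TestDemo;
-- TestDemo now sees the real values again in all three SELECTs
REVOKE UNMASK FROM TestDemo;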

 

The post SQL Server 2016: Does Dynamic Data Masking work with Temporal Table? first appeared on Blog dbi services.

EDB BART 2.0 – How to upgrade and block level incremental backups

Mon, 2017-03-13 06:37

We already have some posts on how you can use EDB BART to back up and restore your PostgreSQL instances from one central server (EnterpriseDB Backup and Recovery Tool (BART), getting started with postgres plus advanced server (2) – setting up a backup and recovery server). The current version you can download from the EnterpriseDB website is 1.1, but version 2.0 is in beta and can already be tested. The main new feature is that BART 2.0 allows you to perform block level incremental backups of your PostgreSQL instances starting with PostgreSQL 9.5. In this post we’ll be looking at that feature and we’ll upgrade from BART 1.1. Let’s go …

In my test environment I have BART 1.1 configured and working against an EDB Postgres Plus 9.6 instance:

postgres@edbbart:/home/postgres/ [pg950] bart -v
bart (EnterpriseDB) 1.1.1
postgres@edbbart:/home/postgres/ [pg950] bart SHOW-BACKUPS -s pg3
 SERVER NAME   BACKUP ID       BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                         
 pg3           1489078376562   2017-03-09 17:53:01 CET   61.93 MB      16.00 MB      1           active  

What do I need to do to upgrade to BART 2.0? Quite easy: either you have access to the EDB yum repositories (this requires a username and a password) or you just install the rpm which you can download here. I will install BART 2.0 from the EDB repositories:

postgres@edbbart:/home/postgres/ [pg950] sudo yum install edb-bart20

That’s it. The first thing to highlight is that the location where BART gets installed changed. Starting with version 2.0 BART will be installed under “/usr/edb/” instead of “/usr/edb-bart-***”.

postgres@edbbart:/home/postgres/ [pg950] ls -la /usr/edb/bart2.0/
total 16
drwxr-xr-x. 5 root root    59 Mar  9 17:56 .
drwxr-xr-x. 4 root root    32 Mar  9 17:56 ..
-rw-r--r--. 1 root root 15272 Feb 21 10:00 bart_license.txt
drwxr-xr-x. 2 root root    36 Mar  9 17:56 bin
drwxr-xr-x. 2 root root    21 Mar  9 17:56 etc
drwxr-xr-x. 2 root root    56 Mar  9 17:56 lib

We always recommend working with a symbolic link that points to your current BART installation to make life easier. If you followed that rule, all you have to do is:

postgres@edbbart:/home/postgres/ [pg950] sudo rm /usr/edb-bart
postgres@edbbart:/home/postgres/ [pg950] sudo ln -s /usr/edb/bart2.0/ /usr/edb-bart
postgres@edbbart:/home/postgres/ [pg950] bart -v
bart (EnterpriseDB) 2.0.0

… and you point to the new binaries. For sure you want to copy over your BART 1.1 configuration to the new location:

postgres@edbbart:/home/postgres/ [pg950] sudo cp /usr/edb-bart-1.1/etc/bart.cfg /usr/edb/bart2.0/etc/

From now on you are working with BART 2.0:

postgres@edbbart:/home/postgres/ [pg950] bart show-servers
SERVER NAME         : pg3
BACKUP FRIENDLY NAME: PG3_%year-%month-%dayT%hour:%minute
HOST NAME           : 192.168.22.37
USER NAME           : backupuser
PORT                : 4445
REMOTE HOST         : postgres@192.168.22.37
RETENTION POLICY    : 2016-12-09 18:02:23 CET
DISK UTILIZATION    : 189.93 MB
NUMBER OF ARCHIVES  : 8
ARCHIVE PATH        : /u90/pgdata/backup/pg3/archived_wals
ARCHIVE COMMAND     : scp %p postgres@edbbart:/u90/pgdata/backup/pg3/archived_wals/%f
XLOG METHOD         : fetch
WAL COMPRESSION     : enabled
TABLESPACE PATH(s)  : 
INCREMENTAL BACKUP  : DISABLED      <============================ NEW
DESCRIPTION         : "postgres PG3"

You’ll immediately notice that there is a new line in the output: “INCREMENTAL BACKUP”. Let’s keep that for later. What I am most interested in right now is: can I still back up my PostgreSQL instances with the new version of BART?

postgres@edbbart:/home/postgres/ [pg950] bart backup -s pg3
INFO:  creating backup for server 'pg3'
INFO:  backup identifier: '1489078978132'
63413/63413 kB (100%), 1/1 tablespace

INFO:  backup completed successfully
WARNING: log_timezone is not set in the server, using the local timezone information
INFO:  backup checksum: d318744e42819f76b137edf197a0b59b of base.tar
INFO:  
BACKUP DETAILS:
BACKUP STATUS: active
BACKUP IDENTIFIER: 1489078978132
BACKUP NAME: PG3_2017-03-09T18:02
BACKUP PARENT: none
BACKUP LOCATION: /u90/pgdata/backup/pg3/1489078978132
BACKUP SIZE: 61.93 MB
BACKUP FORMAT: tar
XLOG METHOD: fetch
BACKUP CHECKSUM(s): 1
 ChkSum                             File      
 d318744e42819f76b137edf197a0b59b   base.tar  
TABLESPACE(s): 0
START WAL LOCATION: 00000001000000000000000D
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-03-09 18:03:00 CET
STOP TIME: 2017-03-09 18:02:59 CET
TOTAL DURATION: 0 sec(s)

Looks fine. I did not need to change anything in the configuration file and can just start using BART 2.0. So, now I have two backups:

postgres@edbbart:/home/postgres/ [pg950] bart show-backups -s PG3
 SERVER NAME   BACKUP ID       BACKUP NAME            PARENT ID   BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                                                            
 pg3           1489078978132   PG3_2017-03-09T18:02   none        2017-03-09 18:02:59 CET   61.93 MB      16.00 MB      1           active  
 pg3           1489078376562   PG3_2017-03-09T17:52   none        2017-03-09 17:53:01 CET   61.93 MB      32.00 MB      2           active  

Time to do my first incremental backup. What do I need to do? First of all we need to disable wal compression as this is not supported with incremental backups:

postgres@edbbart:/home/postgres/ [pg950] cat /usr/edb-bart/etc/bart.cfg | grep -i COMPRE
wal_compression = disabled

The next thing to do is obviously to enable incremental backups:

postgres@edbbart:/home/postgres/ [pg950] cat /usr/edb-bart/etc/bart.cfg | grep increme
allow_incremental_backups = enabled

My complete BART configuration then looks like this:

[BART]
bart-host = postgres@edbbart
backup_path = /u90/pgdata/backup
pg_basebackup_path = /usr/edb/as9.6/bin/pg_basebackup
xlog-method = fetch
retention_policy = 3 MONTHS
logfile = /var/log/bart.logs

[PG3]
host = 192.168.22.37
port = 4445
user = backupuser
backup-name = PG3_%year-%month-%dayT%hour:%minute
remote-host = postgres@192.168.22.37
description = "postgres PG3"
wal_compression = disabled
allow_incremental_backups = enabled

Make sure that the show-servers command displays what you expect:

postgres@edbbart:/home/postgres/ [pg950] bart show-servers
SERVER NAME         : pg3
BACKUP FRIENDLY NAME: PG3_%year-%month-%dayT%hour:%minute
HOST NAME           : 192.168.22.37
USER NAME           : backupuser
PORT                : 4445
REMOTE HOST         : postgres@192.168.22.37
RETENTION POLICY    : 2016-12-10 16:53:05 CET
DISK UTILIZATION    : 283.86 MB
NUMBER OF ARCHIVES  : 10
ARCHIVE PATH        : /u90/pgdata/backup/pg3/archived_wals
ARCHIVE COMMAND     : scp %p postgres@edbbart:/u90/pgdata/backup/pg3/archived_wals/%f
XLOG METHOD         : fetch
WAL COMPRESSION     : disabled
TABLESPACE PATH(s)  : 
INCREMENTAL BACKUP  : ENABLED
DESCRIPTION         : "postgres PG3"

With BART 2.0 there is a new component called the “WAL scanner”. The binary is at the same location as BART itself:

postgres@edbbart:/home/postgres/ [pg950] ls -la /usr/edb/bart2.0/bin/bart-scanner 
-rwxr-xr-x. 1 root root 603832 Feb 21 10:01 /usr/edb/bart2.0/bin/bart-scanner

What does it do? The WAL scanner (as the name implies) scans the WALs that are archived from your PostgreSQL instances to the BART host for changes and writes a so-called “modified block map (MBM)” (this is the reason why WAL compression is not supported). As this should happen as soon as the WALs arrive on the BART host, the WAL scanner should run all the time. To start it in daemon mode:

(Make sure the BART bin directory, /usr/edb-bart/bin, is included in your PATH.)
postgres@edbbart:/home/postgres/ [pg950] /usr/edb-bart/bin/bart-scanner --daemon
postgres@edbbart:/home/postgres/ [pg950] ps -ef | grep scanner
postgres  2473     1  0 16:58 ?        00:00:00 /usr/edb-bart/bin/bart-scanner --daemon
postgres  2474  2473  0 16:58 ?        00:00:00 /usr/edb-bart/bin/bart-scanner --daemon
postgres  2476  2185  0 16:58 pts/0    00:00:00 grep --color=auto scanner

This will fork one wal scanner process for each PostgreSQL instance configured for incremental backups.
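
The scanner is started manually here, so nothing will bring it back after a reboot of the BART host. One simple way to have it started again at boot time (just a sketch — a systemd unit or any other init mechanism would do the job as well) is a cron @reboot entry for the postgres user:

(crontab -l 2>/dev/null; echo "@reboot /usr/edb-bart/bin/bart-scanner --daemon") | crontab -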

Now we need a new full backup (the scanner did not run when we did the previous backups and the previous WAL files were compressed, so they can not be used):

postgres@edbbart:/home/postgres/ [pg950] bart backup -s PG3 --backup-name full0
INFO:  creating backup for server 'pg3'
INFO:  backup identifier: '1489161554590'
63416/63416 kB (100%), 1/1 tablespace

INFO:  backup completed successfully
WARNING: log_timezone is not set in the server, using the local timezone information
INFO:  backup checksum: f1c917edd0734c155ddace77bfbc3a17 of base.tar
INFO:  
BACKUP DETAILS:
BACKUP STATUS: active
BACKUP IDENTIFIER: 1489161554590
BACKUP NAME: full0
BACKUP PARENT: none
BACKUP LOCATION: /u90/pgdata/backup/pg3/1489161554590
BACKUP SIZE: 61.93 MB
BACKUP FORMAT: tar
XLOG METHOD: fetch
BACKUP CHECKSUM(s): 1
 ChkSum                             File      
 f1c917edd0734c155ddace77bfbc3a17   base.tar  

TABLESPACE(s): 0
START WAL LOCATION: 00000001000000000000000F
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-03-10 16:59:14 CET
STOP TIME: 2017-03-10 16:59:18 CET
TOTAL DURATION: 4 sec(s)

postgres@edbbart:/home/postgres/ [pg950] bart show-backups -s pg3
 SERVER NAME   BACKUP ID       BACKUP NAME            PARENT ID   BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                                                            
 pg3           1489161554590   full0                  none        2017-03-10 16:59:18 CET   61.93 MB      16.00 MB      1           active  
 pg3           1489078978132   PG3_2017-03-09T18:02   none        2017-03-09 18:02:59 CET   61.93 MB      32.00 MB      2           active  
 pg3           1489078376562   PG3_2017-03-09T17:52   none        2017-03-09 17:53:01 CET   61.93 MB      32.00 MB      2           active  

Let’s do some work on the PostgreSQL instance which we are backing up, to generate some WAL:

(postgres@[local]:4445) [postgres] > create table t1 ( a int );
CREATE TABLE
Time: 2.972 ms
(postgres@[local]:4445) [postgres] > insert into t1 (a) values (generate_series(1,1000000));
INSERT 0 1000000
Time: 512.806 ms
(postgres@[local]:4445) [postgres] > select * from pg_switch_xlog();
 pg_switch_xlog 
----------------
 0/10C19D98
(1 row)

Do the first incremental backup based on the full backup from above:

postgres@edbbart:/home/postgres/ [pg950] bart backup -s pg3 -F p --parent 1489161554590 --backup-name incr1
INFO:  creating incremental backup for server 'pg3'
INFO:  checking mbm files /u90/pgdata/backup/pg3/archived_wals
INFO:  new backup id generated 1489161760522
INFO:  reading directory /u90/pgdata/backup/pg3/archived_wals
INFO:  all files processed
WARNING: log_timezone is not set in the server, using the local timezone information
INFO:  server has been successfully backed up

Now you can do another incremental backup but from the incremental backup taken above instead of the full backup:

postgres@edbbart:/home/postgres/ [pg950] bart BACKUP -s pg3 -F p --parent 1489161760522 --backup-name incr2
INFO:  creating incremental backup for server 'pg3'
INFO:  checking mbm files /u90/pgdata/backup/pg3/archived_wals
INFO:  new backup id generated 1489162048588
INFO:  reading directory /u90/pgdata/backup/pg3/archived_wals
INFO:  all files processed
WARNING: log_timezone is not set in the server, using the local timezone information
INFO:  server has been successfully backed up

So, what do we have now:

postgres@edbbart:/home/postgres/ [pg950] bart show-backups
 SERVER NAME   BACKUP ID       BACKUP NAME            PARENT ID       BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                                                                
 pg3           1489162048588   incr2                  1489161760522   2017-03-10 17:09:51 CET   45.98 MB                                active  
 pg3           1489161760522   incr1                  1489161554590   2017-03-10 17:06:10 CET   67.35 MB                                active  
 pg3           1489161554590   full0                  none            2017-03-10 16:59:18 CET   61.93 MB      112.00 MB     7           active  
 pg3           1489078978132   PG3_2017-03-09T18:02   none            2017-03-09 18:02:59 CET   61.93 MB      32.00 MB      2           active  
 pg3           1489078376562   PG3_2017-03-09T17:52   none            2017-03-09 17:53:01 CET   61.93 MB      32.00 MB      2           active  

Hm, this does not really look like an improvement. The first incremental backup is even larger than the full backup it is based on. The second one is a bit smaller, but as I did not change anything on the source database between the two incremental backups, my expectation was that at least the second incremental backup should use far less space. Let’s check it on disk:

postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] pwd
/u90/pgdata/backup/pg3
postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] du -sh 1489161554590
62M	1489161554590
postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] du -sh 1489161760522
68M	1489161760522
postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] du -sh 1489162048588
47M	1489162048588

At least this seems to be consistent. Let’s do another one:

postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] bart backup -s pg3 -F p --parent 1489162048588 --backup-name incr3
INFO:  creating incremental backup for server 'pg3'
INFO:  checking mbm files /u90/pgdata/backup/pg3/archived_wals
INFO:  new backup id generated 1489224698357
INFO:  reading directory /u90/pgdata/backup/pg3/archived_wals
INFO:  all files processed
WARNING: log_timezone is not set in the server, using the local timezone information

postgres@edbbart:/u90/pgdata/backup/pg3/ [pg950] bart show-backups
 SERVER NAME   BACKUP ID       BACKUP NAME            PARENT ID       BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS  
                                                                                                                                                
 pg3           1489224698357   incr3                  1489162048588   2017-03-11 10:31:41 CET   16.58 MB                                active  
 pg3           1489162048588   incr2                  1489161760522   2017-03-10 17:09:51 CET   45.98 MB                                active  
 pg3           1489161760522   incr1                  1489161554590   2017-03-10 17:06:10 CET   67.35 MB                                active  
 pg3           1489161554590   full0                  none            2017-03-10 16:59:18 CET   61.93 MB      160.00 MB     10          active  
 pg3           1489078978132   PG3_2017-03-09T18:02   none            2017-03-09 18:02:59 CET   61.93 MB      32.00 MB      2           active  
 pg3           1489078376562   PG3_2017-03-09T17:52   none            2017-03-09 17:53:01 CET   61.93 MB      32.00 MB      2           active  

OK, now we can see a real improvement (I am not sure why there is no improvement for the first ones; I need to do more testing). Restores should work as well (I’ll restore the last incremental backup):

postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart restore -s pg3 -i incr3 -p /var/tmp/restore_test/  -r postgres@localhost
INFO:  restoring incremental backup 'incr3' of server 'pg3'
INFO:  base backup restored
ERROR: failed to overlay modified blocks
command failed with exit code 127
bash: bart: command not found

Hm, what’s that? Why does bart not find bart? Can I restore the full backup?

postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart restore -s pg3 -i full0 -p /var/tmp/restore_test/  -r postgres@localhost
INFO:  restoring backup 'full0' of server 'pg3'
INFO:  base backup restored
INFO:  archiving is disabled
postgres@edbbart:/home/postgres/ [pg950] ls /var/tmp/restore_test/
backup_label  global        pg_dynshmem    pg_log        pg_notify    pg_snapshots  pg_subtrans  PG_VERSION            postgresql.conf
base          pg_clog       pg_hba.conf    pg_logical    pg_replslot  pg_stat       pg_tblspc    pg_xlog               tablespace_map
dbms_pipe     pg_commit_ts  pg_ident.conf  pg_multixact  pg_serial    pg_stat_tmp   pg_twophase  postgresql.auto.conf

Looks fine, the first incremental:

postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart restore -s pg3 -i incr1 -p /var/tmp/restore_test/  -r postgres@localhost
INFO:  restoring incremental backup 'incr1' of server 'pg3'
INFO:  base backup restored
ERROR: failed to overlay modified blocks
command failed with exit code 127
bash: bart: command not found

No. Time for the “--debug” mode:

postgres@edbbart:/home/postgres/ [pg950] rm -rf /var/tmp/restore_test/*
postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart --debug restore -s pg3 -i incr1 -p /var/tmp/restore_test/  -r postgres@localhost
DEBUG: Server: Global, Now: 2017-03-13 12:12:24 CET, RetentionWindow: 7776000 (secs) ==> 2160 hour(s)
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost exit
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost test -d /var/tmp/restore_test && echo "exists"
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost touch /var/tmp/restore_test/tmp-incr1 && echo "exists"
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost rm -f /var/tmp/restore_test/tmp-incr1
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost ls -A /var/tmp/restore_test
INFO:  restoring incremental backup 'incr1' of server 'pg3'
DEBUG: restoring backup: 1489161554590
DEBUG: restoring backup to /var/tmp/restore_test
DEBUG: restore command: cat /u90/pgdata/backup/pg3/1489161554590/base.tar | ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost " tar -C /var/tmp/restore_test -xf - "
DEBUG: Exec Command: cat /u90/pgdata/backup/pg3/1489161554590/base.tar | ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost " tar -C /var/tmp/restore_test -xf - "
INFO:  base backup restored
DEBUG: backup '1489161554590' restored to '/var/tmp/restore_test'
DEBUG: restoring backup: 1489161760522
DEBUG: Exec Command: cd /u90/pgdata/backup/pg3/1489161760522/base && tar -cf - * | ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost " tar -C /var/tmp/restore_test -xf - "
DEBUG: Exec Command: cd . && scp -o BatchMode=yes -o PasswordAuthentication=no  -r /u90/pgdata/backup/pg3/1489161760522/base/../1489161760522.cbm postgres@localhost:/var/tmp/restore_test
DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost "bart --debug apply-incremental --source /var/tmp/restore_test --tsmap '' --backup 1489161760522"
ERROR: failed to overlay modified blocks
command failed with exit code 127
bash: bart: command not found

The command that does not seem to work is this one:

DEBUG: Exec Command: ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@localhost "bart --debug apply-incremental --source /var/tmp/restore_test --tsmap '' --backup 1489161760522"

Let’s do that manually:

postgres@edbbart:/home/postgres/ [pg950] bart --debug apply-incremental --source /var/tmp/restore_test --tsmap '' --backup 1489161760522
DEBUG: loading '/var/tmp/restore_test/1489161760522.cbm' MBM/CBM file 
DEBUG: mbm chksum: old f60a435d4d3709302e5b7acc3f8d8ecb, new f60a435d4d3709302e5b7acc3f8d8ecb
DEBUG: applying incremental 1489161760522 (pid 3686)
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/1247.blk to /var/tmp/restore_test/base/15184/1247
src size: 16384, dst size: 139264
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/1247.blk, to /var/tmp/restore_test/base/15184/1247
src size: 16384, dst size: 139264
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/1249.blk to /var/tmp/restore_test/base/15184/1249
src size: 8192, dst size: 753664
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/1249.blk, to /var/tmp/restore_test/base/15184/1249
src size: 8192, dst size: 753664
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/1259.blk to /var/tmp/restore_test/base/15184/1259
src size: 8192, dst size: 139264
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/1259.blk, to /var/tmp/restore_test/base/15184/1259
src size: 8192, dst size: 139264
DEBUG: copying relation node files from (src): /var/tmp/restore_test/base/15184/16386.all, to (dst): /var/tmp/restore_test/base/15184/16386
src size: 36249600, dst size: 0
DEBUG: Exec Command: cp --preserve /var/tmp/restore_test/base/15184/16386.all /var/tmp/restore_test/base/15184/16386
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2608.blk to /var/tmp/restore_test/base/15184/2608
src size: 16384, dst size: 663552
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2608.blk, to /var/tmp/restore_test/base/15184/2608
src size: 16384, dst size: 663552
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2658.blk to /var/tmp/restore_test/base/15184/2658
src size: 8192, dst size: 204800
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2658.blk, to /var/tmp/restore_test/base/15184/2658
src size: 8192, dst size: 204800
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2659.blk to /var/tmp/restore_test/base/15184/2659
src size: 8192, dst size: 139264
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2659.blk, to /var/tmp/restore_test/base/15184/2659
src size: 8192, dst size: 139264
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2662.blk to /var/tmp/restore_test/base/15184/2662
src size: 8192, dst size: 32768
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2662.blk, to /var/tmp/restore_test/base/15184/2662
src size: 8192, dst size: 32768
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2663.blk to /var/tmp/restore_test/base/15184/2663
src size: 8192, dst size: 57344
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2663.blk, to /var/tmp/restore_test/base/15184/2663
src size: 8192, dst size: 57344
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2673.blk to /var/tmp/restore_test/base/15184/2673
src size: 16384, dst size: 540672
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2673.blk, to /var/tmp/restore_test/base/15184/2673
src size: 16384, dst size: 540672
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2674.blk to /var/tmp/restore_test/base/15184/2674
src size: 24576, dst size: 557056
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2674.blk, to /var/tmp/restore_test/base/15184/2674
src size: 24576, dst size: 557056
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2703.blk to /var/tmp/restore_test/base/15184/2703
src size: 8192, dst size: 40960
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2703.blk, to /var/tmp/restore_test/base/15184/2703
src size: 8192, dst size: 40960
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/2704.blk to /var/tmp/restore_test/base/15184/2704
src size: 16384, dst size: 57344
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/2704.blk, to /var/tmp/restore_test/base/15184/2704
src size: 16384, dst size: 57344
DEBUG: copying blocks for relation from: /var/tmp/restore_test/base/15184/3455.blk to /var/tmp/restore_test/base/15184/3455
src size: 8192, dst size: 49152
DEBUG: block copy complete: from /var/tmp/restore_test/base/15184/3455.blk, to /var/tmp/restore_test/base/15184/3455
src size: 8192, dst size: 49152

.. and that works. So, next test:

postgres@edbbart:/home/postgres/ [pg950] ssh postgres@localhost "bart -version"
bash: bart: command not found
postgres@edbbart:/home/postgres/ [pg950] ssh postgres@localhost "which bart"
which: no bart in (/usr/local/bin:/usr/bin)

Here we have the issue: as we do not get a login shell for this type of ssh command, the profile that sets the PATH is not sourced. So we add the BART binary location to ~/.bashrc, which is read in that case:

postgres@edbbart:/home/postgres/ [pg950] echo "PATH=\$PATH:/usr/edb-bart/bin/" >> ~/.bashrc
postgres@edbbart:/home/postgres/ [pg950] echo "export PATH" >> ~/.bashrc
postgres@edbbart:/home/postgres/ [pg950] ssh postgres@localhost "which bart"
/usr/edb-bart/bin/bart

Try again:

postgres@edbbart:/home/postgres/ [pg950] rm -rf /var/tmp/restore_test/*
postgres@edbbart:/home/postgres/ [pg950] /usr/edb/bart2.0/bin/bart restore -s pg3 -i incr3 -p /var/tmp/restore_test/  -r postgres@localhost
INFO:  restoring incremental backup 'incr3' of server 'pg3'
INFO:  base backup restored
INFO:  archiving is disabled

… and it works. But this does mean that you have to install BART on all the hosts where you have a PostgreSQL instance if you want to restore to the host where the instance is running. I am not sure I really like that (or maybe I completely missed something) …

 

The post EDB BART 2.0 – How to upgrade and block level incremental backups first appeared on Blog dbi services.

Google Cloud Platform instances and Oracle database performance

Sun, 2017-03-12 13:00

When it comes to choosing a cloud instance to run Oracle Database, you want to be able to run your workload on the minimum number of CPU cores. This is why, in a previous post, I measured how many logical reads per second can be achieved with a SLOB workload on AWS, which is often the first cloud considered, and I will probably do the same on Azure in the future. I did the same on the Oracle Cloud, which is the only one where Oracle makes it easy to run and license the Oracle Database.

CaptureGCPlogoPNG
Today I’ve seen that the Google Cloud offers a 1-year trial (1 year instead of 3 months) with $300 of free credit, where you are sure that your credit card will not be debited before you accept it. Of course I wanted to play with it and tested the IaaS instance types with the same method: SLOB.

Licencing No-Go

I said ‘play with it’ because you cannot consider Google Cloud as a platform for Oracle Database. Not because of the Google Cloud: it is among the top 3 cloud vendors, with AWS and Azure. But just because Oracle doesn’t want you to:

  • It is not an ‘authorized cloud’ where Oracle accepts the virtual cores as a metric
  • It is not running with a hypervisor where Oracle accepts the virtual cores as a metric
  • It is not the Oracle Cloud where Oracle accepts to count the virtual cores, and even apply the core factor

So, the sad truth is that if you want to run an Oracle Database on the Google Cloud, you may have to pay Oracle licences to cover the whole physical infrastructure of the Google data centers… This is clearly a No-Go for processor-metric licenses. You may think about NUP+ licensing, where the metric is not the processors but the number of users. This was possible for Standard Edition in 11g (and in 12.1.0.1, but this is out of support now). But with 12.1.0.2 they changed the Standard Edition rules and, even when the metric is the number of users, you have to count the number of servers. This is again a No-Go for a public cloud environment.

So let’s play in the hope that one day the rules will change. For the moment, they think that this strategy will push the current Oracle Database users to the Oracle Cloud. They may realize one day that it increases the counterflow of users going to Open Source databases to run away from those arbitrary licensing rules.

Machine types

There are 3 types of processors available for Europe (datacenter in Brussels): Sandy Bridge, Ivy Bridge and Haswell. The regions are detailed in the regions-zones documentation and there is more detail in machine_types.

Ivy Bridge

Here I chose europe-west1-c and created a VM running RHEL7 on 4 vCPUs of the Ivy Bridge type:

CaptureGCP00Ivy

Note that the price is the same for all kinds of processors within the same region.

SLOB

Here is cached SLOB (UPDATE_PCT=0 RUN_TIME=300 SCALE=80M WORK_UNIT=64)

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 30.3 0.00 4.74
DB CPU(s): 1.0 29.4 0.00 4.59
Redo size (bytes): 14,254.9 433,831.2
Logical read (blocks): 674,052.6 20,513,983.3

This is 674 kLIOPS per vCPU.

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 25.0 0.00 7.24
DB CPU(s): 2.0 24.6 0.00 7.11
Logical read (blocks): 1,377,553.4 17,267,173.0

This is 689 kLIOPS per vCPU. We are running on two cores.

Now, let’s see if those 4 vCPUs are threads or cores.

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 3.0 1.6 0.00 12.10
DB CPU(s): 2.9 1.6 0.00 11.93
Logical read (blocks): 1,493,775.4 815,084.0

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 54.6 0.00 13.80
DB CPU(s): 4.0 54.0 0.00 13.64
Logical read (blocks): 1,780,041.3 24,329,604.1

This is 498 kLIOPS per vCPU with 3 sessions and 445 kLIOPS per vCPU with 4 sessions. We are obviously running on 2 hyper-threaded cores, where hyper-threading gives only about 30% additional LIOPS (1,780,041 logical reads per second with 4 sessions versus 1,377,553 with 2).

When going further, the performance will decrease:

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 5.0 48.5 0.00 15.81
DB CPU(s): 3.0 29.0 0.00 9.46
Logical read (blocks): 1,460,029.2 14,233,024.3

It is interesting to see that when running 5 sessions on 4 vCPUs, only 3 threads are actually used on average (DB CPU is about 3 while DB time is 5).

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 6.0 72.1 0.00 16.24
DB CPU(s): 3.0 36.5 0.00 8.22
Logical read (blocks): 1,458,749.6 17,651,628.4

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 7.0 72.5 0.00 20.42
DB CPU(s): 3.0 31.5 0.00 8.86
Logical read (blocks): 1,445,423.3 15,073,622.1

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 8.0 82.9 0.00 18.93
DB CPU(s): 3.0 31.4 0.00 7.17
Logical read (blocks): 1,436,355.2 14,986,038.9

Those 8 runs are visible on the CPU usage graph from the VM dashboard where maximum CPU usage is when running 4 sessions on those 4 threads.

CaptureGCP00Ivy001

Here is the CPU description from lscpu and /proc/cpuinfo, where we can see that we are virtualized with KVM:

InkedCaptureGCP00Ivy002_LI
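
If you want to check this yourself, the information in that screenshot simply comes from commands like these, run on the instance:

lscpu
grep "model name" /proc/cpuinfo | uniq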

For tty output I usually prefer to paste the text rather than a screenshot, but I wanted to show the ssh screen that you get in your browser with a simple click. This is very convenient: provisioning here is really easy and fast.

Haswell

Just to compare, here is the same running on the Haswell machine type.

CaptureGCP00Haswel

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 25.4 0.00 5.53
DB CPU(s): 1.0 24.4 0.00 5.32
Logical read (blocks): 598,274.2 15,275,586.5
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 27.3 0.00 8.46
DB CPU(s): 2.0 26.8 0.00 8.29
Logical read (blocks): 1,155,681.9 15,814,666.6
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 3.0 34.7 0.00 12.34
DB CPU(s): 3.0 34.2 0.00 12.18
Logical read (blocks): 1,300,659.1 15,052,978.1
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 54.6 0.00 13.80
DB CPU(s): 3.9 53.8 0.00 13.61
Logical read (blocks): 1,541,843.0 21,098,158.6

Sandy Bridge

Here is the same running on the Sandy Bridge machine type.

CaptureGCP00Sandy


Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 25.2 0.00 2.16
DB CPU(s): 1.0 25.0 0.00 2.14
Logical read (blocks): 668,393.8 16,935,651.8
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 23.1 0.00 11.13
DB CPU(s): 2.0 22.6 0.00 10.86
Logical read (blocks): 1,204,487.9 13,938,751.4
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 3.0 37.5 0.00 13.65
DB CPU(s): 3.0 36.9 0.00 13.43
Logical read (blocks): 1,383,602.0 17,334,975.8
 
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 50.0 0.00 16.23
DB CPU(s): 3.9 49.3 0.00 15.98
Logical read (blocks): 1,656,216.2 20,782,477.0

So what?

Whether you like it or not, you cannot run Oracle Database on the Google Cloud, because the Oracle licensing rules are set to make life difficult for running Oracle Database outside of the Oracle Cloud. Performance-wise, it is still the Oracle IaaS that shows the best results for this test. But of course, there are a lot of other points to consider. You don’t run only the database; the application should be located in the same data center.

The Google Cloud Platform looks good and I’ll probably use my Google Cloud trial to test Spanner, and maybe the new PostgreSQL service. I love Oracle Database technology. But customers going to a public cloud will tend to migrate to database systems which can run on any virtualized environment without certification, support and licensing risks.

 

The post Google Cloud Platform instances and Oracle database performance first appeared on Blog dbi services.

Is it the right time to move to MariaDB now?

Sat, 2017-03-11 03:06

Are you thinking about adding MariaDB to your database landscape, or even about replacing other database systems with MariaDB? Then you should register here. We will be doing a webinar with MariaDB on Thursday, the 23rd of March. The title is “Why it is a good time to move to MariaDB now”: after a short introduction of dbi services and of what we do in the open source area, Bruno Šimić (Sales Engineer, MariaDB Corporation) will highlight why the time to do so is now.

mariadb_webinar

Hope to see you registered.

 

The post Is it the right time to move to MariaDB now? first appeared on Blog dbi services.

12cR2: TNS_ADMIN in env.ora

Fri, 2017-03-10 04:37

The network files (sqlnet.ora, tnsnames.ora, listener.ora) are read by default from ORACLE_HOME/network/admin, but you may have several Oracle installations and want only one place for those files. Then you can use the TNS_ADMIN environment variable. But are you sure that it is always set when starting the database? The listener? The client? They must be consistent (see https://blog.dbi-services.com/oracle-12cr2-plsql-new-feature-tnsping-from-the-database/). So what do we do for the cases where TNS_ADMIN is not set? We define symbolic links from the ORACLE_HOME to the common location (a sketch of that workaround is shown below). It would be better to just change the default location, and this is what can be done in 12.2 with env.ora.
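
For reference, that pre-12.2 workaround is just a couple of symbolic links like the following (a sketch only — /u01/app/oracle/network/admin as the central location is an assumption, adapt it to yours):

mkdir -p /u01/app/oracle/network/admin
ln -s /u01/app/oracle/network/admin/sqlnet.ora   $ORACLE_HOME/network/admin/sqlnet.ora
ln -s /u01/app/oracle/network/admin/tnsnames.ora $ORACLE_HOME/network/admin/tnsnames.ora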

By default, the $ORACLE_HOME/env.ora is empty. There are only comments.

If you run any Oracle 12.2 client, the $ORACLE_HOME/env.ora will be read. If nothing is set there, then the default $ORACLE_HOME/network/admin location is used.

$ORACLE_HOME/network/admin

[oracle@VM104 tmp]$ strace -e trace=file tnsping localhost 2>&1 | grep -E "[.]ora"
lstat("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", {st_mode=S_IFREG|0644, st_size=852, ...}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", O_RDONLY) = 3
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/sqlnet.ora", F_OK) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/sqlnet.ora", O_RDONLY) = 3
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/sqlnet.ora", F_OK) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/sqlnet.ora", O_RDONLY) = 3
access("/home/oracle/.tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)

Here I have a sqlnet.ora but no tnsnames.ora, so the next locations searched are ~/.tnsnames.ora and /etc/tnsnames.ora.

TNS_ADMIN environment variable

If I set the environment variable TNS_ADMIN to /tmp then

[oracle@VM104 tmp]$ TNS_ADMIN=/tmp strace -e trace=file tnsping localhost 2>&1 | grep -E "[.]ora"
lstat("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", {st_mode=S_IFREG|0644, st_size=852, ...}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", O_RDONLY) = 3
access("/tmp/sqlnet.ora", F_OK) = 0
open("/tmp/sqlnet.ora", O_RDONLY) = 3
access("/tmp/sqlnet.ora", F_OK) = 0
open("/tmp/sqlnet.ora", O_RDONLY) = 3
access("/home/oracle/.tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/tmp/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)

The directory defined in TNS_ADMIN is searched first

TNS_ADMIN in env.ora

I have added the TNS_ADMIN=/tmp in the env.ora:

[oracle@VM104 tmp]$ tail -3 $ORACLE_HOME/env.ora
# Default: $ORACLE_HOME/network/admin
#
TNS_ADMIN=/tmp

When I run tnsping without setting any environment variable, I have exactly the same as before:


[oracle@VM104 tmp]$ strace -e trace=file tnsping localhost 2>&1 | grep -E "[.]ora"
lstat("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", {st_mode=S_IFREG|0644, st_size=867, ...}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", O_RDONLY) = 3
access("/tmp/sqlnet.ora", F_OK) = 0
open("/tmp/sqlnet.ora", O_RDONLY) = 3
access("/tmp/sqlnet.ora", F_OK) = 0
open("/tmp/sqlnet.ora", O_RDONLY) = 3
access("/home/oracle/.tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/tmp/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)

The good thing about it is that the setting is centralized for all binaries running from this ORACLE_HOME set.

Both

However the setting in environment has priority over the env.ora one:

[oracle@VM104 tmp]$ TNS_ADMIN=/var/tmp strace -e trace=file tnsping localhost 2>&1 | grep -E "[.]ora"
lstat("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", {st_mode=S_IFREG|0644, st_size=867, ...}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/env.ora", O_RDONLY) = 3
access("/var/tmp/sqlnet.ora", F_OK) = 0
open("/var/tmp/sqlnet.ora", O_RDONLY) = 3
access("/var/tmp/sqlnet.ora", F_OK) = 0
open("/var/tmp/sqlnet.ora", O_RDONLY) = 3
access("/home/oracle/.tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/var/tmp/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u01/app/oracle/product/12.2.0/dbhome_1/network/admin/tnsnames.ora", F_OK) = -1 ENOENT (No such file or directory)

So the recommendation, if you want to use env.ora, is not to set TNS_ADMIN in the environment, especially when starting the listener or the database, to be sure that the same setting is always used. Final note: I have not seen this in the documentation, so if you rely on it for a critical environment, better to validate with support.
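A small sanity check before starting anything from this ORACLE_HOME can make that explicit — a sketch:

# before starting the listener or the database, check that TNS_ADMIN is not set
# in the environment, because it would silently override the env.ora setting
if [ -n "$TNS_ADMIN" ]; then
  echo "TNS_ADMIN is set to $TNS_ADMIN and takes priority over env.ora"
else
  grep -i '^ *TNS_ADMIN' $ORACLE_HOME/env.ora   # show what env.ora will provide
fi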

 

Cet article 12cR2: TNS_ADMIN in env.ora est apparu en premier sur Blog dbi services.

Oracle 12cR2, RAC, Cloud, ACFS, PDB thin clones and asmadmin

Thu, 2017-03-09 15:21

In the Oracle Public Cloud, fast provisioning gets all its meaning when creating a RAC database service: in one hour you can get an operational highly available multitenant database. You can even create it in Data Guard for Disaster Recovery. Now, Oracle is pushing ACFS to store the datafiles rather than direct ASM. Especially in multitenant because a great feature is thin cloning: CREATE PLUGGABLE DATABASE AS SNAPSHOT COPY. However, I encountered an error when I tried it for the first time.

TDE keystore

SQL> create pluggable database pdb2 from pdb1 snapshot copy;
create pluggable database pdb2 from pdb1 snapshot copy
*
ERROR at line 1:
ORA-28357: password required to open the wallet

Oh yes, in the cloud all tablespaces are encrypted. In 12.2 we can put the keystore password in the command:

ORA-17517

SQL> create pluggable database pdb2 from pdb1 snapshot copy keystore identified by "Ach1z0#d" ;
 
create pluggable database pdb2 from pdb1 snapshot copy keystore identified by "Ach1z0#d"
*
ERROR at line 1:
ORA-65169: error encountered while attempting to copy file
/u02/app/oracle/oradata/CDB1/49FC0C40FCF62C05E053BAF6C40A9DD6/datafile/o1_mf_users_dcr220sd_.dbf
ORA-17517: Database cloning using storage snapshot failed on file
8:/u02/app/oracle/oradata/CDB1/49FC0C40FCF62C05E053BAF6C40A9DD6/datafile/o1_mf_users_dcr220sd_.dbf

Here we are. The call to the storage snapshot feature has failed. Usually the errors coming from OS calls are accompanied by additional information, but not here.

alert.log and trace

In alert.log, the error is displayed with reference to some other trace files:

2017-03-05 16:24:38.935000 +00:00
create pluggable database pdb2 from pdb1 snapshot copy keystore identified by *
AUDSYS.AUD$UNIFIED (SQL_TEXT) - CLOB populated
WARNING: Detected that PDB needs to import keys from source. PDB can only open in restricted mode until import.
2017-03-05 16:24:40.447000 +00:00
Errors in file /u01/app/oracle/diag/rdbms/cdb1/cdb11/trace/cdb11_p000_8910.trc:
Errors in file /u01/app/oracle/diag/rdbms/cdb1/cdb11/trace/cdb11_p002_8918.trc:
Errors in file /u01/app/oracle/diag/rdbms/cdb1/cdb11/trace/cdb11_p001_8914.trc:
Errors in file /u01/app/oracle/diag/rdbms/cdb1/cdb11/trace/cdb11_p003_8922.trc:
**************************************************************
Undo Create of Pluggable Database PDB2 with pdb id - 4.
**************************************************************
ORA-65169 signalled during: create pluggable database pdb2 from pdb1 snapshot copy keystore identified by * ...

And those trace files have the following information:
ksfdsscre_clone: create snapshot failed error(-1) errmsg(OS dependent failure) voltag(49FF372094256196E053BAF6C40AEB9D) parent_voltag() mntbuf(/u02)

This is not very helpful by itself. We see the snapshot name (voltag) and the parent name (parent_voltag). You may know that error (-1) is EPERM which is ‘operation not permitted’. What I did to be sure was to try to create the snapshot myself:

[oracle@rac1 cdb11]$ acfsutil snap create -w 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap create: CLSU-00100: operating system function: ioctl failed with error data: 1
acfsutil snap create: CLSU-00101: operating system error message: Operation not permitted
acfsutil snap create: CLSU-00103: error location: OI_0
acfsutil snap create: ACFS-03046: unable to perform snapshot operation on /u02

EPERM

This is more clear and I also strace’d it to see where the error comes from:

open("/u02", O_RDONLY) = 41
ioctl(41, RTC_UIE_ON, 0x7fff17ae17a0) = 0
ioctl(41, 0xffffffffc1287021, 0x7fff17ae0e90) = -1 EPERM (Operation not permitted)

I’m running that with the oracle user, as the instance does when creating a PDB:
uid=1001(oracle) gid=1001(oinstall) groups=1001(oinstall),1002(dba),1003(racoper),1004(asmdba)

grid

When connecting as grid, I am able to create the snapshot

[grid@rac1 ~]$ acfsutil snap create -w 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap create: Snapshot operation is complete.
[grid@rac1 ~]$ acfsutil snap delete 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap delete: Snapshot operation is complete.

Grid has the following permissions:
uid=1000(grid) gid=1001(oinstall) groups=1001(oinstall),1003(racoper),1004(asmdba),1005(asmoper),1006(asmadmin)

asmadmin

This is what /etc/group looks like:

opc:x:54323:
oinstall:x:1001:
dba:x:1002:oracle
racoper:x:1003:oracle,grid
asmdba:x:1004:oracle,grid
asmoper:x:1005:grid
asmadmin:x:1006:grid

This is what the Oracle Public Cloud defines at RAC DBaaS service creation, and asmadmin is not mentioned in documentation.

So, to solve (or workaround) the issue, I’ve added oracle to the asmadmin group:

asmadmin:x:1006:grid,oracle

and now, I’m able to create a snapshot when logging as oracle:

[oracle@rac1 ~]$ acfsutil snap create -w 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap create: Snapshot operation is complete.
[oracle@rac1 ~]$ acfsutil snap delete 49FFA9651D1D58D5E053BAF6C40AF81C /u02
acfsutil snap delete: Snapshot operation is complete.
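Instead of editing /etc/group by hand, the same change can be done with usermod — a minimal sketch, run as root, assuming the user and group names above:

# add the oracle OS user to the asmadmin secondary group (keeps existing groups)
usermod -aG asmadmin oracle
# verify the new membership
id oracle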

restart

I thought that restarting the instance would be sufficient, but it is not. I had to restart the cluster. And this is also something easy in the Oracle Public Cloud:

CaptureRestartOPCRAC

A simple click restarts the first node and then, once it is up again, restarts the second node. The rolling reboot ensures that the service is always up.

Thin clone

Here it is. The instance is now able to create a snapshot.

Connected to:
Oracle Database 12c EE Extreme Perf Release 12.2.0.1.0 - 64bit Production
 
SQL> set echo on
SQL> create pluggable database pdb2 from pdb1 snapshot copy keystore identified by "Ach1z0#d" ;
Pluggable database created.
 
Elapsed: 00:00:30.36

So what?

In my opinion, configurations that store CDB datafiles on ACFS should give the right to create snapshots to the user running the database. The cloud interface is very simple, but the technology behind it is complex. The consequence of this gap is that using the cloud is easy when everything goes as expected, but any exception can bring us into troubleshooting.

 

Cet article Oracle 12cR2, RAC, Cloud, ACFS, PDB thin clones and asmadmin est apparu en premier sur Blog dbi services.

Oracle 12cR2: changes for login.sql

Tue, 2017-03-07 15:58

If you use a login.sql script to set the SQL*Plus environment from your current working directory, you will see that it does not run anymore in 12.2. This is a security feature, and a good occasion to explain how sqlplus finds the scripts to run, on Linux.

For my test I have login.sql, LOGIN.SQL and script.sql in the following directories

$ tree /tmp/mytest/
/tmp/mytest/
├── a
│   ├── login.sql
│   ├── LOGIN.SQL
│   └── script.sql
├── b
│   ├── login.sql
│   ├── LOGIN.SQL
│   └── script.sql
├── login.sql
├── LOGIN.SQL
└── script.sql

I’m going to the parent directory
cd /tmp/mytest

The scripts display their name:

+ head login.sql LOGIN.SQL script.sql
==> login.sql <==
prompt Hello from /tmp/mytest/login.sql

==> LOGIN.SQL <==
prompt Hello from /tmp/mytest/LOGIN.SQL

==> script.sql <==
prompt Hello from /tmp/mytest/script.sql

I’ll run commands from bash -x so that they are displayed, and environment variables are set only for the command to be run.

login.sql

+ sqlplus -s /nolog

Nothing is displayed here. This is what has changed in 12.2: for security reasons, the login.sql in the current working directory is not run anymore.

+ sqlplus -s /nolog @ login.sql

This is probably a side effect of the implementation of this new security feature: even when I explicitly want to run the login.sql script, it is ignored.

+ sqlplus -s /nolog @ login
Hello from /tmp/mytest/login.sql

Here, I rely on the implicitly added ‘.sql’ extension and the script is run. The security check is probably done before this implicit extension is appended.

+ sqlplus -s /nolog @ /tmp/mytest/login.sql
Hello from /tmp/mytest/login.sql

With full path, the script is run.

Actually, the only way to get the current directory login.sql run implicitely when starting sqlplus or connecting is to set the current directory in ORACLE_PATH:

+ ORACLE_PATH=.
+ sqlplus -s /nolog
Hello from /tmp/mytest/login.sql

Note that this defeats the security feature, in the same way that it is not recommended to add ‘.’ to your shell PATH. It is better to put only absolute paths in the PATH, with directories where you know nobody can add a trojan script.
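A safer pattern is to keep your login.sql in one trusted directory and put only that absolute path in ORACLE_PATH — a minimal sketch, where ~/sqlplus_env is an assumed location:

# keep the formatting script in one trusted place and reference it with an
# absolute path instead of '.'
mkdir -p ~/sqlplus_env
cp /tmp/mytest/login.sql ~/sqlplus_env/login.sql
export ORACLE_PATH=~/sqlplus_env
sqlplus -s /nolog    # login.sql is now picked up from ORACLE_PATH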

LOGIN.SQL

+ sqlplus -s /nolog @ LOGIN.SQL

The implementation of this new feature is case insensitive. LOGIN.SQL is ignored even when specified in the command line.

+ sqlplus -s /nolog @ ./LOGIN.SQL
Hello from /tmp/mytest/LOGIN.SQL

It is run only when it is referenced with a slightly different string, such as ./LOGIN.SQL.

Note that the implicit login.sql is case sensitive on Linux:
+ rm login.sql
+ ORACLE_PATH=.
+ sqlplus -s /nolog

Even with ORACLE_PATH it is not found.

SQLPATH

+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog

SQLPATH is not a solution to find login.sql unlike ORACLE_PATH

Note that the documentation tells very different things in http://docs.oracle.com/database/122/SQPUG/configuring-SQL-Plus.htm#SQPUG012

script.sql

Now, because SQLPATH and ORACLE_PATH were already a mess, I’ll try with a script that is not login.sql.

+ sqlplus -s /nolog @ script.sql
Hello from /tmp/mytest/script.sql

Current directory is still searched for non-login scripts

+ sqlplus -s /nolog @ /tmp/mytest/script.sql
Hello from /tmp/mytest/script.sql

An absolute path can be used, or we can set a path to search:

+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/script.sql

Unlike login.sql, SQLPATH can be used to find a script in another directory

+ cd /tmp/mytest/a
+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/a/script.sql

But current directory is still the first one that is searched

+ rm script.sql
+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/script.sql

Only when the script is not in the current directory is it searched for in SQLPATH.

+ rm /tmp/mytest/script.sql
+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ script
SP2-0310: unable to open file "script.sql"

Again, the documentation is wrong. Only the specified directories are looked up, not their sub-directories. But I can specify the subdirectory relative to SQLPATH (I am still in /tmp/mytest/a where there is no script.sql):

+ SQLPATH=/tmp/mytest
+ sqlplus -s /nolog @ b/script
Hello from /tmp/mytest/b/script.sql

The b/script was resolved from the SQLPATH=/tmp/mytest

In SQLPATH, we can add multiple paths

+ SQLPATH=/tmp/mytest:/tmp/mytest/a:/tmp/mytest/b
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/b/script.sql

Here I have a script.sql only in the ‘b’ sub-directory and sqlplus finds it when this directory is listed in SQLPATH

ORACLE_PATH

Running the same with ORACLE_PATH instead of SQLPATH

+ ORACLE_PATH=/tmp/mytest:/tmp/mytest/a:/tmp/mytest/b
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/a/login.sql
Hello from /tmp/mytest/b/script.sql

We can also have multiple paths for ORACLE_PATH (this is not in the documentation) and it acts like SQLPATH, but there are a few differences.

First, you have seen that the login.sql script is run.

Then, if I have the script in my current directory, but not in ORACLE_PATH

+ cd /tmp
+ echo 'prompt Hello from /tmp' > script.sql
+ ORACLE_PATH=/tmp/mytest:/tmp/mytest/a:/tmp/mytest/b
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/a/login.sql
Hello from /tmp/mytest/b/script.sql

The ORACLE_PATH one is used first

+ ORACLE_PATH=/tmp/mytest:/tmp/mytest/a
+ sqlplus -s /nolog @ script
Hello from /tmp/mytest/a/login.sql
Hello from /tmp

The current directory is considered only when not found in ORACLE_PATH.

+ ORACLE_PATH=/tmp/mytest:/tmp/mytest/a
+ sqlplus -s /nolog @ b/script
Hello from /tmp/mytest/a/login.sql
Hello from /tmp/mytest/b/script.sql

As with SQLPATH, subdirectory is accessible if mentioned.

Both, in order

If you don’t want to keep it simple, you can specify both ORACLE_PATH and SQLPATH

+ cd mytest
+ ORACLE_PATH=/tmp
+ SQLPATH=/tmp/mytest/b
+ sqlplus -s /nolog @ script
Hello from /tmp

and ORACLE_PATH is read first. So the order is:

  1. ORACLE_PATH, in order of paths specified
  2. Current directory (except for login.sql)
  3. SQLPATH (except for login.sql) in order of paths specified
strace

Better than documentation or testing all combinations, in Linux we can trace the system calls when sqlplus is looking for the script.

I’ve set non-existing paths /ORACLE_PATH1 and /ORACLE_PATH2 for ORACLE_PATH, and /SQLPATH1 and /SQLPATH2 for SQLPATH, and run ‘script’ without the extension

ORACLE_PATH=/ORACLE_PATH1:/ORACLE_PATH2 SQLPATH=/SQLPATH1:/SQLPATH2 strace -e trace=file sqlplus -s /nolog @ script

This traces all system calls with a file name:


access("/ORACLE_PATH1/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script.sql", F_OK) = -1 ENOENT (No such file or directory)
stat("script.sql", 0x7fff01921400) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/script.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/script.sql", F_OK) = -1 ENOENT (No such file or directory)
stat("/SQLPATH1/script.sql", 0x7fff0191b430) = -1 ENOENT (No such file or directory)
stat("/SQLPATH2/script.sql", 0x7fff0191b430) = -1 ENOENT (No such file or directory)
SP2-0310: unable to open file "script.sql"
access("/ORACLE_PATH1/login.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/login.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH1/login.sql", F_OK) = -1 ENOENT (No such file or directory)
access("/ORACLE_PATH2/login.sql", F_OK) = -1 ENOENT (No such file or directory)
stat("/u01/app/oracle/product/12.2.0/dbhome_1/sqlplus/admin/glogin.sql", {st_mode=S_IFREG|0644, st_size=342, ...}) = 0
access("/u01/app/oracle/product/12.2.0/dbhome_1/sqlplus/admin/glogin.sql", F_OK) = 0
statfs("/u01/app/oracle/product/12.2.0/dbhome_1/sqlplus/admin/glogin.sql", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=6676009, f_bfree=2866104, f_bavail=2521221, f_files=1703936, f_ffree=1663469, f_fsid={-1731931108, 1057261682}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/app/oracle/product/12.2.0/dbhome_1/sqlplus/admin/glogin.sql", O_RDONLY) = 9

This is very interesting. First, we see that the paths are searched multiple times, and I don’t know why. Second, when passing a name without extension (i.e. without a dot in the name) the exact name is used first for ORACLE_PATH, but the lookup in the current directory and in SQLPATH automatically adds ‘.sql’. The system calls are also different: ORACLE_PATH has no stat() call before access(), which is different from the current directory and SQLPATH. Finally, login.sql is read from ORACLE_PATH only and glogin.sql from ORACLE_HOME/sqlplus/admin.

Change occurred between two 12.2.0.1 builds

As a comparison, sqlplus 12.1.0.2, and even the 12.2.0.1 DBaaS version (built in October), has the following additional calls to look for login.sql in the current path and in SQLPATH:
stat("login.sql", 0x7fffc14d5490) = -1 ENOENT (No such file or directory)
stat("/SQLPATH1/login.sql", 0x7fffc14cf4c0) = -1 ENOENT (No such file or directory)
stat("/SQLPATH2/login.sql", 0x7fffc14cf4c0) = -1 ENOENT (No such file or directory)

This has disappeared in the 12.2.0.1 on-premises version (built in January).

So what?

WTF login.sql is not run, even when explicitly started
Except when mentioning it as 'login' or as './login.sql' pic.twitter.com/ClfvAUizO6

— Franck Pachot (@FranckPachot) March 7, 2017

Big thanks to the SQL Developer team who gave me the solution approximately 3 seconds after my tweet.

This behavior changed and, as far as I know, it is not documented, and the MOS note about it is not published. It makes sense, for security reasons, to prevent running scripts in the current directory without explicitly allowing it. However, login.sql is often used for formatting only. It seems that SQLcl will implement this in a finer way, running only the formatting commands when the script is picked up implicitly.

Be careful when moving to/from the Oracle Cloud and your premises because you don’t run exactly the same version…

 

Cet article Oracle 12cR2: changes for login.sql est apparu en premier sur Blog dbi services.

Misleading wait event names clarified in V$EVENT_NAME

Mon, 2017-03-06 14:15

The Oracle wait event names were originally implemented for the Oracle RDBMS developers and are now used by database users to troubleshoot performance issues. The consequence is that the names may be misleading because they carry a meaning from the internal point of view. Here is some clarification about them.

In 12c the clarification is easy because we have a new DISPLAY_NAME column in the V$EVENT_NAME view:


SQL> select wait_class,name, display_name from v$event_name where display_name != name order by 1,2;
WAIT_CLASS NAME DISPLAY_NAME
-------------- ----------------------------------- ----------------------------------------------
Administrative concurrent I/O completion online move datafile IO completion
Administrative datafile copy range completion online move datafile copy range completion
Administrative wait for possible quiesce finish quiesce database completion
Commit log file sync commit: log file sync
Configuration log buffer space log buffer full - LGWR bottleneck
Idle LGWR real time apply sync standby apply advance notification
Other DFS db file lock quiesce for datafile offline
Other Image redo gen delay redo resource management
Other datafile move cleanup during resize online move datafile resize cleanup
System I/O control file sequential read control file read
System I/O control file single write control file write
System I/O db file parallel write db list of blocks write
System I/O log file parallel write log file redo write
System I/O log file sequential read log file multiblock read
System I/O log file single write log file header write
User I/O db file parallel read db list of blocks read
User I/O db file scattered read db multiblock read
User I/O db file sequential read db single block read
User I/O db file single write db single block write

We have long known the misleading ‘db file sequential read’, which is what we call ‘random reads’ from the storage point of view, and ‘db file scattered read’, which is what we call ‘sequential reads’ from the storage point of view. The DISPLAY_NAME clarifies everything: single block reads vs. multiblock reads.

‘db file parallel read’ is a batch of random reads, used by prefetching for example, which reads multiple blocks but non contiguous.
‘db file parallel write’ is similar, for DBWR to write a batch of blocks. The DISPLAY_NAME clarifies everything: ‘db list of blocks’.

‘log file parallel write’ is ‘parallel’ only because you can have multiplexed files. DISPLAY_NAME is less misleading with ‘log file redo write’.
The ‘log buffer space’ has a DISPLAY_NAME that is more focused on the cause: ‘log buffer full – LGWR bottleneck’

You can look at the others where DISPLAY_NAME is very clear about the operation: ‘online move’ for some operations on files, ‘commit’ for the well know log file sync…
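If you want to see the clarified names next to live sessions, V$SESSION can simply be joined to V$EVENT_NAME — a minimal sketch, run as a DBA user:

sqlplus -s / as sysdba <<'SQL'
-- current waits with both the internal name and the clarified display name
set linesize 150
column event        format a30
column display_name format a40
select s.sid, s.event, e.display_name, s.wait_class
from   v$session s
join   v$event_name e on e.name = s.event
where  s.wait_class <> 'Idle';
SQL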

Of course they are also described in the Database Reference documentation.

 

Cet article Misleading wait event names clarified in V$EVENT_NAME est apparu en premier sur Blog dbi services.

ODA – 32GB template but got a database with 16GB SGA???

Mon, 2017-03-06 11:40

I got an interesting question today from a customer who created a database on ODA. He selected the template odb-04, which shows 32GB memory, but got a database with a 16GB SGA… Is it due to the PGA size, a limitation in the system, the huge pages usage being exhausted, or even a bug?

Indeed, the answer is simpler and funnier. If you look at the menu shown by OAKCLI while creating a database, you get something like this:

 Please select one of the following for Database Class  [1 .. 6]:

1    => odb-01s  (   1 cores ,     4 GB memory)
2    =>  odb-01  (   1 cores ,     8 GB memory)
3    =>  odb-02  (   2 cores ,    16 GB memory)
4    =>  odb-04  (   4 cores ,    32 GB memory)
5    =>  odb-06  (   6 cores ,    48 GB memory)
6    =>  odb-12  (  12 cores ,    96 GB memory)

4

Selected value is : odb-04  (   4 cores ,    32 GB memory)

 

So using the template odb-04 seems to allocate 32GB memory for the newly created database. However, looking at what OAKCLI really does shows that the reality is a bit different. Following all the files/scripts called by OAKCLI at execution time, we come to the following file:

/opt/oracle/oak/lib/oakutilslib/DbSizingValues.pm

 

This script contains the definition of the DBCA template used including the memory definition

my $sga_size = $memory * $dbTypes{$dbtypeid}{sga_factor};
my $pga_size = $memory * $dbTypes{$dbtypeid}{pga_factor};

So the memory value is multiplied by a factor depending on the database type. Looking in the same script, we find both pieces of information:

my  %dbTemplates  =
    (  1  => { name => 'odb-01s', cpus => 1,  sfactor  => 0.5},
       2  => { name => 'odb-01',  cpus => 1  },
       3  => { name => 'odb-02',  cpus => 2  },
       4  => { name => 'odb-04',  cpus => 4  },
       5  => { name => 'odb-06',  cpus => 6  },
       6  => { name => 'odb-12',  cpus => 12 },
       7  => { name => 'odb-16',  cpus => 16 },
       8  => { name => 'odb-24',  cpus => 24 },
       9  => { name => 'odb-32',  cpus => 32 },
       10 => { name => 'odb-36',  cpus => 36 }
    );

my  %dbTypes  =
    ( 1 => { name => 'OLTP', template_name => 'OAK_oltp.dbt',     sga_factor => 0.5,  pga_factor => 0.25 },
      2 => { name => 'DSS',  template_name => 'OAK_dss.dbt',      sga_factor => 0.25, pga_factor => 0.50 },
      3 => { name => 'In-Memory', template_name => 'OAK_oltp.dbt',sga_factor => 0.25, pga_factor => 0.25, in_mem_factor=>0.25, only12c=> 1}
    );

 

This means that if you create an OLTP database with the odb-04 template, it takes 32GB as a basis and multiplies it by 0.5.
Here we go, we have our 16GB!!
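To make the factors concrete, here is a quick sketch of the arithmetic for the odb-04 template, using the sga/pga factors from DbSizingValues.pm:

# reproduce the sizing arithmetic for the odb-04 template (32 GB base memory)
BASE_GB=32
# OLTP: sga_factor=0.5, pga_factor=0.25
echo "OLTP: SGA = $(echo "$BASE_GB*0.5"  | bc) GB, PGA = $(echo "$BASE_GB*0.25" | bc) GB"
# DSS:  sga_factor=0.25, pga_factor=0.50
echo "DSS : SGA = $(echo "$BASE_GB*0.25" | bc) GB, PGA = $(echo "$BASE_GB*0.5"  | bc) GB"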

In conclusion, the memory information shown by OAKCLI CREATE DATABASE is the base memory used for the calculation and not the one assigned to the SGA. I must admit that this is quite confusing for the end users, as this base memory has no direct meaning for them on its own…

To be fully fair, I have to mention that the correct information about the SGA size per template is available in the documentation, appendix B:

http://docs.oracle.com/cd/E83239_01/doc.121/e83201/database-templates-oracle-database-appliance.htm#CMTAR269

Enjoy!

David

 

Cet article ODA – 32GB template but got a database with 16GB SGA??? est apparu en premier sur Blog dbi services.

Purging Unified Audit Trail in 12cR1

Sat, 2017-03-04 11:24

When you want to empty a table you have two methods: delete and truncate. If, for any reason (see previous post), the Unified Audit Trail has become too big, you cannot directly delete or truncate the table. You must call dbms_audit_mgmt.clean_audit_trail. But then you want to know whether it will do slow deletes or quick truncates. Let’s trace it.

I have filled my Unified Audit Trail with hundreds of thousands of failed logins:
SQL> select unified_audit_policies,action_name,count(*) from unified_audit_trail group by unified_audit_policies,action_name;
 
UNIFIED_AUDIT_POLICIES ACTION_NAME COUNT(*)
---------------------------------------- -------------------- ----------
EXECUTE 2
ORA_LOGON_FAILURES LOGON 255799

We have two methods to purge: purge records older than a timestamp or purge all.

Purge old

Auditing is different than logging. It’s a security feature. The goal is not to keep only recent information by specifying a retention. The goal is to read, process and archive the records, and then set a timestamp to the high water mark that has been processed. Then a background job will delete what is before this timestamp.

I set the timestamp to 6 hours before now

SQL> exec dbms_audit_mgmt.set_last_archive_timestamp(audit_trail_type=>dbms_audit_mgmt.audit_trail_unified
,last_archive_time=>sysdate-6/24);
PL/SQL procedure successfully completed.

And call the clean procedure:

SQL> exec dbms_audit_mgmt.clean_audit_trail(audit_trail_type=>dbms_audit_mgmt.audit_trail_unified
,use_last_arch_timestamp=>TRUE);
PL/SQL procedure successfully completed.

This was fast, but let’s look at the tkprof. Besides some selects, I see a delete on the CLI_SWP$ table that stores the Unified Audit Trail in SecureFile LOBs:

delete from "CLI_SWP$2f516430$1$1" partition("HIGH_PART")
where
max_time < :1
 
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.47 1.82 20 650 47548 6279
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.47 1.82 20 650 47548 6279
 
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 7 (recursive depth: 1)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 DELETE CLI_SWP$2f516430$1$1 (cr=650 pr=20 pw=0 time=1827790 us)
6279 6279 6279 PARTITION RANGE SINGLE PARTITION: 1 1 (cr=248 pr=0 pw=0 time=15469 us cost=5 size=18020 card=530)
6279 6279 6279 TABLE ACCESS FULL CLI_SWP$2f516430$1$1 PARTITION: 1 1 (cr=248 pr=0 pw=0 time=10068 us cost=5 size=18020 card=530)

I will not go into the detail here. This delete may be optimized (120,000 audit trail records were actually deleted here behind those 6,279 rows). This table is partitioned, and we can expect that old partitions are truncated, but there are many bugs with that. On a lot of environments we see all rows in HIGH_PART.
This is improved in 12cR2 and will be the subject of a future post. If you have a huge audit trail to purge, then a conventional delete is not optimal.
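When the archive timestamp workflow is in place, the purge can also be scheduled instead of being run manually — a minimal sketch using dbms_audit_mgmt.create_purge_job, where the job name and the 24-hour interval are assumptions:

sqlplus -s / as sysdba <<'SQL'
-- schedule a recurring purge of the unified audit trail,
-- deleting only records older than the last archive timestamp
begin
  dbms_audit_mgmt.create_purge_job(
    audit_trail_type           => dbms_audit_mgmt.audit_trail_unified,
    audit_trail_purge_interval => 24,                      -- hours (assumed)
    audit_trail_purge_name     => 'purge_unified_audit',   -- assumed name
    use_last_arch_timestamp    => TRUE);
end;
/
SQL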

Purge all

I have still lot of rows remaining.

SQL> select unified_audit_policies,action_name,count(*) from unified_audit_trail group by unified_audit_policies,action_name;
 
UNIFIED_AUDIT_POLICIES ACTION_NAME COUNT(*)
---------------------------------------- -------------------- ----------
EXECUTE 4
ORA_LOGON_FAILURES LOGON 136149

When purging all without setting a timestamp, I expect a truncate which is faster than deletes. Let’s try it and trace it.

SQL> exec dbms_audit_mgmt.clean_audit_trail(audit_trail_type=>dbms_audit_mgmt.audit_trail_unified
,use_last_arch_timestamp=>FALSE);
PL/SQL procedure successfully completed.

First, there seems to be an internal lock acquired:
SELECT LOCKID FROM DBMS_LOCK_ALLOCATED WHERE NAME = :B1 FOR UPDATE
UPDATE DBMS_LOCK_ALLOCATED SET EXPIRATION = SYSDATE + (:B1 /86400) WHERE ROWID = :B2

Then a partition split:
alter table "CLI_SWP$2f516430$1$1" split partition high_part at (3013381) into (partition "PART_6", partition high_part lob(log_piece) store as securefile (cache logging tablespace SYSAUX) tablespace "SYSAUX")

The split point is the current timestamp SCN:

SQL> select scn_to_timestamp(3013381) from dual;
 
SCN_TO_TIMESTAMP(3013381)
---------------------------------------------------------------------------
02-MAR-17 05.59.06.000000000 PM

This is the time when I ran the purge, and it is probably used to ‘truncate’ all previous partitions but keep the ongoing one.

Then, there is no TRUNCATE in the trace, but something similar: some segments are dropped:

delete from seg$
where
ts#=:1 and file#=:2 and block#=:3
 
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 6 0.00 0.00 0 18 12 6
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 0.00 0.00 0 18 12 6

There is finally a delete, but with no rows to delete as the rows were in the dropped segments:

delete from "CLI_SWP$2f516430$1$1" partition("HIGH_PART")
where
max_time < :1
 
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 3 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 3 0 0
 
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 7 (recursive depth: 1)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 DELETE CLI_SWP$2f516430$1$1 (cr=3 pr=0 pw=0 time=61 us)
0 0 0 PARTITION RANGE SINGLE PARTITION: 1 1 (cr=3 pr=0 pw=0 time=57 us cost=5 size=2310 card=33)
0 0 0 TABLE ACCESS FULL CLI_SWP$2f516430$1$1 PARTITION: 1 1 (cr=3 pr=0 pw=0 time=48 us cost=5 size=2310 card=33)

So what?

Cleaning the Unified Audit Trail is done with internal statements, but it looks like a delete when use_last_arch_timestamp=>TRUE and like a truncate when use_last_arch_timestamp=>FALSE. This means that we can use this procedure when AUDSYS has grown too much. However, there are a few bugs with this internal table, which is partitioned even when partitioning is not allowed. The implementation has changed in 12.2, so the next blog post will show the same test on 12cR2.

 

Cet article Purging Unified Audit Trail in 12cR1 est apparu en premier sur Blog dbi services.

Is your DBA_FEATURE_USAGE_STATISTICS up-to-date?

Sat, 2017-03-04 05:35

Recently we were doing a licensing review for a client. As many DBAs may already know, this requires executing some Oracle scripts at OS level and database level.
Among these scripts we have options_packs_usage_statistics.sql (DocId 1317265.1), which is an official Oracle script to check the usage of separately licensed Oracle Database Options/Management Packs.
This script uses the DBA_FEATURE_USAGE_STATISTICS view to retrieve its information, and sometimes it may happen that the data in this view are not recent.
One important thing is that DBA_FEATURE_USAGE_STATISTICS is based on the most recent sample, shown in the LAST_SAMPLE_DATE column. In our case we got the following results (outputs are truncated).

SYSDATE |
-------------------|
2017.02.17_13.36.44|


PRODUCT |LAST_SAMPLE_DATE |
-------------------------------|-------------------|
Active Data Guard |2014.01.02_13.37.53|
Advanced Analytics |2014.01.02_13.37.53|
Advanced Compression |2014.01.02_13.37.53|
Advanced Security |2014.01.02_13.37.53|
Database Vault |2014.01.02_13.37.53|
Diagnostics Pack |2014.01.02_13.37.53|
Label Security |2014.01.02_13.37.53|
OLAP |2014.01.02_13.37.53|
Partitioning |2014.01.02_13.37.53|
Real Application Clusters |2014.01.02_13.37.53|
Real Application Testing |2014.01.02_13.37.53|
Tuning Pack |2014.01.02_13.37.53|
.Exadata |2014.01.02_13.37.53|

If we compare sysdate and the date of the last_sample_date, we can see that we have to manually refresh our DBA_FEATURE_USAGE_STATISTICS data.
One way to do this is to run the procedure

exec dbms_feature_usage_internal.exec_db_usage_sampling(SYSDATE);

In our case the procedure did not refresh our data, even though there was no error and we received the message that the procedure completed successfully.

SQL> exec dbms_feature_usage_internal.exec_db_usage_sampling(SYSDATE);
PL/SQL procedure successfully completed.


SQL> select max(last_sample_date) from dba_feature_usage_statistics order by 1;
MAX(LAST_
---------
02-JAN-14

Following Oracle document 1629485.1 we were able to refresh the last_sample_date using this ALTER SESSION:

SQL> alter session set "_SWRF_TEST_ACTION"=53;
Session altered.


SQL> alter session set NLS_DATE_FORMAT='DD/MM/YYYY HH24:MI:SS';
Session altered.


SQL> select MAX(LAST_SAMPLE_DATE) from dba_feature_usage_statistics;
MAX(LAST_SAMPLE_DAT
-------------------
16/02/2017 13:44:46
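To catch this situation before a licensing review, a quick check comparing SYSDATE with the latest sample date can help — a minimal sketch, where the 8-day threshold is an assumption (the statistics are normally sampled weekly):

sqlplus -s / as sysdba <<'SQL'
-- flag the database when the feature usage statistics look stale
select case
         when max(last_sample_date) < sysdate - 8 then 'STALE - refresh needed'
         else 'OK'
       end as feature_usage_status,
       max(last_sample_date) as last_sample_date
from   dba_feature_usage_statistics;
SQL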

Hope this article may help

 

Cet article Is your DBA_FEATURE_USAGE_STATISTICS up-to-date? est apparu en premier sur Blog dbi services.

Sharding with Oracle 12c R2 Part I

Sat, 2017-03-04 05:33

Oracle 12.2 comes with many new features. In this article we are going to talk about sharding. It is a database scaling technique based on horizontal partitioning of data across multiple Oracle databases, which together form a sharded database (SDB). Each shard contains the same tables with the same columns but a different subset of rows. Sharding can be represented like this:
shar1
For the DBA, an SDB is in fact multiple databases that can be managed collectively or individually.
There are 3 methods of sharding: system-managed sharding, user-defined sharding and composite sharding.
In this article we are using system-managed sharding, where data are automatically distributed across the shards using partitioning by consistent hash. It is the most commonly used method. We will just demonstrate how it is possible to create shards using Oracle. In the next articles we will show how we can connect to these shards and how it is possible to add new shards.
What do we need?
Oracle Database 12c Release 2 : linuxx64_12201_database.zip
Oracle Database 12c Release 2 Global Service Manager : linuxx64_12201_gsm.zip

In this demo we use following configuration to create sharded databases on sharddemo2 and sharddemo3.
VM sharddemo1:  catalog
VM sharddemo2: shard
VM sharddemo3: shard

Oracle 12.2 should be installed on all servers. We will not show oracle software installation.
GSM software should be installed on the catalog sharddemo1.

After unzipping the file, just launch the runInstaller:
[oracle@sharddemo1 gsm122]$ ./runInstaller

gsm1

gsm2

gsm3

gsm4

gsm5


[root@sharddemo1 oracle122]# /u01/app/oracle/product/12.2.0.1/gsmhome_1/root.sh
Performing root user operation.
The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/oracle/product/12.2.0.1/gsmhome_1
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
[root@sharddemo1 oracle122]#

gsm6

The second step is to create the catalog database on sharddemo1. We will name it ORCLCAT (NON-CDB). Some database parameters need to be configured for ORCLCAT. Database creation is not shown here.

SQL> alter system set db_create_file_dest='/u01/app/oracle/oradata/';
System altered.
SQL> alter system set open_links=16 scope=spfile;
System altered.
SQL> alter system set open_links_per_instance=16 scope=spfile;
System altered.

Oracle 12.2 comes with a gsmcatuser schema. This schema is used by the shard director when connecting to the shard catalog database. It is locked by default, so we have to unlock it.

SQL> alter user gsmcatuser account unlock;
User altered.
SQL> alter user gsmcatuser identified by root;
User altered.

We also have to create the GSM administrator schema (mygdsadmin in our case) and give it the required privileges:

SQL> create user mygdsadmin identified by root;
User created.
SQL> grant connect, create session, gsmadmin_role to mygdsadmin;
Grant succeeded.
SQL> grant inherit privileges on user SYS to GSMADMIN_INTERNAL;
Grant succeeded.
The next step is to configure the scheduler on the shard catalog by setting  the remote scheduler’s http port and the agent registration password on the shard catalog database ORCLCAT

SQL> execute dbms_xdb.sethttpport(8080);
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> @?/rdbms/admin/prvtrsch.plb


SQL> exec DBMS_SCHEDULER.SET_AGENT_REGISTRATION_PASS('welcome');
PL/SQL procedure successfully completed.
SQL>

We now have to register the sharddemo2 and sharddemo3 agents in the scheduler. The executable schagent in $ORACLE_HOME/bin is used for that. After registration, the agents should be started.
Below registration for sharddemo2

[oracle@sharddemo2 ~]$ schagent -start
Scheduler agent started using port 21440
[oracle@sharddemo2 ~]$ echo welcome | schagent -registerdatabase sharddemo1 8080
Agent Registration Password ?
Oracle Scheduler Agent Registration for 12.2.0.1.2 Agent
Agent Registration Successful!
[oracle@sharddemo2 ~]$

After agent registration, the corresponding directories for the databases must be created on sharddemo2 and sharddemo3:

[oracle@sharddemo2 ~]$ mkdir /u01/app/oracle/oradata
[oracle@sharddemo2 ~]$ mkdir /u01/app/oracle/fast_recovery_area

Now it’s time to launch the Global Data Services Control Utility (GDSCTL) on sharddemo1. GDSCTL is in $GSM_HOME/bin in our case /u01/app/oracle/product/12.2.0.1/gsmhome_1/bin

[oracle@sharddemo1 ~]$ gdsctl
GDSCTL: Version 12.2.0.1.0 - Production on Thu Mar 02 13:53:50 CET 2017
Copyright (c) 2011, 2016, Oracle. All rights reserved.
Welcome to GDSCTL, type "help" for information.
Warning: current GSM name is not set automatically because gsm.ora contains zero or several GSM entries. Use "set gsm" command to set GSM for the session.
Current GSM is set to GSMORA
GDSCTL>

And to create the shardcatalog

GDSCTL>create shardcatalog -database sharddemo1:1521:ORCLCAT -chunks 12 -user mygdsadmin/root -sdb cust_sdb -region region1
Catalog is created
GDSCTL>

Now let’s create and start the shard director. The listener of the gsm should use a free port. In our case the port is 1571

GDSCTL>add gsm -gsm region1_director -listener 1571 -pwd root -catalog sharddemo1:1521:ORCLCAT -region region1
GSM successfully added


GDSCTL>start gsm -gsm region1_director
GSM is started successfully
GDSCTL>


GDSCTL>status gsm
Alias REGION1_DIRECTOR
Version 12.2.0.1.0
Start Date 02-MAR-2017 14:03:36
Trace Level off
Listener Log File /u01/app/oracle/diag/gsm/sharddemo1/region1_director/alert/log.xml
Listener Trace File /u01/app/oracle/diag/gsm/sharddemo1/region1_director/trace/ora_21814_140615026692480.trc
Endpoint summary (ADDRESS=(HOST=sharddemo1.localdomain)(PORT=1571)(PROTOCOL=tcp))
GSMOCI Version 2.2.1
Mastership Y
Connected to GDS catalog Y
Process Id 21818
Number of reconnections 0
Pending tasks. Total 0
Tasks in process. Total 0
Regional Mastership TRUE
Total messages published 0
Time Zone +01:00
Orphaned Buddy Regions:
None
GDS region region1
GDSCTL>

We also have to set the scheduler agent password to “welcome” in gdsctl

GDSCTL>modify catalog -agent_password welcome
The operation completed successfully
GDSCTL>

The OS credential for the user “oracle” must be defined. We are using the same OS credential for all the shards

GDSCTL>add credential -credential oracle_cred -osaccount oracle -ospassword root
The operation completed successfully
GDSCTL>

Before deploying the shards we have to define metadata for them.

GDSCTL>set gsm -gsm region1_director
GDSCTL>connect mygdsadmin/root
Catalog connection is established
GDSCTL>


GDSCTL>add shardgroup -shardgroup shgrp1 -deploy_as primary -region region1
The operation completed successfully
GDSCTL>


GDSCTL>add invitednode sharddemo2


GDSCTL>create shard -shardgroup shgrp1 -destination sharddemo2 -credential oracle_cred
The operation completed successfully
DB Unique Name: sh1
GDSCTL>


GDSCTL>add invitednode sharddemo3


GDSCTL>create shard -shardgroup shgrp1 -destination sharddemo3 -credential oracle_cred
The operation completed successfully
DB Unique Name: sh21
GDSCTL>

We can then verify the status of our configuration

GDSCTL>config shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
sh1 shgrp1 U none region1 -
sh21 shgrp1 U none region1 -

If there is no error it’s time to deploy our shards. Deployment is the last step before creating the schema we will use for System Managed Sharding

GDSCTL>deploy
deploy: examining configuration...
deploy: deploying primary shard 'sh1' ...
deploy: network listener configuration successful at destination 'sharddemo2'
deploy: starting DBCA at destination 'sharddemo2' to create primary shard 'sh1' ...
deploy: deploying primary shard 'sh21' ...
deploy: network listener configuration successful at destination 'sharddemo3'
deploy: starting DBCA at destination 'sharddemo3' to create primary shard 'sh21' ...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: waiting for 2 DBCA primary creation job(s) to complete...
deploy: DBCA primary creation job succeeded at destination 'sharddemo3' for shard 'sh21'
deploy: waiting for 1 DBCA primary creation job(s) to complete...
deploy: DBCA primary creation job succeeded at destination 'sharddemo2' for shard 'sh1'
deploy: requesting Data Guard configuration on shards via GSM
deploy: shards configured successfully
The operation completed successfully
GDSCTL>

The command may take some time. Once it is done, running the config command again returns:

GDSCTL>config shard
Name Shard Group Status State Region Availability
---- ----------- ------ ----- ------ ------------
sh1 shgrp1 Ok Deployed region1 ONLINE
sh21 shgrp1 Ok Deployed region1 ONLINE

We should have two instances running on sharddemo2 and sharddemo3: sh1 and sh21.
Now that the shards are deployed, let’s create in the shard catalog database ORCLCAT the schema we will use for sharding. Here the user is called user_shard.

SQL> show parameter instance_name
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_name string ORCLCAT
SQL>


SQL>alter session enable shard ddl;
SQL>create user user_shard identified by user_shard;
SQL>grant connect, resource, alter session to user_shard;
SQL>grant execute on dbms_crypto to user_shard;
SQL>grant create table, create procedure, create tablespace, create materialized view to user_shard;
SQL>grant unlimited tablespace to user_shard;
SQL>grant select_catalog_role to user_shard;
SQL>grant all privileges to user_shard;
SQL>grant gsmadmin_role to user_shard;
SQL>grant dba to user_shard;

In a sharding environment, we have two types of tables:
Sharded tables: data are distributed across the different shards
Duplicated tables: data are duplicated on all the shards
Let’s create tablespaces for each type of table.

SQL> CREATE TABLESPACE SET TAB_PRIMA_SET using template (datafile size 100m autoextend on next 10M maxsize unlimited extent management local segment space management auto );
Tablespace created.


SQL> CREATE TABLESPACE TAB_PRODUCT datafile size 100m autoextend on next 10M maxsize unlimited extent management local uniform size 1m;
Tablespace created

Now under user_shard schema let’s create Sharded and Duplicated tables in ORCLCAT

CREATE SHARDED TABLE Customers
(
CustId VARCHAR2(60) NOT NULL,
FirstName VARCHAR2(60),
LastName VARCHAR2(60),
Class VARCHAR2(10),
Geo VARCHAR2(8),
CustProfile VARCHAR2(4000),
CONSTRAINT pk_customers PRIMARY KEY (CustId),
CONSTRAINT json_customers CHECK (CustProfile IS JSON)
) TABLESPACE SET TAB_PRIMA_SET PARTITION BY CONSISTENT HASH (CustId) PARTITIONS AUTO;
Table created.


CREATE SHARDED TABLE Orders
(
OrderId INTEGER NOT NULL,
CustId VARCHAR2(60) NOT NULL,
OrderDate TIMESTAMP NOT NULL,
SumTotal NUMBER(19,4),
Status CHAR(4),
constraint pk_orders primary key (CustId, OrderId),
constraint fk_orders_parent foreign key (CustId)
references Customers on delete cascade
) partition by reference (fk_orders_parent);


CREATE DUPLICATED TABLE Products
(
ProductId INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
Name VARCHAR2(128),
DescrUri VARCHAR2(128),
LastPrice NUMBER(19,4)
) TABLESPACE TAB_PRODUCT;
Table created.

Some checks can be done on the instances (ORCLCAT, sh1, sh21) to verify, for example, that the tablespaces, sharded tables and duplicated tables are created.

SQL> select name from v$database;
NAME
---------
ORCLCAT

SQL> select TABLESPACE_NAME, BYTES/1024/1024 MB from sys.dba_data_files order by tablespace_name;
TABLESPACE_NAME MB
------------------------------ ----------
SYSAUX 660
SYSTEM 890
TAB_PRIMA_SET 100
TAB_PRODUCT 100
UNDOTBS1 110
USERS 5

on sh1 and sh21

select TABLESPACE_NAME, BYTES/1024/1024 MB from sys.dba_data_files order by tablespace_name;
C001TAB_PRIMA_SET 100
C002TAB_PRIMA_SET 100
C003TAB_PRIMA_SET 100
C004TAB_PRIMA_SET 100
C005TAB_PRIMA_SET 100
C006TAB_PRIMA_SET 100
SYSAUX 660
SYSTEM 890
SYS_SHARD_TS 100
TAB_PRIMA_SET 100
TAB_PRODUCT 100
UNDOTBS1 115
USERS 5

On sharddemo2 (sh1) for example verify that the chunks and chunk tablespaces are created

SQL> select table_name, partition_name, tablespace_name from dba_tab_partitions where tablespace_name like 'C%TAB_PRIMA_SET' order by tablespace_name;
CUSTOMERS CUSTOMERS_P1 C001TAB_PRIMA_SET
ORDERS CUSTOMERS_P1 C001TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P2 C002TAB_PRIMA_SET
ORDERS CUSTOMERS_P2 C002TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P3 C003TAB_PRIMA_SET
ORDERS CUSTOMERS_P3 C003TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P4 C004TAB_PRIMA_SET
ORDERS CUSTOMERS_P4 C004TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P5 C005TAB_PRIMA_SET
ORDERS CUSTOMERS_P5 C005TAB_PRIMA_SET
CUSTOMERS CUSTOMERS_P6 C006TAB_PRIMA_SET
ORDERS CUSTOMERS_P6 C006TAB_PRIMA_SET

On ORCLCAT

SQL> select table_name from user_tables;
TABLE_NAME
---------------------------------------------------
MLOG$_PRODUCTS
PRODUCTS
CUSTOMERS
ORDERS
RUPD$_PRODUCTS

On sh1

SQL> select table_name from user_tables;
TABLE_NAME
------------------------------------------------------------
PRODUCTS
CUSTOMERS
ORDERS

on sh21

SQL> select table_name from user_tables;
TABLE_NAME
--------------------------------------------------------------
PRODUCTS
CUSTOMERS
ORDERS

Using gdsctl on sharddemo1 we can also see the DDL that was executed, with the show ddl command at the GSM interface:

GDSCTL>show ddl
id DDL Text Failed shards
-- -------- -------------
6 grant select_catalog_role to user_shard;
7 grant all privileges to user_shard;
8 grant gsmadmin_role to user_shard;
9 grant dba to user_shard;
10 CREATE TABLESPACE SET TAB_PRIMA_SET u...
11 CREATE TABLESPACE TAB_PRODUCT datafil...
12 CREATE SHARDED TABLE Customers ( Cust...
13 CREATE SHARDED TABLE Orders ( OrderId...
14 create database link "PRODUCTSDBLINK@...
15 CREATE MATERIALIZED VIEW "PRODUCTS" ...

And now our sharding should work. After inserting some data we can see that, for duplicated tables, the whole data set is replicated to the different shards.
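For reference, the test rows looked something like this — a sketch only: the exact values do not matter, the connect string is an assumption, and how DML is routed to the shards (directly via the sharding key or through the catalog) is the subject of part II:

# sketch of the kind of test rows used below; the connect string is an assumption
sqlplus -s user_shard/user_shard@//sharddemo1:1521/ORCLCAT <<'SQL'
insert into Customers (CustId, FirstName, LastName, Class, Geo, CustProfile)
  values ('cust_0001', 'John', 'Doe', 'Gold', 'EU', '{"interests":["sailing"]}');
insert into Products (Name, DescrUri, LastPrice)
  values ('Sail rope 10m', 'http://example.com/p/rope10', 19.99);
commit;
SQL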
On ORCLCAT for example the number of rows for table products is 9

SQL> select count(*) from products;
COUNT(*)
----------
9

On sh1, as products is a duplicated table, the number of rows should also be 9:

SQL> select count(*) from products;
COUNT(*)
----------
9

Same for table products on sh21:

SQL> select count(*) from products;
COUNT(*)
----------
9

For sharded tables, we can see that rows are distributed

On ORCLCAT

SQL> select count(*) from customers;
COUNT(*)
----------
14

On sh1

SQL> select count(*) from customers;
COUNT(*)
----------
6

On sh21, the number of rows of customers should be 8:

SQL> select count(*) from customers;
COUNT(*)
----------
8

Conclusion
In this first part we talked about the sharding configuration. We have seen how, using Oracle Global Data Services, we can create shards. In a second part we will see how to connect to the shards and how scalability is possible in a sharded environment.

 

Cet article Sharding with Oracle 12c R2 Part I est apparu en premier sur Blog dbi services.

Speaking at the next SQL Nexus at Copenhagen 2017

Fri, 2017-03-03 06:40

On May 2nd, I will have the chance to speak during the next SQL Nexus event in Copenhagen (1-3 May) about SQL Server 2016 and availability groups and, if I have enough time, you will see what is coming with SQL Server vNext.

SQL_Nexus2017_1200x627px_linkedIn_post_template_test

This is also a good opportunity to attend other sessions held by well-known people in the industry like David Klee, Edwin M Sarmiento, Wolfgang Strasser and Uwe Ricken, to name a few.

I’m looking forward to share and learn with the SQL Server community.

Hope to see you there!

 

Cet article Speaking at the next SQL Nexus at Copenhagen 2017 est apparu en premier sur Blog dbi services.

Oracle 12c – Is VKTM always your top process?

Fri, 2017-03-03 05:16

If VKTM is always your top CPU consuming process, then this blog might be something for you. Especially in virtual environments, I have often seen the VKTM process as the top process, even when the VM was idle. So, I am burning CPU without any obvious benefit. So what is the reason for the high CPU consumption? Well … it can be a combination of many things, like a not correctly working NTP, missing VMware Tools, but first and foremost Oracle bugs. I really don’t know why, but quite a lot of issues have been raised regarding the VKTM process, like the following.

  • Bug 20693049 – 12C VKTM CONSUMING MORE CPU THAN IN 11GR2
  • Bug 20552573 – VKTM PROCESSES FROM ASM AND A CDB DATABASE CONSUME 12% CPU PERMANENTLY.
  • BUG 12883034 – CHANGE THE INTERVAL WHEN VKTM PROCESS WAKES U
  • Bug 20542107 – WARNING: VKTM DETECTED A TIME DRIFT
  • Bug 20138957 – VKTM PROCESS CONSUMING HIGH CPU EVEN AFTER PATCH 18499306
  • Bug 11837095 – “TIME DRIFT DETECTED” APPEARS INTERMITTENTLY IN ALERT LOG, THO’ EVENT 10795 SET.

If you search around in MOS, you will probably find even more. Usually VKTM and VKRM issues come together, at least when you are using the Resource Manager. VKTM is the Virtual Keeper of Time process; it acts as a time publisher for an Oracle instance. VKTM publishes two sets of time: a wall clock time using a seconds interval, and a higher resolution time (which is not wall clock time) for interval measurements. The VKTM process exists in ASM instances and RDBMS instances, so if you see issues with the VKTM process, they usually pop up on both. VKTM usage is affected mainly by two hidden parameters, _timer_precision and _disable_highres_ticks, so tuning these parameters can bring down the VKTM CPU consumption.

The VKRM process is the Virtual Scheduler for Resource Manager process, and it serves as a centralized scheduler for Resource Manager activity. As there is no Resource Manager on an ASM instance, you will see this process only on RDBMS instances.

Ok .. my test environment is OEL 6.8 with 12.1.0.2 ASM and a 12.1.0.2 database on a virtual guest. To be more precise, it is a Database with PSU 12.1.0.2.161018 and Grid Infrastructure PSU 12.1.0.2.161018. The patch level plays a quite important role. For 11gR2 database, you might need patch 20531619.

So let’s start fixing the ASM VKTM issue first. It warns me all the time, that is has detected a time drift.

$ cat alert_+ASM.log | grep -i "Warning: VKTM detected a time drift."
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
...
...
Warning: VKTM detected a time drift.

Before continuing, it is important to make sure that you really don’t have a time drift. On VMware, you might want to consult the following two very good resources: KB article 1006427 “Timekeeping best practices for Linux guests” and the timekeeping whitepaper http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

Next, check that your ntp is in sync.

$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+aquila.init7.ne 162.23.41.10     2 u   16   64  377   16.375   25.656   3.873
*ntp0.as34288.ne 85.158.25.74     2 u   17   64  377   12.987   27.874   4.045

If your NTP is not in sync, you should stop right here and correct it first, because in that case the warning message from Oracle is correct: VKTM really has detected a time drift. Ok, let’s continue by checking the ASM instance. The 12.1.0.2 ASM defaults regarding VKTM are the following (a query to list these hidden parameters is sketched after the table):

Parameter                           Session_Value  Instance_Value Description
----------------------------------- -------------- -------------- --------------------------------------------
_disable_highres_ticks              FALSE          FALSE          disable high-res tick counter
_high_priority_processes            LMS*           LMS*           High Priority Process Name Mask
_highest_priority_processes         VKTM           VKTM           Highest Priority Process Name Mask
_timer_precision                    10             10             VKTM timer precision in milli-sec
_vkrm_schedule_interval             10             10             VKRM scheduling interval
_vktm_assert_thresh                 30             30             soft assert threshold VKTM timer drift
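
The values above come from the x$ksppi / x$ksppcv / x$ksppsv fixed tables. A query along the following lines, run as SYSDBA, produces that output (just a sketch; the column aliases are chosen for readability):

SELECT a.ksppinm  AS parameter,
       b.ksppstvl AS session_value,
       c.ksppstvl AS instance_value,
       a.ksppdesc AS description
  FROM x$ksppi a, x$ksppcv b, x$ksppsv c
 WHERE a.indx = b.indx
   AND a.indx = c.indx
   AND a.ksppinm IN ('_disable_highres_ticks','_high_priority_processes',
                     '_highest_priority_processes','_timer_precision',
                     '_vkrm_schedule_interval','_vktm_assert_thresh')
 ORDER BY a.ksppinm;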

Because I don’t need high-resolution ticks on my ASM instance, I am going to disable them, and besides that I am going to disable the excessive tracing by the VKTM process, which is done via the 10795 event.

SQL> alter system set "_disable_highres_ticks"=true scope=spfile;

System altered.

SQL> alter system set event="10795 trace name context forever, level 2" scope=spfile;

System altered.

Unfortunately, these changes can’t be done online, and so I have to bounce my ASM instance.

$ srvctl stop asm -f
$ srvctl start asm

My alert.log does not report time drift issues anymore, and the VKTM process of my ASM instance disappeared from my top process list. As soon as the ASM VKTM process went away, the one from the database popped up. :-)

oracle@oel001:/home/oracle/ [OCM121] top -c
top - 11:17:18 up  1:21,  2 users,  load average: 0.69, 0.77, 0.98
Tasks: 229 total,   2 running, 227 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.4%sy,  0.0%ni, 98.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  10021556k total,  3816024k used,  6205532k free,   209916k buffers
Swap:  6160380k total,        0k used,  6160380k free,   672856k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5696 oracle    -2   0 1880m  46m  44m S 22.7  0.5   0:23.49 ora_vktm_OCM121

Ok. Let’s fix that one as well. The RDBMS defaults regarding VKTM with 12.1.0.2 are the same as with ASM.

Parameter                           Session_Value  Instance_Value Description
----------------------------------- -------------- -------------- --------------------------------------------
_disable_highres_ticks              FALSE          FALSE          disable high-res tick counter
_high_priority_processes            LMS*           LMS*           High Priority Process Name Mask
_highest_priority_processes         VKTM           VKTM           Highest Priority Process Name Mask
_timer_precision                    10             10             VKTM timer precision in milli-sec
_vkrm_schedule_interval             10             10             VKRM scheduling interval
_vktm_assert_thresh                 30             30             soft assert threshold VKTM timer drift

Without any changes, tracing for the VKTM and VKRM background processes is enabled, and quite a lot of information goes into these trace files.

Tracing for the VKRM process can be disabled via the following event:

alter system set events '10720 trace name context forever, level 0x10000000';

Tracing for the VKTM process can be disabled via the following event:

alter system set events '10795 trace name context forever, level 2';

Because I don’t need any of those, I am going to disable both in one shot.

SQL> alter system set event='10720 trace name context forever, level 0x10000000','10795 trace name context forever, level 2' comment='Turn off VKRM tracing and turn off VKTM tracing' scope=spfile;

System altered.

And like on the ASM instance, I don’t need the high-resolution ticks here as well.

SQL> alter system set "_disable_highres_ticks"=true scope=spfile;

System altered.

After the restart of the database, the extensive tracing and CPU usage went away.

oracle@oel001:/home/oracle/ [OCM121] srvctl stop database -d OCM121
oracle@oel001:/home/oracle/ [OCM121] srvctl start database -d OCM121
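
Before checking top again, it is easy to verify that the settings survived the restart. Explicitly set underscore parameters show up in v$parameter, so a quick check like this (just a sketch) is enough:

SQL> select name, value from v$parameter where name in ('event','_disable_highres_ticks');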

I am not seeing the VKTM process in my top processes anymore. Before, even on an idle system, the VKTM of the ASM instance and the one of the RDBMS instance had always been at the top.

oracle@oel001:/home/oracle/ [OCM121] top -c
top - 11:29:06 up  1:33,  2 users,  load average: 0.19, 0.69, 0.69
Tasks: 233 total,   2 running, 231 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  0.5%sy,  0.0%ni, 98.7%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  10021556k total,  3839340k used,  6182216k free,   211884k buffers
Swap:  6160380k total,        0k used,  6160380k free,   686380k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3643 grid      20   0 1559m  84m  39m S  1.3  0.9   0:55.64 /u00/app/grid/12.1.0.2/bin/oraagent.bin
 3660 grid      20   0  329m  28m  22m S  1.0  0.3   0:28.15 /u00/app/grid/12.1.0.2/bin/evmd.bin
 3517 grid      20   0 1507m  69m  48m S  0.7  0.7   0:34.74 /u00/app/grid/12.1.0.2/bin/ohasd.bin reboot
 3738 grid      20   0  262m  26m  21m S  0.7  0.3   0:28.03 /u00/app/grid/12.1.0.2/bin/evmlogger.bin -o /u00/app/grid/12.1
 3757 grid      20   0  791m  32m  23m S  0.7  0.3   0:31.30 /u00/app/grid/12.1.0.2/bin/cssdagent
Conclusion

Especially in virtual environments I have often seen quite high CPU usage by the VKTM process, so make sure that your timekeeping via NTP is set up correctly. Once NTP is running smoothly, you might want to disable the high-resolution ticks and the extensive tracing by the VKTM and VKRM processes. Obviously, this is not a general recommendation; you should test it yourself.

 

Cet article Oracle 12c – Is VKTM always your top process? est apparu en premier sur Blog dbi services.

Oracle 12c – How to Recover lost DataGuard Broker Configuration Files

Tue, 2017-02-28 07:53

If you are using RMAN to back up your database, you are already doing the right thing. However, RMAN does not take care of everything; for example, it does not back up the following things, to mention just a few (a minimal OS-level backup sketch follows the list).

  • Oracle Home
  • Grid Infrastructure Home
  • Data Guard broker files
  • Password File
  • SQL*Net files like ldap.ora, sqlnet.ora, tnsnames.ora and listener.ora
  • /etc/oratab
  • OS audit files
  • Wallets
  • /etc/sysctl.conf and limits.conf
  • OLR and OCR
  • Voting Disks
  • ASM Metadata
  • passwd, shadow, group
  • The RMAN scripts themselves, ksh/rcv (some exceptions when using the RMAN catalog)
  • crontab
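
A simple OS-level job, for example driven by cron, can close this gap. The following is only a minimal sketch; the paths are examples based on this setup and have to be adjusted to your environment:

#!/bin/bash
# Minimal sketch: archive the files RMAN does not cover
# (paths are examples, adjust to your environment)
BACKUP_DIR=/u99/backup/non_rman/$(date +%Y%m%d)
mkdir -p "${BACKUP_DIR}"
tar -czf "${BACKUP_DIR}/oracle_config_$(hostname -s).tar.gz" \
    /u01/app/oracle/admin/DBIT121/pfile/dr*DBIT121_SITE1.dat \
    ${ORACLE_HOME}/dbs/orapw* \
    ${ORACLE_HOME}/network/admin/*.ora \
    /etc/oratab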

There are for sure many more, but in this article I would like to take a closer look at the Data Guard broker configuration files. These are quite important files: they contain entries that describe the state and the properties of the Data Guard configuration, like the sites and databases that are part of the configuration, the roles and properties of each of the databases, and the state of each of the elements of the configuration. The broker configuration files are very small, in my case only 12K. Don’t expect them to grow very big; they usually stay at this size.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 09:27 dr1DBIT121_SITE1.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 09:34 dr2DBIT121_SITE1.dat

You can have a maximum of two different copies, which are configured via the DG_BROKER_CONFIG_FILEn (where n = 1, 2) parameters. Maybe maximum is not the correct word in this context, because even if you don’t specify the broker file location, Oracle still creates the files in a default directory which is OS dependent; on Linux they end up in $ORACLE_HOME/dbs/.
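
If you prefer to keep them somewhere else, set the two parameters explicitly; the paths below are the ones used in this setup and are only an example. Note that these parameters cannot be changed while the broker is running, so dg_broker_start has to be set to false first:

SQL> alter system set dg_broker_start=false;
SQL> alter system set dg_broker_config_file1='/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat' scope=both;
SQL> alter system set dg_broker_config_file2='/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat' scope=both;
SQL> alter system set dg_broker_start=true;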

However, unlike what you might expect, the broker configuration files are not a 1:1 mirror like e.g. redo log members or controlfiles. They are individual copies, and they maintain the last known good state of the configuration. You can check this quite easily yourself by editing e.g. the FastStartFailoverThreshold and checking the timestamps of the broker files afterwards.

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
Property "faststartfailoverthreshold" updated

-- Primary
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -l dr*
-rw-r----- 1 oracle oinstall 12288 Feb 28 09:27 dr1DBIT121_SITE1.dat
-rw-r----- 1 oracle oinstall 12288 Feb 28 09:34 dr2DBIT121_SITE1.dat

-- Standby
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -l dr*
-rw-r----- 1 oracle oinstall 12288 Feb 28 09:34 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12288 Feb 28 09:27 dr2DBIT121_SITE2.dat

As you can see here, the broker configuration files have two different timestamps: one with the last known good state at 09:27 and one with the new state at 09:34. When the broker is started for the first time, it is expected that you see only one configuration file. But don’t panic, the other will be created by the next update done by the broker.

The broker configuration file is a binary file; however, the readable contents of that file can be extracted via the strings command, which gives an idea of what is inside. E.g. it shows you that the broker config file is brought to you by the fine folks at NEDC, whoever they are. :-) With 12cR1 it looks like the following.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] file dr2DBIT121_SITE1.dat
dr2DBIT121_SITE1.dat: data

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] strings dr2DBIT121_SITE1.dat
}|{z
DBIT121
DBIT121_SITE1
DBIT121_SITE2
DBIT121_SITE2
Brought to you by the fine folks at NEDC.
TRUETRUEdbidg03ALLDBIT121FALSECONTINUE
DBIT121_SITE1DBIT121_SITE1FALSEFALSEFALSESYNCoptionalDISABLEON1,1948637,931101276,0*AUTO0,0,00,0,0AUTODBIT121_SITE2
dbidg01DBIT121DBIT121(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbidg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NA
ME=DBIT121_SITE1_DGMGRL)(INSTANCE_NAME=DBIT121)(SERVER=DEDICATED)))USE_DB_RECOVERY_FILE_DEST%t_%s_%r.dbf4
DBIT121_SITE2DBIT121_SITE2FALSEFALSEFALSESYNCoptionalDISABLEON1,1948637,931101276,0*DBIT121AUTO0,0,00,0,0AUTODBIT121_SITE1
dbidg02DBIT121DBIT121(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbidg02)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=DBIT121_SITE2_DGMGRL)(INSTANCE_NAME=DBIT121)(SERVER=DEDICATED)))USE_DB_RECOVERY_FILE_DEST%t_%s_%r.dbf

Things change with Oracle 12cR2. The broker config files are still binary files, however the readable content extracted via the strings command is much better now; you can get a lot of very useful information out of it. And even with Oracle 12cR2, it looks like it is still brought to you by the same fine folks at NEDC. :-) Looks like the Oracle developers have a sense of humor. :-)

oracle@dbidg01:/u01/app/oracle/admin/DBIT122/pfile/ [DBIT122] file dr2DBIT122_SITE1.dat
dr2DBIT122_SITE1.dat: data
oracle@dbidg01:/u01/app/oracle/admin/DBIT122/pfile/ [DBIT122] strings dr2DBIT122_SITE1.dat
}|{z
DBIT122
DBIT122_SITE1
Brought to you by the fine folks at NEDC.
<?xml version="1.0" encoding="UTF-8"?>
<DRC Version="12.2.0.1.0" Name="DBIT122" CurrentPath="True">
  <PlannedState>OFFLINE</PlannedState>
  <Status>
    <Severity>Success</Severity>
    <Error>0</Error>
  </Status>
  <DefaultState>ONLINE</DefaultState>
  <IntendedState>ONLINE</IntendedState>
  <MIV PropertyID="1">0</MIV>
  <PRIMARY_SITE_ID PropertyID="26">513</PRIMARY_SITE_ID>
  <DRC_UNIQUE_ID PropertyID="4">152596437</DRC_UNIQUE_ID>
  <DRC_UNIQUE_ID_SEQUENCE PropertyID="5">18</DRC_UNIQUE_ID_SEQUENCE>
  <EXT_COND PropertyID="29">7</EXT_COND>
  <OVERALL_PROTECTION_MODE PropertyID="30">2</OVERALL_PROTECTION_MODE>
  <FastStartFailoverMode PropertyID="32">0</FastStartFailoverMode>
  <FSFO_MIV PropertyID="33">11</FSFO_MIV>
  <FastStartFailoverOBID1 PropertyID="119">470173189</FastStartFailoverOBID1>
  <FastStartFailoverOBID2 PropertyID="120">470173183</FastStartFailoverOBID2>
  <FastStartFailoverOBID3 PropertyID="121">470173184</FastStartFailoverOBID3>
  <ObserverVersion1 PropertyID="133">0</ObserverVersion1>
  <Configuration_Name PropertyID="31">DBIT122</Configuration_Name>
  <ObserverName1 PropertyID="129">dbidg03</ObserverName1>
  <ConfigurationWideServiceName PropertyID="132">DBIT122_CFG</ConfigurationWideServiceName>
  <RoleChangeHistory PropertyID="106">
    <RoleChangeRecord>
      <Event>PhysicalFailover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>931098450</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalFailover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</NewPrimary>
      <Status>0</Status>
      <Timestamp>931098812</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalFailover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>932306689</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</NewPrimary>
      <Status>0</Status>
      <Timestamp>932307856</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>932377455</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</NewPrimary>
      <Status>0</Status>
      <Timestamp>932381717</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>932382294</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</NewPrimary>
      <Status>0</Status>
      <Timestamp>932383387</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE1</OldPrimary>
      <NewPrimary>DBIT122_SITE2</NewPrimary>
      <Status>0</Status>
      <Timestamp>934017954</Timestamp>
    </RoleChangeRecord>
    <RoleChangeRecord>
      <Event>PhysicalSwitchover</Event>
      <OldPrimary>DBIT122_SITE2</OldPrimary>
      <NewPrimary>DBIT122_SITE1</New
Primary>
      <Status>0</Status>
      <Timestamp>934018548</Timestamp>
    </RoleChangeRecord>
  </RoleChangeHistory>
  <Member MemberID="1" Name="DBIT122_SITE1" CurrentPath="True" Enabled="True" MultiInstanced="True">
    <PlannedState>STANDBY</PlannedState>
    <StandbyRole>PhysicalStandby</StandbyRole>
    <Status>
      <Severity>Success</Severity>
      <Error>0</Error>
    </Status>
    <DefaultState>PRIMARY</DefaultState>
    <IntendedState>PRIMARY</IntendedState>
    <ResourceType>Database</ResourceType>
    <CurrentState>PRIMARY</CurrentState>
    <Role>
      <ConditionState>PRIMARY</ConditionState>
      <DefaultState>READ-WRITE-XPTON</DefaultState>
      <IntendedState>READ-WRITE-XPTON</IntendedState>
    </Role>
    <Role>
      <ConditionState>STANDBY</ConditionState>
      <DefaultState>PHYSICAL-APPLY-ON</DefaultState>
      <IntendedState>OFFLINE</IntendedState>
    </Role>
    <DB_Unique_Name PropertyID="23">DBIT122_SITE1</DB_Unique_Name>
    <DGConnectIdentifier PropertyID="6">DBIT122_SITE1</DGConnectIdentifier>
    <DbDomain PropertyID="37"/>
    <ClusterDatabase PropertyID="42">FALSE</ClusterDatabase>
    <DbChangeCritical PropertyID="8">FALSE</DbChangeCritical>
    <DbIsCritical PropertyID="9">FALSE</DbIsCritical>
    <LogXptMode PropertyID="40">SYNC</LogXptMode>
    <IncarnationTable PropertyID="57">6,2568637,932306696,5*5,2514031,931098817,4#4,2513489,931098453,2#2,1396169,929894741,1#1,1,924281211,0#</IncarnationTable>
    <SRLStatus PropertyID="58">0</SRLStatus>
    <ActualApplyInstance PropertyID="7"/>
    <StandbyFileManagement PropertyID="72">AUTO</StandbyFileManagement>
    <ArchiveLagTarget PropertyID="73">0</ArchiveLagTarget>
    <LogArchiveMaxProcesses PropertyID="74">4</LogArchiveMaxProcesses>
    <LogArchiveMinSucceedDest PropertyID="75">1</LogArchiveMinSucceedDest>
    <DataGuardSyncLatency PropertyID="138">0</DataGuardSyncLatency>
    <DbFileNameConvert PropertyID="76"/>
    <LogFileNameConvert PropertyID="77"/>
    <FastStartFailoverTarget PropertyID="38">DBIT122_SITE2</FastStartFailoverTarget>
    <ReinstateContextArray PropertyID="39"/>
    <Instance InstanceID="1" Name="DBIT122" CurrentPath="True" Enabled="True" MultiInstanced="True" DefaultWriteOnce="True">
      <PlannedState>OFFLINE</PlannedState>
      <HostName PropertyID="2" Default="True">dbidg01</HostName>
      <SidName PropertyID="3">DBIT122</SidName>
      <InstanceName PropertyID="36">DBIT122</InstanceName>
      <StaticConnectIdentifier PropertyID="25">(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbidg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=DBIT122_SITE1_DGMGRL)(UR=A)(INSTANCE_NAME=DBIT122)(SERVER=DEDICATED)))</StaticConnectIdentifier>
      <StandbyArchiveLocation PropertyID="96" Default="True">USE_DB_RECOVERY_FILE_DEST</StandbyArchiveLocation>
      <LogArchiveTrace PropertyID="98">0</LogArchiveTrace>
      <LogArchiveFormat PropertyID="99">%t_%s_%r.dbf</LogArchiveFormat>
    </Instance>
  </Member>
  <Member MemberID="2" Name="DBIT122_SITE2" CurrentPath="True" Enabled="True" MultiInstanced="True">
    <PlannedState>STANDBY</PlannedState>
    <StandbyRole>PhysicalStandby</StandbyRole>
    <Status>
      <Severity>Success</Severity>
      <Error>0</Error>
    </Status>
    <DefaultState>STANDBY</DefaultState>
    <IntendedState>STANDBY</IntendedState>
    <ResourceType>Database</ResourceType>
    <CurrentState>STANDBY</CurrentState>
    <Role>
      <ConditionState>PRIMARY</ConditionState>
      <DefaultState>READ-WRITE-XPTON</DefaultState>
      <IntendedState>OFFLINE</IntendedState>
    </Role>
    <Role>
      <ConditionState>STANDBY</ConditionState>
      <DefaultState>PHYSICAL-APPLY-ON</DefaultState>
      <IntendedState>PHYSICAL-APPLY-ON</IntendedState>
    </Role>
    <DB_Unique_Name PropertyID="23">DBIT122_SITE2</DB_Unique_Name>
    <DGConnectIdentifier PropertyID="6">DBIT122_SITE2</DGConnectIdentifier>
    <DbDomain PropertyID="37"/>
    <ClusterDatabase PropertyID="42">FALSE</ClusterDatabase>
    <DbChangeCritical PropertyID="8">FALSE</DbChangeCritical>
    <DbIsCritical PropertyID="9">FALSE</DbIsC
ritical>
    <LogXptMode PropertyID="40">SYNC</LogXptMode>
    <IncarnationTable PropertyID="57">8,2568637,932306696,4*4,2514031,931098817,3#3,2513489,931098453,2#2,1396169,929894741,1#1,1,924281211,0#</IncarnationTable>
    <SRLStatus PropertyID="58">0</SRLStatus>
    <ActualApplyInstance PropertyID="7">DBIT122</ActualApplyInstance>
    <StandbyFileManagement PropertyID="72">AUTO</StandbyFileManagement>
    <ArchiveLagTarget PropertyID="73">0</ArchiveLagTarget>
    <LogArchiveMaxProcesses PropertyID="74">4</LogArchiveMaxProcesses>
    <LogArchiveMinSucceedDest PropertyID="75">1</LogArchiveMinSucceedDest>
    <DataGuardSyncLatency PropertyID="138">0</DataGuardSyncLatency>
    <DbFileNameConvert PropertyID="76"/>
    <LogFileNameConvert PropertyID="77"/>
    <FastStartFailoverTarget PropertyID="38">DBIT122_SITE1</FastStartFailoverTarget>
    <ReinstateContextArray PropertyID="39"/>
    <Instance InstanceID="1" Name="DBIT122" CurrentPath="True" Enabled="True" MultiInstanced="True" DefaultWriteOnce="True">
      <PlannedState>OFFLINE</PlannedState>
      <HostName PropertyID="2" Default="True">dbidg02</HostName>
      <SidName PropertyID="3">DBIT122</SidName>
      <InstanceName PropertyID="36">DBIT122</InstanceName>
      <StaticConnectIdentifier PropertyID="25">(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbidg02)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=DBIT122_SITE2_DGMGRL)(UR=A)(INSTANCE_NAME=DBIT122)(SERVER=DEDICATED)))</StaticConnectIdentifier>
      <StandbyArchiveLocation PropertyID="96" Default="True">USE_DB_RECOVERY_FILE_DEST</StandbyArchiveLocation>
      <LogArchiveTrace PropertyID="98">0</LogArchiveTrace>
      <LogArchiveFormat PropertyID="99">%t_%s_%r.dbf</LogArchiveFormat>
    </Instance>
  </Member>
</DRC>

In 12cR2, Oracle puts more information into the broker files, and that’s why they are getting bigger: with a standard config they are now 16K instead of the 12K they had with 12cR1. Ok .. it looks like I am drifting away. Let’s get back to the original question: how do I recover the broker configuration files in case they get lost? As always … it depends. You could lose the old copy on the standby, the new copy on the standby, or both copies on the standby, and you could lose them while the standby is up and running or while it is shut down. The same applies to the primary. It might get even more complicated in case you have a far sync instance in between, or more standbys.

Without making it too complex, in the end we might end up with six different recovery scenarios:

1.) We lose the old copy of the broker config file on either the primary or the standby
2.) We lose both copies of the broker config file on the standby while it is up and running
3.) We lose both copies of the broker config file on the standby while it is shut down
4.) We lose both copies of the broker config file on the primary while it is up and running
5.) We lose both copies of the broker config file on the primary while it is shut down
6.) We lose both copies of the broker config files on the primary and the standby (quite unlikely to happen)

My demos are built on a simple primary/standby setup with 12cR1. The primary is on host dbidg01 and the standby is currently on host dbidg02.

DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 2 seconds ago)

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:03 dr1DBIT121_SITE1.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 11:04 dr2DBIT121_SITE1.dat

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:04 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 11:03 dr2DBIT121_SITE2.dat
Scenario 1.) We lose the old copy of the broker config file on either the primary or the standby

There is absolutely no need to panic in this case. I am deleting the old copies on the primary and the standby at the same time.

-- primary
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr1DBIT121_SITE1.dat
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

-- standby
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr2DBIT121_SITE2.dat
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

The broker does not care at all.

DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 57 seconds ago)

The second copy will be created automatically as soon as I am changing a property. e.g. I am setting the FastStartFailoverThreshold to the same value as it had before.

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
Property "faststartfailoverthreshold" updated

The broker log has no information about the newly created file, and the alert.log does not say anything either, but automagically we end up with two broker config files again after the update was done. So … nothing to do here, Oracle handles this itself.

oracle@dbidg01:/u01/app/oracle/diag/rdbms/dbit121_site1/DBIT121/trace/ [DBIT121] tail -20f drcDBIT121.log
...
02/28/2017 11:10:25
EDIT CONFIGURATION SET PROPERTY faststartfailoverthreshold = 40
FSFO threshold value set to 40 seconds

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -l dr*
-rw-r----- 1 oracle oinstall 12288 Feb 28 11:10 dr1DBIT121_SITE1.dat
-rw-r----- 1 oracle oinstall 12288 Feb 28 11:04 dr2DBIT121_SITE1.dat

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:04 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 11:10 dr2DBIT121_SITE2.dat

 

2.) We lose both copies of the broker config file on the standby while it is up and running

But what happens if I lose both copies on the standby while it is up and running?

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*.dat
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
ls: cannot access dr*: No such file or directory

I am logging in to the standby database and checking the configuration. The broker does not care at all whether the files are there or not; it seems like nothing has happened.

oracle@dbidg02:/home/oracle/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager@DBIT121_SITE2
[W000 02/28 11:17:00.64] Connecting to database using DBIT121_SITE2.
[W000 02/28 11:17:01.72] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 11:17:01.72] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 8 seconds ago)

The broker is reading the information from memory, probably for performance reasons. The current state is also reflected in x$drc.

SQL> desc x$drc
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ADDR                                               RAW(8)
 INDX                                               NUMBER
 INST_ID                                            NUMBER
 CON_ID                                             NUMBER
 OBJECT_ID                                          NUMBER
 ATTRIBUTE                                          VARCHAR2(30)
 VALUE                                              VARCHAR2(512)
 PARENT_ID                                          VARCHAR2(15)
 STATUS                                             VARCHAR2(30)
 MESSAGE                                            VARCHAR2(256)
 ERRNUM                                             NUMBER
 VALUE_RAW                                          RAW(512)
 ERRTIME                                            NUMBER

So … how do we get the broker files back? By simply editing any property; you can choose whichever property you want.

oracle@dbidg02:/home/oracle/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager@DBIT121_SITE2
[W000 02/28 11:27:58.23] Connecting to database using DBIT121_SITE2.
[W000 02/28 11:27:59.33] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 11:27:59.33] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
<DO_CONFIGURE_DRC version="12.1"><EDIT_DRC><PROPERTY name="faststartfailoverthreshold" value="40" context="0"/></EDIT_DRC></DO_CONFIGURE_DRC>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
Property "faststartfailoverthreshold" updated

As soon as you have edited your configuration, a new broker file appears on the standby.

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:28 dr1DBIT121_SITE2.dat

If you run the same command again, the second one appears as well.

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:28 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 11:30 dr2DBIT121_SITE2.dat

 

3.) We lose both copies of the broker config file on the standby while it is shut down

What happens if we lose both copies while the standby is shut down? Does the standby come up correctly? Is there something in the drc.log or the alert.log?

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*.dat
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

SQL> startup mount
ORACLE instance started.

Total System Global Area 1325400064 bytes
Fixed Size                  2924112 bytes
Variable Size             352321968 bytes
Database Buffers          956301312 bytes
Redo Buffers               13852672 bytes
Database mounted.

The standby comes up with no entries in the alert.log; however, the drc.log shows that the broker files are missing.

02/28/2017 11:35:26
>> Starting Data Guard Broker bootstrap <<
Broker Configuration File Locations:
      dg_broker_config_file1 = "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat"
      dg_broker_config_file2 = "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE2.dat"
2017-02-28 11:35:26.313                      DMON: Attach state object
2017-02-28 11:35:26.314                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat", retrying
2017-02-28 11:35:27.334                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat"
2017-02-28 11:35:27.334                        ORA-27037: unable to obtain file status
2017-02-28 11:35:27.335                        Linux-x86_64 Error: 2: No such file or directory
2017-02-28 11:35:27.335                        Additional information: 3
2017-02-28 11:35:27.335                      DMON: Error opening "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat", error = ORA-16572
2017-02-28 11:35:27.335                      DMON: Establishing "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE2.dat" as the more current file
2017-02-28 11:35:27.335                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE2.dat", retrying
2017-02-28 11:35:28.355                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE2.dat"
...
2017-02-28 11:35:42.893                      Configuration does not exist, Data Guard broker ready
2017-02-28 11:35:42.894 7fffffff           0 DMON: Entered rfm_release_chief_lock() for CTL_BOOTSTRAP


DGMGRL> connect sys/manager@DBIT121_SITE2
[W000 02/28 11:38:50.58] Connecting to database using DBIT121_SITE2.
[W000 02/28 11:38:51.67] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 11:38:51.68] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;
ORA-16532: Oracle Data Guard broker configuration does not exist

Configuration details cannot be determined by DGMGRL


SQL> select attribute, value from x$drc;
select attribute, value from x$drc
                             *
ERROR at line 1:
ORA-16532: Oracle Data Guard broker configuration does not exist

If we log on to the primary, we can see that the standby DBIT121_SITE2 was disabled.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager@DBIT121_SITE1
[W000 02/28 11:45:40.56] Connecting to database using DBIT121_SITE1.
[W000 02/28 11:45:41.68] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 11:45:41.68] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    Warning: ORA-16629: database reports a different protection level from the protection mode

    DBIT121_SITE2 - Physical standby database (disabled)

Fast-Start Failover: DISABLED

Configuration Status:
WARNING   (status updated 7 seconds ago)

However, after we enable the standby, we can see in the drc.log that the standby receives the metadata from the primary and creates the broker config file.

DGMGRL> enable database 'DBIT121_SITE2';
<DO_CONTROL version="12.1"><DO_COMMAND type="Enable" object_id="33554432"/></DO_CONTROL>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
Enabled.
DGMGRL>


drc.log

2017-02-28 11:46:03.352                      DRCX: Start receiving metadata file: "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat"
2017-02-28 11:46:03.356                      DRCX: Receiving block #1 (containing Seq.MIV = 2.20), 2 blocks
2017-02-28 11:46:03.358                      DRCX: End receiving metadata file: opcode CTL_ENABLE (1.1.631286030)
2017-02-28 11:46:03.360                      DMON: Entered rfm_get_chief_lock() for CTL_ENABLE, reason 1
2017-02-28 11:46:03.360 02001000   631286030 DMON: start task execution: for metadata resynchronization
2017-02-28 11:46:03.360 02001000   631286030 DMON: status from posting standby instances for RESYNCH = ORA-00000
2017-02-28 11:46:03.360                      INSV: Received message for inter-instance publication
2017-02-28 11:46:03.361                            req ID 1.1.631286030, opcode CTL_ENABLE, phase RESYNCH, flags 8005
2017-02-28 11:46:03.361 02001000   631286030 DMON: Metadata available (1.1.631286030), loading from "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE2.dat"
2017-02-28 11:46:03.361 02001000   631286030       Opcode = CTL_ENABLE, Chief Instance I_ID = 1
2017-02-28 11:46:03.364                      DMON Registering service DBIT121_SITE2_DGB with listener(s)
2017-02-28 11:46:03.364                      DMON: Executing SQL [ALTER SYSTEM REGISTER]
2017-02-28 11:46:03.365                      SQL [ALTER SYSTEM REGISTER] Executed successfully
02/28/2017 11:46:06
Creating process RSM0


oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 11:46 dr1DBIT121_SITE2.dat

Again, as soon as you change any property, the second broker file will be created as well.

4.) We lose both copies of the broker config file on the primary while it is up and running

There is not much difference compared to the scenario we have seen on the standby. The broker just reads from memory, and as soon as any update to the config is done, the broker file is created again. There will be no entries in the drc.log or the alert.log; the broker file is just silently recreated.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
ls: cannot access dr*: No such file or directory

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager
[W000 02/28 12:28:45.07] Connecting to database using .
[W000 02/28 12:28:45.10] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 12:28:45.10] Oracle database version is '12.1.0.2.0'
Connected as SYSDG.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 9 seconds ago)


DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
<DO_CONFIGURE_DRC version="12.1"><EDIT_DRC><PROPERTY name="faststartfailoverthreshold" value="40" context="0"/></EDIT_DRC></DO_CONFIGURE_DRC>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
<RESULT ><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE><MESSAGE ><SUCCESS ></SUCCESS></MESSAGE></RESULT>
Property "faststartfailoverthreshold" updated


oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 12:30 dr2DBIT121_SITE1.dat

 

5.) We lose both copies of the broker config file on the primary while it is shut down

Let’s do the whole thing again while the primary DB is shut down.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -lh dr*
ls: cannot access dr*: No such file or directory

In the drc.log file you can immediately see that there is something wrong.

>> Starting Data Guard Broker bootstrap <<
Broker Configuration File Locations:
      dg_broker_config_file1 = "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat"
      dg_broker_config_file2 = "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat"
2017-02-28 12:34:45.866                      DMON: Attach state object
2017-02-28 12:34:45.866                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat", retrying
2017-02-28 12:34:46.876                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat"
2017-02-28 12:34:46.880                        ORA-27037: unable to obtain file status
2017-02-28 12:34:46.881                        Linux-x86_64 Error: 2: No such file or directory
2017-02-28 12:34:46.881                        Additional information: 3
2017-02-28 12:34:46.881                      DMON: Error opening "/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat", error = ORA-16572
2017-02-28 12:34:46.881                      DMON: Establishing "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat" as the more current file
2017-02-28 12:34:46.882                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat", retrying
2017-02-28 12:34:47.899                      DMON: cannot open configuration file "/u01/app/oracle/admin/DBIT121/pfile/dr2DBIT121_SITE1.dat"
...
2017-02-28 12:35:02.058 7fffffff           0 DMON: Entered rfm_release_chief_lock() for CTL_BOOTSTRAP
2017-02-28 12:35:02.424                      Fore: Continuing with primary evaluation, rfmsp.drc_status_rfmp = ORA-16532
2017-02-28 12:35:03.507                      Fore: Initiating post-open tasks
2017-02-28 12:35:09.192                      DMON: Initiating post-open tasks
2017-02-28 12:35:22.242 00000000  1934847279 DMON: GET_FSFO will be retired
2017-02-28 12:35:22.242 00000000  1934847279       severity = ORA-16501, status = ORA-16532
2017-02-28 12:35:22.242 00000000  1934847279 DMON: GET_FSFO operation completed
2017-02-28 12:35:52.250 00000000  1934847280 DMON: GET_FSFO will be retired
2017-02-28 12:35:52.250 00000000  1934847280       severity = ORA-16501, status = ORA-16532
2017-02-28 12:35:52.251 00000000  1934847280 DMON: GET_FSFO operation completed

If you take a look at the alert.log, everything seems fine. However, when connecting to the primary with DGMGRL, the broker configuration can no longer be determined.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] dgmgrl -debug -xml
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/manager@DBIT121_SITE1
[W000 02/28 12:39:01.18] Connecting to database using DBIT121_SITE1.
[W000 02/28 12:39:02.28] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 12:39:02.28] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;
ORA-16532: Oracle Data Guard broker configuration does not exist

Configuration details cannot be determined by DGMGRL

But when connected to the standby, you see a clearer error message against the primary member: ORA-16532: Oracle Data Guard broker configuration does not exist.

DGMGRL> connect sys/manager@DBIT121_SITE2
[W000 02/28 12:39:18.81] Connecting to database using DBIT121_SITE2.
[W000 02/28 12:39:19.90] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 02/28 12:39:19.90] Oracle database version is '12.1.0.2.0'
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    Error: ORA-16532: Oracle Data Guard broker configuration does not exist

    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ERROR   (status updated 0 seconds ago)

Taking a closer look at the error message, it does not give you a hint on how to correct the issue.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] oerr ora 16532
16532, 00000, "Oracle Data Guard broker configuration does not exist"
// *Cause:  A broker operation was requested that required a broker
//          configuration to already be created.
// *Action: Create a Data Guard broker configuration prior to performing
//          other broker operations. If only one instance of a RAC
//          database is reporting this error, ensure that the
//          DG_BROKER_CONFIG_FILE[1|2] initialization parameters are
//          set to file locations that are shared by all instances of
//          the RAC database.

Let’s try the same trick as we did on the standby, by simply enabling the database again.

DGMGRL> enable database 'DBIT121_SITE1';
<DO_CONTROL version="12.1"><DO_COMMAND type="Enable" object_id="16777216"/></DO_CONTROL>
<RESULT ><MESSAGE ><FAILURE  error_num="16532" error_prefix="ORA"><ERROR_TEXT >ORA-16532: Oracle Data Guard broker configuration does not exist
</ERROR_TEXT></FAILURE></MESSAGE><MESSAGE ><FAILURE  error_num="16625" error_prefix="ORA" error_tag1="DBIT121_SITE1"><ERROR_TEXT >ORA-16625: cannot reach database &quot;DBIT121_SITE1&quot;
</ERROR_TEXT></FAILURE></MESSAGE></RESULT>
<RESULT ><MESSAGE ><FAILURE  error_num="16532" error_prefix="ORA"><ERROR_TEXT >ORA-16532: Oracle Data Guard broker configuration does not exist
</ERROR_TEXT></FAILURE></MESSAGE><MESSAGE ><FAILURE  error_num="16625" error_prefix="ORA" error_tag1="DBIT121_SITE1"><ERROR_TEXT >ORA-16625: cannot reach database &quot;DBIT121_SITE1&quot;
</ERROR_TEXT></FAILURE></MESSAGE></RESULT>
Failed.

Hmmmm … that does not look good. To recover from that situation, we need to figure out the latest broker file version on the standby, which is dr2DBIT121_SITE2.dat in my case, and copy it over to the primary. Before doing that, we need to stop the broker on the primary.

SQL> alter system set dg_broker_start=false;

System altered.

Now we can copy the latest version over.

-- standby
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] ls -rlth dr*
-rw-r----- 1 oracle oinstall 12K Feb 28 12:30 dr1DBIT121_SITE2.dat
-rw-r----- 1 oracle oinstall 12K Feb 28 12:33 dr2DBIT121_SITE2.dat

oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] scp -p dr2DBIT121_SITE2.dat oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/dr1DBIT121_SITE1.dat
dr2DBIT121_SITE2.dat                                                                          100%   12KB  12.0KB/s   00:00

And optionally create the second broker file on the primary.

-- primary
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] cp -p dr1DBIT121_SITE1.dat dr2DBIT121_SITE1.dat
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

Last but not least, we need to enable the broker again.

SQL> alter system set dg_broker_start=true;

System altered.

DGMGRL> connect sys/manager@DBIT121_SITE1
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 7 seconds ago)
6.) We lose both copies of the broker config files on the primary and the standby (quite unlikely to happen)

This scenario is quite unlikely to happen, but if it does, it is good to have your Data Guard configuration in place as a script.

-- shutdown primary
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

-- shutdown standby
SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

Remove the broker files on the primary and the standby

-- primary
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

-- standby
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121] rm dr*
oracle@dbidg02:/u01/app/oracle/admin/DBIT121/pfile/ [DBIT121]

Now, after we have lost everything, meaning all broker config files, the only option we have left is to create the configuration from scratch. This is quite easy if you have the configuration script in place, because you can recreate the whole configuration in the blink of an eye.

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/create/ [DBIT121] cat broker.cfg
CONNECT SYS/manager@DBIT121_SITE1
REMOVE CONFIGURATION;
sql "alter system set log_archive_dest_2=''";
CREATE CONFIGURATION 'DBIT121' AS
PRIMARY DATABASE IS 'DBIT121_SITE1'
CONNECT IDENTIFIER IS 'DBIT121_SITE1';
ADD DATABASE 'DBIT121_SITE2' AS
CONNECT IDENTIFIER IS 'DBIT121_SITE2';
EDIT DATABASE 'DBIT121_SITE1' SET PROPERTY StandbyFileManagement='AUTO';
EDIT DATABASE 'DBIT121_SITE2' SET PROPERTY StandbyFileManagement='AUTO';
EDIT DATABASE 'DBIT121_SITE1' SET PROPERTY LogXptMode='SYNC';
EDIT DATABASE 'DBIT121_SITE2' SET PROPERTY LogXptMode='SYNC';
ENABLE CONFIGURATION;
EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold='40';
EDIT DATABASE 'DBIT121_SITE1' SET PROPERTY FastStartFailoverTarget='DBIT121_SITE2';
EDIT DATABASE 'DBIT121_SITE2' SET PROPERTY FastStartFailoverTarget='DBIT121_SITE1';
sql "alter system archive log current";
sql "alter system register";
SHOW CONFIGURATION;

oracle@dbidg01:/u01/app/oracle/admin/DBIT121/create/ [DBIT121] dgmgrl -debug -xml < broker.cfg > broker.log 2>&1
oracle@dbidg01:/u01/app/oracle/admin/DBIT121/create/ [DBIT121]

DGMGRL> show configuration;

Configuration - DBIT121

  Protection Mode: MaxAvailability
  Members:
  DBIT121_SITE1 - Primary database
    DBIT121_SITE2 - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 2 seconds ago)
Conclusion

Losing broker configuration files is not the end of the world. As you have seen in this blog, it makes quite a difference whether you lose the broker files while the DB is up and running or while it is shut down. As long as you haven’t lost all of them, you can recover them in most situations. However, the alert.log is not a good place to check whether the broker files are missing, so better scan both the alert.log and the drc.log with your preferred monitoring tool to get the whole picture. Besides that, it is good practice to update your broker.cfg file after you have made changes, because it makes it very easy to rebuild the whole broker config from scratch.
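
As a very basic example of such a check, a grep over the drc.log for the error patterns seen above already catches missing broker config files. This is only a sketch; the trace path is the one from this environment and has to be adjusted:

# Sketch: look for missing broker config files in the drc.log
grep -E "cannot open configuration file|ORA-16572" \
  /u01/app/oracle/diag/rdbms/dbit121_site1/DBIT121/trace/drcDBIT121.log | tail -5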

 

Cet article Oracle 12c – How to Recover lost DataGuard Broker Configuration Files est apparu en premier sur Blog dbi services.
