Yann Neuhaus

dbi services technical blog

SQL Server Management Studio 17.4: Vulnerability Assessment

Thu, 2018-02-15 10:01

SQL Server Management Studio is a well-known integrated environment used to manage a SQL Server infrastructure.
This new version 17.4 supports SQL Server from 2008 up to 2017. It enhances existing features like Showplan and the XE Profiler (complete list here), but also adds an interesting new one: the Vulnerability Assessment.

Vulnerability Assessment scans a database to help you track security holes and deviations. The rules defining those deviations are based on Microsoft SQL Server best practices.
Let’s have a look at this new feature.

Once SSMS 17.4 is installed, choose the database you want to scan, right-click on it and select Tasks, Vulnerability Assessment and Scan For Vulnerabilities…:

[Screenshot: VAss1]

Select where you want to save the scan report and click OK:

[Screenshot: VAss2]

Once you clicked OK the scan is starting:

[Screenshot: VAss3]

At the end of the scan, the Vulnerability Assessment Results pane is displayed:

[Screenshot: VAss4]

The report is displayed in a Management Studio pane showing the number of checks that have been run and how many issues have been found, with risk levels from Low to High. It also provides some links to SQL Server security best practices.
Review all failed checks to validate whether they are really security issues for your environment and go through the results.
For each failed check you get a description of the issue, its impact, the rule query that was applied and a possible remediation script:

[Screenshot: VAss5]

It is also possible to accept results, even if they are considered a Potential Risk, as a baseline. Subsequent scans will then validate results that match the baseline.

[Screenshot: VAss6]

Once the issues are fixed or the baseline is set, the Vulnerability Assessment can be run again to see the result of the performed actions:

[Screenshot: VAss7]

This new feature integrated in Management Studio gives you the ability to check that all your databases have a good level of security, and also to maintain that level over time.
Great new feature ;-)

 


Backup and Restore PostgreSQL with PgBackRest II

Thu, 2018-02-15 09:41

In a previous blog I showed a basic usage of PgBackRest, which is a tool to backup and restore PostgreSQL databases. In this blog I am going to present some useful features of this tool through practical examples. Of course the official documentation remains the best source of knowledge.

Encryption
Nowadays encryption of backups is critical and is mandatory for many companies. PgBackRest allows us to encrypt the repository where backups are stored. A passphrase is used to encrypt/decrypt the files of the repository. As you may already know, it is recommended to use a strong passphrase. In the following demonstration we use openssl to generate one.

[postgres@pgserver ~]$ openssl rand -base64 48
FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
[postgres@pgserver ~]$

Once the passphrase is generated, we can update the PgBackRest configuration file with two options: repo-cipher-pass and repo-cipher-type:

[postgres@pgserver clustpgserver]$ cat /etc/pgbackrest.conf
[global]
repo-path=/var/lib/pgbackrest
repo-cipher-pass=FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
repo-cipher-type=aes-256-cbc

[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2

The next step is to create the stanza

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 stanza-create
2018-02-13 13:54:50.447 P00 INFO: stanza-create command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-13 13:55:04.520 P00 INFO: stanza-create command end: completed successfully
[postgres@pgserver ~]$

As we can see, the encryption options from the configuration file are automatically picked up: the command logged by pgbackrest includes --repo-cipher-pass (its value is not displayed) and --repo-cipher-type. After the creation of the stanza we can check its status:

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 check
2018-02-13 13:56:08.999 P00 INFO: check command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-13 13:57:08.026 P00 INFO: WAL segment 00000002000000000000004C successfully stored in the archive at '/var/lib/pgbackrest/archive/clustpgserver/10-1/0000000200000000/00000002000000000000004C-f5ced60cd351d74a91c9ce2e913b761144165e28.gz'
2018-02-13 13:57:08.030 P00 INFO: check command end: completed successfully

Everything seems fine, so let’s run a backup. Note that the output is truncated:

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-13 14:01:40.012 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
WARN: no prior backup exists, incr backup has been changed to full
2018-02-13 14:01:54.118 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-13 14:01:52": backup begins after the next regular checkpoint completes
...
type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-13 14:35:08.281 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-13 14:35:08.801 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

In a non-encrypted repository, the file backup.info can be read. Now, with encryption, if we try to read the file backup.info in the repository, we cannot:

[postgres@pgserver clustpgserver]$ less /var/lib/pgbackrest/backup/clustpgserver/backup.info
"/var/lib/pgbackrest/backup/clustpgserver/backup.info" may be a binary file. See it anyway?

And the strings command shows only the OpenSSL 'Salted__' header and random bytes, confirming that the file is encrypted:

[postgres@pgserver clustpgserver]$ strings /var/lib/pgbackrest/backup/clustpgserver/backup.info
Salted__Fx
.;Ru
cz4@
do:t
\pi3"E
VUSO
}a.R*
Wx5M
,?,W
3CXWB
[postgres@pgserver clustpgserver]$

From now on, the backups cannot be used unless the passphrase is provided.
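
A quick way to convince yourself of that, sketched here under assumptions (the temporary file name is made up and the exact error message depends on the pgbackrest version): point pgbackrest at a copy of the configuration stripped of the cipher options and run the info command.

# build a configuration copy without the repo-cipher-* options (hypothetical file name)
grep -v repo-cipher /etc/pgbackrest.conf > /tmp/pgbackrest-nocipher.conf
# the encrypted repository can no longer be read; expect an error instead of the backup list
pgbackrest --config=/tmp/pgbackrest-nocipher.conf --stanza=clustpgserver info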

Restore to another location
PgBackRest allows restoring to another location. This can be useful if we want to duplicate our cluster on the same server or onto another server. In the following demonstration, let’s duplicate it on the same server.
The data directory of the source cluster is /var/lib/pgsql/10/data:

postgres=# show data_directory;
data_directory
------------------------
/var/lib/pgsql/10/data
(1 row)
postgres=#

To duplicate to a new data directory, /u01/devdata for example, the option --db-path is used:

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-path=/u01/devdata restore


2018-02-14 09:40:05.755 P01 INFO: restore file /u01/devdata/base/1/13657 (0B, 100%)
2018-02-14 09:40:05.773 P01 INFO: restore file /u01/devdata/base/1/13652 (0B, 100%)
2018-02-14 09:40:05.811 P01 INFO: restore file /u01/devdata/base/1/13647 (0B, 100%)
2018-02-14 09:40:05.983 P01 INFO: restore file /u01/devdata/base/1/13642 (0B, 100%)
2018-02-14 09:40:06.067 P00 INFO: write /u01/devdata/recovery.conf
2018-02-14 09:40:14.403 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2018-02-14 09:40:30.187 P00 INFO: restore command end: completed successfully

After the duplication, don’t forget to change the port (as we are on the same server) and then start your new cluster, for example as sketched below.

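A minimal sketch of those two steps, assuming the paths of this demo and the port 5436 used further below:

# append the new port to the duplicated cluster's configuration (the last setting wins)
echo "port = 5436" >> /u01/devdata/postgresql.conf
# start the duplicated cluster
/usr/pgsql-10/bin/pg_ctl -D /u01/devdata start

Once started, we can check the data directory of the new cluster:
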
postgres=# show data_directory ;
data_directory
----------------
/u01/devdata
(1 row)
postgres=#

Restore specific databases
With PgBackRest, we can restore specific user databases. Note that built-in databases (template0, template1 and postgres) are always restored.
Let’s show an example. In our source cluster we currently have two user databases: test and sandbox.

sandbox=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
sandbox | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

In sandbox we have a table mytab with 2 rows

sandbox=# \c sandbox
You are now connected to database "sandbox" as user "postgres".
sandbox=# table mytab;
id
----
1
2
(2 rows)

Now let’s restore the cluster with only the test database; the option --db-include is used:

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-path=/u01/devdata --db-include=test restore
2018-02-14 10:11:00.948 P00 INFO: restore command begin 1.28: --db-include=test=1 --db1-path=/u01/devdata --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-14 10:11:05.137 P00 INFO: restore backup set 20180214-095439F_20180214-100446I
2018-02-14 10:11:25.110 P00 INFO: remap $PGDATA directory to /u01/devdata
...

After the restore completes, let’s start the new cluster and verify which databases are present:

[postgres@pgserver devdata]$ psql -p 5436
psql (10.1)
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
sandbox | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

What?! The sandbox database is still present despite the use of the option --db-include=test. But if we try to connect to it, we get an error:

postgres=# \c sandbox
FATAL: relation mapping file "base/24581/pg_filenode.map" contains invalid data
Previous connection kept
postgres=#

And if we compare, at OS level, the size of the database files in the source cluster and in the target:

[postgres@pgserver log]$ du -sh /var/lib/pgsql/10/data/base/24581
7.8M /var/lib/pgsql/10/data/base/24581
[postgres@pgserver log]$ du -sh /u01/devdata/base/24581
16K /u01/devdata/base/24581
[postgres@pgserver log]$

We can see that in the target cluster, sandbox uses less disk space after the selective restore than it would have if the entire database had been restored. To finish the selective restore, we have to manually drop the sandbox database. Indeed, PgBackRest cannot drop it automatically because the cluster is not accessible until the recovery process finishes.

postgres=# drop database sandbox;
DROP DATABASE
postgres=#

Automatic cleanup of expired backups
Another nice feature of PgBackRest is that expired backups are automatically removed.
If we check our pgbackrest.conf file, we see that retention-full is set to 2. This means that 2 full backups will be maintained. So if we take a third full backup, the first full backup and all corresponding incremental and differential backups will be expired and removed:

[postgres@pgserver log]$ cat /etc/pgbackrest.conf
[global]
repo-path=/var/lib/pgbackrest
repo-cipher-pass=FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
repo-cipher-type=aes-256-cbc

[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2
[postgres@pgserver log]$

Let’s do a quick demonstration. Currently we have 2 full backups:

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 00000002000000000000004E / 000000020000000000000056
full backup: 20180213-140152F
timestamp start/stop: 2018-02-13 14:01:52 / 2018-02-13 14:32:00
wal start/stop: 00000002000000000000004E / 00000002000000000000004E
database size: 577MB, backup size: 577MB
repository size: 28.8MB, repository backup size: 28.8MB
incr backup: 20180213-140152F_20180213-152509I
timestamp start/stop: 2018-02-14 09:31:03 / 2018-02-14 09:33:17
wal start/stop: 000000020000000000000052 / 000000020000000000000052
database size: 30.7MB, backup size: 285.3KB
repository size: 3.6MB, repository backup size: 24.3KB
backup reference list: 20180213-140152F
full backup: 20180214-095439F
timestamp start/stop: 2018-02-14 09:54:39 / 2018-02-14 09:58:53
wal start/stop: 000000020000000000000054 / 000000020000000000000054
database size: 30.7MB, backup size: 30.7MB
repository size: 3.6MB, repository backup size: 3.6MB
incr backup: 20180214-095439F_20180214-100446I
timestamp start/stop: 2018-02-14 10:04:46 / 2018-02-14 10:07:43
wal start/stop: 000000020000000000000056 / 000000020000000000000056
database size: 38.3MB, backup size: 7.6MB
repository size: 4.5MB, repository backup size: 928.5KB
backup reference list: 20180214-095439F
[postgres@pgserver log]$

And we can confirm this by executing a simple ls in the repository where the backups are stored:

[postgres@pgserver clustpgserver]$ ls -ld *
drwxr-x---. 3 postgres postgres 69 Feb 13 14:32 20180213-140152F
drwxr-x---. 3 postgres postgres 69 Feb 14 09:33 20180213-140152F_20180213-152509I
drwxr-x---. 3 postgres postgres 69 Feb 14 09:59 20180214-095439F
drwxr-x---. 3 postgres postgres 69 Feb 14 10:07 20180214-095439F_20180214-100446I
drwxr-x---. 3 postgres postgres 17 Feb 13 14:33 backup.history
-rw-r-----. 1 postgres postgres 2992 Feb 14 10:08 backup.info
-rw-r-----. 1 postgres postgres 2992 Feb 14 10:08 backup.info.copy
lrwxrwxrwx. 1 postgres postgres 33 Feb 14 10:08 latest -> 20180214-095439F_20180214-100446I
[postgres@pgserver clustpgserver]$ ls -ld
drwxr-x---. 7 postgres postgres 4096 Feb 14 10:08 .

Now let’s do a third full backup

[postgres@pgserver clustpgserver]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 --type=full backup
2018-02-14 10:55:52.250 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver --type=full

2018-02-14 11:19:02.001 P00 INFO: backup command end: completed successfully
2018-02-14 11:19:02.107 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-14 11:19:02.928 P00 INFO: expire full backup set: 20180213-140152F, 20180213-140152F_20180213-152509I
2018-02-14 11:22:08.759 P00 INFO: remove expired backup 20180213-140152F_20180213-152509I
2018-02-14 11:22:09.000 P00 INFO: remove expired backup 20180213-140152F

2018-02-14 11:22:49.387 P00 INFO: expire command end: completed successfully
[postgres@pgserver clustpgserver]$

We can see that at the end of the backup, the oldest backups are expired and removed. We can confirm this by listing the files in the repository:

[postgres@pgserver clustpgserver]$ ls -ld *
drwxr-x---. 3 postgres postgres 69 Feb 14 09:59 20180214-095439F
drwxr-x---. 3 postgres postgres 69 Feb 14 10:07 20180214-095439F_20180214-100446I
drwxr-x---. 3 postgres postgres 69 Feb 14 11:13 20180214-105603F
drwxr-x---. 3 postgres postgres 17 Feb 13 14:33 backup.history
-rw-r-----. 1 postgres postgres 2320 Feb 14 11:19 backup.info
-rw-r-----. 1 postgres postgres 2320 Feb 14 11:20 backup.info.copy
lrwxrwxrwx. 1 postgres postgres 16 Feb 14 11:14 latest -> 20180214-105603F
[postgres@pgserver clustpgserver]$

Point-in-Time Recovery
PgBackRest can also perform point-in-time recovery. Let’s note the current time and then drop the table article in the database test:

test=# table article;
nom
---------
printer
(1 row)
.
test=# select now();
now
-------------------------------
2018-02-14 11:39:28.024378+01
(1 row)
.
test=# drop table article;
DROP TABLE
.
test=# table article;
ERROR: relation "article" does not exist
LINE 1: table article;
^
test=#

And now let’s restore to just before the table was dropped, say 2018-02-14 11:39:28.
As we have several backup sets, we have to restore from a backup taken before the table was dropped.
If we check our backups, that is the full backup 20180214-105603F.

[postgres@pgserver devdata]$ pgbackrest --stanza=clustpgserver --log-level-console=info info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 000000020000000000000054 / 00000002000000000000005A
full backup: 20180214-095439F
timestamp start/stop: 2018-02-14 09:54:39 / 2018-02-14 09:58:53
wal start/stop: 000000020000000000000054 / 000000020000000000000054
database size: 30.7MB, backup size: 30.7MB
repository size: 3.6MB, repository backup size: 3.6MB
incr backup: 20180214-095439F_20180214-100446I
timestamp start/stop: 2018-02-14 10:04:46 / 2018-02-14 10:07:43
wal start/stop: 000000020000000000000056 / 000000020000000000000056
database size: 38.3MB, backup size: 7.6MB
repository size: 4.5MB, repository backup size: 928.5KB
backup reference list: 20180214-095439F
full backup: 20180214-105603F
timestamp start/stop: 2018-02-14 10:56:03 / 2018-02-14 11:12:26

wal start/stop: 000000020000000000000058 / 000000020000000000000058
database size: 38.3MB, backup size: 38.3MB
repository size: 4.5MB, repository backup size: 4.5MB
incr backup: 20180214-105603F_20180214-121044I
timestamp start/stop: 2018-02-14 12:10:44 / 2018-02-14 12:15:14
wal start/stop: 00000002000000000000005A / 00000002000000000000005A
database size: 38.3MB, backup size: 1.1MB
repository size: 4.5MB, repository backup size: 140.8KB
backup reference list: 20180214-105603F
[postgres@pgserver devdata]$

For the restore we use the option --set, which allows us to specify the backup set to restore from. Note also the use of --type=time and --target:

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --type=time "--target=2018-02-14 11:39:28.024378+01" --db-path=/u01/devdata --set=20180214-105603F restore
2018-02-14 13:36:50.848 P00 INFO: restore command begin 1.28: --db1-path=/u01/devdata --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --set=20180214-105603F --stanza=clustpgserver "--target=2018-02-14 11:39:28.024378+01" --type=time
2018-02-14 13:37:03.406 P00 INFO: restore backup set 20180214-105603F
...

At the end of the restore, let’s look at the contents of the generated recovery.conf file:

[postgres@pgserver devdata]$ cat recovery.conf
restore_command = '/usr/bin/pgbackrest --db1-path=/u01/devdata --log-level-console=info --stanza=clustpgserver archive-get %f "%p"'
recovery_target_time = '2018-02-14 11:39

If we start our new cluster, we can see in the log files that PITR is starting:

2018-02-14 13:54:23.824 CET [10049] LOG: starting point-in-time recovery to 2018-02-14 11:39:28.024378+01

And once the recovery is finished, we can verify that the table article is present:

postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# \d article
Table "public.article"
Column | Type | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
nom | character varying(50) | | |
.
test=# table article;
nom
---------
printer
(1 row)
test=#

Conclusion:
In this blog I presented some features of PgBackRest. As already mentioned, there are many, many other options in this wonderful tool, and the official documentation gives more information. In future blogs we will explore more advanced configurations.

 


(DESCRIPTION_LIST=(DESCRIPTION=(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=NO)

Thu, 2018-02-15 00:02

Do you have complex connection strings with DESCRIPTION_LIST, DESCRIPTION, ADDRESS_LIST, ADDRESS and a nice combination of FAILOVER and LOAD_BALANCE? You have probably checked the documentation, which tells you that FAILOVER=YES is the default at all levels, but LOAD_BALANCE=YES is the default only for DESCRIPTION_LIST. But when disaster recovery and availability are concerned, the documentation is not sufficient: I want to test it. And here is how I do it.

I don’t want to test it with the real configuration and stop the different instances. And I don’t need to. My way to test an address list is to define a tnsnames.ora with the connection string, such as the following:

NET_SERVICE_NAME=
(DESCRIPTION_LIST=
(DESCRIPTION=
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=101))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=102))
)
(DESCRIPTION=
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=201))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=202))
)
)
)

I used localhost because I know it’s there and I don’t want to wait for the TCP timeout. But I use fake ports, which do not exist. So a connection will never be established, but I will be able to see all the addresses that are tried. I check them with strace on the connect() system call, with the following script:


for i in {1..10}
do
TNS_ADMIN=/tmp strace -T -e trace=connect sqlplus -s -L sys/oracle@NET_SERVICE_NAME as sysdba <<< "" 2>&1 | awk '
/sa_family=AF_INET, sin_port=htons/{
gsub(/[()]/," ") ; printf "%s ",$5
}
END{
print ""
}
'
done | sort | uniq

So, I used meaningful numbers for my fake ports: 101 and 102 for the addresses in the first description of the description list, and 201 and 202 for the address list in the second description. The awk script shows the sequence that was tried. And, because of the random round-robin, I run it in a loop several times to see all patterns, aggregated by sort|uniq.

So here is the result from the connection string above using the defaults for load balancing and failover:

101 102 201 202
201 202 101 102

The sequence within each address list is always in order (101,102 and 201,202) because LOAD_BALANCE=NO is the default there. But I get two combinations for the descriptions because LOAD_BALANCE=YES is the default in DESCRIPTION_LIST. Finally, all addresses are tried because FAILOVER=YES is the default at all levels.

LOAD_BALANCE

If I define LOAD_BALANCE at all levels, such as:

NET_SERVICE_NAME=
(DESCRIPTION_LIST=(FAILOVER=YES)(LOAD_BALANCE=YES)
(DESCRIPTION=(FAILOVER=YES)(LOAD_BALANCE=YES)
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=101))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=102))
)
(DESCRIPTION=(FAILOVER=YES)
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=YES)
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=201))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=202))
)
)
)

The result shows that all combinations can be tried in any order:

101 102 201 202
101 102 202 201
102 101 201 202
102 101 202 201
201 202 101 102
201 202 102 101
202 201 101 102
202 201 102 101

By running it in a large loop you will confirm that any address will be tried at most once.

FAILOVER

Now, if I set FAILOVER=NO within the first description:

NET_SERVICE_NAME=
(DESCRIPTION_LIST=(FAILOVER=YES)(LOAD_BALANCE= NO)
(DESCRIPTION=(FAILOVER= NO)(LOAD_BALANCE=YES)
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=101))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=102))
)
(DESCRIPTION=(LOAD_BALANCE=NO )
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=YES)
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=201))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=202))
)
)
)

the first attempt can be 101 or 102 (because of LOAD_BALANCE) but only one of them will be tried in this address list, because there is no failover. Then, the second description is attempted (because FAILOVER=YES at the description list level) with all its addresses (because of LOAD_BALANCE=YES). The result of all possible combinations is:


101 201 202
102 201 202
102 202 201

So here it is. You can test any complex connection description to check which connections are possible and in which order they will be tried. From this, you can infer what will happen with a real configuration: the wait for TCP timeouts on addresses tested on hosts that are not up, and the load balancing given by the different possible combinations.

 


Full page logging in Postgres and Oracle

Wed, 2018-02-14 14:13

In my opinion, the volume of logging (aka redo log, aka xlog, aka WAL) is the most important factor for OLTP performance, availability and scalability, for several reasons:

  • This is the only structure where disk latency is a mandatory component of response time
  • This is a big part of the total volume of backups
  • This is sequential by nature, and very difficult to scale by parallelizing

In this post, I look at the volume of logging generated by some DML in Postgres and Oracle. I know Oracle quite well and am just starting to look at Postgres. The comparison here is not a contest but a way to understand things better. For example, the default behavior of Postgres, with full_page_writes=on, is very similar to Oracle’s ‘begin backup’ mode. The comparison makes no sense for most Postgres DBAs, but it probably helps Oracle DBAs to understand it.

Measure WAL segment writes

Here is how I measured the volume of transaction log written: start the Postgres server with ‘strace -f’ and parse with ‘awk’ the open(), write() and close() calls:

sudo su postgres <<'END'
export PGDATA=/u01/pgdata
/usr/pgsql-10/bin/pg_ctl stop
strace -e trace=open,close,write,recvfrom -f /usr/pgsql-10/bin/pg_ctl start 2>&1 | awk '
/^[^[]/{
$0="[pid MAIN] "$0
}
/strace: Process [0-9][0-9]* attached/{
sub(/^.*strace: /,"strace: ") ; "ps -o cmd -hp " $3 |& getline proc[$3"]"] ; print "" ; print $0,proc[$3"]"] }
/open[(].*pg_wal[/].* = [0-9]*$/{
z=$0 ; gsub(qq," ") ; fd_wal[$2 $NF]=$4
}
/checkpoint;/{
total_written_wal=0
}
/write[(]/{
#pid=$2 ; sub("]","",$2) ; "ps -o cmd -hp " p |& getline proc[p"]"]
z=$0 ; gsub("[(,]"," ") ; if ( fd_wal[$2 $4]>0 ) { written_wal[$2 $4]=written_wal[$2 $4]+$NF ; total_written_wal=total_written_wal+$NF } next
}
/close[(]/{
pid=$2 ; sub("[^0-9]","",pid) ;
z=$0 ; gsub("[()]"," ") ; if ( ( fd_wal[$2 $4]!="" ) && ( written_wal[$2 $4] > 0 ) ) {
printf " ( written %d bytes to %s -> total WAL segments: %.2f MB ) cmd=%s\n",written_wal[$2 $4],fd_wal[$2 $4],total_written_wal/1024/1024 , proc[$2] ; fd_wal[$2 $4]=""
} next
}
' qq='"'
END

Do not do that in production. This is experimentation in a lab. Do not attach strace to a critical process in production.

There’s probably an easier way to get the same information, maybe with Postgres activity statistics, or through a size-counting archive_command, so please don’t hesitate to comment. Anyway, from the write() calls I am sure that I’m counting exactly what I want: the volume of logging written to disk. As an Oracle DBA used to LogWriter and its slave threads managing all writes, I started by tracing only the WAL writer process but quickly realized that part of the logging is directly written by my server process.
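
There is indeed a simpler, SQL-only way to measure the WAL volume generated by a statement. Here is a hedged sketch using the PostgreSQL 10 function pg_current_wal_lsn() (connection options omitted; this is an alternative to the strace approach, not what produced the numbers below):

# note the WAL position before the test
before=$(psql -At -c "select pg_current_wal_lsn()")
# ... run the DML statement to measure ...
# WAL bytes generated since the noted position
psql -At -c "select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '$before'))"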

Postgres: insert

I create a table with some numbers and a 100-byte character string.

create table demo as select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(0,0);

The first operation I test is the insert of 1 million rows.

insert into demo select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(1,1000000);
( written 4349952 bytes to pg_wal/000000010000000A0000005F -> total WAL segments: 4.16 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 8192 bytes to pg_wal/000000010000000A0000005F -> total WAL segments: 9.00 MB ) cmd=postgres: wal writer process
( written 17735680 bytes to pg_wal/000000010000000A00000060 -> total WAL segments: 20.07 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 3309568 bytes to pg_wal/000000010000000A00000060 -> total WAL segments: 31.40 MB ) cmd=postgres: wal writer process
( written 33783808 bytes to pg_wal/000000010000000A00000061 -> total WAL segments: 36.03 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 3997696 bytes to pg_wal/000000010000000A00000061 -> total WAL segments: 39.80 MB ) cmd=postgres: wal writer process
( written 49676288 bytes to pg_wal/000000010000000A00000062 -> total WAL segments: 51.19 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 65273856 bytes to pg_wal/000000010000000A00000063 -> total WAL segments: 66.06 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 79364096 bytes to pg_wal/000000010000000A00000064 -> total WAL segments: 82.04 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 6660096 bytes to pg_wal/000000010000000A00000064 -> total WAL segments: 82.39 MB ) cmd=postgres: wal writer process
( written 88285184 bytes to pg_wal/000000010000000A00000065 -> total WAL segments: 98.02 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 14491648 bytes to pg_wal/000000010000000A00000065 -> total WAL segments: 106.82 MB ) cmd=postgres: wal writer process
( written 101703680 bytes to pg_wal/000000010000000A00000066 -> total WAL segments: 113.99 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 17825792 bytes to pg_wal/000000010000000A00000066 -> total WAL segments: 117.19 MB ) cmd=postgres: wal writer process
( written 115769344 bytes to pg_wal/000000010000000A00000067 -> total WAL segments: 128.20 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 18661376 bytes to pg_wal/000000010000000A00000067 -> total WAL segments: 135.09 MB ) cmd=postgres: wal writer process
( written 19824640 bytes to pg_wal/000000010000000A00000068 -> total WAL segments: 144.17 MB ) cmd=postgres: wal writer process
( written 131350528 bytes to pg_wal/000000010000000A00000068 -> total WAL segments: 148.16 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 27435008 bytes to pg_wal/000000010000000A00000069 -> total WAL segments: 159.80 MB ) cmd=postgres: wal writer process
( written 140132352 bytes to pg_wal/000000010000000A00000069 -> total WAL segments: 159.80 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
INSERT 0 1000000

You can see that my ‘strace|awk’ script is running in the background and has counted about 160 MB of logging, partially from the ‘postgres: wal writer process’ and partly from ‘postgres: demo demo 192.168.56.122(38013)’ serving my connection.

The relation size as stored on disk is about 150 MB;

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 19231 | 1e+06 | 0 | 150
(1 row)

This makes sense. An insert has to write all new data into the log in order to be able to recover the pages until they are checkpointed.

Note that I have no index on this table for this test.

Postgres: update

I’m now updating one column for all rows.

update demo set b=b+1;
( written 150528000 bytes to pg_wal/000000010000000A0000006A -> total WAL segments: 4.01 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 162693120 bytes to pg_wal/000000010000000A0000006B -> total WAL segments: 17.84 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 29769728 bytes to pg_wal/000000010000000A0000006B -> total WAL segments: 28.44 MB ) cmd=postgres: wal writer process
...
( written 84287488 bytes to pg_wal/000000010000000A00000081 -> total WAL segments: 343.65 MB ) cmd=postgres: wal writer process
( written 453705728 bytes to pg_wal/000000010000000A00000082 -> total WAL segments: 347.36 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
UPDATE 1000001

I touched only a small part of the volume in bytes, but I touched all rows and all pages. And, even if only a few bytes are modified, Postgres logs the whole page to protect from fractured blocks in case of a crash (pages partially written). So that’s about 150 MB. But Postgres does not update rows in place: the whole row is inserted in its new version, which means the whole volume again, another 150 MB. If we look at the size of the table, we can see 300 MB of pages:

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-------------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 38462 | 1.21882e+06 | 0 | 300
(1 row)

So this update has generated even more logging: 347 MB.

Postgres: sparse update

Now updating only 1 row out of ten, still one column only:

update demo set b=b+1 where mod(a,10)=1;
( written 89923584 bytes to pg_wal/000000010000000A00000083 -> total WAL segments: 13.88 MB ) cmd=postgres: wal writer process
( written 469123072 bytes to pg_wal/000000010000000A00000084 -> total WAL segments: 22.98 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
...
( written 563576832 bytes to pg_wal/000000010000000A0000008D -> total WAL segments: 151.07 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 130940928 bytes to pg_wal/000000010000000A0000008D -> total WAL segments: 151.27 MB ) cmd=postgres: wal writer process
UPDATE 100000
analyze demo;

So, 10% of the rows had to be copied to their new version, which adds about 15 MB to the table size.

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-------------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 40385 | 1.07267e+06 | 0 | 315
(1 row)

For these additional 15 MB, half of the table pages had to be modified (the current version having to point to the new version), and the logging generated was 150 MB. Because of MVCC at tuple level, doing something similar to ‘chained rows’ and ‘row migration’ for all updates, and because of full page logging, even sparse updates generate a lot of log writes.

Postgres: delete

Here is a delete of those million rows:

delete from demo;
( written 576364544 bytes to pg_wal/000000010000000A0000008E -> total WAL segments: 6.44 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 134930432 bytes to pg_wal/000000010000000A0000008E -> total WAL segments: 6.73 MB ) cmd=postgres: wal writer process
( written 589225984 bytes to pg_wal/000000010000000A0000008F -> total WAL segments: 18.70 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
...
( written 162054144 bytes to pg_wal/000000010000000A00000099 -> total WAL segments: 184.70 MB ) cmd=postgres: wal writer process
( written 740352000 bytes to pg_wal/000000010000000A0000009A -> total WAL segments: 189.80 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
DELETE 1000001
( written 163217408 bytes to pg_wal/000000010000000A0000009A -> total WAL segments: 196.22 MB ) cmd=postgres: wal writer process

Marking tuples as deleted does not increase the table:

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 40385 | 275837 | 0 | 315
(1 row)

But all current tuples have to be marked as deleted and not visible once the transaction is committed. This touches all pages for the current version, which is more than 150 MB of logging here.

Postgres: vacuum

After two updates and a delete, I have old tuples in this table. It seems that VACUUM does not generate any logging:

vacuum demo;
( written 762445824 bytes to pg_wal/000000010000000A0000009B -> total WAL segments: 14.67 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
VACUUM

My guess (but remember that I am a newbie in Postgres) is that in case of a crash occurring before the next checkpoint we will just have to vacuum again. But this is not what was answered in the postgres-general list a few years ago.

Note that full page logging is not necessary for every change, but only for the first change to a page after it was read from disk following a checkpoint. This is sufficient to cover future write failures because recovery will start from there. Once the full page is logged, the change vector is sufficient for further recovery. However, I had the same amount of WAL, 15 MB, when vacuuming right after a checkpoint.
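
A hedged sketch of that last test, forcing a checkpoint right before the vacuum (connection options omitted):

# full pages are logged again for the first change after a checkpoint
psql -c "checkpoint"
psql -c "vacuum demo"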

Oracle: insert

Let’s do some similar things in Oracle, whose MVCC implementation is completely different: at block level, with undo logging.

SQL> create table demo as select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g from xmltable('0 to 0');
Table created.

I have shown in a previous post how I get the delta values from V$MYSTAT joined to V$STATNAME using (STATISTIC#) for ‘redo size’, so there is no need to strace here. But we could see the same result by measuring the writes to the redo log groups (taking care not to double count the multiplexed members).
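
For reference, a hedged sketch of the kind of query such a script runs (an assumption about what _mystat_diff.sql does, not the author's actual script; the connection is a placeholder):

sqlplus -s / as sysdba <<'SQL'
-- current session statistics; the script diffs these values between runs
select name, value
  from v$mystat join v$statname using (statistic#)
 where name in ('redo size','undo change vector size','db block changes','redo entries');
SQL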

SQL> insert into demo select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g from xmltable('1 to 1000000');
1000001 rows inserted.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
141,342 155,218,876 4,380,448 104,411

This is about 150MB, which is the volume of the table:

SQL> exec dbms_stats.gather_table_stats(user,'DEMO');
PL/SQL procedure successfully completed.
SQL> select table_name,num_rows,blocks,(blocks*block_size/1024/1024) MB, avg_row_len from user_tables join dba_tablespaces using(tablespace_name) where table_name='DEMO';
 
TABLE_NAME NUM_ROWS BLOCKS MB AVG_ROW_LEN
------------------------------ ---------- ---------- ---------- -----------
DEMO 1000001 19280 150.625 131

Conclusion for inserts: all databases have to log the whole inserted data in order to be protected from an instance crash. Note that Oracle has a way to insert directly into the datafiles, bypassing the buffer cache, and thus reduce the logging required for crash recovery. But I’m not doing bulk inserts here.

Oracle: update

The update in Oracle is done in place. There is no need to copy the whole row (except in the rare cases where the row grows and no longer fits into the block). However, the old value of the column must be copied for MVCC, into the UNDO segment. This is why we see 46 MB of ‘undo change vector size’ here.

SQL> update demo set b=b+1;
1000001 rows updated.
SQL> commit;
Commit complete.
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
170,777 105,301,308 48,641,772 82,221

The UNDO is only the change vector, not the full block. If you read about copies of full blocks to rollback segments, it is a confusion either from veterans of Oracle 5, or a misunderstanding of flashback features. The UNDO being stored in segments, written first into the buffer cache, it is protected by redo logging, so about 46 MB of the redo is actually the redo for the undo vectors. The other 54 MB of redo is the new value of the update.

Oracle: sparse update

The logging of change vectors rather than full pages is even cheaper with sparse updates:

SQL> update demo set b=b+1 where mod(a,10)=1;
100001 rows updated.
SQL> commit;
Commit complete.
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
56,583 15,414,328 6,111,608 36,921

The volume of undo and redo generated is only 15 MB here, including 6 MB of undo vectors. This is really optimized and this is one reason why you should update only the columns changed (and not use the default non-dynamic update of Hibernate for example).

Oracle: delete

The delete has to mark all rows as deleted, and because the space can immediately be reused, the whole row must be logged into the UNDO, which in turn has to be logged into the REDO, so the delete generates a lot of logging:

SQL> delete from demo;
1000001 rows deleted.
SQL> commit;
Commit complete.
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
2,124,823 403,755,892 240,302,088 1,093,821

I have no indexes here. With indexes, all index entries would also have to be marked as deleted, and this generates undo and redo vectors because MVCC in Oracle is at block level: each block modification – for table or index – has to be logged.

Deleting a lot of rows is an expensive operation in Oracle. For bulk purges, it is often better to truncate and insert /*+ append */ when possible (as in a non-atomic materialized view refresh). Partitioning also helps, for example to purge old data when the table is partitioned on date.

Postgres without full page logging

Given the huge overhead, is full page logging really required? There are plans to avoid it, mentioned in the Postgres ToDo wiki, or at least to keep it only short-term for crash recovery and not for media recovery. Another possibility is to implement a checksum on the blocks so that fractured blocks can be detected. Then, when detected, the fractured blocks may not need full page logging to recover them if we can restore a previous backup. This takes longer to recover, but can be acceptable given the low probability of this kind of failure. In addition to that, when you have a physical standby synchronized with log shipping, you have an easy way to recover without having to restore files. But you need a checksum to detect the problem.
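
For what it’s worth, PostgreSQL already offers optional data checksums, chosen at cluster creation time; a hedged sketch (the target directory is a made-up placeholder):

# data checksums must be selected at initdb time for the whole cluster
/usr/pgsql-10/bin/initdb --data-checksums -D /u01/newdata
# an existing cluster reports whether they are enabled
psql -At -c "show data_checksums"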

Without a checksum, the problem is the detection of partial writes. But if you trust your storage and if you failover to the standby in case of a crash, you may accept setting full_page_writes=off, and this is what I did here.
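
A hedged sketch of how the setting can be changed (full_page_writes only needs a configuration reload; whether exactly this method was used here is an assumption):

# turn off full page writes for the whole cluster and reload the configuration
psql -c "alter system set full_page_writes = off"
psql -c "select pg_reload_conf()"
# confirm the new value
psql -c "show full_page_writes"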


insert into demo select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(1,1000000);
...
( written 125255680 bytes to pg_wal/000000010000000A000000E3 -> total WAL segments: 140.65 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
INSERT 0 1000000

The insert still has to log all the new data: 140 MB.


update demo set b=b+1;
...
( written 72613888 bytes to pg_wal/000000010000000A000000F2 -> total WAL segments: 213.02 MB ) cmd=postgres: wal writer process
UPDATE 1000001

The update has to log only what is modified, but because of the Postgres MVCC implementation, the whole row has to be written in its new version, and the old ones have their pointers updated: 210 MB here.


update demo set b=b+1 where mod(a,10)=1;
( written 305709056 bytes to pg_wal/000000010000000A000000F3 -> total WAL segments: 1.96 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 72613888 bytes to pg_wal/000000010000000A000000F3 -> total WAL segments: 5.62 MB ) cmd=postgres: wal writer process
( written 75718656 bytes to pg_wal/000000010000000A000000F4 -> total WAL segments: 9.65 MB ) cmd=postgres: wal writer process
( written 310665216 bytes to pg_wal/000000010000000A000000F4 -> total WAL segments: 9.65 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
UPDATE 100000

The sparse update benefits from logging only the changed rows: 10 MB here. This one is even smaller than with Oracle because there’s no UNDO to write here: the old values stay in-place.


delete from demo;
( written 323256320 bytes to pg_wal/000000010000000A000000F5 -> total WAL segments: 11.27 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 338829312 bytes to pg_wal/000000010000000A000000F6 -> total WAL segments: 26.92 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 76562432 bytes to pg_wal/000000010000000A000000F6 -> total WAL segments: 31.41 MB ) cmd=postgres: wal writer process
( written 345415680 bytes to pg_wal/000000010000000A000000F7 -> total WAL segments: 39.73 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 83410944 bytes to pg_wal/000000010000000A000000F7 -> total WAL segments: 40.41 MB ) cmd=postgres: wal writer process
DELETE 1000001

The delete is cheap when full_page_writes=off because only the visibility is changed while the data remains (until committed and vacuumed). If you have a lot of rows to delete, then consider setting full_page_writes=off and be sure to have a backup to restore in case of a crash.

Oracle full page logging in backup mode

So, Oracle by default does not need to protect from fractured blocks, because they can be detected. If the storage crashes while a block is partially written, the block is corrupt. Thanks to the checksum, this corruption will be detected during recovery (or even earlier, depending on DB_BLOCK_CHECKSUM and DB_LOST_WRITE_PROTECT). The redo is not sufficient, as it contains only change vectors, but you can restore from the last backup and Oracle can do a simple block recovery. This recovery can also be done from the standby database.
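
As an illustration, a hedged sketch of RMAN block media recovery (the datafile and block numbers are made-up placeholders):

rman target / <<'EOF'
# restore and recover a single corrupt block from the last backup (or from the standby)
recover datafile 7 block 42;
EOF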

However, full page logging exists in Oracle. When running a backup with a non-Oracle tool that is not aware of block checksums, you need to enclose the copy or snapshot between ‘begin backup’ and ‘end backup’. You do this because an online backup may read partially updated blocks and, without the checksum, cannot detect them. A corrupt backup is not very useful, and this is why this backup mode generates more redo, to be able to recover such blocks. This is very similar to full page logging: the redo generated for the first modification of a buffer stores the whole block. The next modifications, until the buffer is checkpointed, need only the change vectors.
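
A hedged sketch of that backup mode wrapped around a storage-level copy or snapshot (the snapshot step itself is outside Oracle and only a placeholder comment here):

sqlplus -s / as sysdba <<'EOF'
alter database begin backup;
EOF
# ... take the storage snapshot or copy the datafiles here ...
sqlplus -s / as sysdba <<'EOF'
alter database end backup;
EOF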

I think the first article I’ve ever written was a description of the Oracle backup mode. And it is still visible thanks to archive.org only because it was published on… Google Knol!

So, here is the same run with Oracle in backup mode.

Insert does not change a lot as it fills full blocks:

SQL> insert into demo select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g from xmltable('1 to 1000000');
1000000 rows created.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
141,376 156,527,072 4,380,448 124,195

The full update of one column generates the same undo, but more than 2x the redo because of full page logging:

SQL> update demo set b=b+1;
1000001 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
170,778 238,317,632 48,641,772 104,640

Sparse update is exactly the same as full update because this 10% touches all pages:

SQL> update demo set b=b+1 where mod(a,10)=1;
100001 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
319,622 240,502,284 17,832,196 192,815

The delete generates even more because there is all the UNDO in addition to all the data pages:

SQL> delete from demo;
1000001 rows deleted.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
2,125,285 558,510,928 240,303,768 1,143,131

So what?

Beyond the very different implementations of Postgres and Oracle, we can see that we have flexibility: the large logging volume generated by Postgres by default may be reduced in some cases, and the minimal logging which is the default in Oracle may be larger in some situations. The most important thing, as with all technologies, is to understand how it works. Only then can you make the right choice to balance performance, availability, and cost. Understanding how it works means reading the docs (how it is supposed to work) and testing (how it actually works). With Oracle there is additional information from a huge community testing and using it for decades. With Postgres, as with all open source projects, the source code with its comments is an amazing documentation.

 


Backup and Restore PostgreSQL with PgBackRest I

Wed, 2018-02-14 09:58

Many tools can be used to backup PostgreSQL databases. In this blog I will talk about PgBackRest, which is a simple tool that can be used to backup and restore a PostgreSQL database. Full, differential, and incremental backups are supported.
In this first blog I will present a basic configuration of PgBackRest. Our configuration is composed of only one cluster, and pgbackrest is installed on the server hosting the database. The goal is to explain a first use of PgBackRest.
Below is our configuration:
Server with Oracle Linux 7
PostgreSQL 10.1
PgBackRest 1.28
We assume that the Linux box and PostgreSQL 10.1 are already installed. So let’s install PgBackRest.

[root@pgserver ~]# yum search pgbackrest
Loaded plugins: langpacks, ulninfo
=========================== N/S matched: pgbackrest ============================
pgbackrest.noarch : Reliable PostgreSQL Backup & Restore
pgbackrest.x86_64 : Reliable PostgreSQL Backup & Restore
Name and summary matches only, use "search all" for everything

And then we can install PgBackRest:

[root@pgserver ~]# yum install pgbackrest.x86_64

Afterwards we can check the installation using the pgbackrest command:

[postgres@pgserver ~]$ /usr/bin/pgbackrest
pgBackRest 1.28 - General help
Usage:
pgbackrest [options] [command]

Commands:
archive-get Get a WAL segment from the archive.
archive-push Push a WAL segment to the archive.
backup Backup a database cluster.
check Check the configuration.
expire Expire backups that exceed retention.
help Get help.
info Retrieve information about backups.
restore Restore a database cluster.
stanza-create Create the required stanza data.
stanza-delete Delete a stanza.
stanza-upgrade Upgrade a stanza.
start Allow pgBackRest processes to run.
stop Stop pgBackRest processes from running.
version Get version.
Use 'pgbackrest help [command]' for more information.

The configuration of PgBackRest is very easy: it consists of a single configuration file, pgbackrest.conf, that must be edited. In my case the file is located in /etc. As mentioned, we will use a very basic configuration file.
Below are the contents of my configuration file:

[root@pgserver etc]# cat pgbackrest.conf
[global]
repo-path=/var/lib/pgbackrest

[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2
[root@pgserver etc]#

In the file above:
• repo-path is where the backups will be stored,
• clustpgserver is the name of my cluster stanza (you are free to choose the name). A stanza is the configuration for a PostgreSQL database cluster that defines where it is located, how it will be backed up, archiving options, etc.
• db-path is the path of my database files,
• retention-full configures the retention to 2 full backups.
A complete list can be found here.
Once the configuration file is done, we can create the stanza with the stanza-create command. Note that my PostgreSQL cluster is using port 5435.

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 stanza-create
2018-02-08 14:01:49.293 P00 INFO: stanza-create command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-08 14:01:50.707 P00 INFO: stanza-create command end: completed successfully
[postgres@pgserver ~]$

After we create the stanza, we can verify that the configuration is fine using the check command

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 check
2018-02-08 14:03:42.095 P00 INFO: check command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-08 14:03:48.805 P00 INFO: WAL segment 00000001000000000000000C successfully stored in the archive at '/var/lib/pgbackrest/archive/clustpgserver/10-1/0000000100000000/00000001000000000000000C-c387b901a257bac304f27865478fd9f768de83d6.gz'
2018-02-08 14:03:48.808 P00 INFO: check command end: completed successfully
[postgres@pgserver ~]$

Since we have not yet taken any backup with PgBackRest, the info command returns an error:

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info info
stanza: clustpgserver
status: error (no valid backups)
db (current)
wal archive min/max (10-1): 00000001000000000000000C / 00000001000000000000000C
[postgres@pgserver ~]$

Now let’s take a backup

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-08 14:06:52.706 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
WARN: no prior backup exists, incr backup has been changed to full
2018-02-08 14:06:54.734 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-08 14:06:53": backup begins after the next regular checkpoint completes
2018-02-08 14:06:55.159 P00 INFO: backup start archive = 00000001000000000000000E, lsn = 0/E000060
2018-02-08 14:07:09.867 P01 INFO: backup file /var/lib/pgsql/10/data/base/13805/1255 (592KB, 2%) checksum 61f284092cabf44a30d1442ef6dd075b2e346b7f


2018-02-08 14:08:34.709 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-08 14:08:34.895 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-08 14:08:34.932 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

We can see that by default PgBackRest tries to do an incremental backup. But as there is no full backup yet, a full backup is done instead. Once a full backup exists, all future backups will be incremental unless we specify the backup type.

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-08 14:26:25.590 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
2018-02-08 14:26:29.314 P00 INFO: last backup label = 20180208-140653F, version = 1.28
2018-02-08 14:26:30.135 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-08 14:26:26": backup begins after the next regular checkpoint completes
...
2018-02-08 14:27:01.408 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-08 14:27:01.558 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-08 14:27:01.589 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

If we want to perform another full backup we can specify the option --type=full

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 --type=full backup
2018-02-08 14:30:05.961 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver --type=full
2018-02-08 14:30:08.472 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-08 14:30:06": backup begins after the next regular checkpoint completes
2018-02-08 14:30:08.993 P00 INFO: backup start archive = 000000010000000000000012, lsn = 0/12000028
….
….

To have info about our backups
[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 00000001000000000000000E / 000000010000000000000012
full backup: 20180208-140653F
timestamp start/stop: 2018-02-08 14:06:53 / 2018-02-08 14:08:19
wal start/stop: 00000001000000000000000E / 00000001000000000000000E
database size: 23.2MB, backup size: 23.2MB
repository size: 2.7MB, repository backup size: 2.7MB
incr backup: 20180208-140653F_20180208-142626I
timestamp start/stop: 2018-02-08 14:26:26 / 2018-02-08 14:26:52
wal start/stop: 000000010000000000000010 / 000000010000000000000010
database size: 23.2MB, backup size: 8.2KB
repository size: 2.7MB, repository backup size: 472B
backup reference list: 20180208-140653F
full backup: 20180208-143006F
timestamp start/stop: 2018-02-08 14:30:06 / 2018-02-08 14:31:30
wal start/stop: 000000010000000000000012 / 000000010000000000000012
database size: 23.2MB, backup size: 23.2MB
repository size: 2.7MB, repository backup size: 2.7MB
[postgres@pgserver ~]$
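
The expire messages in the backup output above come from the retention policy (retention-full=2 and retention-archive=2 in this demo). These options are usually kept in the [global] section of pgbackrest.conf; a minimal sketch with the values visible in the output:

[global]
retention-full=2
retention-archive=2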

Now that we have seen how to perform backups with PgBackRest, let's see how to restore.
First let's identify the directory of our database files:

[postgres@pgserver ~]$ psql
psql (10.1)
Type "help" for help.
postgres=# show data_directory ;
data_directory
------------------------
/var/lib/pgsql/10/data
(1 row)
postgres=#

And let’s remove all files in the directory

[postgres@pgserver data]$ pwd
/var/lib/pgsql/10/data
[postgres@pgserver data]$ ls
base pg_dynshmem pg_notify pg_stat_tmp pg_wal postmaster.pid
current_logfiles pg_hba.conf pg_replslot pg_subtrans pg_xact
global pg_ident.conf pg_serial pg_tblspc postgresql.auto.conf
log pg_logical pg_snapshots pg_twophase postgresql.conf
pg_commit_ts pg_multixact pg_stat PG_VERSION postmaster.opts
[postgres@pgserver data]$ rm -rf *
[postgres@pgserver data]$

Now if we try to connect, of course we will get errors

[postgres@pgserver data]$ psql
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5435"?
[postgres@pgserver data]$

So let's restore with PgBackRest using the restore command:

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info restore
2018-02-08 14:52:01.845 P00 INFO: restore command begin 1.28: --db1-path=/var/lib/pgsql/10/data --log-level-console=info --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-08 14:52:03.490 P00 INFO: restore backup set 20180208-143006F
2018-02-08 14:52:21.904 P01 INFO: restore file /var/lib/pgsql/10/data/base/13805/1255 (592KB, 2%) checksum 61f284092cabf44a30d1442ef6dd075b2e346b7f
….
….
2018-02-08 14:53:21.186 P00 INFO: write /var/lib/pgsql/10/data/recovery.conf
2018-02-08 14:53:23.948 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2018-02-08 14:53:28.258 P00 INFO: restore command end: completed successfully
[postgres@pgserver ~]$

At the end of the restore, a recovery.conf file is created in the data directory:

[postgres@pgserver data]$ cat recovery.conf
restore_command = '/usr/bin/pgbackrest --log-level-console=info --stanza=clustpgserver archive-get %f "%p"'
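
This restore_command retrieves archived WAL segments from the PgBackRest repository. Its counterpart on the backed-up cluster is the archive_command (not shown here, but this is the usual pairing, to be adapted to your stanza), which pushed the WAL segments into the repository in the first place:

# postgresql.conf on the source cluster
archive_mode = on
archive_command = '/usr/bin/pgbackrest --stanza=clustpgserver archive-push %p'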

Now we can restart the PostgreSQL cluster

[postgres@pgserver data]$ pg_ctl start
waiting for server to start....2018-02-08 14:57:06.519 CET [4742] LOG: listening on IPv4 address "0.0.0.0", port 5435
2018-02-08 14:57:06.522 CET [4742] LOG: listening on IPv6 address "::", port 5435
2018-02-08 14:57:06.533 CET [4742] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5435"
2018-02-08 14:57:06.551 CET [4742] LOG: listening on Unix socket "/tmp/.s.PGSQL.5435"
2018-02-08 14:57:06.645 CET [4742] LOG: redirecting log output to logging collector process
2018-02-08 14:57:06.645 CET [4742] HINT: Future log output will appear in directory "log".
...... done
server started

And then connect

[postgres@pgserver data]$ psql
psql (10.1)
Type "help" for help.
postgres=#

Conclusion
In this blog we showed, with a simple configuration, how to perform backups and restores using PgBackRest. This basic configuration can help for a first use of PgBackRest. In future articles we will go further into more advanced uses of this tool.

 

Cet article Backup and Restore PostgreSQL with PgBackRest I est apparu en premier sur Blog dbi services.

SQL Server on Docker and network bridge considerations

Wed, 2018-02-14 06:46

Let's continue with this blog post series about SQL Server and Docker. A couple of days ago, I was at a customer site that had already implemented SQL Server 2017 on Linux as Docker containers. It was definitely a very interesting day with a lot of customer experience and feedback. We discussed a lot of architecture scenarios with them.

The interesting point here is that I was able to compare with a previous customer who has used Docker containers for a while in a completely different way. Indeed, my new customer implemented a Docker infrastructure exclusively based on SQL Server containers, whereas the older one had already containerized its applications, which were connected to an external, non-containerized SQL Server environment.

Use case 1 – Containerized apps and virtualized SQL Server environments
Use case 2 – SQL Server containers and virtualized applications
blog 128 - 1 - docker archi 2
blog 128 - 1 - docker archi 1

 

In this blog post I want to focus on the first use case in terms of networks.

Connecting to an outside SQL Server (from a Docker perspective) is probably an intermediate solution for many customers who already deal with mission-critical environments, with very restrictive high-availability scenarios and very high performance requirements. Don't get me wrong: I'm not saying Docker is not designed for mission-critical scenarios, but fear of the unknown, as with virtualization before, is still predominant, at least for this kind of scenario. I always keep in mind the recurring customer question: is Docker ready for production and for databases? Connecting to a non-containerized SQL Server environment may make sense here, at least to speed up container adoption. That's my guess, but feel free to comment with your thoughts!

So, in this context we may use different Docker network topologies. I spent some time studying and discussing with customers the network topologies implemented in their context. For simple Docker infrastructures (without orchestrators like Swarm or Kubernetes), Docker bridges seem to be predominant, either the Docker0 bridge or user-defined bridges.

 

  • Docker default bridge (Docker0)

For very limited Docker topologies, the default network settings with the Docker0 bridge will probably be sufficient. It is probably the case for my latest customer, with only 5 SQL Server containers on top of one Docker engine. By default, each container created without any network specification (and without any Docker engine setting customization) will have one network interface sitting on the docker0 bridge, with an IP from the 172.17.0.0/16 CIDR or whichever CIDR you have configured Docker to use. But did you ever wonder what exactly a bridge is in the Docker world?

Let's have a deeper look at it with a very simple example: one Docker engine that hosts two containers, each based on microsoft/mssql-tools, and one outside SQL Server that runs on top of a Hyper-V virtual machine. The picture below shows some network details that I will explain later in this blog post.

blog 128 - 3 - docker network bridge

My 2 containers can communicate with each other because they are sitting on the same network bridge, and they are also able to communicate with my database server through the NAT mechanism. IP masquerading and IP forwarding are enabled on my Docker host.
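
If you want to verify this on your own host (a quick generic check, not specific to this setup), the following commands show whether IP forwarding is enabled and which masquerading rules Docker added:

$ sysctl net.ipv4.ip_forward              # should return 1 when IP forwarding is enabled
$ sudo iptables -t nat -S POSTROUTING     # lists the MASQUERADE rules added by Docker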

$ sudo docker run -tid --name docker1 microsoft/mssql-tools
77b501fe29af322dd2d1da2824d339a60ba3080c1e61a2332b3cf563755dd3e3

$ sudo docker run -tid --name docker2 microsoft/mssql-tools
3f2ba669591a1889068240041332f02faf970e3adc85619adbf952d5c135d3f4

$ sudo docker ps
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS               NAMES
3f2ba669591a        microsoft/mssql-tools   "/bin/sh -c /bin/bash"   7 seconds ago       Up 6 seconds                            docker2
77b501fe29af        microsoft/mssql-tools   "/bin/sh -c /bin/bash"   11 seconds ago      Up 10 seconds                           docker1

 

Let’s take a look at the network configuration of each container. As a reminder, each network object represents a layer 2 broadcast domain with a layer 3 subnet as shown below. Each container is attached to a network through a specific endpoint.

$ sudo docker inspect docker1
[
"Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.2",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:02",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "985f25500e3d0c55d419790f1ac446f92c8d1090dddfd69987a52aab0717e630",
                    "EndpointID": "bd82669031ad87ddcb61eaa2dad823d89ca86cae92c4034d4925009aae634c14",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:02",
                    "DriverOpts": null
                }
            }
]

$sudo docker inspect docker2
[
"Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.3",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:03",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "985f25500e3d0c55d419790f1ac446f92c8d1090dddfd69987a52aab0717e630",
                    "EndpointID": "140cd8764506344958e9a9725d1c2513f67e56b2c4a1fc67f317c3e555764c1e",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:03",
                    "DriverOpts": null
                }
            }
]

 

To summarize, two IP addresses have been assigned to the Docker1 container (172.17.0.2) and the Docker2 container (172.17.0.3), taken from the IP address range defined by the Docker0 bridge through the Docker internal IPAM module. Each network interface is created with its own MAC address, and the gateway IP address (172.17.0.1) of both containers corresponds to the Docker0 bridge interface.

$ sudo ip a show docker0
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:2a:d0:7e:76 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:2aff:fed0:7e76/64 scope link
       valid_lft forever preferred_lft forever
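
Another way to see the same address range and gateway from the Docker side (output omitted here) is to inspect the default bridge network directly:

$ sudo docker network inspect --format '{{json .IPAM.Config}}' bridge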

 

Let's try to connect from both containers to my SQL Server database:

$ sudo docker exec -it docker1 bash
…
$ sudo docker exec -it docker2 bash
...

 

Then on each container let’s run the following sqlcmd command:

sqlcmd -S 192.168.40.30,1450 -Usa -Ptoto

 

Finally, let's switch to the SQL Server instance and get a picture of the existing connections (IP address 192.168.40.30 and port 1450).

SELECT 
	c.client_net_address,
	c.client_tcp_port,
	c.local_net_address,
	c.protocol_type,
	c.auth_scheme,
	s.program_name,
	s.is_user_process
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s ON c.session_id = s.session_id
WHERE client_net_address <> '<local machine>'

 

blog 128 - 4 - docker network bridge sqlcmd

We may notice that the client IP address is the same for both containers (192.168.40.50, the address of the Docker host), indicating that NAT is used to connect from each container.

Let’s go back to the Docker engine network configuration. After creating my 2 containers, we may notice the creation of 2 additional network interfaces.

$ ip a show | grep veth*
12: veth45297ff@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP
14: veth46a8316@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP

 

What are they? At this point, we are entering the Linux network namespace world. You can read further technical details on the internet, but to keep network namespace concepts simple, I would say they allow running different, separate network instances (including routing tables) that operate independently of each other. In other words, it is a way to isolate different networks from each other while relying on the same physical network device. Assuming we are using Docker bridge type networks, when we create a container we also create, in the background, a dedicated network namespace that includes a virtual ethernet interface; virtual ethernet interfaces come in interconnected pairs. In fact, a virtual ethernet pair acts as a tube connecting a Docker container namespace (in this context) to the outside world via the default / global namespace where the physical interface exists.
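
To make the veth pair concept more concrete, here is a small manual experiment with iproute2, completely outside Docker (names and addresses are arbitrary; run it on a test machine and clean up with ip netns delete demo afterwards):

$ sudo ip netns add demo                                    # create an empty network namespace
$ sudo ip link add veth-host type veth peer name veth-demo  # create an interconnected veth pair
$ sudo ip link set veth-demo netns demo                     # move one end into the namespace
$ sudo ip addr add 172.30.0.1/24 dev veth-host
$ sudo ip link set veth-host up
$ sudo ip netns exec demo ip addr add 172.30.0.2/24 dev veth-demo
$ sudo ip netns exec demo ip link set veth-demo up
$ sudo ip netns exec demo ping -c1 172.30.0.1               # the namespace reaches the host through the pair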

Before digging further into virtual interface details, let's say that by default Docker doesn't expose network namespace information because it uses its own libcontainer, and the microsoft/mssql-tools Docker image is based on a stripped-down Linux image that doesn't include the network tools to easily show virtual interface information. So, a workaround is to expose a Docker container namespace to the host.

First we have to find out the process id of the container and then link its corresponding proc namespace to /var/run/netns host directory as shown below:

$ sudo docker inspect --format '{{.State.Pid}}' docker1
2094
$ sudo ln -s /proc/2094/ns/net /var/run/netns/ns-2094

 

Then we may use ip netns command to extract the network information

$ sudo ip netns
ns-2094 (id: 0)
$ sudo ip netns exec ns-2094 ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
11: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0

 

Here we go. The interesting information is the container network interface 11: eth0@if12

So, the first end of the pair is the eth0 interface inside the Docker container, and the "outside" end corresponds to interface number 12. On the host, interface 12 corresponds to the virtual ethernet adapter veth45297ff. Note that we may also find the end corresponding to the container interface (@if11).

$ ip a | grep "^12"
12: veth45297ff@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP

 

Finally, let’s take a look at the bridge used by the virtual ethernet adapter veth45297ff

$ sudo brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02422ad07e76       no              veth45297ff
                                                        veth46a8316

 

The other veth (46a8316) corresponds to my second docker2 container.

 

  • User-defined network bridges

But as said previously, using the Docker0 bridge is only suitable for very limited scenarios. User-defined bridges are more common in more complex scenarios like microservice applications because they offer better isolation between containers and the outside world, as well as better manageability and customization. At this stage we could also introduce macvlan networks, but probably in the next blog post…

For example, let's say you want to create 2 isolated network bridges for a 3-tier application. The users will access the web server (through the exposed port) on the first network (frontend-server). But at the same time, you also want to prevent containers that sit on this network from making connections to the outside world. The second network (backend-server) will host containers that must have access to both the outside SQL Server database and the web server.

blog 128 - 5 - docker network bridge segregation

User-defined networks are a good solution to address these requirements. Let's create two user-defined networks. Note that by default containers may make connections to the outside world, but the outside is not able to make connections to the containers without exposed listen ports. This is why I disabled IP masquerading (com.docker.network.bridge.enable_ip_masquerade=false) for the frontend-server network, to meet the above requirements.

$sudo docker network create \
    --driver bridge \
    --subnet 172.20.0.0/16 \
  --gateway 172.20.0.1 \
  backend-server  
$sudo docker network create \
    --driver bridge \
    --subnet 172.19.0.0/16 \
    --gateway 172.19.0.1 \
    --opt com.docker.network.bridge.enable_ip_masquerade=false \
  frontend-server
$ sudo docker network ls 
NETWORK ID          NAME                DRIVER              SCOPE
5c6f48269d2b        backend-server      bridge              local
985f25500e3d        bridge              bridge              local
b1fbde4f4674        frontend-server     bridge              local
ad52b859e3f9        host                host                local
1beda56f93d3        none                null                local

 

Let’s now take a look at the corresponding iptables masquerading rules on my host machine:

$ sudo iptables -t nat -L -n | grep -i "masquerade"
MASQUERADE  all  --  172.20.0.0/16        0.0.0.0/0
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0

 

You may notice only the Docker0 (172.17.0.0/16) and backend-server (172.20.0.0/16) bridges are allowed for ip masquerading.

Then let's create 3 containers: the first two (docker1 and docker2) will sit on the frontend-server network and the third one (docker3) on the backend-server network. For convenience, I set up a fixed hostname for each container. I also used a different Ubuntu image that this time provides all the necessary network tools, including the ping command.

$ sudo docker run -d --rm --name=docker1 --hostname=docker1 --net=frontend-server -it smakam/myubuntu:v6 bash

$ sudo docker run -d --rm --name=docker2 --hostname=docker2 --net=frontend-server -it smakam/myubuntu:v6 bash

$sudo docker run -d --rm --name=docker3 --hostname=docker3 --net=backend-server -it smakam/myubuntu:v6 bash

$ sudo docker ps
CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS              PORTS               NAMES
225ee13c38f7        smakam/myubuntu:v6   "bash"              2 minutes ago       Up 2 minutes                            docker3
d95014602fe2        smakam/myubuntu:v6   "bash"              4 minutes ago       Up 4 minutes                            docker2
1d9645f61245        smakam/myubuntu:v6   "bash"              4 minutes ago       Up 4 minutes                            docker1

 

First, probably one of the biggest advantages of user-defined networks (unlike the Docker0 bridge) is the ability to use automatic DNS resolution between containers on the same user-defined subnet on the same host (this is the default behavior, but you can override DNS settings by specifying the --dns parameter at container creation time). In fact, Docker keeps container name resolution up to date automatically as containers are added to or removed from the network.
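
For instance, if a container should use a specific external resolver instead of the default one, the flag can be passed at creation time (a hypothetical docker4 container reusing the image and network from this demo):

$ sudo docker run -d --rm --name=docker4 --hostname=docker4 --net=frontend-server --dns=8.8.8.8 -it smakam/myubuntu:v6 bash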

As expected, I may ping the docker2 container from the docker1 container and vice versa, but the same doesn't apply between docker1 and docker3, nor between docker2 and docker3, because they are not sitting on the same network bridge.

$ sudo docker exec -ti docker1 ping -c2 docker2
PING docker2 (172.19.0.3) 56(84) bytes of data.
64 bytes from docker2.frontend-server (172.19.0.3): icmp_seq=1 ttl=64 time=0.088 ms
64 bytes from docker2.frontend-server (172.19.0.3): icmp_seq=2 ttl=64 time=0.058 ms
…
$ sudo docker exec -ti docker2 ping -c2 docker1
PING docker1 (172.19.0.2) 56(84) bytes of data.
64 bytes from docker1.frontend-server (172.19.0.2): icmp_seq=1 ttl=64 time=0.084 ms
64 bytes from docker1.frontend-server (172.19.0.2): icmp_seq=2 ttl=64 time=0.054 ms
...
$ sudo docker exec -ti docker1 ping -c2 docker3
ping: unknown host docker3
...

 

From a network perspective, on the host we may notice the creation of two additional bridge interfaces and 3 virtual Ethernet adapters after the creation of the containers.

$ brctl show
bridge name     bridge id               STP enabled     interfaces
br-5c6f48269d2b         8000.0242ddad1660       no              veth79ae355
br-b1fbde4f4674         8000.02424bebccdd       no              vethb66deb8
                                                        vethbf4ab2d
docker0         8000.02422ad07e76       no
$ ip a | egrep "^[1-9][1-9]"
25: br-5c6f48269d2b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
28: br-b1fbde4f4674: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
58: vethb66deb8@if57: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-b1fbde4f4674 state UP
64: veth79ae355@if63: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-5c6f48269d2b state UP

 

If I want to make the docker3 container reachable from docker2 container I may simply connect the latter to the corresponding network as shown below:

$ sudo docker network connect backend-server docker2

$ sudo docker inspect docker2
[
"Networks": {
                "backend-server": {
                    "IPAMConfig": {},
                    "Links": null,
                    "Aliases": [
                        "d95014602fe2"
                    ],
                    "NetworkID": "5c6f48269d2b752bf1f43efb94437957359c6a72675380c16e11b2f8c4ecaaa1",
                    "EndpointID": "4daef42782b22832fc98485c27a0f117db5720e11d806ab8d8cf83e844ca6b81",
                    "Gateway": "172.20.0.1",
                    "IPAddress": "172.20.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:14:00:03",
                    "DriverOpts": null
                },
                "frontend-server": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "d95014602fe2"
                    ],
                    "NetworkID": "b1fbde4f4674386a0e01b7ccdee64ed8b08bd8505cd7f0021487d32951035570",
                    "EndpointID": "651ad7eaad994a06658941cda7e51068a459722c6d10850a4b546382c44fff86",
                    "Gateway": "172.19.0.1",
                    "IPAddress": "172.19.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:13:00:03",
                    "DriverOpts": null
                }
            }
]

 

You may notice the container is now connected to both the frontend-server and backend-server networks, thanks to an additional network interface created at the same time.

$ sudo docker exec -it docker2 ip a show | grep eth
59: eth0@if60: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:13:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.3/16 brd 172.19.255.255 scope global eth0
68: eth2@if69: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:14:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.3/16 brd 172.20.255.255 scope global eth2

 

Pinging both docker1 container and docker3 container from docker2 container is successful now.

$ sudo docker exec -it docker2 ping -c2 docker1
PING docker1 (172.19.0.2) 56(84) bytes of data.
64 bytes from docker1.frontend-server (172.19.0.2): icmp_seq=1 ttl=64 time=0.053 ms
64 bytes from docker1.frontend-server (172.19.0.2): icmp_seq=2 ttl=64 time=0.052 ms
…
$ sudo docker exec -it docker2 ping -c2 docker3
PING docker3 (172.20.0.2) 56(84) bytes of data.
64 bytes from docker3.backend-server (172.20.0.2): icmp_seq=1 ttl=64 time=0.082 ms
64 bytes from docker3.backend-server (172.20.0.2): icmp_seq=2 ttl=64 time=0.054 ms
…

 

In this blog post, we took a first look at Docker network bridges and at the use cases we may have to deal with for SQL Server instances, depending on the context. As a reminder, user-defined networks allow defining fine-grained policy rules to interconnect containers on different subnets. This is basically what we may want to achieve with microservice applications. Indeed, such applications include some components that need to span multiple networks (backend and frontend networks) whereas others should be isolated (even from the outside) according to their role.

Happy containerization!

 

 

 

 

Cet article SQL Server on Docker and network bridge considerations est apparu en premier sur Blog dbi services.

Could you trust option_packs_usage_statistics.sql ?

Wed, 2018-02-14 04:17
Introduction

As a former Oracle LMS qualified auditor, my opinion is sometimes requested before/during/after an Oracle LMS audit, or simply to assure a customer that his Oracle database is 100% in conformity with the Oracle Licensing Policy. Even if

“The Oracle License Management Services (LMS) Group is the only Oracle group authorized to review and provide opinions on compliance status and will provide guidance, education and impartial opinions on a customer or partner’s compliance state. For more information please visit the following website: http://www.oracle.com/corporate/lms.”

I do hope that you will find interesting tips in this blog.

Most of the time, when a customer would like to check which Oracle options are used by his database infrastructure, he uses the well-known script "option_packs_usage_statistics.sql". dbi services checked the options detected by this script provided by Oracle (My Oracle Support DOC ID 1317265.1). Depending on your database usage this script will detect the usage of different options, but can you really trust the output of this script, and how should you interpret it?

Could we trust the output of option_packs_usage_statistics.sql ?

The answer is quite easy and short: NO you can’t !

Why? Because, as with any software, there are bugs, and these bugs lead to false positive detections. The good news is that some of these false positives are documented on My Oracle Support. Indeed, the script option_packs_usage_statistics.sql used and provided by Oracle has 14 documented bugs (My Oracle Support Doc ID 1309070.1) and some other non-documented bugs (e.g. My Oracle Support BUG 17164904). These bugs are related to:

1.    Bug 11902001 – Exclude default users for Feature usage tracking for Securefiles option
2.    Bug 11902142 – Exclude default users for Feature usage tracking for Advanced Compression option
3.    Bug 19618850 – SOLUTION TO PREVENT UNINTENTED ORACLE OPTION USAGE
4.    Query against DBA_FEATURE_USAGE_STATISTICS is not a true test for use of SDO
5.    Bug 16088534 : RMAN default Backup BZIP2 Compression feature is reported wrongly as as an Advanced Compression feature
6.    Bug 22122625 – GETTING FALSE POSITIVES ON USAGE OF ADVANCED INDEX COMPRESSION
7.    Bug 24844549 – ADVANCED INDEX COMPRESSION SHOWS USAGE IN DBA_FEATURE_USAGE_STATISTICS WITH HCC
8.    Bug 16859747 – DBA_FEATURE_USAGE_STATISTICS SHOWS INCORRECT USAGE FOR HEAPCOMPRESSION
9.    Bug 16563444 – HEAT MAP FEATURE USAGE TRACKING IS NOT CORRECT
10.    Bug 19317899 – IMC: IN-MEMORY OPTION IS REPORTED AS BEING USED EVEN INMEMORY_SIZE IS 0
11.    Bug 19308780 – DO NOT FEATURE TRACK OBJECTS FOR IM WHEN INMEMORY_SIZE = 0
12.    Bug 21248059 – DBA_FEATURE_USAGE_STATISTICS BUG IN TRACKING “HYBRID COLUMNAR COMPRESSION” FEAT
13.    Bug 25661076 – DBA_FEATURE_USAGE_STATISTICS INCORRECTLY SHOWS SPATIAL USAGE IN 12C
14.    Bug 23734270 – DBA_FEATURE_USAGE_STATISTICS SHOWS PERMANENT USAGE OF REAL-TIME SQL MONITORING

These bugs may lead to the detection of features such as: Automatic Maintenance – SQL Tuning Advisor & Automatic SQL Tuning Advisor, Real-Time SQL Monitoring, Advanced Security – Oracle Utility Datapump (Export) and Oracle Utility Datapump (Import), Advanced Compression – Heat Map, Advanced Compression – Oracle Utility Datapump (Export) and Oracle Utility Datapump (Import), and so on…

Of course these bugs make the analysis of real option usage especially difficult, even for an experienced Database Administrator. Additionally, an Oracle database in version 12 can use options in maintenance windows without any manual activation. That is the case, for instance, of options such as Automatic Maintenance – SQL Tuning Advisor and Automatic SQL Tuning Advisor.

14. Bug 23734270 – DBA_FEATURE_USAGE_STATISTICS SHOWS PERMANENT USAGE OF REAL-TIME SQL MONITORING
On a freshly created 12c database, DBA_FEATURE_USAGE_STATISTICS shows usage of Real-Time SQL Monitoring even if no reports have been run from OEM pages or with DBMS_SQL_MONITOR.
Reason :SQL Monitor reports are automatically generated and saved in AWR but should be considered as system usage.
This behavior is the same for all 12 releases and is not present in 11.2. – Extract of My Oracle Support Bug 23734270

Even if the LMS team is not using the option_packs_usage_statistics.sql script, the output of the LMS_Collection_Tool (ReviewLite.sql) is quite the same. The direct consequence, in case of an Oracle LMS audit, is that the auditor could detect options that you simply never used, and you will have to provide proof of non-usage… if not, you will have to pay the invoice following the final LMS report, as stated in your LMS preliminary/final report.

“Based upon the information provided to License Management Services, the following licensing issues need to be resolved within 30 days from the date of the Final Report.”

“In accordance to Oracle compliance policies, backdated support charges are due for the period of unlicensed usage of Oracle Programs.
Please provide your feedback on this preliminary report within 10 days from the presentation of this report.”- extract of an Oracle LMS audit

Even if I do not have hundreds of cases where the LMS department made a wrong detection, I have concrete stories where the LMS team detected some false positives. The last case was related to the detection of more than 700 usages of Advanced Compression due to unpublished BUG 17164904. Thanks to My Oracle Support Doc ID 1993134.1, the bug is explained:

In 12.1.0.1,  the compression counter is incorrectly incremented (COMPRESSCNT=1) for compression=metadata_only (either explicitly or by default) due to unpublished BUG 17164904 – INCORRECT FEATURE USAGE STATISTICS FOR DATA PUMP COMPRESSION, fixed with 12.1.0.2.

How to interpret the output of option_packs_usage_statistics.sql ?

Sometimes this script can report nonsensical option usage. That is the case, for instance, for features only available since database version 12c but detected on your old database version 11g. In such a case, simply edit the option_packs_usage_statistics.sql script and have a look at the comments. A perfect example of this is the detection of Heat Map usage in database version 11g, whereas this option only exists since version 12c. You can see below another example of wrong option detection, related to "Automatic Maintenance – SQL Tuning Advisor" and "Automatic SQL Tuning Advisor":


SELECT 'Tuning Pack'                                         , 'Automatic Maintenance - SQL Tuning Advisor'              , '^12\.'                      , 'INVALID' from dual union all  -- system usage in the maintenance window
SELECT 'Tuning Pack'                                         , 'Automatic SQL Tuning Advisor'                            , '^11\.2|^12\.'               , 'INVALID' from dual union all  -- system usage in the maintenance window
SELECT 'Tuning Pack'                                         , 'Real-Time SQL Monitoring'                                , '^11\.2'                     , ' '       from dual union all

This INVALID condition indicates that the detection of this option is due to system usage in the maintenance window in version 12 (Automatic Maintenance – SQL Tuning Advisor) and in versions 11.2 and 12 for Automatic SQL Tuning Advisor. This is also explained a few lines further down in the option_packs_usage_statistics.sql script:


where nvl(CONDITION, '-') != 'INVALID'                   -- ignore features for which licensing is not required without further conditions

    and not (CONDITION = 'C003' and CON_ID not in (0, 1))  -- multiple PDBs are visible only in CDB$ROOT; PDB level view is not relevant
)

In such a case the option does not have to be considered, since the normal behavior of an Oracle database in version 12 is to use this option in the maintenance window. This is just an example to illustrate that some detected options do not have to be licensed, as explained in the script.
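
When challenging a suspicious detection, it can also help to look at the raw data the script relies on. A minimal query against DBA_FEATURE_USAGE_STATISTICS (adapt the filter to the feature you are investigating) shows when and how often a feature was recorded as used:

set linesize 200 pagesize 100
col name format a55
select name, version, detected_usages, currently_used,
       first_usage_date, last_usage_date
from   dba_feature_usage_statistics
where  lower(name) like '%sql tuning advisor%'
order  by name, version;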

Conclusion

I do hope that this blog helps you get a better understanding of how to detect what your database infrastructure really uses in terms of Oracle options. Anyway, if you are convinced that you do not use an Oracle database option although the output of scripts such as option_packs_usage_statistics or ReviewLite claims the opposite, have a look at My Oracle Support. Look for bugs related to wrong detection of this feature and, with a little bit of luck, you will find something interesting. Oracle is definitely engineered for heroes…

Oracle Engineered For Heroes

 

Cet article Could you trust option_packs_usage_statistics.sql ? est apparu en premier sur Blog dbi services.

How we build our customized PostgreSQL Docker image

Tue, 2018-02-13 13:21

Docker is becoming more and more popular these days and a lot of companies are starting to really use it. On one project we decided to build our own customized Docker image instead of using the official PostgreSQL one. The main reason for that is that we wanted to compile from source so that we only get what is really required. Why have PostgreSQL compiled with Tcl support when nobody will ever use it? Here is how we did it …

To dig in right away, this is the simplified Dockerfile:

FROM debian

# make the "en_US.UTF-8" locale so postgres will be utf-8 enabled by default
ENV LANG en_US.utf8
ENV PG_MAJOR 10
ENV PG_VERSION 10.1
ENV PG_SHA256 3ccb4e25fe7a7ea6308dea103cac202963e6b746697366d72ec2900449a5e713
ENV PGDATA /u02/pgdata
ENV PGDATABASE="" \
    PGUSERNAME="" \
    PGPASSWORD=""

COPY docker-entrypoint.sh /

RUN set -ex \
        \
        && apt-get update && apt-get install -y \
           ca-certificates \
           curl \
           procps \
           sysstat \
           libldap2-dev \
           libpython-dev \
           libreadline-dev \
           libssl-dev \
           bison \
           flex \
           libghc-zlib-dev \
           libcrypto++-dev \
           libxml2-dev \
           libxslt1-dev \
           bzip2 \
           make \
           gcc \
           unzip \
           python \
           locales \
        \
        && rm -rf /var/lib/apt/lists/* \
        && localedef -i en_US -c -f UTF-8 en_US.UTF-8 \
        && mkdir /u01/ \
        \
        && groupadd -r postgres --gid=999 \
        && useradd -m -r -g postgres --uid=999 postgres \
        && chown postgres:postgres /u01/ \
        && mkdir -p "$PGDATA" \
        && chown -R postgres:postgres "$PGDATA" \
        && chmod 700 "$PGDATA" \
        \
        && curl -o /home/postgres/postgresql.tar.bz2 "https://ftp.postgresql.org/pub/source/v$PG_VERSION/postgresql-$PG_VERSION.tar.bz2" \
        && echo "$PG_SHA256 /home/postgres/postgresql.tar.bz2" | sha256sum -c - \
        && mkdir -p /home/postgres/src \
        && chown -R postgres:postgres /home/postgres \
        && su postgres -c "tar \
                --extract \
                --file /home/postgres/postgresql.tar.bz2 \
                --directory /home/postgres/src \
                --strip-components 1" \
        && rm /home/postgres/postgresql.tar.bz2 \
        \
        && cd /home/postgres/src \
        && su postgres -c "./configure \
                --enable-integer-datetimes \
                --enable-thread-safety \
                --with-pgport=5432 \
                --prefix=/u01/app/postgres/product/$PG_VERSION \
                --with-ldap \
                --with-python \
                --with-openssl \
                --with-libxml \
                --with-libxslt" \
        && su postgres -c "make -j 4 all" \
        && su postgres -c "make install" \
        && su postgres -c "make -C contrib install" \
        && rm -rf /home/postgres/src \
        \
        && apt-get update && apt-get purge --auto-remove -y \
           libldap2-dev \
           libpython-dev \
           libreadline-dev \
           libssl-dev \
           libghc-zlib-dev \
           libcrypto++-dev \
           libxml2-dev \
           libxslt1-dev \
           bzip2 \
           gcc \
           make \
           unzip \
        && apt-get install -y libxml2 \
        && rm -rf /var/lib/apt/lists/*

ENV LANG en_US.utf8
USER postgres
EXPOSE 5432
ENTRYPOINT ["/docker-entrypoint.sh"]

We based the image on the latest Debian image, that is line 1. The following lines define the PostgreSQL version we will use and some environment variables we will use later. What follows is basically installing all the packages required for building PostgreSQL from source, adding the operating system user and group, preparing the directories, fetching the PostgreSQL source code, then configure, make and make install. Pretty much straightforward. Finally, to shrink the image, we remove all the packages that are no longer required once PostgreSQL has been compiled and installed.

The final setup of the PostgreSQL instance happens in the docker-entrypoint.sh script which is referenced at the very end of the Dockerfile:

#!/bin/bash

# this are the environment variables which need to be set
PGDATA=${PGDATA}/${PG_MAJOR}
PGHOME="/u01/app/postgres/product/${PG_VERSION}"
PGAUTOCONF=${PGDATA}/postgresql.auto.conf
PGHBACONF=${PGDATA}/pg_hba.conf
PGDATABASENAME=${PGDATABASE}
PGUSERNAME=${PGUSERNAME}
PGPASSWD=${PGPASSWORD}

# create the database and the user
_pg_create_database_and_user()
{
    ${PGHOME}/bin/psql -c "create user ${PGUSERNAME} with login password '${PGPASSWD}'" postgres
    ${PGHOME}/bin/psql -c "create database ${PGDATABASENAME} with owner = ${PGUSERNAME}" postgres
}

# start the PostgreSQL instance
_pg_prestart()
{
    ${PGHOME}/bin/pg_ctl -D ${PGDATA} -w start
}

# start postgres and do not disconnect
# required for docker
_pg_start()
{
    ${PGHOME}/bin/postgres "-D" "${PGDATA}"
}

# stop the PostgreSQL instance
_pg_stop()
{
    ${PGHOME}/bin/pg_ctl -D ${PGDATA} stop -m fast
}

# initdb a new cluster
_pg_initdb()
{
    ${PGHOME}/bin/initdb -D ${PGDATA} --data-checksums
}


# adjust the postgresql parameters
_pg_adjust_config() {
    # PostgreSQL parameters
    echo "shared_buffers='128MB'" >> ${PGAUTOCONF}
    echo "effective_cache_size='128MB'" >> ${PGAUTOCONF}
    echo "listen_addresses = '*'" >> ${PGAUTOCONF}
    echo "logging_collector = 'on'" >> ${PGAUTOCONF}
    echo "log_truncate_on_rotation = 'on'" >> ${PGAUTOCONF}
    echo "log_filename = 'postgresql-%a.log'" >> ${PGAUTOCONF}
    echo "log_rotation_age = '1440'" >> ${PGAUTOCONF}
    echo "log_line_prefix = '%m - %l - %p - %h - %u@%d '" >> ${PGAUTOCONF}
    echo "log_directory = 'pg_log'" >> ${PGAUTOCONF}
    echo "log_min_messages = 'WARNING'" >> ${PGAUTOCONF}
    echo "log_autovacuum_min_duration = '60s'" >> ${PGAUTOCONF}
    echo "log_min_error_statement = 'NOTICE'" >> ${PGAUTOCONF}
    echo "log_min_duration_statement = '30s'" >> ${PGAUTOCONF}
    echo "log_checkpoints = 'on'" >> ${PGAUTOCONF}
    echo "log_statement = 'none'" >> ${PGAUTOCONF}
    echo "log_lock_waits = 'on'" >> ${PGAUTOCONF}
    echo "log_temp_files = '0'" >> ${PGAUTOCONF}
    echo "log_timezone = 'Europe/Zurich'" >> ${PGAUTOCONF}
    echo "log_connections=on" >> ${PGAUTOCONF}
    echo "log_disconnections=on" >> ${PGAUTOCONF}
    echo "log_duration=off" >> ${PGAUTOCONF}
    echo "client_min_messages = 'WARNING'" >> ${PGAUTOCONF}
    echo "wal_level = 'replica'" >> ${PGAUTOCONF}
    echo "hot_standby_feedback = 'on'" >> ${PGAUTOCONF}
    echo "max_wal_senders = '10'" >> ${PGAUTOCONF}
    echo "cluster_name = '${PGDATABASENAME}'" >> ${PGAUTOCONF}
    echo "max_replication_slots = '10'" >> ${PGAUTOCONF}
    echo "work_mem=8MB" >> ${PGAUTOCONF}
    echo "maintenance_work_mem=64MB" >> ${PGAUTOCONF}
    echo "wal_compression=on" >> ${PGAUTOCONF}
    echo "max_wal_senders=20" >> ${PGAUTOCONF}
    echo "shared_preload_libraries='pg_stat_statements'" >> ${PGAUTOCONF}
    echo "autovacuum_max_workers=6" >> ${PGAUTOCONF}
    echo "autovacuum_vacuum_scale_factor=0.1" >> ${PGAUTOCONF}
    echo "autovacuum_vacuum_threshold=50" >> ${PGAUTOCONF}
    # Authentication settings in pg_hba.conf
    echo "host    all             all             0.0.0.0/0            md5" >> ${PGHBACONF}
}

# initialize and start a new cluster
_pg_init_and_start()
{
    # initialize a new cluster
    _pg_initdb
    # set params and access permissions
    _pg_adjust_config
    # start the new cluster
    _pg_prestart
    # set username and password
    _pg_create_database_and_user
}

# check if $PGDATA exists
if [ -e ${PGDATA} ]; then
    # when $PGDATA exists we need to check if there are files
    # because when there are files we do not want to initdb
    if [ -e "${PGDATA}/base" ]; then
        # when there is the base directory this
        # probably is a valid PostgreSQL cluster
        # so we just start it
        _pg_prestart
    else
        # when there is no base directory then we
        # should be able to initialize a new cluster
        # and then start it
        _pg_init_and_start
    fi
else
    # initialze and start the new cluster
    _pg_init_and_start
    # create PGDATA
    mkdir -p ${PGDATA}
    # create the log directory
    mkdir -p ${PGDATA}/pg_log
fi
# restart and do not disconnect from the postgres daemon
_pg_stop
_pg_start

The important point here is: PGDATA is a persistent volume that is linked into the Docker container. When the container comes up we need to check if something that looks like a PostgreSQL data directory is already there. If yes, then we just start the instance with what is there. If nothing is there we create a new instance. Remember: This is just a template and you might need to do more checks in your case. The same is true for what we add to pg_hba.conf here: This is nothing you should do on real systems but can be handy for testing.
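
To put the two files together, a typical build and run could look like this (the image name, volume name and host port are placeholders, and the environment variables are the ones the entrypoint script expects):

docker build -t dbi/postgres:10.1 .
docker run -d --name pg10 \
    -e PGDATABASE=appdb -e PGUSERNAME=app -e PGPASSWORD=secret \
    -v pgdata:/u02/pgdata \
    -p 5432:5432 \
    dbi/postgres:10.1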

Hope this helps …

 

Cet article How we build our customized PostgreSQL Docker image est apparu en premier sur Blog dbi services.

Is Switzerland concerned by the GDPR?

Tue, 2018-02-13 05:23

On May 25, 2018, the General Data Protection Regulation (GDPR, "RGPD" in French) will come into application. It will impact every company that processes personal data of European residents.

With this deadline approaching, a lot of false information is circulating within Swiss companies. The objective of this blog is to shed some light on a few of the questions that arise. To do so, we asked Me Metille, a lawyer specialized in data protection, to answer the questions a Swiss company may ask itself when facing this regulation.

1. Q. Is a processor working for a company subject to the GDPR automatically subject to the GDPR itself, given the chain of liability between the data controller and its suppliers?
A. > Some specialists consider that the processor is subject to it, but most hold that a processor is not subject to the GDPR merely because it is a processor. On the other hand, a processor that intentionally targets European customers will be subject to it. In any case, the processor will have obligations related to the GDPR, notably those of Article 28.

2. Q. Is a Swiss company that employs people in the European Union automatically subject to the GDPR?
A. > No, a company is not subject to the GDPR just because it employs European Union nationals.

3. Q. Is a Swiss company that employs people in the European Union subject to the GDPR if those employees carry out a commercial activity in the European Union?
A. > It depends on several factors, such as the place where invoices are issued, the place from which the activity is initiated, or the intent to develop business in Europe, but there is a risk that the employed persons are qualified as an establishment and that the Swiss company is treated as a company present in the European Union.

4. Q. As a processor, is there a duty to advise with regard to the data being processed?
A. > The processor must mainly ensure the security of the processed data and compliance with the contract. If the processor becomes aware of something that could potentially endanger the security of the data, it must report it (e.g. an unsecured VPN connection giving access to sensitive data). In some cases, it will even have to refuse to carry out the tasks it has been mandated to do (e.g. making an unencrypted export of sensitive data and sending it to a recipient outside the European Union). The data controller is the one who bears the most responsibility and who must ensure that the processing is lawful.

5. Q. Is the right to erasure, or right to be forgotten, specified in Article 17 of the GDPR in contradiction with legal provisions requiring data to be kept for a defined period?
A. > If a Swiss law states that information must be kept for a defined period, that law prevails.

6. Q. How can the right to be forgotten be reconciled with a data backup policy?
A. > Backups must not be used for another purpose, and in particular not to retrieve data one is no longer allowed to keep. The necessary measures must be taken so that, on the one hand, only authorized persons can access the backup media and, on the other hand, when a backup set is restored, only the data that may be used is actually exploited (private data, data that was deleted from the original media after the backup, or data that the right to be forgotten requires not to keep, will be removed).

7. Q. How long can personal data be kept?
A. > It must not be kept longer than necessary to achieve the purpose for which it was collected. A reasonable safety margin may nevertheless be added; a period of a few months can be considered a reasonable safety margin.

8. Q. Can an employee's job application be kept forever?
A. > No, if the candidate is not hired, the application file must be returned or destroyed. Keeping it for 3-4 months, to be able to justify the decision in case of complaints related to the non-hiring, is accepted.
If you want to keep the file with another position in mind, the candidate's explicit consent is required (and in that case the retention should be limited to 2-3 years).

9. Q. Which technical means can concretely contribute to data protection?
A. > Encryption software, and solutions for data availability and integrity. Everything that makes it possible to trace what has been done (access control, logs, documentation, hardening, etc.).

10. Q. Do the sanctions related to the GDPR apply only in case of a security breach or data theft?
A. > The sanctions apply in case of a violation of the GDPR rules, so not only in case of a security breach. There could even be a double sanction in case of a security breach if the company does not report it: first, the company subject to the GDPR could be sanctioned for not having ensured the security of the data, and then a second time because the company that suffered a data theft did not report it. Even without having data stolen, it could already be fined for not having taken basic measures to ensure security.

11. Q. Which European Union country will enforce the sanctions on Swiss territory?
A. > Most likely each national authority of a European Union country whose residents are concerned. It is not certain that the investigative acts and the sanction would be validly recognized by a Swiss judge, which would make collection difficult.

12. Q. Are there insurance products to cover the risks related to the GDPR?
A. > Yes, there are products covering certain risks, but you can find everything and some products are fanciful. It is important to distinguish what is covered from what is not, the exclusion conditions, the deductibles and the coverage limits.

13. Q. Does a company subject to the GDPR have an obligation to inform a processor that it is working on personal data?
A. > The company subject to the GDPR must not only tell its processor that it is working on personal data, but also tell it precisely what it has to do.

14. Q. Should the DPO be external or internal?
A. > A DPO is required only in limited cases (public authorities, large-scale systematic monitoring, sensitive data). It can be internal or external.
Regardless of the legal obligation, what matters is that the DPO has a good knowledge of the company and is available for employees and project managers. If the DPO lacks the legal or technical knowledge, he can ask for outside help, but if he does not know the company well, he will struggle to get help on the business aspects.
A. > The DPO role can also be assumed by several people (a committee), but the responsibilities must be clearly defined beforehand and there must be a single external point of contact.

15. Q. What relationship should there be between the DPO and the management?
A. > The DPO must report to the highest level of management. He performs his tasks independently and receives no instructions. He can also take on other tasks and be administratively attached to a department rather than directly to the management.

16. Q. Are there links between the GDPR and ISO 27000?
A. > Topics such as the identification of processing activities, audits, data mapping, access control and documentation are very similar.

You will find more information related to the GDPR on Me Metille's blog at the following address: https://smetille.ch/2017/12/14/la-suisse-et-le-rgpd/ or in the Le Temps article "Le RGPD, la révolution du consentement" dated February 12, 2018.

We hope this blog has been able to answer some of your questions.

 

Cet article La Suisse concernée par le RGPD ? est apparu en premier sur Blog dbi services.

V$MYSTAT delta values

Mon, 2018-02-12 14:32

Here is a little script I use from time to time to look at V$MYSTAT values and display on one line a set of statistics with their delta values between two calls.

The first script, _mystat_init.sql, initializes the variables. The second one displays the values, such as:

SQL> @ _mystat_diff.sql
 
db block changes        redo size undo change vector size     redo entries
---------------- ---------------- ----------------------- ----------------
          57,371       15,445,852               6,111,608           37,709

Those two scripts are generated by defining the statistics:

define names="'redo size','redo entries','undo change vector size','db block changes'"

and running the following to spool the two scripts:

sqlplus -s / as sysdba <<'END'
set pagesize 0 feedback off linesize 1000 trimspool on verify off echo off
with stats as (
select rownum n,stat_id,name from (select stat_id,name from v$statname where name in (&names) order by stat_id)
)
select 'define LAG'||stat_id||'=0' from stats
union all
select 'column "CUR'||stat_id||'" new_value '||'LAG'||stat_id||' noprint' from stats
union all
select 'column "DIF'||stat_id||'" heading '''||name||''' format 999G999G999G999' from stats
.
spool _mystat_init.sql
/
spool off
with stats as (
select rownum n,stat_id,name from (select stat_id,name from v$statname where name in (&names) order by stat_id)
)
select 'set termout off verify off' from dual
union all
select 'select ' from dual
union all
select ' '||decode(n,1,' ',',')||'"CUR'||stat_id||'" - '||'&'||'LAG'||stat_id||' "DIF'||stat_id||'"' from stats
union all
select ' '||',nvl("CUR'||stat_id||'",0) "CUR'||stat_id||'"' from stats
union all
--select ','''||'&'||'1'' comments' from dual
--union all
select q'[from (select stat_id,value from v$mystat join v$statname using(statistic#) where name in (&names)) pivot (avg(value)for stat_id in (]' from dual
union all
select ' '||decode(n,1,' ',',')||stat_id||' as "CUR'||stat_id||'"' from stats
union all
select '))' from dual
union all
select '.' from dual
union all
select 'set termout on' from dual
union all
select '/' from dual
.
spool _mystat_diff.sql
/
spool off
END

Then, in sqlplus or SQLcl, you run:

SQL> @ _mystat_init.sql

to initialize the values to 0 and:

SQL> @ _mystat_diff.sql

each time you want to display the difference from last call.
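
For example, to measure the redo generated by a single statement, a typical session would look like this (demo_table is just a placeholder for your own workload):

SQL> @ _mystat_init.sql
SQL> insert into demo_table select * from demo_table;
SQL> @ _mystat_diff.sql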

 

Cet article V$MYSTAT delta values est apparu en premier sur Blog dbi services.

Index seek operator and residual IO

Mon, 2018-02-12 09:00

This blog post draws on the previous article about index seeks and their gotchas. I encountered another interesting case, but I decided to write this article not so much because of the nature of the problem, but rather because of the different ways that exist to troubleshoot it.

Firstly, let’s set the scene:

Here is a simple update query and its corresponding execution plan, which seems to indicate that the plan is efficient in terms of performance:

declare @P0 datetime2
       ,@P1 int,
       @P2 nvarchar(4000)

set @P0 = GETDATE()
set @P1 = 1
set @P2 = '005245'

UPDATE
	[TABLE]
SET
	[_DATE] = @P0
WHERE
	[_ID] = @P1
    AND
    [_IDENTIFIER] = @P2
;

 

blog 129 - 1- query plan before optimization

An index seek – with a cardinality of 1 – and then a clustered index update operation. A simple case, as you may notice. But although the customer noticed a warning icon in the execution plan, he was confident enough about this plan not to worry about it.

But if we look closer at the warning icon, it concerns a CONVERT_IMPLICIT operation (a hidden operation performed by the optimizer) that may affect the SeekPlan, as stated by the pop-up:

blog 129 - 2- query plan warning

In fact, in this customer's context, this query became an issue when it was executed a thousand times in a very short period, triggering lock timeout issues and also consuming one entire vCPU of the server.

Let's jump quickly to the root cause and the solution. Obviously the root cause here was the @P2 parameter type – NVARCHAR(4000) – which forced the optimizer to use a CONVERT_IMPLICIT operation, making the predicate non-sargable and the index seek operation inefficient. This is particularly true when the CONVERT_IMPLICIT operation concerns the column in the predicate rather than the parameter. Indeed, in this case the @P2 type is NVARCHAR(4000) whereas the _IDENTIFIER column type is VARCHAR(50), and the optimizer first has to convert the _IDENTIFIER column (the lower precedence type) to match the @P2 parameter (the higher precedence type). Let's also note that the @P1 parameter is not selective enough to prevent a range scan operation in this case.
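
In practice, the fix is simply to have the application send a parameter whose type matches the column, so that no conversion is applied on the column side. A minimal sketch of the corrected pattern with the same placeholder names as above (in the real case the parameter type is set on the application side, not in an ad-hoc batch):

declare @P0 datetime2
       ,@P1 int
       ,@P2 varchar(50)  -- matches the _IDENTIFIER column type, no CONVERT_IMPLICIT on the column

set @P0 = GETDATE()
set @P1 = 1
set @P2 = '005245'

UPDATE
	[TABLE]
SET
	[_DATE] = @P0
WHERE
	[_ID] = @P1
    AND
    [_IDENTIFIER] = @P2
;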

Although the actual execution plan shows an index seek with a cardinality of 1, I was able to quickly show my customer that the reality was quite different by using the IO and TIME statistics.

  CPU time = 0 ms,  elapsed time = 0 ms.
Table 'TABLE'. Scan count 1, logical reads 17152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 609 ms,  elapsed time = 621 ms.
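For reference, these counters come from the classic session-level statistics switches; a minimal sketch to reproduce them for a single statement from SSMS or sqlcmd:

SET STATISTICS IO ON;    -- logical / physical reads per table
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement

-- ... run the UPDATE statement here ...

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;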

 

Well, do you really think that a true index seek would read such a number of pages and consume this amount of CPU per execution? :) … I let you imagine how CPU-intensive this query may be when executed a lot of times in a very short period …

Another point that puts us quickly on the right track is that the comparison is done in the Predicate section of the index seek operator, meaning it is evaluated after the seek predicate. Thus, the index seek operator acts more like a range scan: it seeks to the start of the range defined by the seek predicate and then reads every row of that range to evaluate the residual predicate.

blog 129 - 3- query plan index seek operator

Referring to the IO statistics output above, we may notice the operator read 17152 pages, and a quick look at the corresponding index physical stats allowed us to safely assume that the storage engine read a big part of the index in this case.

SELECT 
	index_id,
	partition_number,
	index_level,
	page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('TABLE'), NULL, NULL, 'DETAILED')
WHERE index_id = 10

 

Ultimately, we may also benefit from the undocumented trace flag 9130, which highlights the issue in an obvious way by showing a Filter operator (the Predicate section of the index seek operator pop-up) after the index seek operator itself (reading the plan from right to left). From a storage perspective, we are far from the picture initially exposed by the plan.
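As a side note, the trace flag can be enabled at the statement level with a query hint; this is only a sketch, since 9130 is undocumented and QUERYTRACEON requires elevated privileges, so keep it for troubleshooting:

UPDATE [TABLE]
SET    [_DATE] = @P0
WHERE  [_ID] = @P1
   AND [_IDENTIFIER] = @P2
OPTION (QUERYTRACEON 9130);  -- exposes the residual predicate as an explicit Filter operator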

blog 129 - 5- TF9130

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'table'. Scan count 1, logical reads 6, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

 

OK, my problem was fixed, but let’s jump quickly to another interesting point. Previously I used different tools / tricks to identify the “residual” part of the seek operation. But shall I go through all this stuff every time to identify this kind of issue? Is there an easier way? The good news is yes, if you run SQL Server 2012 SP3+ / SQL Server 2014 SP2+ or SQL Server 2016+, because the SQL Server team added runtime information to the showplan.

In my context, it simplifies the analysis drastically because runtime statistics are directly exposed per operator. I got interesting information such as ActualRowsRead (3104706 rows) and ActualLogicalReads (17149 pages).

Note that if we refer to the IO statistics from the first query, where I got 17152 pages for the table TABLE, the 3 missing pages are in fact read by the clustered index update operator.

<RelOp AvgRowSize="36" 
						   EstimateCPU="3.28721" 
						   EstimateIO="12.1472" 
						   EstimateRebinds="0" 
						   EstimateRewinds="0" 
						   EstimatedExecutionMode="Row" 
						   EstimateRows="1" 
						   LogicalOp="Index Seek" 
						   NodeId="3" 
						   Parallel="false" 
						   PhysicalOp="Index Seek" 
						   EstimatedTotalSubtreeCost="15.4344" 
						   TableCardinality="5976470">
                      <OutputList>
                        <ColumnReference 
							Database="[DB]" 
							Schema="[dbo]" 
							Table="[TABLE]" 
							Column="_DELIVERED" />
                      </OutputList>
                      <RunTimeInformation>
                        <RunTimeCountersPerThread Thread="0" 
												  ActualRows="1" 
												  ActualRowsRead="3104706" 
												  ActualEndOfScans="1" 
												  ActualExecutions="1" 
												  ActualElapsedms="1209" 
												  ActualCPUms="747" 
												  ActualScans="1" 
												  ActualLogicalReads="17149" 
												  ActualPhysicalReads="1" 
												  ActualReadAheads="17141" 
												  ActualLobLogicalReads="0" 
												  ActualLobPhysicalReads="0" 
												  ActualLobReadAheads="0" />
                      </RunTimeInformation>
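If you want to look at these per-operator counters while the statement is still running, rather than in the actual plan afterwards, sys.dm_exec_query_profiles (available from SQL Server 2014, and requiring an actual-plan / profiling infrastructure to be enabled for the target session depending on the version) exposes similar information; a minimal sketch:

SELECT session_id,
       node_id,
       physical_operator_name,
       row_count,            -- rows returned by the operator so far
       estimate_row_count,
       logical_read_count
FROM sys.dm_exec_query_profiles
WHERE session_id = 53         -- hypothetical session_id of the running statement
ORDER BY node_id;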

 

Finally, if you’re an enthusiast of the famous Plan Explorer tool like me for troubleshooting execution plans, runtime statistics were also added to this tool accordingly. The warning icons and corresponding pop-ups alert us very quickly about the residual IO issue, as shown below:

blog 129 - 6- PlanExplorer execution plan

blog 129 - 6- PlanExplorer

Happy troubleshooting!

 

Cet article Index seek operator and residual IO est apparu en premier sur Blog dbi services.

Windows Server – Service not starting with ‘Error 1067: The process terminated unexpectedly’

Sat, 2018-02-10 02:30

Some time ago, we were installing Migration Center (from fmeAG) on a Windows Server; at the end of the installation, the service named Migration Center Job Server is configured and then started. Unfortunately this didn’t go well and the start command wasn’t working at all. We were using a dedicated technical account (an AD account) to do the installation and to run this service. This is the error we got:

fmeAG_MC_Service_Start_Fail_Error

 

The error code 1067 means ‘ERROR_PROCESS_ABORTED’. This was the first time I ever saw this error, so I wanted to know more about it. I performed a lot of tests on this environment to try to narrow down the issue, and the following tests in particular were quite relevant:

  • Changing the Log On As user to something else
    • The service is able to start
    • It means that the issue is somehow linked to this particular technical user
  • Checking the ‘Log On As a service’ Local Policies
    • The technical user is listed and allowed to start the service so there is no issue there
  • Using the technical user to start a service on another Windows Server
    • The service is able to start
    • It means that the issue is not linked to the technical user globally

 

So with all this information, it appeared that the issue was linked to this particular technical user but only locally to this Windows Server… When working with AD accounts, it is always possible to face issues with the local profile vs. the domain profile (thanks Stéphane Haby for pointing that out to me, I’m not a Windows expert ;)), so I tried to work on that and after some time I found a workaround to this issue. The workaround is simply to delete the local profile…

Since this is an AD account (which I will call <SYS_AD_ACCOUNT> below), it is possible to remove the local profile: it will simply be recreated automatically when needed. You just need to remove it properly and, for that, there are a few specific steps:

  1. Login to the Windows Server with another administrator account
  2. Log out all <SYS_AD_ACCOUNT> sessions: stop all services running with this account, all processes (from the task manager, there should be nothing on the “users” tab), aso… Alternatively, you can also disable the services (or manual mode) and then reboot the Windows Server
  3. Delete the complete folder C:\Users\<SYS_AD_ACCOUNT>. If you already tried this but did not remove the registry properly, you might end up with a C:\Users\TEMP folder… If this is the case, remove it now too.
  4. Delete the registry key matching <SYS_AD_ACCOUNT> (check the “ProfileImagePath” parameter) under “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList”
  5. Reboot the Windows Server
  6. Login to the Windows Server with the administrator account used for the step 1.
  7. Set <SYS_AD_ACCOUNT> for the “Log On As” of the target Service
  8. Start the target Service => It should now be working
  9. Login to the Windows Server with the <SYS_AD_ACCOUNT> account
  10. Open a cmd prompt => The current folder should be C:\Users\<SYS_AD_ACCOUNT> and not C:\Users\TEMP

 

I am not saying that this workaround will fix each and every issue linked to the error code 1067… Sometimes, this error is simply due to the fact that the service cannot find its executable, for example. But in my case it fixed the issue, so if you are running out of options, maybe you can just give it a try ;).

 

 

Cet article Windows Server – Service not starting with ‘Error 1067: The process terminated unexpectedly’ est apparu en premier sur Blog dbi services.

Documentum – DA 7.3 showing java.lang.NullPointerException on every actions

Sat, 2018-02-10 01:45

Last year, we started to upgrade some Documentum Administrator installations from 7.2 to 7.3 and, directly after, we started to see some NullPointerExceptions in the log files. We are using DA 7.3 on some recent WebLogic Server versions (12.1.3, 12.2.1.2). We usually deploy DA as a WAR file (so not exploded) with just the dfc.properties, keystores and logs outside of it. This is the kind of error we started to see in the startup log file (nohup log file in our case) as soon as it was upgraded to 7.3:

java.lang.NullPointerException
        at java.io.FileInputStream.<init>(FileInputStream.java:130)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at com.documentum.web.form.WebformTag.fetchExtnNativeVersion(WebformTag.java:282)
        at com.documentum.web.form.WebformTag.renderExtnJavaScript(WebformTag.java:268)
        at com.documentum.web.form.WebformTag.doStartTag(WebformTag.java:159)
        at jsp_servlet._custom._jsp.__loginex._jsp__tag3(__loginex.java:1687)
        at jsp_servlet._custom._jsp.__loginex._jspService(__loginex.java:272)
        at weblogic.servlet.jsp.JspBase.service(JspBase.java:35)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:286)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:260)
        at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:137)
        at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:350)
        at weblogic.servlet.internal.ServletStubImpl.onAddToMapException(ServletStubImpl.java:489)
        at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:376)
        ...

 

Lines 7 and 8 of the stack changed, depending on the action that was being performed. For example, an access to the login page would print the following:

        at jsp_servlet._custom._jsp.__loginex._jsp__tag3(__loginex.java:1687)
        at jsp_servlet._custom._jsp.__loginex._jspService(__loginex.java:272)

 

Then once logged in, you would see each component printing the same kind of stack, with the following lines 7 and 8:

        at jsp_servlet._webtop._main.__mainex._jsp__tag0(__mainex.java:286)
        at jsp_servlet._webtop._main.__mainex._jspService(__mainex.java:116)

 

        at jsp_servlet._webtop._classic.__classic._jsp__tag0(__classic.java:408)
        at jsp_servlet._webtop._classic.__classic._jspService(__classic.java:112)

 

        at jsp_servlet._webtop._titlebar.__titlebar._jsp__tag0(__titlebar.java:436)
        at jsp_servlet._webtop._titlebar.__titlebar._jspService(__titlebar.java:175)

 

        at jsp_servlet._webtop._messagebar.__messagebar._jsp__tag0(__messagebar.java:145)
        at jsp_servlet._webtop._messagebar.__messagebar._jspService(__messagebar.java:107)

 

aso…

We were working with OpenText on this issue. As mentioned in the stack trace, this happens because DA is trying to fetch the “ExtnNativeVersion” property file, located at wdk/extension/client/EMC/ContentXfer/com.emc.wdk.native/1. Unfortunately, when DA 7.3 tries to locate this file, it does not find it, even though the file is really present… It does not work because DA is deployed as a WAR file (an archive) and therefore the path returned for the file is wrong. I suspect this is something that Documentum changed recently, using getRealPath(). To change the behavior of the getRealPath function, you have to set the property “Archived Real Path Enabled” to true so that it returns the canonical path of the file…

So to remove these exceptions, you have two options:

I. At the domain level:
  1. Login to the WebLogic Administration Console using your weblogic account
  2. Navigate to the correct page: DOMAIN > Configuration > Web Applications
  3. Click on the ‘Lock & Edit’ button
  4. Check the ‘Archived Real Path Enabled’ checkbox ( = set it to true)
  5. Click on the ‘Save’ and then ‘Activate Changes’ buttons

This will add the configuration to the global config.xml file so it is enabled for the whole domain. As often, I would not recommend that, but rather configuring it at the application level, because you might have other applications that do NOT want this setting set to true… That is why there is option number two:

II. At the application level:

The high level steps to do that would be the following ones:

  1. Extract the weblogic.xml file from the application war file
  2. Add the ‘Archived Real Path Enabled’ property in it and set it to true
  3. Repackage the war file and redeploy it

This is pretty simple:

[weblogic@weblogic_server_01 ~]$ cd $APPS_HOME
[weblogic@weblogic_server_01 apps]$ jar -xvf da.war WEB-INF/weblogic.xml
 inflated: WEB-INF/weblogic.xml
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ vi WEB-INF/weblogic.xml
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ tail -6 WEB-INF/weblogic.xml

   <container-descriptor>
      <show-archived-real-path-enabled>true</show-archived-real-path-enabled>
   </container-descriptor>

</weblogic-web-app>
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ jar -uvf da.war WEB-INF/weblogic.xml
adding: WEB-INF/weblogic.xml(in = 989) (out= 398)(deflated 59%)
[weblogic@weblogic_server_01 apps]$

 

Then, you just need to update the deployment in the WebLogic Administration Console and that’s it, the exceptions should be gone. As far as I’m aware, these exceptions did not have any impact on the proper behavior of Documentum Administrator, but it is still very ugly to have hundreds of them in the log file…

 

 

Cet article Documentum – DA 7.3 showing java.lang.NullPointerException on every actions est apparu en premier sur Blog dbi services.

WebLogic – SSO/Atn/Atz – Infinite loop

Sat, 2018-02-10 01:00

This blog will be the last of my series around WebLogic SSO (how to enable logs, 403 N°1, 403 N°2) which I started some weeks ago. Several months ago, on a newly built High Availability environment (2 WebLogic Servers behind a Load Balancer), an application team was deploying their D2 Client as always. After the deployment, it appeared to the tester that the Single Sign-On was not working when using the SSO URL through the Load Balancer. For this user, the browser was stuck in an infinite loop between the LB URL and the SAML2 Partner URL. When I tried to replicate the issue, the SSO was working fine for me…

The only possible reason for an infinite loop affecting some users but not all is an issue with the SSO setup/configuration on only one of the two WebLogic Servers… Indeed, the Load Balancer (with sticky sessions) probably redirected the tester to the WLS with the issue while my session was redirected to the other one. Enabling the debug logs (on both WLS) quickly confirmed that all communications going through the first WebLogic Server were working properly (with SSO) while the second WebLogic Server had this infinite loop issue.

The logs from WebLogic Servers 1 and 2 were identical up to a certain point (basically the whole SAML2 part), except for some elements such as the local hostname, the date/hour and some IDs (java class instances, aso…). This is the common content in both WLS logs:

<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <SAML2Filter: Processing request on URI '/D2/X3_Portal.jsp'>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is '/D2/X3_Portal.jsp'>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is not a service URI>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): returning service type 'SPinitiator'>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: processing>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: partner id is null>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: use partner binding HTTP/POST>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <post from template url: null>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <SAML2Servlet: Processing request on URI '/saml2/sp/acs/post'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is '/saml2/sp/acs/post'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): service URI is '/sp/acs/post'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): returning service type 'ACS'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Assertion consumer service: processing>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <get SAMLResponse from http request:PBvbnNSJ1cXMlFtZzaXM6bmHhtbG5zm46NhwOlJlOnNhbWxHbWxIb2Fc3wP6dGM6
U0FiB4bWxuczp4NTAwPSJ1cm46b2FzNTDoyLjA6YXNzZXJ0aW9uIaXM6bmFtZXM6
U0FdG9jb2wiIHhtbG5zOmRzaWc9Imh0dHA6Ly93NTDoyLjA6cHJvd3cudzMub3Jn
aHR0cDoa5vcmcvMjAwMS9W5zdGFuY2vL3d3dy53MyYTUxTY2hlbWEtUiIERlc3Rp
MWNxM2FjNzI1ZDjYmIVhNDM1Zlzc3VlSW5zdGFudD0ijhlNjc3OTkiIEMjAxNy0x
LzINpZyMiIHhtwMDAvMDkveG1sZHbG5DovL3d3dy53My5vczOmVuYz0iaHR0cmcv
...
YWRpb24+P1sOkF1ZGllbHJpY3mPHNhNlUmVzdC9zYW1sOkNvbmRpdGlvbnM+bWw6
YXNzUzpjbYW1YXNpczpuIuMlmVmPnVybjpvDphYxzp0YzpTQU1MOjGFzc2VzOlBh
OjAXh0Pj0OjUyWiI+PHNsOkF1dGhuQhxzYW129udGV4bWw6QXV0aG5Db250ZdENs
UmVnRlepBdXRobkNvbzYWmPjwvc2FtbDF1dGhuU1sOkHQ+PC93RhdGVtZW50Pjwv
c2F9zY25zZtbDpBcW1scDpSZXNwb3NlcnRpb24+PCT4=
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <BASE64 decoded saml message:<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:enc="http://www.w3.org/2001/04/xmlenc#" xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" xmlns:x500="urn:oasis:names:tc:SAML:2.0:profiles:attribute:X500" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Destination="https://weblogic_server_01/saml2/sp/acs/post" ID="id-HpDtn7r5XxxAQFYnwSLXZmkuVgIHTExrlreEDZ4-" InResponseTo="_0x7258edc52961ccbd5a435fb13ac67799" IssueInstant="2017-11-21T9:25:55Z" Version="2.0"><saml:Issuer Format="urn:oasis:names:tc:SAML:2.0:nameid-format:entity">https://idp_partner_01/fed/idp</saml:Issuer><dsig:Signature><dsig:SignedInfo><dsig:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><dsig:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><dsig:Reference URI="#id-HpDtn7r5XxxAQFYnwSLXZmkuVgIHTExrlreEDZ4-"><dsig:Transforms><dsig:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><dsig:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/></dsig:Transforms><dsig:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><dsig:DigestValue>YGtUZvsfo3z51AsBo7UDhbd6Ts=</dsig:DigestValue></dsig:Reference></dsig:SignedInfo><dsig:SignatureValue>al8sJwbqzjh1qgM3Sj0QrX1aZjwyI...JB6l4jmj91BdQrYQ7VxFzvNLczZ2brJSdLLig==</dsig:SignatureValue><dsig:KeyInfo><dsig:X509Data><dsig:X509Certificate>MIwDQUg+nhYqGZ7pCgBQAwTTELMAkGA1UEBhMCQk1ZhQ...aATPRCd113tVqsvCkUwpfQ5zyUHaKw4FkXmiT2nzxxHA==</dsig:X509Certificate></dsig:X509Data></dsig:KeyInfo></dsig:Signature><samlp:Status><samlp:StatusCode Value="urn:oasis:names:tc:SAML:2.0:status:Success"/></samlp:Status><saml:Assertion ID="id-0WrMNbOz6wsuZdFPhfjnw7WIXXQ6k89-1AgHZ9Oi" IssueInstant="2017-11-21T9:25:55Z" Version="2.0"><saml:Issuer Format="urn:oasis:names:tc:SAML:2.0:nameid-format:entity">https://idp_partner_01/fed/idp</saml:Issuer><dsig:Signature><dsig:SignedInfo><dsig:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><dsig:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><dsig:Reference URI="#id-0WrMNbOz6wsuZdFPhfjnw7WIXXQ6k89-1AgHZ9Oi"><dsig:Transforms><dsig:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><dsig:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/></dsig:Transforms><dsig:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><dsig:DigestValue>7+jZtq8SpY3BKVaFjIFeEJm51cA=</dsig:DigestValue></dsig:Reference></dsig:SignedInfo><dsig:SignatureValue>GIlXt4B4rVFoDJRxidpZO73gXB68Dd+mcpoV9DKrjBBjLRz...zGTDcEYY2MG8FgtarZhVQGc4zxkkSg8GRT6Wng3NEuTUuA==</dsig:SignatureValue><dsig:KeyInfo><dsig:X509Data><dsig:X509Certificate>MIwDQUg+nhYqGZ7pCgBQAwTTELMAkGA1UEBhMCQk1ZhQ...aATPRCd113tVqsvCkUwpfQ5zyUHaKw4FkXmiT2nzxxHA==</dsig:X509Certificate></dsig:X509Data></dsig:KeyInfo></dsig:Signature><saml:Subject><saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified">PATOU_MORGAN</saml:NameID><saml:SubjectConfirmation Method="urn:oasis:names:tc:SAML:2.0:cm:bearer"><saml:SubjectConfirmationData InResponseTo="_0x7258edc52961ccbd5a435fb13ac67799" NotOnOrAfter="2017-11-21T9:30:55Z" Recipient="https://weblogic_server_01/saml2/sp/acs/post"/></saml:SubjectConfirmation></saml:Subject><saml:Conditions NotBefore="2017-11-21T9:25:55Z" 
NotOnOrAfter="2017-11-21T9:30:55Z"><saml:AudienceRestriction><saml:Audience>SAML2_Entity_ID_01</saml:Audience></saml:AudienceRestriction></saml:Conditions><saml:AuthnStatement AuthnInstant="2017-11-21T9:25:55Z" SessionIndex="id-oX9wXdpGmt9GQlVffvY4hEIRULEd25nrxDzE8D7w" SessionNotOnOrAfter="2017-11-21T9:40:55Z"><saml:AuthnContext><saml:AuthnContextClassRef>urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport</saml:AuthnContextClassRef></saml:AuthnContext></saml:AuthnStatement></saml:Assertion></samlp:Response>>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <<samlp:Response> is signed.>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionServiceImpl.assertIdentity(SAML2.Assertion.DOM)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionTokenServiceImpl.assertIdentity(SAML2.Assertion.DOM)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2IdentityAsserterProvider: start assert SAML2 token>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2IdentityAsserterProvider: SAML2IdentityAsserter: tokenType is 'SAML2.Assertion.DOM'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion signature>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: The assertion is signed.>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion signature>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion attributes>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion attributes>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion issuer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion issuer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion conditions>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion conditions>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion subject>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion subject>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2NameMapperCache.getNameMapper: Found name mapper in the cache>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert.processAttributes - processAttrs: false, processGrpAttrs: true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <DefaultSAML2NameMapperImpl: mapName: Mapped name: qualifier: null, name: PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAMLIACallbackHandler: SAMLIACallbackHandler(true, PATOU_MORGAN, null)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionCallbackServiceImpl.assertIdentity>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionCallbackServiceImpl.assertIdentity>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAMLIACallbackHandler: callback[0]: NameCallback: setName(PATOU_MORGAN)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionCallbackServiceImpl.assertIdentity returning PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.getCachedIdentity(idm:_def_Idmusr:PATOU_MORGAN)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.getCachedIdentity(idm:_def_Idmusr:PATOU_MORGAN) returning null>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionCallbackServiceImpl.assertIdentity did not find a cached identity.>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.constructor>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl.getJAASIdentityAssertionConfigurationName()>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <weblogic.security.service.internal.WLSJAASLoginServiceImpl$ServiceImpl.authenticate>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login ClassLoader=sun.misc.Launcher$AppClassLoader@7d4991ad>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login ThreadContext ClassLoader Original=com.bea.common.security.utils.SAML2ClassLoader@625ecb90>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl.getAppConfigurationEntry(com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl0)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl.getAppConfigurationEntry>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.legacy.service.JAASIdentityAssertionProviderImpl$V1Wrapper.getAssertionModuleConfiguration>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl$JAASProviderImpl.getProviderAppConfigurationEntry returning LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl, ControlFlag=LoginModuleControlFlag: sufficient>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.legacy.service.JAASIdentityAssertionProviderImpl$V1Wrapper.getClassLoader>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap ClassLoader=java.net.URLClassLoader@108c7fe5>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap ControlFlag=LoginModuleControlFlag: sufficient>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.legacy.service.JAASIdentityAssertionProviderImpl$V1Wrapper.getAssertionModuleConfiguration>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl$JAASProviderImpl.getProviderAppConfigurationEntry returning LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl, ControlFlag=LoginModuleControlFlag: required>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.legacy.service.JAASIdentityAssertionProviderImpl$V1Wrapper.getClassLoader>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap ClassLoader=java.net.URLClassLoader@108c7fe5>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap ControlFlag=LoginModuleControlFlag: required>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login created LoginContext>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login ThreadContext ClassLoader Current=com.bea.common.security.utils.SAML2ClassLoader@625ecb90>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.initialize LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.initialize ClassLoader=java.net.URLClassLoader@108c7fe5>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.initialize created delegate login module>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP ATN LoginModule initialized>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.initialize delegated>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.login>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Login>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle callbcacks[0] will be delegated>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle callbcacks[0] will use NameCallback to retrieve name>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle will delegate all callbacks>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAMLIACallbackHandler: callback[0]: NameCallback: setName(PATOU_MORGAN)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle delegated callbacks>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle got username from callbacks[0], UserName=PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Login username: PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <getUserDNName? user:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <getDNForUser search("ou=people,ou=internal,dc=company,dc=com", "(&(uid=PATOU_MORGAN)(objectclass=companyperson))", base DN & below)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <getConnection return conn:LDAPConnection {ldaps://ldap.company.com:3636 ldapVersion:3 bindDN:"ou=bind_user,ou=applications,ou=internal,dc=company,dc=com"}>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <ConnSetupMgr><Connecting to host=ldap.company.com, ssl port=3636>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <ConnSetupMgr><Successfully connected to host=ldap.company.com, ssl port=3636>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Retrieved guid:c6e6f021-2b29d761-8284c0aa-df200d0f>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <DN for user PATOU_MORGAN: uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <found user from ldap, user:PATOU_MORGAN, userDN=uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Retrieved username from LDAP :PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <returnConnection conn:LDAPConnection {ldaps://ldap.company.com:3636 ldapVersion:3 bindDN:"ou=bind_user,ou=applications,ou=internal,dc=company,dc=com"}>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Asserted Identity for PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <List groups that member: PATOU_MORGAN belongs to>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <DN for user PATOU_MORGAN: uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Retrieved dn:uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com for user:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <search("ou=bind_user,ou=applications,ou=internal,dc=company,dc=com", "(&(uniquemember=uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com)(objectclass=groupofuniquenames))", base DN & below)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <getConnection return conn:LDAPConnection {ldaps://ldap.company.com:3636 ldapVersion:3 bindDN:"ou=bind_user,ou=applications,ou=internal,dc=company,dc=com"}>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Result has more elements: false>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <search("ou=bind_user,ou=applications,ou=internal,dc=company,dc=com", "(objectclass=groupofURLs)", base DN & below)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <returnConnection conn:LDAPConnection {ldaps://ldap.company.com:3636 ldapVersion:3 bindDN:"ou=bind_user,ou=applications,ou=internal,dc=company,dc=com"}>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <DN for user PATOU_MORGAN: uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Retrieved dn:uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com for user:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <login succeeded for username PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.login delegated, returning true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.commit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Commit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Principals Added>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.commit delegated, returning true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login logged in>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login subject=Subject:
        Principal: PATOU_MORGAN
        Private Credential: PATOU_MORGAN
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <weblogic.security.service.internal.WLSIdentityServiceImpl.getIdentityFromSubject Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principals)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) Principal=PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) PrincipalClassName=weblogic.security.principal.WLSUserImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) trying PrincipalValidator for interface weblogic.security.principal.WLSPrincipal>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) PrincipalValidator handles this PrincipalClass>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Generated signature and signed WLS principal PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) PrincipalValidator signed the principal>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) All required PrincipalValidators signed this PrincipalClass, returning true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <weblogic.security.service.internal.WLSJAASLoginServiceImpl$ServiceImpl.authenticate authenticate succeeded for user PATOU_MORGAN, Identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <weblogic.security.service.internal.WLSJAASLoginServiceImpl$ServiceImpl.authenticate login succeeded and PATOU_MORGAN was not previously locked out>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.cachedIdentity(Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
),cacheKe y= idm:_def_Idmusr:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Using redirect URL from request cache: 'https://d2.company.com:443/D2/X3_Portal.jsp'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Redirecting to URL: https://d2.company.com:443/D2/X3_Portal.jsp>

 

At this point, the SAML2 exchange is successful and the user has been found in the LDAP. So the only thing remaining is the validation of the user on the WebLogic side as well as the access to the application (D2 in this case). That’s where the logs of the two WLS started to diverge.

On the WebLogic Server 1 (no issue), we can see WebLogic validating the user and finally granting access to the application:

<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.cachedIdentity(Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
),cacheKe y= idm:_def_Idmusr:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Using redirect URL from request cache: 'https://d2.company.com:443/D2/X3_Portal.jsp'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Redirecting to URL: https://d2.company.com:443/D2/X3_Portal.jsp>

<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <PrincipalAuthenticator.validateIdentity>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <PrincipalAuthenticator.validateIdentity will use common security service>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principals)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) Principal=PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) PrincipalClassName=weblogic.security.principal.WLSUserImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) trying PrincipalValidator for interface weblogic.security.principal.WLSPrincipal>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) PrincipalValidator handles this PrincipalClass>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Validate WLS principal PATOU_MORGAN returns true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) PrincipalValidator said the principal is valid>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) One or more PrincipalValidators handled this PrincipalClass, returning true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principals) validated all principals>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <AuthorizationManager will use common security for ATZ>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <weblogic.security.service.WLSAuthorizationServiceWrapper.isAccessAllowed>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Roles=[ "Anonymous" "consumer" ]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Resource=type=<url>, application=D2, contextPath=/D2, uri=/X3_Portal.jsp, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Direction=ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): input arguments:>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Subject: 1
        Principal = weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Roles:Anonymous, consumer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Resource: type=<url>, application=D2, contextPath=/D2, uri=/X3_Portal.jsp, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Direction: ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Context Handler: >
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Accessed Subject: Id=urn:oasis:names:tc:xacml:2.0:subject:role, Value=[Anonymous,consumer]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Evaluate urn:oasis:names:tc:xacml:1.0:function:string-at-least-one-member-of([consumer,consumer],[Anonymous,consumer]) -> true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <primary-rule evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <urn:bea:xacml:2.0:entitlement:resource:type@E@Furl@G@M@Oapplication@ED2@M@OcontextPath@E@UD2@M@Ouri@E@U@K@M@OhttpMethod@EGET, 1.0 evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): returning PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed AccessDecision returned PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AuthorizationServiceImpl.isAccessAllowed returning adjudicated: true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <AuthorizationManager will use common security for ATZ>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <weblogic.security.service.WLSAuthorizationServiceWrapper.isAccessAllowed>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Roles=[ "Anonymous" "consumer" ]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Resource=type=<url>, application=D2, contextPath=/D2, uri=/x3_portal/x3_portal.nocache.js, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Direction=ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): input arguments:>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Subject: 1
        Principal = weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Roles:Anonymous, consumer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Resource: type=<url>, application=D2, contextPath=/D2, uri=/x3_portal/x3_portal.nocache.js, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Direction: ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Context Handler: >
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Accessed Subject: Id=urn:oasis:names:tc:xacml:2.0:subject:role, Value=[Anonymous,consumer]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Evaluate urn:oasis:names:tc:xacml:1.0:function:string-at-least-one-member-of([consumer,consumer],[Anonymous,consumer]) -> true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <primary-rule evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <urn:bea:xacml:2.0:entitlement:resource:type@E@Furl@G@M@Oapplication@ED2@M@OcontextPath@E@UD2@M@Ouri@E@U@K@M@OhttpMethod@EGET, 1.0 evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): returning PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed AccessDecision returned PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AuthorizationServiceImpl.isAccessAllowed returning adjudicated: true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <AuthorizationManager will use common security for ATZ>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <weblogic.security.service.WLSAuthorizationServiceWrapper.isAccessAllowed>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Roles=[ "Anonymous" "consumer" ]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Resource=type=<url>, application=D2, contextPath=/D2, uri=/x3_portal/79E87564DA9D026D7ADA9326E414D4FC.cache.js, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Direction=ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): input arguments:>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Subject: 1
        Principal = weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Roles:Anonymous, consumer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Resource: type=<url>, application=D2, contextPath=/D2, uri=/x3_portal/79E87564DA9D026D7ADA9326E414D4FC.cache.js, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Direction: ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Context Handler: >
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Accessed Subject: Id=urn:oasis:names:tc:xacml:2.0:subject:role, Value=[Anonymous,consumer]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Evaluate urn:oasis:names:tc:xacml:1.0:function:string-at-least-one-member-of([consumer,consumer],[Anonymous,consumer]) -> true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <primary-rule evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <urn:bea:xacml:2.0:entitlement:resource:type@E@Furl@G@M@Oapplication@ED2@M@OcontextPath@E@UD2@M@Ouri@E@U@K@M@OhttpMethod@EGET, 1.0 evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): returning PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed AccessDecision returned PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AuthorizationServiceImpl.isAccessAllowed returning adjudicated: true>

 

On the other hand, the WebLogic Server 2 (with the issue) didn’t even start to validate the user; it just restarted the full SAML2 process:

<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.cachedIdentity(Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
),cacheKe y= idm:_def_Idmusr:PATOU_MORGAN>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <Using redirect URL from request cache: 'https://d2.company.com:443/D2/X3_Portal.jsp'>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <Redirecting to URL: https://d2.company.com:443/D2/X3_Portal.jsp>

<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <SAML2Filter: Processing request on URI '/D2/X3_Portal.jsp'>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is '/D2/X3_Portal.jsp'>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is not a service URI>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): returning service type 'SPinitiator'>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: processing>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: partner id is null>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: use partner binding HTTP/POST>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <post from template url: null>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <SAML2Servlet: Processing request on URI '/saml2/sp/acs/post'>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is '/saml2/sp/acs/post'>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): service URI is '/sp/acs/post'>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): returning service type 'ACS'>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <Assertion consumer service: processing>

 

This WebLogic Server 2 is acting just as if it were a completely new request and not an existing one… That was exactly my thought when I saw this log, so I checked the session identifier!

Whenever you configure an application to support SAML2 SSO, the session identifier (the cookie name) must be left at its default value. If you set something else, you will get this infinite authentication loop because WebLogic will consider a successful authentication as a non-authenticated one. This is described in the Oracle documentation.

So solving this issue is pretty simple… Either comment out the session identifier or set it back to the default value (below I’m just resetting it to the default value):

[weblogic@weblogic_server_01 ~]$ cd $APPS_HOME
[weblogic@weblogic_server_01 apps]$ jar -xvf D2.war WEB-INF/weblogic.xml
 inflated: WEB-INF/weblogic.xml
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ sed -i 's,<cookie-name>[^<]*<,<cookie-name>JSESSIONID<,' WEB-INF/weblogic.xml
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ grep -A3 -B2 cookie-name WEB-INF/weblogic.xml

  <session-descriptor>
    <cookie-name>JSESSIONID</cookie-name>
    <cookie-http-only>false</cookie-http-only>
  </session-descriptor>

[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ jar -uvf D2.war WEB-INF/weblogic.xml
adding: WEB-INF/weblogic.xml(in = 901) (out= 432)(deflated 52%)
[weblogic@weblogic_server_01 apps]$

 

Then you obviously need to redeploy your application and it will be working. :)

 

 

Cet article WebLogic – SSO/Atn/Atz – Infinite loop est apparu en premier sur Blog dbi services.

Server process name in Postgres and Oracle

Fri, 2018-02-09 16:01

Every database analysis should start with system load analysis. If the host is in CPU starvation, then looking at other statistics can be pointless. With ‘top’ on Linux, or equivalent such as process explorer on Windows, you see the process (and threads). If the name of the process is meaningful, you already have a clue about the active sessions. Postgres goes further by showing the operation (which SQL command), the state (running or waiting), and the identification of the client.

Postgres

By default ‘top’ displays the program name (like ‘comm’ in /proc or in ‘ps’ format), which will be ‘postgres’ for all PostgreSQL processes. But you can also display the command line with ‘c’ in interactive mode, or directly starting with ‘top -c’, which is the same as the /proc/$pid/cmdline or ‘cmd’ or ‘args’ in ‘ps’ format.


top -c
 
Tasks: 263 total, 13 running, 250 sleeping, 0 stopped, 0 zombie
%Cpu(s): 24.4 us, 5.0 sy, 0.0 ni, 68.5 id, 0.9 wa, 0.0 hi, 1.2 si, 0.0 st
KiB Mem : 4044424 total, 558000 free, 2731380 used, 755044 buff/cache
KiB Swap: 421884 total, 418904 free, 2980 used. 2107088 avail Mem
 
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20347 postgres 20 0 394760 11660 8696 S 7.6 0.3 0:00.49 postgres: demo demo 192.168.56.125(37664) DELETE
20365 postgres 20 0 393816 11448 8736 S 6.9 0.3 0:00.37 postgres: demo demo 192.168.56.125(37669) idle
20346 postgres 20 0 393800 11440 8736 S 6.6 0.3 0:00.37 postgres: demo demo 192.168.56.125(37663) UPDATE
20356 postgres 20 0 396056 12480 8736 S 6.6 0.3 0:00.42 postgres: demo demo 192.168.56.125(37667) INSERT
20357 postgres 20 0 393768 11396 8736 S 6.6 0.3 0:00.40 postgres: demo demo 192.168.56.125(37668) DELETE waiting
20366 postgres 20 0 394728 11652 8736 S 6.6 0.3 0:00.35 postgres: demo demo 192.168.56.125(37670) UPDATE
20387 postgres 20 0 394088 11420 8720 S 6.6 0.3 0:00.41 postgres: demo demo 192.168.56.125(37676) UPDATE
20336 postgres 20 0 395032 12436 8736 S 6.3 0.3 0:00.37 postgres: demo demo 192.168.56.125(37661) UPDATE
20320 postgres 20 0 395032 12468 8736 R 5.9 0.3 0:00.33 postgres: demo demo 192.168.56.125(37658) DROP TABLE
20348 postgres 20 0 395016 12360 8736 R 5.9 0.3 0:00.33 postgres: demo demo 192.168.56.125(37665) VACUUM
20371 postgres 20 0 396008 12708 8736 R 5.9 0.3 0:00.40 postgres: demo demo 192.168.56.125(37673) INSERT
20321 postgres 20 0 396040 12516 8736 D 5.6 0.3 0:00.31 postgres: demo demo 192.168.56.125(37659) INSERT
20333 postgres 20 0 395016 11920 8700 R 5.6 0.3 0:00.36 postgres: demo demo 192.168.56.125(37660) UPDATE
20368 postgres 20 0 393768 11396 8736 R 5.6 0.3 0:00.43 postgres: demo demo 192.168.56.125(37671) UPDATE
20372 postgres 20 0 393768 11396 8736 R 5.6 0.3 0:00.36 postgres: demo demo 192.168.56.125(37674) INSERT
20340 postgres 20 0 394728 11700 8736 S 5.3 0.3 0:00.40 postgres: demo demo 192.168.56.125(37662) idle
20355 postgres 20 0 394120 11628 8672 S 5.3 0.3 0:00.32 postgres: demo demo 192.168.56.125(37666) DELETE waiting
20389 postgres 20 0 395016 12196 8724 R 5.3 0.3 0:00.37 postgres: demo demo 192.168.56.125(37677) UPDATE
20370 postgres 20 0 393768 11392 8736 S 4.6 0.3 0:00.34 postgres: demo demo 192.168.56.125(37672) DELETE
20376 postgres 20 0 393816 11436 8736 S 4.6 0.3 0:00.37 postgres: demo demo 192.168.56.125(37675) DELETE waiting
20243 postgres 20 0 392364 5124 3696 S 1.0 0.1 0:00.06 postgres: wal writer process

This is very useful information. Postgres changes the process title when it executes a statement. In this example:

  • ‘postgres:’ is the name of the process
  • ‘demo demo’ are the database name and the user name
  • ‘192.168.56.125(37664)’ are the IP address and port of the client.
  • DELETE, UPDATE… are the commands. They are more or less the command tags returned as feedback after the command completion
  • ‘idle’ is for sessions not currently running a statement
  • ‘waiting’ is added when the session is waiting on a blocking session (enqueued on a lock, for example)
  • ‘wal writer process’ is a background process

This combination is especially useful because, in the same sample, we get the Postgres session state (idle, waiting or running an operation) together with the Linux process state (S when sleeping, R when runnable or running, D when in I/O, …).
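
The same information is also available from inside the database through pg_stat_activity, where the pid column matches the process seen in ‘top’. A minimal sketch (assuming PostgreSQL 10, where the backend_type column exists):

select pid, datname, usename, client_addr, client_port,
       state, wait_event_type, wait_event, query
  from pg_stat_activity
 where backend_type = 'client backend';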

Oracle

With Oracle, you can have ASH to sample session state, but being able to see it at OS level would be great. It would also be a safeguard if we need to kill a process.

But the Oracle process names do not change while the session is running. They are set at connection time.

The background processes mention the Oracle process name and the Instance name:

[oracle@VM122 ~]$ ps -u oracle -o pid,comm,cmd,args | head
 
PID COMMAND CMD COMMAND
1873 ora_pmon_cdb2 ora_pmon_CDB2 ora_pmon_CDB2
1875 ora_clmn_cdb2 ora_clmn_CDB2 ora_clmn_CDB2
1877 ora_psp0_cdb2 ora_psp0_CDB2 ora_psp0_CDB2
1880 ora_vktm_cdb2 ora_vktm_CDB2 ora_vktm_CDB2
1884 ora_gen0_cdb2 ora_gen0_CDB2 ora_gen0_CDB2

The foreground processes mention the instance and the connection type, LOCAL=YES for bequeath, LOCAL=NO for remote via listener.


[oracle@VM122 ~]$ ps -u oracle -o pid,comm,cmd,args | grep -E "[ ]oracle_|[ ]PID"
 
PID COMMAND CMD COMMAND
21429 oracle_21429_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21431 oracle_21431_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21451 oracle_21451_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21517 oracle_21517_cd oracleCDB1 (LOCAL=NO) oracleCDB1 (LOCAL=NO)

You need to join V$PROCESS with V$SESSION on (V$PROCESS.ADDR=V$SESSION.PADDR) to find the state, operation and client information.
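
As an illustration (a sketch, not from the original post), such a join could look like this:

select p.spid, s.sid, s.serial#, s.username, s.status, s.event, s.machine, s.program, s.sql_id
  from v$process p
  join v$session s on s.paddr = p.addr
 where s.type = 'USER';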

For the fun, you can change the program name (ARGV0) and arguments (ARGS).

The local connections can change the name in the BEQueath connection string:


sqlplus -s system/oracle@"(ADDRESS=(PROTOCOL=BEQ)(PROGRAM=$ORACLE_HOME/bin/oracle)(ARGV0=postgres)(ARGS='(DESCRIPTION=(LOCAL=MAYBE)(ADDRESS=(PROTOCOL=BEQ)))')(ENVS='ORACLE_HOME=$ORACLE_HOME,ORACLE_SID=CDB1'))" <<< "host ps -u oracle -o pid,comm,cmd,args | grep -E '[ ]oracle_|[ ]PID'"
 
PID COMMAND CMD COMMAND
21155 oracle_21155_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21176 oracle_21176_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21429 oracle_21429_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21431 oracle_21431_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21451 oracle_21451_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21517 oracle_21517_cd oracleCDB1 (LOCAL=NO) oracleCDB1 (LOCAL=NO)
22593 oracle_22593_cd postgres (DESCRIPTION=(LOCA postgres (DESCRIPTION=(LOCAL=MAYBE)(ADDRESS=(PROTOCOL=BEQ)))

The remote connection can have the name changed from the static registration, adding an ARGV0 value on the listener side:


LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
SID_LIST_LISTENER=(SID_LIST=
(SID_DESC=(GLOBAL_DBNAME=MYAPP)(ARGV0=myapp)(SID_NAME=CDB1)(ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1))
(SID_DESC=(GLOBAL_DBNAME=CDB1_DGMGRL)(SID_NAME=CDB1)(ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1))
(SID_DESC=(GLOBAL_DBNAME=CDB2_DGMGRL)(SID_NAME=CDB2)(ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1))
)

After reloading the listener with this (ARGV0=myapp), connections coming through the MYAPP service are identified accordingly:

[oracle@VM122 ~]$ sqlplus -s system/oracle@//localhost/MYAPP <<< "host ps -u oracle -o pid,comm,cmd,args | grep -E '[ ]oracle_|[ ]PID'"
PID COMMAND CMD COMMAND
21155 oracle_21155_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21176 oracle_21176_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21429 oracle_21429_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21431 oracle_21431_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21451 oracle_21451_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21517 oracle_21517_cd oracleCDB1 (LOCAL=NO) oracleCDB1 (LOCAL=NO)
24261 oracle_24261_cd myapp (LOCAL=NO) myapp (LOCAL=NO)

However, I would not recommend changing the defaults. This can be very confusing for people expecting ora_xxxx_SID and oracleSID process names.

 

Cet article Server process name in Postgres and Oracle est apparu en premier sur Blog dbi services.

12cR2 PDB archive

Wed, 2018-02-07 13:50

In 12.1 we had the possibility to unplug a PDB by closing it and generating a .xml file that describes the PDB metadata required to plug the datafiles into another CDB.
In 12.2 we got the additional possibility to have this .xml file zipped together with the datafiles, for easy transport. But that was not working for ASM files.
The latest Release Update (Oct 2017) includes the patch that fixes this issue, which is a good occasion to show the PDB archive feature.

Here is Oracle 12.2.0.1 with Oct 2017 (https://updates.oracle.com/download/26737266.html) applied (needs latest OPatch https://updates.oracle.com/download/6880880.html)
With a PDB1 pluggable database:

[oracle@VM106 ~]$ rman target /
 
Recovery Manager: Release 12.2.0.1.0 - Production on Wed Oct 18 16:16:41 2017
 
Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved.
 
connected to target database: CDB1 (DBID=920040307)
 
RMAN> report schema;
 
using target database control file instead of recovery catalog
Report of database schema for database with db_unique_name CDB1
 
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 810 SYSTEM YES /acfs/oradata/CDB1/datafile/o1_mf_system_dmrbv534_.dbf
3 540 SYSAUX NO /acfs/oradata/CDB1/datafile/o1_mf_sysaux_dmrbxvds_.dbf
4 70 UNDOTBS1 YES /acfs/oradata/CDB1/datafile/o1_mf_undotbs1_dmrbz8mm_.dbf
5 250 PDB$SEED:SYSTEM NO /acfs/oradata/CDB1/datafile/o1_mf_system_dmrc52tm_.dbf
6 330 PDB$SEED:SYSAUX NO /acfs/oradata/CDB1/datafile/o1_mf_sysaux_dmrc52t9_.dbf
7 5 USERS NO /acfs/oradata/CDB1/datafile/o1_mf_users_dygrpz79_.dbf
8 100 PDB$SEED:UNDOTBS1 NO /acfs/oradata/CDB1/datafile/o1_mf_undotbs1_dmrc52x0_.dbf
21 250 PDB1:SYSTEM YES /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_system_dygrqqq2_.dbf
22 350 PDB1:SYSAUX NO /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_sysaux_dygrqqs8_.dbf
23 100 PDB1:UNDOTBS1 YES +ASM1/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/DATAFILE/undotbs1.257.957719779
 
List of Temporary Files
=======================
File Size(MB) Tablespace Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1 33 TEMP 32767 /acfs/oradata/CDB1/datafile/o1_mf_temp_dmrc4wlh_.tmp
2 64 PDB$SEED:TEMP 32767 /acfs/oradata/CDB1/pdbseed/temp012017-06-10_19-17-38-745-PM.dbf
3 64 PDB1:TEMP 32767 /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_temp_dygrqqsh_.dbf

I have moved one file to ASM to show that it is now handled correctly.

The pluggable database must be closed before we can unplug it. Nothing changes in the unplug syntax except the extension of the file: if the file mentioned is a .pdb instead of a .xml, then it is a PDB archive:

RMAN> alter pluggable database PDB1 close;
 
Statement processed
 
RMAN> alter pluggable database PDB1 unplug into '/var/tmp/PDB1.pdb'
2> ;
 
Statement processed
 
RMAN> exit

Actually it is just a zip file with the datafiles, without the full path:

[oracle@VM106 ~]$ unzip -t /var/tmp/PDB1.pdb
Archive: /var/tmp/PDB1.pdb
testing: o1_mf_system_dygrqqq2_.dbf OK
testing: o1_mf_sysaux_dygrqqs8_.dbf OK
testing: undotbs1.257.957719779 OK
testing: /var/tmp/PDB1.xml OK
No errors detected in compressed data of /var/tmp/PDB1.pdb.

You can see that the ASM file is not different from the others.

I drop the pluggable database

RMAN> drop pluggable database PDB1 including datafiles;
 
using target database control file instead of recovery catalog
Statement processed
 

And plug back the PDB1, as PDB2, using the zip file:

RMAN> create pluggable database PDB2 using '/var/tmp/PDB1.pdb';
 
Statement processed
 
RMAN> report schema;
 
Report of database schema for database with db_unique_name CDB1
 
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 810 SYSTEM YES /acfs/oradata/CDB1/datafile/o1_mf_system_dmrbv534_.dbf
3 540 SYSAUX NO /acfs/oradata/CDB1/datafile/o1_mf_sysaux_dmrbxvds_.dbf
4 70 UNDOTBS1 YES /acfs/oradata/CDB1/datafile/o1_mf_undotbs1_dmrbz8mm_.dbf
5 250 PDB$SEED:SYSTEM NO /acfs/oradata/CDB1/datafile/o1_mf_system_dmrc52tm_.dbf
6 330 PDB$SEED:SYSAUX NO /acfs/oradata/CDB1/datafile/o1_mf_sysaux_dmrc52t9_.dbf
7 5 USERS NO /acfs/oradata/CDB1/datafile/o1_mf_users_dygrpz79_.dbf
8 100 PDB$SEED:UNDOTBS1 NO /acfs/oradata/CDB1/datafile/o1_mf_undotbs1_dmrc52x0_.dbf
24 250 PDB2:SYSTEM NO /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_system_dygwt1lh_.dbf
25 350 PDB2:SYSAUX NO /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_sysaux_dygwt1lm_.dbf
26 100 PDB2:UNDOTBS1 NO /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_undotbs1_dygwt1lo_.dbf
 
List of Temporary Files
=======================
File Size(MB) Tablespace Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1 33 TEMP 32767 /acfs/oradata/CDB1/datafile/o1_mf_temp_dmrc4wlh_.tmp
2 64 PDB$SEED:TEMP 32767 /acfs/oradata/CDB1/pdbseed/temp012017-06-10_19-17-38-745-PM.dbf
4 64 PDB2:TEMP 32767 /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_temp_dygwt1lp_.dbf

Here all files are there, created in the db_create_file_dest.

File name convert

When you create a pluggable database and you are not using OMF, you need to add a FILE_NAME_CONVERT to convert the source file names to destination file names. When the files are referenced by a .xml file, the .xml file references the path to the files as they were in the source database. If you move them, you can update the .xml file, or you can use SOURCE_FILE_NAME_CONVERT to mention the new place. With a .pdb archive, the .xml inside contains the original path, but this is not what will be used: the path of the .pdb itself is used, as if the files were unzipped at that place.
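
For the .xml case, the syntax would look like this (a sketch with hypothetical paths, not needed with the .pdb archive shown above):

create pluggable database PDB2 using '/var/tmp/PDB1.xml'
  source_file_name_convert = ('/u01/oradata/CDB0/', '/u02/stage/')   -- where the datafiles are now
  file_name_convert        = ('/u02/stage/', '/u02/oradata/CDB1/');  -- where to copy them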

If you use Oracle Managed Files, you don’t care about the file names and you don’t need any of those file name converts.

 

Cet article 12cR2 PDB archive est apparu en premier sur Blog dbi services.

JAN18: Database 11gR2 PSU, 12cR1 ProactiveBP, 12cR2 RU

Tue, 2018-02-06 15:32

If you want to apply the latest patches (and you should), you can go to the My Oracle Support Recommended Patch Advisor. But sometimes it is not up to date. For example, for 12.1.0.2 only the PSU is displayed and not the Proactive Bundle Patch, which is highly recommended. And across releases, the names have changed and can be misleading: PSU for 11.2.0.4 (no Proactive Bundle Patch except for Engineered Systems); 12.1.0.2 can have SPU, PSU, or Proactive BP, the last one being highly recommended, especially now that it includes the adaptive statistics patches; 12.2.0.1 introduces the new RUR and RU, the RU being the recommended one.

To get things clear, there’s also the Master Note for Database Proactive Patch Program, with reference to one note per release. This blog post is my master note to link directly to the recommended updates for Oracle Database.

Master Note for Database Proactive Patch Program (Doc ID 756671.1)
https://support.oracle.com/epmos/faces/DocContentDisplay?id=756671.1

11.2.0.4 – PSU

Database 11.2.0.4 Proactive Patch Information (Doc ID 2285559.1)
https://support.oracle.com/epmos/faces/DocContentDisplay?id=2285559.1
Paragraph -> 11.2.0.4 Database Patch Set Update

Latest as of Q1 2018 -> 16-Jan-2018 11.2.0.4.180116 (Jan 2018) Database Patch Set Update (DB PSU) 26925576 (Windows: 26925576)

12.1.0.2  – ProactiveBP

Database 12.1.0.2 Proactive Patch Information (Doc ID 2285558.1)
https://support.oracle.com/epmos/faces/DocContentDisplay?id=2285558.1
Paragraph -> 12.1.0.2 Database Proactive Bundle Patches (DBBP)

Latest as of Q1 2018 -> 16-Jan-2018 12.1.0.2.180116 Database Proactive Bundle Patch (Jan 2018) 12.1.0.2.180116 27010930

12.2.0.1 – RU

Database 12.2.0.1 Proactive Patch Information (Doc ID 2285557.1)
https://support.oracle.com/epmos/faces/DocContentDisplay?id=2285557.1
Paragraph -> 12.2.0.1 Database Release Update (Update)

Latest as of Q1 2018 -> 16-Jan-2018 12.2.0.1.180116 (Jan 2018) Database Release Update 27105253 (Windows: 12.2.0.1.180116 WIN DB BP 27162931)
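
Whatever the release, once the patch is applied (and datapatch has run), you can verify what has been recorded in the database. A quick sketch, assuming 12c or later where DBA_REGISTRY_SQLPATCH is populated (on 11.2 you would look at DBA_REGISTRY_HISTORY instead):

select patch_id, action, status, action_time, description
  from dba_registry_sqlpatch
 order by action_time;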
 

Don’t forget SQL Developer

In the 12c Oracle Home SQL Developer is installed, but you should update it to the latest version.
Download the following from http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html
– The SQL Developer zip for ‘Other Platforms’, currently 17.4
– The SQLcl zip for ‘All Platforms’, currently 17.4

On the server, remove, or rename the original directory:
mv $ORACLE_HOME/sqldeveloper $ORACLE_HOME/sqldeveloper.orig

Unzip what you have downloaded:
unzip -d $ORACLE_HOME/ sqldeveloper-*-no-jre.zip
unzip -d $ORACLE_HOME/sqldeveloper sqlcl-*-no-jre.zip

I suggest to have a login.sql which sets the beautiful ansiconsole for SQLcl:

echo "set sqlformat ansiconsole" > $ORACLE_HOME/sqldeveloper/sqlcl/login.sql

On 12.2 you can run SQLcl just with ‘sql’ (and same arguments as sqlplus: / as sysdba or /nolog,…) because this is what is defined in $ORACLE_HOME/bin.
However, it sets the current working directory and I prefer to keep the current one as it is probably where I want to run scripts from.

Then I add the following aliases in .bashrc

alias sqlcl='JAVA_HOME=$ORACLE_HOME/jdk SQLPATH=$ORACLE_HOME/sqldeveloper/sqlcl bash $ORACLE_HOME/sqldeveloper/sqlcl/bin/sql'
alias sqldev='$ORACLE_HOME/sqldeveloper/sqldeveloper.sh'

When running SQL Developer for the first time you can automatically create a ‘/ as sysdba’ connection (but remember that connecting like this is not a good practice) and a connection for each user declared in the database: right-click on Connections and choose Create Local Connections.

 

Cet article JAN18: Database 11gR2 PSU, 12cR1 ProactiveBP, 12cR2 RU est apparu en premier sur Blog dbi services.

Multitenant, PDB, ‘save state’, services and standby databases

Mon, 2018-02-05 15:12

Creating – and using – your own services has always been the recommendation. You can connect to a database without a service name, through the instance SID, but this is not what you should do. Each database registers its db_unique_name as a service, and you can use it to connect, but it is always better to create your own application service(s). In multitenant, each PDB registers its name as a service, but the recommendation is still there: create your own services, and connect with your services.
I’ll show in this blog post what happens if you use the PDB name as a service and the standby database registers to the same listener as the primary database. Of course, you can work around the non-unique service names by registering to different listeners. But this just hides the problem. The main reason to use services is to be independent from physical attributes, so being forced to assign a specific TCP/IP port is not better than using an instance SID.

I have the primary (CDB1) and standby (CDB2) databases registered to the default local listener:

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 03-FEB-2018 23:11:23
 
Copyright (c) 1991, 2016, Oracle. All rights reserved.
 
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date 02-FEB-2018 09:32:30
Uptime 1 days 13 hr. 38 min. 52 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/12.2.0/dbhome_1/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/VM122/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=VM122)(PORT=5501))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "627f7512a0452fd4e0537a38a8c055c0" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1XDB" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1_CFG" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB1_DGB" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1_DGMGRL" has 1 instance(s).
Instance "CDB1", status UNKNOWN, has 1 handler(s) for this service...
Service "CDB2" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB2XDB" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB2_DGB" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "CDB2_DGMGRL" has 1 instance(s).
Instance "CDB2", status UNKNOWN, has 1 handler(s) for this service...
Service "pdb1" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
The command completed successfully

Look at service ‘pdb1’, which is the name for my PDB. Connecting to //localhost:1521/PDB1 can connect you randomly to CDB1 (the primary database) or CDB2 (the standby database).

Here is an example, connecting several times to the PDB1 service:

[oracle@VM122 ~]$ for i in {1..5} ; do sqlplus -L -s sys/oracle@//localhost/pdb1 as sysdba <<< 'select name,open_mode,instance_name from v$instance , v$database;'; done
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1

I was connected at random to CDB1 or CDB2.

As an administrator, you know the instance names and you can connect to the one you want with: //localhost:1521/PDB1/CDB1 or //localhost:1521/PDB1/CDB2:

[oracle@VM122 ~]$ for i in {1..3} ; do sqlplus -L -s sys/oracle@//localhost/pdb1/CDB1 as sysdba <<< 'select name,open_mode,instance_name from v$instance , v$database;'; done
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ WRITE CDB1
 
[oracle@VM122 ~]$ for i in {1..3} ; do sqlplus -L -s sys/oracle@//localhost/pdb1/CDB2 as sysdba <<< 'select name,open_mode,instance_name from v$instance , v$database;'; done
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2
 
NAME OPEN_MODE INSTANCE_NAME
--------- -------------------- ----------------
CDB1 READ ONLY WITH APPLY CDB2

Of course this is not what you want. And we must not start or stop the default services. For the application, the best you can do is to create your service. And if you want to be able to connect to the Active Data Guard standby, which is opened in read-only, then you can create a ‘read-write’ service and a ‘read-only’ service that you start depending on the role.

Create and Start a read-write service on the primary

This example supposes that you have only Oracle Database software installed. If you are in RAC, with the resources managed by Grid Infrastructure, or simply with Oracle Restart, creating a service is easy with srvctl, and you add it to a PDB with ‘-pdb’ and also with a role to start it automatically in the primary or in the standby. But without it, you use dbms_service:

SQL> connect /@CDB1 as sysdba
Connected.
 
SQL> alter session set container=pdb1;
Session altered.
 
SQL> exec dbms_service.create_service(service_name=>'pdb1_RW',network_name=>'pdb1_RW');
PL/SQL procedure successfully completed.
 
SQL> exec dbms_service.start_service(service_name=>'pdb1_RW');
PL/SQL procedure successfully completed.
 
SQL> alter session set container=cdb$root;
Session altered.

The service is created, stored in SERVICE$ visible with DBA_SERVICES:

SQL> select name,name_hash,network_name,creation_date,pdb from cdb_services order by con_id,service_id;
NAME NAME_HASH NETWORK_NAME CREATION_DATE PDB
---- --------- ------------ ------------- ---
pdb1_RW 3128030313 pdb1_RW 03-FEB-18 PDB1
pdb1 1888881990 pdb1 11-JAN-18 PDB1

Save state

I have created and started the PDB1_RW service. However, if I restart the database, the service will not start automatically. How do you ensure that the PDB1 pluggable database starts automatically when you open the CDB? You ‘save state’ when it is opened. It is the same for the services you create. You need to ‘save state’ when they are opened.


SQL> alter pluggable database all save state;
Pluggable database ALL altered.

The information is stored in PDB_SVC_STATE$, and I’m not aware of a dictionary view on it:

SQL> select name,name_hash,network_name,creation_date,con_id from v$active_services order by con_id,service_id;
 
NAME NAME_HASH NETWORK_NAME CREATION_DATE CON_ID
---- --------- ------------ ------------- ------
pdb1_RW 3128030313 pdb1_RW 03-FEB-18 4
pdb1 1888881990 pdb1 11-JAN-18 4
 
SQL> select * from containers(pdb_svc_state$);
 
INST_ID INST_NAME PDB_GUID PDB_UID SVC_HASH SPARE1 SPARE2 SPARE3 SPARE4 SPARE5 SPARE6 CON_ID
------- --------- -------- ------- -------- ------ ------ ------ ------ ------ ------ ------
1 CDB1 627F7512A0452FD4E0537A38A8C055C0 2872139986 3128030313 1

The name is not in this table, you have to join with v$services using(name_hash):

SQL> select name,name_hash,network_name,creation_date,con_id from v$active_services order by con_id,service_id;
 
NAME NAME_HASH NETWORK_NAME CREATION_DATE CON_ID
---- --------- ------------ ------------- ------
SYS$BACKGROUND 165959219 26-JAN-17 1
SYS$USERS 3427055676 26-JAN-17 1
CDB1_CFG 1053205690 CDB1_CFG 24-JAN-18 1
CDB1_DGB 184049617 CDB1_DGB 24-JAN-18 1
CDB1XDB 1202503288 CDB1XDB 11-JAN-18 1
CDB1 1837598021 CDB1 11-JAN-18 1
pdb1 1888881990 pdb1 11-JAN-18 4
pdb1_RW 3128030313 pdb1_RW 03-FEB-18 4

So, in addition to storing the PDB state in PDBSTATE$, visible with dba_pdb_saved_states, the service state is also stored. Note that they are at different levels: PDBSTATE$ is a data link, stored on CDB$ROOT only (because the data must be read before opening the PDB), but PDB_SVC_STATE$ is a local table in the PDB, as the services can be started only when the PDB is opened.
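
To check what has been saved for the PDBs themselves, a quick query on the dictionary view (a sketch):

select con_name, instance_name, state
  from dba_pdb_saved_states;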

This new service is immediately registered on CDB1:

Service "pdb1" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RW" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
The command completed successfully

Create and Start a read-only service for the standby

If you try to do the same on the standby for a PDB1_RO service, you cannot because service information has to be stored in the dictionary:

SQL> exec dbms_service.create_service(service_name=>'pdb1_RO',network_name=>'pdb1_RO');
 
Error starting at line : 56 File @ /media/sf_share/122/blogs/pdb_svc_standby.sql
In command -
BEGIN dbms_service.create_service(service_name=>'pdb1_RO',network_name=>'pdb1_RO'); END;
Error report -
ORA-16000: database or pluggable database open for read-only access

So, the read-only service has to be created on the primary:

SQL> connect /@CDB1 as sysdba
Connected.
SQL> alter session set container=pdb1;
Session altered.
 
SQL> exec dbms_service.create_service(service_name=>'pdb1_RO',network_name=>'pdb1_RO');
 
SQL> select name,name_hash,network_name,creation_date,pdb from cdb_services order by con_id,service_id;
NAME NAME_HASH NETWORK_NAME CREATION_DATE PDB
---- --------- ------------ ------------- ---
pdb1_RW 3128030313 pdb1_RW 03-FEB-18 PDB1
pdb1_RO 1562179816 pdb1_RO 03-FEB-18 PDB1
pdb1 1888881990 pdb1 11-JAN-18 PDB1

The SERVICE$ dictionary table is replicated to the standby, so I can start it on the standby:

SQL> connect /@CDB2 as sysdba
Connected.
SQL> alter session set container=pdb1;
Session altered.
 
SQL> exec dbms_service.start_service(service_name=>'pdb1_RO');
PL/SQL procedure successfully completed.

Here is what is registered to the listener:

Service "pdb1" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RO" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RW" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
The command completed successfully

Now, PDB1_RO connects to the standby and PDB1_RW to the primary. Perfect.

No ‘save state’ on the standby

At this point, you would like to have PDB1_RO started when PDB1 is opened on the standby, but ‘save state’ is impossible on a read-only database:

SQL> alter session set container=cdb$root;
Session altered.
 
SQL> alter pluggable database all save state;
 
Error starting at line : 84 File @ /media/sf_share/122/blogs/pdb_svc_standby.sql
In command -
alter pluggable database all save state
Error report -
ORA-16000: database or pluggable database open for read-only access

You can’t manage the state (open the PDB, start the services) in the standby database.

The primary ‘save state’ is replicated to the standby

For the moment, everything is ok with my services:

Service "pdb1_RO" has 1 instance(s).
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RW" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...

If I restart the primary CDB1, everything is ok again because I saved the state of the PDB and the service. But what happens when the standby CDB2 restarts?


SQL> connect /@CDB2 as sysdba
Connected.
SQL> startup force;
...
SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
4 PDB1 MOUNTED

The PDB is not opened: the ‘saved state’ for the PDB is not read on the standby.
However, when I open the PDB, it seems that the ‘saved state’ for the service is applied, and this one was replicated from the primary:

SQL> alter pluggable database PDB1 open;
Pluggable database altered.
SQL> host lsnrctl status
...
Service "pdb1" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
Service "pdb1_RW" has 2 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Instance "CDB2", status READY, has 1 handler(s) for this service...
The command completed successfully

My PDB1_RW is now registered for both instances: connections will land at random on the primary or the standby, and then transactions will fail half of the time. It will be the same in case of switchover. This is not correct.

Save state instances=()

What I would like is the possibility to save state for a specific DB_UNIQUE_NAME, as with pluggable ‘spfile’ parameters, but this is not possible. What is possible is to mention an instance, but you can use it only for the primary instance where you save the state (or you get ORA-65110: Invalid instance name specified), and anyway this would not be correct after a switchover.

So what?

Be careful with services and ensure that the services used by the application are registered only for the correct instance, and that this persists when the instances are restarted. For this you must link a service name to a database role, which cannot be done correctly with ‘save state’. You can use startup triggers (as in the sketch below), or better, Grid Infrastructure service resources.
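
As an illustration of the startup trigger approach, here is a minimal sketch (created inside PDB1 on the primary, so that it is replicated to the standby; the service names are the ones from this post, error handling omitted):

create or replace trigger start_role_based_services
after startup on database
declare
  v_role varchar2(16);
begin
  select database_role into v_role from v$database;
  if v_role = 'PRIMARY' then
    dbms_service.start_service('pdb1_RW');
  else
    dbms_service.start_service('pdb1_RO');
  end if;
end;
/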

Do not connect to the default service with the PDB name: you cannot remove it and cannot stop it, so you may have the same name for different instances in a Data Guard configuration. You can register the standby instances to different local listeners to avoid the confusion, but you may still register to the same SCAN listener.

Create your own services, start them depending on the database role, and do not use ‘save state’ in a physical standby configuration.

 

Cet article Multitenant, PDB, ‘save state’, services and standby databases est apparu en premier sur Blog dbi services.

Introducing SQL Server on Docker Swarm orchestration

Fri, 2018-02-02 07:48

SQL Server 2017 is available on multiple platforms: Windows, Linux and Docker. The latter provides containerization features with no lengthy setup and no special prerequisites before running your SQL Server databases, which is probably the key to its adoption by developers.

It was my case as a developer for our DMK management kit, which provides our customers with a SQL Server database maintenance solution on all editions from SQL Server 2005 to SQL Server 2017 (including Linux). In the context of our DMK, we have to develop for different versions of SQL Server, including cumulative updates and service packs that may provide new database maintenance features, and it may be challenging when we often have to release a new feature or a fix and to perform unit tests on different SQL Server versions or platforms. At this stage you may certainly claim that virtualization already addresses those requirements, and you are right: we have used a lot of provisioned virtual machines on Hyper-V so far.

The obvious question is: why switch to Docker container technologies? Well, for different reasons. Firstly, sharing my SQL Server containers and test databases with my team is pretty straightforward: we may use a Docker registry and the docker push / pull commands. Then, provisioning a new SQL Server instance is quicker with containers than with virtual machines and generally leads to a lower CPU / memory / disk footprint on my laptop. I talked a little bit about it in the last SQL Pass meetup in Geneva, by the way.

But in this blog post I would like to take another step with Docker and go beyond the development area. As DBAs we may have to deal with container management in production in the near future (unless it is already done for you :) ) and we need to get a more global picture of the Docker ecosystem. I remember a discussion with an attendee during my SQL Server Docker and Microservices session at TugaIT 2017 in Lisbon who told me that Docker and containers are only for developers and not suitable for production. At the time of this discussion, I had to admit he was not entirely wrong. Firstly, let’s say that, like virtualization before it, container-based application adoption will probably take time. This is at least what I conclude from my experience and from what I notice around me, even if DevOps and microservices architectures seem to improve the situation. This is probably because production environments introduce other challenges and key factors than those we have in the development area, such as service availability, patching and upgrading, monitoring and alerting, performance… At the same time, Docker and, more generally speaking, container technologies are constantly maturing, as well as the tools to manage such infrastructures, and in the production area, as you know, we prefer to be safe and there is no room for unstable and non-established products that may compromise the core business.

So, we may wonder what the part of the DBAs is in all of this. Well, regardless of the underlying infrastructure, we still have the same responsibilities: providing configuration, ensuring databases are backed up, managing and maintaining data including performance, preventing security threats, and finally guaranteeing data availability. In fact, looking back at the last decade, we already faced exactly the same situation with the emergence of the virtualization paradigm, where we had to install our SQL Server instances on such infrastructures. I still remember some reluctance and heated discussions from DBAs.

From my side, I always keep high availability and performance in mind because they are the main concerns of my customers when it comes to production environments. So, I was curious to dig further into container technologies in this area, starting with how to deal with the different orchestration tools. The main leaders on the market are probably Docker Swarm, Kubernetes, Mesosphere, CoreOS fleet (CoreOS was recently acquired by Red Hat), Red Hat OpenShift, Amazon ECS and Azure Container Services.

In this first blog, I decided to write about the Docker Swarm orchestrator, probably because I was already comfortable with the native Docker commands and Docker Swarm offers an additional set of docker commands. Going into the details, the interesting point is that I discovered plenty of other concepts which led me to realize I was reaching another world… a production world. This time it is not just about pulling / pushing containers, for sure :) Before you keep reading this blog post, it is important to clarify that it is not intended to teach how to implement Docker Swarm; the Docker web site is well designed for that. My intention is just to highlight some important key features I think DBAs should be aware of before starting to manage container infrastructures.

Firstly, implementing a Docker Swarm requires being familiar with some key architecture concepts. Fortunately, most of them are easy to understand if you are already comfortable with SQL Server and cluster-based architectures, including SQL FCIs or availability groups.

Let’s have a look at the main components:

  • Nodes: A node is just an instance of the docker engine participating in the swarm
  • Manager nodes: They are primarily designed to dispatch units of work (called tasks, tied to containers) to worker nodes according to your service definition
  • Worker nodes: They receive and execute the tasks dispatched from manager nodes

Here was my first implementation of my Docker lab infrastructure:

blog 127 - 0 - swarm architecture lab

It was composed of 3 docker nodes, one of them acting as both worker and manager. Obviously, this is not an ideal scenario to implement in production because this architecture lacks a fault-tolerant design. But anyway, that was enough to start with my basic container labs.

I use Docker Server version 17.12.0-ce and, as shown below, swarm mode is enabled.

$sudo docker info
Containers: 13
 Running: 4
 Paused: 0
 Stopped: 9
Images: 37
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 7a9v6uv6jur5cf8x3bi6ggpiz
 Is Manager: true
 ClusterID: 63pdntf40nzav9barmsnk91hb
 Managers: 1
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
…

The IP address of the manager is reachable from the host operating system.

$ sudo docker node ls
ID                            HOSTNAME                      STATUS              AVAILABILITY        MANAGER STATUS
s6pu7x3htoxjqvg9vilkoffj1     sqllinux2.dbi-services.test   Ready               Active
ptcay2nq4uprqb8732u8k451a     sqllinux3.dbi-services.test   Ready               Active
7a9v6uv6jur5cf8x3bi6ggpiz *   sqllinux.dbi-services.test    Ready               Active              Leader

Here is the IP address of the manager (sqllinux node) used during the swarm initialization with the --advertise-addr parameter. To put it simply, this is the address used by other nodes to connect to this node during the joining phase.

There are plenty of options to configure and change the behavior of the swarm. Some of them concern resource and placement management, and as a DBA it makes sense to know how such infrastructures behave with your database environments regarding these settings. Maybe in a next blog post.

$ ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:15:5d:00:13:4b brd ff:ff:ff:ff:ff:ff
    inet 192.168.40.20/24 brd 192.168.40.255 scope global eth0

 

I also opened the required ports on each node

  • TCP port 2376 for secure docker client communication (Docker machine)
  • TCP port 2377 for cluster management communications
  • TCP and UDP port 7946 for communication among nodes
  • UDP port 4789 for overlay network traffic (container ingress networking)
$ sudo firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources:
  services: dhcpv6-client ssh nfs mountd rpc-bind
  ports: 2376/tcp 2377/tcp 7946/tcp 7946/udp 4789/udp 80/tcp 1433/tcp 8080/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

 

After initializing the swarm, the next step consists in deploying containers on it. The point here is that swarm mode changes the game a little bit because you deal with service or stack (collection of services) deployment rather than container deployment. It doesn’t mean you cannot deploy containers directly, but you won’t benefit from the swarm features in this case.

Let’s continue with the stack / service model on Docker. Because a picture is often worth a thousand words, here is an overview of the relationship between stacks, services and tasks.

blog 127 - 1 - swarm stack service task relationship

You may find the definition of each component in the Docker documentation, but let’s make a brief summary of the important things: services are the primary location of interactions with the swarm and include the definition of the tasks to execute on the manager or worker nodes. Then tasks carry Docker containers and the commands to run inside them. Maybe the most important thing to keep in mind here: /!\ Once a task is assigned to a node, it cannot move to another node. It can only run on the assigned node or fail /!\. This is a different behavior from virtualized architectures, where you can go through different features to manually move a virtual machine from one host to another (vMotion, DRS for VMware …).

From a user perspective, you may deploy a service directly or go through a stack definition if you have to deal with an application composed of several services and the relationships between them. For the latter, you can probably guess that this model is well suited to microservices architectures. These concepts may seem obscure, but with practice they become clearer.

But just before introducing service deployment models, one aspect we did not cover so far is the storage layout. Docker has long been considered as designed for stateless applications, and storage persistence as a weakness in the database world. Furthermore, from a container perspective, it is always recommended to isolate the data from the container to retain the benefits of adopting containerization: data management should be separate from the container lifecycle. Docker has managed to overcome this issue by providing different ways to persist data outside containers since version 1.9, including the capability to share volumes between containers on the same host (aka data volumes). But thinking about production environments, customers will certainly deploy docker clusters, rendering these options useless and the containers non-portable as well. In my context, I want to be able to share data volumes between different hosts, and the good news is that Docker provides distributed filesystem capabilities. I picked NFS for convenience, but there are other solutions like Ceph or GlusterFS for instance. A direct mapping between my host directory and the directory inside my SQL Server container, over distributed storage based on an NFS share, seems to work well in my case. From a SQL Server perspective this is not an issue as long as you deploy the service with a maximum of one replica at a time, to avoid data corruption. My updated architecture is as follows:

blog 127 - 2 - swarm architecture lab with nfs

Here is the configuration from one node concerning the mount point based on the NFS share. Database files will be stored in /u01/sql2 in my case.

$ cat /etc/fstab
/dev/mapper/cl-root     /                       xfs     defaults        0 0
UUID=eccbc689-88c6-4e5a-ad91-6b47b60557f6 /boot                   xfs     defaults        0 0
/dev/mapper/cl-swap     swap                    swap    defaults        0 0
/dev/sdb1       /u99    xfs     defaults        0 0
192.168.40.14:/u01      /u01    nfs     nfsvers=4.2,timeo=14,intr       0 0

$ sudo showmount -e 192.168.40.14
Export list for 192.168.40.14:
/u01 192.168.40.22,192.168.40.21,192.168.40.20

 

My storage is in place; let’s continue with network considerations. As with virtualization products, you have different options to configure the network:

  • Bridge: Allows internal communication between containers on the same host
  • docker_gwbridge: Network created when the swarm is installed, dedicated to the communication between nodes
  • Ingress: All nodes by default participate in the ingress routing mesh. I will probably introduce this feature in a next blog post, but let’s say that the routing mesh enables each node in the swarm to accept connections on published ports for any service running in the swarm, even if there’s no task running on the node
  • Overlay: The manager node automatically extends the overlay network to nodes that run service tasks, to allow communication between containers across hosts. Not available if you deploy containers directly

Let’s add that Docker includes an embedded DNS server which provides DNS resolution among containers connected to the same user-defined network. A pretty useful feature when you deploy applications with dependent services!

So, I created 2 isolated networks: one dedicated to back-end communication (backend-server) and the other one to front-end communication (frontend-server).

$sudo docker network create \
  --driver overlay \
  --subnet 172.20.0.0/16 \
  --gateway 172.20.0.1 \
  backend-server

$sudo docker network create \
  --driver overlay \
  --subnet 172.19.0.0/16 \
  --gateway 172.19.0.1 \
  frontend-server

$ sudo docker network ls
NETWORK ID	NAME			DRIVER		SCOPE
oab2ck3lsj2o	backend-server      	overlay		swarm
1372e2d1c92f   	bridge			bridge       	local
aeb179876301  	docker_gwbridge     	bridge       	local
qmlsfg6vjdsb	frontend-server     	overlay  	swarm
8f834d49873e  	host			host		local
2dz9wi4npgjw  	ingress             	overlay         swarm

We are finally able to deploy our first service based on the SQL Server on Linux image:

$ sudo docker service create \
   --name "sql2" \
   --mount 'type=bind,src=/u01/sql2,dst=/var/opt/mssql' \
   --replicas 1 \
   --network backend-server \
   --env "ACCEPT_EULA=Y" --env "MSSQL_SA_PASSWORD=P@$$w0rd1" \
   --publish published=1500,target=1433 \
   microsoft/mssql-server-linux:2017-latest

Important settings are:

  • --name "sql2" = Name of the service to deploy
  • --replicas 1 = We tell the manager to deploy only one replica of the SQL Server container at a time on the docker workers
  • --mount 'type=bind,src=…,dst=…' = Here we define the data persistence strategy. It maps the /u01/sql2 directory on the host to the /var/opt/mssql directory within the container. If we shut down or remove the container, the data is persisted. If the container moves to another docker node, the data is still available thanks to the distributed storage over NFS.
  • --network backend-server = We attach the sql2 service to the backend-server user-defined network
  • microsoft/mssql-server-linux:2017-latest = The container base image used in this case (the latest image available for SQL Server 2017 on Linux)

After deploying the sql2 service, let’s have a look at the installed services from the manager. We get an interesting output including the service name, the replication mode and the published port as well. You may notice the replication mode is set to replicated: in this service model, the swarm distributes a specific number of replicas among the nodes. In my context I capped the maximum number of tasks to 1, as discussed previously.

$ sudo docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                                      PORTS
ih4e2acqm2dm        registry            replicated          1/1                 registry:2                                 *:5000->5000/tcp
bqx1u9lc8dni        sql2                replicated          1/1                 microsoft/mssql-server-linux:2017-latest   *:1433->1433/tcp

Maybe you have noticed one additional service, registry. Depending on the context, when you deploy services the corresponding base images must be available from all the nodes where they may be deployed. You may use images stored in the public docker registry, or a private one if you deploy internal images.

Let’s dig further by looking at the tasks associated with the sql2 service. We get other useful information such as the desired state, the current state and the node where the task is running.

$ sudo docker service ps sql2
ID                  NAME                IMAGE                                      NODE                          DESIRED STATE       CURRENT STATE            ERROR               PORTS
zybtgztgavsd        sql2.1              microsoft/mssql-server-linux:2017-latest   sqllinux3.dbi-services.test   Running             Running 11 minutes ago
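
From inside the container the instance still sees its files under /var/opt/mssql, while they actually live in /u01/sql2 on the NFS share. A quick check from any client connected to the published port (a sketch; only the system databases exist at this point):

select name, physical_name
from sys.master_files;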

In the previous example I deployed a service that concerned only my SQL Server instance. For some scenarios that is OK, but generally speaking a back-end service doesn’t come alone in the container world; it is often part of a more global application service architecture. This is where stack deployment comes into play.

As stated in the Docker documentation, stacks are a convenient way to automatically deploy multiple services that are linked to each other, without needing to define each one separately. Stack files include environment variables, deployment tags, the number of services and dependencies, the number of tasks to deploy, related environment-specific configuration, etc. If you have already dealt with docker-compose files to deploy containers and dependencies, you will be comfortable with stack files: a stack file is nothing more than a docker-compose file adjusted for stack deployments. I used one to deploy the app-voting application here. This application is composed of 5 services including Python, NodeJS, Java worker, Redis cache and of course SQL Server.

Here is the result on my lab environment. My SQL Server instance is just one of the services that compose the stack related to my application. Once again, you can use docker commands to get a picture of the stack hierarchy.

$ sudo docker stack ls
NAME                SERVICES
myapp               5

$ sudo docker stack services myapp
ID                  NAME                MODE                REPLICAS            IMAGE                                               PORTS
fo2g822czblu        myapp_worker        replicated          1/1                 127.0.0.1:5000/examplevotingapp_worker:latest
o4wj3gn5sqd2        myapp_result-app    replicated          1/1                 127.0.0.1:5000/examplevotingapp_result-app:latest   *:8081->80/tcp
q13e25byovdr        myapp_db            replicated          1/1                 microsoft/mssql-server-linux:2017-latest            *:1433->1433/tcp
rugcve5o6i7g        myapp_redis         replicated          1/1                 redis:alpine                                        *:30000->6379/tcp
tybmrowq258s        myapp_voting-app    replicated          1/1                 127.0.0.1:5000/examplevotingapp_voting-app:latest   *:8080->80/tcp

 

So, let’s finish with the following question: what is the role of DBAs in such infrastructure as code? I don’t pretend to hold the truth, but here is my opinion:

From an installation and configuration perspective, database images (from official vendors) are often released without any standards or best practices. I believe very strongly (and it seems I’m aligned with the dbi services philosophy on this point) that the responsibility of the DBA team here is to prepare, build and provide well-configured images, as well as the related deployment files (at least the database service section(s)) adapted to their context – simple containers as well as more complex environments with built-in high availability, for instance.

In addition, from a management perspective, containers will not really change the game concerning the DBA’s daily job. They are still responsible for the core data business, regardless of the underlying infrastructure the database systems are running on.

In this blog post we just scratched the surface of the Docker Swarm principles, and over time I will try to cover other important aspects DBAs may have to be aware of with such infrastructures.

See you!

Cet article Introducing SQL Server on Docker Swarm orchestration est apparu en premier sur Blog dbi services.

Migrate Windows VM with more than 4 Disks from VMware to OVM

Fri, 2018-02-02 06:24

Suppose you have an OVA image created on VMware, the VM contains more than 4 disks,
and you have to migrate this machine from VMware to OVM.

As a first step, you import the OVA into OVM in the usual way:

Bildschirmfoto 2018-02-02 um 09.43.12

You can see that the appliance was imported successfully; we have 5 disks:

Bildschirmfoto 2018-02-02 um 09.45.25

Now you create your VM from the imported appliance:

Bildschirmfoto 2018-02-02 um 09.47.34

So far so good, let’s have a look at our newly created VM:

Bildschirmfoto 2018-02-02 um 09.51.22

All seems good, but if you edit the machine (we want to add a network and set a boot order for the system) you will be surprised:

Bildschirmfoto 2018-02-02 um 10.11.35

Oops, we lost a disk. What happened? You didn’t make a mistake, it’s a restriction of OVM:
http://www.oracle.com/us/technologies/virtualization/ovm3-supported-config-max-459314.pdf states that a VM can have a maximum of 4 IDE disks, and if you import your Windows VM it is considered a Xen HVM domain type, so you can only attach 4 disks to the VM.

And now, how can we solve the problem? Let’s first check whether we can boot the system:

Bildschirmfoto 2018-02-02 um 10.25.32

OK, the system is up, what’s next? We uninstall all the VMware utilities:

Bildschirmfoto 2018-02-02 um 10.35.40

For the next step we download the Oracle VM Server for x86 Windows PV Drivers – 3.4.2.0.0 for Microsoft Windows x64 (64-bit) from https://edelivery.oracle.com and install them on our Windows box:

Bildschirmfoto 2018-02-02 um 11.04.28

After a system restart, all disks except the C: drive are gone:

Bildschirmfoto 2018-02-02 um 11.08.51

We shut down the Windows box and put the VM into the Xen HVM PV Drivers domain type:

Bildschirmfoto 2018-02-02 um 12.07.39

After that we can add our lost disk without any problems:

Bildschirmfoto 2018-02-02 um 12.09.00

OK, let’s restart the system and see what happens:

Bildschirmfoto 2018-02-02 um 12.16.48

OK, all disks are there; we can now bring them online:

Bildschirmfoto 2018-02-02 um 12.59.29

After a reboot we can see that our drives are used correctly:

ARC3: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Thread 1 opened at log sequence 4743
Current log# 3 seq# 4743 mem# 0: I:\ORADATA\TESTIDM1\REDO03.LOG
Successful open of redo thread 1

Cet article Migrate Windows VM with more than 4 Disks from VMware to OVM est apparu en premier sur Blog dbi services.
