Yann Neuhaus

dbi services technical blog

oracle update-dcscomponents failure during ODA patching to 19.11

Wed, 2021-07-21 10:03

You might know that since ODA 19.10, the ODA metadata repository is stored in a MySQL database instead of the Apache Derby database. When patching from a release earlier than 19.10 to 19.10 or 19.11, the Apache Derby database is migrated to MySQL. This is performed during the odacli update-dcscomponents command. Starting with 19.10, the DCS Agent uses the MySQL server to display metadata information, for example when running the odacli list-databases command. The MySQL server is automatically installed and configured during the dcscomponents update. It does not run as root: a new operating system user, called odamysql, is created. This user is not allowed to run any shell and therefore cannot be used to log in. The oda-mysql systemctl service can be used to stop and start the MySQL server; stopping the MySQL server also stops the DCS Agent.
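For illustration, here are the kinds of commands involved (shown as a sketch, without output; oda-mysql and odamysql are the service and user names mentioned above):

# the odamysql user has no login shell
grep odamysql /etc/passwd

# stop and start the ODA MySQL server (stopping it also stops the DCS Agent)
systemctl stop oda-mysql
systemctl start oda-mysql

# the DCS Agent reads its metadata from MySQL, for example when listing databases
odacli list-databases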


I recently patched an ODA from version 19.9 to 19.11. During the patching, update-dcscomponents failed on the metadata migration, which blocked the entire patching project. I found a workaround that I want to share with you.

Failure during odacli update-dcscomponents

Running the update of all components failed with the following error:

[root@ODA01 patch]# /opt/oracle/dcs/bin/odacli update-dcscomponents -v 19.11.0.0.0
DCS-10008:Failed to update DCScomponents: 19.11.0.0.0
 
Internal error while patching the DCS components :
DCS-10001:Internal error encountered: DCS-10001:Internal error encountered: Failed to configure MySQL for ODA...DCS-Agent shutdown is successful.
[root@ODA01 patch]# 
Failure in DCS Admin log file

During the troubleshooting, I could see that the MySQL server package had been installed successfully, but I found the following errors related to the MySQL configuration in dcs-admin.log, stored in the /opt/oracle/dcs/log directory:

2021-07-20 09:18:45,177 DEBUG [dw-24 - PUT /dcscomponents] [] c.o.d.a.u.MySQLTask: Configure MySQL for ODA
2021-07-20 09:18:45,177 DEBUG [dw-24 - PUT /dcscomponents] [] c.o.d.c.u.CommonsUtils: 
run: cmd= '[/opt/oracle/dcs/mysql/etc/configodamysql.sh]'
2021-07-20 09:18:50,993 DEBUG [Thread-30] [] c.o.d.c.u.CommonsUtils: Output :
Initialize MySQL - Failed
2021-07-20 09:18:50,993 DEBUG [dw-24 - PUT /dcscomponents] [] c.o.d.c.u.c.DCSProcessBuilder: Return code: 1
2021-07-20 09:18:50,993 DEBUG [dw-24 - PUT /dcscomponents] [] c.o.d.c.u.c.CommandExecutor: Return code: 1
2021-07-20 09:18:50,993 ERROR [dw-24 - PUT /dcscomponents] [] c.o.d.a.u.MySQLTask: Exception : DCS-10001:Internal error encountered: Failed to configure MySQL for ODA..
ncountered: Failed to configure MySQL for ODA..
2021-07-20 09:18:50,994 ERROR [dw-24 - PUT /dcscomponents] [] c.o.d.a.r.DcsComponentServiceTaskFactory: Current status - DCS-Agent shutdown is successful. 
2021-07-20 09:18:50,997 INFO [dw-24 - PUT /dcscomponents] [] c.o.d.a.r.DcsComponentServiceTaskFactory: Components {zookeeper ,MySQL, Agent, Dcs-Cli, Dcs-Controller} upgrade operation completed.
2021-07-20 09:18:50,997 ERROR [dw-24 - PUT /dcscomponents] [] c.o.d.a.r.DcsComponentLifecycleApi: Internal error while patching the DCS components  


Failure in MySQL log files

Going further with the troubleshooting, I found the following in the MySQL log file:

[root@ODA01 patch]# cd /opt/oracle/dcs/mysql/log/
  
[root@ODA01 log]# ls -ltrh
total 4.0K
-rw-r----- 1 odamysql odamysql 2.3K Jul 20 09:18 mysqldb.log
  
[root@ODA01 log]# more mysqldb.log
2021-07-20T07:18:45.946735Z 0 [System] [MY-013169] [Server] /opt/oracle/dcs/mysql/bin/mysqld (mysqld 8.0.23-commercial) initializing of server in progress as process 66381
2021-07-20T07:18:45.952983Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-07-20T07:18:45.959155Z 1 [Warning] [MY-012582] [InnoDB] io_setup() failed with EAGAIN. Will make 5 attempts before giving up.
2021-07-20T07:18:45.959220Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 1.
2021-07-20T07:18:46.459426Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 2.
2021-07-20T07:18:46.959669Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 3.
2021-07-20T07:18:47.459946Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 4.
2021-07-20T07:18:47.960193Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 5.
2021-07-20T07:18:48.460487Z 1 [Warning] [MY-012582] [InnoDB] io_setup() failed with EAGAIN. Will make 5 attempts before giving up.
2021-07-20T07:18:48.460594Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 1.
2021-07-20T07:18:48.960802Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 2.
2021-07-20T07:18:49.461068Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 3.
2021-07-20T07:18:49.961352Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 4.
2021-07-20T07:18:50.461608Z 1 [Warning] [MY-012583] [InnoDB] io_setup() attempt 5.
2021-07-20T07:18:50.961904Z 1 [ERROR] [MY-012584] [InnoDB] io_setup() failed with EAGAIN after 5 attempts.
2021-07-20T07:18:50.962203Z 1 [ERROR] [MY-012954] [InnoDB] Cannot initialize AIO sub-system
2021-07-20T07:18:50.962314Z 1 [ERROR] [MY-012929] [InnoDB] InnoDB Database creation was aborted with error Generic error. You may need to delete the ibdata1 file before trying to start up again.
2021-07-20T07:18:50.962557Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2021-07-20T07:18:50.962716Z 0 [ERROR] [MY-013236] [Server] The designated data directory /opt/oracle/dcs/mysql/data/ is unusable. You can remove all files that the server added to it.
2021-07-20T07:18:50.962890Z 0 [ERROR] [MY-010119] [Server] Aborting
2021-07-20T07:18:50.963406Z 0 [System] [MY-010910] [Server] /opt/oracle/dcs/mysql/bin/mysqld: Shutdown complete (mysqld 8.0.23-commercial)  MySQL Enterprise Server - Commercial.


It seems InnoDB ran into trouble because no asynchronous I/O (AIO) was available on the ODA. Why that is the case is another question; I opened an SR on the Oracle support portal to get an explanation and to make sure the workaround below is adequate.

Workaround

The workaround is to add the innodb_use_native_aio = 0 parameter to both MySQL configuration files, mysqldb_ssl.cnf and mysqldb.cnf, manually run configodamysql.sh, and then execute the odacli update-dcscomponents command again.

Adding the innodb_use_native_aio = 0 parameter to both MySQL cnf files
[root@ODA01 /u01]# cd /opt/oracle/dcs/mysql/etc
  
[root@ODA01 etc]# ls -ltrh
total 40K
-rwxr-x--- 1 odamysql odamysql  892 May 31 04:05 oda-mysql.service
-rwxr-x--- 1 odamysql odamysql  596 May 31 04:05 mysqldb_ssl.cnf.bak
-rwxr-x--- 1 odamysql odamysql  399 May 31 04:05 mysqldb.cnf.bak
-rwxr-x--- 1 odamysql odamysql 1.9K May 31 04:05 getodamysqlport.sh
-rwxr-x--- 1 odamysql odamysql   55 May 31 04:05 enable_auth_socket.sql
-rwxr-x--- 1 odamysql odamysql 6.6K May 31 04:05 configodamysql.sh
-rwxr-x--- 1 odamysql odamysql 2.4K May 31 04:05 cleanodamysql.sh
-rwxr-x--- 1 odamysql odamysql  399 Jul 20 11:57 mysqldb.cnf
-rwxr-x--- 1 odamysql odamysql  596 Jul 20 11:57 mysqldb_ssl.cnf
  
[root@ODA01 etc]# cp -p mysqldb_ssl.cnf mysqldb_ssl.cnf.bak.20210720
  
[root@ODA01 etc]# vi mysqldb_ssl.cnf
  
[root@ODA01 etc]# diff mysqldb_ssl.cnf mysqldb_ssl.cnf.bak.20210720
16d15
< innodb_use_native_aio = 0
  
[root@ODA01 etc]# cp -p mysqldb.cnf mysqldb.cnf.bak.20210720
  
[root@ODA01 etc]# vi mysqldb.cnf
  
[root@ODA01 etc]# diff mysqldb.cnf mysqldb.cnf.bak.20210720
12d11
< innodb_use_native_aio = 0
  
[root@ODA01 etc]# grep innodb_use_native_aio *
mysqldb.cnf:innodb_use_native_aio = 0
mysqldb_ssl.cnf:innodb_use_native_aio = 0


Start the DCS Agent again

The DCS Agent was stopped during the dcscomponents update and needs to be restarted.

[root@ODA01 etc]# ps -ef | grep dcs-agent
root     90513 25271  0 11:58 pts/3    00:00:00 grep --color=auto dcs-agent
  
[root@ODA01 etc]# systemctl start initdcsagent
  
[root@ODA01 etc]# ps -ef | grep dcs-agent
root     90893     1  0 11:58 ?        00:00:00 /bin/sh -c . /opt/oracle/dcs/bin/setupJreAgent.sh;$JAVA -Xms128m -Xmx256m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m -XX:+DisableExplicitGC -XX:ParallelGCThreads=4 -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/opt/oracle/dcs/log/gc-dcs-agent-%t-%p.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Doracle.security.jps.config=/opt/oracle/dcs/agent/jps-config.xml -jar  /opt/oracle/dcs/bin/dcs-agent*.jar server /opt/oracle/dcs/conf/dcs-agent.json >/opt/oracle/dcs/log/dcsagent-stdout_$(date +%Y%m%d-%H%M).log 2>/opt/oracle/dcs/log/dcsagent-stderr_$(date +%Y%m%d-%H%M).log
root     90894 90893 99 11:58 ?        00:00:04 /opt/oracle/dcs/java/1.8.0_281/bin/java -Xms128m -Xmx256m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m -XX:+DisableExplicitGC -XX:ParallelGCThreads=4 -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/opt/oracle/dcs/log/gc-dcs-agent-%t-%p.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Doracle.security.jps.config=/opt/oracle/dcs/agent/jps-config.xml -jar /opt/oracle/dcs/bin/dcs-agent-19.9.0.0.0.jar server /opt/oracle/dcs/conf/dcs-agent.json
root     91038 25271  0 11:59 pts/3    00:00:00 grep --color=auto dcs-agent
[root@ODA01 etc]# 


Manually execute the configodamysql.sh script
[root@ODA01 /]# cd /opt/oracle/dcs/mysql/etc/
  
[root@ODA01 etc]# ls -l
total 48
-rwxr-x--- 1 odamysql odamysql 2393 May 31 04:05 cleanodamysql.sh
-rwxr-x--- 1 odamysql odamysql 6731 May 31 04:05 configodamysql.sh
-rwxr-x--- 1 odamysql odamysql   55 May 31 04:05 enable_auth_socket.sql
-rwxr-x--- 1 odamysql odamysql 1923 May 31 04:05 getodamysqlport.sh
-rwxr-x--- 1 odamysql odamysql  425 Jul 20 12:00 mysqldb.cnf
-rwxr-x--- 1 odamysql odamysql  399 May 31 04:05 mysqldb.cnf.bak
-rwxr-x--- 1 odamysql odamysql  399 Jul 20 11:57 mysqldb.cnf.bak.20210720
-rwxr-x--- 1 odamysql odamysql  622 Jul 20 11:58 mysqldb_ssl.cnf
-rwxr-x--- 1 odamysql odamysql  596 May 31 04:05 mysqldb_ssl.cnf.bak
-rwxr-x--- 1 odamysql odamysql  596 Jul 20 11:57 mysqldb_ssl.cnf.bak.20210720
-rwxr-x--- 1 odamysql odamysql  892 May 31 04:05 oda-mysql.service
  
[root@ODA01 etc]# ./configodamysql.sh
/opt/oracle/dcs/mysql/data directory should be empty to proceed further with ODA MySQL post install configuration
  
[root@ODA01 etc]# cd ../data
  
[root@ODA01 data]# ls
binlog.index
  
[root@ODA01 data]# rm binlog.index -f
  
[root@ODA01 data]# cd ../etc
  
[root@ODA01 etc]# ./configodamysql.sh
Initialize MySQL - Done
Starting MySQL for ODA - Done
mysql: [Warning] Using a password on the command line interface can be insecure.
Enable server-side auth_socket authentication plugin for MySQL root user - Done
Generating RSA private key, 2048 bit long modulus
................+++
.................................................................................................................................................................................................+++
e is 65537 (0x10001)
Generating a 2048 bit RSA private key
...................................................................................+++
.........................................+++
writing new private key to 'server-key.pem'
-----
writing RSA key
Signature ok
subject=/CN=ODA01
Getting CA Private Key
Generating a 2048 bit RSA private key
....................................................+++
...+++
writing new private key to 'dcsagent-client-key.pem'
-----
writing RSA key
Signature ok
subject=/CN=dcsagent
Getting CA Private Key
Generating a 2048 bit RSA private key
.................................+++
.+++
writing new private key to 'rhp-client-key.pem'
-----
writing RSA key
Signature ok
subject=/CN=rhp user
Getting CA Private Key
All certificates : OK
mkdir: cannot create directory ‘/opt/oracle/dcs/odamysqlcert/’: File exists
Generate certificates for CA, MySQL Server and Clients (DCS Agent & RHP) - Done
Enable SSL and Restart MySQL - Done
Create database dcsagentdb - Done
Create user dcsagent - Done
Grant privileges to user dcsagent - Done
Create database GHSUSER21 - Done
Create user GHSUSER21 - Done
Grant privileges to user GHSUSER21 - Done
Create schemas, users for CS Agent & RHP - Done
Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/leapseconds' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/tzdata.zi' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it.
Load the timezone tables in mysql database - Done
MySQL installed version details :
/opt/oracle/dcs/mysql/bin/mysql  Ver 8.0.23-commercial for Linux on x86_64 (MySQL Enterprise Server - Commercial)
MySQL installation and configuration.....Completed


We can check that the MySQL server daemon is now running:

[root@ODA01 etc]# ps -ef | grep mysql
odamysql 26130     1 11 12:07 ?        00:00:01 /opt/oracle/dcs/mysql/bin/mysqld --defaults-file=/opt/oracle/dcs/mysql/etc/mysqldb.cnf
root     27198 35360  0 12:08 pts/4    00:00:00 grep --color=auto mysql


Execute odacli update-dcscomponents again
[root@ODA01 etc]# /opt/oracle/dcs/bin/odacli update-dcscomponents -v 19.11.0.0.0
{
  "jobId" : "ff4d652d-b394-4c43-9195-a0740d1a08e6",
  "status" : "Success",
  "message" : "Update-dcscomponents is successful on all the node(s):DCS-Agent shutdown is successful. MySQL already on required version. Metadata migration is successful. Metadata schema update is done. dcsagent RPM upgrade is successful.  dcscli RPM upgrade is successful.  dcscontroller RPM upgrade is successful.  Successfully ran setupAgentAuth.sh zookeeper RPM upgrade is successful.  DCS-Agent restart is successful. ",
  "reports" : null,
  "createTimestamp" : "July 20, 2021 12:09:47 PM CEST",
  "description" : "Update-dcscomponents job completed and is not part of Agent job list",
  "updatedTime" : "July 20, 2021 12:11:48 PM CEST"
}


The dcscomponents update is now successful. I could also verify that the Apache Derby repository no longer exists:

[root@ODA01 etc]# cd /opt/oracle/dcs/
[root@ODA01 dcs]# ls
agent  bin  clients  commonstore  conf  configuredcs.pl  dcsagent_wallet  dcscli  dcs-ui  docs  ft  Inventory  java  log  mesg  mysql  odajavasdk  odamysqlcert  oracle.ahf  rdbaas  sample
[root@ODA01 dcs]# 


The new MySQL metadata database works fine when executing commands such as odacli list-databases.

As explained previously, I opened an SR with Oracle support to find out the root cause, get feedback on this workaround and learn about any other resolution. I will update this post as soon as I receive an answer.


odabr tool : how to resolve efi device backup failure?

Wed, 2021-07-21 08:54

During all my ODA projects involving patching operations, I use the odabr tool. This tool is really easy to use and well designed. It offers the possibility to create snapshots of the /, /u01 and /opt LVM file systems, as well as a physical copy (rsync) of the files to an external destination such as an NFS share. This is powerful when you just need to get a single file back. The tool, as well as the user guide, can be found in Oracle Doc ID 2466177.1. I recently faced some errors while the tool was performing the EFI device backup, causing it to fail without creating any snapshot. I wanted to share my findings here, hoping to save you some troubleshooting time.

How to install the tool?

The tool is easy to install: you just need to download the odabr-2.0.1-66.noarch.rpm package from the Doc ID mentioned above.

Check that the package is not already installed:

[root@ODA01 patch]# rpm -qa | grep -i odabr
[root@ODA01 patch]#

Install the tool:
[root@ODA01 patch]# ls -ltrh
total 5.2G
-rwxrwxrwx 1 root root 5.2G Jul 19 15:00 p30403673_1911000_Linux-x86-64.zip
-rw-r--r-- 1 root root 27K Jul 21 09:00 odabr-2.0.1-66.noarch.rpm
 
[root@ODA01 patch]# rpm -ivh odabr-2.0.1-66.noarch.rpm
warning: odabr-2.0.1-66.noarch.rpm: Header V4 RSA/SHA1 Signature, key ID 939112d6: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:odabr-2.0.1-66                   ################################# [100%]
odabr-2.0.1.66 has been installed on /opt/odabr succesfully!
 
[root@ODA01 patch]# rpm -qa | grep -i odabr
odabr-2.0.1-66.noarch

Making a snapshot backup

First we need to check whether there is enough free space in the physical volume.

[root@ODA01 patch]# pvdisplay
--- Physical volume ---
PV Name /dev/md126p3
VG Name VolGroupSys
PV Size <446.15 GiB / not usable 23.00 MiB
Allocatable yes
PE Size 32.00 MiB
Total PE 14276
Free PE 5444
Allocated PE 8832
PV UUID VyihMy-xEoI-SgKi-Apku-PBIS-BTnf-v2tsfU

In our case we have 5444 free PEs; with a PE size of 32 MiB, that makes about 170 GiB available for snapshots.
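As a quick sanity check, the same figure can be derived directly from the LVM output; a minimal sketch:

# free PEs multiplied by the PE size (32 MiB), converted to GiB
echo $(( 5444 * 32 / 1024 ))

# or ask LVM for the free space directly
pvs -o pv_name,pv_free /dev/md126p3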

By default the tool assigns 190 GB of snapshot space across the 3 file systems (/, /u01 and /opt). If we have enough free space, we can simply run the following command to create the snapshot backups:
[root@ODA01 patch]# /opt/odabr/odabr backup -snap

In our case, knowing we only have about 170 GB free, we need to adjust the snapshot sizes.

With a df command we first check how much space is needed:
[root@ODA01 patch]# df -h / /u01 /opt
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroupSys-LogVolRoot 30G 7.1G 21G 26% /
/dev/mapper/VolGroupSys-LogVolU01 118G 77G 36G 69% /u01
/dev/mapper/VolGroupSys-LogVolOpt 99G 53G 41G 57% /opt

From there we size each snapshot (rsize for /, usize for /u01 and osize for /opt):
[root@ODA01 patch]# /opt/odabr/odabr backup -snap -rsize 15 -usize 90 -osize 60

EFI device backup issue

Running it may fail with an EFI device backup issue:
[root@ODA01 patch]# /opt/odabr/odabr backup -snap -rsize 15 -usize 90 -osize 60
INFO: 2021-07-21 09:08:08: Please check the logfile '/opt/odabr/out/log/odabr_84675.log' for more details
 
 
│▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
odabr - ODA node Backup Restore - Version: 2.0.1-66
Copyright Oracle, Inc. 2013, 2021
--------------------------------------------------------
Author: Ruggero Citton
RAC Pack, Cloud Innovation and Solution Engineering Team
│▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
 
INFO: 2021-07-21 09:08:08: Checking superuser
INFO: 2021-07-21 09:08:08: Checking Bare Metal
 
INFO: 2021-07-21 09:08:08: Removing existing LVM snapshots
WARNING: 2021-07-21 09:08:08: LVM snapshot for 'opt' does not exist
WARNING: 2021-07-21 09:08:08: LVM snapshot for 'u01' does not exist
WARNING: 2021-07-21 09:08:08: LVM snapshot for 'root' does not exist
 
INFO: 2021-07-21 09:08:08: Checking current OS version
INFO: 2021-07-21 09:08:08: Checking LVM restore backgroud process
INFO: 2021-07-21 09:08:08: Checking LVM size
 
INFO: 2021-07-21 09:08:08: Boot device backup
INFO: 2021-07-21 09:08:08: Getting EFI device
INFO: 2021-07-21 09:08:08: ...step1 - unmounting EFI
INFO: 2021-07-21 09:08:08: ...step2 - making efi device backup
SUCCESS: 2021-07-21 09:08:10: ...EFI device backup saved as '/opt/odabr/out/hbi/efi.img'
INFO: 2021-07-21 09:08:10: ...step3 - checking EFI device backup
ERROR: 2021-07-21 09:08:10: Error running fsck over /opt/odabr/out/hbi/efi.img
ERROR: 2021-07-21 09:08:10: Command: 'fsck -a /opt/odabr/out/hbi/efi.img' failed as fsck from util-linux 2.23.2 fsck.fat 3.0.20 (12 Jun 2013) 0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt. Automatically removing dirty bit. Performing changes. /opt/odabr/out/hbi/efi.img: 20 files, 1100/63965 clusters
INFO: 2021-07-21 09:08:10: Mounting EFI back
ERROR: 2021-07-21 09:08:10: Backup not completed, exiting...

Note the error:
ERROR: 2021-07-21 09:08:10: Error running fsck over /opt/odabr/out/hbi/efi.img
ERROR: 2021-07-21 09:08:10: Command: 'fsck -a /opt/odabr/out/hbi/efi.img' failed as fsck from util-linux 2.23.2 fsck.fat 3.0.20 (12 Jun 2013) 0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt. Automatically removing dirty bit. Performing changes. /opt/odabr/out/hbi/efi.img: 20 files, 1100/63965 clusters

This means that the image being checked was not properly unmounted and contains dirty bits. This is also explained in Doc ID 2679511.1.

Solution

To resolve this issue, simply unmount /boot/efi before running the tool.

[root@ODA01 hbi]# df -h | grep /boot/efi
/dev/md126p1 500M 8.6M 492M 2% /boot/efi
 
[root@ODA01 hbi]# umount /boot/efi
 
[root@ODA01 hbi]# df -h | grep /boot/efi
[root@ODA01 hbi]#

Running the odabr tool again now creates the snapshots successfully:

[root@ODA01 hbi]# /opt/odabr/odabr backup -snap -rsize 15 -usize 90 -osize 60
INFO: 2021-07-21 09:17:27: Please check the logfile '/opt/odabr/out/log/odabr_18986.log' for more details
 
 
│▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
odabr - ODA node Backup Restore - Version: 2.0.1-66
Copyright Oracle, Inc. 2013, 2021
--------------------------------------------------------
Author: Ruggero Citton
RAC Pack, Cloud Innovation and Solution Engineering Team
│▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
 
INFO: 2021-07-21 09:17:27: Checking superuser
INFO: 2021-07-21 09:17:27: Checking Bare Metal
 
INFO: 2021-07-21 09:17:27: Removing existing LVM snapshots
WARNING: 2021-07-21 09:17:27: LVM snapshot for 'opt' does not exist
WARNING: 2021-07-21 09:17:27: LVM snapshot for 'u01' does not exist
WARNING: 2021-07-21 09:17:27: LVM snapshot for 'root' does not exist
 
INFO: 2021-07-21 09:17:27: Checking current OS version
INFO: 2021-07-21 09:17:27: Checking LVM restore backgroud process
INFO: 2021-07-21 09:17:27: Checking LVM size
 
INFO: 2021-07-21 09:17:27: Boot device backup
INFO: 2021-07-21 09:17:27: Getting EFI device
WARNING: 2021-07-21 09:17:27: Failing checking efi device
INFO: 2021-07-21 09:17:27: ...no EFI device found
INFO: 2021-07-21 09:17:27: Getting boot device
INFO: 2021-07-21 09:17:27: ...step1 - making boot device backup using tar
SUCCESS: 2021-07-21 09:17:31: ...boot content saved as '/opt/odabr/out/hbi/boot.tar.gz'
INFO: 2021-07-21 09:17:31: ...step2 - unmounting boot
INFO: 2021-07-21 09:17:31: ...step3 - making boot device backup using dd
SUCCESS: 2021-07-21 09:17:33: ...boot device backup saved as '/opt/odabr/out/hbi/boot.img'
INFO: 2021-07-21 09:17:33: ...step4 - mounting boot
INFO: 2021-07-21 09:17:33: ...step5 - checking boot device backup
 
INFO: 2021-07-21 09:17:33: Making OCR physical backup
INFO: 2021-07-21 09:17:38: ...ocr backup saved as '/opt/odabr/out/hbi/ocrbackup_18986.bck'
SUCCESS: 2021-07-21 09:17:38: OCR physical backup created successfully
INFO: 2021-07-21 09:17:38: OCR export backup
INFO: 2021-07-21 09:17:40: ...ocr export saved as '/opt/odabr/out/hbi/ocrexport_18986.bck'
SUCCESS: 2021-07-21 09:17:40: OCR export backup created successfully
 
INFO: 2021-07-21 09:17:40: Making LVM snapshot backup
SUCCESS: 2021-07-21 09:17:40: ...snapshot backup for 'opt' created successfully
SUCCESS: 2021-07-21 09:17:41: ...snapshot backup for 'u01' created successfully
SUCCESS: 2021-07-21 09:17:42: ...snapshot backup for 'root' created successfully
SUCCESS: 2021-07-21 09:17:42: LVM snapshots backup done successfully

Once the snapshots have been created, we can mount the /boot/efi file system again:
[root@ODA01 hbi]# df -h | grep /boot/efi
[root@ODA01 hbi]# mount /boot/efi
[root@ODA01 hbi]# df -h | grep /boot/efi
/dev/md126p1 500M 8.6M 492M 2% /boot/efi

Checking snapshot backups

To display the currently active snapshots, use the following command:
[root@ODA01 hbi]# /opt/odabr/odabr infosnap
 
│▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
odabr - ODA node Backup Restore - Version: 2.0.1-66
Copyright Oracle, Inc. 2013, 2021
--------------------------------------------------------
Author: Ruggero Citton
RAC Pack, Cloud Innovation and Solution Engineering Team
│▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
 
 
LVM snap name Status COW Size Data%
------------- ---------- ---------- ------
root_snap active 15.00 GiB 0.01%
opt_snap active 60.00 GiB 0.01%
u01_snap active 90.00 GiB 0.01%

Deleting snapshot backups

To delete the snapshot backups, run:
[root@ODA01 ~]# /opt/odabr/odabr delsnap
INFO: 2021-07-21 11:12:18: Please check the logfile '/opt/odabr/out/log/odabr_35232.log' for more details
 
INFO: 2021-07-21 11:12:18: Removing LVM snapshots
INFO: 2021-07-21 11:12:18: ...removing LVM snapshot for 'opt'
SUCCESS: 2021-07-21 11:12:18: ...snapshot for 'opt' removed successfully
INFO: 2021-07-21 11:12:18: ...removing LVM snapshot for 'u01'
SUCCESS: 2021-07-21 11:12:19: ...snapshot for 'u01' removed successfully
INFO: 2021-07-21 11:12:19: ...removing LVM snapshot for 'root'
SUCCESS: 2021-07-21 11:12:19: ...snapshot for 'root' removed successfully
SUCCESS: 2021-07-21 11:12:19: Remove LVM snapshots done successfully


Manage ODA patching with Data Guard or Dbvisit Standby

Mon, 2021-07-19 11:33
Introduction

Building an Oracle infrastructure today without thinking about a Disaster Recovery (DR) solution is quite rare. It has become obvious that a backup or a dump will not help if you do not know where to restore or import it once your production server is down. And restoring a backup is definitely not the fastest way to bring your database back to life. As a consequence, Data Guard or Dbvisit Standby, depending on which edition you are running, is a must-have. And these tools are much more than Disaster Recovery solutions: you can use them for planned maintenance as well, or if you need to move your server to another datacenter, for example.

The Oracle Database Appliance does not deviate from that, and the bare-minimum configuration is composed of 2 ODAs: 1 for production, and 1 for DR and test/dev, the DR feature being implemented with Data Guard or Dbvisit Standby.

Using Data Guard or Dbvisit Standby also helps when it comes to patching, because it is good practice to patch your ODAs from time to time. You may ask how to proceed when using Data Guard or Dbvisit Standby; here is how I have been doing it for years.

The 3-step patch on ODA

You apply an ODA patch step by step, as sketched in the commands after this list:

  • a few pre-patches for updating the DCS components
  • a system patch for updating the BIOS, ILOM, operating system and Grid Infrastructure
  • a storage patch for updating the data disk firmware
  • a DB patch for updating the DB homes, this one being applied multiple times if you have multiple DB homes
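For reference, a minimal sketch of the corresponding odacli commands (the patch file name, version and DB home ID are placeholders; the exact sequence and pre-checks such as odacli create-prepatchreport are described in the patch README):

# register the patch bundle in the repository
odacli update-repository -f /tmp/oda-sm-19.11.0.0.0-<date>-server.zip

# pre-patches: update the DCS components
odacli update-dcsadmin -v 19.11.0.0.0
odacli update-dcscomponents -v 19.11.0.0.0
odacli update-dcsagent -v 19.11.0.0.0

# system patch: BIOS, ILOM, operating system and Grid Infrastructure
odacli update-server -v 19.11.0.0.0

# storage patch: data disk firmware
odacli update-storage -v 19.11.0.0.0

# DB patch: one run per DB home
odacli list-dbhomes
odacli update-dbhome -i <dbhome_id> -v 19.11.0.0.0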

The patch can take quite a long time: plan a minimum of 3 to 4 hours for a single-node ODA. If you add preparing the ODA prior to patching, troubleshooting and doing the sanity checks after applying the patch, you most likely need a complete day for patching. Downtime may vary, but as at least one or two reboots are needed, I usually consider a full day of downtime. Yes, this is huge, but this is real life.

Furthermore, if you don't patch often enough, a single patch will probably not do the job. Patches are not always cumulative, and you sometimes need to apply 3 or 4 patches to reach the latest version, significantly increasing the time to patch and the associated downtime.

As if it were not already complicated enough, you can encounter problems while patching and get stuck for a while before finding a solution or a workaround. But don't blame Oracle for that: who else bundles such a variety of updates (OS, BIOS, firmware, ILOM, GI, DB) in just one patch? Oracle Database has always been a powerful RDBMS, but with a high degree of complexity. Adding the GI layer, the Linux OS and hardware updates definitely makes patching a tough task.

Patching strategy when using a DR solution

Patching can be your nightmare… or not. It totally depends on how you manage these patches.

First of all, I would recommend patching only an ODA on which no primary database is running. This is only possible if you use Data Guard or Dbvisit Standby: plan a switchover of the primaries to the other ODA before patching. If something goes wrong during patching, or if it takes more time than planned, it won't have any impact on your business. You may just miss your standby databases for a few hours, but this is normally something you can manage. Highly critical databases may use multiple standbys in order to keep maximum safety during patches.
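With the Data Guard broker, such a switchover is a one-liner. A minimal sketch, where DB1_SITE1 and DB1_SITE2 are placeholders for the db_unique_name of the primary and of the standby in the broker configuration:

dgmgrl sys@DB1_SITE1
DGMGRL> show configuration;
DGMGRL> validate database 'DB1_SITE2';
DGMGRL> switchover to 'DB1_SITE2';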

I would also recommend keeping one test primary for each DB home on each ODA (I usually create a DBTEST on each ODA when deploying, and keep it for testing purposes). This primary will be patched to validate the complete patching process.

When you apply the patch on this first ODA, you can either stop the patching process before patching the DB homes, in order to keep them at the same version as the other ODA, or decide to apply the DB home patch: it will update the binaries, but it will not be able to apply the datapatch on the databases themselves. It doesn't matter AS LONG AS YOU DO NOT SWITCH THE PRIMARIES BACK TO THIS PATCHED ODA. If you apply the DB home patches and then switch over to this updated ODA, your binaries will no longer be in sync with the catalog, and this can lead to several problems (especially regarding Java code inside the DB). So applying the complete patch is OK as long as you keep only standby databases on this ODA, until you patch the other one.

It’s not recommended to wait more than few weeks for patching the other ODA. So once you successfully patched the first one and you have waited few days to make sure everything is OK with this patch, switchover all the primaries to the patched ODA. Once done, you now need to first apply the DB home patch with update-dbhome on these primaries. You can also use a manual datapatch. This is mandatory to update the catalog to match the binaries’ version. Then apply the complete patch to your other ODA. Once done, both ODAs are up to date and you can dispatch your databases again on both servers.

I would highly recommend verifying on each database that the patch has been applied correctly. Sometimes a manual datapatch is needed, so don't hesitate to run it if something went wrong with odacli update-dbhome:

set serverout on
exec dbms_qopatch.get_sqlpatch_status;
...
Patch Id : 29972716
Action : APPLY
Action Time : 14-OCT-2020 10:59:29
Description : DATABASE BUNDLE PATCH 12.1.0.2.191015
Logfile :
/u01/app/oracle/cfgtoollogs/sqlpatch/29972716/23132651/29972716_apply_DEV121_2020Oct14_10_58_32.log
Status : SUCCESS

And when reimaging?

Sometimes reimaging is a better idea than applying patches. Reimaging may be faster than applying several patches, and there is much less troubleshooting as you clean up everything and restart from scratch. Reimaging can only be considered if you can switch all your primaries to another ODA for several days. You then need to remove the standby configuration from Data Guard (removing only the standby is OK), because your standby database will not be available for hours or days (actually, you will rebuild it).

When you reimage, you cannot choose which patch will be applied on top of the DB home: it is the one that comes with the global version. So you won't be able to switch your primaries back immediately after patching. For example, patching from 18.5 to 19.11 with a reimage (it makes sense because the gap is important) will bring an 18.14 DB home that can cause problems with an 18.5 database's catalog.

Prior to reimaging, I would recommend backing up the controlfile autobackup of the standby to an external filesystem; it also includes the latest spfile. Then you don't need to create the spfile again from a primary pfile, and you won't forget to restore a STANDBY controlfile, because your controlfile backup is already a standby controlfile.

And restoring the standby database from a primary backup is OK if you don't want to use RMAN DUPLICATE.

Why can I safely run a mounted standby with different binaries, but not use it as a primary?

Normally you should only run binaries and a database catalog of the same version. But running a standby with a different binaries' version is actually OK. This is because the catalog is not opened on a standby: the catalog resides in the SYSTEM tablespace, and a mounted database does not open any datafile. You may also notice that applying the datapatch on a primary is only possible if you can open your old database with the newer binaries; otherwise you would never be able to update the catalog.
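If you want to see the two levels that must match, you can compare the patch level of the binaries with the one registered in the catalog of the primary; a minimal sketch:

# patch level of the binaries
$ORACLE_HOME/OPatch/opatch lspatches

# patch level registered in the database catalog (run on the primary)
sqlplus -s / as sysdba <<'EOF'
set lines 200
select patch_id, action, status, description from dba_registry_sqlpatch order by action_time;
EOF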

There is no risk in opening a standby in READ ONLY mode with different binaries, or if you're using the Active Data Guard option, but some advanced features may not work correctly.

Data Guard vs Dbvisit Standby

Data Guard being included with Enterprise Edition, there is no extra cost to have (at least) one standby for each primary (apart from the cost of the storage). Don't hesitate to give each primary a standby, for Disaster Recovery but also to be able to patch with this method.

The Dbvisit licensing metric is per database, so you may consider using a standby for production databases only. But as Standard Edition and Dbvisit Standby are quite inexpensive (compared to Enterprise Edition), buying extra licenses for test and dev databases is definitely a brilliant idea in order to free your ODA from primaries when it comes to patching.

Conclusion

Data Guard and Dbvisit Standby are much more than DR solutions: they simplify the patching of your ODAs and make reimaging possible. This definitely improves your ODAs' lifecycle management.


Creating an Oracle Snapshot Copy PDB on top of ASM

Fri, 2021-07-16 11:04

Creating a snapshot copy Pluggable Database (PDB) on top of Automatic Storage Management (ASM) is not possible, because you need a filesystem which supports sparse files. According to the following MOS note, sparse files are supported on Automatic Storage Management Cluster File System (ACFS) since 12.2:

12.2 Oracle ACFS Sparse Files Enhancements (Doc ID 2200264.1)

However, I thought it would be more fun to create a PDB Snapshot Copy on an XFS filesystem, created as a 3rd-party filesystem on top of an Oracle ASM Dynamic Volume Manager (ADVM) volume.

REMARK1: See the steps at the end of this Blog on how to create a PDB Snapshot Copy on top of ACFS.
REMARK2: All tests have been done on the Oracle Cloud with a DB system running as a VM.

Here’s what I did:

1. Create an XFS filesystem of size 20G on ASM diskgroup DATA

Create the volume MYXFSVOL on diskgroup DATA as grid:

grid@cbl:/home/grid/ [+ASM1] asmcmd volcreate -G DATA -s 20g MYXFSVOL
grid@cbl:/home/grid/ [+ASM1] asmcmd volinfo --all
Diskgroup Name: DATA

	 Volume Name: MYXFSVOL
	 Volume Device: /dev/asm/myxfsvol-187
	 State: ENABLED
	 Size (MB): 20480
	 Resize Unit (MB): 64
	 Redundancy: UNPROT
	 Stripe Columns: 8
	 Stripe Width (K): 1024
	 Usage: 
	 Mountpath: 

Create the xfs-filesystem as root:

[root@cbl ~]# mkdir /u01/app/oracle/myxfs
[root@cbl ~]# mkfs.xfs /dev/asm/myxfsvol-187
meta-data=/dev/asm/myxfsvol-187  isize=256    agcount=4, agsize=1310720 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0
data     =                       bsize=4096   blocks=5242880, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@cbl ~]# mount -t xfs /dev/asm/myxfsvol-187 /u01/app/oracle/myxfs
[root@cbl ~]# chown oracle:oinstall /u01/app/oracle/myxfs

2. Prepare the CDB for Snapshot Copy

SQL> connect / as sysdba
Connected.
SQL> show parameter clonedb

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
clonedb 			     boolean	 FALSE
clonedb_dir			     string

SQL> alter system set clonedb=true scope=spfile;

System altered.

and restart the DB:

oracle@cbl:/home/oracle/ [DB0111 (CDB$ROOT)] srvctl stop database -db DB0111_zrh13m
oracle@cbl:/home/oracle/ [DB0111 (CDB$ROOT)] srvctl start database -db DB0111_zrh13m

3. Create a PDB as a Read-Only master for the Snapshot Copy

SQL> create pluggable database pdb2 admin user pdbadmin identified by "manager";

Pluggable database created.

SQL> alter pluggable database pdb2 open;

Pluggable database altered.

SQL> alter session set container=pdb2;

Session altered.

SQL> create user cbleile identified by "in_reality_I_used_something_different";

User created.

SQL> grant connect, resource, dba to cbleile;

Grant succeeded.

SQL> create table cbleile.t1 (n number, m varchar2(100));

Table created.

SQL> insert into cbleile.t1 values (1,'Before Snapshot Copy created');

1 row created.

SQL> commit;

Commit complete.

SQL> connect / as sysdba
Connected.
SQL> alter pluggable database pdb2 close immediate;

Pluggable database altered.

SQL> alter pluggable database pdb2 open read only;

Pluggable database altered.

Here the file names of my read-only opened PDB:

SQL> select file_name from cdb_data_files where con_id=5;

FILE_NAME
-------------------------------------------------------------------------------------------
+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/system.288.1077982211
+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/sysaux.289.1077982211
+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/undotbs1.287.1077982211
+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/undotbs1_temp.286.1077982211

4. Create the Snapshot Copy:

SQL> create pluggable database pdb2sc from pdb2 snapshot copy file_name_convert=('+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF','/u01/app/oracle/myxfs');
create pluggable database pdb2sc from pdb2 snapshot copy file_name_convert=('+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF','/u01/app/oracle/myxfs')
*
ERROR at line 1:
ORA-65169: error encountered while attempting to copy file
+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/undotbs1_temp.286.1077982211
ORA-17515: Creation of clonedb failed using snapshot file
+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/undotbs1_temp.286.1077982211
ORA-17637: failed to change permission on file
'+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/undotbs1_temp.286.1077982211'
ORA-15304: operation requires ACCESS_CONTROL.ENABLED attribute to be TRUE

So Oracle obviously tried to explicitly set the files of the read-only PDB PDB2 to read-only at the ASM level.

OK, let’s enable ACCESS_CONTROL:

grid@cbl:/home/grid/ [+ASM1] asmcmd
ASMCMD> setattr -G DATA access_control.enabled TRUE

It still doesn’t work, but now I get a permission denied error:

SQL> create pluggable database pdb2sc from pdb2 snapshot copy file_name_convert=('+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF','/u01/app/oracle/myxfs');
create pluggable database pdb2sc from pdb2 snapshot copy file_name_convert=('+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF','/u01/app/oracle/myxfs')
*
ERROR at line 1:
ORA-65169: error encountered while attempting to copy file
+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/undotbs1_temp.286.1077982211
ORA-17515: Creation of clonedb failed using snapshot file
+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/undotbs1_temp.286.1077982211
ORA-17637: failed to change permission on file
'+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE/undotbs1_temp.286.1077982211'
ORA-15260: permission denied on ASM disk group

ASMCMD> pwd
+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF/DATAFILE
ASMCMD> ls --permission 
User  Group  Permission  Name
              rw-rw-rw-  SYSAUX.289.1077982211
              rw-rw-rw-  SYSTEM.288.1077982211
              rw-rw-rw-  UNDOTBS1.287.1077982211
              rw-rw-rw-  UNDOTBS1_TEMP.286.1077982211

So I have to create the User oracle in ASM for diskgroup DATA:

ASMCMD> mkusr DATA oracle

And then set the owner of the PDB2-datafiles to oracle:

ASMCMD> chown oracle SYSAUX.289.1077982211 SYSTEM.288.1077982211 UNDOTBS1.287.1077982211 UNDOTBS1_TEMP.286.1077982211
ASMCMD> ls --permission
User                                     Group  Permission  Name
oracle@0a5d56d4388b4f7bbf6da84da547ae9e          rw-rw-rw-  SYSAUX.289.1077982211
oracle@0a5d56d4388b4f7bbf6da84da547ae9e          rw-rw-rw-  SYSTEM.288.1077982211
oracle@0a5d56d4388b4f7bbf6da84da547ae9e          rw-rw-rw-  UNDOTBS1.287.1077982211
oracle@0a5d56d4388b4f7bbf6da84da547ae9e          rw-rw-rw-  UNDOTBS1_TEMP.286.1077982211
ASMCMD> 

Let’s do it for the tempfile as well:

ASMCMD> cd ../TEMPFILE
ASMCMD> ls
TEMP.290.1077982223
ASMCMD> chown oracle TEMP.290.1077982223
ASMCMD> ls --permission
User                                     Group  Permission  Name
oracle@0a5d56d4388b4f7bbf6da84da547ae9e          rw-rw-rw-  TEMP.290.1077982223
ASMCMD> 

Finally I just created the necessary folders:

SQL> !mkdir /u01/app/oracle/myxfs/DATAFILE/
SQL> !mkdir /u01/app/oracle/myxfs/TEMPFILE/

And now it worked:

SQL> create pluggable database pdb2sc from pdb2 snapshot copy file_name_convert=('+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF','/u01/app/oracle/myxfs');

Pluggable database created.

SQL> alter pluggable database PDB2SC open;

Pluggable database altered.

SQL> show pdbs

    CON_ID CON_NAME			  OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
	 2 PDB$SEED			  READ ONLY  NO
	 3 PDB1 			  READ WRITE NO
	 5 PDB2 			  READ ONLY  NO
	 6 PDB2SC			  READ WRITE NO

That looks promising. Let’s try it:

SQL> alter session set container=PDB2SC;

Session altered.

SQL> set lines 200 pages 999
SQL> column m format a32
SQL> select * from cbleile.t1;

         N M
---------- --------------------------------
         1 Before Snapshot Copy created

SQL> 
SQL> column file_name format a70
SQL> select file_name from dba_data_files;

FILE_NAME
----------------------------------------------------------------------
/u01/app/oracle/myxfs/DATAFILE/system.288.1077982211
/u01/app/oracle/myxfs/DATAFILE/sysaux.289.1077982211
/u01/app/oracle/myxfs/DATAFILE/undotbs1.287.1077982211
/u01/app/oracle/myxfs/DATAFILE/undotbs1_temp.286.1077982211

SQL> 
SQL> !ls -lhs /u01/app/oracle/myxfs/DATAFILE
total 196K
116K -rw-r----- 1 oracle asmadmin 401M Jul 15 16:02 sysaux.289.1077982211
 48K -rw-r----- 1 oracle asmadmin 291M Jul 15 16:02 system.288.1077982211
 16K -rw-r----- 1 oracle asmadmin  46M Jul 15 16:02 undotbs1.287.1077982211
 16K -rw-r----- 1 oracle asmadmin 3.2M Jul 15 16:02 undotbs1_temp.286.1077982211

SQL>

-> Cool, the files only occupy a few hundred KB on disk.

SQL> insert into cbleile.t1 values (2,'After creating the Snapshot Copy');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from cbleile.t1;

         N M
---------- --------------------------------
         1 Before Snapshot Copy created
         2 After creating the Snapshot Copy

SQL> 

Summary: You can create a Snapshot Copy Pluggable Database on top of ASM by using a filesystem which supports sparse files. If your read-only source PDB is on ASM, then you have to enable access control in ASM and set oracle as the owner of the datafiles of the read-only source PDB.

REMARK: The above tests with XFS have been done on a single-instance setup. If you run your CDB in RAC, then ACFS should be used as the filesystem for the Snapshot Copy PDB.

For completeness, below are the steps to create the Snapshot Copy on top of ACFS (as I used the same read-only source PDB, I didn't have to set the permissions again):

grid@cbl:/home/grid/ [+ASM1] asmcmd volcreate -G DATA -s 20g MYACFSVOL
grid@cbl:/home/grid/ [+ASM1] asmcmd volinfo --all
Diskgroup Name: DATA

	 Volume Name: MYACFSVOL
	 Volume Device: /dev/asm/myacfsvol-187
	 State: ENABLED
	 Size (MB): 20480
	 Resize Unit (MB): 64
	 Redundancy: UNPROT
	 Stripe Columns: 8
	 Stripe Width (K): 1024
	 Usage: 
	 Mountpath: 
 
grid@cbl:/home/grid/ [+ASM1] 

[root@cbl ~]# mkdir /u01/app/oracle/myacfs
[root@cbl ~]# mkfs.acfs /dev/asm/myacfsvol-187
mkfs.acfs: version                   = 21.0.0.0.0
mkfs.acfs: on-disk version           = 50.0
mkfs.acfs: volume                    = /dev/asm/myacfsvol-187
mkfs.acfs: volume size               = 21474836480  (  20.00 GB )
mkfs.acfs: file system size          = 21474836480  (  20.00 GB )
mkfs.acfs: Format complete.
[root@cbl ~]# 


[root@cbl ~]# mount -t acfs /dev/asm/myacfsvol-187 /u01/app/oracle/myacfs
[root@cbl ~]# chown oracle:oinstall /u01/app/oracle/myacfs

oracle@cbl:/home/oracle/ [DB0111 (CDB$ROOT)] sqlplus / as sysdba

SQL> !ls -l /u01/app/oracle/myacfs
total 64
drwx------ 2 root root 65536 Jul 15 17:30 lost+found

SQL> !mkdir /u01/app/oracle/myacfs/DATAFILE

SQL> !mkdir /u01/app/oracle/myacfs/TEMPFILE

SQL> create pluggable database pdb2scacfs from pdb2 snapshot copy file_name_convert=('+DATA/DB0111_ZRH13M/C72C15BBCD038123E0530300000A0FAF','/u01/app/oracle/myacfs');

Pluggable database created.

SQL> alter pluggable database pdb2scacfs open;

Pluggable database altered.

SQL> show pdbs

    CON_ID CON_NAME			  OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
	 2 PDB$SEED			  READ ONLY  NO
	 3 PDB1 			  READ WRITE NO
	 4 PDB2SCACFS			  READ WRITE NO
	 5 PDB2 			  READ ONLY  NO
	 6 PDB2SC			  READ WRITE NO

SQL> !ls -lhs /u01/app/oracle/myacfs/DATAFILE
total 208K
116K -rw-r----- 1 oracle asmadmin 401M Jul 15 17:34 sysaux.289.1077982211
 52K -rw-r----- 1 oracle asmadmin 291M Jul 15 17:34 system.288.1077982211
 20K -rw-r----- 1 oracle asmadmin  46M Jul 15 17:34 undotbs1.287.1077982211
 20K -rw-r----- 1 oracle asmadmin 3.2M Jul 15 17:34 undotbs1_temp.286.1077982211

SQL> alter session set container=PDB2SCACFS;

Session altered.

SQL> set lines 200 pages 999
SQL> column m format a50
SQL> select * from cbleile.t1;

	 N M
---------- --------------------------------------------------
	 1 Before Snapshot Copy created

SQL> insert into cbleile.t1 values (2,'After creating a Snapshot Copy on top of ACFS');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from cbleile.t1;

	 N M
---------- --------------------------------------------------
	 1 Before Snapshot Copy created
	 2 After creating a Snapshot Copy on top of ACFS

SQL> 
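For the sake of completeness as well, a minimal sketch of how this test setup can be cleaned up afterwards (assuming the snapshot copy PDBs are no longer needed; they must be dropped before unmounting the filesystems and deleting the ADVM volumes):

SQL> alter session set container=CDB$ROOT;
SQL> alter pluggable database pdb2sc close immediate;
SQL> drop pluggable database pdb2sc including datafiles;
SQL> alter pluggable database pdb2scacfs close immediate;
SQL> drop pluggable database pdb2scacfs including datafiles;

[root@cbl ~]# umount /u01/app/oracle/myxfs
[root@cbl ~]# umount /u01/app/oracle/myacfs

grid@cbl:/home/grid/ [+ASM1] asmcmd voldelete -G DATA MYXFSVOL
grid@cbl:/home/grid/ [+ASM1] asmcmd voldelete -G DATA MYACFSVOL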


Automate restore from Rubrik with PowerShell (part 2)

Fri, 2021-07-16 07:06

As promised in my last Rubrik blog post, I'm writing a second one to explain how to:

  • log your different steps in a file
  • add an integrity check after the restore
  • and also improve my first scripts

I will start with the last point and improve my first scripts by creating functions for each step of the process.
With Rubrik you can have multiple appliances (servers), which means that each server/instance of your SQL Server environment is dispatched to one of those appliances, and backup/restore operations have to take place on the right Rubrik server.
To connect to multiple Rubrik servers I created the Connect-dbiRubrik function:

Function Connect-dbiRubrik(){
	param(
		[Parameter(Mandatory=$true)]
        $RubrikServer,
		[PSCredential]
		$Credential
	)
    #Rubrik connection
	If (-not $Credential){
		$Credential = Get-Credential;
	}
		$Connection = Connect-Rubrik -Server $RubrikServer -Credential $Credential;

    return $Connection;
}

This function will help us find our protected databases on all Rubrik appliances.
To retrieve the databases from Rubrik, a new function has been created with an instance name as mandatory parameter and, as optional parameters, a database name and a switch to include or exclude the system databases. This function retrieves the instance databases, whether or not they are part of an AlwaysOn Availability Group, and fills a custom object with some information about each database:

function Get-dbiRubrikDatabase(){
	param(
		[CmdletBinding()]
		[Parameter(Mandatory=$true)]
		[string[]]
        $SqlInstance,
		[String]
        $Database,
        [switch]
        $IncludeSystemDatabases
	)
	
	$DatabaseList = [System.Collections.ArrayList] @();
	
	@($SqlInstance) | Foreach-Object -Process {
		$ServerName = $_;
		## StandAlone databases
		$Command = 'Get-RubrikDatabase -ServerInstance $ServerName';

		If ($Database){
			$Command += ' -Database $Database';
		}

        If ($IncludeSystemDatabases) {
            $Command += ' | Where-Object isrelic -eq $false';
            }
        Else {
            $Command += ' | Where-Object {($_.isrelic -eq $false) -and ($_.name -notin ("master","msdb","model"))} '; 
            }
		
		Write-Debug -Message "Command to execute for StandAlone databases: $($Command)";
		$StandAloneDatabases = Invoke-Expression -Command $Command;
		
		@($StandAloneDatabases) | Foreach-Object -Process { `
			$PSDatabase = [pscustomobject] @{
				ServerName     	= "$($ServerName)"
				DatabaseName   	= "$($_.Name)"
				SLA 			= "$($_.effectiveSlaDomainName)"
				DatabaseID  	= "$($_.id)"
				ParentID  	    = "$($_.instanceId)"
				RubrikServer    = "$($rubrikConnection.Server)"
			}; `
			$Null = $DatabaseList.Add($PSDatabase); `
			$PSDatabase = $Null; `
		};
		$StandAloneDatabases 	= $Null;
		$Command 				= $Null;
		
		$AAGs = Get-dbaAvailabilityGroup -SqlInstance $ServerName -WarningAction SilentlyContinue ;


		If ($AAGs){
			@($AAgs) | Foreach-Object -Process {

                $Command = 'Get-RubrikDatabase -AvailabilityGroupName $_.AvailabilityGroup';
                
                If ($Database){
			        $Command += ' -Database $Database';
		        } 
                $Command += ' | Where-Object isrelic -eq $false -WarningAction SilentlyContinue';
                
				$AAGDatabases = Invoke-Expression -Command $Command;
                $Command = $null;

				@($AAGDatabases) | Foreach-Object -Process { `
					$PSDatabase = [pscustomobject] @{
						ServerName     	= "$($ServerName)"
						DatabaseName   	= "$($_.Name)"
						SLA 			= "$($_.effectiveSlaDomainName)"
						DatabaseID  	= "$($_.id)"
						ParentID  		= "$($_.availabilityGroupId)"
				        RubrikServer    = "$($rubrikConnection.Server)"
					}; `
					$Null = $DatabaseList.Add($PSDatabase); `
					$PSDatabase = $Null; `
				};
				$AAGDatabases 	= $Null;
			};
		};
		
		$ServerName = $Null;
	};
		
	return $DatabaseList;
}

Now, to log the required information during our restore test, we need to create a log file and populate it. The log file name will contain the instance name plus the current date and time.
A function has been created to build such a file name, removing for example the possible \ in the instance name and adding a timestamp:

function Convert-dbiInstanceToFileName(){
	param(
		[Parameter(Mandatory=$true)]
        $SqlInstance,
		[Parameter(Mandatory=$true)]
        $Title
	)
	$SqlInstance_Str = $SqlInstance.Replace('\','_');
	# Get-Date -Format guarantees a real yyyyMMddHHmmss timestamp, independent of the regional settings
	return "$($Title)_$($SqlInstance_Str)" + '_' + (Get-Date -Format 'yyyyMMddHHmmss') + '.txt';
}

Once the file name is generated, we can create the file and write the desired information into it. The new function, called Out-dbiLog, does that:

function Out-dbiLog() {
	param(
		[Parameter(Mandatory=$true)]
		[String] $Message,
		[Parameter(Mandatory=$true)]
		[String] $LogPath
	)

	If (-not (Test-Path -Path $LogPath)) {
		Try{
			$Null = New-Item -Path $LogPath -ItemType 'file';

			'['+(Get-Date -f 'yyyy-MM-dd HH:mm:ss') +'] ' + $Message | Out-File -FilePath $LogPath -Append;
		}
		catch {
			Write-Host "Function 'Out-dbiLog': cannot create file located at $($LogPath)"  -ForegroundColor DarkRed -BackgroundColor Black;
		}
	}
	Else {
		'['+(Get-Date -f 'yyyy-MM-dd HH:mm:ss') +'] ' + $Message | Out-File -FilePath $LogPath -Append;
	}
}

After the database is restored, we can run an integrity check to make sure the database is in good shape. I just use the dbatools cmdlet Invoke-dbaQuery with the switch -MessagesToOutput:

Invoke-dbaQuery -SQLInstance $DestinationInstance -QueryTimeout 14400 -Database $destDBName -Query "DBCC CHECKDB WITH NO_INFOMSGS" -MessagesToOutput;

To finalize, we can plan a scenario where we want to restore a complete production instance to a test instance, database by database, to validate our backup solution; once done, run an integrity check, then drop the restored database, and log all those actions. Here is the script to do that:

$SourceInstance = 'ProdInst';
$DestinationInstance = 'TestInst';
[System.Collections.ArrayList] $statusList = @();

$credential = Get-Credential;
$RubrikServers = ('168.22.2.3','168.22.2.4');

#LOG
$LogFileName = Convert-dbiInstanceToFileName -SqlInstance $SourceInstance -Title 'Restore_Test';
$LogPath = 'c:\dbi\TestRestore\';
Out-dbiLog -Message "Test Restore from source instance $SourceInstance to destination instance $DestinationInstance " -LogPath "$LogPath$LogFileName";

#Retrieve all databases on both Rubrik servers
$Dbs = @();
foreach ($RubrikServer in $RubrikServers) {
    $connect = Connect-dbiRubrik -RubrikServer $RubrikServer -Credential $credential;

    $Dbs += Get-dbiRubrikDatabase -SqlInstance $SourceInstance;
}

#Remove the database with unprotected SLA 
$Dbs = $Dbs | Where-Object SLA -ne 'UNPROTECTED';

#Restore the selected databases
foreach ($Db in $Dbs) {
    #LOG
    Out-dbiLog -Message "Start to restore database $($db.DatabaseName)" -LogPath "$LogPath$LogFileName";

    #Change database name on destination instance to visualize it quickly
    $destDBName = 'Rest_' + $Db.DatabaseName;

    #need to connect to the Rubrik server where the database is protected
    $connect = Connect-dbiRubrik -rubrikserver $Db.RubrikServer -Credential $credential;

    $Restore = Restore-dbiRubrikDatabase -Source $SourceInstance -Destination $DestinationInstance -DatabaseName $db.DatabaseName -NewDatabaseName $destDBName -TargetRestoreTime $date;

    #Loop until restore is finished
    $progress = 0;
    $RestoreID = $Restore.ID
    Do
    {
        #Search status of the restore
        $status = (Get-RubrikRequest -id $RestoreID -Type 'mssql'); 
        Write-Debug $status.status;

        #show a progress bar for the restore
        If ($status.status -eq 'RUNNING') {
            $progress = $status.progress; 
            }
        else {
            $progress = 100;
        }
        Write-Progress -Activity Restoring -Status $status.status -PercentComplete $progress -CurrentOperation "Restore database $destDBName"
        
        Start-Sleep -Seconds 1;
    } Until ($status.status -in ('SUCCEEDED','FAILED'));

    $null = $statusList.Add($status);

    If ($status.status -eq 'SUCCEEDED') {
        #LOG
        Out-dbiLog -Message "Database $($db.DatabaseName) restored successfully" -LogPath "$LogPath$LogFileName";

        #run a dbbc checkdb
        #QueryTimeout of 4h
        $DBCC = Invoke-dbaQuery -SQLInstance $DestinationInstance -QueryTimeout 14400 -Database $destDBName -Query "DBCC CHECKDB WITH NO_INFOMSGS" -MessagesToOutput;
        
        #LOG
        If ([string]::IsNullOrEmpty($DBCC)){
            Out-dbiLog -Message "Integrity Check for database $($db.DatabaseName) done successfully" -LogPath "$LogPath$LogFileName";
            }
        Else {
            Out-dbiLog -Message "Integrity Check for database $($db.DatabaseName) failed" -LogPath "$LogPath$LogFileName";
            }

        #drop database after
        $drop = Remove-DbaDatabase -SqlInstance $DestinationInstance -Database $destDBName -Confirm:$false;
    }

}

$statusList | FT status, starttime, endtime, error;

As you may have noticed, one function is missing, Restore-dbiRubrikDatabase; I will cover this one in another blog post.
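In the meantime, here is a minimal sketch of what such a function could look like, based on the Export-RubrikDatabase and Get-RubrikSQLInstance cmdlets of the Rubrik SDK for PowerShell; the parameter mapping is an assumption on my side and the final version may differ:

Function Restore-dbiRubrikDatabase(){
	param(
		[Parameter(Mandatory=$true)]
        $Source,
		[Parameter(Mandatory=$true)]
        $Destination,
		[Parameter(Mandatory=$true)]
        $DatabaseName,
		[Parameter(Mandatory=$true)]
        $NewDatabaseName,
        $TargetRestoreTime = (Get-Date)
	)
	# source database protected on the currently connected Rubrik server
	$SourceDb = Get-RubrikDatabase -ServerInstance $Source -Database $DatabaseName | Where-Object isrelic -eq $false;

	# target SQL Server instance known by Rubrik
	$TargetInstance = Get-RubrikSQLInstance -ServerInstance $Destination;

	# export (restore) the database to the target instance with a new name and recover it
	$Request = Export-RubrikDatabase -Id $SourceDb.id `
				-TargetInstanceId $TargetInstance.id `
				-TargetDatabaseName $NewDatabaseName `
				-RecoveryDateTime $TargetRestoreTime `
				-FinishRecovery;

	# the returned asynchronous request (with its id) is polled with Get-RubrikRequest in the main script
	return $Request;
}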
I hope these scripts can help.
Happy scripting.


SQL Server: table variables performance limitations

Wed, 2021-07-14 11:39

Doing some performance troubleshooting for one of my customers, I identified some issues with very large table variables inside stored procedures.
Table variable limitations are not well understood by developers, although they are now well documented.

Table variable rows estimation

Let’s have a look at an example with my customer context which is SQL Server 2016, so compatibility level 130 at the database level. You can reproduce this demo with the Wide World Importers database.

declare @Orders table (
	[OrderID] [int] NOT NULL,
	[CustomerID] [int] NOT NULL,
	[SalespersonPersonID] [int] NOT NULL,
	[PickedByPersonID] [int] NULL,
	[ContactPersonID] [int] NOT NULL,
	[BackorderOrderID] [int] NULL,
	[OrderDate] [date] NOT NULL,
	[ExpectedDeliveryDate] [date] NOT NULL,
	[CustomerPurchaseOrderNumber] [nvarchar](20) NULL,
	[IsUndersupplyBackordered] [bit] NOT NULL,
	[Comments] [nvarchar](max) NULL,
	[DeliveryInstructions] [nvarchar](max) NULL,
	[InternalComments] [nvarchar](max) NULL,
	[PickingCompletedWhen] [datetime2](7) NULL,
	[LastEditedBy] [int] NOT NULL,
	[LastEditedWhen] [datetime2](7) NOT NULL
);
insert into @Orders
	select top 50000 * from Sales.Orders;

select CustomerID, MAX(OrderDate) AS LastOrderDate
from @Orders
group by CustomerID;

This simple query, touching all 50000 rows of the table variable, gets an estimated number of rows of 1.
This is always the case with table variables unless you use some tricks like the OPTION (RECOMPILE) hint. As we can see in the plan, the query is spilling to TempDB, so the performance impact can be significant with a large amount of data.
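
To illustrate the trick mentioned above, here is a minimal sketch using the same table variable: with OPTION (RECOMPILE) the statement is compiled at execution time, so the optimizer sees the real row count of @Orders (at the cost of a compilation on every execution).

-- Recompile at execution time so the optimizer sees the actual cardinality of @Orders
select CustomerID, MAX(OrderDate) AS LastOrderDate
from @Orders
group by CustomerID
option (recompile);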

Estimates are also bad for queries with predicates in the WHERE clause or in a JOIN, as you can see below. The estimated number of rows is always 1.

select CustomerID, OrderID, OrderDate 
from @Orders
where CustomerID = 89;

Table variable with compatibility level 150 (SQL Server 2019)

Things were improved with SQL Server 2019 and the introduction of Table variable deferred compilation.
The query optimizer is now aware of the table variable cardinality and can better estimate some queries.

Executed under compatibility level 150, the original query now gets the correct estimated number of rows, no TempDB spill, and its execution time (already very fast for this demo) is divided by two.
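
If you want to reproduce this on SQL Server 2019, switching the demo database to compatibility level 150 is enough; a small sketch, assuming the sample database is named WideWorldImporters:

-- Assumption: the sample database is called WideWorldImporters
ALTER DATABASE WideWorldImporters SET COMPATIBILITY_LEVEL = 150;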

Estimation for queries with predicates is still not perfect with SQL Server 2019 because we still don't get statistics on columns. Based only on the table cardinality, the optimizer can estimate the number of rows for such a query. It estimated 224 rows instead of the 84 actual rows. Not perfect, but it can sometimes produce a much better plan than the forced "1" estimate of previous versions.

Temp tables estimation

Rewriting table variables to temporary tables can improve cardinality estimation regardless of the database compatibility level.

Here are the queries using a Temp table:

drop table if exists #Orders;
create table #Orders (
	[OrderID] [int] NOT NULL,
	[CustomerID] [int] NOT NULL,
	[SalespersonPersonID] [int] NOT NULL,
	[PickedByPersonID] [int] NULL,
	[ContactPersonID] [int] NOT NULL,
	[BackorderOrderID] [int] NULL,
	[OrderDate] [date] NOT NULL,
	[ExpectedDeliveryDate] [date] NOT NULL,
	[CustomerPurchaseOrderNumber] [nvarchar](20) NULL,
	[IsUndersupplyBackordered] [bit] NOT NULL,
	[Comments] [nvarchar](max) NULL,
	[DeliveryInstructions] [nvarchar](max) NULL,
	[InternalComments] [nvarchar](max) NULL,
	[PickingCompletedWhen] [datetime2](7) NULL,
	[LastEditedBy] [int] NOT NULL,
	[LastEditedWhen] [datetime2](7) NOT NULL
);

insert into #Orders
	select top 50000 * from Sales.Orders;

select CustomerID, MAX(OrderDate) AS LastOrderDate
from #Orders
group by CustomerID;

Conclusion

Table variables are fine for small data sets. Performance issues can happen with large numbers of rows and queries using JOINs or WHERE clauses.
If rewriting the code is a possibility, using a temporary table will improve the query plan.
Temporary table performance is more predictable thanks to column statistics.
The query optimizer has been improved for table variables so in some cases you might get faster execution times just by upgrading to SQL Server 2019.

Cet article SQL Server: table variables performance limitations est apparu en premier sur Blog dbi services.

Setup pgpass for replication

Tue, 2021-07-13 07:18

This script provides functionality to set up passwordless authentication in PostgreSQL for defined server names.
This is very helpful when using replication via repmgr and is not Linux distribution specific; so far it has been used on SLES 12, SLES 15 and RHEL 8 clones.

As with all of my scripts, it prints usage help with the -h parameter:

[root@rocky ~]$ sh pgpass.sh -h
Usage:
pgpass.sh [OPTION]

Options:
         -p                     server where the primary host is running on (required)
         -s                     server where the secondary host is running on (required)
         -h                     prints this help

The parameters -p and -s should be server DNS Names without domain, the domain will be specified within the script.

The script itself:


#!/bin/sh

########################################
#  pgpass setup script                 #
#  Author: Karsten Lenz / 2020.07.13   #
########################################

progName=$(basename $0)
domain="put your domain here"
postgresHome=/var/lib/pgsql
pgpass=$postgresHome/.pgpass
password="put your password here"

function printHelp() {
  printf "Usage:\n"
  printf "${progName} [OPTION]\n\n"
  printf "Options:\n"
  printf "\t -p \t\t\tserver where the primary host is running on (required)\n"
  printf "\t -s \t\t\tserver where the secondary host is running on (required)\n"
  printf "\t -h \t\t\t\tprints this help\n"
}

while getopts p:s:h option 2>/dev/null
do
  case "${option}"
  in
  p) primServer=${OPTARG};;
  s) secdServer=${OPTARG};;
  h) printHelp; exit 2;;
  *) printf "Unsupported option or parameter value missing '$*'\n";
     printf "Run ${progName} -h to print help\n"; exit 1;;
  esac
done

############ Log function ############

logFile=/tmp/pgpass_install.log

function log() {
  echo "$(date +%Y.%m.%d-%H:%M:%S) [$$]$*" | tee -a $logFile
}

if [ ! -f $logFile ]; then
  touch $logFile
  chmod 774 $logFile
  sleep 2
fi

#clean .pgpass
rm -f $pgpass

#set values in .pgpass
log "INFO: #host:port:database:user:password in $pgpass"
echo "#host:port:database:user:password" | tee -a $pgpass
log "INFO: Setting localhost in $pgass"
echo "localhost:5432:*:repmgr:$password" | tee -a $pgpass
log "INFO: Setting 127.0.0.1 in $pgpass"
echo "127.0.0.1:5432:*:repmgr:$password" | tee -a $pgpass
log "INFO: Setting Primary $primServer in $pgpass"
echo "$primServer.$domain:5432:*:repmgr:$password" | tee -a $pgpass
log "INFO: Setting Primary $secdServer in $pgpass"
echo "$secdServer.$domain:5432:*:repmgr:$password" | tee -a $pgpass

#set .pgpass 0600
chmod 0600 $pgpass

The script is used in a customer-built cloud solution as part of the recurring setup of replication using repmgr.

Cet article Setup pgpass for replication est apparu en premier sur Blog dbi services.

SQL Server 2019: What’s new on database scoped configuration options?

Tue, 2021-07-06 08:25

SQL Server 2019 added three new options in sp_configure and sys.configurations.

First, how can we find the differences between these SQL Server versions?
This simple query gives us the number of options and the SQL Server version:

select count(*) as number_of_configurations, @@version FROM sys.database_scoped_configurations

In SQL server 2016, we have 4 parameters for the database configuration:

In SQL server 2017, we have 5 parameters for the database configuration:

In SQL server 2019, we have 23 parameters for the database configuration:

As you can see, we have roughly 5 times more parameters in SQL Server 2019.
Let's go deeper into these parameters with this query:

select * from sys.database_scoped_configurations

In SQL Server 2016, you have the parameters MAXDOP, LEGACY_CARDINALITY_ESTIMATION, PARAMETER_SNIFFING and QUERY_OPTIMIZER_HOTFIXES. In SQL Server 2017, IDENTITY_CACHE is added, and SQL Server 2019 then adds 18 more parameters.

I will go through all parameters in this blog.
The goal is to discover and use these parameters rather than sticking to the old way…
The details of these parameters are available here.

Many of these parameters previously existed only at the instance configuration level and are now also available at the database level, like for example OPTIMIZE_FOR_AD_HOC_WORKLOADS.

If I take this example:
At the server level, we use the following query, and all databases are impacted by the change:

SP_CONFIGURE 'Show Advanced Options', 1
GO
RECONFIGURE
GO
SP_CONFIGURE 'optimize for ad hoc workloads', 1
GO
RECONFIGURE
GO

At the database level, we use the following query:

ALTER DATABASE [Database] SCOPED CONFIGURATION
SET OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON;
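
To check what is currently configured for a database, the same catalog view can be queried. A small sketch, assuming the option shows up under the name OPTIMIZE_FOR_AD_HOC_WORKLOADS:

-- Current value of the option for the current database
select name, value, value_for_secondary
from sys.database_scoped_configurations
where name = 'OPTIMIZE_FOR_AD_HOC_WORKLOADS';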

It’s very important to know that these new options exist at the database level and how to set them.

I hope this little blog will help you to migrate to SQL Server 2019 and use these options at the database level.

Cet article SQL Server 2019: What’s new on database scoped configuration options? est apparu en premier sur Blog dbi services.

Creating simple extensions for PostgreSQL

Fri, 2021-07-02 08:44

When you are using PostgreSQL you should already know that PostgreSQL comes with a set of extensions by default. It might be that you need to install an additional package if you installed PostgreSQL with a package manager to get those extensions; usually it is called something with “contrib” in its name. There is also the PostgreSQL Extension Network which lists a lot of external extensions that might be useful for you. If you can’t find what you’re looking for, you can still write your own extension. How to do that is the topic of this post: we’ll create a very simple extension for PostgreSQL.

Let’s assume you always need a table containing the list of countries with a few properties like this:

postgres=# create table countries ( id int primary key, name text, alpha2 text, alpha3 text );
CREATE TABLE
postgres=# copy countries 
           from program 'curl https://raw.githubusercontent.com/stefangabos/world_countries/master/data/en/countries.csv'
           with (header true, delimiter ',', format csv);
COPY 193

This gives us 193 countries and each country has a few properties:

postgres=# select * from countries limit 5;
 id |    name     | alpha2 | alpha3 
----+-------------+--------+--------
  4 | Afghanistan | af     | afg
  8 | Albania     | al     | alb
 12 | Algeria     | dz     | dza
 20 | Andorra     | ad     | and
 24 | Angola      | ao     | ago
(5 rows)

Packaging this into an extension is quite easy. You need a so-called control file that provides some basic information about your extension. In my case it looks like this, and I’ve placed it where all the other extensions are as well:

postgres@debian11:/u01/app/postgres/product/DEV/db_1/share/extension/ [pg15] pwd
/u01/app/postgres/product/DEV/db_1/share/extension
postgres@debian11:/u01/app/postgres/product/DEV/db_1/share/extension/ [pg15] cat my_cool_extension.control
# my cool extension control file
comment = 'my cool extension for providing a table of countries'
default_version = '1.0'
relocatable = false
trusted = true

Now we need the table and the data and this goes into a simple SQL file:

postgres@debian11:/home/postgres/ [pg15] pg_dump --column-inserts --table=countries postgres > /u01/app/postgres/product/DEV/db_1/share/extension/my_cool_extension--1.0.sql
postgres@debian11:/home/postgres/ [pg15] psql -c "drop table countries" postgres

If you look at the default extensions they all have the first two lines of the SQL files like e.g. this:

-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION ltree_plpython2u" to load this file. \quit

This prevents the file from being piped directly into psql, and we should do the same. Only “create extension” should be used to create extensions:

postgres@debian11:/home/postgres/ [pgdev] sed -i '1s/^/-- complain if script is sourced in psql, rather than via CREATE EXTENSION\n/' /u01/app/postgres/product/DEV/db_1/share/extension/my_cool_extension--1.0.sql
postgres@debian11:/home/postgres/ [pgdev] sed -i '2s/^/\\echo Use "CREATE EXTENSION my_cool_extension" to load this file. \\quit\n/' /u01/app/postgres/product/DEV/db_1/share/extension/my_cool_extension--1.0.sql

That’s all you need to do. Having these two files in place the extension is recognized by PostgreSQL:

postgres=# select * from pg_available_extensions where comment like 'my cool%';
       name        | default_version | installed_version |                       comment                        
-------------------+-----------------+-------------------+------------------------------------------------------
 my_cool_extension | 1.0             |                   | my cool extension for providing a table of countries
(1 row)

Now we can install it as usual:

postgres=# create extension my_cool_extension;
CREATE EXTENSION
postgres=# \dx
                                  List of installed extensions
       Name        | Version |   Schema   |                     Description                      
-------------------+---------+------------+------------------------------------------------------
 my_cool_extension | 1.0     | public     | my cool extension for providing a table of countries
 plpgsql           | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)
postgres=# select * from countries limit 5;
 id |    name     | alpha2 | alpha3 
----+-------------+--------+--------
  4 | Afghanistan | af     | afg
  8 | Albania     | al     | alb
 12 | Algeria     | dz     | dza
 20 | Andorra     | ad     | and
 24 | Angola      | ao     | ago
(5 rows)

Over time you might need to update your extension, so let’s assume in our case that there is a new country. What you need to do is create a new SQL file that makes the necessary changes, and update the default version in the control file:

postgres@debian11:/u01/app/postgres/product/DEV/db_1/share/extension/ [pgdev] pwd
/u01/app/postgres/product/DEV/db_1/share/extension
postgres@debian11:/u01/app/postgres/product/DEV/db_1/share/extension/ [pgdev] cat my_cool_extension--1.0--1.1.sql
-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION my_cool_extension UPDATE TO '1.1'" to load this file. \quit
-- the list of new countries
insert into countries (id,name,alpha2,alpha3) values (-1,'my new country','aaa','aaa');

postgres@debian11:/u01/app/postgres/product/DEV/db_1/share/extension/ [pgdev] grep default my_cool_extension.control 
default_version = '1.1'

Afterwards update the extension using alter extension:

postgres=# select * from pg_available_extensions where name = 'my_cool_extension';
       name        | default_version | installed_version |                       comment                        
-------------------+-----------------+-------------------+------------------------------------------------------
 my_cool_extension | 1.1             | 1.0               | my cool extension for providing a table of countries
(1 row)

postgres=# alter extension my_cool_extension update;
ALTER EXTENSION
postgres=# select count(*) from countries where id = -1;
 count 
-------
     1
(1 row)

All fine, ready to go.
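
One last note: as the countries table is a member of the extension, cleaning up is a single statement, since dropping the extension drops its objects as well:

-- Dropping the extension also drops the countries table it created
drop extension my_cool_extension;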

Cet article Creating simple extensions for PostgreSQL est apparu en premier sur Blog dbi services.

What are these *.ready and *.done files for in PostgreSQL?

Fri, 2021-07-02 06:18

When you run PostgreSQL workloads in production you must have a backup and restore implementation. Even for development instances, which are like production for the developers, a well-tested backup and restore procedure sometimes needs to be in place as well. Community PostgreSQL comes with pg_basebackup to help you create a full, consistent backup of your PostgreSQL cluster. If you want to be able to do point in time recovery (PITR) you need to archive the WAL segments, which can later be used to roll forward from a base backup. WAL segments go to the pg_wal directory, but if you have a closer look into this directory you will see more than just the WAL segments in case you have enabled archiving.

If you look at a freshly initialized PostgreSQL cluster, the content of pg_wal looks like this:

postgres@debian11:/home/postgres/ [pgdev] ls -la $PGDATA/pg_wal
total 16396
drwx------  3 postgres postgres     4096 Jul  4 11:47 .
drwx------ 20 postgres postgres     4096 Jul  4 11:47 ..
-rw-------  1 postgres postgres 16777216 Jul  4 11:52 000000010000000000000001
drwx------  2 postgres postgres     4096 Jul  4 11:47 archive_status

There is one WAL segment and a directory called “archive_status” which is empty right now:

postgres@debian11:/home/postgres/ [pgdev] ls -la $PGDATA/pg_wal/archive_status/
total 8
drwx------ 2 postgres postgres 4096 Jul  4 11:47 .
drwx------ 3 postgres postgres 4096 Jul  4 11:47 ..

To see what’s going on in this directory we need to enable archiving and tell PostgreSQL what to do when a WAL segment needs to be archived:

postgres@debian11:/home/postgres/ [pgdev] mkdir /tmp/archived_wal
postgres@debian11:/home/postgres/ [pgdev] psql -c "alter system set archive_mode='on'" postgres
ALTER SYSTEM
postgres@debian11:/home/postgres/ [pgdev] psql -c "alter system set archive_command='test ! -f /tmp/archived_wal/%f && cp %p /tmp/archived_wal/%f'" postgres
ALTER SYSTEM

Of course you should not archive to /tmp and you also should not use cp, as this does not guarantee that the archived segment really is written to disk. For the scope of this post it is fine like that.

Changing the archive command can be done online, but enabling or disabling archiving requires a restart:

postgres@debian11:/home/postgres/ [pgdev] pg_ctl restart
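
If you are unsure whether a changed parameter still needs a restart to become active, pg_settings exposes a pending_restart flag you can check:

-- pending_restart is true for parameters that have been changed in the
-- configuration but require a server restart to take effect
select name, setting, pending_restart
from pg_settings
where name in ('archive_mode', 'archive_command');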

Let’s see what happens if we force the current segment to be archived:

postgres@debian11:/home/postgres/ [pgdev] psql -c "select pg_switch_wal()" postgres
 pg_switch_wal 
---------------
 0/167AF68
(1 row)
postgres@debian11:/home/postgres/ [pgdev] ls -la /tmp/archived_wal/
total 16392
drwxr-xr-x  2 postgres postgres     4096 Jul  4 12:08 .
drwxrwxrwt 10 root     root         4096 Jul  4 12:07 ..
-rw-------  1 postgres postgres 16777216 Jul  4 12:08 000000010000000000000001

As expected we see the archived segment in the directory we specified in the archive command. In addition PostgreSQL generated a “.done” file in the archive_status directory in pg_wal:

postgres@debian11:/home/postgres/ [pgdev] ls -la $PGDATA/pg_wal/archive_status/
total 8
drwx------ 2 postgres postgres 4096 Jul  4 12:08 .
drwx------ 3 postgres postgres 4096 Jul  4 12:08 ..
-rw------- 1 postgres postgres    0 Jul  4 12:08 000000010000000000000001.done

So, whenever a segment was successfully archived you get the corresponding “.done” file. When do you see “.ready” files then? The answer is simple: Exactly when the opposite happens: Archiving of a segment failed for whatever reason:

postgres@debian11:/home/postgres/ [pgdev] psql -c "alter system set archive_command='/bin/false'" postgres
ALTER SYSTEM
postgres@debian11:/home/postgres/ [pgdev] psql -c "select pg_reload_conf()";
 pg_reload_conf 
----------------
 t
(1 row)
postgres@debian11:/home/postgres/ [pgdev] psql -c "select pg_switch_wal()" postgres
 pg_switch_wal 
---------------
 0/2000160
(1 row)
postgres@debian11:/home/postgres/ [pgdev] ls -la $PGDATA/pg_wal/archive_status/
total 8
drwx------ 2 postgres postgres 4096 Jul  4 12:13 .
drwx------ 3 postgres postgres 4096 Jul  4 12:12 ..
-rw------- 1 postgres postgres    0 Jul  4 12:13 000000010000000000000002.ready

This file will be there until archiving succeeds again:

postgres@debian11:/home/postgres/ [pgdev] psql -c "alter system set archive_command='test ! -f /tmp/archived_wal/%f && cp %p /tmp/archived_wal/%f'" postgres
ALTER SYSTEM
postgres@debian11:/home/postgres/ [pgdev] psql -c "select pg_reload_conf()";
 pg_reload_conf 
----------------
 t
(1 row)
postgres@debian11:/home/postgres/ [pgdev] ls -la $PGDATA/pg_wal/archive_status/
total 8
drwx------ 2 postgres postgres 4096 Jul  4 12:15 .
drwx------ 3 postgres postgres 4096 Jul  4 12:12 ..
-rw------- 1 postgres postgres    0 Jul  4 12:13 000000010000000000000002.done

When you see many “.ready” files, it can either mean that your archive command is currently failing or that the load on the system is so high that archiving cannot keep up.
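
Besides counting “.ready” files, the pg_stat_archiver view gives a quick overview of how archiving is doing:

-- failed_count and last_failed_wal point to a failing archive_command,
-- archived_count and last_archived_wal show whether the archiver keeps up
select archived_count, last_archived_wal, failed_count, last_failed_wal
from pg_stat_archiver;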

Cet article What are these *.ready and *.done files for in PostgreSQL? est apparu en premier sur Blog dbi services.

SQL Server: Fixing another huge MSDB database – 80GB+

Thu, 2021-07-01 07:48

I have blogged several times about unusually large MSDB databases like here.
Most of the time the problem comes from the backup history which is never purged.
This time it’s different. Try to guess, if not the backup history, what can cause MSDB to increase abnormally in size?

Let’s start by looking at the total size of the database:

It’s huge. The database size is almost 90GB. The 10% configuration on data files causes large 8GB autogrowth increments.

Now let’s take a look at the storage used for each table:
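A minimal sketch of a query returning this kind of per-table breakdown (reserved space per table, largest first) could look like this:

use msdb;
go
-- Reserved space per table in msdb, largest tables first
select object_name(object_id) AS table_name,
       SUM(reserved_page_count) * 8 / 1024 AS reserved_mb
from sys.dm_db_partition_stats
group by object_id
order by reserved_mb desc;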

The problem comes from emails sent by the application directly through Database Mail. These emails have a very large HTML body.

Here is some information about the emails stored in this msdb.

select CAST(MIN(send_request_date) AS DATE) AS oldestEmail, Count(*) AS nbRows
from msdb.dbo.sysmail_allitems
oldestEmail nbRows
----------- -----------
2020-04-06  43935

There aren’t a lot of emails. There’s only about one year of emails stored.

select Count(*) AS nbRows, AVG(DATALENGTH(body))/1048576 AS avgSizeMB
from msdb.dbo.sysmail_allitems
where DATALENGTH(body) > 1048576 -- 1MB
nbRows      avgSizeMB
----------- --------------------
297         287

There are almost 300 emails with an HTML body exceeding 1 Megabyte in size. The average size for these 300 emails is 287MB of HTML body.
This HTML size is obviously abnormal for an email, and probably for any HTML file.

select SUM(IIF(sent_status='failed', 0, 1)) AS emailsSent
from msdb.dbo.sysmail_allitems
where DATALENGTH(body) > 1048576 -- 1MB
emailsSent
-----------
0

None of those large emails has been sent successfully…
Looking at the Database Mail log we can see the following error message:

The mail could not be sent to the recipients because of the mail server failure.
Exception Message: Cannot send mails to mail server. 
(Exceeded storage allocation. The server response was: 5.3.4 Message size exceeds fixed maximum message size).

Email size is often restricted to a few Megabytes to limit bandwidth usage.
These large emails have nowhere to go and are just bloating the msdb database.

I decided to delete old emails with a retention of 6 months.
To clean up this table we can use the following procedure: msdb.dbo.sysmail_delete_mailitems_sp

I used the following script to minimize the impact on the transaction log file.

use msdb
go
declare @retentionDate datetime = DATEADD(MONTH, -6, getdate())
declare @oldest_date datetime = (select min(send_request_date) from msdb.dbo.sysmail_allitems)
 
while (@oldest_date  < @retentionDate)
begin
    print 'sysmail_delete_mailitems_sp ' + CAST(@oldest_date AS varchar)
    exec msdb.dbo.sysmail_delete_mailitems_sp @oldest_date
 
    --  Delete by 1 week increments
    set @oldest_date = DATEADD(WEEK, 1, @oldest_date)
 
    checkpoint
	WAITFOR DELAY '00:00:10'
end

 

The application team has been informed so they can fix this email content issue.
I recommend checking the size of your system databases frequently to avoid any weird situation like this.
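
And when the culprit is the more common backup history bloat mentioned at the beginning, a minimal sketch of the purge (here with a 6-month retention) would be:

-- Purge backup/restore history older than 6 months
declare @cutoff datetime = DATEADD(MONTH, -6, GETDATE());
exec msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff;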

Cet article SQL Server: Fixing another huge MSDB database – 80GB+ est apparu en premier sur Blog dbi services.

New predefined roles for PostgreSQL 14

Thu, 2021-07-01 01:37

Time is moving fast and PostgreSQL 14 is already in beta 2. PostgreSQL 14 will ship with a lot of new features and in this post we will look at a smaller one: there are three new predefined roles: pg_read_all_data, pg_write_all_data and pg_database_owner. While it seems obvious what the first two roles are about, the third one might sound strange at the beginning. Let’s have a look.

We’ll start with a sample setup, as always:

postgres=# create user u1 with login password 'u1';
CREATE ROLE
postgres=# create user u2 with login password 'u2';
CREATE ROLE
postgres=# create user u3 with login password 'u3';
CREATE ROLE
postgres=# create schema s1;
CREATE SCHEMA
postgres=# create schema s2;
CREATE SCHEMA
postgres=# alter schema s1 owner to u1;
ALTER SCHEMA
postgres=# alter schema s2 owner to u2;
ALTER SCHEMA
postgres=# \c postgres u1
You are now connected to database "postgres" as user "u1".
postgres=> create table s1.t1 ( a int, b text, c date);
CREATE TABLE
postgres=> insert into s1.t1 select i,i::text,now() from generate_series(1,1000000) i;
INSERT 0 1000000
postgres=> \c postgres u2
You are now connected to database "postgres" as user "u2".
postgres=> create table s2.t1 ( a int, b text, c date);
CREATE TABLE
postgres=> insert into s2.t1 select i,i::text,now() from generate_series(1,1000000) i;
INSERT 0 1000000
postgres=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".

We’ve created three users (u1,u2,u3) and two schemas (s1,s2). The s1 schema is owned by u1 and the s2 schema is owned by u2. Both users (u1,u2) have a table with some data in their schemas. The u3 user currently does not have anything.

When we connect as user u3 we of course do not have any permissions on the schemas we created above:

postgres=# \c postgres u3
You are now connected to database "postgres" as user "u3".
postgres=> select count(*) from s1.t1;
ERROR:  permission denied for schema s1
LINE 1: select count(*) from s1.t1;
                             ^
postgres=> select count(*) from s2.t1;
ERROR:  permission denied for schema s2
LINE 1: select count(*) from s2.t1;
                             ^
postgres=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".

Granting pg_read_all_data to u3 gives it read access to all objects:

postgres=# grant pg_read_all_data to u3;
GRANT ROLE
postgres=# \c postgres u3
You are now connected to database "postgres" as user "u3".
postgres=> select count(*) from s1.t1;
  count  
---------
 1000000
(1 row)

postgres=> select count(*) from s2.t1;
  count  
---------
 1000000
(1 row)

Writing data or creating objects is not yet possible:

postgres=> select current_user;
 current_user 
--------------
 u3
(1 row)

postgres=> create table s1.t2(a int);
ERROR:  permission denied for schema s1
LINE 1: create table s1.t2(a int);
                     ^
postgres=> create table s2.t2(a int);
ERROR:  permission denied for schema s2
LINE 1: create table s2.t2(a int);

postgres=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".

This is what pg_write_all_data seems to be about:

postgres=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# grant pg_write_all_data to u3;
GRANT ROLE
postgres=# \c postgres u3
You are now connected to database "postgres" as user "u3".
postgres=> create table s1.t2(a int);
ERROR:  permission denied for schema s1
LINE 1: create table s1.t2(a int);
                     ^
postgres=> create table s2.t2(a int);
ERROR:  permission denied for schema s2
LINE 1: create table s2.t2(a int);

This role does not grant permission to create new objects, but it grants permission to write data into existing relations:

postgres=> select current_user;
 current_user 
--------------
 u3
(1 row)

postgres=> insert into s1.t1 values (-1,'xxx',now() );
INSERT 0 1
postgres=> insert into s2.t1 values (-1,'xxx',now() );
INSERT 0 1

As roles are global objects, this works for all databases in a cluster:

postgres=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# create database db1;
CREATE DATABASE
postgres=# \c db1
You are now connected to database "db1" as user "postgres".
db1=# create schema s1;
CREATE SCHEMA
db1=# create table s1.t1(a int);
CREATE TABLE
db1=# insert into s1.t1 values(1);
INSERT 0 1
db1=# \c db1 u3
You are now connected to database "db1" as user "u3".
db1=> select * from s1.t1;
 a 
---
 1
(1 row)

db1=> insert into s1.t1 values(2);
INSERT 0 1

This is really nice if you want to give a specific user or role read and/or write permissions for all objects in all databases.
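
If you need to take those blanket permissions back, revoking the predefined roles works the same way as granting them:

-- Revoking the predefined roles removes the global read/write access again
revoke pg_read_all_data from u3;
revoke pg_write_all_data from u3;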

Finally the third predefined role: pg_database_owner. What is this about? Let’s grant that role to the u3 user:

db1=> \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# grant pg_database_owner to u3;
ERROR:  role "pg_database_owner" cannot have explicit members

A new predefined role that cannot be granted explicitly? Looking at the role definition, it cannot do much:

postgres=# \x
Expanded display is on.
postgres=# select * from pg_roles where rolname = 'pg_database_owner';
-[ RECORD 1 ]--+------------------
rolname        | pg_database_owner
rolsuper       | f
rolinherit     | t
rolcreaterole  | f
rolcreatedb    | f
rolcanlogin    | f
rolreplication | f
rolconnlimit   | -1
rolpassword    | ********
rolvaliduntil  | 
rolbypassrls   | f
rolconfig      | 
oid            | 6171

What is it there for then? The answer is here.
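
In short, pg_database_owner has exactly one implicit member: the owner of the current database. A small sketch of how it can be used is to grant privileges to it, so that whoever owns the database gets them automatically:

-- Privileges granted to pg_database_owner end up with the owner of the
-- current database, whoever that currently is
grant select on s1.t1 to pg_database_owner;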

Cet article New predefined roles for PostgreSQL 14 est apparu en premier sur Blog dbi services.

Documentum – E_INTERNAL_SERVER_ERROR on D2-REST Product page related to GUAVA libraries after WebLogic PSU

Wed, 2021-06-30 13:30

At a customer, the D2-REST (16.5.1) application hosted on WebLogic Server 12c started showing 500 Internal Server Errors after a customer release that included many changes. The error was rather simple to replicate since opening the D2-REST Product info page was sufficient (https://<host>/D2-REST/product-info). The URL was returning the following:

At the same time, on the logs:

2021-04-26 06:46:20,340 UTC [ERROR] ([ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.documentum.rest.util.LogHelper        : LogId: 9b360f83-335a-413e-87e3-481ba5cbf168, Status: 500, code: E_INTERNAL_SERVER_ERROR, message: An internal server error occurs.
org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.NoSuchMethodError: com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:982)
        at com.emc.documentum.rest.servlet.RestDispatcherServlet.doDispatch(RestDispatcherServlet.java:33)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
        at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
        at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:286)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:260)
        at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:137)
        at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:350)
        at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:25)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at com.emc.documentum.rest.filter.ApplicationFilter.doFilter(ApplicationFilter.java:33)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at com.emc.documentum.d2.rest.filter.AppValidationFilter.doFilter(AppValidationFilter.java:35)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:66)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:214)
        at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:177)
        at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:347)
        at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:263)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at com.emc.documentum.d2.rest.filter.AppInfoFilter.doFilter(AppInfoFilter.java:39)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at com.emc.documentum.rest.security.filter.RepositoryNamingFilter.doFilter(RepositoryNamingFilter.java:40)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at com.emc.documentum.rest.filter.RestCorsFilter.doFilterInternal(RestCorsFilter.java:47)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at com.emc.documentum.rest.filter.CompressionFilter.doFilter(CompressionFilter.java:73)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at com.emc.documentum.rest.log.MessageLoggingFilter.doFilter(MessageLoggingFilter.java:69)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at com.emc.documentum.rest.security.filter.ExceptionHandlerFilter.doFilterInternal(ExceptionHandlerFilter.java:31)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:78)
        at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3706)
        at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3672)
        at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:344)
        at weblogic.security.service.SecurityManager.runAsForUserCode(SecurityManager.java:197)
        at weblogic.servlet.provider.WlsSecurityProvider.runAsForUserCode(WlsSecurityProvider.java:203)
        at weblogic.servlet.provider.WlsSubjectHandle.run(WlsSubjectHandle.java:71)
        at weblogic.servlet.internal.WebAppServletContext.doSecuredExecute(WebAppServletContext.java:2443)
        at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2291)
        at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2269)
        at weblogic.servlet.internal.ServletRequestImpl.runInternal(ServletRequestImpl.java:1705)
        at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1665)
        at weblogic.servlet.provider.ContainerSupportProviderImpl$WlsRequestExecutor.run(ContainerSupportProviderImpl.java:272)
        at weblogic.invocation.ComponentInvocationContextManager._runAs(ComponentInvocationContextManager.java:352)
        at weblogic.invocation.ComponentInvocationContextManager.runAs(ComponentInvocationContextManager.java:337)
        at weblogic.work.LivePartitionUtility.doRunWorkUnderContext(LivePartitionUtility.java:57)
        at weblogic.work.PartitionUtility.runWorkUnderContext(PartitionUtility.java:41)
        at weblogic.work.SelfTuningWorkManagerImpl.runWorkUnderContext(SelfTuningWorkManagerImpl.java:652)
        at weblogic.work.ExecuteThread.execute(ExecuteThread.java:420)
        at weblogic.work.ExecuteThread.run(ExecuteThread.java:360)
Caused by: java.lang.NoSuchMethodError: com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
        at com.emc.documentum.d2fs.controller.D2AppInfoController.attribute(D2AppInfoController.java:160)
        at com.emc.documentum.d2fs.controller.D2AppInfoController.getProductInfo(D2AppInfoController.java:94)
        at com.emc.documentum.d2fs.controller.D2AppInfoController.get(D2AppInfoController.java:65)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
        at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:849)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:760)
        at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
        ... 72 common frames omitted
2021-04-26 06:46:20,414 UTC [INFO ] ([ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.documentum.rest.util.LogHelper        : XMLOutputFactory loaded com.ctc.wstx.stax.WstxOutputFactory.
2021-04-26 06:46:20,416 UTC [INFO ] ([ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.documentum.rest.util.LogHelper        : XMLInputFactory loaded com.ctc.wstx.stax.WstxInputFactory.
2021-04-26 06:46:20,451 UTC [INFO ] ([ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.documentum.rest.util.LogHelper        : Class com.emc.documentum.rest.config.DataBindingRuntime addLastPropertySource rest-api-data-binding.properties.

 

The recently deployed release contained many things but, looking into it in detail, the most promising suspect was the Oracle WebLogic Server PSU (+coherence patch) from April 2021. Based on the logs, this looked like a GUAVA (Google core libraries for Java) related issue. Usually, the D2-REST application would be using its own application libraries but it might happen that, for some security reasons, the configuration is changed to force WebLogic to use the Oracle provided ones instead. This would be in order to keep the third-party libraries up-to-date, as much as possible, to reduce the potential security issues. At this customer, rolling back the PSU would have been a rather important security problem. After looking into the details, it was clear that the method mentioned above has been deleted in GUAVA 21.0 (deprecated in 20.0). On the other hand, D2 16.5.1 comes with GUAVA 13.0.1 by default and D2-REST (+ D2-Smartview) comes with GUAVA 20.0. As part of the April PSU, this library was probably upgraded to 21.0 (I didn’t find any confirmation). Therefore, I tried to force D2-REST to re-use its internal GUAVA libraries instead (while keeping the others from WebLogic) by adding a new line inside the “<prefer-application-packages>” section:

[weblogic@wsd2rest-0 ~]$ cd $APPLICATIONS/D2-REST/WEB-INF/
[weblogic@wsd2rest-0 WEB-INF]$ cat weblogic.xml
<?xml version="1.0" encoding="UTF-8"?>

<weblogic-web-app>
  ...
  <container-descriptor>
    <!--prefer-web-inf-classes>true</prefer-web-inf-classes-->
    <prefer-application-packages>
      <package-name>org.slf4j</package-name>
      <package-name>com.google.common.*</package-name>
    </prefer-application-packages>
    <!--show-archived-real-path-enabled>true</show-archived-real-path-enabled-->
  </container-descriptor>
  ...
</weblogic-web-app>
[weblogic@wsd2rest-0 WEB-INF]$

 

Adding the “<package-name>com.google.common.*</package-name>” line above forces WebLogic to load the application-specific packages instead of its own. After a Managed Server restart, the issue was gone, which confirms that the April PSU was the culprit:

Since we force WebLogic not to use its own jar files for some Google libraries, potential security issues related to these jar files are obviously re-opened… However, at some point, you have a choice to make between being secure but having a non-working application OR potentially having some flaws but a working application. It is obviously possible to go one step further and, instead of using “<package-name>com.google.common.*</package-name>”, which is rather generic, use a more refined definition of the package so that the affected scope is smaller.

The same applies to D2-Smartview as well since it is also a REST client, so it relies heavily on such packages…

Cet article Documentum – E_INTERNAL_SERVER_ERROR on D2-REST Product page related to GUAVA libraries after WebLogic PSU est apparu en premier sur Blog dbi services.

Documentum – dmqdocbroker/iapi/idql not working because of dbor.properties.lck

Wed, 2021-06-30 12:10

Have you ever faced an issue where dmqdocbroker, iapi, idql and the likes aren’t able to communicate at all with any Docbroker (connection broker)? Here, I’m not talking about a potentially wrong hostname, port or connect mode, which might prevent you from reaching a Docbroker if it’s not configured properly, because that will still most likely reply to you with an error message… I’m really talking about the utility/binaries not being able to communicate anymore; it’s like all messages are sent to the void and nothing will ever respond (is that a black hole I’m seeing?)!

Earlier this month, I suddenly had this behavior at one of our customers, on two out of dozens of Documentum Servers. Everything seemed to be up&running, all the processes were there:

[dmadmin@cs-0 ~]$ ps -ef
UID      PID PPID C  STIME TTY       TIME CMD
dmadmin 7005    1 0  14:11 ?     00:00:00 ./dmdocbroker -port 1489 -init_file $DOCUMENTUM/dba/Docbroker.ini
dmadmin 7014    1 0  14:11 ?     00:00:00 ./dmdocbroker -port 1487 -init_file $DOCUMENTUM/dba/DocbrokerExt.ini
dmadmin 7077    1 0  14:11 ?     00:00:07 ./documentum -docbase_name GR_REPO -security acl -init_file $DOCUMENTUM/dba/config/GR_REPO/server.ini
dmadmin 7087    1 0  14:11 ?     00:00:07 ./documentum -docbase_name REPO1 -security acl -init_file $DOCUMENTUM/dba/config/REPO1/server.ini
dmadmin 7100 7077 0  14:11 ?     00:00:00 $DM_HOME/bin/mthdsvr master 0xfd7308a8, 0x7f9f93d81000, 0x223000 1000726 5 7077 GR_REPO $DOCUMENTUM/dba/log
dmadmin 7101 7100 0  14:11 ?     00:00:04 $DM_HOME/bin/mthdsvr worker 0xfd7308a8, 0x7f9f93d81000, 0x223000 1000726 5 0 GR_REPO $DOCUMENTUM/dba/log
dmadmin 7102 7087 0  14:11 ?     00:00:00 $DM_HOME/bin/mthdsvr master 0xfd7308be, 0x7fe2fe3ac000, 0x223000 1000727 5 7087 REPO1 $DOCUMENTUM/dba/log
dmadmin 7121 7102 0  14:11 ?     00:00:03 $DM_HOME/bin/mthdsvr worker 0xfd7308be, 0x7fe2fe3ac000, 0x223000 1000727 5 0 REPO1 $DOCUMENTUM/dba/log
dmadmin 7122 7100 0  14:11 ?     00:00:03 $DM_HOME/bin/mthdsvr worker 0xfd7308a8, 0x7f9f93d81000, 0x223000 1000726 5 1 GR_REPO $DOCUMENTUM/dba/log
dmadmin 7123 7077 0  14:11 ?     00:00:00 ./documentum -docbase_name GR_REPO -security acl -init_file $DOCUMENTUM/dba/config/GR_REPO/server.ini
dmadmin 7124 7077 0  14:11 ?     00:00:00 ./documentum -docbase_name GR_REPO -security acl -init_file $DOCUMENTUM/dba/config/GR_REPO/server.ini
dmadmin 7144 7102 0  14:11 ?     00:00:04 $DM_HOME/bin/mthdsvr worker 0xfd7308be, 0x7fe2fe3ac000, 0x223000 1000727 5 1 REPO1 $DOCUMENTUM/dba/log
dmadmin 7148 7087 0  14:11 ?     00:00:00 ./documentum -docbase_name REPO1 -security acl -init_file $DOCUMENTUM/dba/config/REPO1/server.ini
dmadmin 7149 7087 0  14:11 ?     00:00:00 ./documentum -docbase_name REPO1 -security acl -init_file $DOCUMENTUM/dba/config/REPO1/server.ini
dmadmin 7165 7100 0  14:11 ?     00:00:04 $DM_HOME/bin/mthdsvr worker 0xfd7308a8, 0x7f9f93d81000, 0x223000 1000726 5 2 GR_REPO $DOCUMENTUM/dba/log
dmadmin 7166 7077 0  14:11 ?     00:00:00 ./documentum -docbase_name GR_REPO -security acl -init_file $DOCUMENTUM/dba/config/GR_REPO/server.ini
dmadmin 7167 7102 0  14:11 ?     00:00:03 $DM_HOME/bin/mthdsvr worker 0xfd7308be, 0x7fe2fe3ac000, 0x223000 1000727 5 2 REPO1 $DOCUMENTUM/dba/log
dmadmin 7168 7087 0  14:11 ?     00:00:00 ./documentum -docbase_name REPO1 -security acl -init_file $DOCUMENTUM/dba/config/REPO1/server.ini
dmadmin 7169 7100 0  14:11 ?     00:00:04 $DM_HOME/bin/mthdsvr worker 0xfd7308a8, 0x7f9f93d81000, 0x223000 1000726 5 3 GR_REPO $DOCUMENTUM/dba/log
dmadmin 7187 7077 0  14:11 ?     00:00:00 ./documentum -docbase_name GR_REPO -security acl -init_file $DOCUMENTUM/dba/config/GR_REPO/server.ini
dmadmin 7190 7102 0  14:11 ?     00:00:03 $DM_HOME/bin/mthdsvr worker 0xfd7308be, 0x7fe2fe3ac000, 0x223000 1000727 5 3 REPO1 $DOCUMENTUM/dba/log
dmadmin 7194 7087 0  14:11 ?     00:00:00 ./documentum -docbase_name REPO1 -security acl -init_file $DOCUMENTUM/dba/config/REPO1/server.ini
dmadmin 7210 7100 0  14:11 ?     00:00:03 $DM_HOME/bin/mthdsvr worker 0xfd7308a8, 0x7f9f93d81000, 0x223000 1000726 5 4 GR_REPO $DOCUMENTUM/dba/log
dmadmin 7213 7077 0  14:11 ?     00:00:00 ./documentum -docbase_name GR_REPO -security acl -init_file $DOCUMENTUM/dba/config/GR_REPO/server.ini
dmadmin 7215 7102 0  14:11 ?     00:00:04 $DM_HOME/bin/mthdsvr worker 0xfd7308be, 0x7fe2fe3ac000, 0x223000 1000727 5 4 REPO1 $DOCUMENTUM/dba/log
dmadmin 7225 7087 0  14:11 ?     00:00:00 ./documentum -docbase_name REPO1 -security acl -init_file $DOCUMENTUM/dba/config/REPO1/server.ini
dmadmin 7334    1 0  14:12 ?     00:00:00 /bin/sh $JMS_HOME/server/startMethodServer.sh
dmadmin 7336 7334 0  14:12 ?     00:00:00 /bin/sh $JMS_HOME/bin/standalone.sh
dmadmin 7447 7336 21 14:12 ?     00:02:57 $JAVA_HOME/bin/java -D[Standalone] -server -XX:+UseCompressedOops -server -XX:+UseCompressedOops -Xms8g -Xmx8g -XX:MaxMetaspaceSize=512m -XX
dmadmin 7695 7077 0  14:12 ?     00:00:04 ./dm_agent_exec -enable_ha_setup 1 -docbase_name GR_REPO.GR_REPO -docbase_owner dmadmin -sleep_duration 0
dmadmin 7698 7087 0  14:12 ?     00:00:04 ./dm_agent_exec -enable_ha_setup 1 -docbase_name REPO1.REPO1 -docbase_owner dmadmin -sleep_duration 0
dmadmin 7908 7077 0  14:13 ?     00:00:00 ./documentum -docbase_name GR_REPO -security acl -init_file $DOCUMENTUM/dba/config/GR_REPO/server.ini
dmadmin 7918 7087 0  14:13 ?     00:00:00 ./documentum -docbase_name REPO1 -security acl -init_file $DOCUMENTUM/dba/config/REPO1/server.ini
dmadmin 8269 7077 0  14:21 ?     00:00:00 ./documentum -docbase_name GR_REPO -security acl -init_file $DOCUMENTUM/dba/config/GR_REPO/server.ini
dmadmin 8270 7077 0  14:21 ?     00:00:00 ./documentum -docbase_name GR_REPO -security acl -init_file $DOCUMENTUM/dba/config/GR_REPO/server.ini
dmadmin 8327 6370 0  14:27 pts/1 00:00:00 ps -ef
[dmadmin@cs-0 ~]$

 

However, I could see the communication issues by looking at the Repository log because it would show that the AgentExec was actually not connected, even after almost 20 minutes:

[dmadmin@cs-0 ~]$ cd $DOCUMENTUM/dba/log/REPO1/agentexec/
[dmadmin@cs-0 agentexec]$ date
Mon Apr 12 14:29:03 UTC 2021
[dmadmin@cs-0 agentexec]$
[dmadmin@cs-0 agentexec]$ tail -8 ../../REPO1.log
2021-04-12T14:11:56.453407      7087[7087]      0000000000000000        [DM_WORKFLOW_I_AGENT_START]info:  "Workflow agent worker (pid : 7194, session 010f12345000000c) is started sucessfully."
2021-04-12T14:11:57.455899      7087[7087]      0000000000000000        [DM_SERVER_I_START]info:  "Sending Initial Docbroker check-point "

2021-04-12T14:11:57.547764      7087[7087]      0000000000000000        [DM_MQ_I_DAEMON_START]info:  "Message queue daemon (pid : 7225, session 010f123450000456) is started sucessfully."
2021-04-12T14:11:58.348442      7223[7223]      010f123450000003        [DM_DOCBROKER_I_PROJECTING]info:  "Sending information to Docbroker located on host (cs-0.domain.com) with port (1490).  Information: (Config(REPO1), Proximity(1), Status(Open), Dormancy Status(Active))."
2021-04-12T14:11:58.661666      7223[7223]      010f123450000003        [DM_DOCBROKER_I_PROJECTING]info:  "Sending information to Docbroker located on host (cs-0.domain.com) with port (1488).  Information: (Config(REPO1), Proximity(1), Status(Open), Dormancy Status(Active))."
2021-04-12T14:11:58.959490      7223[7223]      010f123450000003        [DM_DOCBROKER_I_PROJECTING]info:  "Sending information to Docbroker located on host (cs-1.domain.com) with port (1490).  Information: (Config(REPO1), Proximity(2), Status(Open), Dormancy Status(Active))."
Mon Apr 12 14:12:55 2021 [INFORMATION] [AGENTEXEC 7698] Detected during program initialization: Version: 16.4.0200.0256  Linux64
[dmadmin@cs-0 agentexec]$
[dmadmin@cs-0 agentexec]$ # Previous startup from the AgentExec logs showing that it didn't start yet
[dmadmin@cs-0 agentexec]$ tail -2 agentexec.log
Sat Apr 10 19:20:30 2021 [INFORMATION] [LAUNCHER 23135] Detected during program initialization: Version: 16.4.0200.0256 Linux64
Sun Apr 11 19:20:26 2021 [INFORMATION] [LAUNCHER 2890] Detected during program initialization: Version: 16.4.0200.0256 Linux64
[dmadmin@cs-0 agentexec]$

 

The interesting part is that the Repositories have all been started properly and projected to the Docbroker. However, any client from the Documentum Server locally wouldn’t be able to connect to the Docbroker. Even more interesting, this was actually a HA environment with 2 CS. The Documentum Server hosting the Primary CS (I will call it cs-0) had the issue while the Documentum Server hosting the Remote CS (I will call it cs-1) had no problem. Executing the dmqdocbroker on the cs-0 to ping the Docbroker of the cs-0 never gave a response, however doing the exact same command to ping the Docbroker of the cs-0 but from the cs-1 host did work without any problem and showed the correct projection of the repositories:

## on cs-0 (Documentum Server hosting the PCS)
[dmadmin@cs-0 ~]$ hostname -f
cs-0.domain.com
[dmadmin@cs-0 ~]$
[dmadmin@cs-0 ~]$ time dmqdocbroker -t cs-0.domain.com -p 1489 -c ping
^C
real 0m53.513s
user 0m6.132s
sys 0m0.578s
[dmadmin@cs-0 ~]$
[dmadmin@cs-0 ~]$ time echo quit | iapi REPO1.REPO1 -Udmadmin -Pxxx
^C
real 0m46.431s
user 0m6.241s
sys 0m0.575s
[dmadmin@cs-0 ~]$
[dmadmin@cs-0 ~]$ time echo quit | iapi REPO1.cs-1_REPO1 -Udmadmin -Pxxx
^C
real 0m35.694s
user 0m6.163s
sys 0m0.582s
[dmadmin@cs-0 ~]$

## on cs-1 (Documentum Server hosting the RCS)
[dmadmin@cs-1 ~]$ hostname -f
cs-1.domain.com
[dmadmin@cs-1 ~]$
[dmadmin@cs-1 ~]$ time dmqdocbroker -t cs-0.domain.com -p 1489 -c ping
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0200.0080
Using specified port: 1489
Successful reply from docbroker at host (cs-0) on port(1490) running software version (16.4.0200.0256 Linux64).

real 0m3.499s
user 0m6.950s
sys 0m0.469s
[dmadmin@cs-1 ~]$
[dmadmin@cs-1 ~]$ time echo quit | iapi REPO1.cs-1_REPO1 -Udmadmin -Pxxx

OpenText Documentum iapi - Interactive API interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0200.0080

Connecting to Server using docbase REPO1.cs-1_REPO1
[DM_SESSION_I_SESSION_START]info: "Session 010f1234501b635c started for user dmadmin."

Connected to OpenText Documentum Server running Release 16.4.0200.0256 Linux64.Oracle
Session id is s0
API> Bye

real 0m5.032s
user 0m7.401s
sys 0m0.487s
[dmadmin@cs-1 ~]$
[dmadmin@cs-1 ~]$ time echo quit | iapi REPO1.REPO1 -Udmadmin -Pxxx

OpenText Documentum iapi - Interactive API interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0200.0080

Connecting to Server using docbase REPO1.REPO1
[DM_SESSION_I_SESSION_START]info: "Session 010f1234501b6506 started for user dmadmin."

Connected to OpenText Documentum Server running Release 16.4.0200.0256 Linux64.Oracle
Session id is s0
API> Bye

real 0m5.315s
user 0m7.976s
sys 0m0.515s
[dmadmin@cs-1 ~]$

 

This shows that the issue isn’t the Docbroker or the Repositories themselves but rather the utility/binaries present on the cs-0 that cannot open communication channels with the local Docbroker, for some reasons… Even after setting debugging on the Docbroker, I could see communications when the dmqdocbroker utility was used on the cs-1 host but nothing was showing-up if the same command was used on the cs-0 host instead. You can enable some logs for the Docbroker by adding “trace=true” into the Docbroker.ini file and you can also add some other traces by setting the following environment variables (value can be 1 or 10 for example) and then restart the Docbroker: “export DM_DOCBROKER_TRACE=1; export DM_DEBUG_BROKER=1; export DM_TRANS_LOG=1“. Additionally, you can also add options to the launch script, just like for the Repository part: “-odocbroker_trace -onettrace_all_option -oxxx“.

Unfortunately, the dmqdocbroker utility uses the dmawk binary and the iapi/idql are also binaries, so it’s rather difficult to debug further without the source code… After some testing/debugging, I found something rather hard to believe… All the binaries of the Documentum Server looked OK, there had been no changes done in the past few weeks and the files were the same (same hash) as on the cs-1 for example. As you probably know, dmqdocbroker/iapi/idql will use the dfc.properties from the folder “$DOCUMENTUM_SHARED/config/” (with $DOCUMENTUM_SHARED=$DOCUMENTUM forced, starting in 16.4). Therefore, I have been looking into this folder for anything that might disrupt the proper behavior of the utility/binaries. All the files in this folder were 100% identical between cs-0 and cs-1, except for the encrypted password of the dm_bof_registry as well as the dfc.keystore since both of these are generated once. This would mean that the issue wouldn’t be there, but it was. I started looking into other areas to try to find the root cause but nothing was working. Then, I came back to the config folder and simply tried to empty it… Somehow, the dmqdocbroker was working again, magically! I mean, it printed many errors because the files log4j.properties, dfc.properties and dfc.keystore weren’t there, but it replied something… What to do then? Well, I went step by step, putting back the files one by one, as they are supposed to be, and then executing the dmqdocbroker again to see if it stops working.

The files dfc.properties, log4j.properties, dfcfull.properties, dfc.keystore and all the cache folders were restored properly and the dmqdocbroker was still working without any problem… So what the hell? That’s more or less all of the files, isn’t it? True, that’s all the files, minus the dbor ones: dbor.properties and dbor.properties.lck. At this customer, these files are empty because no configuration was needed. It would be very hard to believe that this could be the issue, right? Well, have a look for yourself:

[dmadmin@cs-0 ~]$ cd $DOCUMENTUM_SHARED/config/
[dmadmin@cs-0 config]$ ls -l
total 140
drwxr-x--- 7 dmadmin dmadmin  4096 Jul 26 2020 ServerApps
drwxr-x--- 9 dmadmin dmadmin  4096 Jul 26 2020 Shared
drwxr-x--- 7 dmadmin dmadmin  4096 Jul 26 2020 acs
drwxr-x--- 7 dmadmin dmadmin  4096 Jul 26 2020 bpm
-rwxr-x--- 1 dmadmin dmadmin     0 Jul 22 2020 dbor.properties
-rw-rw-r-- 1 dmadmin dmadmin     0 Jul 26 2020 dbor.properties.lck
-rw-rw-r-- 1 dmadmin dmadmin  2152 Jul 26 2020 dfc.keystore
-rw-rw-r-- 1 dmadmin dmadmin   481 Jul 26 2020 dfc.properties
-rw-rw---- 1 dmadmin dmadmin    70 Jul 22 2020 dfc.properties.bak.0
-rwxr-x--- 1 dmadmin dmadmin   242 Jul 26 2020 dfc.properties.bak.1
-rw-rw-r-- 1 dmadmin dmadmin   271 Jul 26 2020 dfc.properties.bak.2
-rw-rw-r-- 1 dmadmin dmadmin   323 Jul 26 2020 dfc.properties.bak.3
-rw-rw-r-- 1 dmadmin dmadmin   481 Jul 26 2020 dfc.properties.bak.4
-rw-rw-r-- 1 dmadmin dmadmin   482 Jul 26 2020 dfc.properties.bak.5
-rwxrwx--- 1 dmadmin dmadmin 79268 Jul 22 2020 dfcfull.properties
-rwxr-x--- 1 dmadmin dmadmin  1242 Jul 26 2020 log4j.properties
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ # With the initial content, dmqdocbroker isn't working
[dmadmin@cs-0 config]$ date; time dmqdocbroker -t cs-0.domain.com -p 1489 -c ping; date
Wed Apr 14 07:43:28 UTC 2021
^C
real    0m22.718s
user    0m6.401s
sys     0m0.853s
Wed Apr 14 07:43:51 UTC 2021
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ mkdir test
[dmadmin@cs-0 config]$ mv * test/
mv: cannot move 'test' to a subdirectory of itself, 'test/test'
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ ls -l
total 4
drwxr-x--- 6 dmadmin dmadmin 4096 Apr 14 07:44 test
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ # With the folder empty, dmqdocbroker is "working" (errors but expected ones)
[dmadmin@cs-0 config]$ date; time dmqdocbroker -t cs-0.domain.com -p 1489 -c ping; date
Wed Apr 14 07:45:15 UTC 2021
0 [main] ERROR com.documentum.fc.common.impl.logging.LoggingConfigurator  - Problem locating log4j configuration
1 [main] WARN com.documentum.fc.common.impl.logging.LoggingConfigurator  - Using default log4j configuration
3 [main] ERROR com.documentum.fc.common.impl.preferences.PreferencesManager  - [DFC_PREFERENCE_LOAD_FAILED] Failed to load persistent preferences from null
java.io.FileNotFoundException: dfc.properties
        at com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:378)
        at com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:329)
        at com.documentum.fc.common.impl.preferences.PreferencesManager.<init>(PreferencesManager.java:37)
        ...
2862 [main] ERROR com.documentum.fc.client.security.impl.IdentityManager  - [DFC_SECURITY_IDENTITY_INIT] no identity initialization or incomplete identity initialization
DfException:: THREAD: main; MSG: ; ERRORCODE: ff; NEXT: null
        at com.documentum.fc.client.security.impl.JKSKeystoreUtil.creteNewKeystoreFile(JKSKeystoreUtil.java:425)
        at com.documentum.fc.client.security.impl.JKSKeystoreUtil.createNewKeystore(JKSKeystoreUtil.java:209)
        at com.documentum.fc.client.security.impl.DfcIdentityKeystore.applyDfcInitPolicy(DfcIdentityKeystore.java:95)
        ...
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0200.0080
Using specified port: 1489
Successful reply from docbroker at host (cs-0) on port(1490) running software version (16.4.0200.0256 Linux64).

real    0m3.763s
user    0m6.265s
sys     0m0.672s
Wed Apr 14 07:45:19 UTC 2021
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ ls -l
total 12
drwxr-x--- 8 dmadmin dmadmin 4096 Apr 14 07:45 documentum
-rw-r----- 1 dmadmin dmadmin 3245 Apr 14 07:45 log4j.log
drwxr-x--- 6 dmadmin dmadmin 4096 Apr 14 07:44 test
-rw-r----- 1 dmadmin dmadmin    0 Apr 14 07:45 trace.log
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ rm -rf documentum/ log4j.log  trace.log
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ mv test/log4j.properties ./
[dmadmin@cs-0 config]$ mv test/dfc.properties ./
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ # With the folder empty except for log4j.properties and dfc.properties files, dmqdocbroker is working
[dmadmin@cs-0 config]$ date; time dmqdocbroker -t cs-0.domain.com -p 1489 -c ping; date
Wed Apr 14 07:47:17 UTC 2021
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0200.0080
Using specified port: 1489
Successful reply from docbroker at host (cs-0) on port(1490) running software version (16.4.0200.0256 Linux64).

real    0m4.280s
user    0m8.161s
sys     0m0.729s
Wed Apr 14 07:47:21 UTC 2021
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ ls -l
total 20
drwxr-x--- 8 dmadmin dmadmin 4096 Apr 14 07:47 Shared
-rw-r----- 1 dmadmin dmadmin 2153 Apr 14 07:47 dfc.keystore
-rw-rw-r-- 1 dmadmin dmadmin  481 Jul 26  2020 dfc.properties
-rwxr-x--- 1 dmadmin dmadmin 1242 Jul 26  2020 log4j.properties
drwxr-x--- 6 dmadmin dmadmin 4096 Apr 14 07:46 test
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ rm -rf Shared/ dfc.keystore
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ mv test/dfc.keystore ./
[dmadmin@cs-0 config]$ mv test/dfcfull.properties ./
[dmadmin@cs-0 config]$ mv test/dfc.properties* ./
[dmadmin@cs-0 config]$ mv test/log4j.properties* ./
[dmadmin@cs-0 config]$ mv test/ServerApps ./
[dmadmin@cs-0 config]$ mv test/Shared ./
[dmadmin@cs-0 config]$ mv test/acs ./
[dmadmin@cs-0 config]$ mv test/bpm ./
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ ls -l
total 140
drwxr-x--- 7 dmadmin dmadmin  4096 Jul 26  2020 ServerApps
drwxr-x--- 9 dmadmin dmadmin  4096 Jul 26  2020 Shared
drwxr-x--- 7 dmadmin dmadmin  4096 Jul 26  2020 acs
drwxr-x--- 7 dmadmin dmadmin  4096 Jul 26  2020 bpm
-rw-rw-r-- 1 dmadmin dmadmin  2152 Jul 26  2020 dfc.keystore
-rw-rw-r-- 1 dmadmin dmadmin   481 Jul 26  2020 dfc.properties
-rw-rw---- 1 dmadmin dmadmin    70 Jul 22  2020 dfc.properties.bak.0
-rwxr-x--- 1 dmadmin dmadmin   242 Jul 26  2020 dfc.properties.bak.1
-rw-rw-r-- 1 dmadmin dmadmin   271 Jul 26  2020 dfc.properties.bak.2
-rw-rw-r-- 1 dmadmin dmadmin   323 Jul 26  2020 dfc.properties.bak.3
-rw-rw-r-- 1 dmadmin dmadmin   481 Jul 26  2020 dfc.properties.bak.4
-rw-rw-r-- 1 dmadmin dmadmin   482 Jul 26  2020 dfc.properties.bak.5
-rwxrwx--- 1 dmadmin dmadmin 79268 Jul 22  2020 dfcfull.properties
-rwxr-x--- 1 dmadmin dmadmin  1242 Jul 26  2020 log4j.properties
drwxr-x--- 2 dmadmin dmadmin  4096 Apr 14 07:51 test
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ ls -l test/
total 0
-rwxr-x--- 1 dmadmin dmadmin 0 Jul 22  2020 dbor.properties
-rw-rw-r-- 1 dmadmin dmadmin 0 Jul 26  2020 dbor.properties.lck
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ # With the full folder except the dbor files, dmqdocbroker is still working
[dmadmin@cs-0 config]$ date; time dmqdocbroker -t cs-0.domain.com -p 1489 -c ping; date
Wed Apr 14 07:51:30 UTC 2021
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0200.0080
Using specified port: 1489
Successful reply from docbroker at host (cs-0) on port(1490) running software version (16.4.0200.0256 Linux64).

real    0m3.501s
user    0m6.632s
sys     0m0.666s
Wed Apr 14 07:51:34 UTC 2021
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ mv test/dbor.properties* ./
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ # With the dbor files back, dmqdocbroker isn't working anymore
[dmadmin@cs-0 config]$ date; time dmqdocbroker -t cs-0.domain.com -p 1489 -c ping; date
Wed Apr 14 07:51:56 UTC 2021
^C
real    0m30.682s
user    0m5.001s
sys     0m0.424s
Wed Apr 14 07:52:27 UTC 2021
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ mv dbor.properties.lck test/
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ # Removing just the dbor files again, dmqdocbroker is working again
[dmadmin@cs-0 config]$ date; time dmqdocbroker -t cs-0.domain.com -p 1489 -c ping; date
Wed Apr 14 07:52:36 UTC 2021
dmqdocbroker: A DocBroker Query Tool
dmqdocbroker: Documentum Client Library Version: 16.4.0200.0080
Using specified port: 1489
Successful reply from docbroker at host (cs-0) on port(1490) running software version (16.4.0200.0256 Linux64).

real    0m3.185s
user    0m5.546s
sys     0m0.578s
Wed Apr 14 07:52:39 UTC 2021
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ ll
total 140
drwxr-x--- 7 dmadmin dmadmin  4096 Jul 26  2020 ServerApps
drwxr-x--- 9 dmadmin dmadmin  4096 Jul 26  2020 Shared
drwxr-x--- 7 dmadmin dmadmin  4096 Jul 26  2020 acs
drwxr-x--- 7 dmadmin dmadmin  4096 Jul 26  2020 bpm
-rwxr-x--- 1 dmadmin dmadmin     0 Jul 22  2020 dbor.properties
-rw-r----- 1 dmadmin dmadmin     0 Apr 14 07:52 dbor.properties.lck
-rw-rw-r-- 1 dmadmin dmadmin  2152 Jul 26  2020 dfc.keystore
-rw-rw-r-- 1 dmadmin dmadmin   481 Jul 26  2020 dfc.properties
-rw-rw---- 1 dmadmin dmadmin    70 Jul 22  2020 dfc.properties.bak.0
-rwxr-x--- 1 dmadmin dmadmin   242 Jul 26  2020 dfc.properties.bak.1
-rw-rw-r-- 1 dmadmin dmadmin   271 Jul 26  2020 dfc.properties.bak.2
-rw-rw-r-- 1 dmadmin dmadmin   323 Jul 26  2020 dfc.properties.bak.3
-rw-rw-r-- 1 dmadmin dmadmin   481 Jul 26  2020 dfc.properties.bak.4
-rw-rw-r-- 1 dmadmin dmadmin   482 Jul 26  2020 dfc.properties.bak.5
-rwxrwx--- 1 dmadmin dmadmin 79268 Jul 22  2020 dfcfull.properties
-rwxr-x--- 1 dmadmin dmadmin  1242 Jul 26  2020 log4j.properties
drwxr-x--- 2 dmadmin dmadmin  4096 Apr 14 07:52 test
[dmadmin@cs-0 config]$
[dmadmin@cs-0 config]$ diff dbor.properties.lck test/dbor.properties.lck
[dmadmin@cs-0 config]$

 

So as you can see above (it’s rather long, but I wanted to include all the evidence I gathered because I still cannot believe this is the cause of the issue), just removing/renaming the empty file “dbor.properties.lck”, which had been there, untouched, for almost 9 months, is sufficient to have the dmqdocbroker/iapi/idql working again… Put the old empty file back and the issue comes back. It’s the “same” file, same content (empty), same file format, everything… The only differences are of course the inode and the creation/modification dates.

After some more investigations, the issue appeared to be on the NAS behind the filesystem, which somehow still held a lock on the file. For information, I also saw the same behavior on a second environment, but with the file “$DOCUMENTUM/config/ServerApps/identityInterprocessMutex.lock” this time… So if that ever happens to you, take a look at these lock files under $DOCUMENTUM/config and make sure there are no problems with the storage.
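For reference, the workaround itself is trivial; here is a minimal sketch of it, using the file names and Docbroker coordinates of this environment:

# Rename (or remove) the stale lock file so a fresh one is created on the next DFC
# initialization, then re-test the communication with the local Docbroker.
cd $DOCUMENTUM/config
mv dbor.properties.lck dbor.properties.lck.old
dmqdocbroker -t cs-0.domain.com -p 1489 -c ping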

Cet article Documentum – dmqdocbroker/iapi/idql not working because of dbor.properties.lck est apparu en premier sur Blog dbi services.

SQL Server 2019: What’s new in sp_configure and sys.configurations options?

Tue, 2021-06-29 08:46

SQL Server 2019 added new options in sp_configure and sys.configurations.
First, how can we find the differences between these SQL Server versions?
This simple query will give us the number of options and the SQL Server Version:

select count(*),@@version FROM sys.configurations

In SQL Server 2016, we have 74 parameters for the instance configuration:

In SQL Server 2017, we have 77 parameters for the instance configuration:

In SQL Server 2019, we have 84 parameters for the instance configuration:

In SQL Server 2019 we have 7 more parameters than in SQL Server 2017.

In detail, we see that one parameter has been removed and 8 added:

    • The parameter removed has ID 1577 and is named “common criteria compliance enabled”.
    • The 8 new options are:
      • ID 1588 named “column encryption enclave type”
      • ID 1589 named “tempdb metadata memory-optimized”
      • ID 1591 named “ADR cleaner retry timeout (min) ”
      • ID 1592 named “ADR Preallocation Factor”
      • ID 1593 named “version high part of SQL Server”
      • ID 1594 named “version low part of SQL Server”
      • ID 16398 named “allow filesystem enumeration”
      • ID 16399 named “polybase enabled”

After identifying the new parameters, let's go a step further with their configuration using this query:

select * FROM sys.configurations where configuration_id in (1588,1589,1591,1592,1593,1594,16398,16399)

We can see that only one parameter has a default value of 1 instead of 0: “allow filesystem enumeration”.
The two other interesting columns are “is_dynamic” and “is_advanced”:

  • When “is_dynamic” is set to 1, the parameter only needs a RECONFIGURE to take effect, without an instance restart.
  • When “is_advanced” is set to 1, the parameter is part of the advanced configuration options.

I will not explain or test the new parameters in this article.
It’s just to give you an overview of the new SQL Server 2019 instance configuration options.

Cet article SQL Server 2019: What’s new in sp_configure and sys.configurations options? est apparu en premier sur Blog dbi services.

Connecting to Repositories with the Same Name and/or ID

Mon, 2021-06-28 15:33

A rare but recurring issue that customers sometimes encounter is how to connect to each of several distinct repositories sharing the same name or the same docbase id, or even both when one repository is a clone of the other. The current connection resolution technique based on the dfc.properties file does not support this and only lets one connect to the first matching repository found. Knowing this limitation, why were they created with the same name in the first place? Just don’t, and that pitfall is avoided. Actually however, this situation makes sense when the repositories are created by different teams, e.g. one TEST repository for each application in development, or developers’ personal repositories, maybe local on their laptops, or an application’s repository existing in different stages of its lifecycle, e.g. a PRODUCT_CATALOG repository in DEV, TUNI, INT, CTLQ, ACC, PROD, ARCHIVED, and maybe CLONE_1, CLONE_2, etc…
Now, at some point, a developer would need to access simultaneously, say, PRODUCT_CATALOG in DEV and PRODUCT_CATALOG in PROD in order to troubleshoot a problem, a plausible scenario. Another scenario is when a common, DFCs-based service must access docbases with the same name coming from different, independently managed applications. So, how to do that ?

Yes, how to do that ?

In this article, I presented a possible solution based on editing the dfc.properties file on-the-fly prior to connecting, so its docbroker_host and docbroker_port parameters would point to the target repository’s docbroker. As you know, that file is required to be able to connect to repositories. It is read by the DFCs when a connection request is done from a DFCs client, and the docbrokers listed in there are queried in turn until one of them replies with the information for the requested target that projects to it.
The problem with this algorithm is that it is not selective enough to tell apart distinct repositories with the same name. If the target could be specified like docbase[.server_instance][@machine[:port]], that could work. Actually, the above syntax, minus the [:port] part, is accepted but it does not work in our case. A typical returned error is:

DfNoServersException:: THREAD: main; MSG: [DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (null:0) does not know of a server for the specified docbase (dmtest73@192.168.56.12)"; ERRORCODE: 100; NEXT: null

The hack described in the aforementioned article consisted of removing all the docbroker_host and docbroker_port pairs of parameters and inserting the exact pair used by the repository of interest, so any ambiguity is lifted. Prior to the next connection, the work-around is repeated for the new target.
Wrappers around the command-line tools iapi and idql, wiapi and widql respectively, do just that.
It works quite well for these tools, but what about Java DFCs clients? I guess we could subclass the session manager’s getSession() or write a wrapper that applies the same work-around transparently, and that was my intention at first when a customer raised the issue. But I decided to give OpenText’s knowledge base another try, in case an out-of-the-box yet unknown parameter would solve this question with no programming needed, i.e. no customization of existing code. I was not left empty-handed as I found this article: “How can I dynamically switch between different Docbrokers using DFC?”, Article ID:KB8801087. Here is the full note:

How can I dynamically switch between different Docbrokers using DFC?

Article ID:KB8801087


Applies to

Documentum Foundation Classes 4.0

Summary

How can I dynamically switch between different Docbrokers using DFC?

Resolution
Normally, you are restricted to Docbases registered to the Docbroker specified in the DMCL.ini file. Modifying the apiConfig object can still access Docbases that are not registered with this Docbroker.

The steps involved are the following:
1. Get a session object for a docbase which is registered with the docbroker specified in the dmcl.ini file.
2. Get an apiConfig object using the method IDfSession.getClientConfig();
3. Set the "primary_host" and "primary_port" attributes of this apiConfig object to a different docbroker.
4. Get a session for a different docbase registered with this docbroker.

For more details, please refer to the APIconfig object in the object reference manual for Documentum 4i.
Legacy Article ID
ECD316940

It’s again a bit hacky and still requires a customization but at least it addresses the DFCs part. Although it is DFCs-oriented, as a proof of concept, let’s see first if we could make it work using the API from within iapi.

Testing from within iapi

In our test case, we have a docbase named dmtest73 with id 50000 projecting to docbroker host dmtest.cec on port 1489 and a different docbase with the same name dmtest73 but id 90005 projecting to the docbroker host docker on port 7489. Both docbrokers are present in the dfc.properties file. We want to be able to connect to each of those docbases. Let’s first try the normal, default connection:

$ iapi dmtest73 -Udmadmin -Pdmadmin
API> retrieve,c,dm_server_config
dump,c,l
USER ATTRIBUTES

  object_name                     : dmtest73
...
  owner_name                      : dmtest73
...
  acl_domain                      : dmtest73
...
  operator_name                   : dmtest73
...
  web_server_loc                  : dmtest.cec
...
SYSTEM ATTRIBUTES

  r_object_type                   : dm_server_config
  r_creation_date                 : 7/1/2019 19:26:18
  r_modify_date                   : 6/20/2021 23:55:33
...
  r_creator_name                  : dmtest73
...
  r_server_version                : 7.3.0000.0214  Linux64.Oracle
  r_host_name                     : dmtest.cec
  r_process_id                    : 908
  r_install_owner                 : dmadmin
...

# dump the docbase config too;
API> retrieve,s0,dm_docbase_config    
...
3c00c35080000103
API> dump,c,l
...
USER ATTRIBUTES

  object_name                     : dmtest73
  title                           : a v7.3 test repository
...
  owner_name                      : dmtest73
...
  acl_domain                      : dmtest73
...
  index_store                     : DM_DMTEST73_INDEX
...

SYSTEM ATTRIBUTES

  r_object_type                   : dm_docbase_config
  r_creation_date                 : 7/1/2019 19:26:18
  r_modify_date                   : 7/1/2019 17:33:50
...
  r_creator_name                  : dmtest73
...
  r_dbms_name                     : Oracle
  r_docbase_id                    : 50000
# we are in the docbase with id 50000;

So, this is the dmtest73 docbase that is accessed with the current docbroker definitions in dfc.properties.
Usually, only that docbase, with FQN dmtest73@dmtest.cec:1489, is reachable because its docbroker is listed earlier than dmtest73@docker:7489 in dfc.properties, as reported by the session’s apiconfig object:

dump,c,apiconfig
...
  dfc.docbroker.exclude.failure_th: 3
  dfc.docbroker.exclusion.time    : 30
  dfc.docbroker.host           [0]: dmtest.cec
                               [1]: dmtest.cec
                               [2]: docker
                               [3]: docker
                               [4]: docker
                               [5]: docker
  dfc.docbroker.port           [0]: 7289
                               [1]: 1489
                               [2]: 1489
                               [3]: 1589
                               [4]: 7489
                               [5]: 6489
  dfc.docbroker.protocol       [0]: rpc_static
                               [1]: rpc_static
                               [2]: rpc_static
                               [3]: rpc_static
                               [4]: rpc_static
                               [5]: rpc_static
  dfc.docbroker.search_order      : sequential
  dfc.docbroker.service        [0]: dmdocbroker
                               [1]: dmdocbroker
                               [2]: dmdocbroker
                               [3]: dmdocbroker
                               [4]: dmdocbroker
                               [5]: dmdocbroker
  dfc.docbroker.timeout        [0]: 0
                               [1]: 0
                               [2]: 0
                               [3]: 0
                               [4]: 0
                               [5]: 0
...

Here, 6 pairs of docbroker_host/docbroker_port were defined in the dfc.properties; each one has additional default parameters dfc.docbroker.protocol, dfc.docbroker.service and dfc.docbroker.timeout; they are all synchronized, i.e.
dfc.docbroker.host[i] uses port dfc.docbroker.port[i], dfc.docbroker.protocol[i], dfc.docbroker.service[i] with a timeout of dfc.docbroker.timeout[i].
The Note says to force the primary docbroker (i.e. the first one, corresponding to the values at index 0 of the dfc.docbroker% attributes) to the target docbase’s one; it doesn’t say anything about the other ones, but we’ll remove them to eradicate any possibility of a failover logic:

# empty the dfc.docbroker% attributes;
API> truncate,c,apiconfig,dfc.docbroker.host
truncate,c,apiconfig,dfc.docbroker.port
truncate,c,apiconfig,dfc.docbroker.protocol
truncate,c,apiconfig,dfc.docbroker.service
truncate,c,apiconfig,dfc.docbroker.timeout

# add the target docbase's docbroker;
API> append,c,apiconfig,dfc.docbroker.host
docker
append,c,apiconfig,dfc.docbroker.port
7489
append,c,apiconfig,dfc.docbroker.protocol
rpc_static
append,c,apiconfig,dfc.docbroker.service
dmdocbroker
append,c,apiconfig,dfc.docbroker.timeout
0

# verify the parameters;
API> dump,c,apiconfig
...
  dfc.docbroker.host           [0]: docker
  dfc.docbroker.port           [0]: 7489
  dfc.docbroker.protocol       [0]: rpc_static
  dfc.docbroker.search_order      : sequential
  dfc.docbroker.service        [0]: dmdocbroker
  dfc.docbroker.timeout        [0]: 0
...

# try to connect to dmtest73@docker:7489 now; 
API> connect,dmtest73,dmadmin,dmadmin
...
s1
# OK but where are we really ?
API> retrieve,s1,dm_server_config
...
3d015f9580000102
API> dump,c,l
USER ATTRIBUTES

  object_name                     : dmtest73
...
  owner_name                      : dmtest73c
...
  acl_domain                      : dmtest73c
...
  operator_name                   : dmtest73c
...
  web_server_loc                  : container73

SYSTEM ATTRIBUTES

  r_object_type                   : dm_server_config
  r_creation_date                 : 6/20/2021 02:52:36
  r_modify_date                   : 6/20/2021 00:59:43
...
  r_creator_name                  : dmtest73c
...
  r_server_version                : 16.4.0000.0248  Linux64.Oracle
  r_host_name                     : container73
  r_process_id                    : 13709
...

# dump the docbase config too;
API> retrieve,s1,dm_docbase_config
...
3c015f9580000103
API> dump,c,l
...
USER ATTRIBUTES

  object_name                     : dmtest73
  title                           : dmtest73 homonym silently
...
...
  acl_domain                      : dmtest73c
...
  index_store                     : dm_dmtest73c_index
...

SYSTEM ATTRIBUTES

  r_object_type                   : dm_docbase_config
  r_creation_date                 : 6/20/2021 02:52:36
  r_modify_date                   : 6/20/2021 01:27:22
...
...
  r_dbms_name                     : Oracle
  r_docbase_id                    : 90005
...
# we are now in the docbase with id 90005;

It works as proved by comparing the dm_server_config and dm_docbase_config objects, notably the dm_docbase_config.r_docbase_id.
So, what to think of such a solution? It requires some cutting and pasting of statements, but they could be saved into some sort of macros depending on the command-line client used (e.g. the GUI-based dqman). For iapi, I’d definitely prefer starting it as described in the above article, e.g.

$ wiapi dmtest73:docker:7289
$ widql dmtest73:dmtest.cec:1489

but if several connections to homonym docbases must be opened at once in the same iapi working session, this is the only way.

The DFCs version

The solution from the Note:KB8801087 was for the DFCs in the first place. Let’s apply it in function getSessionExtended() (starting on line 46 in the listing below) with the following profile:

public void getSessionExtended(String repo, String user, String passwd) throws Exception

The function takes a docbase’s specification in repo, and a user name and password to connect to the repository. As we need an unambiguous syntax to define repo, let’s use the time-proven one:

docbase_name[:docbroker_machine[:docbroker_port]]

where docbroker_machine:docbroker_port are the coordinates of the docbroker, possibly remote, the docbase docbase_name projects to. Obviously, two homonym docbases cannot project to the same docbroker (the docbroker would reject them all but the first one), but they could be running on the same machine and project to distinct docbrokers, locally or remotely.
An alternate syntax could be:

docbase_name@docbroker_machine:docbroker_port

If preferred, the change is easy in the function’s regular expression (see line 51 below):

Pattern re_docbase_docbroker_host_docbroker_port = Pattern.compile("^([^:]+)(@([^:]+)(:([^:]+))?)?$");

The function will first parse the extended repository syntax (starting on line 53), apply the Note’s hack (lines 81 to 87), and finally connect (line 96). To be sure we are in the right docbase, all the configuration objects will be dumped by the ancillary function dump_all_configs() (defined starting on line 124, called on line 182) as well. Starting on line 111, the apiconfig object is displayed to check the hack’s change. On lines 102 to 108, we find out the minimum field width needed to display the apiconfig’s attributes without truncation, as the dump() method does a lame job here.
If no docbroker is specified, then the function falls back to the DFCs’ default behavior, so it remains backwards compatible.
In order to make the test possible, a main() function (starting on line 157) accepting and parsing command-line parameters (starting on line 167) is also provided.

// A test program for function getSessionExtended() to open sessions using an extended syntax;
// June 2021, cec@dbi-services.com;
// to compile: javac extendedConnection.java
// to execute: java -classpath .:$DOCUMENTUM/config:$CLASSPATH extendedConnection target_docbase user_name password
// the programm will attempt to connect to target_docbase using those credentials;
// target_docbase's syntax is:
//   docbase_name[:docbroker_machine[:docbroker_port]]
// if docbroker_machine is missing (and hence docbroker_port too), a session is opened through the default behavior from the dfc.properties file, i.e. following the order the docbrokers are listed in that file;
// otherwise, the docbrokers listed in dfc.properties are not used and the specified docbroker is;
// so the function is compatible with the default behavior of the DFCs;
import com.documentum.fc.client.IDfClient;
import com.documentum.fc.client.DfClient;
import com.documentum.fc.client.DfQuery;
import com.documentum.fc.client.IDfCollection;
import com.documentum.fc.client.IDfDocbaseMap;
import com.documentum.fc.client.IDfQuery;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfLoginInfo;
import com.documentum.fc.common.IDfLoginInfo;
import com.documentum.fc.common.IDfAttr;
import com.documentum.fc.client.IDfTypedObject;
import com.documentum.fc.client.IDfPersistentObject;
import java.io.RandomAccessFile;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.Enumeration;

public class extendedConnection {
   String gr_username = null;
   String gr_password = null;
   IDfSessionManager sessMgr = null;
   IDfSession idfSession;
 
   public void usage() {
   // output the utility extended repository's syntax;
      System.out.println("Usage:");
      System.out.println("   java [-Ddfc.properties.file=/tmp/dfc.properties] -classpath .:/app/dctm/config:$CLASSPATH docbase[:docbroker-host[:docbroker-port]] username password");
      System.out.println("Use the extended repository syntax (e.g. dmtest:docker:7289, vs. dmtest) to override the DFCs' default resolution mechanism.");
      System.out.println("Examples:");
      System.out.println("   java -classpath .:/app/dctm/config:$CLASSPATH dctmping dmtest73 dmadmin dmadmin");
      System.out.println("for using the docbrokers defined in the dfc.properties (classic usage)");
      System.out.println("   java -classpath .:/tmp:$CLASSPATH dctmping dmtest73:docker:7489 dmadmin dmadmin");
      System.out.println("for short-circuiting the docbrokers and using the repo's extended syntax");
   }
   public void getSessionExtended(String repo, String user, String passwd) throws Exception {
   // change the dfc.docbroker.host and dfc.docbroker.port to connect with more flexibility;
   // The target repository repo is defined though the following advanced syntax:
   //    docbase[:docbroker-host[:docbroker-port]]
      System.out.printf("getSessionExtended%n");
      Pattern re_docbase_docbroker_host_docbroker_port = Pattern.compile("^([^:]+)(:([^:]+)(:([^:]+))?)?$");

      Matcher check = re_docbase_docbroker_host_docbroker_port.matcher(repo);
      String docbase = null;
      String docbroker_host = null;
      String docbroker_port = null;
      if (check.find()) {
         docbase = check.group(1);
         docbroker_host = check.group(3);
         docbroker_port = check.group(5);
      } 
      else {
         System.out.println("Missing docbase name; the docbase is mandatory");
         usage();
      }
      if (docbroker_host != null) {
         System.out.println("host = " + docbroker_host);
         if (docbroker_port == null)
            docbroker_port = "1489";
         System.out.println("port = " + docbroker_port);
      } 
      else
         System.out.println("docbroker host is empty, using the dfc.properties");
      System.out.println("using the " + (docbroker_host != null ? (" docbroker host " + docbroker_host + ":" + docbroker_port) : "the dfc.properties"));

      IDfClient client = DfClient.getLocalClient();
      IDfTypedObject client_config = client.getClientConfig();

      if (docbroker_host != null) {
         // let's hack the session config to force the given docbroker[:port];
         client_config.truncate("dfc.docbroker.host", 0);
         client_config.appendString("dfc.docbroker.host", docbroker_host);
         client_config.truncate("dfc.docbroker.port", 0);
         client_config.appendString("dfc.docbroker.port", docbroker_port);
         client_config.truncate("dfc.docbroker.protocol", 1);
         client_config.truncate("dfc.docbroker.service", 1);
         client_config.truncate("dfc.docbroker.timeout", 1);
      }

      IDfLoginInfo login = new DfLoginInfo();
      login.setUser(user);
      login.setPassword(passwd);
      // as recommended, use the session manager;
      sessMgr = client.newSessionManager();
      sessMgr.setIdentity(docbase, login);
      idfSession = sessMgr.getSession(docbase);

      System.out.printf("session config:%n");
      int max_length = 0;
      // as the default presentation from dump() sucks too much due to the truncated attribute names, let's produce a non-truncated one:
      // first, iterate through the session config and find the longest one;
      for (Enumeration e = client_config.enumAttrs(); e.hasMoreElements() ;) {
         IDfAttr attr = (IDfAttr) e.nextElement();
         String name = attr.getName();
         String value = client_config.getString(name);
         if (null != value)
            max_length = max_length >= name.length() ? max_length : name.length();
      }
      // max_length contains now the length of the longest attribute name;
      // display the nicely formatted session config;
      for (Enumeration e = client_config.enumAttrs(); e.hasMoreElements() ;) {
         IDfAttr attr = (IDfAttr) e.nextElement();
         String name = attr.getName();
         String value = client_config.getAllRepeatingStrings(name, "\n" + String.join("", new String(new char[max_length]).replace("\0", " ") + "  "));
         System.out.printf("%" + max_length + "s: %s%n", name, value);
      }
   }
 
   public void releaseSession() throws Exception {
   // quite obvious;
      sessMgr.release(idfSession);
   }
 
   public void dump_all_configs() throws Exception {
   // dump all the server and docbase configs defined in repository;
      System.out.printf("%nin dump_all_configs%n");
      String[] configs = {"select r_object_id from dm_server_config",
                          "select r_object_id from dm_docbase_config"};
      IDfQuery query = new DfQuery();
      for (String dql_stmt: configs) {
         System.out.println("executing " + dql_stmt);
         query.setDQL(dql_stmt);
         IDfCollection collection = null;
         String r_object_id = null;
         try {
            collection = query.execute(idfSession, IDfQuery.DF_READ_QUERY);
            while (collection.next()) {
               r_object_id = collection.getString("r_object_id");
               IDfPersistentObject obj = idfSession.getObjectByQualification("dm_sysobject where r_object_id = '" + r_object_id + "'");
               System.out.println("dumping object with id = " + r_object_id);
               System.out.println(obj.dump());
            }
         }
         catch(Exception e) {
            System.out.printf("Error in dumps_all_configs()%n");
            System.out.println(e.getMessage());
            e.printStackTrace();
            // continue as far as possible;
         }
         finally {
            if (collection != null)
               collection.close();
         }
      }
   }

   public static void main(String[] args) throws Exception {
      System.out.printf("%nextendedConnection started ...%n");
      extendedConnection dmtest = new extendedConnection();
      if (0 == args.length)
         System.exit(0);

      String docbase = null;
      String user    = null;
      String passwd  = null;
      // if any arguments are present, they must be the target docbase and the credentials to connect (username and password);
      if (args.length != 3) {
         System.out.println("Missing arguments. Usage: dctmping [target_docbase [user_name password]]");
         System.exit(1);
      }
      else {
         docbase = args[0];
         user    = args[1];
         passwd  = args[2];
      }

      try {
         // connect using the command-line parameters;
         dmtest.getSessionExtended(docbase, user, passwd);

         //dump all the server and docbase configs;
         dmtest.dump_all_configs();
      }
      catch(Exception e) {
         System.out.printf("Error while working in the docbase %s as user %s\n", docbase, user);
         System.out.println(e.getMessage());
         e.printStackTrace();
      }
      finally {
         try {
            dmtest.releaseSession();
         }
         catch(Exception e) {}
      }
   }
}

Interestingly, the DFCs’ truncate method has an additional argument compared to its API equivalent, the index at which truncation should start. Also, using the DFCs, no open session is needed prior to accessing the config temporary object, named here client config vs. the API’s apiconfig.
Save this code into a file named extendedConnection.java in the current directory.
To compile it:

javac extendedConnection.java

To execute it from current directory (paths may vary according to your installation):

export DOCUMENTUM=/app/dctm
export CLASSPATH=$DOCUMENTUM/dctm.jar:$DOCUMENTUM/dfc/dfc.jar
$ java -classpath .:$DOCUMENTUM/config:$CLASSPATH extendedConnection docbase username password
Example of output

Here is an example of the default behavior, i.e. when the target repo does not use the extended syntax repo:docbroker_host[:docbroker_port]:

$ java -classpath .:/app/dctm/config:$CLASSPATH extendedConnection dmtest73 dmadmin dmadmin

extendedConnection started ...
getSessionExtended
docbroker host is empty, using the dfc.properties
using the the dfc.properties
session config:
                                                  dfc.name: dfc
                                            dfc.config.dir: /app/dctm/config
                                           dfc.config.file: file:/app/dctm/config/dfc.properties
...
                                        dfc.docbroker.host: 
                                                            192.168.56.12
                                                            
                                                            192.168.56.15
                                                            192.168.56.15
                                                            192.168.56.15
                                        dfc.docbroker.port: 0
                                                            1489
                                                            0
                                                            7489
                                                            6489
                                                            1489
...
                                    dfc.docbroker.protocol: rpc_static
                                                            rpc_static
                                                            rpc_static
                                                            rpc_static
                                                            rpc_static
                                                            rpc_static
                                     dfc.docbroker.service: dmdocbroker
                                                            dmdocbroker
                                                            dmdocbroker
                                                            dmdocbroker
                                                            dmdocbroker
                                                            dmdocbroker
                                     dfc.docbroker.timeout: 0
                                                            0
                                                            0
                                                            0
                                                            0
                                                            0
...

in dump_all_configs
executing select r_object_id from dm_server_config
dumping object with id = 3d00c35080000102
USER ATTRIBUTES

  object_name                     : dmtest73
  title                           : 
  subject                         : 
...
  owner_name                      : dmtest73
...

SYSTEM ATTRIBUTES

  r_object_type                   : dm_server_config
  r_creation_date                 : 7/1/2019 7:26:18 PM
...
  r_creator_name                  : dmtest73
...
  r_server_version                : 7.3.0000.0214  Linux64.Oracle
  r_host_name                     : dmtest.cec
...

executing select r_object_id from dm_docbase_config
dumping object with id = 3c00c35080000103
USER ATTRIBUTES

  object_name                     : dmtest73
  title                           : a v7.3 test repository
...
  acl_domain                      : dmtest73
...
  index_store                     : DM_DMTEST73_INDEX
...

SYSTEM ATTRIBUTES

  r_object_type                   : dm_docbase_config
  r_creation_date                 : 7/1/2019 7:26:18 PM
...
  r_creator_name                  : dmtest73
...
  r_docbase_id                    : 50000
...

This is the dmtest73 docbase with id 50000, like in the iapi example seen before.
Make sure $CLASSPATH includes the dfc.jar, or dctm.jar if going through the manifest in it is preferred.
Next, an example of accessing a docbase with the same name as previously but on another host, using the extended syntax:

$ java -classpath .:/app/dctm/config:$CLASSPATH extendedConnection dmtest73:docker:7489 dmadmin dmadmin

extendedConnection started ...
getSessionExtended
host = docker
port = 7489
using the  docbroker host docker:7489
session config:
                                                  dfc.name: dfc
                                            dfc.config.dir: /app/dctm/config
                                           dfc.config.file: file:/app/dctm/config/dfc.properties
...
                                        dfc.docbroker.host: docker
                                        dfc.docbroker.port: 7489
...
                                    dfc.docbroker.protocol: rpc_static
                                     dfc.docbroker.service: dmdocbroker
                                     dfc.docbroker.timeout: 0
...

in dump_all_configs
executing select r_object_id from dm_server_config
dumping object with id = 3d015f9580000102
USER ATTRIBUTES

  object_name                     : dmtest73
  title                           : 
  subject                         : 
...

SYSTEM ATTRIBUTES

  r_object_type                   : dm_server_config
  r_creation_date                 : 6/20/2021 2:52:36 AM
...
  r_creator_name                  : dmtest73c
...
  r_server_version                : 16.4.0000.0248  Linux64.Oracle
  r_host_name                     : container73
...

executing select r_object_id from dm_docbase_config
dumping object with id = 3c015f9580000103
USER ATTRIBUTES

  object_name                     : dmtest73
  title                           : dmtest73 homonym silently
...
  acl_domain                      : dmtest73c
...
  index_store                     : dm_dmtest73c_index
...

SYSTEM ATTRIBUTES

  r_object_type                   : dm_docbase_config
  r_creation_date                 : 6/20/2021 2:52:36 AM
...
  r_creator_name                  : dmtest73c
...
  r_docbase_id                    : 90005
...

The reached docbase has id 90005, as expected.
As a final example, let’s connect to the default dmtest73 again but this time using the extended syntax:

$ java -classpath .:/app/dctm/config:$CLASSPATH extendedConnection dmtest73:dmtest.cec:1489 dmadmin dmadmin

extendedConnection started ...

getSessionExtended
host = dmtest.cec
port = 1489
using the  docbroker host dmtest.cec:1489
session config:
                                                  dfc.name: dfc
                                            dfc.config.dir: /app/dctm/config
                                           dfc.config.file: file:/app/dctm/config/dfc.properties
...
                                  dfc.docbroker.debug.host: 
                                  dfc.docbroker.debug.port: 0
..
                                        dfc.docbroker.host: dmtest.cec
                                        dfc.docbroker.port: 1489
..
                                    dfc.docbroker.protocol: rpc_static
                                     dfc.docbroker.service: dmdocbroker
                                     dfc.docbroker.timeout: 0
...

in dump_all_configs
executing select r_object_id from dm_server_config
dumping object with id = 3d00c35080000102
USER ATTRIBUTES

  object_name                     : dmtest73
  title                           : 
  subject                         : 
...
  operator_name                   : dmtest73
...
  web_server_loc                  : dmtest.cec
...

SYSTEM ATTRIBUTES

  r_object_type                   : dm_server_config
  r_creation_date                 : 7/1/2019 7:26:18 PM
...
  r_creator_name                  : dmtest73
...
  r_server_version                : 7.3.0000.0214  Linux64.Oracle
  r_host_name                     : dmtest.cec
...

executing select r_object_id from dm_docbase_config
dumping object with id = 3c00c35080000103
USER ATTRIBUTES

  object_name                     : dmtest73
  title                           : a v7.3 test repository
...
  acl_domain                      : dmtest73
...
  index_store                     : DM_DMTEST73_INDEX
...

SYSTEM ATTRIBUTES

  r_object_type                   : dm_docbase_config
  r_creation_date                 : 7/1/2019 7:26:18 PM
...
  r_creator_name                  : dmtest73
...
  r_docbase_id                    : 50000
...

The reached docbase has id 50000 as expected.
The client config shows here the same output as sessionconfig from within iapi, which shouldn’t come as a surprise given that iapi invokes the DFCs behind the scenes.
As with the alternative in the aforementioned article, this work-around even makes it possible to completely remove the docbroker_host/docbroker_port pairs from the dfc.properties file, as it does not use them and imposes its own direct, flattened resolution mechanism (“just contact the damn given docbroker”). This can be verified with a stripped-down dfc.properties file (e.g. /tmp/dfc.properties) containing no such entries and invoking the test program as follows:

$ java -Ddfc.properties.file=/tmp/dfc.properties -classpath .:$CLASSPATH extendedConnection dmtest73:docker:7489 dmadmin dmadmin

Of course, in such a case, the default resolution mechanism won’t work when the normal repository syntax is used, and the DFCs will complain with an error as shown below:

$ java -Ddfc.properties.file=/tmp/dfc.properties -classpath .:$CLASSPATH extendedConnection dmtest73 dmadmin dmadmin

extendedConnection started ...
getSessionExtended
docbroker host is empty, using the dfc.properties
using the the dfc.properties
Error while working in the docbase dmtest73 as user dmadmin
[DM_DOCBROKER_E_NO_DOCBROKERS]error:  "No DocBrokers are configured"
DfServiceException:: THREAD: main; MSG: [DM_DOCBROKER_E_NO_DOCBROKERS]error:  "No DocBrokers are configured"; ERRORCODE: 100; NEXT: null
	at com.documentum.fc.client.DfServiceException.newNoDocbrokersException(DfServiceException.java:44)
...
Conclusion

Although this work-around smells like a hack, it is a very effective one. From within iapi or any DFCs client with a slightly smarter connection function, this recurring yet pesky limitation has finally found a solution. However, while it is acceptable for in-house development where the source code is available, third-party developers might not want to bother with an ad hoc customization to fix a niche problem; so some per$ua$ion work might be needed here too. Let’s hope that OpenText, in breaking with a 30-year-old immobilism of the code in this area, will delight us soon with a really transparent solution which, who knows, supports an enhanced repository syntax such as the proposed one.

Cet article Connecting to Repositories with the Same Name and/or ID est apparu en premier sur Blog dbi services.

SQL Server SCOM Alert Analysis: The Average Wait Time of SQL instance “xxx” on computer “xxx” is too high

Mon, 2021-06-28 04:19

I got an interesting SCOM alert for SQL Server at a customer two weeks ago.
In the SCOM interface, the warning on the server is MSSQL2016: The Average Wait Time of SQL instance “xxx” on computer “xxx” is too high.
After a right-click and opening the properties, go to the Alert Context tab. There, you can see that the Object Name is MSSQL$xxx:Locks.

My first step was to go to Reports > Standard Reports > Performance Dashboard to get more information

In this report, the first graph is about CPU usage and the second one about the waits.
In my case, we can see that one wait is very, very high: Lock.
In the Historical Information section, click on the waits to get the details.

In the details, we see two main lock wait types: LCK_M_IS and LCK_M_S:

LCK_M_IS: This wait type is when a thread is waiting to acquire an Intent Shared lock on a resource and there is at least one other lock in an incompatible mode granted on the resource to a different thread. link here for more detail.

LCK_M_S: This wait type is when a thread is waiting to acquire a Shared lock on a resource and there is at least one other lock in an incompatible mode granted on the resource to a different thread. link here for more detail.

After searching for the object, I identified the database and analysed the queries running on it.
To resolve this issue, I added 3GB of memory and created a new index on the table used by the queries causing these two lock waits.

One week later, no more locks!

I hope this blog will also help you analyse and fix a lock problem through the Performance Dashboard.

 

Cet article SQL Server SCOM Alert Analysis: The Average Wait Time of SQL instance “xxx” on computer “xxx” is too high est apparu en premier sur Blog dbi services.

Reduce the complexity: be sharing and open source

Sun, 2021-06-27 15:26
By Franck Pachot

.
I thought that my last post on this blog was the previous one, but I blog when something comes to my mind – nothing planned. And two things got me to write this. First, last week we had our 20th dbi xChange, where we meet with colleagues. And I love discussing with colleagues what we see, and foresee, in our jobs. The second one is:
https://twitter.com/FranckPachot/status/1409084362637942786?s=20
The web.archive.org is full of gems like The Law of Conservation of Complexity from Daniel Fink written in 2008. As it is not about one technology, but methodology, it is still relevant today.

The best you can do in IT is avoid increasing the complexity of the system. If this is not your main goal, you may find short-term solutions, or workarounds, or quick wins, but increase the technological debt. More complexity means more bugs, less agility for future evolution, poor performance because of additional layers and latency, and additional costs. If you add a cache on top of a system, do you do it to decrease complexity (it would require more code or hardware to get the same performance without the cache) or increase it (the problem is in the application design and you add a new component to hide it)? If you move from an RDBMS to NoSQL, do you reduce the complexity (you don’t need consistency and access small items only) or increase it (you will have to write more code to get consistent data sets)? If you split your monolith into microservices, do you reduce complexity (simpler regression tests and deployment procedures) or increase it (you have to stream data between many services and try to get eventually consistent)? Those are just examples. And think about complexity at all levels. Modern full-stack developers require so many skills… think about the hiring complexity as well.

The Law of Conservation of Complexity mentions that you can decrease the complexity by understanding how the system works and by knowledge sharing. I agree on both. My next job is developer advocate: help the users understand how the system works, and help the developers understand how their system is used. Without this role, the systems tend to over-engineer features that may not even be used. And users tend to interact with the system in a sub-optimal way and add workarounds in their code. By system, in my case, I’m referring to the YugabyteDB distributed database, as that’s my next job. SQL can reduce the complexity of code because there are many things that you don’t have to do in the application (like integrity constraints, lock management,…). YugabyteDB is PostgreSQL compatible, well known to provide extensible features while limiting the complexity. And consensus protocols can reduce the complexity of auto-scaling, sharding, HA and DR with a simple definition: replication factor and AZ/region placement. In addition to those, YugabyteDB is Open Source, which also reduces complexity. First, because everything is documented publicly. And also because opening your code to contributions forces you to make it simple to understand and evolve.

I like Oracle because, even if it is closed source, I can get a good knowledge of how the system works, thanks to the many troubleshooting tools and instrumentation, the quality of the documentation, and the great community documenting, for years, how it works. I’m thinking about the OakTable, the Oracle ACEs there, and many others. This will stay for sure, but the Oracle database is also very complex. The new challenge is in distributed databases, which mimic the NoSQL scalability, but with full SQL support for reliable OLTP. My main goal, as developer advocate at Yugabyte, will be to reduce the complexity by knowledge sharing: help you know, use, and maybe contribute to YugabyteDB. I’ve put all the links to follow this journey in the previous post.

Cet article Reduce the complexity: be sharing and open source est apparu en premier sur Blog dbi services.

Kubernetes : from Docker to containerd

Tue, 2021-06-22 02:02

Kubernetes is a very popular and well-known open-source orchestration platform for containers. The ecosystem is very dynamic and the release cycle is very fast.
Six months have passed since the K8s team announced that the dockershim will be deprecated, and the 1.22 release deadline is approaching.
As they explained in their quite detailed post in early Dec. 2020, Docker support will be removed in 1.22 : https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/
Today we will see how to migrate, with minimal overhead, a Kubernetes cluster set up with Docker as container runtime to an up-to-date cluster using containerd instead, ready for future releases.

From Docker to containerd migration

This cluster is set up on the latest Ubuntu LTS release, 20.04.2, but the steps described here are easily adaptable to other Linux flavors, like RedHat or CentOS.

We are going to proceed first with the workers and then the master.

Let’s drain the worker1.
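As a sketch (node name worker1 as used in this cluster; the --ignore-daemonsets flag is usually needed since DaemonSet pods cannot be evicted):

# Evict the pods from worker1 and mark it unschedulable before touching the runtime.
kubectl drain worker1 --ignore-daemonsets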

Move to worker1 now.
When you installed Docker on your nodes, containerd was installed too.
To confirm that statement, you can try to check the containerd service status.
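Something like:

# containerd was pulled in by the Docker installation; the service should be active (running).
sudo systemctl status containerd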

Before doing any activity, stop kubelet :
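For instance (run as root or with sudo):

# Stop kubelet so nothing tries to (re)start containers while we reconfigure the runtime.
sudo systemctl stop kubelet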

When installed through Docker, containerd got its CRI plugin disabled.
Update its config file, which sits in /etc/containerd/config.toml, and comment out the line containing disabled_plugins = [“cri”].
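One possible way to do it (a simple sed, assuming the line starts exactly with disabled_plugins; editing the file by hand works just as well):

# Comment out the line so the CRI plugin is no longer disabled.
#   before:  disabled_plugins = ["cri"]
#   after:   #disabled_plugins = ["cri"]
sudo sed -i 's/^disabled_plugins/#disabled_plugins/' /etc/containerd/config.toml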

Once the configuration is updated, restart the containerd service.
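That is simply:

sudo systemctl restart containerd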

Now it’s time to update kubelet to tell it to use containerd as the CRI runtime instead of Docker. For that purpose, two kubelet arguments are available.
When you navigate to the kubelet reference page on the official website, you’ll get a better overview : https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet

We need to set both (note that kubelet defaults to Docker!)

The new argument values for containerd would be:
--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock

Those arguments are defined in the kubelet drop-in file for systemd. Open /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
and check its content. On default installations, KUBELET_EXTRA_ARGS is not defined; you need to add it as a new line in the file. Your updated configuration file should be similar to the following:
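(Sketch of a kubeadm-based drop-in; only the KUBELET_EXTRA_ARGS line is added, the surrounding stock lines may differ slightly with your kubeadm version.)

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS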

Inform systemd we’ve updated the configuration and restart the kubelet.
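In other words (as root or with sudo):

sudo systemctl daemon-reload
sudo systemctl restart kubelet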

If the systemctl daemon-reload is not performed, you’ll be warned by the system:

To summarize: we’ve configured containerd so it can be used as the CRI runtime, and we’ve told kubelet to use containerd instead of Docker. Now it’s time to switch back to the master, check if things have changed, and move on!

Surprise! Well, not really a surprise because we expected this: Kubernetes now reports that our worker1 node is running on containerd instead of Docker. Pay attention to the last column.
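Something like the following, where the CONTAINER-RUNTIME column of worker1 now shows containerd instead of docker:

# The -o wide output includes the CONTAINER-RUNTIME column for each node.
kubectl get nodes -o wide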

We are now ready to bring worker1 back into the cluster. For that, you need to uncordon it:
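That is:

# Mark worker1 schedulable again so pods can be placed on it.
kubectl uncordon worker1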

Repeat this with worker2, and then master, and you’ll be ready.

Our cluster is fully operational, using containerd :

Leaving Docker installed ?

Note that with the activities we performed here, we left Docker installed on each node. containerd was installed as part of the Docker installation.
If you plan to remove Docker completely, you would need to install containerd as a standalone package. This can be done using the containerd.io package from the standard Docker repository.
Another solution is to download containerd binaries from the official website: https://containerd.io/downloads/
In that case, you may need to generate the containerd config file using the command:
containerd config default > /etc/containerd/config.toml

The ctr command

Now that we don’t use Docker anymore, the docker CLI command won’t list any containers. With containerd, we can use the ctr command, the containerd CLI tool used to interact with containerd.
Like with docker, the ctr command lets you list / create / delete containers, import / export / pull images, and much more.
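A few examples (the k8s.io namespace is where the Kubernetes-managed containers live; sudo may be needed to reach the containerd socket):

# List the containerd namespaces; Kubernetes workloads live under "k8s.io".
sudo ctr namespaces list
# List the containers managed through the CRI plugin.
sudo ctr --namespace k8s.io containers list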

Note the namespace terminology here. Namespaces in containerd are not the same namespaces used in Kubernetes.

With ctr, you can also list pulled images:
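For example:

sudo ctr --namespace k8s.io images list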

 

By migrating your cluster away from Docker, you’ll be protected against the deprecation of one of its core components, scheduled for an upcoming release.
But containerd also provides additional flexibility.
In a second episode, we will learn how to set up Kubernetes to use containerd, with not only runC as OCI runtime but also KataContainers.
Stay connected !

Cet article Kubernetes : from Docker to containerd est apparu en premier sur Blog dbi services.

See you on Polywork (a new LinkedIn?)

Mon, 2021-06-21 14:57
By Franck Pachot

.
I have published 105 additional blog posts here since the 500th one https://blog.dbi-services.com/rollback-to-savepoint/ that I wrote on my comeback to dbi services. On Oracle Database, of course, other databases as well, on clouds (OCI and AWS mainly, with a look at GCP and Azure). But also with some thoughts on the rise of the “scale-out” obsession of NoSQL. And, because I’m convinced that consistency, referential integrity, and SQL are a must for an OLTP database, I’ve been looking at the new world of scalable data: the distributed databases based on the Google Spanner architecture.

I’m now going further on that, taking a big turn in my career: no more consulting for me, but still helping users. This also means I’m leaving dbi services, which I still consider the best consulting company I’ve encountered, for the customers as well as for the consultants. Growing, and hiring. And this also means that this is, once again, the last post I write on this blog. My next posts will be on http://blog.pachot.net which currently redirects to dev.to. And of course on the Yugabyte blog, Slack, StackOverflow… because this is where I’m moving: as a developer advocate for this NewSQL open-source distributed database, compatible with PostgreSQL.

I’ve written about this decision on LinkedIn: https://www.linkedin.com/pulse/why-ill-join-yugabyte-1st-july-franck-pachot
I’m also writing this post to mention one more place where I’ll be sharing, and to encourage you to do the same: https://www.polywork.com/franckpachot. Polywork is a new professional social network that competes with LinkedIn, based on less noise and more activity sharing.
Want to give it a try? I have a few invites. If you want one, follow and DM me on Twitter https://twitter.com/FranckPachot which is still my main() place of sharing.

Cet article See you on Polywork (a new LinkedIn?) est apparu en premier sur Blog dbi services.
