Yann Neuhaus

dbi services technical blog

Elasticsearch, Kibana, Logstash and Filebeat – Centralize all your database logs (and even more)

Wed, 2016-07-27 09:02

When it comes to centralizing logs from various sources (operating systems, databases, webservers, etc.) the ELK stack is becoming more and more popular in the open source world. ELK stands for Elasticsearch, Logstash and Kibana. Elasticsearch is based on Apache Lucene and its primary goal is to provide distributed search and analytics functions. Logstash is responsible for collecting logs from a variety of systems and is able to forward them to Elasticsearch. Kibana is the data visualization platform on top of Elasticsearch. Nowadays the term ELK is hardly used anymore and people speak about the Elastic Stack. In this post I’ll look at how you can use these tools to centralize your PostgreSQL log file(s) into the Elastic Stack.

As usual my VMs are running CentOS, although that should not be very important for the following (except for the yum commands). As Elasticsearch, Kibana and Logstash are all based on Java you’ll need to install Java before starting. There are yum and apt repositories available but I’ll use the manual way of getting the pieces up and running.

The goal is to have one VM running Elasticsearch, Logstash and Kibana and another VM which will run the PostgreSQL instance and Filebeat. Let’s start with the first one by installing Java and setting up a dedicated user for running the stack:

[root@elk ~]# yum install -y java-1.8.0-openjdk
[root@elk ~]# groupadd elk
[root@elk ~]# useradd -g elk elk
[root@elk ~]# passwd elk
[root@elk ~]# mkdir -p /opt/elk
[root@elk ~]# chown elk:elk /opt/elk
[root@elk ~]# su - elk
[elk@elk ~]$ cd /opt/elk

The first of the products we’re going to install is Elasticsearch, which is quite easy (we’ll not set up distributed mode, only a single node for the scope of this post). All we need to do is to download the tar file, extract it and start:

[elk@elk ~]$ wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.3.4/elasticsearch-2.3.4.tar.gz
[elk@elk elk]$ tar -axf elasticsearch-2.3.4.tar.gz
[elk@elk elk]$ ls -l
total 26908
drwxrwxr-x. 6 elk elk     4096 Jul 27 09:17 elasticsearch-2.3.4
-rw-rw-r--. 1 elk elk 27547169 Jul  7 15:05 elasticsearch-2.3.4.tar.gz
[elk@elk elk]$ cd elasticsearch-2.3.4
[elk@elk elasticsearch-2.3.4]$ bin/elasticsearch &

This will start up Elasticsearch and print some messages to the screen:

[2016-07-27 09:20:10,529][INFO ][node                     ] [Shinchuko Lotus] version[2.3.4], pid[10112], build[e455fd0/2016-06-30T11:24:31Z]
[2016-07-27 09:20:10,534][INFO ][node                     ] [Shinchuko Lotus] initializing ...
[2016-07-27 09:20:11,090][INFO ][plugins                  ] [Shinchuko Lotus] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2016-07-27 09:20:11,114][INFO ][env                      ] [Shinchuko Lotus] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [46.8gb], net total_space [48.4gb], spins? [unknown], types [rootfs]
[2016-07-27 09:20:11,115][INFO ][env                      ] [Shinchuko Lotus] heap size [1015.6mb], compressed ordinary object pointers [true]
[2016-07-27 09:20:11,115][WARN ][env                      ] [Shinchuko Lotus] max file descriptors [4096] for elasticsearch process likely too low, consider increasing to at least [65536]
[2016-07-27 09:20:12,637][INFO ][node                     ] [Shinchuko Lotus] initialized
[2016-07-27 09:20:12,637][INFO ][node                     ] [Shinchuko Lotus] starting ...
[2016-07-27 09:20:12,686][INFO ][transport                ] [Shinchuko Lotus] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2016-07-27 09:20:12,690][INFO ][discovery                ] [Shinchuko Lotus] elasticsearch/zc26XSa5SA-f_Kvm_jfthA
[2016-07-27 09:20:15,769][INFO ][cluster.service          ] [Shinchuko Lotus] new_master {Shinchuko Lotus}{zc26XSa5SA-f_Kvm_jfthA}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-07-27 09:20:15,800][INFO ][gateway                  ] [Shinchuko Lotus] recovered [0] indices into cluster_state
[2016-07-27 09:20:15,803][INFO ][http                     ] [Shinchuko Lotus] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2016-07-27 09:20:15,803][INFO ][node                     ] [Shinchuko Lotus] started

The default port is 9200 and you should now be able to talk to Elasticsearch:

[elk@elk elasticsearch-2.3.4]$ curl -X GET http://localhost:9200/
{
  "name" : "Shinchuko Lotus",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.3.4",
    "build_hash" : "e455fd0c13dceca8dbbdbb1665d068ae55dabe3f",
    "build_timestamp" : "2016-06-30T11:24:31Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}
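
Another quick check is the cluster health endpoint, which reports the status (green/yellow/red) of our single node cluster:

[elk@elk elasticsearch-2.3.4]$ curl -XGET 'http://localhost:9200/_cluster/health?pretty'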

Looks good. The next product we’ll need to install is Kibana. The setup itself is as easy as setting up Elasticsearch:

[elk@elk elasticsearch-2.3.4]$ cd /opt/elk/
[elk@elk elk]$ wget https://download.elastic.co/kibana/kibana/kibana-4.5.3-linux-x64.tar.gz
[elk@elk elk]$ tar -axf kibana-4.5.3-linux-x64.tar.gz
[elk@elk elk]$ cd kibana-4.5.3-linux-x64
[elk@elk kibana-4.5.3-linux-x64]$ grep elasticsearch.url config/kibana.yml 
elasticsearch.url: "http://localhost:9200"
[elk@elk kibana-4.5.3-linux-x64]$ bin/kibana &

Similar to Elasticsearch the startup messages are written to the screen:

  log   [09:27:30.208] [info][status][plugin:kibana] Status changed from uninitialized to green - Ready
  log   [09:27:30.237] [info][status][plugin:elasticsearch] Status changed from uninitialized to yellow - Waiting for Elasticsearch
  log   [09:27:30.239] [info][status][plugin:kbn_vislib_vis_types] Status changed from uninitialized to green - Ready
  log   [09:27:30.242] [info][status][plugin:markdown_vis] Status changed from uninitialized to green - Ready
  log   [09:27:30.253] [info][status][plugin:metric_vis] Status changed from uninitialized to green - Ready
  log   [09:27:30.257] [info][status][plugin:spyModes] Status changed from uninitialized to green - Ready
  log   [09:27:30.261] [info][status][plugin:statusPage] Status changed from uninitialized to green - Ready
  log   [09:27:30.263] [info][status][plugin:table_vis] Status changed from uninitialized to green - Ready
  log   [09:27:30.270] [info][listening] Server running at http://0.0.0.0:5601
  log   [09:27:35.320] [info][status][plugin:elasticsearch] Status changed from yellow to yellow - No existing Kibana index found
[2016-07-27 09:27:35,513][INFO ][cluster.metadata         ] [Shinchuko Lotus] [.kibana] creating index, cause [api], templates [], shards [1]/[1], mappings [config]
[2016-07-27 09:27:35,938][INFO ][cluster.routing.allocation] [Shinchuko Lotus] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[.kibana][0]] ...]).
  log   [09:27:38.746] [info][status][plugin:elasticsearch] Status changed from yellow to green - Kibana index ready

To check if Kibana is really working point your browser to http://[hostname]:5601 (192.168.22.173 in my case):
kibana_01
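
If the VM has no browser available you can at least verify from the command line that Kibana answers on port 5601 (a minimal check only; any 2xx or 3xx status code means the server is up):

[elk@elk kibana-4.5.3-linux-x64]$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5601/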

The third product we’ll need is Logstash. It is almost the same procedure to get it up and running:

[elk@elk kibana-4.5.3-linux-x64]$ cd /opt/elk/
[elk@elk elk]$ wget https://download.elastic.co/logstash/logstash/logstash-all-plugins-2.3.4.tar.gz
[elk@elk elk]$ tar -axf logstash-all-plugins-2.3.4.tar.gz
[elk@elk elk]$ cd logstash-2.3.4

To test if Logstash is running fine start a very simple pipeline:

[elk@elk logstash-2.3.4]$ bin/logstash -e 'input { stdin { } } output { stdout {} }'
Settings: Default pipeline workers: 1
Pipeline main started

Once this is up type something on the command line to check if Logstash is responding:

yipphea
2016-07-27T07:52:43.607Z elk yipphea

Looks good as well. For now we can stop Logstash again by “Control-c”:

^CSIGINT received. Shutting down the agent. {:level=>:warn}
stopping pipeline {:id=>"main"}
Received shutdown signal, but pipeline is still waiting for in-flight events
to be processed. Sending another ^C will force quit Logstash, but this may cause data loss. {:level=>:warn}

Pipeline main has been shutdown

Now we need to do some configuration to prepare Logstash for receiving our PostgreSQL log file(s) through Filebeat. Filebeat will be responsible for forwarding the PostgreSQL log file(s) to Logstash.

[elk@elk logstash-2.3.4]$ pwd
/opt/elk/logstash-2.3.4
[elk@elk logstash-2.3.4]$ mkdir conf.d
[elk@elk logstash-2.3.4]$ cat conf.d/02-beats-input.conf
input {
  beats {
    port => 5044
    ssl => false
  }
}

What this does is tell Logstash to create a new input of type beats and listen on port 5044 for incoming data. In addition to this input plugin Logstash needs an output plugin to know what it shall do with the data coming in. As we want to send all the data to Elasticsearch we need to specify this:

[elk@elk logstash-2.3.4]$ cat conf.d/10-elasticsearch-output.conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

Let’s test if the configuration is fine (Logstash will read all the configuration files in the directory in order):

[elk@elk logstash-2.3.4]$ bin/logstash --config /opt/elk/logstash-2.3.4/conf.d/ --configtest
Configuration OK

As all seems fine we can start Logstash with our new configuration:

[elk@elk logstash-2.3.4]$ bin/logstash --config /opt/elk/logstash-2.3.4/conf.d/ &
Settings: Default pipeline workers: 1
Pipeline main started

To be able to easily use the Filebeat index pattern in Kibana we’ll load the Beats dashboards provided by Elastic:

[elk@elk logstash-2.3.4]$ cd /opt/elk/
[elk@elk elk]$ wget http://download.elastic.co/beats/dashboards/beats-dashboards-1.2.3.zip
[elk@elk elk]$ unzip beats-dashboards-1.2.3.zip
[elk@elk elk]$ cd beats-dashboards-1.2.3
[elk@elk beats-dashboards-1.2.3]$ ./load.sh

Time to switch to the PostgreSQL VM to install Filebeat:

postgres@centos7:/u01/app/postgres/product/ [PG1] pwd
/u01/app/postgres/product
postgres@centos7:/u01/app/postgres/product/ [PG1] wget https://download.elastic.co/beats/filebeat/filebeat-1.2.3-x86_64.tar.gz
postgres@centos7:/u01/app/postgres/product/ [PG1] tar -axf filebeat-1.2.3-x86_64.tar.gz

Filebeat comes with an index template for Elasticsearch, which we now need to transfer to the host where Elasticsearch runs so that we can load it:

postgres@centos7:/u01/app/postgres/product/filebeat-1.2.3-x86_64/ [PG1] ls *template*
filebeat.template.json
postgres@centos7:/u01/app/postgres/product/filebeat-1.2.3-x86_64/ [PG1] scp filebeat.template.json elk@192.168.22.173:/var/tmp/
elk@192.168.22.173's password: 
filebeat.template.json                                                      100%  814     0.8KB/s   00:00  

On the host where Elasticsearch runs we can now load the template into Elasticsearch:

[elk@elk elk]$ curl -XPUT 'http://localhost:9200/_template/filebeat' -d@/var/tmp/filebeat.template.json
{"acknowledged":true}
[elk@elk elk]$ 
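
To verify that the template has really been stored, a GET on the same endpoint should return it:

[elk@elk elk]$ curl -XGET 'http://localhost:9200/_template/filebeat?pretty'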

Back on the PostgreSQL VM we need to configure Filebeat itself by adapting the filebeat.yml configuration file:

postgres@centos7:/u01/app/postgres/product/filebeat-1.2.3-x86_64/ [PG1] ls
filebeat  filebeat.template.json  filebeat.yml

There are only a few important points to configure. The first one is to tell Filebeat where to look for the PostgreSQL log files:

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # To fetch all ".log" files from a specific level of subdirectories
      # /var/log/*/*.log can be used.
      # For each file found under this path, a harvester is started.
      # Make sure not file is defined twice as this can lead to unexpected behaviour.
      paths:
        - /u02/pgdata/PG1/pg_log/*.log

Afterwards make sure you disable the Elasticsearch output plugin by commenting it out under the “output” section:

  ### Elasticsearch as output
  #elasticsearch:
    # Array of hosts to connect to.
    # Scheme and port can be left out and will be set to the default (http and 9200)
    # In case you specify and additional path, the scheme is required: http://localhost:9200/path
    # IPv6 addresses should always be defined as: https://[2001:db8::1]:9200
    #hosts: ["localhost:9200"]

Finally enable the Logstash output plugin in the same section (provide the host and port where your Logstash is running):

  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["192.168.22.173:5044"]

The host and port specified here must match what we specified in 02-beats-input.conf when we configured Logstash above. This should be sufficient to start up Filebeat:

postgres@centos7:/u01/app/postgres/product/filebeat-1.2.3-x86_64/ [PG1] ./filebeat &
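
If nothing shows up later on, a first thing to check from the PostgreSQL VM is whether the Logstash beats port is reachable at all. A minimal sketch using only bash built-ins (no extra tools assumed on the host):

postgres@centos7:/u01/app/postgres/product/filebeat-1.2.3-x86_64/ [PG1] timeout 3 bash -c ': < /dev/tcp/192.168.22.173/5044' && echo "Logstash beats input is reachable"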

If everything is working fine we should now be able to ask Elasticsearch for our data from the PostgreSQL log file(s):

[elk@elk elk]$ curl -XGET 'http://localhost:9200/filebeat-*/_search?pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 971,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1F",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 06:57:34.030 CEST - 2 - 20831 -  - @ LOG:  MultiXact member wraparound protections are now enabled",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "type" : "log",
        "input_type" : "log",
        "fields" : null,
        "count" : 1,
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "offset" : 112,
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    }, {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1M",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 07:20:46.060 CEST - 2 - 20835 -  - @ LOG:  autovacuum launcher shutting down",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "fields" : null,
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "offset" : 948,
        "type" : "log",
        "count" : 1,
        "input_type" : "log",
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    }, {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1P",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 07:20:46.919 CEST - 5 - 20832 -  - @ LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.678 s; sync files=0, longest=0.000 s, average=0.000 s; distance=10908 kB, estimate=10908 kB",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "input_type" : "log",
        "count" : 1,
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "type" : "log",
        "fields" : null,
        "offset" : 1198,
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    }, {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1R",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 09:36:34.600 CEST - 1 - 2878 -  - @ LOG:  database system was shut down at 2016-05-15 07:20:46 CEST",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "offset" : 1565,
        "input_type" : "log",
        "count" : 1,
        "fields" : null,
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "type" : "log",
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    }, {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1X",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 09:39:21.313 CEST - 3 - 3048 - [local] - postgres@postgres STATEMENT:  insert into t1 generate_series(1,1000000);",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "offset" : 2216,
        "type" : "log",
        "input_type" : "log",
        "count" : 1,
        "fields" : null,
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    }, {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1e",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 09:43:24.366 CEST - 3 - 3397 - [local] - postgres@postgres CONTEXT:  while updating tuple (0,1) in relation \"t1\"",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "offset" : 3165,
        "type" : "log",
        "input_type" : "log",
        "count" : 1,
        "fields" : null,
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    }, {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1l",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 09:43:46.776 CEST - 10 - 3397 - [local] - postgres@postgres CONTEXT:  while updating tuple (0,1) in relation \"t1\"",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "offset" : 4045,
        "type" : "log",
        "count" : 1,
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "input_type" : "log",
        "fields" : null,
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    }, {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1r",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 09:45:39.837 CEST - 9 - 3048 - [local] - postgres@postgres ERROR:  type \"b\" does not exist at character 28",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "input_type" : "log",
        "count" : 1,
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "offset" : 4799,
        "type" : "log",
        "fields" : null,
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    }, {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1w",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 09:45:49.843 CEST - 14 - 3048 - [local] - postgres@postgres ERROR:  current transaction is aborted, commands ignored until end of transaction block",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "type" : "log",
        "fields" : null,
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "offset" : 5400,
        "input_type" : "log",
        "count" : 1,
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    }, {
      "_index" : "filebeat-2016.07.27",
      "_type" : "log",
      "_id" : "AVYrey_1IfCoJOBMaP1x",
      "_score" : 1.0,
      "_source" : {
        "message" : "2016-05-15 09:45:49.843 CEST - 15 - 3048 - [local] - postgres@postgres STATEMENT:  alter table t1 add column b int;",
        "@version" : "1",
        "@timestamp" : "2016-07-27T08:31:44.940Z",
        "offset" : 5559,
        "count" : 1,
        "fields" : null,
        "beat" : {
          "hostname" : "centos7.local",
          "name" : "centos7.local"
        },
        "source" : "/u02/pgdata/PG1/pg_log/postgresql-Sun.log",
        "type" : "log",
        "input_type" : "log",
        "host" : "centos7.local",
        "tags" : [ "beats_input_codec_plain_applied" ]
      }
    } ]
  }
}

Quite a lot of information, so it is really working :) If we can query Elasticsearch we should be able to use Kibana on the same data too, shouldn’t we? Fire up your browser and point it to your Kibana URL (192.168.22.173:5601 in my case). You should see the “filebeat-*” index pattern in the upper left:

kibana_02

Select the “filebeat-*” index pattern:
kibana_03

To make this index the default one click on the green star:
kibana_04

Time to discover our data by using the “Discover” menu at the top:
kibana_05

As a result you should see all the PostgreSQL log messages on the screen:
kibana_06

Try to search for something, e.g. “checkpoint complete”:
kibana_07
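
The same search can also be run directly against Elasticsearch with a URI query string, just to show that the data is queryable outside of Kibana as well:

[elk@elk elk]$ curl -s -G 'http://localhost:9200/filebeat-*/_search?pretty' --data-urlencode 'q=message:"checkpoint complete"'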

Not much is happening on my instance, so let’s do some checkpoints:

postgres@centos7:/u01/app/postgres/product/filebeat-1.2.3-x86_64/ [PG1] sqh
psql (9.6beta1 dbi services build)
Type "help" for help.

(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.403 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.379 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.364 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.321 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.370 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.282 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.411 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 101.166 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.392 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.322 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.367 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.320 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.328 ms
(postgres@[local]:5432) [postgres] > checkpoint;
CHECKPOINT
Time: 100.285 ms
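
Typing this interactively gets tedious quickly; a small shell loop does the same (a sketch, assuming psql is on the PATH of the postgres operating system user and connects to the local instance):

for i in $(seq 1 14); do
  psql -d postgres -c "checkpoint;"
done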

What does the picture look like now:
kibana_08

Isn’t that great? All the information in near real time, just by using a browser. In my case the logs are coming from a single PostgreSQL instance, but they could be coming from hundreds of instances. Logs could also be coming from webservers, application servers, operating systems, the network, … All centralized in one place, ready to analyze. Even better, you could use Watcher to alert on changes in your data.

Ah, I can already hear it: but I need to see my performance metrics as well. No problem, there is the jdbc input plugin for Logstash. What can you do with that? Once configured you can query whatever you want from your database. Let’s do a little demo.

As we downloaded Logstash with all plugins included, the jdbc input plugin is already there:

[elk@elk logstash-2.3.4]$ pwd
/opt/elk/logstash-2.3.4
[elk@elk logstash-2.3.4]$ find . -name *jdbc*
./vendor/bundle/jruby/1.9/gems/jdbc-sqlite3-3.8.11.2
./vendor/bundle/jruby/1.9/gems/jdbc-sqlite3-3.8.11.2/jdbc-sqlite3.gemspec
./vendor/bundle/jruby/1.9/gems/jdbc-sqlite3-3.8.11.2/lib/jdbc
./vendor/bundle/jruby/1.9/gems/jdbc-sqlite3-3.8.11.2/lib/sqlite-jdbc-3.8.11.2.jar
./vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.1.0
./vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.1.0/lib/logstash/inputs/jdbc.rb
./vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.1.0/lib/logstash/plugin_mixins/jdbc.rb
./vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.1.0/logstash-input-jdbc.gemspec
./vendor/bundle/jruby/1.9/gems/sequel-4.36.0/lib/sequel/adapters/jdbc
./vendor/bundle/jruby/1.9/gems/sequel-4.36.0/lib/sequel/adapters/jdbc/jdbcprogress.rb
./vendor/bundle/jruby/1.9/gems/sequel-4.36.0/lib/sequel/adapters/jdbc.rb
./vendor/bundle/jruby/1.9/specifications/jdbc-sqlite3-3.8.11.2.gemspec
./vendor/bundle/jruby/1.9/specifications/logstash-input-jdbc-3.1.0.gemspec

What you’ll need to provide in addition is the JDBC driver for the database you want to connect to. In my case for PostgreSQL:

[elk@elk logstash-2.3.4]$ cd /opt/elk/
[elk@elk elk]$ mkdir jdbc
[elk@elk elk]$ cd jdbc/
[elk@elk jdbc]$ wget https://jdbc.postgresql.org/download/postgresql-9.4.1209.jar
[elk@elk jdbc]$ ls
postgresql-9.4.1209.jar

All we need to do from here on is to configure another input and output plugin for Logstash:

[elk@elk conf.d]$ pwd
/opt/elk/logstash-2.3.4/conf.d
[elk@elk conf.d]$ cat 03-jdbc-postgres-input.conf
input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://192.168.22.99:5432/postgres"
        jdbc_user => "postgres"
        jdbc_password => "postgres"
        jdbc_validate_connection => true
        jdbc_driver_library => "/opt/elk/jdbc/postgresql-9.4.1209.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        statement => "SELECT * from pg_stat_activity"
    }
}
output {
    stdout { codec => json_lines }
}

Re-test if the configuration is fine:

[elk@elk logstash-2.3.4]$ bin/logstash --config /opt/elk/logstash-2.3.4/conf.d/ --configtest
Configuration OK

And then kill and restart Logstash:

[elk@elk logstash-2.3.4]$ bin/logstash --config /opt/elk/logstash-2.3.4/conf.d/

You should see data from pg_stat_activity right on the screen:

[elk@elk logstash-2.3.4]$ bin/logstash --config /opt/elk/logstash-2.3.4/conf.d/ 
Settings: Default pipeline workers: 1
Pipeline main started
{"datid":13322,"datname":"postgres","pid":3887,"usesysid":10,"usename":"postgres","application_name":"","client_addr":{"type":"inet","value":"192.168.22.173"},"client_hostname":null,"client_port":58092,"backend_start":"2016-07-27T13:15:25.421Z","xact_start":"2016-07-27T13:15:25.716Z","query_start":"2016-07-27T13:15:25.718Z","state_change":"2016-07-27T13:15:25.718Z","wait_event_type":null,"wait_event":null,"state":"active","backend_xid":null,"backend_xmin":{"type":"xid","value":"1984"},"query":"SELECT * from pg_stat_activity","@version":"1","@timestamp":"2016-07-27T13:15:26.712Z"}
{"message":"2016-07-27 15:15:25.422 CEST - 1 - 3887 - 192.168.22.173 - [unknown]@[unknown] LOG:  connection received: host=192.168.22.173 port=58092","@version":"1","@timestamp":"2016-07-27T13:15:32.054Z","source":"/u02/pgdata/PG1/pg_log/postgresql-Wed.log","offset":84795,"type":"log","input_type":"log","count":1,"beat":{"hostname":"centos7.local","name":"centos7.local"},"fields":null,"host":"centos7.local","tags":["beats_input_codec_plain_applied"]}
{"message":"2016-07-27 15:15:25.454 CEST - 2 - 3887 - 192.168.22.173 - postgres@postgres LOG:  connection authorized: user=postgres database=postgres","@version":"1","@timestamp":"2016-07-27T13:15:32.054Z","source":"/u02/pgdata/PG1/pg_log/postgresql-Wed.log","fields":null,"beat":{"hostname":"centos7.local","name":"centos7.local"},"count":1,"offset":84932,"type":"log","input_type":"log","host":"centos7.local","tags":["beats_input_codec_plain_applied"]}

As we want to have this data in Elasticsearch and analyze it with Kibana, adjust the configuration to look like this (the schedule makes the query run every minute) and then restart Logstash:

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://192.168.22.99:5432/postgres"
        jdbc_user => "postgres"
        jdbc_password => "postgres"
        jdbc_validate_connection => true
        jdbc_driver_library => "/opt/elk/jdbc/postgresql-9.4.1209.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        statement => "SELECT * from pg_stat_activity"
        schedule => "* * * * *"
    }
}
output {
    elasticsearch {
        index => "pg_stat_activity"
        document_type => "pg_stat_activity"
        document_id => "%{uid}"
        hosts => ["localhost:9200"]
    }
}
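
Before heading over to Kibana you can verify directly against Elasticsearch that documents arrive in the new index (assuming the one-minute schedule has already fired at least once):

[elk@elk elk]$ curl -XGET 'http://localhost:9200/pg_stat_activity/_search?pretty&size=1'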

Once you have restarted Logstash head over to Kibana and create a new index pattern:
kibana_09

When you “Discover” you should see the data from pg_stat_activity:
kibana_10

Have fun with your data …

 


Getting started with Ansible – Creating the PostgreSQL instance

Tue, 2016-07-26 08:20

In the last three posts we did the initial Ansible setup, installed the operating system packages, created the PostgreSQL group and user and downloaded, compiled and installed the PostgreSQL binaries from source. In this post we’ll look at how we can use Ansible to create our first PostgreSQL instance.

As a reminder this is our current playbook:

[ansible@ansiblecontrol ansible]$ pwd
/opt/ansible
[ansible@ansiblecontrol ansible]$ cat roles/postgresqldbserver/tasks/main.yml 
---
- name: Install PostgreSQL dependencies
  yum: name={{item}} state=present
  with_items:
   - gcc
   - openldap-devel
   - python-devel
   - readline-devel
   - openssl-devel
   - redhat-lsb
   - bison
   - flex
   - perl-ExtUtils-Embed
   - zlib-devel
   - crypto-utils
   - openssl-devel
   - pam-devel
   - libxml2-devel
   - libxslt-devel
   - tcl
   - tcl-devel
   - openssh-clients
   - bzip2
   - net-tools
   - wget
   - screen
   - ksh

- name: Add PostgreSQL operating system group
  group: name=postgressomegroup state=present

- name: Add PostgreSQL operating system user
  user: name=postgres comment="PostgreSQL binaries owner" group=postgres

- name: Download the PostgreSQL 9.5.3 sources
  get_url: url=https://ftp.postgresql.org/pub/source/v9.5.3/postgresql-9.5.3.tar.bz2 dest=/var/tmp mode=0755

- name: Copy PostgreSQL install script to targets
  copy: src=install_pg953.sh dest=/var/tmp/install_pg953.sh owner=root group=root mode="u=rwx"

- name: Compile and install PostgreSQL
  shell: /var/tmp/install_pg953.sh >> /var/tmp/install_pg_log
  args:
    executable: /usr/bin/bash

What we need now is an additional task which creates the PostgreSQL cluster on disk. As with the installation from source we’ll need to do some scripting for this. Here is a very basic script for doing it:

#!/usr/bin/bash

PGUSER="postgres"
PGGROUP="postgres"
DATADIRECTORY="/u02/pgdata/PG1"
XLOGLOCATION="/u03/pgdata/PG1"
PGHOME="/u01/app/postgres/product/95/db_3/"
POSTGRESDBPASSWORD="postgres"

mkdir -p ${DATADIRECTORY}
mkdir -p ${XLOGLOCATION}
chown ${PGUSER}:${PGGROUP} ${DATADIRECTORY}
chown ${PGUSER}:${PGGROUP} ${XLOGLOCATION}

su - ${PGUSER} -c "echo ${POSTGRESDBPASSWORD} > /var/tmp/tmp_pwd"
su - ${PGUSER} -c "${PGHOME}/bin/initdb -D ${DATADIRECTORY} --pwfile=/var/tmp/tmp_pwd -X ${XLOGLOCATION} -k"

rm -f /var/tmp/tmp_pwd

su - ${PGUSER} -c "${PGHOME}/bin/pg_ctl -D ${DATADIRECTORY} start"

As with the PostgreSQL installation script we’ll put this into the “files” directory of our “postgresqldbserver” role:

[ansible@ansiblecontrol ansible]$ ls roles/postgresqldbserver/files/
create_pg953_cluster.sh  install_pg953.sh

We’ll use the same logic again and add two new tasks to our existing playbook:

- name: Copy the PostgreSQL cluster creation script to the targets
  copy: src=create_pg953_cluster.sh dest=/var/tmp/create_pg953_cluster.sh owner=root group=root mode="u=rwx"

- name: Create and start the PostgreSQL instance
  shell: /var/tmp/create_pg953_cluster.sh >> /var/tmp/create_pg_cluster.log
  args:
    executable: /usr/bin/bash

Once the playbook is executed again:

[ansible@ansiblecontrol ansible]$ ansible-playbook -i /opt/ansible/development /opt/ansible/roles/postgresqldbserver/site.yml

PLAY [postgresql-servers] ******************************************************

TASK [setup] *******************************************************************
ok: [192.168.22.171]
ok: [192.168.22.172]

TASK [postgresqldbserver : Install PostgreSQL dependencies] ********************
ok: [192.168.22.172] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])
ok: [192.168.22.171] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])

TASK [postgresqldbserver : Add PostgreSQL operating system group] **************
ok: [192.168.22.171]
ok: [192.168.22.172]

TASK [postgresqldbserver : Add PostgreSQL operating system user] ***************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Download the PostgreSQL 9.5.3 sources] **************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Copy PostgreSQL install script to targets] **********
changed: [192.168.22.171]
changed: [192.168.22.172]

TASK [postgresqldbserver : Compile and install PostgreSQL] *********************
changed: [192.168.22.172]
changed: [192.168.22.171]

TASK [postgresqldbserver : Copy the PostgreSQL cluster creation script to the targets] ***
ok: [192.168.22.171]
changed: [192.168.22.172]

TASK [postgresqldbserver : Create and start the PostgreSQL instance] ***********
changed: [192.168.22.172]
changed: [192.168.22.171]

PLAY RECAP *********************************************************************
192.168.22.171             : ok=9    changed=3    unreachable=0    failed=0   
192.168.22.172             : ok=9    changed=4    unreachable=0    failed=0   

… we should have a running PostgreSQL instance on both nodes. Let’s check:

[root@ansiblepg1 tmp]# ps -ef | grep postgres
postgres 17284     1  0 08:47 ?        00:00:00 /u01/app/postgres/product/95/db_3/bin/postgres -D /u02/pgdata/PG1
postgres 17288 17284  0 08:47 ?        00:00:00 postgres: checkpointer process   
postgres 17289 17284  0 08:47 ?        00:00:00 postgres: writer process   
postgres 17290 17284  0 08:47 ?        00:00:00 postgres: wal writer process   
postgres 17291 17284  0 08:47 ?        00:00:00 postgres: autovacuum launcher process   
postgres 17292 17284  0 08:47 ?        00:00:00 postgres: stats collector process   
root     17294 10223  0 08:47 pts/1    00:00:00 grep --color=auto postgres
[root@ansiblepg2 ~]# ps -ef | grep postgres
postgres 16951     1  0 08:47 ?        00:00:00 /u01/app/postgres/product/95/db_3/bin/postgres -D /u02/pgdata/PG1
postgres 16955 16951  0 08:47 ?        00:00:00 postgres: checkpointer process   
postgres 16956 16951  0 08:47 ?        00:00:00 postgres: writer process   
postgres 16957 16951  0 08:47 ?        00:00:00 postgres: wal writer process   
postgres 16958 16951  0 08:47 ?        00:00:00 postgres: autovacuum launcher process   
postgres 16959 16951  0 08:47 ?        00:00:00 postgres: stats collector process   
root     16985 16964  0 08:48 pts/0    00:00:00 grep --color=auto postgres
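
Instead of logging in to each node you could run the same check from the control host with an ad-hoc command (a sketch re-using the shell module we already rely on in the playbook):

[ansible@ansiblecontrol ~]$ ansible -i /opt/ansible/development postgresql-servers -m shell -a "ps -ef | grep [p]ostgres"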

Isn’t that cool? Sure, there is a lot of hard coding in here which needs to be replaced by variables. This is a topic for another post.

For your reference here is the complete playbook:

[ansible@ansiblecontrol ansible]$ cat /opt/ansible/roles/postgresqldbserver/tasks/main.yml 
---
- name: Install PostgreSQL dependencies
  yum: name={{item}} state=present
  with_items:
   - gcc
   - openldap-devel
   - python-devel
   - readline-devel
   - openssl-devel
   - redhat-lsb
   - bison
   - flex
   - perl-ExtUtils-Embed
   - zlib-devel
   - crypto-utils
   - openssl-devel
   - pam-devel
   - libxml2-devel
   - libxslt-devel
   - tcl
   - tcl-devel
   - openssh-clients
   - bzip2
   - net-tools
   - wget
   - screen
   - ksh

- name: Add PostgreSQL operating system group
  group: name=postgressomegroup state=present

- name: Add PostgreSQL operating system user
  user: name=postgres comment="PostgreSQL binaries owner" group=postgres

- name: Download the PostgreSQL 9.5.3 sources
  get_url: url=https://ftp.postgresql.org/pub/source/v9.5.3/postgresql-9.5.3.tar.bz2 dest=/var/tmp mode=0755

- name: Copy PostgreSQL install script to targets
  copy: src=install_pg953.sh dest=/var/tmp/install_pg953.sh owner=root group=root mode="u=rwx"

- name: Compile and install PostgreSQL
  shell: /var/tmp/install_pg953.sh >> /var/tmp/install_pg_log
  args:
    executable: /usr/bin/bash

- name: Copy the PostgreSQL cluster creation script to the targets
  copy: src=create_pg953_cluster.sh dest=/var/tmp/create_pg953_cluster.sh owner=root group=root mode="u=rwx"

- name: Create and start the PostgreSQL instance
  shell: /var/tmp/create_pg953_cluster.sh >> /var/tmp/create_pg_cluster.log
  args:
    executable: /usr/bin/bash
 


Getting started with Ansible – Download the PostgreSQL sources, compile and install

Mon, 2016-07-25 23:21

In the last post in this series we looked at how you can instruct Ansible to install packages on the operating system using the yum module and how you can create groups and users by using the group and the user modules. In this post we’ll look at how you can download the PostgreSQL sources, compile and then finally install the binaries by extending our existing playbook.

If you remember the last post our current Ansible playbook looks like this:

[ansible@ansiblecontrol ansible]$ pwd
/opt/ansible
[ansible@ansiblecontrol ansible]$ cat roles/postgresqldbserver/tasks/main.yml 
---
- name: Install PostgreSQL dependencies
  yum: name={{item}} state=present
  with_items:
   - gcc
   - openldap-devel
   - python-devel
   - readline-devel
   - openssl-devel
   - redhat-lsb
   - bison
   - flex
   - perl-ExtUtils-Embed
   - zlib-devel
   - crypto-utils
   - openssl-devel
   - pam-devel
   - libxml2-devel
   - libxslt-devel
   - tcl
   - tcl-devel
   - openssh-clients
   - bzip2
   - net-tools
   - wget
   - screen
   - ksh

- name: Add PostgreSQL operating system group
  group: name=postgressomegroup state=present

- name: Add PostgreSQL operating system user
  user: name=postgres comment="PostgreSQL binaries owner" group=postgres

Basically we have three tasks:

  • Install the required operating system packages
  • Create the PostgreSQL operating system group
  • Create the PostgreSQL operating system user

The next task we want Ansible to perform is downloading the PostgreSQL sources. In this example we’ll download from the official PostgreSQL servers, but if you do not have connectivity to the outside world this can be a local server in your company’s network as well. The module we’ll use for that is the get_url module. All we need to do is to add the following lines to our playbook:

- name: Download the PostgreSQL 9.5.3 sources
  get_url: url=https://ftp.postgresql.org/pub/source/v9.5.3/postgresql-9.5.3.tar.bz2 dest=/var/tmp mode=0755

Once we execute the playbook again:

[ansible@ansiblecontrol ~]$ ansible-playbook -i /opt/ansible/development /opt/ansible/roles/postgresqldbserver/site.yml

PLAY [postgresql-servers] ******************************************************

TASK [setup] *******************************************************************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Install PostgreSQL dependencies] ********************
ok: [192.168.22.171] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])
ok: [192.168.22.172] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])

TASK [postgresqldbserver : Add PostgreSQL operating system group] **************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Add PostgreSQL operating system user] ***************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Download the PostgreSQL 9.5.3 sources] **************
changed: [192.168.22.171]
changed: [192.168.22.172]

PLAY RECAP *********************************************************************
192.168.22.171             : ok=5    changed=1    unreachable=0    failed=0   
192.168.22.172             : ok=5    changed=1    unreachable=0    failed=0   

The sources are available on the PostgreSQL hosts:

[root@ansiblepg1 ~] ls /var/tmp/post*
/var/tmp/postgresql-9.5.3.tar.bz2

Now we need to do some scripting as there is no pre-defined module for installing PostgreSQL from source. Here is a very basic script to do that:

#!/usr/bin/bash
PGSOURCE="/var/tmp/postgresql-9.5.3.tar.bz2"
PGUSER="postgres"
PGGROUP="postgres"
PGHOME="/u01/app/postgres/product/95/db_3"
SEGSIZE=2
BLOCKSIZE=8
mkdir -p /u01/app
chown ${PGUSER}:${PGGROUP} /u01/app
su - ${PGUSER} -c "cd /var/tmp/; tar -axf ${PGSOURCE}"
su - ${PGUSER} -c "cd /var/tmp/postgresql-9.5.3; ./configure --prefix=${PGHOME} \
            --exec-prefix=${PGHOME} \
            --bindir=${PGHOME}/bin \
            --libdir=${PGHOME}/lib \
            --sysconfdir=${PGHOME}/etc \
            --includedir=${PGHOME}/include \
            --datarootdir=${PGHOME}/share \
            --datadir=${PGHOME}/share \
            --with-pgport=5432 \
            --with-perl \
            --with-python \
            --with-tcl \
            --with-openssl \
            --with-pam \
            --with-ldap \
            --with-libxml \
            --with-libxslt \
            --with-segsize=${SEGSIZE} \
            --with-blocksize=${BLOCKSIZE} \
            --with-wal-segsize=16  \
            --with-extra-version=\" dbi services build\""
su - ${PGUSER} -c "cd /var/tmp/postgresql-9.5.3; make world"
su - ${PGUSER} -c "cd /var/tmp/postgresql-9.5.3; make install"
su - ${PGUSER} -c "cd /var/tmp/postgresql-9.5.3/contrib; make install"
rm -rf /var/tmp/postgresql*

Place a file with this content under the files directory of our role:

roles/postgresqldbserver/files/install_pg953.sh

There is another Ansible module called copy which we can now use to copy the file from our role’s files directory to the target servers. All we need to do is to add the following lines to our playbook:

- name: Copy PostgreSQL install script to targets
  copy: src=install_pg953.sh dest=/var/tmp/install_pg953.sh owner=postgres group=postgres mode="u=rwx"

Once we execute the playbook the file is distributed to all targets (here the check for the first node):

[root@ansiblepg1 ~] ls /var/tmp/install*
/var/tmp/install_pg953.sh

The only thing we need to do from now on to get PostgreSQL installed on the target systems is to execute this file. How can we do that? Very easily, by using the shell module. Add these lines to the playbook:

- name: Compile and install PostgreSQL
  shell: /var/tmp/install_pg953.sh >> /var/tmp/install_pg_log
  args:
    executable: /usr/bin/bash

… re-execute the playbook:

[ansible@ansiblecontrol ansible]$ ansible-playbook -i /opt/ansible/development /opt/ansible/roles/postgresqldbserver/site.yml

PLAY [postgresql-servers] ******************************************************

TASK [setup] *******************************************************************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Install PostgreSQL dependencies] ********************
ok: [192.168.22.171] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])
ok: [192.168.22.172] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])

TASK [postgresqldbserver : Add PostgreSQL operating system group] **************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Add PostgreSQL operating system user] ***************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Download the PostgreSQL 9.5.3 sources] **************
changed: [192.168.22.172]
changed: [192.168.22.171]

TASK [postgresqldbserver : Copy PostgreSQL install script to targets] **********
ok: [192.168.22.171]
ok: [192.168.22.172]

TASK [postgresqldbserver : Compile and install PostgreSQL] *********************
changed: [192.168.22.171]
changed: [192.168.22.172]

PLAY RECAP *********************************************************************
192.168.22.171             : ok=7    changed=2    unreachable=0    failed=0   
192.168.22.172             : ok=7    changed=2    unreachable=0    failed=0   

… and we are done. Just to prove it:

[root@ansiblepg1 ~] /u01/app/postgres/product/95/db_3/bin/psql --version
psql (PostgreSQL) 9.5.3 dbi services build
[root@ansiblepg2 ~] /u01/app/postgres/product/95/db_3/bin/psql --version
psql (PostgreSQL) 9.5.3 dbi services build

PostgreSQL is installed and ready to create a new database cluster (which is the topic for the next post).

Btw: The two steps of copying the script to the targets and then executing it can be combined into one step by using the script module.
Btw2: Of course you could do the same with Oracle, MySQL, MongoDB, Cassandra, …

 


Getting started with Ansible – Installing OS packages, creating groups and users

Mon, 2016-07-25 08:13

It has been quite a while since the first post in this series: “Getting started with Ansible – Preparations“. If you recall from the initial post, Ansible was running on the control host and this simple Ansible command:

ansible postgres-servers -a "/bin/echo 11" -f 5

… was successfully executed against the “postgres-servers” group. So far, so good. Getting Ansible up and running just for this would not be very useful, so let’s see where we might go from here.

When you start to think about what you want to automate you should also start to think about how you want to organize your Ansible stuff. The documentation provides some guidelines which might or might not fit your needs. For the scope of this series let’s stick to what the documentation recommends as one possible way to go. The directory layout on the control host will then be:

[ansible@ansiblecontrol ~]$ sudo mkdir /opt/ansible
[ansible@ansiblecontrol ~]$ sudo chown ansible:ansible /opt/ansible
[ansible@ansiblecontrol ~]$ touch /opt/ansible/development                  # the inventory file for the development hosts      
[ansible@ansiblecontrol ~]$ touch /opt/ansible/staging                      # the inventory file for the staging hosts
[ansible@ansiblecontrol ~]$ touch /opt/ansible/production                   # the inventory file for the production hosts
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/common                 # a role valid for "common" stuff
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/common/tasks
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/common/handlers
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/common/templates
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/common/files
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/common/vars
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/common/meta
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/postgresqldbserver    # a role vaild for the PostgreSQL stuff
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/postgresqldbserver/tasks
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/postgresqldbserver/handlers
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/postgresqldbserver/templates
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/postgresqldbserver/files
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/postgresqldbserver/vars
[ansible@ansiblecontrol ~]$ mkdir /opt/ansible/roles/postgresqldbserver/meta

The concept of roles is explained in the documentation and you should definitely read that. We’ll come back to this later.

For now let’s place our two PostgreSQL hosts into the “development” inventory:

[ansible@ansiblecontrol ~]$ echo "[postgresql-servers]" >> /opt/ansible/development
[ansible@ansiblecontrol ~]$ echo "192.168.22.171" >> /opt/ansible/development
[ansible@ansiblecontrol ~]$ echo "192.168.22.172" >> /opt/ansible/development

Passing our new inventory file to Ansible we should be able to perform the same simple task as in the first post:

[ansible@ansiblecontrol ~]$ ansible -i /opt/ansible/development postgresql-servers -a "/bin/echo 11"
192.168.22.172 | SUCCESS | rc=0 >>
11

192.168.22.171 | SUCCESS | rc=0 >>
11

Ok, fine, this still works. When it comes to PostgreSQL one of the first steps when installing from source is to install all the required operating system packages. How can we do that with Ansible?

The initial step is to tell Ansible where to look for our roles. This is done by specifying the “roles_path” configuration parameter in the ansible.cfg configuration file:

[ansible@ansiblecontrol ~]$ cat /etc/ansible/ansible.cfg | grep roles | grep -v "#"
roles_path    = /opt/ansible/roles

From here on we need to set up our role by creating an initial “site.yml” file:

[ansible@ansiblecontrol ansible]$ cat roles/postgresqldbserver/site.yml 
---
# This playbook deploys a single PostgreSQL instance from the source code

- hosts: postgresql-servers
  become: true
  become_user: root

  roles:
    - postgresqldbserver

You can see from the above that the “postgresql-servers” group is referenced. Additionally, notice the “become” and “become_user” flags. As we’re going to use yum to install the packages we need a way to become root on the target systems, and this is how you instruct Ansible to do so.
Time to specify how we want to install the packages. This is quite easy as well:

[ansible@ansiblecontrol ansible]$ cat roles/postgresqldbserver/tasks/main.yml 
---
- name: Install PostgreSQL dependencies
  yum: name={{item}} state=present
  with_items:
   - gcc
   - openldap-devel
   - python-devel
   - readline-devel
   - openssl-devel
   - redhat-lsb
   - bison
   - flex
   - perl-ExtUtils-Embed
   - zlib-devel
   - crypto-utils
   - openssl-devel
   - pam-devel
   - libxml2-devel
   - libxslt-devel
   - tcl
   - tcl-devel
   - openssh-clients
   - bzip2
   - net-tools
   - wget
   - screen
   - ksh

What did we do here? We created our first task. We tell Ansible to use “yum” to install our “items” (which are the packages we want to install). You can check the documentation for more information on the yum module.
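
If you want to see the yum module in isolation first, an ad-hoc call for a single package looks like this (a sketch; -b is required because installing packages needs root, and it assumes the ansible user may sudo on the targets as it does for the playbook run):

[ansible@ansiblecontrol ~]$ ansible -i /opt/ansible/development postgresql-servers -b -m yum -a "name=wget state=present"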

Let’s see if it works and execute our first task on both PostgreSQL nodes:

[ansible@ansiblecontrol ~]$ ansible-playbook -i /opt/ansible/development /opt/ansible/roles/postgresqldbserver/site.yml

PLAY [postgresql-servers] ******************************************************

TASK [setup] *******************************************************************
ok: [192.168.22.171]
ok: [192.168.22.172]

TASK [postgresqldbserver : Install PostgreSQL dependencies] ********************
changed: [192.168.22.172] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])
changed: [192.168.22.171] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])

PLAY RECAP *********************************************************************
192.168.22.171             : ok=2    changed=1    unreachable=0    failed=0   
192.168.22.172             : ok=2    changed=1    unreachable=0    failed=0   

Cool, we just installed all the dependencies on both nodes with one Ansible command. We additionally want an operating system group for our PostgreSQL deployment so we add the following lines to the playbook:

- name: Add PostgreSQL operating system group
  group: name=postgressomegroup state=present

Execute the playbook again:

[ansible@ansiblecontrol ~]$ ansible-playbook -i /opt/ansible/development /opt/ansible/roles/postgresqldbserver/site.yml

PLAY [postgresql-servers] ******************************************************

TASK [setup] *******************************************************************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Install PostgreSQL dependencies] ********************
ok: [192.168.22.172] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])
ok: [192.168.22.171] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])

TASK [postgresqldbserver : Add PostgreSQL operating system group] **************
changed: [192.168.22.171]
changed: [192.168.22.172]

PLAY RECAP *********************************************************************
192.168.22.171             : ok=3    changed=1    unreachable=0    failed=0   
192.168.22.172             : ok=3    changed=1    unreachable=0    failed=0   

We did not change any of the packages but the group was added. Let’s add the PostgreSQL operating system user by adding these lines to the playbook:

- name: Add PostgreSQL operating system user
  user: name=postgres comment="PostgreSQL binaries owner" group=postgres

Execute again:

[ansible@ansiblecontrol ~]$ ansible-playbook -i /opt/ansible/development /opt/ansible/roles/postgresqldbserver/site.yml

PLAY [postgresql-servers] ******************************************************

TASK [setup] *******************************************************************
ok: [192.168.22.172]
ok: [192.168.22.171]

TASK [postgresqldbserver : Install PostgreSQL dependencies] ********************
ok: [192.168.22.171] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])
ok: [192.168.22.172] => (item=[u'gcc', u'openldap-devel', u'python-devel', u'readline-devel', u'openssl-devel', u'redhat-lsb', u'bison', u'flex', u'perl-ExtUtils-Embed', u'zlib-devel', u'crypto-utils', u'openssl-devel', u'pam-devel', u'libxml2-devel', u'libxslt-devel', u'tcl', u'tcl-devel', u'openssh-clients', u'bzip2', u'net-tools', u'wget', u'screen', u'ksh'])

TASK [postgresqldbserver : Add PostgreSQL operating system group] **************
changed: [192.168.22.171]
changed: [192.168.22.172]

TASK [postgresqldbserver : Add PostgreSQL operating system user] ***************
changed: [192.168.22.171]
changed: [192.168.22.172]

PLAY RECAP *********************************************************************
192.168.22.171             : ok=4    changed=2    unreachable=0    failed=0   
192.168.22.172             : ok=4    changed=2    unreachable=0    failed=0   

Really cool and simple. Just to prove it, let's connect to one of the nodes and check if the postgres user really is there:

[root@ansiblepg2 ~] id -a postgres
uid=1001(postgres) gid=1001(postgres) groups=1001(postgres)
[root@ansiblepg2 ~] 

Perfect. In the next post we’ll install the PostgreSQL binaries.

For your reference, this is the playbook as it looks now:

[ansible@ansiblecontrol ansible]$ cat roles/postgresqldbserver/tasks/main.yml
---
- name: Install PostgreSQL dependencies
  yum: name={{item}} state=present
  with_items:
   - gcc
   - openldap-devel
   - python-devel
   - readline-devel
   - openssl-devel
   - redhat-lsb
   - bison
   - flex
   - perl-ExtUtils-Embed
   - zlib-devel
   - crypto-utils
   - openssl-devel
   - pam-devel
   - libxml2-devel
   - libxslt-devel
   - tcl
   - tcl-devel
   - openssh-clients
   - bzip2
   - net-tools
   - wget
   - screen
   - ksh

- name: Add PostgreSQL operating system group
  group: name=postgres state=present

- name: Add PostgreSQL operating system user
  user: name=postgres comment="PostgreSQL binaries owner" group=postgres
 

Cet article Getting started with Ansible – Installing OS packages, creating groups and users est apparu en premier sur Blog dbi services.

Redo log block size on ODA X6 all flash

Fri, 2016-07-22 16:43

On the Oracle Database Appliance, the redo logs are on Flash storage (and with X6 everything is on Flash storage) so you may wonder if we can benefit from 4k redo blocksize. Here are some tests about it on an ODA X6-2M.

I'll compare the same workload (heavy inserts) with 512-byte and 4k block size redo. However, by default we can't create a redo log group with a block size other than 512 bytes:

ORA-01378: The logical block size (4096) of file
/u03/app/oracle/redo/LABDB1/onlinelog/o1_mf_999_%u_.log is not compatible with
the disk sector size (media sector size is 512 and host sector size is 512)

This is because the flash storage is exposed with 512 bytes sector size:

ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 4096 4194304 4894016 4500068 2441888 1023992 0 Y DATA/
MOUNTED NORMAL N 512 4096 4194304 1231176 221172 610468 -199762 0 N RECO/

Then, in order to be able to create new redo log groups with a higher block size, you need to set "_disk_sector_size_override" to TRUE.
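For example, something like this (just a sketch: the group numbers, disk group and size mirror the configuration below, and the hidden parameter takes effect at the next restart when written to the spfile):

alter system set "_disk_sector_size_override"=TRUE scope=spfile;
-- after a restart, redo log groups can be added with a 4k block size:
alter database add logfile group 13 ('+RECO') size 51200M blocksize 4096;
alter database add logfile group 14 ('+RECO') size 51200M blocksize 4096;
alter database add logfile group 15 ('+RECO') size 51200M blocksize 4096;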

I have 3 log groups with 512 bytes block size, and 3 groups with 4k:


LOGFILE
GROUP 10 '+RECO/LABDB1/ONLINELOG/group_10.264.917867333' SIZE 51200M BLOCKSIZE 512,
GROUP 11 '+RECO/LABDB1/ONLINELOG/group_11.265.917867489' SIZE 51200M BLOCKSIZE 512,
GROUP 12 '+RECO/LABDB1/ONLINELOG/group_12.266.917867645' SIZE 51200M BLOCKSIZE 512,
GROUP 13 '+RECO/LABDB1/ONLINELOG/group_13.267.917867795' SIZE 51200M BLOCKSIZE 4096,
GROUP 14 '+RECO/LABDB1/ONLINELOG/group_14.268.917867913' SIZE 51200M BLOCKSIZE 4096,
GROUP 15 '+RECO/LABDB1/ONLINELOG/group_15.269.917868013' SIZE 51200M BLOCKSIZE 4096

In 12c, the database files should be on ACFS and not directly in the disk group. We did this on purpose in order to check whether there is any overhead with ACFS, and we have seen exactly the same performance in both cases. There is something I dislike here however: redo log files are not multiplexed with multiple log members, but rely on the disk group redundancy. I agree with that in ASM, because you are not supposed to manage the files yourself and thus risk deleting one of them. But in ACFS you see only one file, and if you drop it by mistake, both mirrors are lost, along with the latest transactions.

On an insert intensive workload I take AWR snapshots between two log switches:

ODAX6REDO
The switch between blocksize 512 and blocksize 4096 happened at 12:35

Don't be nervous about those orange 'log file sync' waits: we had to run 10,000 transactions per second in order to get some contention here.

We have to go to the details in order to compare, from an AWR Diff report:

Workload Comparison
~~~~~~~~~~~~~~~~~~~ 1st Per Sec 2nd Per Sec %Diff
--------------- --------------- ------
DB time: 37.9 37.3 -1.4
CPU time: 19.0 24.4 28.4
Background CPU time: 0.8 1.0 23.2
Redo size (bytes): 61,829,138.5 76,420,493.9 23.6
Logical read (blocks): 1,181,178.7 1,458,915.9 23.5
Block changes: 360,883.0 445,770.8 23.5
Physical read (blocks): 0.4 1.1 164.3
Physical write (blocks): 14,451.2 16,092.4 11.4
Read IO requests: 0.4 1.1 164.3
Write IO requests: 9,829.4 10,352.3 5.3
Read IO (MB): 0.0 0.0 100.0
Write IO (MB): 112.9 125.7 11.4
IM scan rows: 0.0 0.0 0.0
Session Logical Read IM:
User calls: 8,376.0 10,341.2 23.5
Parses (SQL): 5,056.0 6,247.8 23.6
Hard parses (SQL): 0.0 0.0 0.0
SQL Work Area (MB): 3.1 3.2 3.5
Logons: 0.4 0.3 -37.2
Executes (SQL): 225,554.2 278,329.3 23.4
Transactions: 10,911.0 13,486.4 23.6

The second workload, when redo blocksize was 4k, was able to handle 23% more activity.

‘log file sync’ average time is 1.3 milliseconds instead of 2.4:

Top Timed Events First DB/Inst: LABDB1/labdb1 Snaps: 155-156 (Elapsed time: 301.556 sec DB time: 11417.12 sec), Second DB/Inst: LABDB1/labdb1 Snaps: 157-158 (Elapsed time: 301.927 sec DB time: 11269.1 sec)
-> Events with a "-" did not make the Top list in this set of snapshots, but are displayed for comparison purposes
 
1st 2nd
------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------
Event Wait Class Waits Time(s) Avg Time(ms) %DB time Event Wait Class Waits Time(s) Avg Time(ms) %DB time
------------------------------ ------------- ------------ ------------ ------------- ----------- ------------------------------ ------------- ------------ ------------ ------------- -----------
CPU time N/A 5,722.8 N/A 50.1 CPU time N/A 7,358.4 N/A 65.3
log file sync Commit 2,288,655 5,412.1 2.4 47.4 log file sync Commit 2,808,036 3,535.5 1.3 31.4
target log write size Other 363,206 283.7 0.8 2.5 target log write size Other 644,287 278.2 0.4 2.5
log file parallel write System I/O 368,063 225.1 0.6 2.0 enq: TX - row lock contention Application 171,485 170.2 1.0 1.5
db file parallel write System I/O 12,399 160.2 12.9 1.4 db file parallel write System I/O 12,131 150.4 12.4 1.3
enq: TX - row lock contention Application 144,822 133.2 0.9 1.2 log file parallel write System I/O 649,501 148.1 0.2 1.3
library cache: mutex X Concurrency 130,800 120.8 0.9 1.1 library cache: mutex X Concurrency 86,632 128.1 1.5 1.1
log file sequential read System I/O 7,433 27.5 3.7 0.2 LGWR wait for redo copy Other 478,350 45.1 0.1 0.4
LGWR wait for redo copy Other 228,643 20.8 0.1 0.2 log file sequential read System I/O 6,577 21.7 3.3 0.2
buffer busy waits Concurrency 261,348 15.8 0.1 0.1 buffer busy waits Concurrency 295,880 20.1 0.1 0.2
--------------------------------------------------------------------------------------------------------------------

We see that this difference comes from lower latency in ‘log file parallel write':

Wait Events First DB/Inst: LABDB1/labdb1 Snaps: 155-156 (Elapsed time: 301.556 sec DB time: 11417.12 sec), Second DB/Inst: LABDB1/labdb1 Snaps: 157-158 (Elapsed time: 301.927 sec DB time: 11269.1 sec)
-> Ordered by absolute value of 'Diff' column of '% of DB time' descending (idle events last)
 
# Waits/sec (Elapsed Time) Total Wait Time (sec) Avg Wait Time (ms)
---------------------------------------- ---------------------------------------- -------------------------------------------
Event Wait Class 1st 2nd %Diff 1st 2nd %Diff 1st 2nd %Diff
------------------------------ ------------- -------------- -------------- ---------- -------------- -------------- ---------- --------------- --------------- -----------
log file sync Commit 7,589.5 9,300.4 22.5 5,412.1 3,535.5 -34.7 2.36 1.26 -46.61
log file parallel write System I/O 1,220.5 2,151.2 76.2 225.1 148.1 -34.2 0.61 0.23 -62.30
enq: TX - row lock contention Application 480.2 568.0 18.3 133.2 170.2 27.8 0.92 0.99 7.61
LGWR wait for redo copy Other 758.2 1,584.3 109.0 20.8 45.1 117.1 0.09 0.09 0.00
library cache: mutex X Concurrency 433.8 286.9 -33.8 120.8 128.1 6.0 0.92 1.48 60.87
db file parallel write System I/O 41.1 40.2 -2.3 160.2 150.4 -6.2 12.92 12.40 -4.02
cursor: pin S Concurrency 29.7 46.0 55.0 9.9 16.6 67.0 1.11 1.19 7.21
cursor: mutex X Concurrency 7.0 10.8 54.2 13.6 19.7 45.0 6.39 6.01 -5.95
latch: In memory undo latch Concurrency 585.3 749.0 28.0 10.8 16.3 50.8 0.06 0.07 16.67

In order to go into details, here is the wait event histogram for 512 bytes redo blocksize:

% of Waits
-----------------------------------------------
Total
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms 1s
------------------------- ------ ----- ----- ----- ----- ----- ----- ----- -----
LGWR all worker groups 41 48.8 12.2 14.6 14.6 4.9 2.4 2.4
LGWR any worker group 259 6.2 5.4 8.9 13.9 18.1 18.1 29.3
LGWR wait for redo copy 228.9K 99.1 .9 .0
LGWR worker group orderin 442 44.6 9.7 4.5 5.0 9.3 10.6 16.3
log file parallel write 368.5K 85.3 7.5 4.7 1.4 .9 .2 .0
log file sequential read 7358 6.5 13.1 59.0 17.2 3.0 1.1 .2
log file sync 2.3M 48.9 23.1 17.0 5.7 2.7 2.3 .3

and for 4096 bytes blocksize:

% of Waits
-----------------------------------------------
Total
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms 1s
------------------------- ------ ----- ----- ----- ----- ----- ----- ----- -----
LGWR all worker groups 20 45.0 5.0 15.0 10.0 5.0 20.0
LGWR any worker group 235 7.2 3.0 5.5 7.7 14.5 25.1 37.0
LGWR wait for redo copy 478.7K 98.9 1.0 .1 .0
LGWR worker group orderin 517 51.3 9.7 2.3 2.9 7.2 11.6 15.1
log file parallel write 649.9K 97.7 1.3 .3 .3 .4 .0 .0
log file sequential read 6464 5.7 8.2 73.5 11.0 1.2 .3 .1
log file sync 2.8M 78.2 15.6 2.3 .8 1.6 1.2 .

A few milliseconds are not perceived by the end-user at commit, except if the application design is so bad that hundreds of commits are done for each user interaction. Even if both are really fast, the log writer was above 1ms for only 1% of writes with the 4k blocksize vs. 15% with the default blocksize.

This faster latency is measured by I/O statistics as well:

Reads: Reqs Data Writes: Reqs Data Waits: Avg
Function Name Data per sec per sec Data per sec per sec Count Tm(ms)
--------------- ------- ------- ------- ------- ------- ------- ------- -------

BLOCKSIZE 512:
LGWR 0M 0.0 0M 18.1G 2420.4 61.528M 368.9K 0.6
BLOCKSIZE 4096:
LGWR 0M 0.0 0M 24.1G 4263.5 81.689M 649.5K 0.2

To be comprehensive, here are the statistics related with redo, thanks to those many statistics available in 12c:

Value per Second (Elapsed Time)
------------------------------------------- ---------------------------------------
Statistic 1st 2nd %Diff 1st 2nd %Diff
------------------------------ ---------------- ---------------- --------- -------------- -------------- ---------
 
redo KB read 16,319,609 15,783,576 -3.3 54,118.0 52,276.1 -3.4
redo blocks checksummed by FG 26,587,090 1,000,267 -96.2 88,166.3 3,312.9 -96.2
redo blocks written 37,974,499 6,318,372 -83.4 125,928.5 20,926.8 -83.4
redo blocks written (group 0) 37,256,502 6,257,861 -83.2 123,547.5 20,726.4 -83.2
redo blocks written (group 1) 717,997 60,511 -91.6 2,381.0 200.4 -91.6
redo entries 24,023,503 30,214,386 25.8 79,665.1 100,071.8 25.6
redo size 18,644,947,688 23,073,410,468 23.8 61,829,138.5 76,420,493.9 23.6
redo synch long waits 343 4,890 1,325.7 1.1 16.2 1,321.1
redo synch time 541,804 354,625 -34.5 1,796.7 1,174.5 -34.6
redo synch time (usec) 5,418,056,862 3,546,209,390 -34.5 17,967,000.7 11,745,254.3 -34.6
redo synch time overhead (usec) 145,664,759 197,925,281 35.9 483,043.8 655,540.2 35.7
redo synch time overhead count ( 2ms) 2,295,847 2,821,726 22.9 7,613.3 9,345.7 22.8
redo synch time overhead count ( 8ms) 443 3,704 736.1 1.5 12.3 734.7
redo synch time overhead count ( 32ms) 2 9 350.0 0.0 0.0 200.0
redo synch writes 2,305,502 2,849,645 23.6 7,645.4 9,438.2 23.5
redo wastage 179,073,264 2,703,864,280 1,409.9 593,830.9 8,955,357.7 1,408.1
redo write finish time 291,094,266 277,884,591 -4.5 965,307.5 920,370.1 -4.7
redo write gather time 63,237,013 125,066,420 97.8 209,702.4 414,227.3 97.5
redo write info find 2,296,292 2,825,439 23.0 7,614.8 9,358.0 22.9
redo write schedule time 63,679,682 125,819,850 97.6 211,170.3 416,722.8 97.3
redo write size count ( 4KB) 12,220 0 40.5 0
redo write size count ( 8KB) 26,420 2,246 -91.5 87.6 7.4 -91.5
redo write size count ( 16KB) 69,674 94,557 35.7 231.0 313.2 35.5
redo write size count ( 32KB) 108,676 268,794 147.3 360.4 890.3 147.0
redo write size count ( 128KB) 106,651 253,669 137.8 353.7 840.2 137.6
redo write size count ( 256KB) 37,332 28,076 -24.8 123.8 93.0 -24.9
redo write size count ( 512KB) 7,328 2,382 -67.5 24.3 7.9 -67.5
redo write size count (1024KB) 28 28 0.0 0.1 0.1 0.0
redo write time 29,126 27,817 -4.5 96.6 92.1 -4.6
redo write time (usec) 291,261,420 278,162,437 -4.5 965,861.8 921,290.4 -4.6
redo write total time 306,213,383 298,786,696 -2.4 1,015,444.5 989,599.1 -2.5
redo write worker delay (usec) 38,246,633 73,452,748 92.1 126,830.9 243,279.8 91.8
redo writes 368,330 649,751 76.4 1,221.4 2,152.0 76.2
redo writes (group 0) 366,492 648,430 76.9 1,215.3 2,147.6 76.7
redo writes (group 1) 1,838 1,321 -28.1 6.1 4.4 -28.2
redo writes adaptive all 368,330 649,752 76.4 1,221.4 2,152.0 76.2
redo writes adaptive worker 368,330 649,752 76.4 1,221.4 2,152.0 76.2

I've added a few things that were masked by the AWR Diff Report. The count of writes smaller than 4k is zero in the second snapshot because that is now the blocksize.
It's interesting to see that the redo size is higher, and this is because you write 4k even when you have less than that to write. This is measured by 'redo wastage'.
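If you want to check this on your own system, the cumulative figures are visible in V$SYSSTAT (a quick look, no AWR needed):

select name, value
from v$sysstat
where name in ('redo size','redo wastage','redo blocks written','redo writes');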

So, larger blocksize lowers the latency but increases the volume. Here, where NVMe optimizes the bandwidth to Flash storage, it may not be a problem.

So what?

You have to keep in mind that this workload, with a lot of small transactions and no other waits, is a special workload for this test. If you are not in this extreme case, then the default block size is probably sufficient for latency and reduces the redo size. However, if log file sync latency is your bottleneck, you may consider increasing the blocksize.

Thanks to

Arrow Electronics, Oracle Authorized Solution Center, Switzerland.
Those tests were done on ODA X6-2M at Arrow OASC. Arrow has a wide range of Engineered Systems available for Oracle partners, like dbi services, and for customers to do Proof of Concepts, demos, learning, benchmarks, etc.

ODAArrow

 

Cet article Redo log block size on ODA X6 all flash est apparu en premier sur Blog dbi services.

ODA X-6 log file sync with NVMe flash

Thu, 2016-07-21 12:04

The new ODA X-6 was announced last month with two smaller configurations and smaller prices: 2S and 2M. Small, but high-performance configurations: all SSD, and I/O transfer optimized for Flash with a PCIe bus and the NVMe protocol. Let's see how it keeps up with a high OLTP workload.

Just in case you wonder if Flash is good for writes, and especially for redo, we have run a workload that mainly inserts rows and commits, from 10 sessions without think time.

log file sync

The following picture shows a workload that is all in CPU: OLTP, mainly inserts.
CaptureX62M003
All in CPU and some waits on ‘log file sync’, the commit wait event. At first sight, it seems that those waits are high (30%) and irregular (peaks in transaction rate). But beyond the colors, let’s check the numbers:


Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 9.5 0.0 0.00 0.00
DB CPU(s): 6.7 0.0 0.00 0.00
Background CPU(s): 0.4 0.0 0.00 0.00
Redo size (bytes): 28,534,906.0 5,647.3
Logical read (blocks): 546,914.9 108.2
Block changes: 167,747.0 33.2
Physical read (blocks): 15.2 0.0
Physical write (blocks): 4,807.8 1.0
Read IO requests: 8.7 0.0
Write IO requests: 2,581.3 0.5
Read IO (MB): 0.1 0.0
Write IO (MB): 37.6 0.0
IM scan rows: 0.0 0.0
Session Logical Read IM:
User calls: 3,881.8 0.8
...
Executes (SQL): 104,409.9 20.7
Rollbacks: 8.5 0.0
Transactions: 5,052.9

5000 transactions per second and nearly 30MB/s of redo, that's not bad. So are those waits a problem?

Wait event histograms


Wait Event Histogram DB/Inst: LABDB1/labdb1 Snaps: 75-76
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> % of Waits: value of .0 indicates value was <.05%; value of null is truly 0
-> % of Waits: column heading of <=1s is truly <1024ms, >1s is truly >=1024ms
-> Ordered by Event (idle events last)
% of Waits
-----------------------------------------------
Total
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms 1s
------------------------- ------ ----- ----- ----- ----- ----- ----- ----- -----
log file parallel write 4.2M 96.3 1.6 1.5 .3 .2 .0 .0
log file sequential read 25.8K 22.2 22.5 38.2 14.0 2.3 .6 .1
log file single write 22 86.4 9.1 4.5
log file switch (private 73 4.1 75.3 17.8 2.7
log file switch completio 20 100.0
log file sync 3.5M 77.5 10.6 7.8 2.1 1.2 .7 .2

Log file parallel writes all show latencies lower than what you can get from spinning disks, and 96% of them are below one millisecond.
Log file sync, the only wait where a user may actually wait on I/O, is mostly less than 8 milliseconds. Actually, the average is:


Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
Total Wait Wait % DB Wait
Event Waits Time (sec) Avg(ms) time Class
------------------------------ ----------- ---------- ---------- ------ --------
DB CPU 9082.7 71.1
log file sync 3,472,568 3487.5 1.00 27.3 Commit
enq: HW - contention 3,169 195 61.55 1.5 Configur

1 millisecond. Even if an application with a very bad design does a hundred commits per user interaction, the user will not notice it.

This is done on a database created as-is with the ODA provisioning interface. Files are on ACFS. ODA is bare metal (no virtualization for 2S and 2M). Redo logs have 512 bytes block size.
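If you want to verify the redo block size on your own database, it is exposed in V$LOG (a simple check):

select group#, thread#, blocksize, bytes/1024/1024 as size_mb, status from v$log;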

So what?

This is just a first quick test and it looks promising. This Oracle Database Appliance X6-2M is sold at 24,000 dollars. And the X6-2S at 18,000 dollars has exactly the same storage but only one socket. This is a great opportunity for small customers with few Oracle databases in Enterprise Edition or Standard Edition.

 

Cet article ODA X-6 log file sync with NVMe flash est apparu en premier sur Blog dbi services.

Getting Ansible up and running on a SLES 12 SP1 host

Wed, 2016-07-20 06:38

Recently at a customer we had to setup Ansible on a SLES 12 SP1 host. In comparison to RedHat or Ubuntu based distributions there are some more steps to do as Ansible is not included in the standard SLES repositories. Here we go …

Start with the user, group, directories and permissions:

host:~ # groupadd ansible
host:~ # useradd -g ansible -m ansible
host:~ # passwd ansible
host:~ # mkdir /opt/ansible
host:~ # chown ansible:ansible /opt/ansible
host:~ # mkdir /etc/ansible
host:~ # chown ansible:ansible /etc/ansible
host:~ # touch /etc/ansible/hosts
host:~ # chown ansible:ansible /etc/ansible/hosts
host:~ # chmod 600 /etc/ansible/hosts

In our case we wanted to install Ansible directly from the git repository on github so we needed to install git:

host:~ # zypper install git

As pip was not available in the OS repositories, we had to install it manually:

host:~ # cd /var/tmp
host:~ # export http_proxy=proxy:port    # if a proxy is used
host:~ # export https_proxy=proxy:port   # if a proxy is used
host:~ # wget https://bootstrap.pypa.io/get-pip.py
host:~ # python get-pip.py
host:~ # pip install paramiko PyYAML Jinja2 httplib2 six

Installing Ansible from here on is straight forward:

host:/var/tmp # su - ansible
ansible@host:~> cd /opt/ansible/
ansible@host:/opt/ansible> export http_proxy=webproxy.amag.car.web:8080
ansible@host:/opt/ansible> export https_proxy=webproxy.amag.car.web:8080
ansible@host:/opt/ansible> git config --global http.proxy $http_proxy
ansible@host:/opt/ansible> git config --global https.proxy $http_proxy
ansible@host:/opt/ansible>git clone https://github.com/ansible/ansible.git --recursive
ansible@host:/opt/ansible/ansible> git pull --rebase
Current branch devel is up to date.
ansible@host:/opt/ansible/ansible> git submodule update --init --recursive

Ansible provides a script which makes it easy to setup the environment:

ansible@host:/opt/ansible/ansible>cd ansible
ansible@host:/opt/ansible/ansible>. hacking/env-setup

This does the initial setup required for running Ansible out of a git repository and sets all the environment variables. It is a good idea to do this automatically when you log in as the ansible user:

ansible@host:/opt/ansible/ansible> echo ". /opt/ansible/ansible/hacking/env-setup -q" >> ~/.bash_profile

Now you can execute ansible without adjusting your environment or switching to the installation directory:

ansible@host:/opt/ansible/ansible> which ansible
/opt/ansible/ansible/bin/ansible
ansible@host:/opt/ansible/ansible> ansible --version
ansible 2.2.0 (devel 4cc4dc6793) last updated 2016/07/19 13:51:23 (GMT +200)
  lib/ansible/modules/core: (detached HEAD 7de287237f) last updated 2016/07/19 13:51:40 (GMT +200)
  lib/ansible/modules/extras: (detached HEAD 68ca157f3b) last updated 2016/07/19 13:51:53 (GMT +200)
  config file =

To test we can add the localhost to the Ansible inventory:

ansible@host:/opt/ansible/ansible> echo "127.0.0.1" >> /etc/ansible/hosts

… and then exchange the public ssh key of the ansible user:

ansible@host:~> ssh-keygen -t dsa
ansible@host:~> ssh-copy-id -i ~/.ssh/id_dsa.pub ansible@127.0.0.1

Do a functional test:

ansible@s1100tap460:~>  ansible all -m ping
127.0.0.1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

… and you're done. Happy deploying …

 

Cet article Getting Ansible up and running on a SLES 12 SP1 host est apparu en premier sur Blog dbi services.

SQL Server AlwaysOn: new services packs and new diagnostic capabilities

Mon, 2016-07-18 13:24

As you certainly know, SQL Server 2014 SP2 has been released by Microsoft with some interesting improvements that concern the SQL Server AlwaysOn and availability groups feature. In fact, all of these improvements are also included in SQL Server 2012 SP3 and SQL Server 2016. Among all the fixes and improvements that concern AlwaysOn, I would like to focus on those described in Microsoft KB3173156 and KB3112363. In this first blog post, I will just cover the improvement about the lease timeout, which is part of the AlwaysOn health model.

Did you already face a lease timeout issue? If yes, you have certainly noticed that it is a good indicator of a system-wide problem, and figuring out the root cause could be a burdensome task because we missed diagnostic information and had to correlate different performance metrics as well. Fortunately, the release of the new service packs provides enhancements in this area.

Let's take an example with a 100% CPU utilization scenario that makes the primary replica unresponsive and unable to respond to the cluster isAlive() routine. This is typically a situation where we may face a lease timeout issue. After simulating this scenario on my lab environment, here is what I found in the SQL Server error log of my primary replica (I have voluntarily filtered it to include only the sample we want to focus on).

blog 101 - AG 2014 SP2 - lease timeout issue

Firstly, we may see different new messages related to lease timeout issues in the interval 12:39:54 – 12:43:22. For example, the WSFC did not receive a process event signal from SQL Server within the lease timeout period, or the lease between the AG and the WSFC has expired. Diagnostic messages have been enhanced to give us a better understanding of the lease issue. But at this point we know we are facing a lease timeout without knowing the root cause yet. Improvements have also been extended to the cluster log in order to provide more insights into the system behavior at the moment of the lease timeout issue, as we may see below:

00000644.00000768::2016/07/15-12:40:06.575 ERR   [RCM] rcm::RcmResource::HandleFailure: (TestGrp)

00000644.00000c84::2016/07/15-12:40:06.768 INFO [GEM] Node 2: Sending 1 messages as a batched GEM message

00000644.00000768::2016/07/15-12:40:06.768 INFO [RCM] resource TestGrp: failure count: 0, restartAction: 0 persistentState: 1.

00000644.00000768::2016/07/15-12:40:06.768 INFO [RCM] numDependents is zero, auto-returning true

00000644.00000768::2016/07/15-12:40:06.768 INFO [RCM] Will queue immediate restart (500 milliseconds) of TestGrp after terminate is complete.

00000644.00000768::2016/07/15-12:40:06.768 INFO [RCM] Res TestGrp: ProcessingFailure -> WaitingToTerminate( DelayRestartingResource )

00000644.00000768::2016/07/15-12:40:06.768 INFO [RCM] TransitionToState(TestGrp) ProcessingFailure–>[WaitingToTerminate to DelayRestartingResource].

00000644.00000768::2016/07/15-12:40:06.768 INFO [RCM] Res TestGrp: [WaitingToTerminate to DelayRestartingResource] -> Terminating( DelayRestartingResource )

00000644.00000768::2016/07/15-12:40:06.768 INFO [RCM] TransitionToState(TestGrp) [WaitingToTerminate to DelayRestartingResource]–>[Terminating to DelayRestartingResource].

00000cc0.00001350::2016/07/15-12:40:12.452 WARN [RES] SQL Server Availability Group: [hadrag] Lease timeout detected, logging perf counter data collected so far

00000cc0.00001350::2016/07/15-12:40:12.452 WARN [RES] SQL Server Availability Group: [hadrag] Date/Time, Processor time(%), Available memory(bytes), Avg disk read(secs), Avg disk write(secs)

00000cc0.00001350::2016/07/15-12:40:12.452 WARN [RES] SQL Server Availability Group: [hadrag] 7/15/2016 10:39:24.0, 8.866394, 912523264.000000, 0.000450, 0.000904

00000cc0.00001350::2016/07/15-12:40:12.452 WARN [RES] SQL Server Availability Group: [hadrag] 7/15/2016 10:39:34.0, 25.287347, 919531520.000000, 0.001000, 0.000594

00000cc0.00001350::2016/07/15-12:40:12.452 WARN [RES] SQL Server Availability Group: [hadrag] 7/15/2016 10:39:44.0, 25.360508, 921534464.000000, 0.000000, 0.001408

00000cc0.00001350::2016/07/15-12:40:12.452 WARN [RES] SQL Server Availability Group: [hadrag] 7/15/2016 10:39:55.0, 81.225454, 921903104.000000, 0.000513, 0.000640

00000cc0.00001350::2016/07/15-12:40:12.452 WARN [RES] SQL Server Availability Group: [hadrag] 7/15/2016 10:40:5.0, 100.000000, 922415104.000000, 0.002800, 0.002619

00000cc0.00001350::2016/07/15-12:40:12.452 INFO [RES] SQL Server Availability Group: [hadrag] Stopping Health Worker Thread

 

According to the SQL Server error log time range, we may notice similar messages that concern the detection of the lease timeout, with some additional information that comes from the perfmon counters (the concerned lines are underlined in the sample above). If we reformat the concerned portion into the table below, we get a better identification of our issue:

Date/Time     Processor time(%)   Available memory(bytes)   Avg disk read(secs)   Avg disk write(secs)
10:39:24.0    8.866394            912523264                 0.000450              0.000904
10:39:34.0    25.287347           919531520                 0.001000              0.000594
10:39:44.0    25.360508           921534464                 0.000000              0.001408
10:39:55.0    81.225454           921903104                 0.000513              0.000640
10:40:05.0    100.000000          922415104                 0.002800              0.002619

 

CPU utilization is what we must focus on here. So getting this valuable information directly in the cluster.log when we troubleshoot a lease timeout issue will help us a lot. Just to clarify, this doesn't mean that it was not possible with older versions, but we had to retrieve this information in a more complicated way (by using the AlwaysOn_health extended event session, for example).
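For reference, here is a rough way to pull those events out of the AlwaysOn_health session files with T-SQL (assuming the session is enabled and writes .xel files with the default naming in the instance log directory):

SELECT object_name,
       CAST(event_data AS XML) AS event_data,
       file_name
FROM sys.fn_xe_file_target_read_file('AlwaysOn_health*.xel', NULL, NULL, NULL)
WHERE object_name IN ('availability_group_lease_expired', 'hadr_ag_lease_renewal');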

Next, other improvements concern existing extended events like availability_group_lease_expired and hadr_ag_lease_renewal. The next picture points out new available fields like current_time, new_timeout and state as well.

blog 101 - AG 2014 SP2 - lease time out xe new fields
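Those payload columns can also be listed without the UI, from the extended events metadata (a simple sketch):

SELECT object_name, name, type_name, description
FROM sys.dm_xe_object_columns
WHERE object_name IN ('availability_group_lease_expired', 'hadr_ag_lease_renewal')
      AND column_type = 'data';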

Let me show you their interest with another example. This time, I voluntary hang my sqlserver.exe process related to the primary replica in order to trigger an unresponsive lease scenario. I got interesting outputs from the extended event trace on both sides.

blog 101 - AG 2014 SP2 - lease time out xe test 2

On the former primary, there are no related records during the period when the SQL Server process was unresponsive, but we may see a record at 17:19:11. The lease renewal process fails and we get a better picture of the problem by looking at the corresponding state (LeaseNotValid), followed by the availability_group_lease_expired event. Note that the current_time value (time at which the lease expired) is greater than the new_timeout value (timeout time, after which availability_group_lease_expired is raised) here – 3215765 > 3064484 – which confirms that we experienced a timeout issue in this case.

On the new primary, we may notice the start of the lease worker thread, but until the concerned replica stabilizes in the PRIMARY ONLINE state, it voluntarily postpones the lease check process (materialized by the StartedExcessLeaseSleep / ExcessSleepSucceeded state values).

In the next blog I will talk about improvements in the detection of the availability group replication latency.

Stay tuned!

 

 

Cet article SQL Server AlwaysOn: new services packs and new diagnostic capabilities est apparu en premier sur Blog dbi services.

AMM and ASMM derived parameters

Sat, 2016-07-16 15:24

The latest DBA Essentials Workshop training I’ve given raised a question about PGA_AGGREGATE_LIMIT. The default depends on PGA_AGGREGATE_TARGET. So how is it calculated in AMM where PGA_AGGREGATE_TARGET is dynamic? Is it also dynamic or is it determined by the value at instance startup only?

The PGA_AGGREGATE_LIMIT default value is documented. I’ll use the following query to display the values of the concerned parameters:

select
dbms_stats_internal.get_parameter_val('pga_aggregate_limit')/1024/1024/1024 "pga_aggregate_limit",
2 "2GB",
3*dbms_stats_internal.get_parameter_val('processes')/1024 "3MB*processes",
2*dbms_stats_internal.get_parameter_val('__pga_aggregate_target')/1024/1024/1024 "2*__pga_aggregate_target",
dbms_stats_internal.get_parameter_val('__sga_target')/1024/1024/1024 "__sga_target",
dbms_stats_internal.get_parameter_val('__pga_aggregate_target')/1024/1024/1024 "__pga_aggregate_target"
from dual
/

and I start with the following:

pga_aggregate_limit 2GB 3MB*processes 2*__pga_aggregate_target __sga_target __pga_aggregate_target
------------------- ---------- ------------- ------------------------ ------------ ----------------------
2.40625 2 .87890625 2.40625 1.796875 1.203125

I’m in AMM with only MEMORY_TARGET set to 3G. The dynamic SGA is at 1.8G and the PGA at 1.2G. The PGA_AGGREGATE_LIMIT is at 200% of the PGA which is 2.4G

I increase the SGA in order to see a resize of the PGA


SQL> alter system set sga_target=2500M;
System altered.

The PGA is now about 500M in order to release some space for SGA:

pga_aggregate_limit 2GB 3MB*processes 2*__pga_aggregate_target __sga_target __pga_aggregate_target
------------------- ---------- ------------- ------------------------ ------------ ----------------------
2.40625 2 .87890625 1.09375 1.796875 .546875

However, PGA_AGGREGATE_LIMIT did not change. The formula is not dynamic. The value that was calculated at startup remains.

spfile

When dynamic components are resized, the values are written into the spfile with double underscore parameters, so that a restart of the instance starts with same value:
SQL> host strings /u01/app/oracle/product/12.1.0/dbhome_1/dbs/spfileCDB.ora | grep target
CDB.__pga_aggregate_target=587202560
CDB.__sga_target=1929379840
*.memory_max_target=5G
*.memory_target=3G
.sga_target=2634022912

So let’s restart and see what happens to PGA_AGGREGATE_LIMIT (which has no double underscore entry in spfile)

SQL> startup force
ORACLE instance started.
 
Total System Global Area 5368709120 bytes
Fixed Size 2935712 bytes
Variable Size 3976201312 bytes
Database Buffers 721420288 bytes
Redo Buffers 13840384 bytes
In-Memory Area 654311424 bytes
Database mounted.
Database opened.
 
SQL> select
2 dbms_stats_internal.get_parameter_val('pga_aggregate_limit')/1024/1024/1024 "pga_aggregate_limit",
3 2 "2GB",
4 3*dbms_stats_internal.get_parameter_val('processes')/1024 "3MB*processes",
5 2*dbms_stats_internal.get_parameter_val('__pga_aggregate_target')/1024/1024/1024 "2*__pga_aggregate_target",
6 dbms_stats_internal.get_parameter_val('__sga_target')/1024/1024/1024 "__sga_target",
7 dbms_stats_internal.get_parameter_val('__pga_aggregate_target')/1024/1024/1024 "__pga_aggregate_target"
8 from dual
9 /
 
pga_aggregate_limit 2GB 3MB*processes 2*__pga_aggregate_target __sga_target __pga_aggregate_target
------------------- ---------- ------------- ------------------------ ------------ ----------------------
2 2 .87890625 1.09375 2.453125 .546875

The good thing is that the value is calculated from the actual values. Here 200% of PGA is smaller than 2G so 2G is used.

The bad thing is that a restart of the instance may bring a different behavior than before the restart.

So what?

This instability is easy to solve: don’t use AMM. SGA and PGA are different things and you should size them separately.
But the problem is wider: there are other parameters that can show the same behavior. For example, the default db_file_multiblock_read_count can be limited by processes x __db_block_buffers.
You may have to change some values either manually or automatically at the start of a new application because you don’t know which is the best setting. But once the application is more stable, you should stabilize the dynamic sizing by setting minimum values.
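For example, to leave AMM and size SGA and PGA explicitly, something like the following could be used (the values are purely illustrative and the change takes effect at the next restart):

alter system set memory_target=0 scope=spfile;
alter system reset memory_max_target scope=spfile;
alter system set sga_target=2500M scope=spfile;
alter system set pga_aggregate_target=1200M scope=spfile;
-- restart the instance for the spfile changes to take effect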

 

Cet article AMM and ASMM derived parameters est apparu en premier sur Blog dbi services.

SQL Server 2016: New SQL PowerShell CMDLETs for ErrorLog

Thu, 2016-07-14 00:54

With the latest release of SQL Server Management Studio (SSMS) 2016 (13.0.15500.91), downloadable here, new CMDLETs were introduced for Always Encrypted, SQL Agent and the ErrorLog.
SSMS_update01

In this article, I will present the 2 new CMDLETs for the Error Logs:

  • Get-SqlErrorLog: Retrieves the SQL Server Logs.
  • Set-SqlErrorLog: Sets or resets the maximum number of error log files before recycling.

My first step is to search all commands with “Sql”:

Get-Command | Select Name | Where-Object {$_.Name -like "*Sql*"}

PowerShell_ErrorLog_01

As you can see, I have a lot of commands. I filter with SqlErrorLog and have the detail of both commands:

Get-Command | Where-Object {$_.Name -like "*SqlErrorLog*"} | Format-List *

PowerShell_ErrorLog_02

To have the detail per command, I use these commands:

Get-Command | Where-Object {$_.Name -eq "Get-SqlErrorLog"} | Format-List *
Get-Command | Where-Object {$_.Name -eq "Set-SqlErrorLog"} | Format-List *

 

CMDLET Get-SqlErrorLog

For example, a simple query to retrieve all backup-related lines:

Get-SqlErrorLog | Where-Object { $_.text -like "*BACKUP*"} | Out-GridView

PowerShell_ErrorLog_03
You can do the same for the failed logins:

Get-SqlErrorLog | Where-Object { $_.text -like "*Failed*"} | Out-GridView

PowerShell_ErrorLog_04

Or directly find all errors between 2 dates with –Before and –After parameters:

Get-SqlErrorLog -Before "2016/06/30" -After "2016/06/28" | Where-Object { $_.text -like "*Error:*"} | Out-GridView

PowerShell_ErrorLog_05

CMDLET Set-SqlErrorLog

It is very easy to configure the number of errorlog files with this command:

Set-SqlErrorLog -MaxLogCount [6-99]

PowerShell_ErrorLog_06

After the command:
PowerShell_ErrorLog_07

Just for fun, I tried to enter a value of 1 and a value of 100 to see if an error message appears:
PowerShell_ErrorLog_08

It is very nice to have these 2 new CMDLETs in SQL PowerShell ;-)

 

Cet article SQL Server 2016: New SQL PowerShell CMDLETs for ErrorLog est apparu en premier sur Blog dbi services.

Oracle Public Cloud patch conflict

Wed, 2016-07-13 13:20

This morning I wanted to test a patch (18633374) in the Oracle Cloud Service. The DBaaS was created as an 'Enterprise Edition Extreme Performance', which comes with all options, including the multitenant option. I applied my patch. My test required the creation of a new tablespace, but it failed with: ORA-65010: maximum number of pluggable databases created

This is the kind of message we get when we try to use a feature that is not allowed in Standard Edition. But I was in Enterprise Edition here:
The first thing I did was to tweet a screenshot, in case someone had already encountered the issue:

This is not exactly my definition of 'EE Extreme Perf'. WTF did I do? pic.twitter.com/y34JoX9mY3

— Franck Pachot (@FranckPachot) July 13, 2016

The second thing was to try to reproduce the issue, because it's a test environment where I did things quickly and I don't remember everything that was done.
I create a new service in EE Extreme Performance:

Connected to:
Oracle Database 12c EE Extreme Perf Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Advanced Analytics
and Real Application Testing options

check that I can create additional pluggable databases

SQL> show pdbs
 
CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 PDB READ WRITE NO
4 DEMOS READ WRITE NO
SQL> create pluggable database PDBNEW admin user admin identified by admin;
Pluggable database created.
SQL> create pluggable database PDBNEW1 admin user admin identified by admin;
Pluggable database created.

I tried to do the same as I did (apply patch 18633374)

[oracle@CDBA 18633374]$ dbshut $ORACLE_HOME
Processing Database instance "CDB": log file /u01/app/oracle/product/12.1.0/dbhome_1/shutdown.log
$ORACLE_HOME/OPatch/opatch apply
[oracle@CDBA 18633374]$ $ORACLE_HOME/OPatch/opatch apply
Oracle Interim Patch Installer version 12.1.0.1.10
Copyright (c) 2016, Oracle Corporation. All rights reserved.
 
Oracle Home : /u01/app/oracle/product/12.1.0/dbhome_1
Central Inventory : /u01/app/oraInventory
from : /u01/app/oracle/product/12.1.0/dbhome_1/oraInst.loc
OPatch version : 12.1.0.1.10
OUI version : 12.1.0.2.0
Log file location : /u01/app/oracle/product/12.1.0/dbhome_1/cfgtoollogs/opatch/18633374_Jul_13_2016_11_23_28/apply2016-07-13_11-23-28AM_1.log
 
Verifying environment and performing prerequisite checks...
 
Conflicts/Supersets for each patch are:
 
Patch : 18633374
 
Conflict with 23192060
Conflict details:
/u01/app/oracle/product/12.1.0/dbhome_1/lib/libserver12.a:kpdbc.o
/u01/app/oracle/product/12.1.0/dbhome_1/lib/libserver12.a:krb.o
/u01/app/oracle/product/12.1.0/dbhome_1/lib/libserver12.a:krbb.o
/u01/app/oracle/product/12.1.0/dbhome_1/lib/libserver12.a:krbi.o
/u01/app/oracle/product/12.1.0/dbhome_1/lib/libserver12.a:krbabr.o
 
Following patches have conflicts: [ 18633374 23192060 ]

Yes, I remember that I had to de-install an interim patch that was there in my newly created DBaaS:


[oracle@CDBA 18633374]$ $ORACLE_HOME/OPatch/opatch lspatches
23192060;
22674709;Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
22291127;Database Patch Set Update : 12.1.0.2.160419 (22291127)

Let’s do it:


[oracle@CDBA 18633374]$ $ORACLE_HOME/OPatch/opatch rollback -id 23192060
Oracle Interim Patch Installer version 12.1.0.1.10
Copyright (c) 2016, Oracle Corporation. All rights reserved.
 
Oracle Home : /u01/app/oracle/product/12.1.0/dbhome_1
Central Inventory : /u01/app/oraInventory
from : /u01/app/oracle/product/12.1.0/dbhome_1/oraInst.loc
OPatch version : 12.1.0.1.10
OUI version : 12.1.0.2.0
Log file location : /u01/app/oracle/product/12.1.0/dbhome_1/cfgtoollogs/opatch/23192060_Jul_13_2016_11_24_49/rollback2016-07-13_11-24-49AM_1.log
 
RollbackSession rolling back interim patch '23192060' from OH '/u01/app/oracle/product/12.1.0/dbhome_1'
 
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/12.1.0/dbhome_1')
 
Is the local system ready for patching? [y|n] y
User Responded with: Y
 
Patching component oracle.oracore.rsf, 12.1.0.2.0...
 
Patching component oracle.rdbms.rsf, 12.1.0.2.0...
 
Patching component oracle.rdbms, 12.1.0.2.0...
Deleting "kscs.o" from archive "/u01/app/oracle/product/12.1.0/dbhome_1/lib/libserver12.a"
Deleting "kststqae.o" from archive "/u01/app/oracle/product/12.1.0/dbhome_1/lib/libserver12.a"
 
Patching component oracle.rdbms.dbscripts, 12.1.0.2.0...
RollbackSession removing interim patch '23192060' from inventory
Log file location: /u01/app/oracle/product/12.1.0/dbhome_1/cfgtoollogs/opatch/23192060_Jul_13_2016_11_24_49/rollback2016-07-13_11-24-49AM_1.log
 
OPatch succeeded.

Then I check if I’m still able to create a new PDB:


[oracle@CDBA 18633374]$ dbstart $ORACLE_HOME
Processing Database instance "CDB": log file /u01/app/oracle/product/12.1.0/dbhome_1/startup.log

Ok, I’ll not detail the following problem for the moment:
ORA-01078: failure in processing system parameters
LRM-00101: unknown parameter name 'encrypt_new_tablespaces'

During my tests I removed that encrypt_new_tablespaces parameter from the spfile to continue.
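One way to do that while the instance refuses to start is to generate a pfile from the spfile, edit it, and recreate the spfile (a sketch; the path is just an example):

create pfile='/tmp/initCDB.ora' from spfile;
-- remove the encrypt_new_tablespaces line from /tmp/initCDB.ora, then:
create spfile from pfile='/tmp/initCDB.ora';
startup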


[oracle@CDBA 18633374]$ sqlplus / as sysdba
 
SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 13 11:34:57 2016
 
Copyright (c) 1982, 2014, Oracle. All rights reserved.
 
Connected to:
Oracle Database 12c EE Extreme Perf Release 12.1.0.2.0 - 64bit Production
 
SQL> create pluggable database PDBNEW2 admin user admin identified by admin;
create pluggable database PDBNEW2 admin user admin identified by admin
  *
ERROR at line 1:
ORA-65010: maximum number of pluggable databases created

Ok, issue reproduced. Interim patch 23192060 is required for EE Extreme Perf to be able to act as an Enterprise Edition.

There is no description, but the alert.log gives the list of bugs bundled there:
Patch Id: 23192060
Patch Description:
Patch Apply Time:
Bugs Fixed: 19366375,19665921,19770063,21281607,21470120,21923026,23072137

A lookup in MOS gives:

19366375 – CORRECT ELAPSED TIME CALCULATION AND ADD DIAGNOSTIC FOR BUG 18920073
19665921 – ENABLE HCC FOR DBCS REGARDLESS OF EXTREME PERFORMANCE OPTION OR OTHER OPTIONS
19770063 – GET INFO ABOUT CLOUD BUNDLE INTO V$INSTANCE TABLE
21281607 – TRANSPARENTLY ENCRYPT TABLESPACE AT CREATION IN CLOUD
21470120 – CLOUD BACKPORT FOR HCC AND VERSION CHANGES
21923026 – ORA-600 [OLTP COMPRESSION SANITY CHECK]
23072137 – TDE OFFLINE ENCRYPTION SHOULD NOT BE ALLOWED CONCURRENTLY DURING MRP

Several Oracle Public Cloud specifics here: the 'encrypt_new_tablespaces' parameter to apply TDE to all new user tablespaces, HCC which is possible in any EE on the Cloud, and information about the cloud edition in v$instance…

Let's check the edition now that I've rolled back the patch that contains fix 19770063:

SQL> select edition from v$instance;
 
EDITION
-------
UNKNOWN

This value comes from x$ksuxsinst.ksuxsedition which is 0 there and the view knows only the value 8 for Enterprise Edition:

SQL> select ksuxsedition from x$ksuxsinst;
 
KSUXSEDITION
------------
0
 
SQL> select view_definition from v$fixed_view_definition where view_name='GV$INSTANCE';
 
VIEW_DEFINITION
---------------
select ks.inst_id,ksuxsins,ksuxssid,ksuxshst,ksuxsver,ksuxstim,decode(ksuxssts,0,'STARTED',1,'MOUNTED',2,'OPEN',3,'OPEN MIGRATE','UNKNOWN'),decode(ksuxsshr,0,'NO',1,'YES',2,NULL),ksuxsthr,decode(ksuxsarc,0,'STOPPED',1,'STARTED','FAILED'),decode(ksuxslsw,0,NULL,2,'ARCHIVE LOG',3,'CLEAR LOG',4,'CHECKPOINT', 5,'REDO GENERATION'),decode(ksuxsdba,0,'ALLOWED','RESTRICTED'),decode(ksuxsshp,0,'NO','YES'),decode(kvitval,0,'ACTIVE',2147483647,'SUSPENDED','INSTANCE RECOVERY'),decode(ksuxsrol,1,'PRIMARY_INSTANCE',2,'SECONDARY_INSTANCE','UNKNOWN'), decode(qui_state,0,'NORMAL',1,'QUIESCING',2,'QUIESCED','UNKNOWN'), decode(bitand(ksuxsdst, 1), 0, 'NO', 1, 'YES', 'NO'), ks.con_id, decode(ksuxsmode,2,'READ MOSTLY','REGULAR'),
decode(ksuxsedition, 2, 'PO', 4, 'SE', 8, 'EE', 16, 'XE', 32, 'CS', 40, 'CE', 'UNKNOWN'), ksuxsfam from x$ksuxsinst ks, x$kvit kv, x$quiesce qu where kvittag = 'kcbwst'

No doubt, this may lead to inaccessible EE features.

When you create a DBaaS in the Oracle Public Cloud, which includes the patch 23192060, you get the following for EE Extreme Performance:


SQL> host $ORACLE_HOME/OPatch/opatch lspatches
23192060;
22674709;Database PSU 12.1.0.2.160419, Oracle JavaVM Component (Apr2016)
22291127;Database Patch Set Update : 12.1.0.2.160419 (22291127)
 
SQL> select edition from v$instance;
 
EDITION
-------
XP
 
SQL> select ksuxsedition from x$ksuxsinst;
 
KSUXSEDITION
------------
256
 
SQL> select view_definition from v$fixed_view_definition where view_name='GV$INSTANCE';
 
VIEW_DEFINITION
---------------
select ks.inst_id,ksuxsins,ksuxssid,ksuxshst,ksuxsver,ksuxstim,decode(ksuxssts,0,'STARTED',1,'MOUNTED',2,'OPEN',3,'OPEN MIGRATE','UNKNOWN'),decode(ksuxsshr,0,'NO',1,'YES',2,NULL),ksuxsthr,decode(ksuxsarc,0,'STOPPED',1,'STARTED','FAILED'),decode(ksuxslsw,0,NULL,2,'ARCHIVE LOG',3,'CLEAR LOG',4,'CHECKPOINT', 5,'REDO GENERATION'),decode(ksuxsdba,0,'ALLOWED','RESTRICTED'),decode(ksuxsshp,0,'NO','YES'),decode(kvitval,0,'ACTIVE',2147483647,'SUSPENDED','INSTANCE RECOVERY'),decode(ksuxsrol,1,'PRIMARY_INSTANCE',2,'SECONDARY_INSTANCE','UNKNOWN'), decode(qui_state,0,'NORMAL',1,'QUIESCING',2,'QUIESCED','UNKNOWN'), decode(bitand(ksuxsdst, 1), 0, 'NO', 1, 'YES', 'NO'), ks.con_id, decode(ksuxsmode,1,'REGULAR',2,'READ MOSTLY','READ ONLY'),
decode(ksuxsedition, 2, 'PO', 4, 'SE', 8, 'EE', 16, 'XE', 32, 'CS', 64, 'CE', 128, 'HP', 256, 'XP', 'UNKNOWN'), ksuxsfam, kjidtv from x$ksuxsinst ks, x$kvit kv, x$quiesce qu, x$kjidt where kvittag = 'kcbwst'

So what?

The Oracle Public Cloud is a strange PaaS: the database is provisioned automatically, but you can break everything you want later: you are DBA, SYSDBA and even root, as in IaaS. But just because you can do everything does not mean that you should. The Oracle Database software has been adapted for the Cloud and requires specific patches. After each PSU, those patches are merged so that they can be applied on top of the PSU. And if you need to apply a new patch which conflicts with one of them, then you should request a merge that includes the Cloud fixes.

Having different patches for the Cloud and for on-premises is not very nice. If the goal is to have dev and test in the public cloud and prod on-premises, then we want the same software and the same patching procedures. But don't worry: this is only because the cloud arrived after the 12.1.0.2 release. The next generation will be stabilized on the cloud first. And we complain about 'cloud-first'?

 

Cet article Oracle Public Cloud patch conflict est apparu en premier sur Blog dbi services.

SQL Server 2016 AlwaysOn: Direct seeding and backup considerations

Tue, 2016-07-12 15:27

As promised in my prior blog post, we'll now talk about the direct seeding feature and backups. We found an interesting LIMIT_CONCURRENT_BACKUPS value for the current state of the hadr_automatic_seeding_state_transition event. As said last time, this value is meaningful by itself.

So let's try to perform a backup while the database is concerned by an active seeding session. In fact, my backup was blocked (as I supposed), and here is what I found from the sys.dm_tran_locks and sys.dm_exec_requests DMVs.

SELECT	
	r.command,
	r.wait_type,
	r.wait_resource,
	DB_NAME(tl.resource_database_id) as [database_name],
	tl.resource_type,
	tl.resource_subtype,
	tl.request_mode,
	tl.request_type,
	tl.request_status,
	r.session_id as blocked_session_id,
	r.blocking_session_id
FROM sys.dm_tran_locks as tl
join sys.dm_exec_requests as r
	on tl.request_session_id = r.session_id
WHERE tl.request_session_id = <concerned session>

 

  • The blocked session (my backup)

blog 100 - AG direct seeding - 1 - blocked session

  • Blocking session (my direct seeding session)

blog 100 - AG direct seeding - 2 - blocking session

In short, my backup session is waiting on the database object with an LCK_M_U lock type that is already held by my direct seeding session. But the interesting part is that adding a database in direct seeding mode seems to be similar to performing a backup operation from a locking perspective. We also recognize characteristic wait types that occur during a backup operation (ASYNC_IO_COMPLETION, BACKUPBUFFER). So direct seeding acts as a streaming backup? Well, that sounds good!

Well, let’s continue with this other query

SELECT 
	r.command,
	r.wait_type,
	r.wait_resource,
	r.scheduler_id
from sys.dm_exec_requests as r
join sys.dm_os_schedulers as s
	on s.scheduler_id = r.scheduler_id
where r.session_id = 82
	AND s.status = 'VISIBLE ONLINE'

 

blog 100 - AG direct seeding - 3 - direct seeding internal

Do you notice the command text related to the direct seeding session? VDI_CLIENT_WORKER seems to indicate that SQL Server is using the virtual device interface internally during the seeding operation (the VDI prefix standing for Virtual Device Interface). If SQL Server uses a backup stream and the VDI interface internally during the seeding operation, that may explain why it has to limit concurrent backups in our case.

How to check if SQL Server uses VDI during a direct seeding operation? Well, we know that using VDI implies loading sqlvdi.dll. So my first idea consisted in checking if the corresponding dll is loaded as a module by the sqlservr.exe process, and the tasklist tool is a good candidate to answer this need.

blog 100 - AG direct seeding - 4 - direct seeding sqlvdi dll

The above picture seems to confirm my assumption, but this test is not perfect. Indeed, we may also claim that something else could have triggered the load of sqlvdi.dll. So my second idea was to capture a stack dump with the windbg tool while running the seeding session, and here is what I was able to see:

blog 100 - AG direct seeding - 5 - direct seeding and windbg

We may recognize CHadrBuildSourceStateLimitConcurrentBackups (similar to the LIMIT_CONCURRENT_BACKUPS value from the hadr_automatic_seeding_state_transition event), which is a parameter to the function CallFunctAndResumeBuildTask(). Let's also notice the use of the sqlmin!CHadrBuildSourceStateSeeding::CreateVdiSession function, which seems to confirm that SQL Server is using a VDI session during the seeding process.

Ok, after this funny test, let's get back to something more important to keep in mind: we know that direct seeding will limit concurrent backups. This implies that our existing RPO may be impacted, especially if the seeding operation takes a very long time.
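To get an idea of how long seeding sessions run, and therefore how long backups may be blocked, the SQL Server 2016 seeding DMVs can be queried, for example like this (a simple sketch):

SELECT ag.name AS ag_name,
       das.start_time,
       das.completion_time,
       das.current_state,
       das.failure_state_desc
FROM sys.dm_hadr_automatic_seeding AS das
JOIN sys.availability_groups AS ag
     ON ag.group_id = das.ag_id;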

Happy database seeding!!

 

 

 

Cet article SQL Server 2016 AlwaysOn: Direct seeding and backup considerations est apparu en premier sur Blog dbi services.

Set the SQL Native Client Default Port with PowerShell

Tue, 2016-07-12 08:06

I wrote an article about "SQL Server 2012: Configuring your TCP Port via PowerShell" and I received a question from PaulJ:
“How do you set the port for the SQL Native Client 11.0 Configuration (32bit) – as seen in the SQL Configuration Manager?”

This is a very good question, and I decided to write this blog post as an answer.
The first step is always the same, initialization of my object:

[reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.SqlWmiManagement") | Out-Null

$wmi = New-Object ("Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer")

The second step is to find out which client protocol the setting belongs to.
In the class "Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer", you find a property "ClientProtocols", as you can see on the MSDN web page:
tcpnativeclient01
I display the name and the protocol Properties with this command:

$wmi.ClientProtocols | Select displayname -ExpandProperty ProtocolProperties

tcpnativeclient02
As you can see, I have 4 client protocols (Named Pipes, default port, KEEPALIVE and KEEPALIVEINTERVAL).
The next step is to select the default port:

$tcp_list = $wmi.ClientProtocols  | Where-Object {$_.displayname -eq "TCP/IP"}
$default_tcp = $tcp_list.ProtocolProperties | Where-Object {$_.Name -eq "Default Port"}
$default_tcp

tcpnativeclient03
As you can see, the default client port is set to 1433 and now, I will set another value for this port:

$default_tcp.value=50100

Note: The port has a System.Int32 type
Validate this change with an Alter:

$tcp_list.alter()

To finish, do not forget to restart your services to activate the port change:

$sql_service = ($wmi.Services | Where-Object { $_.Type -eq "SqlServer" })
$sql_service.alter()
$sql_service.stop()
$sql_service.start()

tcpnativeclient04
Et voilà! The default port for the client protocol is changed!

 

Cet article Set the SQL Native Client Default Port with PowerShell est apparu en premier sur Blog dbi services.

Nulls in composite keys

Tue, 2016-07-12 06:36

Comparison of NULL can be misleading and it’s even worse for unique constraint validation. Having partial nulls in a composite key can be tricky because the SQL ANSI specification is not very easy to understand, and implementation can depend on the RDBMS. Here is an example with composite unique key and foreign key on Oracle.

Unique constraint

I create a table with a composite unique constraint:
SQL> create table TABLE1 (a char, b char, unique(a,b));
Table TABLE1 created.

I can insert a row with a=’X’ and b=’X':
SQL> insert into TABLE1 values ('X','X');
1 row inserted.

I cannot insert the same row:
SQL> insert into TABLE1 values ('X','X');
*
ERROR at line 1:
ORA-00001: unique constraint (SYS.SYS_C0015464) violated

I insert another row with same value for column a but different value for column b:
SQL> insert into TABLE1 values ('X','Y');
1 row inserted.

And another row with same value for column a but a null for column b:
SQL> insert into TABLE1 values ('X',null);
1 row inserted.

However, I cannot insert the same a second time:
SQL> insert into TABLE1 values ('X',null);
*
ERROR at line 1:
ORA-00001: unique constraint (SYS.SYS_C0015464) violated

If you look at documentation, this is documented as:
Because of the search mechanism for unique key constraints on multiple columns, you cannot have identical values in the non-null columns of a partially null composite unique key constraint.

It looks like an implementation reason (the search mechanism is the index that enforces the unique constraint). What is documented in SQL-92?
A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns.

How to interpret this? We cannot insert two ('X',null) rows because that would be two rows with the same non-null value (a='X'), and the Oracle implementation is compliant with this reading.

Or is it? We can also read the definition as the unique constraint being violated only when we find rows that have non-null values and they are the same. This is what MySQL and PostgreSQL do: they accept duplicates when there is at least one null.
This is also what I found more intuitive: I usually consider NULL as a value that is not known at insert time but that will be assigned a value later during the lifecycle of the row. Thus, I expect to be able to insert rows where there is a null and check the constraint only when all columns have a value.

It is probably an implementation choice from Oracle which stores nulls as a zero-length string and then cannot have two identical entries in a unique index.

Now inserting a row where a is null and b is null:
SQL> insert into TABLE1 values (null,null);
1 row inserted.

And because that do not violate the rule whatever the way we read it (non-null values are not the same as there are no non-null values at all here) I can insert a second one:
SQL> insert into TABLE1 values (null,null);
1 row inserted.

This is documented as
Unless a NOT NULL constraint is also defined, a null always satisfies a unique key constraint

About implementation, there is no problem because full null entries are not stored in the index. They are stored in bitmap indexes, but bitmap indexes cannot be used to enforce a unique constraint.

In summary, here is what can be stored on a table where (a,b) is unique but nullable:

SQL> select rownum,TABLE1.* from TABLE1;
 
ROWNUM A B
---------- - -
1 X X
2 X Y
3 X
4
5

Foreign key

Now that I have a unique key, I can reference it:
SQL> create table TABLE2 (a char, b char, foreign key(a,b) references TABLE1(a,b));
Table TABLE2 created.

Yes. You don’t need to reference the primary key. Any unique key, even with nullable columns, can be referenced.

I can insert a row where parent exists:
SQL> insert into TABLE2 values('X','X');
1 row inserted.

As I have no unique key on the child, it's a many-to-one relationship:
SQL> insert into TABLE2 values('X','X');
1 row inserted.

I also have a parent with a=’X’ and b=’Y':
SQL> insert into TABLE2 values('X','Y');
1 row inserted.

But I have no parent with a=’Y’:
SQL> insert into TABLE2 values('Y','Y');
*
ERROR at line 1:
ORA-02291: integrity constraint (SYS.SYS_C0015465) violated - parent key not found

So far so good. I said that I have a many to one relationship, but it’s a many to one or zero because my columns are nullable:
SQL> insert into TABLE2 values(null,null);
1 row inserted.

So far so good. But I have a composite key with nullable columns here, and I can insert a row where a=’X’ and b is null:
SQL> insert into TABLE2 values('X',null);
1 row inserted.

But do you think that all non null parent values must exist?
SQL> insert into TABLE2 values('Y',null);
1 row inserted.

Once again, this is documented as:
If any column of a composite foreign key is null, then the non-null portions of the key do not have to match any corresponding portion of a parent key.

And this is what is specified in SQL-92:
If no <match type> was specified then, for each row R1 of the referencing table, either at least one of the values of the referencing columns in R1 shall be a null value, or the value of each referencing column in R1 shall be equal to the value of the corresponding referenced column in some row of the referenced table.

You can find more detail about the other match types in the Oracle Database Development Guide.

That may look strange, but, still thinking about NULLS as unknown values, you can consider that constraints cannot be validated until we know all values.

Here is what I was able to insert into my table even with no a=’Y’ in the parent:

SQL> select rownum,TABLE2.* from TABLE2;
 
ROWNUM A B
---------- - -
1 X X
2 X X
3 X Y
4 X
5
6 Y

So what?

Having nulls in a composite unique key or foreign key can be misleading, so it's better to ensure that what you define fits what you expect. It's probably better to prevent partial nulls in the foreign key (a check constraint can ensure that if one column is null then all columns must be null) or to add an additional referential integrity constraint which ensures that you can set only the allowed values for a subset of columns (in our case, a table with column a as primary key that we can reference).
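For example, the first suggestion can be a simple check constraint on the child table (a sketch on the TABLE2 example above, with an arbitrary constraint name), rejecting partially null keys while still allowing fully null and fully valued ones:

SQL> alter table TABLE2 add constraint TABLE2_FK_NOT_PARTIAL check ((a is null and b is null) or (a is not null and b is not null));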

 

Cet article Nulls in composite keys est apparu en premier sur Blog dbi services.

SharePlex Compare and Repair commands

Mon, 2016-07-11 13:31

Reading Franck Pachot's blog about comparing source and target in a Dbvisit replication (http://blog.dbi-services.com/compare-source-and-target-in-a-dbvisit-replication/), I decided to write a small article about how we can do the same thing with SharePlex. Indeed, SharePlex provides built-in commands to compare and repair the synchronization.
Below is our configuration file. We are just replicating the table article to article_rep. Both tables contain 2000000 rows.

datasource:o.SPLEXDB

#source tables target tables routing map
titi.article titi.article_rep atlasrep2@o.SPLEXSTR2

From the source let’s insert 2 rows to verify that replication is working fine.

SQL> insert into article values (2000001,'2000001_designation',2000000001);

1 row created.

SQL> commit;

Commit complete.

SQL> insert into article values (2000002,'2000002_designation',2000000002);

1 row created.

SQL> commit;

Commit complete.


SQL> select * from article where idart in (2000001,2000002);

     IDART DESIGNATION                          PRIX
---------- ------------------------------ ----------
   2000001 2000001_designation            2000000001
   2000002 2000002_designation            2000000002

From the target let’s verify that the 2 rows are replicated

SQL> select * from article_rep where idart in (2000001,2000002);

     IDART DESIGNATION                          PRIX
---------- ------------------------------ ----------
   2000001 2000001_designation            2000000001
   2000002 2000002_designation            2000000002

To compare source and target in SharePlex, the command compare is used. Let’s see a demonstration.
From the source 

sp_ctrl (atlas:2104)>  compare titi.article to titi.article_rep at atlasrep2@o.SPLEXSTR2 for o.SPLEXDB

  comparing 1 of 1 objects

  compare started; job id 7

sp_ctrl (atlas:2104)>

And now let’s see the status of our replication

sp_ctrl (atlas:2104)> show compare

   The syntax to get status for the compare command has changed
   Instead of 'show compare' use the 'compare status' or 'repair status' command


Job ID    : 7
PID       : 8644
Host      : atlas.localdomain
Started   : 06-JUL-16 11:34:48
Job Type  : Compare
Status    : Done - 1 object completed

ID     Tablename                            Status     Time       Total Rows %Comp Total Time
------ ------------------------------------ ---------- ---------- ---------- ----- ----------
1      "TITI"."ARTICLE"                     In Sync           N/A    2000002   100       0:31


sp_ctrl (atlas:2104)>

We see that tables are in sync. When running the compare command, SharePlex obtains a brief exclusive lock on the source table to get read consistency for its row selection.
On the target system, SharePlex obtains an exclusive lock on the target table and retains the lock for the duration of the comparison of that table.
Now let's get our replication out of sync.
From the target, let's delete a row (note that we have a one-directional replication only, from source to target):

SQL> delete from article_rep where idart=2000002;

1 row deleted.

SQL> commit;

Commit complete.

SQL>

And from the source let’s insert a row

SQL> insert into article values (2000003,'2000003_designation',2000000003);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from article where idart=2000003;

     IDART DESIGNATION                          PRIX
---------- ------------------------------ ----------
   2000003 2000003_designation            2000000003


SQL> select count(*) from article;

  COUNT(*)
----------
   2000003

SQL>

From the target we can verify that the last row with idart=2000003 is present, but the number of rows is now different between the 2 tables. Indeed, we deleted the row with idart=2000002 on the target and this deletion is not replicated back to the source. The replication is now out of sync.

SQL> select * from article_rep where idart=2000003;

     IDART DESIGNATION                          PRIX
---------- ------------------------------ ----------
   2000003 2000003_designation            2000000003

SQL> select count(*) from article_rep;

  COUNT(*)
----------
   2000002

Let's run the compare command again on the source and look at the output now.

sp_ctrl (atlas:2104)> compare titi.article to titi.article_rep at atlasrep2@o.SPLEXSTR2 for o.SPLEXDB

We can see that the status is Sync Out (note that comparison of 2000000 rows was very quick)

sp_ctrl (atlas:2104)> show compare

   The syntax to get status for the compare command has changed
   Instead of 'show compare' use the 'compare status' or 'repair status' command


Job ID    : 9
PID       : 12583
Host      : atlas.localdomain
Started   : 06-JUL-16 13:24:48
Job Type  : Compare
Status    : Done - 1 object completed

ID     Tablename                            Status     Time       Total Rows %Comp Total Time
------ ------------------------------------ ---------- ---------- ---------- ----- ----------
1      "TITI"."ARTICLE"                     Out Sync          N/A    2000003   100       0:25

To get the replication back in sync we just have to use the repair command. This command is magic.

sp_ctrl (atlas:2104)> repair  titi.article to titi.article_rep at atlasrep2@o.SPLEXSTR2 for o.SPLEXDB

  repairing 1 of 1 objects

  repair started; job id 10

In the log file on the target we can see that SharePlex uses ORA_HASH to compare:

[oracle@atlasrep2 log]$ less declt_10-1_SPLEXDB_192.168.1.40_p9883.log
declt    2016-07-06 13:36:45.228937 9883 1352263552 NOTE: setting up HASH column info (de_table_setup_ora_hash_compare,L2915)
declt    2016-07-06 13:36:45.228971 9883 1352263552 Key Compare : Off
declt    2016-07-06 13:36:45.229009 9883 1352263552 Job ID      : 10
declt    2016-07-06 13:36:45.229017 9883 1352263552 Source Table: "TITI"."ARTICLE"
declt    2016-07-06 13:36:45.229122 9883 1352263552 Source Node : atlas.localdomain
declt    2016-07-06 13:36:45.229130 9883 1352263552 Target Table: "TITI"."ARTICLE_REP"
declt    2016-07-06 13:36:45.229135 9883 1352263552 Target Route: atlasrep2@SPLEXSTR2
declt    2016-07-06 13:36:45.229140 9883 1352263552 Batch Size  : 100000 rows
declt    2016-07-06 13:36:45.229145 9883 1352263552 Repair      : On (2 Pass)
declt    2016-07-06 13:36:45.229150 9883 1352263552 sp_declt PID: 9883
declt    2016-07-06 13:36:45.229155 9883 1352263552 Read Buffer Size : 1 mbytes
declt    2016-07-06 13:36:45.237727 9883 1352263552 Current status: ready to fetch batch,of rows; including the order by sorting.
declt    2016-07-06 13:36:45.237782 9883 1352263552 running sql statement : (de_select_prepare_to_fetch,L1384)
declt    2016-07-06 13:36:45.237795 9883 1352263552   SELECT /*+ PARALLEL (A,2) */ ROWID,ORA_HASH("IDART"||'~'||"PRIX"), ORA_HASH("DESIGNATION") FROM "TITI"."ARTICLE_REP" A ORDER BY 2, 3
declt    2016-07-06 13:36:54.064711 9883 1352263552 Current status: got batch of rows
declt    2016-07-06 13:37:04.527252 9883 1352263552 NOTE: total number of rows fetched = 2000002 (../src/deqtr/de_select.cpp,L1630)
declt    2016-07-06 13:37:04.533329 9883 1352263552 Notice: starting the second phase in the repair process
declt    2016-07-06 13:37:04.616704 9883 1352263552 Leaving de_msg_clt_preCompareMessaging1_5,L1297
declt    2016-07-06 13:37:06.659513 9883 1352263552 Leaving de_msg_clt_preCompareMessaging2,L614
declt    2016-07-06 13:37:06.659716 9883 1352263552 No new key column order, using default column order for TITI.ARTICLE_REP
declt    2016-07-06 13:37:06.721957 9883 1352263552 Current status: ready to fetch batch,of rows; including the order by sorting.
declt    2016-07-06 13:37:06.721996 9883 1352263552 running sql statement : (de_select_prepare_to_fetch,L1384)
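As a side note, nothing prevents you from running a similar hash-based check manually on both databases to double-check the result. A simple sketch (not the exact algorithm SharePlex uses) is to run the same query against titi.article on the source and titi.article_rep on the target and compare the two results:

SQL> select count(*), sum(ora_hash(idart||'~'||designation||'~'||prix)) from titi.article;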

When issuing the show compare command again, we can see that the status is now Repaired.

sp_ctrl (atlas:2104)> show compare

   The syntax to get status for the compare command has changed
   Instead of 'show compare' use the 'compare status' or 'repair status' command


Job ID    : 10
PID       : 13008
Host      : atlas.localdomain
Started   : 06-JUL-16 13:36:37
Job Type  : Repair
Status    : Done - 1 object completed

ID     Tablename                            Status     Time       Total Rows %Comp Total Time
------ ------------------------------------ ---------- ---------- ---------- ----- ----------
1      "TITI"."ARTICLE"                     Repaired          N/A    2000003   100       0:29

From the source we can verify the number of rows

SQL> select count(*) from article;

  COUNT(*)
----------
   2000003

From the target we can also verify that we have the same number of rows and that the deleted row (idart=2000002) is present again.

SQL> select count(*) from article_rep;

  COUNT(*)
----------
   2000003


SQL> select * from article_rep  where idart=2000002;

     IDART DESIGNATION                          PRIX
---------- ------------------------------ ----------
   2000002 2000002_designation            2000000002
Conclusion

We will remember that SharePlex provides powerful compare/repair commands to fix out-of-sync situations. Just note that during the execution of these commands the tables are locked (briefly on the source, longer on the target), meaning that no transactions are possible during that period.
Note that the compare and repair commands can also be used for initial load (not recommended because of locks).


 

Cet article SharePlex Compare and Repair commands est apparu en premier sur Blog dbi services.

Oracle Multitenant feature name

Sun, 2016-07-10 15:46

We all know what the multitenant option is: a container database (CDB) with multiple user pluggable databases (PDB). This requires Enterprise Edition plus the Multitenant option. But you can use the same architecture without the option, and even in Standard Edition: a CDB with only one user PDB. It is called “single-tenant” or “lone PDB”. What do you call this new 12c architecture?

I call “multitenant architecture” the following set of features, available in all editions:

  • dictionary separation between system metadata/objects and user metadata/objects
  • unplug / plug features to transport PDBs
  • create a new PDB as a clone of another existing PDB
  • ability for a session to switch between PDB and CDB$ROOT explicitly with ‘ALTER SESSION’ or implicitly through metadata and object links
  • ability to read data for several containers in one query

But it may look strange to use the “multitenant” term when running Standard Edition or Enterprise Edition without the option.

Container database (CDB)

One idea would be to simply call it ‘CDB architecture’, because it’s the architecture of the container database, and because a database without this architecture is called “non-CDB”:

@franckpachot Oracle docs says „CDB architecture“ and „Multitenant Architecture“ – I prefer CDB, because there’s Non-CDB, too :)

— Markus Flechtner (@markusdba) July 8, 2016

Consolidated database (CDB)

However, as very often with Oracle, the names change and the acronyms remain. Old names can be found in the software before the new name is made public to market it. We all know that CDB stands for “container database”, because a CDB is a container that contains at least 3 containers (CDB$ROOT, PDB$SEED and your PDB), and more when you have the multitenant option. However, I see no mention of that name in the ORACLE_HOME/rdbms/admin scripts:
[oracle@CDB ~]$ grep -i "container database" $ORACLE_HOME/rdbms/admin/* | wc
0 0 0

It seems that the original meaning of CDB was “consolidated database”:

[oracle@CDB ~]$ grep -i "consolidated database" $ORACLE_HOME/rdbms/admin/* | wc
58 465 7276
 
[oracle@CDB ~]$ grep -i "consolidated database" $ORACLE_HOME/rdbms/admin/*.bsq
/u01/app/oracle/product/12.1.0/dbhome_1/rdbms/admin/dcore.bsq:REM Consolidated Database.
/u01/app/oracle/product/12.1.0/dbhome_1/rdbms/admin/dcore.bsq:REM SEED Pluggable Database in a Consolidated Database.
/u01/app/oracle/product/12.1.0/dbhome_1/rdbms/admin/doptim.bsq:Rem the caching in library cache in a Consolidated Database.
/u01/app/oracle/product/12.1.0/dbhome_1/rdbms/admin/dsec.bsq: * Consolidated Database
/u01/app/oracle/product/12.1.0/dbhome_1/rdbms/admin/dsec.bsq: * Consolidated Database

So the idea behind CDB is the same as the one behind multitenant: the consolidation of several pluggable databases. But consolidation is not really what single-tenant does, where we have only one user PDB per CDB and where this architecture requires 3 containers instead of one non-CDB.

Pluggable databases

Another idea would be to call it “pluggable databases architecture”, because this is the real evolution. User tablespaces have been transportable for a long time, since the introduction of extended rowids and locally managed tablespaces in 8i. 12c brings the same for system tablespaces, so that PDBs are fully transportable physically, thanks to the separation of the dictionary.

The real point of this new architecture is the total separation of system data/metadata and user data/metadata, the separation of system DBA role and application DBA role, and this is pluggable databases. Multitenancy is just one thing that is possible with this new architecture. It’s important today because it’s required for the cloud (for consolidation and easy provisioning).

Oracle Multitenant

Actually, the feature was called ‘Oracle Pluggable Database’ until 12.1.0.1 was released publicly with a new name: Oracle Multitenant. And this is the name we can see in Database Feature Usage Statistics. Well, if you look at the first patchset of 12cR1, 12.1.0.2, you will see the old name ‘Oracle Pluggable Databases’ but this is a bug (Patch 20718081 changes back the name).

Here is what you see from Database Feature Usage Statistics after an upgrade from 12.1.0.1 to 12.1.0.2, in EM Express:

CaptureFUMultitenant

And from DBA_FEATURE_USAGE_STATISTICS view:


SQL> select name, version, detected_usages, currently_used, aux_count from dba_feature_usage_statistics where name like '%Multitenant%' or name like '%Pluggable%';
 
NAME VERSION DETECTED_USAGES CURRE AUX_COUNT
------------------------------ ----------------- --------------- ----- ----------
Oracle Multitenant 12.1.0.1.0 98 FALSE 1
Oracle Pluggable Databases 12.1.0.2.0 32 FALSE 1

So, bug aside, the name of the feature is ‘Oracle Multitenant’ and this is true for any container database, because the feature is considered as used as soon as V$DATABASE.CDB=’YES':
select count(*) into feature_boolean from v$database where cdb = 'YES'

The usage of the multitenant option is detected by the AUX_COUNT which is the number of user PDBs:
select count(*) into aux_count from v$pdbs where con_id > 2
CON_ID=0 is for the CDB, CON_ID=1 is for CDB$ROOT, CON_ID=2 is for PDB$SEED and user PDBs start at CON_ID > 2

So the name is “multitenant” whatever the number of PDBs.

So what?

I prefer to stick with “multitenant architecture” even when it is used without the multitenant option. It's a good way to keep in mind that, even if it brings a lot of interesting features for single-tenant as well, Oracle has taken the step of dictionary separation with the motivation of Cloud, Consolidation and Multitenancy. Charged options are a good trigger to increase the priority of evolution requests…
However, this “multitenant architecture” brings very interesting features to Standard Edition and Enterprise Edition even without the option. And if you doubt it, I'll try to convince you in San Francisco on September 18th.

 

Cet article Oracle Multitenant feature name est apparu en premier sur Blog dbi services.

SQL Server 2016 AlwaysOn: Direct seeding and performance considerations

Fri, 2016-07-08 13:05

This blog post follows the previous one about the new direct seeding feature shipped with SQL Server 2016. As a reminder, I had some doubts about using direct seeding with large databases because the log stream is not compressed by default, but I had forgotten the performance improvements described in the Microsoft BOL, which I talked about a couple of months ago in this blog post.

Microsoft did a good job of improving the AlwaysOn log transport layer, so it could be very interesting to compare two methods of adding a 100 GB database: the usual way, which includes backup and restore operations from the primary to the secondary, or the direct seeding feature. Which one is the quickest?

Let’s just have a quick look at my test environment. Two virtual machines with the following configuration:

  • 4x Intel Core i7-3630QM 2.3 GHz
  • 4GB of RAM
  • 2 10Gbps network cards
  • One disk that will host both the database data and log files on my primary replica (Samsung Portable SSD T3 500GB with S.M.A.R.T, NCQ and TRIM)
  • One disk that will host both the database data and log files on my secondary replica (Samsung Portable SSD T1 500GB with S.M.A.R.T, NCQ and TRIM)
  • One disk that will host backups (Samsung SSD 840 EVC, 250GB with S.M.A.R.T, NCQ and TRIM) used by both virtual machines

As an aside, each SSD disk is able to deliver at least 450MB/s and 35000 IOPS.

blog 99 - AG direct seeding - 0 - lab environment

I also used a custom database named GestionCom that contains roughly 100GB of data for my test. 100GB would be sufficient to get relevant results.

 

blog 99 - AG direct seeding - 01 - db GestionCom

 

Let's go ahead and compare both synchronization methods.

 

First test by using the usual way to add a database to an availability group

As said earlier, my first test consists in using the usual way to add a database to an availability group. Let's say that we may use 3 ways for data synchronization: FULL, join only, and skip initial synchronization. We will use the first method for this test, which includes all the steps: backup and restore the concerned database and then join it to the availability group. At this point we can easily imagine that most of the time will be spent in the backup and restore steps. I also want to point out that I deliberately did not use fine-tuning options like BUFFERCOUNT, MAXTRANSFERSIZE or splitting the backup to several media files, in order to stay in line with what the availability group wizard does.

 

Step                                                                        Duration
--------------------------------------------------------------------------- --------
Backup database to local backup disk (primary) WITH CHECKSUM, COMPRESSION    06’55’’
Restore database from network share (secondary) WITH CHECKSUM, NORECOVERY    17’10’’
Join database to availability group + start synchronization                  00’01’’
Total                                                                         24’06’’
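For reference, here is a minimal sketch of the T-SQL behind these steps (backup paths and share names are assumptions; the wizard also takes and restores a log backup to cover the gap):

-- on the primary
BACKUP DATABASE [GestionCom] TO DISK = N'E:\backup\GestionCom.bak' WITH CHECKSUM, COMPRESSION;
BACKUP LOG [GestionCom] TO DISK = N'E:\backup\GestionCom.trn' WITH CHECKSUM;
GO
-- on the secondary, from the network share
RESTORE DATABASE [GestionCom] FROM DISK = N'\\primary\backup\GestionCom.bak' WITH CHECKSUM, NORECOVERY;
RESTORE LOG [GestionCom] FROM DISK = N'\\primary\backup\GestionCom.trn' WITH NORECOVERY;
GO
-- join the database: ADD DATABASE on the primary, SET HADR on the secondary
ALTER AVAILABILITY GROUP [TestGrp] ADD DATABASE [GestionCom];
ALTER DATABASE [GestionCom] SET HADR AVAILABILITY GROUP = [TestGrp];
GO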

 

What about resource consumption?

On the primary …

blog 99 - AG direct seeding - 10 - CPU and disk activity during backup

 

blog 99 - AG direct seeding - 1 - network bandwith during restore from primary

On the secondary …

 

blog 99 - AG direct seeding - 11 - CPU and disk activity during restore

blog 99 - AG direct seeding - 2 - network bandwith during restore from secondary

CPU utilization is at 35% on average during the test. Moreover, the disk write throughput stagnates at around 130 MB/s on average, which includes both backup and restore activities. The network throughput also stagnates between 135 Mbps and 174 Mbps during my test.

So it is clear that my environment is under-used regarding resource consumption in this first test.

 

Second test by using new direct seeding method

This time I will use the new database deployment method: direct seeding. As said in my previous blog, using this feature greatly simplifies the process of adding a database, but what about the synchronization speed and the resource consumption in this case?
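As a quick reminder (a sketch; the secondary replica name SQL162 is an assumption), direct seeding is enabled per replica and the secondary must allow the availability group to create the database:

-- on the primary, for the replica that should be seeded automatically
ALTER AVAILABILITY GROUP [TestGrp] MODIFY REPLICA ON N'SQL162' WITH (SEEDING_MODE = AUTOMATIC);
GO
-- on the secondary replica
ALTER AVAILABILITY GROUP [TestGrp] GRANT CREATE ANY DATABASE;
GO
-- then adding the database on the primary is enough, no backup/restore needed
ALTER AVAILABILITY GROUP [TestGrp] ADD DATABASE [GestionCom];
GO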

Well, to get a good picture of what happens during the seeding process, we will use different tools such as the new sys.dm_hadr_automatic_seeding DMV as well as extended events. Extended events will help us understand what happens under the covers, but we don't need them to measure only the duration of the operation. If you look at the event list and categories, you will probably notice a new dbseed category that corresponds to direct seeding. Events in this category are only available from the debug channel. That's fine because we want to track when the seeding process starts, when it finishes, and what happens between these two events (like failures, timeouts and progress). By the way, the hadr_physical_seeding_progress event may be very useful to get a picture of the network activity for the concerned seeding session if your network card is shared with other sessions or other availability group replication activities. In my case I'm the only one, so I will get this information directly from the task manager panel.

So let’s create the extended event session:

CREATE EVENT SESSION [hadr_direct_seeding] 
ON SERVER 
ADD EVENT sqlserver.hadr_automatic_seeding_start
(
    ACTION(sqlserver.database_name,sqlserver.sql_text)
	)
,
ADD EVENT sqlserver.hadr_automatic_seeding_state_transition
(
    ACTION(sqlserver.database_name,sqlserver.sql_text)
),
ADD EVENT sqlserver.hadr_automatic_seeding_success
(
    ACTION(sqlserver.database_name,sqlserver.sql_text)
),
ADD EVENT sqlserver.hadr_automatic_seeding_timeout
(
    ACTION(sqlserver.database_name,sqlserver.sql_text)
),
ADD EVENT sqlserver.hadr_physical_seeding_progress
(
    ACTION(sqlserver.database_name,sqlserver.sql_text)
)
ADD TARGET package0.event_file
(
	SET filename = N'hadr_direct_seeding',
	max_file_size = (2048), 
	max_rollover_files = (10))
WITH 
(
	MAX_MEMORY=4096 KB,
	EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
	MAX_DISPATCH_LATENCY = 30 SECONDS,
	MAX_EVENT_SIZE = 0 KB,
	MEMORY_PARTITION_MODE = NONE,
	TRACK_CAUSALITY = OFF,
	STARTUP_STATE = OFF
)
GO
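The session just has to be started before adding the database (and can be stopped once the seeding is finished):

ALTER EVENT SESSION [hadr_direct_seeding] ON SERVER STATE = START;
GO
-- ... add the database to the availability group ...
-- ALTER EVENT SESSION [hadr_direct_seeding] ON SERVER STATE = STOP;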

 

And after adding the GestionCom database to the TestGrp availability group, the direct seeding feature comes into play. Honestly, it was a very big surprise! Let’s take a look at the network utilization:

blog 99 - AG direct seeding - 3 - network bandwith during seeding

 

A network usage of 2.2 Gbps on average this time! The direct seeding feature provides a better use of the network bandwidth and we may understand clearly why efforts have been made by Microsoft to improve the synchronization process.

Let's now move on to the CPU and disk utilization, first from the primary and then from the secondary:

blog 99 - AG direct seeding - 5 - disk throughput during seeding from primary

blog 99 - AG direct seeding - 6 - disk throughput during seeding from secondary

~ 350 up to 450 MB/s on each side (gain x2) but an increase of the CPU utilization up to 70% during my test (increase x2). So a better resource usage but at the cost of a higher CPU utilization …

Let’s finish by looking at the sys.dm_hadr_automatic_seeding DMV that provides the answer to the question: are we faster in this case?

select
    ag.name as aag_name,
    ar.replica_server_name,
    d.name as database_name,
    has.current_state,
    has.failure_state_desc as failure_state,
    has.error_code,
    has.performed_seeding,
	DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), has.start_time) as start_time,
	DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), has.completion_time) as completion_time,
    has.number_of_attempts
from sys.dm_hadr_automatic_seeding as has
join sys.availability_groups as ag
    on ag.group_id = has.ag_id
join sys.availability_replicas as ar
    on ar.replica_id = has.ag_remote_replica_id
join sys.databases as d
    on d.group_database_id = has.ag_db_id

 

 

blog 99 - AG direct seeding - 7 - DMV stats

 

And the answer is yes, as we might expect! Only 8 minutes (gain x3) to replicate and synchronize the GestionCom database between the two highly available replicas, compared to the first method. But that's not all … let's focus on the redo thread activity on the secondary and you may notice a very interesting rate (~ 12 MB/s). I don't remember having seen such a value with current availability groups at customer sites. This is the second improvement made by Microsoft: the introduction of the parallel redo capability. As a reminder, before SQL Server 2016 there was only one redo thread per database. In that context, a single redo thread simply could not keep up with applying the changes as persisted in the log.

From the secondary, we may see some changes by looking at the sys.dm_exec_requests DMV:

select 
	r.command, 
	r.status,
	r.wait_type,
	r.wait_time,
	r.last_wait_type,
	r.scheduler_id
from sys.dm_exec_requests as r
where r.command like '%REDO%'
order by r.scheduler_id

 

 

blog 99 - AG direct seeding - 70 - parallel redo thread

Using direct seeding is definitely a solution to take into account for our future database deployments, but according to this test I think we have to keep in mind two things: the CPU and network consumption of the seeding activity may impact the performance of other applications, and vice versa. In the real world, there is a good chance of being in this situation.

Finally let's have a look at the extended event output. With respect to what we want to highlight in this blog post, it doesn't give any other valuable information, but one thing got my attention: the LIMIT_CONCURRENT_BACKUPS value in the current value column (underlined in red). What does it mean exactly? Let's talk about it in a next blog post because it is a little bit out of the scope of the main subject.

blog 99 - AG direct seeding - 12 - Xe with direct seeding

 

Third test by using direct seeding and compression

Let's talk about the last test I performed. I used direct seeding without compression in the previous test, so SQL Server didn't compress the data stream, which is the default behaviour. However, we may force SQL Server to use compression by setting the special trace flag 9567. After all, we want to avoid direct seeding flooding the wire and impacting the existing workload of other applications.
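As a sketch, the trace flag can be enabled globally on the primary replica that performs the seeding (or added as a startup parameter); keep in mind that this behaviour is worth validating in your own environment:

-- enable compression of the direct seeding data stream (global scope)
DBCC TRACEON (9567, -1);
GO
-- check it
DBCC TRACESTATUS (9567);
GO
-- and disable it afterwards if needed
-- DBCC TRACEOFF (9567, -1);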

I have to admit that enabling compression with direct seeding is not so obvious. For instance, I didn't see any difference in the DMVs that would indicate we're using compression (the is_compression_enabled column of the sys.dm_hadr_physical_seeding_stats DMV is always equal to 0, regardless of whether we use compression or not). The only obvious difference comes from the network throughput, which is lower with compression (gain x2.5). However, I noticed a significant increase in CPU utilization, near 100% on the primary in my case.

blog 99 - AG direct seeding - 8 - resource with seeding and compression from primary

What about the seeding time? Well, I didn't notice any gain in this area. Does compression save network bandwidth? Maybe … hard to say with only this test and one specific environment.

I then tried to add 3 vCPUs to each replica and leave one vCPU to the system, for a total of 7 vCPUs dedicated to SQL Server.

blog 99 - AG direct seeding - 12 - cpu usage during seeding with compression 8 vcpus

At this point I admit to being a little bit surprised, and I wonder if compression correctly uses all the available processors, given the uneven distribution of CPU resource usage. The above picture is a good representation of what I saw during the other tests I performed with compression. In addition, I didn't see any obvious performance gain in terms of duration, except that the wire is less used. I'm a little bit disappointed by compression, but once again it is still much too early to draw a conclusion, and I'm looking forward to seeing direct seeding in action at my customers on real production infrastructures.

The bottom line is that direct seeding is a very promising feature and I love it, because it is the directly visible part of the AlwaysOn performance improvements shipped with SQL Server 2016. However, and this is my personal opinion, we should not let it fool us: consider using direct seeding carefully, according to your workload and available resources. Fortunately, in most cases it will be suitable.

Stay tuned!

 

 

 

Cet article SQL Server 2016 AlwaysOn: Direct seeding and performance considerations est apparu en premier sur Blog dbi services.

Compare source and target in a Dbvisit replication

Tue, 2016-07-05 13:51

You've set up a logical replication, and you trust it. But before the target goes into production, it is safer to compare source and target, or at least to count the number of rows.
But tables are continuously changing, so how can you compare? Not so difficult thanks to Dbvisit replicate heartbeat table and Oracle flashback query.

Here is the state of the replication, with activity on the source and real-time replication to the target:
| Dbvisit Replicate 2.7.06.4485(MAX edition) - Evaluation License expires in 29 days
MINE IS running. Currently at plog 368 and SCN 6119128 (07/06/2016 04:15:21).
APPLY IS running. Currently at plog 368 and SCN 6119114 (07/06/2016 04:15:19).
Progress of replication dbvrep_XE:MINE->APPLY: total/this execution
--------------------------------------------------------------------------------------------------------------------------------------------
REPOE.CUSTOMERS: 100% Mine:961/961 Unrecov:0/0 Applied:961/961 Conflicts:0/0 Last:06/07/2016 04:12:12/OK
REPOE.ADDRESSES: 100% Mine:961/961 Unrecov:0/0 Applied:961/961 Conflicts:0/0 Last:06/07/2016 04:12:12/OK
REPOE.CARD_DETAILS: 100% Mine:894/894 Unrecov:0/0 Applied:894/894 Conflicts:0/0 Last:06/07/2016 04:12:12/OK
REPOE.ORDER_ITEMS: 100% Mine:5955/5955 Unrecov:0/0 Applied:5955/5955 Conflicts:0/0 Last:06/07/2016 04:12:12/OK
REPOE.ORDERS: 99% Mine:4781/4781 Unrecov:0/0 Applied:4780/4780 Conflicts:0/0 Last:06/07/2016 04:12:12/OK
REPOE.INVENTORIES: 100% Mine:5825/5825 Unrecov:0/0 Applied:5825/5825 Conflicts:0/0 Last:06/07/2016 04:12:12/OK
REPOE.LOGON: 99% Mine:6175/6175 Unrecov:0/0 Applied:6173/6173 Conflicts:0/0 Last:06/07/2016 04:12:12/OK
--------------------------------------------------------------------------------------------------------------------------------------------
7 tables listed.

If you want to compare the rows from source and target, you will always see a difference because modifications on the source arrive on the target a few seconds later.

Source and target SCN

The first thing to do is to determine a consistent point in time where source and target are the same. This point in time exists because the redo log is sequential by nature, and the commits are done in the same order on the target as on the source. And this order is visible with the SCN. The only problem is that on a logical replication the SCNs on source and target are completely different and have their own life.

The first step is to determine an SCN from the target and an SCN on the source that show the same state of transactions.

But before that, let’s connect to the target and set the environment:

$ sqlplus /nolog @ compare.sql
 
SQL*Plus: Release 11.2.0.2.0 Production on Tue Jul 5 18:15:34 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
 
SQL> define table_owner=REPOE
SQL> define table_name=ORDERS
SQL>
SQL> connect system/manager@//192.168.56.67/XE
Connected.
SQL> alter session set nls_date_format='DD-MON-YYYY HH24:mi:ss';
Session altered.
SQL> alter session set nls_timestamp_format='DD-MON-YYYY HH24:mi:ss';
Session altered.

My example is on the #repattack environment, with Swingbench running on the source, and I’ll compare the ORDER table.

Heartbeat table

Each Dbvisit replicate configuration comes with a heartbeat table created in the Dbvisit schema on the source and replicated to the target. This table is updated every 10 seconds on the source with the timestamp and SCN. This is a great way to check how the replication is working. Here it will be the way to get the SCN information from the source.

Flashback query

Oracle flashback query offers a nice way to get the commit SCN for the rows updated in the heartbeat table. From the target database, this is the commit SCN for the replication transaction (the APPLY process) and it can be displayed along with the SCN from the source transaction that is recorded in the heartbeat table and replicated to the target.

SQL> column versions_startscn new_value scn_target
SQL> column source_scn new_value scn_source
SQL> column mine_process_name format a12
SQL> column versions_starttime format a21
 
SQL> select mine_process_name,wallclock_date,mine_date,source_scn,mine_scn,versions_startscn,versions_starttime,versions_endscn
from DBVREP.DBRSCOMMON_HEARTBEAT versions between timestamp(sysdate-1/24/60) and sysdate
order by versions_endscn nulls last ;
 
MINE_PROCESS WALLCLOCK_DATE MINE_DATE SOURCE_SCN MINE_SCN VERSIONS_STARTSCN VERSIONS_STARTTIME VERSIONS_ENDSCN
------------ -------------------- -------------------- -------------------- -------------------- -------------------- --------------------- --------------------
MINE 06-JUL-2016 04:14:27 06-JUL-2016 04:14:22 6118717 6118661 4791342
MINE 06-JUL-2016 04:14:37 06-JUL-2016 04:14:31 6118786 6118748 4791342 06-JUL-2016 04:11:29 4791376
MINE 06-JUL-2016 04:14:47 06-JUL-2016 04:14:41 6118855 6118821 4791376 06-JUL-2016 04:11:39 4791410
MINE 06-JUL-2016 04:14:57 06-JUL-2016 04:14:51 6118925 6118888 4791410 06-JUL-2016 04:11:49 4791443
MINE 06-JUL-2016 04:15:07 06-JUL-2016 04:15:01 6119011 6118977 4791443 06-JUL-2016 04:11:59 4791479
MINE 06-JUL-2016 04:15:17 06-JUL-2016 04:15:11 6119091 6119059 4791479 06-JUL-2016 04:12:09 4791515
MINE 06-JUL-2016 04:15:27 06-JUL-2016 04:15:21 6119162 6119128 4791515 06-JUL-2016 04:12:19

This shows that the current version of the heartbeat table on the target was committed at SCN 4791515 and we know that this state matches SCN 6119162 on the source. You can choose any pair you want but the latest will probably be the fastest to query.

Counting rows on source

I'll use flashback query to count the rows from the source at SCN 6119162. I'm doing it in parallel query, but be careful: when the table has a high modification activity, there will be a lot of undo blocks to read.

SQL> connect system/manager@//192.168.56.66/XE
Connected.
SQL> alter session force parallel query parallel 8;
Session altered.
 
SQL> select count(*) from "&table_owner."."&table_name." as of scn &scn_source;
old 1: select count(*) from "&table_owner."."&table_name." as of scn &scn_source
new 1: select count(*) from "REPOE"."ORDERS" as of scn 6119162
 
COUNT(*)
--------------------
775433

Counting rows on target

I'm doing the same from the target, but with the SCN 4791515:
SQL> connect system/manager@//192.168.56.67/XE
Connected.
SQL> alter session force parallel query parallel 8;
Session altered.
 
SQL> select count(*) from "&table_owner."."&table_name." as of scn &scn_target;
old 1: select count(*) from "&table_owner."."&table_name." as of scn &scn_target
new 1: select count(*) from "REPOE"."ORDERS" as of scn 4791515
 
COUNT(*)
--------------------
775433

Good. Same number of rows. This proves that even with constantly inserted tables we can find a point of comparison, thanks to the Dbvisit heartbeat table and to Oracle flashback query. If you are replicating with another logical replication product, you can simulate the heartbeat table with a job that writes the current SCN to a single-row table, and replicate it. If your target is not Oracle, then there is a good chance that you cannot do that kind of 'as of' query, which means that you need to lock the table on the source for the time of the comparison.
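For example, here is a minimal sketch of such a home-made heartbeat (the table and job names are arbitrary, and the SCOTT schema is just a placeholder), to be created on the source and included in the replication:

create table scott.my_heartbeat ( id number primary key, scn number, ts timestamp );
insert into scott.my_heartbeat values (1, null, null);
commit;
-- refresh the SCN and timestamp every 10 seconds
begin
  dbms_scheduler.create_job(
    job_name        => 'SCOTT.MY_HEARTBEAT_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'update scott.my_heartbeat set scn = dbms_flashback.get_system_change_number, ts = systimestamp where id = 1; commit;',
    repeat_interval => 'FREQ=SECONDLY; INTERVAL=10',
    enabled         => true);
end;
/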

ORA_HASH

If you think that counting the rows is not sufficient, you can compare a hash value from the columns. Here is an example.
I get the list of columns, with ORA_HASH() function on it, and sum() between them:

SQL> column columns new_value columns
SQL> select listagg('ORA_HASH('||column_name||')','+') within group (order by column_name) columns
2 from dba_tab_columns where owner='&table_owner.' and table_name='&table_name';
old 2: from dba_tab_columns where owner='&table_owner.' and table_name='&table_name'
new 2: from dba_tab_columns where owner='REPOE' and table_name='ORDERS'
 
COLUMNS
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ORA_HASH(CARD_ID)+ORA_HASH(COST_OF_DELIVERY)+ORA_HASH(CUSTOMER_CLASS)+ORA_HASH(CUSTOMER_ID)+ORA_HASH(DELIVERY_ADDRESS_ID)+ORA_HASH(DELIVERY_TYPE)+ORA_HASH(INVOICE_ADDRESS_ID)+ORA_HASH(ORDER_DATE)+ORA_
HASH(ORDER_ID)+ORA_HASH(ORDER_MODE)+ORA_HASH(ORDER_STATUS)+ORA_HASH(ORDER_TOTAL)+ORA_HASH(PROMOTION_ID)+ORA_HASH(SALES_REP_ID)+ORA_HASH(WAIT_TILL_ALL_AVAILABLE)+ORA_HASH(WAREHOUSE_ID)

With this list defined in a substitution variable, I can compare the sum of hash values:

SQL> select count(*),avg(&columns.) hash from "&table_owner."."&table_name." as of scn &scn_target;
old 1: select count(*),sum(&columns.) hash from "&table_owner."."&table_name." as of scn &scn_target
new 1: select count(*),sum(ORA_HASH(CARD_ID)+ORA_HASH(COST_OF_DELIVERY)+ORA_HASH(CUSTOMER_CLASS)+ORA_HASH(CUSTOMER_ID)+ORA_HASH(DELIVERY_ADDRESS_ID)+ORA_HASH(DELIVERY_TYPE)+ORA_HASH(INVOICE_ADDRESS_ID)+ORA_HASH(ORDER_DATE)+ORA_HASH(ORDER_ID)+ORA_HASH(ORDER_MODE)+ORA_HASH(ORDER_STATUS)+ORA_HASH(ORDER_TOTAL)+ORA_HASH(PROMOTION_ID)+ORA_HASH(SALES_REP_ID)+ORA_HASH(WAIT_TILL_ALL_AVAILABLE)+ORA_HASH(WAREHOUSE_ID)) hash from "REPOE"."ORDERS" as of scn 4791515
 
COUNT(*) HASH
-------------------- --------------------
775433 317531150805040439
 
SQL> connect system/manager@//192.168.56.66/XE
Connected.
SQL> alter session force parallel query parallel 8;
Session altered.
 
SQL> select count(*),avg(&columns.) hash from "&table_owner."."&table_name." as of scn &scn_source;
old 1: select count(*),sum(&columns.) hash from "&table_owner."."&table_name." as of scn &scn_source
new 1: select count(*),sum(ORA_HASH(CARD_ID)+ORA_HASH(COST_OF_DELIVERY)+ORA_HASH(CUSTOMER_CLASS)+ORA_HASH(CUSTOMER_ID)+ORA_HASH(DELIVERY_ADDRESS_ID)+ORA_HASH(DELIVERY_TYPE)+ORA_HASH(INVOICE_ADDRESS_ID)+ORA_HASH(ORDER_DATE)+ORA_HASH(ORDER_ID)+ORA_HASH(ORDER_MODE)+ORA_HASH(ORDER_STATUS)+ORA_HASH(ORDER_TOTAL)+ORA_HASH(PROMOTION_ID)+ORA_HASH(SALES_REP_ID)+ORA_HASH(WAIT_TILL_ALL_AVAILABLE)+ORA_HASH(WAREHOUSE_ID)) hash from "REPOE"."ORDERS" as of scn 6119162
 
COUNT(*) HASH
-------------------- --------------------
775433 17531150805040439

Note that this is only an example. You must adapt it to your needs, balancing the precision of the comparison and the performance.

So what?

Comparing source and target is not a bad idea. With Dbvisit replicate, if you defined the replication properly and did the initial import with the SCN provided by the setup wizard, you should not miss transactions, even when there is a lot of activity on the source, and even without locking the source for the initialization. But it's always good to compare, especially before the 'Go' decision of a migration done with Dbvisit replicate to lower the downtime (and the stress). Thanks to the heartbeat table and flashback query, a checksum is not too hard to implement.

 

Cet article Compare source and target in a Dbvisit replication est apparu en premier sur Blog dbi services.

Script to suggest FK indexes

Mon, 2016-07-04 10:55

In Oracle, when a referenced key is deleted (by a delete on the parent table, or an update of the referenced columns) the child table(s) are locked to prevent any concurrent insert that may reference the old key. This lock is a big issue in OLTP applications because it's a TM Share lock, usually reserved for DDL only, blocking any modification on the child table and some modifications on tables that have a relationship with that child table. This problem can be overcome with an index structure which allows finding the concurrent inserts that may reference the old value. Here is the script I use to find which index is missing.

The idea is not to suggest to index all foreign keys for three reasons:

  • when there are no delete or update in parent side, you don’t have that locking issue
  • when there is minimal write activity on child side, the lock may not have big consequence
  • you probably have indexes build for performance reasons that can be used to avoid locking even when they have more columns or have different column order

The idea is not to suggest an index for each locking issue but only when blocking locks have been observed. Yes, it is a reactive solution, but proactive ones cannot be automatic. If you know your application well, then you know what you have to index and you don't need this script. If you don't, then a proactive approach will suggest too many indexes.

Here is the kind of output that I get with this script:
-- DELETE on APP1.GCO_GOOD has locked APP1.FAL_TASK in mode 5 for 8 minutes between 14-sep 10:36 and 14-sep 10:44
-- blocked statement: DELETE FAL LOT LOT WHERE C FAB TYPE AND EXISTS SELECT
-- blocked statement: UPDATE DOC POSITION DETAIL SET DOC POSITION DETAIL ID B
-- blocked statement: delete from C AP GCO GOOD where rowid doa rowid
-- blocked statement: DELETE FROM FAL LOT WHERE FAL LOT ID B
-- blocked statement: DELETE FROM FAL TASK LINK PROP WHERE FAL LOT PROP ID B
-- blocked statement: INSERT INTO FAL LOT PROGRESS FAL LOT PROGRESS ID FAL LOT
-- blocked statement: insert into FAL TASK LINK FAL SCHEDULE STEP ID
-- FK chain: APP1.GCO_GOOD referenced by(cascade delete) APP1"."GCO_SERVICE referenced by(cascade set null) APP1"."FAL_TASK (APP1.FAL_TASK_S_GCO_SERV) unindexed
-- FK column GCO_GCO_GOOD_ID
-- Suggested index: CREATE INDEX ON "APP1"."FAL_TASK" ("GCO_GCO_GOOD_ID");
-- Other existing Indexes: CREATE INDEX "APP1"."FAL_TASK_S_DIC_FR_TASK_COD7_FK" ON "APP1"."FAL_TASK" ("DIC_FREE_TASK_CODE7_ID")
-- Other existing Indexes: CREATE INDEX "APP1"."FAL_TASK_S_DIC_FR_TASK_COD9_FK" ON "APP1"."FAL_TASK" ("DIC_FREE_TASK_CODE9_ID")
-- Other existing Indexes: CREATE INDEX "APP1"."FAL_TASK_S_PPS_TOOLS13_FK" ON "APP1"."FAL_TASK" ("PPS_TOOLS13_ID")

I’ll detail each part.

ASH

Yes we have to detect blocking issues from the past and I use ASH for that. If you don’t have Diagnostic Pack, then you have to change the query with another way to sample V$SESSION.
-- DELETE on APP1.GCO_GOOD has locked APP1.FAL_TASK in mode 5 for 8 minutes between 14-sep 10:36 and 14-sep 10:44
-- blocked statement: DELETE FAL LOT LOT WHERE C FAB TYPE AND EXISTS SELECT
-- blocked statement: UPDATE DOC POSITION DETAIL SET DOC POSITION DETAIL ID B
-- blocked statement: delete from C AP GCO GOOD where rowid doa rowid
-- blocked statement: DELETE FROM FAL LOT WHERE FAL LOT ID B
-- blocked statement: DELETE FROM FAL TASK LINK PROP WHERE FAL LOT PROP ID B
-- blocked statement: INSERT INTO FAL LOT PROGRESS FAL LOT PROGRESS ID FAL LOT
-- blocked statement: insert into FAL TASK LINK FAL SCHEDULE STEP ID

The first part of the output comes from ASH and detects the blocking situations: which statement, how long, and the statements that were blocked.
This part of the script will probably need to be customized: I join with DBA_HIST_SQL_PLAN, supposing that the queries have been captured by AWR as long running queries. I check the last 15 days of ASH. You may change those to fit the blocking situation encountered.

Foreign Key

Then, we have to find the unindexed foreign key which is responsible for those locks.

-- FK chain: APP1.GCO_GOOD referenced by(cascade delete) APP1"."GCO_SERVICE referenced by(cascade set null) APP1"."FAL_TASK (APP1.FAL_TASK_S_GCO_SERV) unindexed
-- FK column GCO_GCO_GOOD_ID

Here you see that it's not easy. Actually, all the scripts I've seen fail to detect the situation where a CASCADE SET NULL cascades the issue. Here "APP1"."GCO_SERVICE" has its foreign key indexed, but the SET NULL, even when not on the referenced column, locks the child (for no reason as far as I know, but it does).
My script goes up to 10 levels with a connect by query to detect this situation.

Suggested Index

The suggested index is an index on the foreign key column:

-- Suggested index: CREATE INDEX ON "APP1"."FAL_TASK" ("GCO_GCO_GOOD_ID");

This is only a suggestion. Any regular index that starts with the foreign key columns, in whatever order, can be used to avoid the lock.
Remember to think about performance first. The index may also be used to navigate from parent to child.

Existing Index

Finally, when adding an index it's good to check whether there are other indexes that are not needed anymore, so my script displays all of them.
If you think that some indexes are not required, remember that in 12c you can make them invisible for a while and quickly make them visible again in case of regression.
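For example (a sketch with a made-up index name):

SQL> alter index APP1.FAL_TASK_OLD_IDX invisible;
-- the index is still maintained but ignored by the optimizer; in case of regression:
SQL> alter index APP1.FAL_TASK_OLD_IDX visible;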

Script

Here is the script. Sorry, it is only lightly commented and there are a few display things to change. Just try it: it's only a query on AWR (Diagnostic Pack required) and on table/index/constraint metadata. You can customize it, and don't hesitate to comment if you have ideas to improve it. I used it in several environments and it has always found the chain of foreign keys that is responsible for an 'enq: TM' blocking situation. And believe me, this is not always easy to do just by looking at the data model.


set serveroutput on
declare
procedure print_all(s varchar2) is begin null;
dbms_output.put_line(s);
end;
procedure print_ddl(s varchar2) is begin null;
dbms_output.put_line(s);
end;
begin
dbms_metadata.set_transform_param(dbms_metadata.session_transform,'SEGMENT_ATTRIBUTES',false);
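-- main loop: find past 'enq: TM' blocking situations in ASH (last 15 days), with the blocking DML operation,
-- the parent table taken from the execution plan and the child table that was locked (P2 of the TM lock)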
for a in (
select count(*) samples,
event,p1,p2,o.owner c_owner,o.object_name c_object_name,p.object_owner p_owner,p.object_name p_object_name,id,operation,min(p1-1414332420+4) lock_mode,min(sample_time) min_time,max(sample_time) max_time,ceil(10*count(distinct sample_id)/60) minutes
from dba_hist_active_sess_history left outer join dba_hist_sql_plan p using(dbid,sql_id) left outer join dba_objects o on object_id=p2 left outer join dba_objects po on po.object_id=current_obj#
where event like 'enq: TM%' and p1>=1414332420 and sample_time>sysdate-15 and p.id=1 and operation in('DELETE','UPDATE','MERGE')
group by
event,p1,p2,o.owner,o.object_name,p.object_owner,p.object_name,po.owner,po.object_name,id,operation
order by count(*) desc
) loop
print_ddl('-- '||a.operation||' on '||a.p_owner||'.'||a.p_object_name||' has locked '||a.c_owner||'.'||a.c_object_name||' in mode '||a.lock_mode||' for '||a.minutes||' minutes between '||to_char(a.min_time,'dd-mon hh24:mi')||' and '||to_char(a.max_time,'dd-mon hh24:mi'));
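-- list the distinct statements that were seen blocked on this child table (last 90 days of ASH)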
for s in (
select distinct regexp_replace(cast(substr(sql_text,1,2000) as varchar2(60)),'[^a-zA-Z ]',' ') sql_text
from dba_hist_active_sess_history join dba_hist_sqltext t using(dbid,sql_id)
where event like 'enq: TM%' and p2=a.p2 and sample_time>sysdate-90
) loop
print_all('-- '||'blocked statement: '||s.sql_text);
end loop;
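-- follow the chains of foreign keys (up to 10 levels, through CASCADE and SET NULL delete rules)
-- from the parent table of the blocking DML down to the locked child table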
for c in (
with
c as (
select p.owner p_owner,p.table_name p_table_name,c.owner c_owner,c.table_name c_table_name,c.delete_rule,c.constraint_name
from dba_constraints p
join dba_constraints c on (c.r_owner=p.owner and c.r_constraint_name=p.constraint_name)
where p.constraint_type in ('P','U') and c.constraint_type='R'
)
select c_owner owner,constraint_name,c_table_name,connect_by_root(p_owner||'.'||p_table_name)||sys_connect_by_path(decode(delete_rule,'CASCADE','(cascade delete)','SET NULL','(cascade set null)',' ')||' '||c_owner||'"."'||c_table_name,' referenced by') foreign_keys
from c
where level<=10 and c_owner=a.c_owner and c_table_name=a.c_object_name
connect by nocycle p_owner=prior c_owner and p_table_name=prior c_table_name and ( level=1 or prior delete_rule in ('CASCADE','SET NULL') )
start with p_owner=a.p_owner and p_table_name=a.p_object_name
) loop
print_all('-- '||'FK chain: '||c.foreign_keys||' ('||c.owner||'.'||c.constraint_name||')'||' unindexed');
for l in (select * from dba_cons_columns where owner=c.owner and constraint_name=c.constraint_name) loop
print_all('-- FK column '||l.column_name);
end loop;
print_ddl('-- Suggested index: '||regexp_replace(translate(dbms_metadata.get_ddl('REF_CONSTRAINT',c.constraint_name,c.owner),chr(10)||chr(13),' '),'ALTER TABLE ("[^"]+"[.]"[^"]+") ADD CONSTRAINT ("[^"]+") FOREIGN KEY ([(].*[)]).* REFERENCES ".*','CREATE INDEX ON \1 \3;'));
for x in (
select rtrim(translate(dbms_metadata.get_ddl('INDEX',index_name,index_owner),chr(10)||chr(13),' ')) ddl
from dba_ind_columns where (index_owner,index_name) in (select owner,index_name from dba_indexes where owner=c.owner and table_name=c.c_table_name)
and column_name in (select column_name from dba_cons_columns where owner=c.owner and constraint_name=c.constraint_name)
)
loop
print_ddl('-- Existing candidate indexes '||x.ddl);
end loop;
for x in (
select rtrim(translate(dbms_metadata.get_ddl('INDEX',index_name,index_owner),chr(10)||chr(13),' ')) ddl
from dba_ind_columns where (index_owner,index_name) in (select owner,index_name from dba_indexes where owner=c.owner and table_name=c.c_table_name)
)
loop
print_all('-- Other existing Indexes: '||x.ddl);
end loop;
end loop;
end loop;
end;
/

I didn’t take time to document/comment the script but don’t hesitate to ask what you don’t understand there.

You should not see any 'enq: TM' waits from an OLTP application. If you have some, even short ones, they will become problematic one day. It's the kind of thing that can block the whole database.

 

Cet article Script to suggest FK indexes est apparu en premier sur Blog dbi services.

SQL Saturday 510 – Locks, latches et spinlocks – Les slides

Sun, 2016-07-03 13:40

This year, people once again turned out in numbers for the SQLSaturday in Paris, which has become a must-attend event for anyone who likes to exchange around data with the Microsoft products. The weather was also rather kind this year, and the view from the 40th floor of the Montparnasse tower is as impressive as ever.

For my part, I had the pleasure of delivering a session at the end of the day about concurrency, seen from SQL Server, with the 3 main mechanisms used: locks, latches and spinlocks.

blog 98 - lock latches spinlocks

 

The slides can be downloaded directly from the PASS website.

Once again, a big thank you to the organizers of this event (GUSS), to the Supinfo volunteers, to the sponsors and to the attendees!

See you soon

 

Cet article SQL Saturday 510 – Locks, latches et spinlocks – Les slides est apparu en premier sur Blog dbi services.
