
DBA Blogs

Building a MariaDB Galera Cluster with Docker

Pythian Group - Thu, 2014-08-28 08:13

There’s been a lot of talk lately about Docker for running processes in isolated userspace (or in the cloud, for that matter). Virtualization is a great way to compartmentalise applications and processes; however, the overhead of virtualization isn’t always worth it – in fact, without directly attached storage, IO degradation can seriously impact performance. The solution? Perhaps Docker, with its easy-to-use CLI and its lightweight implementation of cgroups and kernel namespaces.

Without further ado, I present a step-by-step guide on how to build a MariaDB 5.5 Galera Cluster on Ubuntu 14.04. The same guide can probably be applied to MariaDB versions 10+; however, I’ve stuck with 5.5 since the latest version of MariaDB Galera Cluster is still in beta.

So we start off by modifying the “ufw” firewall policy to accept forwarded packets, and then perform a “ufw” service restart for good measure:

root@workstation:~# vi /etc/default/ufw

DEFAULT_FORWARD_POLICY="ACCEPT"

root@workstation:~# service ufw restart
ufw stop/waiting
ufw start/running

I’m assuming you already have Docker installed – it is available as a package in the Ubuntu repositories and also in the Docker repositories (see http://docs.docker.com/installation/ubuntulinux/). You’ll also need to have LXC installed (“apt-get install lxc” should suffice) in order to attach to the Linux Containers / Docker images.
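If either piece is missing, something along these lines should cover it – a minimal sketch assuming the stock Ubuntu 14.04 package names (the Ubuntu archive ships Docker as “docker.io”; the Docker-maintained repository uses its own package name):

root@workstation:~# apt-get update
root@workstation:~# apt-get -q -y install docker.io lxc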

The next step is pulling the Ubuntu repository from the Docker registry in order to customize an image for our purposes:

root@workstation:~# docker pull ubuntu
Pulling repository ubuntu
c4ff7513909d: Pulling dependent layers 
3db9c44f4520: Download complete 
c5881f11ded9: Download complete 
c4ff7513909d: Download complete 
463ff6be4238: Download complete 
822a01ae9a15: Download complete 
75204fdb260b: Download complete 
511136ea3c5a: Download complete 
bac448df371d: Download complete 
dfaad36d8984: Download complete 
5796a7edb16b: Download complete 
1c9383292a8f: Download complete 
6cfa4d1f33fb: Download complete 
f127542f0b61: Download complete 
af82eb377801: Download complete 
93c381d2c255: Download complete 
3af9d794ad07: Download complete 
a5208e800234: Download complete 
9fccf650672f: Download complete 
fae16849ebe2: Download complete 
b7c6da90134e: Download complete 
1186c90e2e28: Download complete 
0f4aac48388f: Download complete 
47dd6d11a49f: Download complete 
f6a1afb93adb: Download complete 
209ea56fda6d: Download complete 
f33dbb8bc20e: Download complete 
92ac38e49c3e: Download complete 
9942dd43ff21: Download complete 
aa822e26d727: Download complete 
d92c3c92fa73: Download complete 
31db3b10873e: Download complete 
0ea0d582fd90: Download complete 
cc58e55aa5a5: Download complete

After the download is complete, we can check the Ubuntu images available for customization with the following command:

root@workstation:~# docker images
 REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
 ubuntu              14.04.1             c4ff7513909d        12 days ago         225.4 MB
 ubuntu              trusty              c4ff7513909d        12 days ago         225.4 MB
 ubuntu              14.04               c4ff7513909d        12 days ago         225.4 MB
 ubuntu              latest              c4ff7513909d        12 days ago         225.4 MB
 ubuntu              utopic              75204fdb260b        12 days ago         230.1 MB
 ubuntu              14.10               75204fdb260b        12 days ago         230.1 MB
 ubuntu              precise             822a01ae9a15        12 days ago         108.1 MB
 ubuntu              12.04               822a01ae9a15        12 days ago         108.1 MB
 ubuntu              12.04.5             822a01ae9a15        12 days ago         108.1 MB
 ubuntu              12.10               c5881f11ded9        9 weeks ago         172.2 MB
 ubuntu              quantal             c5881f11ded9        9 weeks ago         172.2 MB
 ubuntu              13.04               463ff6be4238        9 weeks ago         169.4 MB
 ubuntu              raring              463ff6be4238        9 weeks ago         169.4 MB
 ubuntu              13.10               195eb90b5349        9 weeks ago         184.7 MB
 ubuntu              saucy               195eb90b5349        9 weeks ago         184.7 MB
 ubuntu              lucid               3db9c44f4520        4 months ago        183 MB
 ubuntu              10.04               3db9c44f4520        4 months ago        183 MB

Now that we’ve downloaded our images, let’s create a custom Dockerfile for our MariaDB / Galera Docker image. I’ve added a brief description for each line of the file:

root@workstation:~# vi Dockerfile
 # MariaDB Galera 5.5.39 / Ubuntu 14.04 64bit
 FROM ubuntu:14.04
 MAINTAINER Pythian Nikolaos Vyzas <vyzas@pythian.com>

 RUN echo "deb http://archive.ubuntu.com/ubuntu trusty main universe" > /etc/apt/sources.list # add the universe repo
 RUN apt-get -q -y update # update apt
 RUN apt-get -q -y install software-properties-common # install software-properties-common for key management
 RUN apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xcbcb082a1bb943db # add the key for Mariadb Ubuntu repos
 RUN add-apt-repository 'deb http://ftp.cc.uoc.gr/mirrors/mariadb/repo/5.5/ubuntu trusty main' # add the MariaDB repository for 5.5
 RUN apt-get -q -y update # update apt again
 RUN echo mariadb-galera-server-5.5 mysql-server/root_password password root | debconf-set-selections # configure the default root password during installation
 RUN echo mariadb-galera-server-5.5 mysql-server/root_password_again password root | debconf-set-selections # confirm the password (as in the usual installation)
 RUN LC_ALL=en_US.utf8 DEBIAN_FRONTEND=noninteractive apt-get -o Dpkg::Options::='--force-confnew' -qqy install mariadb-galera-server galera mariadb-client # install the necessary packages
 # upload the locally created my.cnf (obviously this can go straight into the default MariaDB path)
 ADD ./my.cnf /etc/mysql/my.cnf
 # start the service - this will fail on first boot since the nodes haven't been configured yet
 RUN service mysql restart
 # open the ports required to connect to MySQL and for Galera SST / IST operations
 EXPOSE 3306 4444 4567 4568

We’ll also need our base configuration for MariaDB. I’ve included the basic Galera configuration variables – obviously there are more, but these are good enough to get the service started:

root@workstation:~# vi my.cnf
 [mysqld]
 wsrep_provider=/usr/lib/galera/libgalera_smm.so
 wsrep_cluster_address=gcomm://
 wsrep_sst_method=rsync
 wsrep_cluster_name=galera_cluster
 binlog_format=ROW
 default_storage_engine=InnoDB
 innodb_autoinc_lock_mode=2
 innodb_locks_unsafe_for_binlog=1

So far so good: we have Docker installed, and our Dockerfile as well as our “my.cnf” file ready to go. Now it’s time to build our Docker image, check that the image exists, and start up three separate containers, one for each of our Galera nodes:

root@workstation:~# docker build -t ubuntu_trusty/mariadb-galera .
root@workstation:~# docker images |grep mariadb-galera
 ubuntu_trusty/mariadb-galera   latest              afff3aaa9dfb        About a minute ago   412.5 MB
docker run --name mariadb1 -i -t -d ubuntu_trusty/mariadb-galera /bin/bash
docker run --name mariadb2 -i -t -d ubuntu_trusty/mariadb-galera /bin/bash
docker run --name mariadb3 -i -t -d ubuntu_trusty/mariadb-galera /bin/bash

We’ve started up our Docker containers; now let’s verify that they are in fact up and retrieve the information we need to connect. We’ll need two pieces of information, the IP address and the container ID, which can be retrieved using a combination of the “docker ps” and “docker inspect” commands:

root@workstation:~# docker ps
 CONTAINER ID        IMAGE                                 COMMAND             CREATED             STATUS              PORTS                                    NAMES
 b51e74933ece        ubuntu_trusty/mariadb-galera:latest   /bin/bash           About an hour ago   Up About an hour    3306/tcp, 4444/tcp, 4567/tcp, 4568/tcp   mariadb3
 03109c7018c0        ubuntu_trusty/mariadb-galera:latest   /bin/bash           About an hour ago   Up About an hour    3306/tcp, 4444/tcp, 4567/tcp, 4568/tcp   mariadb2
 1db2a9a520f8        ubuntu_trusty/mariadb-galera:latest   /bin/bash           About an hour ago   Up About an hour    3306/tcp, 4444/tcp, 4567/tcp, 4568/tcp   mariadb1
root@workstation:~# docker ps |cut -d' ' -f1 |grep -v CONTAINER | xargs docker inspect |egrep '"ID"|IPAddress'
 "ID": "b51e74933ece2f3f457ec87c3a4e7b649149e9cff2a4705bef2a070f7adbafb0",
 "IPAddress": "172.17.0.3",
 "ID": "03109c7018c03ddd8448746437346f080a976a74c3fc3d15f0191799ba5aae74",
 "IPAddress": "172.17.0.4",
 "ID": "1db2a9a520f85d2cef6e5b387fa7912890ab69fc0918796c1fae9c1dd050078f",
 "IPAddress": "172.17.0.2",

Time to use lxc-attach to connect to our containers using the full container ID, add the mounts to “/etc/mtab” to keep MariaDB happy, and customize the “gcomm://” address as we would for a usual Galera configuration (the container ID is generated when the instance fires up, so make sure to use your own IDs in the following commands):

root@workstation:~# lxc-attach --name b51e74933ece2f3f457ec87c3a4e7b649149e9cff2a4705bef2a070f7adbafb0
 root@b51e74933ece:~# cat /proc/mounts > /etc/mtab
 root@b51e74933ece:~# service mysql restart
 * Starting MariaDB database mysqld                            [ OK ]
 * Checking for corrupt, not cleanly closed and upgrade needing tables.

root@b51e74933ece:~# vi /etc/mysql/my.cnf
 #wsrep_cluster_address=gcomm://
 wsrep_cluster_address=gcomm://172.17.0.2,172.17.0.3,172.17.0.4

root@b51e74933ece:~# exit
 exit

root@workstation:~# lxc-attach --name 03109c7018c03ddd8448746437346f080a976a74c3fc3d15f0191799ba5aae74
 root@03109c7018c0:~# cat /proc/mounts > /etc/mtab
 root@03109c7018c0:~# vi /etc/mysql/my.cnf
 #wsrep_cluster_address=gcomm://
 wsrep_cluster_address=gcomm://172.17.0.2,172.17.0.3,172.17.0.4
 root@03109c7018c0:~# service mysql start
 * Starting MariaDB database server mysqld                            [ OK ]
 * Checking for corrupt, not cleanly closed and upgrade needing tables.
 root@03109c7018c0:~# mysql -uroot -proot
 Welcome to the MariaDB monitor.  Commands end with ; or \g.
 Your MariaDB connection id is 30
 Server version: 5.5.39-MariaDB-1~trusty-wsrep mariadb.org binary distribution, wsrep_25.10.r4014

Copyright (c) 2000, 2014, Oracle, Monty Program Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show status like 'wsrep_cluster%';
 +--------------------------+--------------------------------------+
 | Variable_name            | Value                                |
 +--------------------------+--------------------------------------+
 | wsrep_cluster_conf_id    | 2                                    |
 | wsrep_cluster_size       | 2                                    |
 | wsrep_cluster_state_uuid | 42bc375b-2bc0-11e4-851c-1a7627c0624c |
 | wsrep_cluster_status     | Primary                              |
 +--------------------------+--------------------------------------+
 4 rows in set (0.00 sec)

MariaDB [(none)]> exit
 Bye
 root@03109c7018c0:~# exit
 exit

root@workstation:~# lxc-attach --name 1db2a9a520f85d2cef6e5b387fa7912890ab69fc0918796c1fae9c1dd050078f
 root@1db2a9a520f8:~# cat /proc/mounts > /etc/mtab
 root@1db2a9a520f8:~# vi /etc/mysql/my.cnf
 root@1db2a9a520f8:~# service mysql start
 * Starting MariaDB database server mysqld                                                                                                                                                     [ OK ]
 root@1db2a9a520f8:~# mysql -uroot -proot
 Welcome to the MariaDB monitor.  Commands end with ; or \g.
 Your MariaDB connection id is 34
 Server version: 5.5.39-MariaDB-1~trusty-wsrep mariadb.org binary distribution, wsrep_25.10.r4014

Copyright (c) 2000, 2014, Oracle, Monty Program Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show status like 'wsrep_cluster%';
 +--------------------------+--------------------------------------+
 | Variable_name            | Value                                |
 +--------------------------+--------------------------------------+
 | wsrep_cluster_conf_id    | 3                                    |
 | wsrep_cluster_size       | 3                                    |
 | wsrep_cluster_state_uuid | 42bc375b-2bc0-11e4-851c-1a7627c0624c |
 | wsrep_cluster_status     | Primary                              |
 +--------------------------+--------------------------------------+
 4 rows in set (0.00 sec)

MariaDB [(none)]> exit
 Bye
 root@1db2a9a520f8:~# exit
 exit
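As a final sanity check – a quick sketch reusing the containers and the root/root credentials from above, with a hypothetical schema name – you can create a database on one node and confirm it appears on another, along with the expected cluster size:

root@b51e74933ece:~# mysql -uroot -proot -e "create database galera_smoke_test;"
root@1db2a9a520f8:~# mysql -uroot -proot -e "show databases like 'galera_smoke_test';"
root@1db2a9a520f8:~# mysql -uroot -proot -e "show status like 'wsrep_cluster_size';"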

Now be honest… Wasn’t that easier than creating multiple virtual machines and configuring the OS for each?

Enjoy your new MariaDB Galera Cluster and happy Dockering!

Categories: DBA Blogs

How to Configure an Azure Point-to-Site VPN – Part 3

Pythian Group - Thu, 2014-08-28 07:58

This blog post is the last of this series, and it will demonstrate how to configure a Point-to-Site VPN step by step. In my first blog post, I demonstrated how to configure a virtual network and a dynamic routing gateway. This was followed by another post about how to deal with the certificate. Today we will learn how to configure the VPN client.

CONFIGURE THE VPN CLIENT
1. In the Management Portal, navigate to the virtual network page; under “quick glance” you have the links to download the VPN package.

Choose the one appropriate to your architecture (x86 or x64).

Screen Shot 2014-07-31 at 14.10.48

2. After the download completes successfully, copy the file to your servers and execute the setup.
Screen Shot 2014-07-31 at 14.49.34

3. Click Yes when it asks if you want to install the VPN, and let it run.
Screen Shot 2014-07-31 at 15.09.26

4. After successful installation, it will be visible in your network connections.
Screen Shot 2014-07-31 at 15.46.07

5. In Windows 2012 you can click the network icon in the notification area (close to the clock), and it will show the right-side bar with all the network connections. You can connect from there.
The other option is to right-click the connection in the “Network Connections” window (previous step) and click “Connect / Disconnect”.

6. A window will be shown; click Connect.

Screen Shot 2014-07-31 at 15.58.23

7. Now check the box next to “Do not show this message again for this Connection” and click “Continue”.

If everything is ok, the connection will succeed.

Screen Shot 2014-07-31 at 16.07.04

8. To confirm that you are connected, execute the command “ipconfig /all” at the command line, and you should see an entry for the VPN with an IP assigned.

Screen Shot 2014-07-31 at 16.24.01

9. After a while, you will also be able to see the connection in your vNet dashboard. As you can see in the image, you have data in/out on the vNet.

Screen Shot 2014-07-31 at 16.26.39

After this last part, you are done with the point-to-site VPN configuration. You can test the connectivity by executing the “ping” command and also by using the “telnet” client to test whether a specific port is open and reachable, as in the example below.
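For instance, against a hypothetical VM at 10.0.1.4 with SQL Server listening on port 1433 (adjust the address and port for your environment):

C:\> ping 10.0.1.4
C:\> telnet 10.0.1.4 1433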

The point-to-site VPN is recommended if you want to connect users/devices to your Azure infrastructure, for a few different reasons. If you need to connect all or part of your on-premises infrastructure, the way to go is to configure a Site-to-Site VPN. Stay tuned for a blog post on how that works.

Thank you for reading!

Categories: DBA Blogs

Partner Webcast – Oracle Internet of Things Platform: Java 8 connecting the world

The Internet of Things Revolution is gaining speed. There are more and more devices, data and connections, thus more and more complexity to handle. But in the first place it brings complete...

We share our skills to maximize your revenue!
Categories: DBA Blogs

force_match => TRUE option of DBMS_SQLTUNE.IMPORT_SQL_PROFILE

Bobby Durrett's DBA Blog - Wed, 2014-08-27 14:13

Yesterday and today I’ve read or heard two people mention the force_match => TRUE parameter value for DBMS_SQLTUNE.IMPORT_SQL_PROFILE and how it forces a profile to work on all SQL statements that are the same except for their literal values.  So, I ran a quick test using the coe_xfr_sql_profile.sql utility that comes with the SQLT scripts that are available for download on Oracle’s support site.

I’ve mentioned in earlier posts how we use coe_xfr_sql_profile.sql to force plans on particular SQL statements using the sql_id of the SQL statement and the plan_hash_value of the plan:

July 2013 post

October 2013 post

March 2014 post

May 2014 post

Yesterday I read this post by David Kurtz where he mentions force_match: post

Today I heard Karen Morton mention force_match in her webinar which should soon be posted here: url

So, after the webinar completed I built a test case to see how the force_match=>TRUE option works.  I created a test table and ran a query with a literal in the where clause and got its plan showing its sql_id and plan_hash_value:

ORCL:SYSTEM>create table test as select * from dba_tables;
ORCL:SYSTEM>SELECT sum(blocks) from test
  2  where owner='SYS';

SUM(BLOCKS)
-----------
      34633

ORCL:SYSTEM>select * from
  2  table(dbms_xplan.display_cursor(null,null,'ALL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
SQL_ID  10g08ytt2m5mu, child number 0
-------------------------------------
SELECT sum(blocks) from test where owner='SYS'

Plan hash value: 1950795681

---------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    29 (100)| 
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |
|*  2 |   TABLE ACCESS FULL| TEST |   992 | 29760 |    29   (0)|
----------------------------------------------------------------

Then I ran coe_xfr_sql_profile.sql to create a profile that forces the plan on the given sql_id:

SQL> @coe_xfr_sql_profile.sql 10g08ytt2m5mu 1950795681

Then, using vi I edited the output of coe_xfr_sql_profile.sql:

vi coe_xfr_sql_profile_10g08ytt2m5mu_1950795681.sql

I searched for force_match and changed the line to read like this:

force_match => TRUE

instead of

force_match => FALSE

There are comments in the script explaining the meaning of these two values but I don’t want to plagiarize the script by including them here.  Next I ran the edited script:

sqlplus system/password < coe_xfr_sql_profile_10g08ytt2m5mu_1950795681.sql

Then I ran a test showing that not only would the original query with the where clause literal ‘SYS’ use the profile, but the same query with a different literal, ‘SYSTEM’, would use the created profile as well.

ORCL:SYSTEM>SELECT sum(blocks) from test
  2  where owner='SYS';

SUM(BLOCKS)
-----------
      34633

ORCL:SYSTEM>select * from
  2  table(dbms_xplan.display_cursor(null,null,'ALL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
SQL_ID  10g08ytt2m5mu, child number 0
-------------------------------------
SELECT sum(blocks) from test where owner='SYS'

Plan hash value: 1950795681

----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    29 (100)|
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |
|*  2 |   TABLE ACCESS FULL| TEST |    81 |  2430 |    29   (0)|
----------------------------------------------------------------

Note
-----
  - SQL profile coe_10g08ytt2m5mu_1950795681 used for this statement

ORCL:SYSTEM>SELECT sum(blocks) from test
  2  where owner='SYSTEM';

SUM(BLOCKS)
-----------
        520

ORCL:SYSTEM>
ORCL:SYSTEM>select * from
  2  table(dbms_xplan.display_cursor(null,null,'ALL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
SQL_ID  806ncj0a5fgus, child number 0
-------------------------------------
SELECT sum(blocks) from test where owner='SYSTEM'

Plan hash value: 1950795681

----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    29 (100)|
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |
|*  2 |   TABLE ACCESS FULL| TEST |    81 |  2430 |    29   (0)|
----------------------------------------------------------------

Note
-----
  - SQL profile coe_10g08ytt2m5mu_1950795681 used for this statement

Note that a different sql_id = 806ncj0a5fgus represents the second statement but the same plan_hash_value = 1950795681.  Also note that the SQL profile has the same name in both plans = coe_10g08ytt2m5mu_1950795681.

Now that I’m aware of the force_match=>TRUE option of DBMS_SQLTUNE.IMPORT_SQL_PROFILE I can use SQL profiles to force plans on queries that have different literal values, but are otherwise identical. This means a whole new set of problems can be resolved without modifying the existing code, which can really help in a performance firefight.
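For reference, here is a minimal sketch of what a direct call with force_match enabled looks like – the hint text and profile name are hypothetical placeholders and not what coe_xfr_sql_profile.sql actually generates:

sqlplus system/password <<'EOF'
declare
  h sys.sqlprof_attr;
begin
  -- hypothetical hint set matching the full scan plan above
  h := sys.sqlprof_attr('FULL(@"SEL$1" "TEST"@"SEL$1")');
  dbms_sqltune.import_sql_profile(
    sql_text    => 'SELECT sum(blocks) from test where owner=''SYS''',
    profile     => h,
    name        => 'force_match_demo_profile',
    replace     => TRUE,
    force_match => TRUE);
end;
/
EOF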

– Bobby

Categories: DBA Blogs

My Speaking Schedule for Oracle Open World 2014

Galo Balda's Blog - Wed, 2014-08-27 12:22

A quick post to let you know about the two presentations that I’ll be doing at Oracle Open World 2014.

Session ID:         UGF4482
Session Title:     “Getting Started with SQL Pattern Matching in Oracle Database 12c”
Venue / Room:  Moscone South – 301
Date and Time:  9/28/14, 13:30 – 14:15

Session ID:          CON4493
Session Title:      “Regular Expressions in Oracle Database 101”
Venue / Room:   Moscone South – 303
Date and Time:   10/2/14, 13:15 – 14:00

As usual, you might have to check before the session to make sure the room has not changed.

I hope to see you there.


Filed under: 12C, Open World, Oracle, Regular Expressions, Row Pattern Matching, SQL Tagged: 12C, Open World, Oracle, Regular Expressions, Row Pattern Matching, SQL
Categories: DBA Blogs

Hands-On Programming with R by Garrett Grolemund

Surachart Opun - Wed, 2014-08-27 02:42
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
The R language is useful for becoming a data scientist, as well as a computer scientist. Here I mention a book about data science with R: Hands-On Programming with R – Write Your Own Functions and Simulations, by Garrett Grolemund. It was written to show how to solve the logistical problems of data science, and additionally how to write your own functions and simulations with R. In the book, readers learn through practical data analysis projects (Weighted Dice, Playing Cards, Slot Machine) and gain a deeper understanding of R. In addition, Appendices A-E help with installing/updating R and R packages, as well as loading data and debugging R code.
Garrett Grolemund maintains shiny.rstudio.com, the development center for the Shiny R package.
Free Sampler. Written By: Surachart Opun http://surachartopun.com
Categories: DBA Blogs

SQL Server Replication Quick Tips

Pythian Group - Tue, 2014-08-26 07:56

There comes a time in every SQL Server DBA’s career when an email arrives with a “replication is not working, please check” message. This article is intended to provide quick tips on how to handle common replication errors and performance problems in a one-way transactional replication topology.

Oh boy, there is a data problem:


You check Replication Monitor and get:

“Transaction sequence number: 0x0003BB0E000001DF000600000000, Command ID: 1”

The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598)

Note the sequence number, which will be used in the following scripts; the command ID is also important to note, as not necessarily the whole sequence number has issues – it might be tied to just one command.

Go to the distributor database and run the following command to get the list of articles involved in this issue:

select * from dbo.MSarticles
where article_id in (
select article_id from MSrepl_commands
where xact_seqno = 0x0003BB0E000001DF000600000000)

To get the whole list of commands, you can run the query below:

exec sp_browsereplcmds
@xact_seqno_start = '0x0003BB0E000001DF000600000000',
@xact_seqno_end = '0x0003BB0E000001DF000600000000'

With this last query you can get to the exact command that is failing (by searching for the command number in the commandID column).

You will notice that a transactional replication will typically (depending on setup) use insert, delete and update stored procedures to replicate the data, so the command you will see here will look something like:

{CALL [sp_MSdel_dboMyArticle] (118)}

That is the stored procedure generated to process delete statements over the dbo.MyArticle table, and in this case it is trying to delete ID 118. Based on the error reported, you will now realize that the issue is that replication is trying to delete MyArticle ID 118 and it is not there, so it is trying to delete a non-existent record.

Options:

  1. You can check the publisher for this record and manually insert it in the subscriber. This will cause the replicated command to succeed and will fix the issue.
  2. You can skip the command. For this specific example you can skip it, as there is no need to delete something that has already been deleted, by removing the command from the MSrepl_commands table. (Beware: only do this when you know what you are doing – manually removing transactions can result in an unstable replication.) In this example you would use something like
    Delete from MSrepl_commands
    where xact_seqno = 0x0003BB0E000001DF000600000000 and commandID=1
  3. Reinitialize. This option is the least popular; you should try to fix the issue before resorting to it. However, if after skipping the command you keep getting new errors everywhere, something definitely went wrong and there is no easy way to guarantee that your subscription is up to date and stable. This can be an indicator that someone or something messed around with the data – some type of modification was made at the subscription side and it is causing issues with the replication. Remember, a one-way transactional replication is most likely intended to provide a copy of the data so it can be queried; no modification should be made to that data, as it won’t replicate back to the publisher.

Query time outs:

After checking Replication Monitor, you get a message like:

Query timeout expired
The process is running and is waiting for a response from the server
Initializing…

and then terminating with this error…
Agent ‘MyAgent’ is retrying after an error, YY retries attempted

This can be due to several reasons:

  • Your transaction is taking a long time and needs some tuning. If your transaction touches too much data or is using a bad query plan, it can result in a long-running query; check your T-SQL and see if the execution plan is optimal.
  • There is a problem with the network. If you normally don’t have this issue and it just happened out of the blue, check the network; sometimes a network failure or a saturated endpoint can degrade transfer rates, affecting your replication.
  • Server performance. Either the publisher or the subscriber can have a performance problem; too much CPU or memory usage can eventually impact a replication transaction, causing it to time out.
  • The query just needs some more time to complete. If this is the case, you can tweak the timeout setting to give the transaction some more time so it can process properly. To do this:
  1. Right click the Replication folder
  2. Click Distributor Properties and select General
  3. Click ‘Profile Defaults’
  4. Choose ‘Distribution Agents’ on left
  5. Click ‘New’ to create a new default agent profile
  6. Choose ‘Default Agent Profile’ from the list displayed, (to copy this)
  7. Pick a name for your new profile and update the QueryTimeout value in the right column
  8. Save
  9. Choose to use this profile across all your replication sets; however, I would recommend applying it only to the agent that requires this change
  10. To individually assign the profile, open Replication Monitor and then in the left pane click your replication set
  11. In the right pane, select your desired agent, right click and change the profile to the new one you just created

Mini Hack on expired subscriptions

When a replication is marked as expired, it will tell you that you need to reinitialize.

To activate it “under the hood”, check the last error in Replication Monitor; it will show you the last sequence number it tried to process. Then run this command (using the corresponding seq_no):

update MSsubscriptions
set status=2
where subscription_seqno=0x0002AADE00005030000100000002

The status column means:

0 = Inactive.

1 = Subscribed.

2 = Active.

You can change it to Active and it will try to process again. Why would you use this? If the subscription expired but your distribution cleanup job hasn’t run yet, it can try to reprocess everything again. If the issue was related to a network timeout and now you have your network back up, you can try this, as it will pick up from the last sequence number. You can also do this to reproduce the last error reported: it will fail and eventually expire again, but you will have a better idea of why it failed in the first place.

Multi threading or “Streams”

A slow replication – and by slow I mean when you know that your replication is experiencing a delay as commands go from the distributor to the subscriber – can be checked with performance counters or by quickly inserting a tracer token (http://technet.microsoft.com/en-us/library/ms151846%28v=sql.105%29.aspx), as in the example below.
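For instance, a hedged sketch of posting and checking a tracer token from the command line – the server, database and publication names are placeholders, and the commands run at the Publisher on the publication database:

REM post a tracer token for the publication
sqlcmd -S MyPublisher -d MyPublishedDb -Q "EXEC sys.sp_posttracertoken @publication = N'MyPublication';"
REM list the tokens posted for the publication and note the tracer_id you want
sqlcmd -S MyPublisher -d MyPublishedDb -Q "EXEC sys.sp_helptracertokens @publication = N'MyPublication';"
REM show distributor and subscriber latency for that token (replace <tracer_id> with a value from the previous call)
sqlcmd -S MyPublisher -d MyPublishedDb -Q "EXEC sys.sp_helptracertokenhistory @publication = N'MyPublication', @tracer_id = <tracer_id>;"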

You can improve the performance by adding streams. Normally the default setting writes the replicated transactions sequentially, one by one; with streams you can add more threads – say you specify 4 streams, you will be processing 4 transactions at a time, meaning a faster turnaround. This can work beautifully, but it can also generate deadlocks and inconsistencies, so I would recommend starting low, adding just 1 stream at a time, and stopping when you start seeing problems. Do not go crazy and treat this as a turbo button by adding 30 streams – and like most features, test it in QA first!

To Enable this option follow these steps:

  1. Open Replication Monitor, expand the Publisher and select the Publication in the left pane.
  2. On the right pane window , under “All Subscriptions” , you will see a list of all the Subscribers.
  3. Right Click the Subscriber you want to modify and click on “View Details”. A new Window will appear with the distribution agent session details.
  4. Now click on “Action” in the menu bar at the top and select “Distribution Agent Job Properties”; this will open the corresponding job properties.
  5. Go to  “Steps” in the left pane window followed by highlighting “Run Agent” on the Right pane window, click Edit.
  6. A new window will pop up; scroll to the right end of the command section and append this parameter: “ -SubscriptionStreams 2”
  7. Save the settings and restart the Distribution Agent job.

You might encounter some issues when implementing this; you can read this KB article for further information:

http://support.microsoft.com/kb/953199

Conclusion

There are many tips on how to fix a replication. Sometimes it is easier to just reinitialize, but sometimes that is not an option, when critical systems depend on the subscription being up to date or your database is so huge that it would take days to complete. When possible, try to troubleshoot instead of just rebuilding the replication from scratch, as it will give you a lot more insight into what is going on.

Categories: DBA Blogs

12.1.0.2 Introduction to Attribute Clustering (The Division Bell)

Richard Foote - Tue, 2014-08-26 00:03
One of the really cool new features introduced in 12.1.0.2 is Attribute Clustering. This new table based attribute allows you to very easily cluster data in close physical proximity based on the content of specific columns. As I’ve discussed many times, indexes love table data that is physically clustered in a similar manner to the index […]
Categories: DBA Blogs

Arizona Oracle User Group (AZORA)

Bobby Durrett's DBA Blog - Mon, 2014-08-25 11:10

I saw this fun blog post about the Arizona Oracle User Group getting organized: blog post

I’m definitely interested in being involved.  Please pass this on to any Oracle DBAs, developers, etc. that you know in the Phoenix area.

– Bobby

Categories: DBA Blogs

Integrated Application Heartbeat for Oracle GoldenGate

DBASolved - Mon, 2014-08-25 06:51

Over the last two weeks I’ve been working on a heartbeat monitoring solution for a client. This is not the normal heartbeat solution provided by Oracle as described in note 1299679.1, yet it is very similar. The approach I configured is similar to a traditional heartbeat setup but uses the same extract and replicats already being used by the database/application. For simplicity reasons, I like to call this approach an Integrated Application Heartbeat.

In order to set up this style of heartbeat monitoring, the following items are needed:

1.  Identify the heartbeat table in the source database.  In this configuration, let’s call this table SETTINGS (keep in mind nothing will change with the table – no DDL changes).
2.  Identify and generate DDL for the target heartbeat table.  Let’s call this table GG_STATUS_HB.
3.  Identify and generate DDL for the target heartbeat history table.  Let’s call this table GG_STATUS_HB_HIST.
4.  Two triggers for updating information in the target heartbeat tables (one example below).
5.  A few macros – one for each of the processes in the configuration.
6.  Either a crontab job or a DBMS_JOB job.

Now that the components have been identified, let’s take a look at what needs to be done.

The source-side heartbeat table has already been identified: in the database, the table name is SETTINGS. The SETTINGS table has a single column for a primary key. This table is updated using a crontab job that runs the following SQL. The <schema> variable is used because this table (SETTINGS) exists in more than one schema.

update <schema>.settings set id=id where rownum < 2;
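For example, a crontab entry along these lines would fire the update every minute – the schema name, credentials, connect string and log path are hypothetical placeholders:

# run the heartbeat update against the source database once a minute
* * * * * echo "update myschema.settings set id=id where rownum < 2; commit;" | sqlplus -s gguser/password@SOURCEDB >> /tmp/gg_heartbeat.log 2>&1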

On the target side, the target heartbeat and heartbeat history tables need to be created. These tables will not match the columns in the SETTINGS table. This means that all the data replicated to the SETTINGS table still has to be replicated, along with the table being used as a heartbeat table. The DDL to create these two tables is as follows:

CREATE TABLE <schema>.GG_STATUS_HB
(
SITE_ID                    NUMBER(10),
DB_SCHEMA                  VARCHAR2(30),
CSN_TS                     TIMESTAMP(6),
CSN_NB                     NUMBER(18),
EXT_GROUP                  VARCHAR2(10),
EXT_TIME                   TIMESTAMP(6),
EXT_LAG                    NUMBER,
PMP_GROUP                  VARCHAR2(10),
PMP_TIME                   TIMESTAMP(6),
PMP_LAG                    NUMBER,
TGT_DB_NAME                VARCHAR2(15),
REP_GROUP                  VARCHAR2(10),
REP_TIME                   TIMESTAMP(6),
REP_LAG                    NUMBER,
TOTAL_LAG                  NUMBER,
TOTAL_LAG_MIN              NUMBER,
UPDATE_TS                  TIMESTAMP(6),
CONSTRAINT GG_STATUS_HB_PK PRIMARY KEY (SITE_ID) ENABLE
);
CREATE TABLE <schema>.GG_STATUS_HB_HIST
(
SITE_ID                    NUMBER(10),
DB_SCHEMA                  VARCHAR2(30),
CSN_TS                     TIMESTAMP(6),
CSN_NB                     NUMBER(18),
EXT_GROUP                  VARCHAR2(10),
EXT_TIME                   TIMESTAMP(6),
EXT_LAG                    NUMBER,
PMP_GROUP                  VARCHAR2(10),
PMP_TIME                   TIMESTAMP(6),
PMP_LAG                    NUMBER,
TGT_DB_NAME                VARCHAR2(15),
REP_GROUP                  VARCHAR2(10),
REP_TIME                   TIMESTAMP(6),
REP_LAG                    NUMBER,
TOTAL_LAG                  NUMBER,
TOTAL_LAG_MIN              NUMBER,
UPDATE_TS                  TIMESTAMP(6),
CONSTRAINT GG_STATUS_HB_HIST_PK PRIMARY KEY (SITE_ID, DB_SCHEMA, CSN_TS, CSN_NB, UPDATE_TS) ENABLE
);

In mapping these tables, I found it easier to create macros to handle the mappings. Macros are especially handy since the SETTINGS table is in multiple schemas and needs to be mapped through existing GoldenGate processes. For more on macros and passing parameters, see the post I’ve written highlighting macros (here).

Examples of the macros that are used in this configuration are as follows:

Macro for Extract process:
This macro is used to capture and map the extract name, the time the transaction was extracted, the timestamp of the commit, and the System Change Number.

--Heartbeat Extract Macro Library - Version 1.01
--Initial write - 08-13-2014 - BLC
-- ***************************************
-- Common macros for this library
-- called internally
-- ***************************************
MACRO #hb_ext_details
BEGIN
EXT_GROUP = @GETENV("GGENVIRONMENT","GROUPNAME"), &
EXT_TIME = @DATE("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &
CSN_TS = @GETENV("GGHEADER","COMMITTIMESTAMP"), &
TRANS_CSN = @GETENV("TRANSACTION","CSN")
END;
-- *****************************************
-- Application Specific Macros for Heartbeat
-- *****************************************
MACRO #opshb_info
BEGIN
       #hb_ext_details()
END;

Macro for the Pump process:
This macro captures the pump name and the time the transaction passed through the pump.

--Heartbeat Pump Macro Library - Version 1.01
--Initial write - 08-13-2014 - BLC
-- ***************************************
-- Common macros for this library
-- called internally
-- ***************************************

MACRO #hb_pmp_details
BEGIN
PMP_GROUP=@GETENV("GGENVIRONMENT","GROUPNAME"),
PMP_TIME=@DATE("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP"))
END;

-- *****************************************
-- Application Specific Macros for Heartbeat
-- *****************************************

MACRO #opshb_info
BEGIN
        #hb_pmp_details()
END;

Macro for the Replicat Process:
This macro does the mapping of all the tokens that have been passed from the other GoldenGate processes to the target tables.

--Heartbeat Replicat Macro Library - Version 1.01
--Initial write - 08-13-2014 - BLC
-- ***************************************
-- Common macros for this library
-- called internally
-- ***************************************
MACRO #hb_rep_details
BEGIN
SITE_ID=@TOKEN("NHIN_STORE_ID"), &
DB_SCHEMA=@TOKEN("SRC_DB_SCHEMA"),
HOST_NAME=@GETENV("GGFILEHEADER", "HOSTNAME"), &
CSN_TS=@TOKEN("SRC_CSN_TS"), &
CSN_NB=@TOKEN("SRC_TRANS_CSN"), &
EXT_NAME=@TOKEN("EXT_GROUP"), &
EXT_TIME=@TOKEN("EXT_TIME"), &
PMP_GROUP=@TOKEN("PMP_GROUP"), &
PMP_TIME=@TOKEN("PMP_TIME"), &
TGT_DB_NAME=@GETENV("DBENVIRONMENT","DBNAME"), &
REP_GROUP=@GETENV ("GGENVIRONMENT", "GROUPNAME"), &
REP_TIME=@DATE("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
UPDATE_TS=@DATENOW()
END;
-- *****************************************
-- Application Specific Macros for Heartbeat
-- *****************************************
MACRO #hb_info
PARAMS (#src_schema)
BEGIN
MAP #src_schema.SETTINGS,TARGET <target schema>.GG_STATUS_HB, &
KEYCOLS (SITE_ID), &
INSERTMISSINGUPDATES, &
COLMAP ( #hb_rep_details() );

MAP #src_schema.SETTINGS,TARGET <target schema>.GG_STATUS_HB_HIST, &
KEYCOLS (SITE_ID, DB_SCHEMA, CSN_TS, CSN_NB, UPDATE_TS), &
INSERTALLRECORDS, HANDLECOLLISIONS &
COLMAP ( #hb_rep_details() );
END;

For each of the target heartbeat tables, a trigger is needed to calculate the lag and timestamp information. The triggers I used/created are based on the Oracle-supplied examples in note 1299679.1. To keep this blog somewhat short, I’m just going to include one sample of the triggers used.

Trigger Example:


CREATE OR REPLACE TRIGGER <target schema>.GG_STATUS_HB_TRIG
BEFORE INSERT OR UPDATE ON <target schema>.GG_STATUS_HB
FOR EACH ROW
BEGIN
SELECT TRUNC(TO_NUMBER(SUBSTR((:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS),1, INSTR(:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS,' ')))) * 86400
+ TO_NUMBER(SUBSTR((:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS), INSTR((:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS),' ')+1,2)) * 3600
+ TO_NUMBER(SUBSTR((:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS), INSTR((:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS),' ')+4,2) ) * 60
+ TO_NUMBER(SUBSTR((:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS), INSTR((:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS),' ')+7,2))
+ TO_NUMBER(SUBSTR((:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS), INSTR((:NEW.SRC_EXT_TIME - :NEW.SRC_CSN_TS),' ')+10,6)) / 1000000
INTO :NEW.EXT_LAG
FROM DUAL;

SELECT TRUNC(TO_NUMBER(SUBSTR((:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME),1, INSTR(:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME,' ')))) * 86400
+ TO_NUMBER(SUBSTR((:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME), INSTR((:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME),' ')+1,2)) * 3600
+ TO_NUMBER(SUBSTR((:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME), INSTR((:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME),' ')+4,2) ) * 60
+ TO_NUMBER(SUBSTR((:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME), INSTR((:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME),' ')+7,2))
+ TO_NUMBER(SUBSTR((:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME), INSTR((:NEW.SRC_PMP_TIME - :NEW.SRC_EXT_TIME),' ')+10,6)) / 1000000
INTO :NEW.PMP_LAG
FROM DUAL;

SELECT TRUNC(TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME),1, INSTR(:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME,' ')))) * 86400
+ TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME), INSTR((:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME),' ')+1,2)) * 3600
+ TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME), INSTR((:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME),' ')+4,2) ) * 60
+ TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME), INSTR((:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME),' ')+7,2))
+ TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME), INSTR((:NEW.TGT_REP_TIME - :NEW.SRC_PMP_TIME),' ')+10,6)) / 1000000
INTO :NEW.REP_LAG
FROM DUAL;

SELECT TRUNC(TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS),1, INSTR(:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS,' ')))) * 86400
+ TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS), INSTR((:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS),' ')+1,2)) * 3600
+ TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS), INSTR((:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS),' ')+4,2) ) * 60
+ TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS), INSTR((:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS),' ')+7,2))
+ TO_NUMBER(SUBSTR((:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS), INSTR((:NEW.TGT_REP_TIME - :NEW.SRC_CSN_TS),' ')+10,6)) / 1000000
INTO :NEW.TOTAL_LAG
FROM DUAL;
SELECT round((:NEW.TOTAL_LAG/60),1) INTO :NEW.TOTAL_LAG_MIN FROM DUAL;
SELECT SYSTIMESTAMP INTO :NEW.UPDATE_TS FROM DUAL;
END;
/

After all these items are in place, the next thing to do is update the parameter files. As with any macros, the parameter files have to reference the macro library via an INCLUDE statement. Then the mappings for the SETTINGS table have to be added to each parameter file before restarting the processes. More information on macros can be found here.

The biggest change that has to be made to existing processes comes in the pump process. The pump has to be configured as PASSTHRU for all tables except the SETTINGS table. In order to assign tokens to the SETTINGS table, the pump needs to be put in NOPASSTHRU mode. To do this, the pump parameter file needs to be updated with something similar to this approach:

NODYNAMICRESOLUTION
WILDCARDRESOLVE IMMEDIATE
NOPASSTHRU
TABLE <schema>.SETTINGS, TOKENS ( #opshb_info() );

DYNAMICRESOLUTION
WILDCARDRESOLVE DYNAMIC
PASSTHRU
TABLEEXCLUDE <schema>.SETTINGS;

TABLE <schema>.*;

With everything in place and the processes restarted, the crontab job should be updating the heartbeat on a scheduled interval. Reviewing the GG_STATUS_HB table provides a quick view of the application heartbeat, and looking at the GG_STATUS_HB_HIST table gives you a sense of how much lag is occurring within your GoldenGate environment over time, without the additional overhead of separate heartbeat processes.
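As a quick example – a sketch assuming the GG_STATUS_HB table created above and a hypothetical target connection – the current lag per site can be pulled with a simple query:

sqlplus -s gguser/password@TARGETDB <<'EOF'
-- current heartbeat status per site (column names come from the DDL above)
select site_id, ext_lag, pmp_lag, rep_lag, total_lag_min, update_ts
from   gg_status_hb
order  by site_id;
EOF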

Enjoy!

about.me: http://about.me/dbasolved

Filed under: Golden Gate
Categories: DBA Blogs

Don’t go directly to Maximum Protection!

The Oracle Instructor - Mon, 2014-08-25 04:14

With a Data Guard Configuration in Maximum Performance protection mode, don’t go to Maximum Protection directly, because that leads to a restart of the primary database:

 Attention!

DGMGRL> show configuration;

Configuration - myconf

  Protection Mode: MaxPerformance
  Databases:
  prima  - Primary database
    physt  - Physical standby database
      physt2 - Physical standby database (receiving current redo)

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> edit configuration set protection mode as maxprotection;
Operation requires shutdown of instance "prima" on database "prima"
Shutting down instance "prima"...
Database closed.
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "prima" on database "prima"
Starting instance "prima"...
ORACLE instance started.
Database mounted.
Database opened.

Instead, go to Maximum Availability first and then to Maximum Protection:

DGMGRL> edit configuration set protection mode as maxperformance;
Succeeded.
DGMGRL> edit configuration set protection mode as maxavailability;
Succeeded.
DGMGRL> edit configuration set protection mode as maxprotection;
Succeeded.
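A quick way to double-check the result on the primary, outside of DGMGRL – a small sketch using v$database:

sqlplus / as sysdba <<'EOF'
-- protection_mode is the configured mode, protection_level the one currently in effect
select protection_mode, protection_level from v$database;
EOF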

The demo was done with 12c, involving a cascading standby database, but the behavior is already the same in 11g. The odd thing about it is that DGMGRL will restart the primary without warning. I have wanted to share that with the Oracle community for years but always got over it somehow.


Tagged: Data Guard, High Availability
Categories: DBA Blogs

ASM Commands : 2 -- Migrating a DiskGroup to New Disk(s)

Hemant K Chitale - Sun, 2014-08-24 08:52
In 11gR2 Grid Infrastructure and RAC

After the previous demonstration of adding a new DiskGroup, I now demonstrate migrating the DiskGroup to a new pair of disks.

First, I create a table in the Tablespace on that DiskGroup.

[root@node1 ~]# su - oracle
-sh-3.2$ sqlplus

SQL*Plus: Release 11.2.0.2.0 Production on Sun Aug 24 22:17:28 2014

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Enter user-name: hemant/hemant

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> create table new_tbs_tbl
2 tablespace new_tbs
3 as select * from dba_objects
4 /

Table created.

SQL> select segment_name, bytes/1048576
2 from user_segments
3 where tablespace_name = 'NEW_TBS'
4 /

SEGMENT_NAME
--------------------------------------------------------------------------------
BYTES/1048576
-------------
NEW_TBS_TBL
9


SQL> select file_name, bytes/1048576
2 from dba_data_files
3 where tablespace_name = 'NEW_TBS'
4 /

FILE_NAME
--------------------------------------------------------------------------------
BYTES/1048576
-------------
+DATA3/racdb/datafile/new_tbs.256.855792859
100


SQL>


Next, I verify that the DiskGroup is currently on disk asmdisk.7 and that the two new disks that I plan to migrate the DiskGroup to are available as asmdisk.8 and asmdisk.9  (yes, unfortunately, they are on /fra, instead of /data1 or /data2 because I have run out of disk space in /data1 and /data2).
This I do from node1 :

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
-sh-3.2$ exit
logout

[root@node1 ~]#
[root@node1 ~]# su - grid
-sh-3.2$ sqlplus

SQL*Plus: Release 11.2.0.2.0 Production on Sun Aug 24 22:22:32 2014

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Enter user-name: / as sysasm

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> l
1 select d.name, d.path
2 from v$asm_disk d, v$asm_diskgroup g
3 where d.group_number=g.group_number
4* and g.name = 'DATA3'
SQL> /

NAME
------------------------------
PATH
--------------------------------------------------------------------------------
DATA3_0000
/data1/asmdisk.7


SQL>
SQL> !sh
sh-3.2$ ls -l /fra/asmdisk*
-rwxrwxr-x 1 grid oinstall 1024000000 Aug 24 22:06 /fra/asmdisk.8
-rwxrwxr-x 1 grid oinstall 1024000000 Aug 24 22:07 /fra/asmdisk.9
sh-3.2$


Note how the ownership and permissions are set for the two new disks (see my previous post).

I now add the two new disks.

sh-3.2$ exit
exit

SQL> show parameter power

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_power_limit integer 1
SQL> alter diskgroup data3 add disk '/fra/asmdisk.8', '/fra/asmdisk.9';

Diskgroup altered.

SQL>
SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE
------------ ----- ---- ---------- ---------- ---------- ---------- ----------
EST_MINUTES ERROR_CODE
----------- --------------------------------------------
3 REBAL RUN 1 1 1 101 60
1


SQL>


With ASM_POWER_LIMIT set to 1, Oracle ASM automatically starts a REBALANCE operation.  However, since I did *not* drop the existing asmdisk.7, Oracle will still continue to use it.

After a while, I confirm that the REBALANCE has completed.  I can now drop asmdisk.7.  Unfortunately, this will trigger a new REBALANCE !

SQL> l
1* select * from v$asm_operation
SQL> /

no rows selected

SQL>
SQL> l
1 select d.name, d.path
2 from v$asm_disk d, v$asm_diskgroup g
3 where d.group_number=g.group_number
4* and g.name = 'DATA3'
SQL> /

NAME
------------------------------
PATH
--------------------------------------------------------------------------------
DATA3_0000
/data1/asmdisk.7

DATA3_0002
/fra/asmdisk.9

DATA3_0001
/fra/asmdisk.8


SQL>
SQL> alter diskgroup data3 drop disk '/data1/asmdisk.7';
alter diskgroup data3 drop disk '/data1/asmdisk.7'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15054: disk "/DATA1/ASMDISK.7" does not exist in diskgroup "DATA3"


SQL> alter diskgroup data3 drop disk 'DATA3_0000';

Diskgroup altered.

SQL>
SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE
------------ ----- ---- ---------- ---------- ---------- ---------- ----------
EST_MINUTES ERROR_CODE
----------- --------------------------------------------
3 REBAL RUN 1 1 2 102 120
0


SQL>
SQL> l
1* select * from v$asm_operation
SQL>
SQL> /

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE
------------ ----- ---- ---------- ---------- ---------- ---------- ----------
EST_MINUTES ERROR_CODE
----------- --------------------------------------------
3 REBAL RUN 1 1 47 101 95
0


SQL> /

no rows selected

SQL>


NOTE : Note how I must specify the Disk NAME (not the PATH) for the DROP. When I added disks asmdisk.8 and asmdisk.9, I could have given them meaningful names as well; Oracle has automatically named them.

Ideally, what I should have done is to use the ADD and DROP commands together. That way, only a single-pass REBALANCE would have been required, as shown below.
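Something along these lines – a sketch reusing the same disk paths and disk name as above – would have added the new disks and dropped the old one with a single rebalance:

sqlplus / as sysasm <<'EOF'
-- add the new disks and drop the old one in one statement,
-- so only a single rebalance pass is needed
alter diskgroup data3
  add disk '/fra/asmdisk.8', '/fra/asmdisk.9'
  drop disk DATA3_0000;
EOF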

After a while, I run my validation queries on node2.


[root@node2 ~]# su - grid
-sh-3.2$ sqlplus

SQL*Plus: Release 11.2.0.2.0 Production on Sun Aug 24 22:42:39 2014

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Enter user-name: / as sysasm

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select d.name, d.path
from v$asm_disk d, v$asm_diskgroup g
where d.group_number=g.group_number
and g.name = 'DATA3' 2 3 4
5
SQL> l
1 select d.name, d.path
2 from v$asm_disk d, v$asm_diskgroup g
3 where d.group_number=g.group_number
4* and g.name = 'DATA3'
SQL> /

NAME
------------------------------
PATH
--------------------------------------------------------------------------------
DATA3_0002
/fra/asmdisk.9

DATA3_0001
/fra/asmdisk.8


SQL>
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Real Application Clusters and Automatic Storage Management options
-sh-3.2$ exit
logout

[root@node2 ~]# su - oracle
-sh-3.2$
-sh-3.2$ sqlplus

SQL*Plus: Release 11.2.0.2.0 Production on Sun Aug 24 22:44:10 2014

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Enter user-name: hemant/hemant

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select count(*) from new_tbs_tbl;

COUNT(*)
----------
72460

SQL>


I have now accessed the table, tablespace, diskgroup and disks from node2 successfully. Disk asmdisk.7 is no longer part of the DiskGroup.

I can physically remove disk asmdisk.7 from the storage.


SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Real Application Clusters and Automatic Storage Management options
-sh-3.2$ exit
logout

[root@node1 ~]# cd /data1
[root@node1 data1]# ls -l asmdisk.7
-rwxrwxr-x 1 grid oinstall 2048000000 Aug 24 22:39 asmdisk.7
[root@node1 data1]# rm asmdisk.7
rm: remove regular file `asmdisk.7'? y
[root@node1 data1]#
[root@node1 data1]# su - grid
-sh-3.2$ sqlplus

SQL*Plus: Release 11.2.0.2.0 Production on Sun Aug 24 22:50:18 2014

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Enter user-name: / as sysasm

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> set pages 60
SQL> col name format a15
SQL> col path format a20
SQL> select group_number, name, path
2 from v$asm_disk
3 order by 1,2;

GROUP_NUMBER NAME PATH
------------ --------------- --------------------
0 /crs/voting.disk
0 /data1/votedisk.1
0 /data2/votedisk.2
0 /fra/votedisk.3
1 DATA1_0000 /data1/asmdisk.1
1 DATA1_0001 /data2/asmdisk.4
2 DATA2_0000 /data1/asmdisk.2
2 DATA2_0001 /data2/asmdisk.5
2 DATA2_0002 /data2/asmdisk.6
3 DATA3_0001 /fra/asmdisk.8
3 DATA3_0002 /fra/asmdisk.9
4 DATA_0000 /crs/ocr.configurati
on

5 FRA_0000 /fra/fradisk.3
5 FRA_0001 /fra/fradisk.2
5 FRA_0002 /fra/fradisk.1
5 FRA_0003 /fra/fradisk.4

16 rows selected.

SQL>

The disk asmdisk.7 is no longer part of the storage. (Remember : All my disks here are on NFS).
.
.
.
Categories: DBA Blogs

Commit Puzzle

Bobby Durrett's DBA Blog - Fri, 2014-08-22 11:54

This graph compares commit time to CPU utilization and redo log write time.  I’ve included only the hourly intervals with more than 1,000,000 commits.  At these peaks the number of commits ranges from 1 to 1.6 million per hour, so each point on the graph represents roughly the same commit rate.  I’m puzzled by why the commit time bounces around, peaking above 5 milliseconds, when I can’t see any peaks in I/O or CPU that correspond to the commit time peaks.

commitvscpuio

I derived CPU% from DBA_HIST_OSSTAT.  I got the other values by getting wait events from DBA_HIST_SYSTEM_EVENT.  Commit time is log file sync wait time.  Redo write time is log file parallel write wait time.  I converted the wait times to milliseconds so they fit nicely on the chart with CPU%.
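For anyone who wants to poke at the same numbers, here is a rough sketch of the kind of query involved – it computes the average log file sync time in milliseconds per AWR snapshot from DBA_HIST_SYSTEM_EVENT (single-database assumption, simplified compared to the script in the zip linked below):

sqlplus -s / as sysdba <<'EOF'
-- delta the cumulative counters between snapshots to get per-interval averages
select sn.end_interval_time,
       round((e.time_waited_micro - lag(e.time_waited_micro) over (order by e.snap_id)) /
             nullif(e.total_waits - lag(e.total_waits) over (order by e.snap_id), 0) / 1000, 2) avg_commit_ms
from   dba_hist_system_event e
join   dba_hist_snapshot sn
       on  sn.snap_id = e.snap_id
       and sn.dbid = e.dbid
       and sn.instance_number = e.instance_number
where  e.event_name = 'log file sync'
order  by e.snap_id;
EOF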

I thought I would pass this along as a puzzle that I haven’t figured out.

Here is a zip of the script I used to get the data, its raw output, and the spreadsheet I used to make the chart: zip

– Bobby

P.S.  This is on HP-UX 11.31, Itanium, Oracle 11.2.0.3

P.P.S.  Did some more work on this today.  It looks like the high commit time periods have short spikes of long redo log writes even though the average over the hour is still low.  I’m looking at DBA_HIST_EVENT_HISTOGRAM to get a histogram of the log file parallel write waits, and there are a number in the 1024 ms bucket when the log file sync time is high on average.

END_INTERVAL_TIME   LFPW_MILLI LFPW_COUNT AVG_COMMIT_MS AVG_WRITE_MS
------------------- ---------- ---------- ------------- ------------
21-AUG-14 11.00 AM           1     268136    9.14914833   2.45438987
21-AUG-14 11.00 AM           2     453913    9.14914833   2.45438987
21-AUG-14 11.00 AM           4     168370    9.14914833   2.45438987
21-AUG-14 11.00 AM           8      24436    9.14914833   2.45438987
21-AUG-14 11.00 AM          16       5675    9.14914833   2.45438987
21-AUG-14 11.00 AM          32       6122    9.14914833   2.45438987
21-AUG-14 11.00 AM          64       3369    9.14914833   2.45438987
21-AUG-14 11.00 AM         128       2198    9.14914833   2.45438987
21-AUG-14 11.00 AM         256       1009    9.14914833   2.45438987
21-AUG-14 11.00 AM         512        236    9.14914833   2.45438987
21-AUG-14 11.00 AM        1024         19    9.14914833   2.45438987
21-AUG-14 11.00 AM        2048          0    9.14914833   2.45438987
21-AUG-14 02.00 PM           1     522165    2.97787777   1.64840599
21-AUG-14 02.00 PM           2     462917    2.97787777   1.64840599
21-AUG-14 02.00 PM           4     142612    2.97787777   1.64840599
21-AUG-14 02.00 PM           8      17014    2.97787777   1.64840599
21-AUG-14 02.00 PM          16       4656    2.97787777   1.64840599
21-AUG-14 02.00 PM          32       5241    2.97787777   1.64840599
21-AUG-14 02.00 PM          64       1882    2.97787777   1.64840599
21-AUG-14 02.00 PM         128        820    2.97787777   1.64840599
21-AUG-14 02.00 PM         256        149    2.97787777   1.64840599
21-AUG-14 02.00 PM         512         10    2.97787777   1.64840599
21-AUG-14 02.00 PM        1024          2    2.97787777   1.64840599
21-AUG-14 02.00 PM        2048          0    2.97787777   1.64840599
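
For reference, per-bucket counts like the LFPW_MILLI/LFPW_COUNT columns above normally come from DBA_HIST_EVENT_HISTOGRAM (while the hourly averages come from DBA_HIST_SYSTEM_EVENT). Here is a sketch of that part of the query, not the actual script behind this listing:

-- Sketch only, assumes a single instance: log file parallel write buckets
-- per AWR interval, with deltas taken because the counts are cumulative.
select sn.end_interval_time,
       h.wait_time_milli lfpw_milli,
       h.wait_count
         - lag(h.wait_count) over (partition by h.wait_time_milli
                                   order by h.snap_id) lfpw_count
from   dba_hist_event_histogram h
       join dba_hist_snapshot sn
         on  sn.snap_id = h.snap_id
         and sn.dbid = h.dbid
         and sn.instance_number = h.instance_number
where  h.event_name = 'log file parallel write'
order  by sn.end_interval_time, h.wait_time_milli;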

There were 19 waits over half a second in the first hour and only 2 in the second hour.  Maybe all the log file sync waits pile up waiting for those long writes.  Here is a graph that compares the number of waits over half a second – the 1024 ms bucket – to the average log file sync and log file parallel write times for the hour:

[Image: longwrites (count of waits in the 1024 ms bucket vs. average log file sync and log file parallel write times)]

You can see that the average redo write time goes up a little but the commit time goes up more.  Maybe commit time is more affected by a few long spikes than by a lot of slightly longer write times.

Found a cool blog post that seems to explain exactly what we are seeing: blog post

 

Categories: DBA Blogs

Log Buffer #385, A Carnival of the Vanities for DBAs

Pythian Group - Fri, 2014-08-22 08:00

This Log Buffer edition combs through top-notch blog posts from Oracle, MySQL, and SQL Server bloggers around the globe.

Oracle:

You want to test the Oracle Java Cloud? You can get your own 30 day test account & instance. Or you can get a Java account with our permanent test system.

Some ramblings about Oracle R Distribution 3.1.1.

Scott is demystifying Oracle Unpivot.

Java 8 for Tablets, Pis, and Legos at Silicon Valley JUG – 8/20/2014

A new version of Oracle BPM Suite 11.1.1.7 with Adaptive Case Management (ACM) is now available.

SQL Server:

Data Mining: Part 14 Export DMX results with Integration Services

Should you be planning to move from Exchange to Office 365? If so, why?

Stairway to T-SQL DML Level 12: Using the MERGE Statement

From SQL Server Management Studio it’s hard to look through the first few rows of a whole lot of tables in a database.

Special Characters can lead to many problems. Identifying and reporting on them can save a lot of headache down the road.

MySQL:

MariaDB Galera Cluster 5.5.39 now available

A closer look at the MySQL ibdata1 disk space issue and big tables

How-To: Guide to Database Migration from MS Access using MySQL Workbench

Using resource monitoring to avoid user service overload

How to use MySQL Global Transaction IDs (GTIDs) in production

Categories: DBA Blogs

Partner Webcast - Oracle Data Integration Competency Center (DICC): A Niche Market for services

Market success now depends on data integration speed. This is why we collected all best practices from the most advanced IT leaders, simply to prove that a Data Integration competency center should...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Microsoft PowerBI: News from WPC

Pythian Group - Wed, 2014-08-20 14:24

During the Worldwide Partner Conference (WPC) that happened last month, Microsoft made public some of the new functionalities that we can expect to see soon on Microsoft PowerBI.

If you have been on another planet for the last few months and haven't heard about PowerBI for Office 365, I've included a list of the tools that make up the Microsoft BI stack in the cloud. Keep in mind that they are different tools rather than a single product, each serving a different purpose. At the time of writing, the tools that make up the entire solution are:

  • Power Pivot
  • Power View
  • Power Query
  • Power Map
  • Power Q&A
  • Power BI Windows Store App

Some of these have been available for a long time as Excel add-ins, or built into Excel itself, and have now been repackaged in this cloud solution, while others, like Power Q&A, are cloud-only.

So, what was the big news from WPC 2014? During the conference we watched a demonstration of what to expect over the coming months, so today I'll discuss the announcements I believe are the most important. If you want to see the entire presentation, you can find it here.

[Image: PowerBI-Image1 (new Power BI visualizations shown in the WPC demo)]

New visualizations!

One of the key issues we were facing with the current version of Power BI was the lack of some visualizations that customers always ask for, the main one being the gauge. It is incredible how popular those little gauges have become over time; decision makers love them, so it's really good to see they are finally available in Power BI.

Besides the gauge, the image above, taken from the WPC demonstration, shows other data visualizations such as the radar chart and the treemap.

Editing capabilities in the browser

Another important announcement was the ability to edit dashboards and change the data visualizations in the browser, without using Excel. It may not seem like much, but this can be very important and indicates a shift in Microsoft's behaviour. If you look at all the Power BI functionality, and in fact at the entire BI stack, Excel has always been the central tool; you needed Excel for basically everything. Now we are seeing some nice features that you can manage and control inside the browser. Let's wait for the next steps.

It is also important to mention that everything is HTML5 instead of Silverlight, meaning we get the same experience on basically any device.

Partner Solution Pack

If I had to name the single announcement that could most drastically change the market, it would be this one.

The Partner Solution Pack gives Microsoft partners the ability to create a bundled BI solution that includes the data, the connectivity to the data sources, and all the interactive reports. That means that, as users, we could buy a solution pack from Salesforce, for instance, connect it to our Salesforce account, and it would automatically create a fully interactive dashboard with our own data.

Now imagine it the other way around: as a Microsoft partner, you now have the ability to create a complete BI solution for your customers and make it available in the cloud. Your customers can buy this package, connect it to their data, and make use of the solution in a matter of seconds.

In my opinion, the Partner Solution Pack will create a huge market for all Microsoft partners and provide us, the users, with plenty of good packages and the ability to get a BI solution for much less than it would cost to build everything from scratch.

[Images: PowerBI-Image2, PowerBI-Image3 (Partner Solution Pack dashboards)]

You might point out that there are other tools on the market that can already connect to partner applications and build a dashboard, so what would be the advantage of using Power BI over the existing tools?

The biggest advantage is PowerBI Q&A. As you can see in the screenshot, every screen has a simple search box at the top of the page, which allows the user to ask simple natural-language questions against the data. So, if the user wants to know the "opportunity size by month and by industry", all they have to do is ask, and PowerBI will find the data and choose the best visualization method for them. After that, they can just pin this report to the dashboard, and that's it: they can now keep track of this important information on a daily basis, without a single line of code, and without filing a change request with the IT department and joining a huge queue of requests that would take months to be addressed.

[Image: PowerBI-Image4 (Power BI Q&A natural-language search box)]

I hope that in this article I was able to show you the potential these new functionalities can bring to your company. If you need more information about PowerBI, or if you're as excited about it as I am and want to start using it right away, just contact us and our team will be glad to work with your company either to develop a BI solution that consumes your data, or to plan the development of your own Partner Solution Pack so you can offer your customers a complete BI solution using PowerBI for Office 365.

Click here to watch the entire WPC announcement and see the PowerBI reports in action. The PowerBI demonstration starts at the 21:10 mark.

 

Categories: DBA Blogs

Commit scalability

Bobby Durrett's DBA Blog - Wed, 2014-08-20 13:57

I am learning about how well commits scale on Oracle 11.2 and so far they seem to scale surprisingly well.

I’ve looked at two waits – log file parallel write and log file sync.  Based on documents I’ve read on Oracle’s support site log file parallel write represents the time it takes to do one write to the redo logs.  For mirrored redo logs the log file parallel write time includes the time to write to both of the copies.  Log file sync represents the time it takes for a session to complete a commit and should include all the time measured for the write to the redo logs and added CPU time to process the commit.  So, the log file sync time should equal or exceed the log file parallel write time.

Looking at AWR data I found that at peak times one of our databases had 1 millisecond log file parallel write waits and about 1.2 million waits per hour.  Since there are 3.6 million milliseconds in an hour it seemed to me that during this peak hour the redo logs were about 33% utilized because writes to the redo logs were occurring during 1.2 million of the available 3.6 million milliseconds.  I decided to look at a simple queuing theory model that I had read about in Craig Shallahamer’s Forecasting Oracle Performance book to get a basic idea of how queuing might impact redo log write time as the utilization of the redo log grew closer to 100%.
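
As a minimal sketch of the kind of model involved (a simple single-server approximation on my part, not necessarily the exact formula from the book), with a fixed service time S = 1 ms per redo write and an arrival rate of \lambda writes per hour:

U = \frac{\lambda \cdot S}{3{,}600{,}000\ \text{ms/hour}}, \qquad R = \frac{S}{1 - U}

At \lambda = 1.2 million writes per hour this gives U of roughly 0.33 and R of roughly 1.5 ms; as \lambda approaches 3.6 million writes per hour, U approaches 1 and R grows without bound, which is the shape the graph below shows.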

[Image: scalability (predicted redo write time vs. writes per hour from the queuing model)]

This model predicts that the redo log write time will go through the roof as the number of writes per hour approaches 3.6 million, assuming a constant 1 millisecond write time.

To try to confirm the predictions made by this graph, I built some scripts that run a bunch of commits on a test database and attempt to max out the writes to the redo log, so that I could graph the results and compare them to the theoretical model. The test used twenty tables named TEST1, TEST2, …, TEST20, each with one row and one column. I ran tests with 1, 2, 5, 10, and 20 updating processes, each designed to generate a bunch of commits quickly. Each process ran 100,000 updates and commits like these:

update test1 set a=a+1;
commit;
update test1 set a=a+1;
commit;
update test1 set a=a+1;
commit;

Each process had its own table – i.e. process 15 acted on table TEST15.
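
A hypothetical sketch of what the table setup could look like (the author's actual setup.sql is in the zip linked below and may well differ):

-- Sketch only: twenty one-row, one-column test tables named TEST1..TEST20.
begin
  for i in 1 .. 20 loop
    execute immediate 'create table test' || i || ' (a number)';
    execute immediate 'insert into test' || i || ' values (0)';
  end loop;
  commit;
end;
/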

My hope was that as I ran tests with increasing numbers of processes running in parallel eventually I would max out the writes to the redo log and see increasing log file parallel write wait times.  But, surprisingly, as the commit rate increased the redo log write rate actually went down.

[Image: commitswrites (redo log writes per hour vs. commit rate)]

This is just one test, but it makes me wonder if I can max out the writes to the redo log.  I believe that as the commit rate increases the database batches the commits together in some efficient way which makes commits more scalable than I realized.

I think that an Oracle database must have some limit to commit scalability that relates more to the CPU used to process the commits instead of the writes to the redo logs.  In these same tests the log file sync or commit time did increase slowly as the number of commits ramped up.

[Image: committime (average commit time vs. commits per hour)]

It started around half a millisecond at 3 million commits per hour and slowly grew to almost 2 milliseconds at 10 million commits per hour.  So, commit time grew, but nothing like the original response time graph which went through the roof at around 3.6 million writes to the redo logs per hour.

Here is a zip of the scripts I used to generate the data, the spreadsheets I used to make the charts, and the results from the tests: zip

To run the tests yourself, enter your own username and password at the top of setup.sql, commits.sql, and report.sql. Then run one of the five test scripts – run1.sh, run2.sh, run5.sh, run10.sh, or run20.sh – like this:

./run5.sh

Wait for the last process to finish its output, then run the report:

./report.sh

Results look like this:

WAIT_EVENT              ELAPSED_SECS WAITS_PER_HOUR WAITS_PER_SECOND AVG_WAIT_MS AVG_WAIT_SEC
----------------------- ------------ -------------- ---------------- ----------- ------------
log file parallel write          542     3008922.51       835.811808  .648076577   .000648077
log file sync                    542     5306207.38       1473.94649   1.1727226   .001172723
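
One way figures like these could be computed (a hedged sketch, not the author's report.sql) is to snapshot V$SYSTEM_EVENT before and after the test run and report the deltas:

-- Sketch only: capture the two events before the run...
create table event_snap as
select sysdate snap_time, event, total_waits, time_waited_micro
from   v$system_event
where  event in ('log file parallel write', 'log file sync');

-- ...run the workload, capture the counters again...
insert into event_snap
select sysdate, event, total_waits, time_waited_micro
from   v$system_event
where  event in ('log file parallel write', 'log file sync');

-- ...then compute elapsed time, rates, and average wait per event.
select event wait_event,
       round((max(snap_time) - min(snap_time)) * 86400) elapsed_secs,
       (max(total_waits) - min(total_waits))
         / ((max(snap_time) - min(snap_time)) * 24) waits_per_hour,
       (max(total_waits) - min(total_waits))
         / ((max(snap_time) - min(snap_time)) * 86400) waits_per_second,
       (max(time_waited_micro) - min(time_waited_micro)) / 1000
         / nullif(max(total_waits) - min(total_waits), 0) avg_wait_ms
from   event_snap
group  by event;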

I'm really just getting started understanding how commits scale, but it was surprising to me how hard it was to get the rate of redo log writes high enough to cause the write time to increase due to queuing. I assume this is because the database batches commits together more efficiently than I expected, which makes commits more scalable than I realized.

– Bobby

P.S.  This is on HP-UX 11.31, Itanium, Oracle 11.2.0.3.0

Categories: DBA Blogs

SSAS Database Doesn’t Show Up in SharePoint 2013 Dashboard Designer

Pythian Group - Wed, 2014-08-20 07:43

Howdy everyone,

Just a quick tip for everyone who is struggling to configure SharePoint Server 2013 PerformancePoint to connect to a SQL Server Analysis Services 2012 or 2014 cube.

After a new SharePoint Server 2013 installation, I tried to create a new connection to my Analysis Services cube through the SharePoint Dashboard Designer, but no matter what I did, the Database option always showed up as empty and I couldn't select my Analysis Services database.

[Image: DashboardDesigner (Dashboard Designer with an empty Database drop-down)]

 

In the Windows Server event log I found the following message:

The data source provider for data sources of type ‘ADOMD.NET’ is not registered. Please contact an administrator.

PerformancePoint Services error code 10115.

The reason you receive this error message, believe it or not, is that even if you are using SQL Server 2012 or newer with SharePoint Server 2013, it will try to load the SQL Server 2008 version of the ADOMD.NET DLL.

If you install the SQL Server 2008 R2 ADOMD.NET component, which you can download from http://www.microsoft.com/en-us/download/details.aspx?id=16978, and then restart IIS, the issue will be fixed and you will be able to connect successfully to your SQL Server 2012/2014 Analysis Services database.

Hope this helps.

 

Categories: DBA Blogs

SQL Server Error: 18056, Severity: 20, State: 29

Pythian Group - Wed, 2014-08-20 07:40

Howdy everyone,

One of the most frequent error messages I come across when dealing with SQL Server, particularly SQL Server 2008 R2, is Error 18056, Severity: 20, State: 29. I constantly see this error message not only in the SQL Server error logs, but also in the Microsoft community forums. I often see inquiries like, “I have SQL Server 2008 R2 SP2 installed and I see the error below. I have the most recent Service Pack installed, but the problem remains.”

Message
Error: 18056, Severity: 20, State: 29.
Message
The client was unable to reuse a session with SPID XXXX, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

Now, this error message can be triggered by many things: for instance, a lack of user permissions in a database or, in the worst case, a lack of resources on the server that makes it stop accepting new connection requests. In either of those cases, however, you would probably have a lot of angry users calling to complain that they can't do their work. If that's not the case, you are probably just seeing a generic, internal error message that shouldn't be in the SQL error log in the first place.

The first step in addressing this issue, or in determining whether it's something more serious, is to update to the latest Cumulative Update.

Microsoft released a fix that makes SQL Server stop logging this generic error message in the cases where it's not important for you to investigate the issue. The fix is documented in the following Knowledge Base article: http://support.microsoft.com/kb/2543687?wa=wsignin1.0
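
Before deciding whether the Cumulative Update is needed, it helps to confirm the exact build you are running; a quick, standard check (the build that actually contains the fix is the one called out in the KB article):

-- Check the current SQL Server build and patch level.
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level,
       SERVERPROPERTY('Edition')        AS edition;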

I hope this helps you get rid of this error message.

 

Categories: DBA Blogs