Pythian Group: Love Your Data

MySQL on FreeBSD: old genes

Thu, 2016-03-03 10:02

Maintaining mission critical databases on our pitchfork wielding brother, the “Daemon” of FreeBSD, seems quite daunting, or even absurd, from the perspective of a die-hard Linux expert, or from someone who has not touched it in a long time. The question we ask when we see FreeBSD these days is “why?”.  Most of my own experience with FreeBSD was obtained 10-15 years ago.  Back then, in the view of the team I was working on, a custom compiled-from-source operating system like FreeBSD 5.x or 6.x was superior to a Linux binary release.

Package managers like YUM and APT were not as good.  They did not always perform MD5 checks and use SSL like today’s versions. RedHat wasn’t releasing security updates 5 minutes after a vulnerability was discovered. Ubuntu didn’t exist. Debian stable would get so very old before receiving a new version upgrade. FreeBSD was a great choice for a maintainable, secure, free open source UNIX-like OS with tight source control and frequent updates.

Most people do not understand why FreeBSD remains a great choice for security and stability. The main reason is that the entire source of the base OS and the kernel (not just the kernel) is tightly maintained and tested as a single, monolithic distribution.

FreeBSD 10.2 is different from versions I worked on many years ago, in a good way, at least from the standpoint of getting started. First, “pkg” has gotten quite an overhaul, making installing packages on FreeBSD as easy as with YUM or APT.  portsnap and portmaster make port upgrades much easier than they used to be. freebsd-update can take care of wholesale updates of the operating system from trusted binary sources without having to “build the world”. These are welcome changes that make it easier to get to production with FreeBSD, and they certainly made the task of rapidly building and updating a couple of “lab” virtual machines easier.

In my effort to get re-acquainted with FreeBSD, I hit some snags. However, once I was finished with this exercise, FreeBSD had re-established itself in my mind as a decent flavor to host a mission critical database on. Open Source enthusiasts should consider embracing it without (much) hesitation. Is there some unfamiliar territory for those who only use MySQL on MacOS and Linux? Sure. But it is important to remember that BSD is one of the oldest UNIX like operating systems. The OSS world owes much heritage to it. It is quite stable and boring, perhaps even comfortable in its own way.

Problem 1: forcing older versions of MySQL

I needed to install MySQL 5.5 first in order to test a MySQL upgrade on FreeBSD.  However, when installing percona-toolkit either via “pkg install” (binary) or /usr/ports (source), the newer 5.6 version of the MySQL client would inevitably be installed as a dependency. From that point on, anything relating to MySQL 5.5 would conflict with the 5.6 client. If I installed in the opposite order (server first, percona-toolkit second), the percona-toolkit installation would ask whether it was OK to go ahead and upgrade both server and client to 5.6.

TIP: don’t forget make.conf

my /etc/make.conf:

MYSQL_DEFAULT=5.5

Once I added MYSQL_DEFAULT into make.conf, the installations for MySQL 5.5 became seamless. Note: if you want another flavor of MySQL server such as Percona Server, install the server “pkg install percona55-server” prior to “pkg install percona-toolkit” so that the client dependencies are met prior to installation.

Problem 2: Some tools don’t work

pt-diskstats does not work, because it reads from /proc/diskstats, which does not exist on FreeBSD. Other favorites like htop don’t work right out of the box. So far I have had good luck with the rest of the Percona toolkit besides pt-diskstats, but here’s how you get around the htop issue (and perhaps others).

TIP: Get the linux /proc mounted

dynamic commands:
# kldload linux
# mkdir -p /compat/linux/proc
# mount -t linprocfs linproc /compat/linux/proc

to make permanent:
# vi /boot/loader.conf (and add the following line)
linux_load="YES"
# vi /etc/fstab (and add the following line)
linproc /compat/linux/proc linprocfs rw 0 0

As you may have determined, these commands load the Linux compatibility kernel module and mount the Linux-style /proc in a location you might not be used to: /compat/linux/proc. The native FreeBSD /proc may also be mounted alongside it.

Problem 3: I want bash

# pkg install bash
… and once that’s done
# pw user mod root -s /usr/local/bin/bash
…and repeat ^^ for each user you would like to switch. It even comes with a prompt that looks like CentOS/RHEL.
[root@js-bsd1 ~]#

Problem 4: I can’t find stuff

BSD init is much simpler than the SysV and upstart init frameworks, so the typical places to look for start scripts are /etc/rc.d and /usr/local/etc/rc.d. To make things start on boot, it is inevitably a line in /etc/rc.conf.

In our case, for MySQL, our start file is /usr/local/etc/rc.d/mysql-server. To have MySQL start on boot, your rc.conf line is:

mysql_enable="YES"

If you do not wish to make MySQL start on boot, you may simply run "/usr/local/etc/rc.d/mysql-server onestart"

Notes on binary replacement

Please note that, just like in the Linux world, MariaDB and Percona Server are drop-in replacements for MySQL, so the start files and enable syntax do not change. Your default location for my.cnf is /etc/my.cnf, just like in the rest of the known universe.

This command lists all installed packages.
pkg info -a

Use pkg remove and pkg install to add new versions of your favorite MySQL software.

I ran into no greater issues with pkg than I would with yum or apt when doing binary removals and installations, and no issues at all with mysql_upgrade. Remember: if you had to alter make.conf as I did earlier, update it to reflect the versions you want to install.

For those who like ZFS, the FreeBSD handbook has a very detailed chapter on this topic. I for one like plain old UFS. It might be the oldest filesystem that supports snapshots, and it can be implemented very simply for those who like low overhead.

Happy tinkering with FreeBSD and MySQL, and thanks for reading!

Categories: DBA Blogs

Sydney Gets New AWS Availability Zone

Thu, 2016-03-03 09:47

On a scorching November day in 2012, Sydneysiders were bracing themselves for yet another heat wave when all of a sudden they became pleasantly surprised as an elastic cloud occupied the tech skies. On November 12, 2012, Amazon announced  the New Asia Pacific (Sydney) Region in Australia.

Before that, Australian customers had to reach out to Japan or Singapore for their cloud needs. That was not really feasible, as it increased up-front expenses, long-term commitments, and scaling challenges. Amazon recognized that and Sydney became another region in the world.

They have now taken it a step further and rendered a new Availability Zone (AZ) in Sydney. An Availability Zone is essentially an isolated location within a region's data centres from which public cloud services originate and operate.

The new availability zone is ap-southeast-2c. It is all set to provide enhanced performance and scalability to Australian customers, enabling them to fully leverage the potential of technologies like Lambda, the Elastic File System shared filesystem, and Amazon RDS for MS SQL Server.

Pythian’s established presence in Australia and New Zealand, coupled with round-the-clock, world-class support for AWS, SQL Server, and other cloud technologies, enables it to support Australian and New Zealand customers from the word go.

Categories: DBA Blogs

SQL Injection with MySQL SLEEP()

Wed, 2016-03-02 11:40

Recently we received an alert from one of our clients that the number of running threads was high on one of their servers. Once we logged in, we noticed that all the selects were waiting for a table-level read lock. We scrolled through the process list and found the select that was causing the problem. After killing it, everything went back to normal.
At first we couldn’t understand why the query took so long, as it looked like all the others. Then we noticed that one of the WHERE clauses was strange: a SLEEP(3) attached with OR to the query. Obviously, this server was the victim of a SQL injection attack.

What is SQL injection?

I think most of us know what SQL injection is, but as a refresher: SQL injection is when someone provides malicious input that ends up in a WHERE clause, allowing them to run their own statements as well.
Typically this occurs when you ask a user for input, like a username, but instead of a real name they give you a MySQL statement that your server will run without you knowing it.
(Image: the xkcd comic “Exploits of a Mom”.)
Based on the comic, let’s see a few examples.
We have a simple table:

mysql> describe post;
| Field | Type             | Null | Key | Default | Extra          |
| id    | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| test  | varchar(127)     | YES  |     | NULL    |                |
2 rows in set (0.00 sec)

mysql> select * from post;
| id | test   |
|  1 | text1  |
|  2 | text2  |
|  3 | text3  |
|  4 | text4  |
|  5 | text5  |
|  6 | text6  |
|  7 | text7  |
|  8 | text8  |
|  9 | text9  |
| 10 | text10 |
10 rows in set (0.00 sec)

Let’s run a select with LIKE, which we know for sure won’t have a match:

mysql> select * from post where test like '%nomatch%';
Empty set (0.00 sec)

But what happens if we don’t filter the inputs and someone wants to get all the data?
mysql> select * from post where test like '%nomatch ' || '1==1' && '1%';
| id | test   |
|  1 | text1  |
|  2 | text2  |
|  3 | text3  |
|  4 | text4  |
|  5 | text5  |
|  6 | text6  |
|  7 | text7  |
|  8 | text8  |
|  9 | text9  |
| 10 | text10 |
10 rows in set, 2 warnings (0.00 sec)

That was a very mild injection (the two warnings come from MySQL casting the strings '1==1' and '1%' to numbers), but it could be much more malicious: we could drop another table!

mysql> show tables;
| Tables_in_injecttest |
| game                 |
| post                 |
2 rows in set (0.01 sec)

mysql> select * from post where test like '%nomatch'; drop table game;-- %';
Empty set (0.00 sec)

Query OK, 0 rows affected (0.28 sec)

mysql> show tables;
| Tables_in_inject_test |
| post                  |
1 row in set (0.00 sec)


If we don’t know the name of the table, we can still cause trouble by blocking access to the database.
If we insert SLEEP() into the WHERE clause, it will be executed for every matching row; if we inject it as “OR SLEEP(n)”, it will be executed for every row in the table!
Okay, that would be “just” a long-running select. It shouldn’t cause much trouble thanks to InnoDB and transaction isolation, unless something needs a table lock.

Some common examples of what causes table locks are:

  • an explicit LOCK TABLES
  • insert/update/delete on MyISAM
  • ALTER TABLE on InnoDB

Once statements start waiting for a lock on the table, all subsequent selects will wait for the previous locking statement to finish.

Terminal 1:
mysql> select * from post where test like '%nomatch ' OR sleep(300) AND '1%';
Terminal 2:
mysql> alter table post engine=innodb;
Terminal 3:
mysql> select SQL_NO_CACHE count(*) from post;
| Id       | User                 | Host      | db                 | Command | Time  | State                           | Info                                                                  |
| 17170817 | root                 | localhost | janitest           | Query   |    19 | User sleep                      | select * from post where test like '%nomatch ' OR sleep(300) AND '1%' |
| 17170918 | root                 | localhost | janitest           | Query   |    11 | Waiting for table metadata lock | alter table post engine=innodb                                        |
| 17170957 | root                 | localhost | janitest           | Query   |     4 | Waiting for table metadata lock | select * from post                                                    |
3 rows in set (0.00 sec)

As we see in the example, ALTER TABLE will wait until it can get a lock on the post table, and this blocks every subsequent select against the table.
Or, if you are using a MyISAM table, a simple update/insert will block access to the table, because it needs a table-level lock for its duration.

How can we defend ourselves from SQL injection?

There are several ways to secure yourself from SQL injection.

  • First of all, validate the input. If you expect only letters and numbers, filter it with a regexp, for example, to make sure there are no special characters. Also escape the inputs on the application side; programming languages have built-in functions for that (e.g. mysql_real_escape_string() in PHP).
  • Use prepared statements! They won’t allow two clauses if you specified only one. When you use prepared statements, the variables are transmitted as MySQL variables. Even if the string is not escaped, it will end up in one variable, and MySQL treats it as one long string.
  • Use a tool like MySQL Enterprise Firewall, a plugin for MySQL that can filter your statements to make sure there are no things like: || 1==1
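To make the prepared-statement point concrete, here is a small sketch in Python. It uses the standard library’s sqlite3 module rather than MySQL, purely so it is self-contained; the principle (a placeholder keeps the input as data) is exactly the same, and the table mirrors the post table from the examples above.

```python
import re
import sqlite3

# In-memory stand-in for the "post" table from the examples above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, test TEXT)")
conn.executemany("INSERT INTO post (test) VALUES (?)",
                 [("text%d" % i,) for i in range(1, 11)])

# A malicious "search term" a user might submit.
user_input = "%nomatch ' OR '1'='1"

# Unsafe: interpolating the input into the SQL string lets the quote break
# out of the literal, so the injected OR clause matches every row.
unsafe_sql = "SELECT * FROM post WHERE test LIKE '%s'" % user_input
assert len(conn.execute(unsafe_sql).fetchall()) == 10

# Safe: with a placeholder, the whole input is bound as one string value,
# quotes and all, so it is just an unmatched pattern.
safe_rows = conn.execute(
    "SELECT * FROM post WHERE test LIKE ?", (user_input,)).fetchall()
assert safe_rows == []

# Defence in depth: validate the input shape before it ever reaches SQL.
assert re.fullmatch(r"[A-Za-z0-9 ]*", user_input) is None  # input rejected
```

The same two-step defence (bind parameters, then validate input shape) applies unchanged to MySQL drivers that support prepared statements.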

I would like to start a little discussion about this: if you have encountered SQL injection before, would you share with us how it was done, or, in general, how do you prevent SQL injections in your applications?


Categories: DBA Blogs

Log Buffer #463: A Carnival of the Vanities for DBAs

Tue, 2016-03-01 15:27

As the winter in the Northern hemisphere is giving way to spring, slowly but surely, blog posts are blooming in the gardens of Oracle, SQL Server and MySQL. This Log Buffer plucks some of them for your reading pleasure.


Oracle:

Providing A Persistent Data Volume to EMC XtremIO Using ClusterHQ Flocker, Docker And Marathon

There is sliced bread in SQL.

Oracle Cloud – Your service is suspended due to exceeding resource quota !

EM12c Compliance ‘Required Data Available’ flag – Understanding and Troubleshooting

How can I see my invisible columns

SQL Server:

Auto Generate Your Database Documentation

A Lightweight, Self-adjusting, Baseline-less Data Monitor

Keeping POST and GET Separated

How often should I run DBCC CHECKDB?

Disabling SQL Server Optimizer Rules with QUERYRULEOFF


MySQL:

MySQL Contributions status

Planets9s: Building scalable database infrastructures with MariaDB & HAProxy

High availability with asynchronous replication… and transparent R/W split

mysql_real_connect is not thread safe

Now available in swanhart-tools: NATIVE asynchronous query execution for any MySQL client!

Categories: DBA Blogs

GoldenGate 12.2 Big Data Adapters: part 1 – HDFS

Mon, 2016-02-29 10:27

December 2015 brought us a new version of GoldenGate, and a new version of the Big Data adapters for GoldenGate. Let’s have a look at what we have now and how it works. I am going to start with the HDFS adapter.

As a first step, we need to prepare our source database for replication. It becomes easier with every new GoldenGate version. We will need to perform several steps:
a) Enable archive logging on our database. This particular step requires downtime.

orcl> alter database mount;

Database altered.

orcl> alter database archivelog;

Database altered.

orcl> alter database open;

b) Enable force logging and minimal supplemental logging. No need to shutdown database for this.

orcl> alter database add supplemental log data;

Database altered.

orcl> alter database force logging;

Database altered.

orcl> SELECT supplemental_log_data_min, force_logging FROM v$database;

-------- ---------------------------------------

c) Switch parameter “enable_goldengate_replication” to “TRUE”. Can be done online.

orcl> alter system set enable_goldengate_replication=true sid='*' scope=both;

System altered.


And we are almost done. Now we can create a schema for a GoldenGate administrator and grant the required privileges. I’ve just granted the DBA role to the user to simplify the process; in any case you will need it for integrated capture. For a production installation I advise you to have a look at the documentation to verify the necessary privileges and roles.

orcl> create user ogg identified by welcome1 default tablespace users temporary tablespace temp;
orcl> grant connect, dba to ogg;

Let’s create a test schema to be replicated. We will call the schema on the source ggtest, and I will name the destination schema bdtest. This will also let us check how mapping works in our replication.

orcl> create tablespace ggtest; -- optional step 
orcl> create user ggtest identified by welcome1 default tablespace ggtest temporary tablespace temp;
orcl> grant connect, resource to ggtest;

Everything is ready on our source database for the replication.
Now we install Oracle GoldenGate for Oracle on our database server. We can get the software from the Oracle site on the download page, in the Middleware section, GoldenGate, Oracle GoldenGate for Oracle databases. We are going to use the 12.2 version of the software. The installation is easy: you need to unzip the software and run the Installer, which will guide you through a couple of simple steps. The installer will unpack the software to the destination location, create subdirectories, and register GoldenGate in the Oracle global registry.

[oracle@sandbox distr]$ unzip
[oracle@sandbox distr]$ cd fbo_ggs_Linux_x64_shiphome/Disk1/
[oracle@sandbox Disk1]$ ./runInstaller

We continue by setting up parameters for the Oracle GoldenGate (OGG) manager and starting it up. You can see that I’ve used the default Blowfish encryption for the password. In a production environment you may consider a stronger encryption such as AES256. I’ve also used a non-default port for the manager, since I have more than one GoldenGate installation on my test sandbox.

[oracle@sandbox ~]$ export OGG_HOME=/u01/oggora
[oracle@sandbox ~]$ cd $OGG_HOME
[oracle@sandbox oggora]$ ./ggsci

GGSCI (sandbox.localdomain) 1> encrypt password welcome1 BLOWFISH ENCRYPTKEY DEFAULT
Using Blowfish encryption with DEFAULT key.
Algorithm used:  BLOWFISH

GGSCI (sandbox.localdomain) 2> edit params mgr
PORT 7829
purgeoldextracts /u01/oggora/dirdat/*, usecheckpoints

GGSCI (sandbox.localdomain) 3> start manager
Manager started.

Let’s prepare everything for initial load, and later online replication.
I’ve decided to use a GoldenGate initial-load extract as the way to do the initial load, for the sake of consistency of the resulting dataset on Hadoop.
I prepared the parameter file to replicate my ggtest schema and upload all data to the trail file on the remote site. I’ve used a minimal number of options for all my processes, providing only the handful of parameters required for replication. Extract options are a subject deserving a dedicated blog post. Here is my simple initial extract:

[oracle@sandbox oggora]$ cat /u01/oggora/dirprm/ini_ext.prm
RMTHOST sandbox, MGRPORT 7839
RMTFILE /u01/oggbd/dirdat/initld, MEGABYTES 2, PURGE
--DDL include objname ggtest.*
TABLE ggtest.*;

Then we run the initial-load extract in passive mode; it will create a trail file with the data. The trail file will be used later for our initial load on the target side.

[oracle@sandbox oggora]$ ./extract paramfile dirprm/ini_ext.prm reportfile dirrpt/ini_ext.rpt
[oracle@sandbox oggora]$ ll /u01/oggbd/dirdat/initld*
-rw-r-----. 1 oracle oinstall 3028 Feb 16 14:17 /u01/oggbd/dirdat/initld
[oracle@sandbox oggora]$

We can also prepare our extract on the source site. I haven’t used a datapump in my configuration, limiting the topology to the simplest, straightforward extract-to-replicat configuration. Of course, in any production configuration I would advise using a datapump on the source for staging the data.
Here are my extract parameters, and how I added the extract. I am not starting it yet, because I must have an Oracle GoldenGate Manager running on the target, and the directory for the trail file must be created first. You may have guessed that the Big Data GoldenGate will be located in the /u01/oggbd directory.

[oracle@sandbox oggora]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version OGGCORE_12.
Linux, x64, 64bit (optimized), Oracle 12c on Dec 12 2015 02:56:48
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.

GGSCI (sandbox.localdomain) 1> edit params ggext

extract ggext
RMTHOST sandbox, MGRPORT 7839
RMTFILE /u01/oggbd/dirdat/or, MEGABYTES 2, PURGE
DDL include objname ggtest.*
TABLE ggtest.*;

GGSCI (sandbox.localdomain) 2> dblogin userid ogg@orcl,password AACAAAAAAAAAAAIARIXFKCQBMFIGFARA, BLOWFISH, ENCRYPTKEY DEFAULT
Successfully logged into database.

GGSCI (sandbox.localdomain as ogg@orcl) 3> register extract GGEXT database

2016-02-16 15:37:21  INFO    OGG-02003  Extract GGEXT successfully registered with database at SCN 17151616.

GGSCI (sandbox.localdomain as ogg@orcl) 4> add extract ggext, INTEGRATED TRANLOG, BEGIN NOW
EXTRACT (Integrated) added.

Let’s leave our source site for a while and switch to the target. Our target is going to be a box where we have a Hadoop client and all the required Java classes.
I used the same box just to save resources on my sandbox environment. You may run different GoldenGate versions on the same box, considering that the Manager ports for each of them will be different.
Essentially we need a Hadoop client on the box which can connect to HDFS and write data there. Installation of the Hadoop client is out of scope for this article, but you can easily get all the necessary information from the Hadoop home page.

Having all the required Hadoop classes, we continue by installing Oracle GoldenGate for Big Data, then configuring and starting it up. In the past I received several questions from people struggling to find the exact place where all the adapters could be downloaded. The adapters were well “hidden” on Oracle edelivery, but now it is much simpler. Go to the GoldenGate download page on the Oracle site and find the section “Oracle GoldenGate for Big Data”, where you can choose OGG for Linux x86_64, Windows or Solaris. You will need an Oracle account to get it. We upload the file to our Linux box, unzip it, and unpack the tar archive. I created the directory /u01/oggbd as our GoldenGate home and unpacked the tar archive there.
The next step is to create all the necessary directories. We start the command line utility and create the subdirectories.

[oracle@sandbox ~]$ cd /u01/oggbd/
[oracle@sandbox oggbd]$ ./ggsci

Oracle GoldenGate Command Interpreter
Version OGGCORE_12.
Linux, x64, 64bit (optimized), Generic on Nov 10 2015 16:18:12
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.

GGSCI (sandbox.localdomain) 1> create subdirs

Creating subdirectories under current directory /u01/oggbd

Parameter files                /u01/oggbd/dirprm: created
Report files                   /u01/oggbd/dirrpt: created
Checkpoint files               /u01/oggbd/dirchk: created
Process status files           /u01/oggbd/dirpcs: created
SQL script files               /u01/oggbd/dirsql: created
Database definitions files     /u01/oggbd/dirdef: created
Extract data files             /u01/oggbd/dirdat: created
Temporary files                /u01/oggbd/dirtmp: created
Credential store files         /u01/oggbd/dircrd: created
Masterkey wallet files         /u01/oggbd/dirwlt: created
Dump files                     /u01/oggbd/dirdmp: created

GGSCI (sandbox.localdomain) 2>

We change the port for our manager process from the default and start it up. I’ve already mentioned that the port was changed due to the existence of several GoldenGate managers running from different directories.

GGSCI (sandbox.localdomain) 2> edit params mgr
PORT 7839

GGSCI (sandbox.localdomain) 3> start manager
Manager started.

Now we have to prepare parameter files for our replicat processes. Let’s assume the environment variable OGGHOME represents the GoldenGate home; in our case it is /u01/oggbd.
Examples for the parameter files can be taken from the $OGGHOME/AdapterExamples/big-data directories. There you will find examples for flume, kafka, hdfs, hbase, and for metadata providers. Today we are going to work with the HDFS adapter.
I copied the files to my parameter files directory ($OGGHOME/dirprm) and modified them accordingly:

[oracle@sandbox oggbd]$ cp /u01/oggbd/AdapterExamples/big-data/hdfs/* /u01/oggbd/dirprm/
[oracle@sandbox oggbd]$ vi /u01/oggbd/dirprm/hdfs.props

Here are my values for the hdfs.props file:

[oracle@bigdata dirprm]$ cat hdfs.props







javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

You can find information about all of those parameters in the Oracle documentation, but here are the parameters you will most likely need to change from their defaults:

  • gg.handler.hdfs.rootFilePath – tells the adapter where the directories and files are to be written on HDFS.
  • gg.handler.hdfs.format – lets you set one of the four formats supported by the adapter.
  • goldengate.userexit.timestamp – depends on your preference for the transaction timestamps written to your HDFS files.
  • gg.classpath – depends on the location of your Hadoop jar classes and native libraries.
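For orientation, here is a sketch of what the relevant lines of hdfs.props might look like. The property names come from the list above plus the standard handler declaration, but every value shown is an illustrative assumption, not taken from the original configuration:

```properties
gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.rootFilePath=/user/oracle/gg
gg.handler.hdfs.format=delimitedtext
#gg.handler.hdfs.format.includeColumnNames=true
goldengate.userexit.timestamp=utc
gg.classpath=/usr/lib/hadoop/client/*
```

The rootFilePath value matches the /user/oracle/gg directory seen in the listings later in this post; adjust the classpath to wherever your Hadoop client jars actually live.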

You can see I’ve mentioned the gg.handler.hdfs.format.includeColumnNames parameter. It is supposed to put the column name before each value in the output file on HDFS. That may be helpful in some cases, but it also makes the file bigger. If you are planning to create an external Hive table, you may consider commenting it out as I have.
The next parameter file is for our data-initialization replicat. You may consider using Sqoop or another tool for the initial load of your tables, but I think it makes sense to use the GoldenGate replicat if the table sizes are relatively small. It makes the resulting file set more consistent with the subsequent replication, since it uses the same engine and format. So, here is my replicat for the initial load:

[oracle@sandbox dirprm]$ cat /u01/oggbd/dirprm/irhdfs.prm
--passive REPLICAT for initial load irhdfs
-- Trail file for this example is located in "dirdat/initld" 
-- Command to run REPLICAT:
-- ./replicat paramfile dirprm/irhdfs.prm reportfile dirrpt/ini_rhdfs.rpt
EXTFILE /u01/oggbd/dirdat/initld
--DDLERROR default discard
setenv HADOOP_COMMON_LIB_NATIVE_DIR=/usr/lib/hadoop/lib/native
DDL include all
TARGETDB LIBFILE SET property=dirprm/hdfs.props
MAP ggtest.*, TARGET bdtest.*;

I ran the initial load in passive mode, without creating a managed process, just running it from the command line. Here is an example:

[oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irhdfs.prm reportfile dirrpt/ini_rhdfs.rpt
[oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/
Found 2 items
drwxr-xr-x   - oracle oracle          0 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_1
drwxr-xr-x   - oracle oracle          0 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_2
[oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/bdtest.test_tab_1
Found 1 items
-rw-r--r--   1 oracle oracle        624 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_1/bdtest.test_tab_1_2016-02-16_14-37-43.376.txt
[oracle@sandbox oggbd]$ hadoop fs -tail /user/oracle/gg/bdtest.test_tab_1/bdtest.test_tab_1_2016-02-16_14-37-43.376.txt
IBDTEST.TEST_TAB_12016-02-16 19:17:40.7466992016-02-16T14:37:43.37300000000000-100000020121371O62FX2014-01-24:19:09:20RJ68QYM52014-01-22:12:14:30
IBDTEST.TEST_TAB_12016-02-16 19:17:40.7466992016-02-16T14:37:44.89600000000000-100000021552371O62FX2014-01-24:19:09:20HW82LI732014-05-11:05:23:23
IBDTEST.TEST_TAB_12016-02-16 19:17:40.7466992016-02-16T14:37:44.89600100000000-100000022983RXZT5VUN2013-09-04:23:32:56RJ68QYM52014-01-22:12:14:30
IBDTEST.TEST_TAB_12016-02-16 19:17:40.7466992016-02-16T14:37:44.89600200000000-100000024414RXZT5VUN2013-09-04:23:32:56HW82LI732014-05-11:05:23:23
[oracle@sandbox oggbd]$

You can see the Hadoop directories and files created by the initial load.
As soon as the initial load has run, we can start our extract and replicat to keep the destination side updated.
We move back to the source and start the extract we prepared earlier.

GGSCI (sandbox.localdomain as ogg@orcl) 6>start extract ggext

Sending START request to MANAGER ...

Now, moving back to the target, we prepare our replicat. I used a replicat with the following parameters:

[oracle@sandbox oggbd]$ cat /u01/oggbd/dirprm/rhdfs.prm
-- Trail file for this example is located in "dirdat/or" directory
-- Command to add REPLICAT
-- add replicat rhdfs, exttrail dirdat/or
--DDLERROR default discard
setenv HADOOP_COMMON_LIB_NATIVE_DIR=/usr/lib/hadoop/lib/native
DDL include all
TARGETDB LIBFILE SET property=dirprm/hdfs.props
MAP ggtest.*, TARGET bdtest.*;

We add the replicat to our configuration, and it will carry on the replication.

GGSCI (sandbox.localdomain) 1> add replicat rhdfs, exttrail dirdat/or

GGSCI (sandbox.localdomain) 2> start replicat rhdfs

Sending START request to MANAGER ...

GGSCI (sandbox.localdomain) 3>

Our replication is up and running, the initial load worked fine, and we can test and see what we have on source and target.
Here is the data in one of our source tables:

orcl> select * from ggtest.test_tab_1;

---------------- ---------- ----------------- ---------- -----------------
	       1 371O62FX   01/24/14 19:09:20 RJ68QYM5	 01/22/14 12:14:30
	       2 371O62FX   01/24/14 19:09:20 HW82LI73	 05/11/14 05:23:23
	       3 RXZT5VUN   09/04/13 23:32:56 RJ68QYM5	 01/22/14 12:14:30
	       4 RXZT5VUN   09/04/13 23:32:56 HW82LI73	 05/11/14 05:23:23


I’ve created an external Hive table for test_tab_1 to have a better look.

hive> CREATE EXTERNAL TABLE BDTEST.TEST_TAB_1  (tran_flag string, tab_name string, tran_time_utc timestamp, tran_time_loc string,something string, something1 string,
    > PK_ID INT, RND_STR VARCHAR(10),USE_DATE string,RND_STR_1 string, ACC_DATE string)
    > stored as textfile location '/user/oracle/gg/bdtest.test_tab_1';
Time taken: 0.327 seconds
hive> select * from BDTEST.TEST_TAB_1;
I	BDTEST.TEST_TAB_1	2016-02-16 19:17:40.746699	2016-02-16T14:37:43.373000	00000000-10000002012		1	371O62FX	2014-01-24:19:09:20	RJ68QYM5	2014-01-22:12:14:30
I	BDTEST.TEST_TAB_1	2016-02-16 19:17:40.746699	2016-02-16T14:37:44.896000	00000000-10000002155		2	371O62FX	2014-01-24:19:09:20	HW82LI73	2014-05-11:05:23:23
I	BDTEST.TEST_TAB_1	2016-02-16 19:17:40.746699	2016-02-16T14:37:44.896001	00000000-10000002298		3	RXZT5VUN	2013-09-04:23:32:56	RJ68QYM5	2014-01-22:12:14:30
I	BDTEST.TEST_TAB_1	2016-02-16 19:17:40.746699	2016-02-16T14:37:44.896002	00000000-10000002441		4	RXZT5VUN	2013-09-04:23:32:56	HW82LI73	2014-05-11:05:23:23
Time taken: 0.155 seconds, Fetched: 4 row(s)

You can see that the table definition is a bit different from what we have on the source site, and now you will see why.
We got additional columns on the destination side. Interestingly, while some of them have a pretty clear purpose, others are less clear and have null values.
The first column is a flag for the operation: it shows what kind of operation produced this row. It can be “I” for insert, “D” for delete, or “U” for update. The second column is the table name. The third is a UTC timestamp showing when the transaction occurred. The next one is the commit time in the local timezone, and the column after that holds a commit sequence number. Together, these columns let you reconstruct the proper dataset for any given point in time.
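As a sketch of how those metadata columns can be used, here is a tiny Python example that folds change rows back into a current-state view by applying the last operation per primary key in commit-sequence order. The row layout here is simplified and assumed (operation flag, commit sequence number, primary key, one data column); it is not the adapter’s exact output format.

```python
# Simplified change rows, mirroring the flag/CSN/key columns described
# above: (op_flag, commit_seq, pk_id, rnd_str). Values are illustrative.
changes = [
    ("I", "0001", 1, "371O62FX"),
    ("I", "0002", 5, "TEST_1"),
    ("U", "0003", 5, "TEST_1_1"),
    ("D", "0004", 1, None),
]

# Apply operations in commit-sequence order, keeping the last state per key.
state = {}
for op, csn, pk, rnd_str in sorted(changes, key=lambda c: c[1]):
    if op == "D":
        state.pop(pk, None)   # a delete removes the row
    else:                     # an insert or update replaces it
        state[pk] = rnd_str

assert state == {5: "TEST_1_1"}
```

In practice the same fold would run as a Hive or Spark job over the change files, but the logic is exactly this: sort by commit sequence number, then apply each operation in order.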
Let’s insert and update some row(s) on source and see how it will show up on the target:

orcl> insert into ggtest.test_tab_1 values (5,'TEST_1',sysdate,'TEST_1',sysdate);

1 row created.

orcl> commit;

orcl> update ggtest.test_tab_1 set RND_STR='TEST_1_1' where PK_ID=5;

1 row updated.

orcl> commit;

Commit complete.


Let’s check how this is reflected on the destination site. We see two new rows, where each row represents a DML operation: one for the insert and a second one for the update.

hive> select * from BDTEST.TEST_TAB_1;
I	BDTEST.TEST_TAB_1	2016-02-16 19:17:40.746699	2016-02-16T14:37:43.373000	00000000-10000002012		1	371O62FX	2014-01-24:19:09:20	RJ68QYM5	2014-01-22:12:14:30
I	BDTEST.TEST_TAB_1	2016-02-16 19:17:40.746699	2016-02-16T14:37:44.896000	00000000-10000002155		2	371O62FX	2014-01-24:19:09:20	HW82LI73	2014-05-11:05:23:23
I	BDTEST.TEST_TAB_1	2016-02-16 19:17:40.746699	2016-02-16T14:37:44.896001	00000000-10000002298		3	RXZT5VUN	2013-09-04:23:32:56	RJ68QYM5	2014-01-22:12:14:30
I	BDTEST.TEST_TAB_1	2016-02-16 19:17:40.746699	2016-02-16T14:37:44.896002	00000000-10000002441		4	RXZT5VUN	2013-09-04:23:32:56	HW82LI73	2014-05-11:05:23:23
I	BDTEST.TEST_TAB_1	2016-02-16 20:43:32.000231	2016-02-16T15:43:37.199000	00000000000000002041		5	TEST_1	2016-02-16:15:43:25	TEST_1	2016-02-16:15:43:25
U	BDTEST.TEST_TAB_1	2016-02-16 20:43:53.000233	2016-02-16T15:43:56.056000	00000000000000002243		5	TEST_1_1
Time taken: 2.661 seconds, Fetched: 6 row(s)

It works for deletes too; the only difference is that the flag will be “D”, instead of “I” for inserts or “U” for updates.
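Because the Hive table is an append-only operation log rather than a current snapshot, a consumer has to replay the flags to get the latest state. A minimal Python sketch of that idea (assuming a simplified record of flag, primary key, and changed columns — not the adapter’s actual file format):

```python
# Sketch: fold a stream of (flag, pk, changed_columns) operations into the
# latest state of a table. Updates carry only changed columns, so they are
# merged over the existing row; deletes drop the row entirely.
def apply_ops(ops):
    state = {}
    for flag, pk, values in ops:
        if flag == "I":
            state[pk] = dict(values)
        elif flag == "U":
            state.setdefault(pk, {}).update(values)
        elif flag == "D":
            state.pop(pk, None)
    return state

# The insert/update pair from the example above, reduced to two columns.
ops = [
    ("I", 5, {"RND_STR": "TEST_1", "RND_STR_1": "TEST_1"}),
    ("U", 5, {"RND_STR": "TEST_1_1"}),
]
state = apply_ops(ops)
```

Replaying operations only up to a chosen commit timestamp would give the data set as of that point in time.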

What about DDL support? Let’s truncate the table.

orcl> truncate table ggtest.test_tab_1;

Table truncated.


And here, there is nothing in our HDFS files. Maybe I’ve missed something, but it looks like the truncate operation does not create any record. I need to dig a bit more, and will try to write a separate blog post about DDL support for the Big Data Adapters.
It works pretty well when we create a new table and insert new rows.

It also works if we change one of the existing tables, adding or dropping a column. Let’s try to create a new table here:

orcl> create table ggtest.test_tab_4 (pk_id number, rnd_str_1 varchar2(10),acc_date date);

Table created.

orcl> insert into ggtest.test_tab_4 select * from ggtest.test_tab_2;

1 row created.

orcl> commit;

Commit complete.


You can see that it has created a new directory and file for the new table. Additionally, if you add a column, a new file will be used for all subsequent DML operations on the altered table. This helps separate data for tables with different structures.

[oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/
Found 3 items
drwxr-xr-x   - oracle oracle          0 2016-02-16 15:43 /user/oracle/gg/bdtest.test_tab_1
drwxr-xr-x   - oracle oracle          0 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_2
drwxr-xr-x   - oracle oracle          0 2016-02-16 15:56 /user/oracle/gg/bdtest.test_tab_4
[oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/bdtest.test_tab_4/
Found 1 items
-rw-r--r--   1 oracle oracle        127 2016-02-16 15:56 /user/oracle/gg/bdtest.test_tab_4/bdtest.test_tab_4_2016-02-16_15-56-50.373.txt
[oracle@sandbox oggbd]$ hadoop fs -tail /user/oracle/gg/bdtest.test_tab_4/bdtest.test_tab_4_2016-02-16_15-56-50.373.txt
IBDTEST.TEST_TAB_42016-02-16 20:56:47.0009532016-02-16T15:56:50.371000000000000000000068327IJWQRO7T2013-07-07:08:13:52
[oracle@sandbox oggbd]$

At first glance it looks good, but let’s try to create a table as select.

orcl> create table ggtest.test_tab_3 as select * from ggtest.test_tab_2;

Table created.


Not sure if this is expected behavior or a bug, but apparently it is not working. Our replicat is broken and complains that “DDL Replication is not supported for this database”.

[oracle@sandbox oggbd]$ tail -5 ggserr.log
2016-02-16 15:43:37  INFO    OGG-02756  Oracle GoldenGate Delivery, rhdfs.prm:  The definition for table GGTEST.TEST_TAB_1 is obtained from the trail file.
2016-02-16 15:43:37  INFO    OGG-06511  Oracle GoldenGate Delivery, rhdfs.prm:  Using following columns in default map by name: PK_ID, RND_STR, USE_DATE, RND_STR_1, ACC_DATE.
2016-02-16 15:43:37  INFO    OGG-06510  Oracle GoldenGate Delivery, rhdfs.prm:  Using the following key columns for target table bdtest.TEST_TAB_1: PK_ID.
2016-02-16 15:48:49  ERROR   OGG-00453  Oracle GoldenGate Delivery, rhdfs.prm:  DDL Replication is not supported for this database.
2016-02-16 15:48:49  ERROR   OGG-01668  Oracle GoldenGate Delivery, rhdfs.prm:  PROCESS ABENDING.
[oracle@sandbox oggbd]$

[oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/gg/
Found 2 items
drwxr-xr-x   - oracle oracle          0 2016-02-16 15:43 /user/oracle/gg/bdtest.test_tab_1
drwxr-xr-x   - oracle oracle          0 2016-02-16 14:37 /user/oracle/gg/bdtest.test_tab_2
[oracle@sandbox oggbd]$

What can we say in summary? The replication works, supports all DML, and some DDL commands. To build consistent data sets for any given point in time, you will need to use the operation flag and the transaction timestamps. In my next few posts, I will cover other Big Data adapters for GoldenGate.

Categories: DBA Blogs

A Repeatable Test Environment with Terraform and Chef-solo

Mon, 2016-02-29 08:57

I have been refreshing my Chef skills, and have been looking for simple strategies to deploy temporary lab environments in the cloud that provide a secure, flexible, repeatable, and cost-effective set of resources for testing code. One of the challenges of working in a Vagrant environment is how to protect our clients’ intellectual property (i.e. the code we wish to test, in our case a Chef cookbook) when it is distributed around the globe on our consultants’ laptops. I wanted to tackle the problem simply, one step at a time.

As I see it, cloud services offer a flexible and secure environment in which policies can be managed, costs can be minimized, and access can be controlled. Three things are needed:

  1. A standard, flexible, easy to use deployment framework and template / example environment library
  2. An access control and environmental separation framework that allows lab resources to be deployed securely
  3. A set of templates and policies to manage and control shared infrastructure dependencies and to retire idle resources

Number one is how I spent my Sunday afternoon last week, with this as the result:

Following the README should get you started with deploying the host if you are familiar with Terraform already. This template uses AWS services, but the concept is portable to any of the multiple providers supported by Terraform.

What this is good for:

I think that the benefits of this approach to testing deployment code are clear, but to highlight:

  • We can add the cloud provider’s management framework and services to our test framework, providing a secure location and environmental separation as needed for testing of intellectual property (i.e. code).
  • We can store our test data in the cloud securely, and access it when required by keeping it on persistent storage (EBS in this case).
  • We use templated environments to ensure consistent and repeatable deployments for developers and engineers to collaborate on, as we would with a locally deployed Vagrant test environment.
  • We leverage scale and flexibility in the cloud to create practical testing environments that match our needs and those of our customers.
  • We leverage automation and environmental policies to ensure resource usage is minimized to that required for testing activities; systems are shut down and minimal storage costs incurred when the environments are not in use.


In preparation, you require an Amazon Web Services account with:
– an EC2 keypair with the private key available on your local host (usually in ~/.ssh/ somewhere).
– an IAM access key and secret access key for the AWS account that will be used to deploy.

To deploy the lab:

  1. Install Terraform from Hashicorp and ensure that the directory containing the ‘terraform’ binary is somewhere in your path.
  2. Create a build directory and change into it. Clone the repository and change to the project directory. I always set up a remote (origin) at this point in git. We’ll be using the master branch.
  3. Per the README, ensure that you create a ‘terraform.tfvars’ file with 600 permissions and add your authentication token data in the format provided. (This file is listed in ‘.gitignore’ to prevent accidental upload of authentication tokens to the repository.)
  4. Run ‘terraform plan’ and review the output. If no errors, run ‘terraform apply’. Your environment should start to build in AWS, and Terraform will create some local state files in the working project directory.
  5. Provisioning output will be displayed as the build and bootstrap progress, with output of the commands run in the remote-exec provisioner section of template file ‘’. (‘The Seeker’ is a song by The Who, if you are wondering, and IMO the best name ever given to a test host.) Review this file to see what’s happening.
  6. When provisioning completes, we should have a functioning instance running Amazon Linux in US-West/Oregon, by default. If you change the region (in ‘’) then don’t forget that you must also update your AMIs (also in ‘’) to match the proper region. I’ve used this AMI in my example, specifically the HVM EBS-backed image.
  7. The public DNS address is output by Terraform. Connect to this using your private EC2 key using ‘ssh -i ~/.ssh/myec2key -l ec2-user’.


If you inspect the template noted in step 5, you will notice that the host is configured not to delete the EBS volume on termination, meaning that after instance termination the EBS volume will be preserved and can be booted from again (so we don’t lose our Chef cookbook work between deployments). Do note, however, that each node freshly provisioned using Terraform creates a new volume. If you wish to destroy the volume on termination, update the following section and change the value to ‘true’, but keep in mind that terminating the instance will then also destroy the data in your ‘chef-repo’.

root_block_device {
  delete_on_termination = false
}

Chef-solo usage:

Now that we’ve logged into our host, let’s change directory to ‘chef-repo’ and have a look…

  1. From the ‘chef-repo’ directory, knife and knife-solo are already configured. You can see the configuration in ‘.chef/knife.rb’.
  2. The ‘cookbooks’ and ‘site-cookbooks’ directories are in the default cookbook path. No roles are configured. There is one node, ‘localhost’, under the nodes directory.
  3. To see the node configuration, run ‘knife node --local-mode list’… ‘localhost’ is returned.
  4. Let’s have a look at the run list for this node, using ‘knife node --local-mode show localhost’… ‘recipe[ntp::default]’ is returned in the run list.
  5. We can see from the output that the run list for the node contains the ‘ntp::default’ recipe, as configured by the remote-exec provisioner in the template.
  6. If we want to add additional recipes to the run list for the node, we can use ‘knife node --local-mode run_list add localhost “recipe[cookbook::recipe]”’.
  7. Finally, we can apply the ntp recipe to the node localhost using ‘knife solo cook ec2-user@localhost -i ~/.ssh/mykey’. Note that a second run of the same command will yield a different result, as the new configuration has already been applied by Chef.
  8. When done with the environment, log out of the host and issue a ‘terraform destroy’ command. Type ‘yes’ in response to the confirmation message and Terraform will dismantle the lab and terminate the instance.

Other notes:

Don’t forget that you can now boot the instance from the EC2 console by selecting the volume we created as the new instance’s root (boot) volume.

Chef-solo also supports Berkshelf for dependency management, just install to your ‘chef-repo’ directory and chef-solo will act accordingly.

It’s worth noting that in my research I found that the Chef client now supports local mode operation and will eventually replace chef-solo for headless operation.

A great next step to get going with Terraform templates would be to create a new template file to deploy a second AWS instance, and then configure it with knife-solo from our first host. To ensure connectivity between nodes, ssh access must be available. To accomplish this simply, we can add the new host to the existing security group, ‘sg_theseeker_access’, which allows ssh login from anywhere by default. Again, refer to the instance template in step 5 of the deployment section above for an example of this.

In my next blog I will be describing a similar, templated Terraform environment that uses Ansible and Ambari Blueprints to deploy a 6+ node Hortonworks Hadoop cluster in ~45 minutes, ready to receive data and code for testing. For a sneak preview, see this github repository and dive into the README.

Have fun!

Categories: DBA Blogs

Oracle’s CREATE INDEX command can take a hint

Thu, 2016-02-25 15:39

Here’s something really useful I discovered by accident when playing with Auto DOP (parallel_degree_policy) in 12c.

The “create index” command can take a hint – in particular the PARALLEL hint. The syntax is as you would expect:

create /*+PARALLEL*/ index tab_ind1 on tab(col1);

Of course, you can specify the number of parallel servers to be used by specifying PARALLEL(24) for example for 24 threads. The really interesting thing about using a hint vs. the documented syntax ("create index tab_ind1 on tab(col1) parallel 24;") is that once created – the index doesn’t have a default degree of parallelism. So you don’t need a second command to make the index noparallel.

Note that if you put the hint and use the “noparallel” attribute like so:

create /*+PARALLEL*/ index tab_ind1 on tab(col1) noparallel;

Then no parallelism will be used.

I tried using hints like FULL(t) to force an index create to use a full table scan instead of an existing index – but that doesn’t seem to work.

I discovered this under really interesting circumstances. I was testing some unrelated functionality that required some indexes created on my play table called CKK.

Here’s the SQL for the CKK table, which will create a 40 GB table with 2 rows per block:

create table ckk nologging tablespace ckk as
select rownum id, mod(rownum,5) mod5_id, mod(rownum,5000) mod5000_id,
       sysdate dt_fixed, sysdate - rownum/24/60 dt_dec, sysdate + rownum/24/60 dt_pos,
       sysdate + ora_hash(rownum,65,535)/24 dt_rand, sysdate + mod(rownum,10) dt_mod10,
       rpad('x',3500,'x') filler
from (select rownum r from dual connect by level <= 10000) r1,
     (select rownum r from dual connect by level <= 1000) r2;

Then when I attempted to create an index on the table in parallel, Oracle refused to do so:

create index ckk$id on ckk(id) parallel 24; --DOES NOT run in parallel

Instead it created the index with 1 thread only, and then set the parallel degree policy to 24. I have tracked this problem down to the Auto DOP feature. If I turn it off via parallel_degree_policy=manual – the problem goes away. But I never expected this feature to turn off parallelism for index creation when explicitly requested.

Here’s the kicker – once any index is created on the table, future index creations will automatically be done in parallel, regardless of whether parallel was requested.

For example, this index would be now created in parallel:

create index ckk$mod5_id on ckk(mod5_id);

Before the index “ckk$id” existed, however, that same index would refuse to be created in parallel when using the parallel attribute.

That’s when I said to myself, “it’s almost like there’s a hint.” I took the hint, and discovered it does work, and it works more consistently than the attribute.

Categories: DBA Blogs

Log Buffer #462: A Carnival of the Vanities for DBAs

Thu, 2016-02-25 15:07

This Log Buffer Edition covers Oracle, SQL Server and MySQL blog posts listing down few new tricks, tips and workarounds plus the news.


Oracle:

Displaying CPU Graphs For Different Time Ranges in Enterprise Manager

One of the cool things in 12c is that (finally after all these years) a sequence can be assigned as the default value for a column.

Jonathan Lewis demonstrates connect by after receiving an email.

Oracle 12c – PL/SQL “White List” via ACCESSIBLE BY

Oracle DBA, IT Manager, or Something Else

SQL Server:

Using R Machine Learning Script as a Power BI Desktop Data Source

SQL Authentication Via AD Groups Part II: Who has what access?

Connect to on premise data sources with Power BI Personal Gateway

Exploring SQL Server 2016 Dynamic Data Masking – Part One – Creating a Table that uses Dynamic Data Masking


MySQL:

Data Encryption at Rest

Planets9s: Download the new ClusterControl for MySQL, MongoDB & PostgreSQL

Automate your Database with CCBot: ClusterControl Hubot integration

Loading JSON into MariaDB or even MySQL – mysqljsonimport 2.0 is available

Privileges in MySQL and MariaDB: The Weed Of Crime Bears Bitter Fruit

Categories: DBA Blogs

Learning To Be Present

Thu, 2016-02-25 10:06


“Realize deeply that the present moment is all you ever have.”  – Eckhart Tolle

You’re writing an email, reading instant messages, updating your LinkedIn status, listening to music, taking a call, and thinking about what you’re going to order for dinner tonight, all within the same five minutes.

You think, “Gosh I’m productive!” when in fact, you’re really quite the opposite. We now live in a world where information is at our finger tips and we are no longer spending countless hours reading and/or watching TV. Instead, the majority of us are getting lost in the vortex of surfing the web, and as a result it’s becoming increasingly challenging to remain present.

Being present took on a new meaning for me when I became a parent. Throughout his small but vast life to-date, my son has been inadvertently teaching me to be calm, clear, and focused on the present moment. Kids don’t care what time it is, or that you have an important call to make. They’re oblivious to your fast approaching project deadline, or that your favorite TV show is starting in five minutes. They exist in their own ‘now’ and generally focus on one moment and experience at a time…even if that moment or experience only lasts 60 seconds.

My eighteen-month-old son consistently reminds me of the importance of being truly present.  He requires me to be focused on one thing, and hearing what he is trying to express within the scope of his current 30 word vocabulary. He doesn’t waste time when he expresses his wants, needs, and feelings. He’ll call me out on my distractions when other people in my life won’t. He acts with purpose when he does things that he knows he shouldn’t, just to bring me back to the moment that we’re in. It’s a very effective method (to say the least), and has helped me gain perspective on being mindful, while also reminding me how important my time is with him. This valuable lesson has spilled over into my day-to-day life.

Now, when I’m on a call, or have one of my team members in my office, I’m listening and hearing them. I make a conscious effort to turn my computer monitor, turn off my email notifications and transfer my phone calls to voicemail until I’ve completed the task at hand. When I’m writing an email, I don’t reply to instant messages until I click ‘send’. When I’m hosting a meeting, I’m fully present in it, which often leads to a reasonably early wrap-up, and a beneficial gift of time to those in attendance.

I still have a distance to go in my efforts to master being present. My level of self-awareness and my relationship with my son are what give me the focus to continue to up my game. The next time you are juggling five tasks at once, challenge yourself to focus on only one, with a heartfelt intent to be truly present.

“Wherever you are, be all there.” – Jim Elliot

Categories: DBA Blogs

Security in the age of cloud computing: How IT leaders are mitigating risks

Tue, 2016-02-23 13:37


As the world becomes more connected, and more systems are moving to the cloud, security and data privacy are major considerations for IT leaders. But with the ever-present possibility of security threats, vulnerabilities and privacy exposures, how are today’s CIOs and IT leaders securing their systems?


According to a recent survey by the Cloud Security Alliance, despite concerns about the security of corporate data moving to the cloud, just 35.0 percent of IT leaders said that cloud-based systems of record are less secure than their on-premises counterparts. And 64.9 percent said that the cloud is either more secure than on-premises software or equally secure.


This is not to say that these same IT leaders don’t see security as a major concern as they evolve their systems onto the cloud. While businesses have to focus on innovation and agility to gain competitive advantage, the question of security has to be addressed alongside innovation. But the key is to address security without slowing the business down.


So what are IT leaders doing to secure their systems as they move forward with business and IT innovations? I had the opportunity to discuss this with IT leaders from RMS and Mozilla, during Pythian’s CIO panel discussion in November 2015.


Cory Isaacson, CTO at RMS has been working on bringing big data and scalable systems together to create a new cloud-based platform, and says his customers — some of the world’s largest insurance companies — are constantly concerned about threats to their data. This is an industry rife with concerns over privacy exposures because of the nature of data being handled. RMS runs catastrophe models for their customers, like scenarios that will tell insurance companies what a disaster like an earthquake or hurricane might cost them.


One of the biggest fears on the minds of Isaacson’s customers is about the security of their data. “The best idea is to not have anything that’s worth stealing. We’re looking at techniques that will keep the data encrypted from the user’s browser all the way into the database. If we can solve the data security issue simply by not having anything worth stealing, then that’s much better and much safer. Just take all the confidential and proprietary information and encrypt it end-to-end, and work with it on an encrypted basis,” he said.


RMS is betting on this encryption strategy for the longer term. But, it’s not an easy one to implement. Isaacson admits that it’s going to take some doing, and he hopes that after following ISO standards, going through quality gates and adhering to all of the industry prescribed protections and processes, that he will have some robust security in place.


Sean Rich, director of IT at Mozilla, is leading their application services group, and is facing the question of how to automate security within their day to day processes.  “Just like agile development found ways to build quality assurance into the process and DevOps found ways to build operations into the process, we now need a way to build greater security into the process. The definition of working software has evolved to include all three: quality, runtime and security,” said Rich.


Aaron Lee, the Chief Data Officer at Pythian, believes that we all need to think about automating security, just as we do with things like QA.  “When it comes to security, the cost of inaction is super high and the risk of inaction is super high,”  Lee said.


According to Lee, many IT leaders think the idea of automating security is not feasible. “I think the idea of depending on humans to do the same thing over and over again is nuts,” he said, referring to the manual effort that goes into security.


“The idea that a static security design can secure a dynamic engineering environment is an incredible mismatch,” he added.


Lee’s team at Pythian spends a lot of time with clients trying to figure out how to parse the regulatory requirements to automate as much as possible.


And Lee asserted that companies don’t always know what they’re getting into when they host their data with a third party.


“My favorite example is single tenant versus multi tenant. Single tenant sounds safer because all your data is in one place. But it’s all multi tenant in the end, you’re all sharing the building, you’re all sharing the earth so it’s hard to make that distinction. For a certain part of the SaaS industry, this is an important marketing distinction. But in reality, it’s meaningless. A data isolation strategy that might be implemented hypothetically could involve my business being a tenant on some multi tenant infrastructure in which I have a set of keys that I control and that are specific to my business. My third party doesn’t have access to those keys, and they are not shared by any other tenant. But in reality, are we all sharing the same set of keys?  And how are those keys and secrets managed? Are they sitting on a high risk security module somewhere on somebody’s laptop? Is it some combination of all that? When you start looking at what vendors promise from a security standpoint and what the industry talks about, it doesn’t always match reality,” he said.


These are the issues Pythian faces every day when assessing the vulnerability of a company’s IT systems.


Companies that are serious about security aren’t just checking compliance boxes to make auditors happy, according to Lee.  They’re getting experts like Pythian to address their issues in a constructive way.


“My observation is that the enterprise world at large has gone from needing to have people in place to check boxes to being at true risk of fundamental intellectual property leakage and for that matter, business continuity problems,” Lee said.


But most companies aren’t putting resources into automating. They’re hiring. And according to the Cloud Security Alliance survey, IT leaders see the biggest barrier to mitigating threats to their systems and data not as a limitation in security technology, but as a human resource limitation. The survey results suggested that companies simply can’t find enough security professionals to fill their hiring needs.


In keeping with that trend, The Wall Street Journal recently reported that JPMorgan expected to spend $500 million on cyber security in 2015, double its 2014 budget of $250 million.


While companies continue to spend their security budgets on hiring, Mozilla’s Sean Rich agrees that there’s a need for automation to deal with vulnerability.


“This need has driven transformation. We’ve moved from maintaining compliance to actually dealing with practical threats and with real consequences, looking at every single attack vector and how to mitigate them,” said Rich.


“Finding the answer to mitigating IT risks won’t be easy, and will continue to be costly,” said Pythian’s Aaron Lee.


“The best, most sophisticated zero day vulnerability attacks are utterly widespread in a month, so we all eventually get to feel the enormity of those attacks. Security is the biggest expense and it’s rising. It’s only going to continue to get worse because there are a lot of people who make a lot of money by hacking into systems, and they keep getting better at it.”

Categories: DBA Blogs

Google Cloud Dataproc in ETL pipeline – Part 1 (Logging)

Mon, 2016-02-22 11:16


Google Cloud Dataproc, now generally available, provides access to fully managed Hadoop and Apache Spark clusters, and leverages open source data tools for querying, batch/stream processing, and at-scale machine learning. To get more technical information on the specifics of the platform, refer to Google’s original blog post and product home page.


Having access to fully managed Hadoop/Spark based technology and powerful Machine Learning Library (MLlib) as part of Google Cloud Platform makes perfect sense as it allows you to reuse existing code and helps many to overcome the fear of being “locked into” one specific vendor while taking a step into big data processing in the cloud. That said, I would still recommend evaluating Google Cloud Dataflow first while implementing new projects and processes for its efficiency, simplicity and semantic-rich analytics capabilities, especially around stream processing.


When Cloud Dataproc was first released to the public, it received positive reviews. Many blogs were written on the subject, with a few taking it through some “tough” challenges on its promise to deliver cluster startup in “less than 90 seconds”. In general the product was well received, with the overall consensus that it is well positioned against the AWS EMR offering.


Being able, in a matter of minutes, to start Spark Cluster without any knowledge of the Hadoop ecosystem and having access to a powerful interactive shell such as Jupyter or Zeppelin is no doubt a Data Scientist’s dream. But with extremely fast startup/shutdown, “by the minute” billing and widely adopted technology stack, it also appears to be a perfect candidate for a processing block in bigger ETL pipelines. Orchestration, workflow engine, and logging are all crucial aspects of such solutions and I am planning to publish a few blog entries as I go through evaluation of each of these areas starting with Logging in this blog.


Cloud Dataproc Logging

The cluster’s system and daemon logs are accessible through the cluster UIs as well as by SSH-ing to the cluster, but there is a much better way to do this. By default these logs are also pushed to Google Cloud Logging, consolidating all logs in one place with a flexible Log Viewer UI and filtering. You can even create custom log-based metrics and use these for baselining and/or alerting purposes. All cluster logs are aggregated under a “dataproc-hadoop” tag, but the “structPayload.filename” field can be used as a filter for a specific log file.
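As a rough illustration, a Logs Viewer filter can be assembled from those two pieces. This is a sketch only — the exact advanced-filter syntax may vary by Logging API version, and the cluster label clause is an assumption modeled on the label value shown in the query later in this post:

```python
# Sketch: build an advanced log filter for one Dataproc log file. The
# "dataproc-hadoop" tag and "structPayload.filename" field come from the
# text above; the metadata.labels.value clause is an assumption based on
# the example label value shown later in this post.
def dataproc_log_filter(filename, cluster_label=None):
    clauses = ['tag="dataproc-hadoop"',
               'structPayload.filename="%s"' % filename]
    if cluster_label:
        clauses.append('metadata.labels.value="%s"' % cluster_label)
    return " AND ".join(clauses)

f = dataproc_log_filter("hive-server2.log")
```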


In addition to relying on the Logs Viewer UI, there is a way to export specific log messages into Cloud Storage or BigQuery for analysis. Just to get an idea of what logs are available by default, I exported all Cloud Dataproc messages into BigQuery and queried the new table with the following query:



SELECT
  structPayload.filename AS file_name,
  count(*) AS cnt
FROM
  [my_dataset.dataproc_logs]  -- placeholder name for the export table
WHERE
  metadata.labels.key = 'dataproc.googleapis.com/cluster'  -- placeholder label key
  AND metadata.labels.value = 'cluster-2:205c03ea-6bea-4c80-bdca-beb6b9ffb0d6'
GROUP BY
  file_name




  • hadoop-hdfs-namenode-cluster-2-m.log
  • yarn-yarn-nodemanager-cluster-2-w-0.log
  • container_1455740844290_0001_01_000004.stderr
  • hadoop-hdfs-secondarynamenode-cluster-2-m.log
  • hive-metastore.log
  • hadoop-hdfs-datanode-cluster-2-w-1.log
  • hive-server2.log
  • container_1455740844290_0001_01_000001.stderr
  • container_1455740844290_0001_01_000002.stderr
  • hadoop-hdfs-datanode-cluster-2-w-0.log
  • yarn-yarn-nodemanager-cluster-2-w-1.log
  • yarn-yarn-resourcemanager-cluster-2-m.log
  • container_1455740844290_0001_01_000003.stderr
  • mapred-mapred-historyserver-cluster-2-m.log


Google Cloud Logging is a customized version of fluentd – an open source data collector for unified logging layer. In addition to system logs and its own logs, fluentd is configured (refer to /etc/google-fluentd/google-fluentd.conf on master node) to tail hadoop, hive, and spark message logs as well as yarn application logs and pushes them under “dataproc-hadoop” tag into Google Cloud Logging.

Application Logging

You can submit a job to the cluster using Cloud Console, Cloud SDK or REST API. Cloud Dataproc automatically gathers driver (console) output from all the workers, and makes it available through Cloud Console. Logs from the job are also uploaded to the staging bucket specified when starting a cluster and can be accessed from there.


Note: One thing I found confusing is that when referencing the driver output directory in the Cloud Dataproc staging bucket, you need the Cluster ID (dataproc-cluster-uuid), which is not yet listed on the Cloud Dataproc Console. Having this ID, or a direct link to the directory, available from the Cluster Overview page is especially important when starting/stopping many clusters as part of scheduled jobs. One way to get the dataproc-cluster-uuid and a few other useful references is to navigate from the Cluster “Overview” section to “VM Instances”, click on the master or any worker node, and scroll down to the “Custom metadata” section. You can also get it using the “gcloud beta dataproc clusters describe &lt;CLUSTER_NAME&gt; |grep clusterUuid” command, but it would be nice to have it available through the console in the first place.
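For scheduled jobs, that lookup is easy to script. A hedged Python sketch that pulls clusterUuid out of the describe output — shown here against a canned sample string rather than a live gcloud call, with illustrative values:

```python
# Sketch: extract clusterUuid from the YAML-style output of
# "gcloud beta dataproc clusters describe <CLUSTER_NAME>". A canned sample
# is used instead of invoking gcloud; the field name matches the grep
# shown above.
def cluster_uuid(describe_output):
    for line in describe_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "clusterUuid":
            return value.strip()
    return None

sample = """clusterName: cluster-2
clusterUuid: 205c03ea-6bea-4c80-bdca-beb6b9ffb0d6
status:
  state: RUNNING
"""
uuid = cluster_uuid(sample)
```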


The job (driver) output, however, is currently sent to the console only (refer to /etc/spark/conf/ on the master node) and, although accessible through the Dataproc Job interface, it is not currently available in Cloud Logging.


The easiest way around this issue, which can easily be implemented as part of cluster initialization actions, is to modify /etc/spark/conf/ by replacing “log4j.rootCategory=INFO, console” with “log4j.rootCategory=INFO, console, file” and adding the following appender:


# Adding file appender
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/spark-log4j.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.conversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n


The existing Cloud Dataproc fluentd configuration automatically tails all files under the /var/log/spark directory, adding events to Cloud Logging, and should pick up messages going into /var/log/spark/spark-log4j.log.


You can verify that logs from the job started to appear in Cloud Logging by firing up one of the examples provided with Cloud Dataproc and filtering Logs Viewer using the following rule:




If after this change messages are still not appearing in Cloud Logging, try restarting fluentd daemon by running “/etc/init.d/google-fluentd restart” command on master node.

Once changes are implemented and output is verified you can declare logger in your process as:

import pyspark

sc = pyspark.SparkContext()

# Access the JVM-side log4j logger (a common pyspark pattern):
logger = sc._jvm.org.apache.log4j.Logger.getLogger(__name__)

and submit the job redefining the logging level (INFO by default) using the “--driver-log-levels” option.

Learn more here.

Categories: DBA Blogs

PRKO-2002 : Small Things, Large Shadows

Mon, 2016-02-22 09:28

This quick and short blog post is a reminder to remember the syntax. I was trying to bring down the local listeners on a half rack of Exadata (11gR2) to perform some maintenance. The best way to shut down resources in RAC is to use srvctl, so I tried running the following after setting the Grid home as the Oracle home, from where the local listeners were running:

[oracle@mytestexa1:~]$ srvctl stop listener LISTENER
PRKO-2002 : Invalid command line option: LISTENER

[oracle@mytestexa1:~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): mytestexa2,mytestexa4,mytestexa1,mytestexa3

Hmmmm. ok, then I tried this:

[oracle@mytestexa1:~]$ srvctl stop listener LISTENER -n mytestexa1
PRKO-2002 : Invalid command line option: LISTENER

Well, ummm ok, then I tried this one:

[oracle@mytestexa1:~]$ srvctl stop listener -n mytestexa1
[oracle@mytestexa1:~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): mytestexa2,mytestexa4,mytestexa3

And it worked. Yes, if the default listener name is not in use, then -l listener_name should be used.

So, I need to remember the syntax :) . Small things, but in emergencies they cast large shadows for a while.

Categories: DBA Blogs

A View From Down Under: What IT Innovation and Agility Mean to IT Pros in Australia

Mon, 2016-02-22 06:38

The impact of today’s disruptive technologies on how we do business led Pythian to introduce Velocity of Innovation, a series of panel discussions for senior IT professionals. These are Pythian’s exclusive thought-leadership events where a moderator-led panel engages in lively conversations around today’s emerging technologies.

We recently took our Velocity of Innovation event series to Sydney, Australia to find out what’s on the minds of CIOs and other IT leaders in that region. Our moderator for this event was Tom McCann, senior customer experience analyst with Forrester. You may be thinking: Why would a customer experience analyst be moderating a CIO panel? But here’s why we chose to put a customer experience expert at the centre of this discussion. With the commoditization of products and services, delivering a quality customer experience is one of the few ways that companies now have of truly differentiating themselves from their competitors. From attracting prospects, to acquiring customers, to onboarding and ongoing support, companies need to have the right tools and processes in place to ensure products or services perform as promised, and that every interaction customers have with them is satisfying. They also need to be able to use data to respond to customer needs and market trends. IT plays a critical role in all of these areas.

The panel consisted of IT leaders with a range of backgrounds and experiences:

  • Francisco Alvarez, vice president, APAC at Pythian
  • Chris Mendes, executive consultant big data and analytics from Industrie IT
  • Tim Sheedy, principal analyst with Forrester Research

As we do to start off many of these events, we began the Sydney discussion by asking this question: Innovation and agility are very common buzzwords that seem to describe what everyone wants. What have you explicitly done to get better at innovating or to make your organization or your customers more agile? How has this push for agility impacted your enterprise architecture?

Here’s an excerpt from the discussion that followed.

Chris: I actually don’t think there’s enough innovation in Australia, in particular. There’s a lot of talk about it, people are actually doing a lot of experiments and there are some companies who’ve set up business purely giving them the tool sets to use their data to innovate. You’ve got a few things that seem to be working against that at the moment and I think one of the things working against it is that it doesn’t stand on its own.

It doesn’t matter what the technology is, and in fact what ends up happening is it all comes down to the “Why?” Because you really need to have the technologists interacting with business people so that when they come up with an idea they get immediate feedback. I’ve found that a lot of businesses don’t have time for innovation. They run pretty lean, they are very focused on day-to-day operations and don’t have time to come up with new ideas or to use their data to actually innovate. There are a lot of roadblocks there. I don’t see it as a technology problem, I actually see one of the roadblocks as a management issue. The other one is sort of the culture of no failure, and I think we’ve got a big issue with that in Australia.

Tim: As you are saying, I’ve presented a lot on innovation and Forrester actually has a really good innovation model. We spoke to about 130 companies around the globe who did bits of innovation well and we took all those bits and made them into an end-to-end process that can be used by our clients. We learned a lot from putting this together. We spoke to Telstra in Australia and did the piece on deciding what to innovate with, and where to invest. Other companies told us about how they come up with ideas. I speak to a lot of clients who told us that they had an innovation program in place where they gathered ideas from staff. But now those employees don’t trust their companies anymore because they never got back to them on how they would implement their ideas.

Chris: I think the other thing is I think there’s also a very big misunderstanding at board levels about innovation because boards are there to actually stop you changing your business. The fundamental tenet is: “We’ve got a great business model here, it’s running well, we’ve got to make sure that any change to it doesn’t damage that.” There’s a natural caution at board levels and it’s totally understandable.

Audience: One of the issues I guess that you just raised is that the boards are risk averse; however, if their business is failing, then there is obviously a need for them to do something about it.

Tim: But that’s a bad time to be innovating, right? When everything is going wrong, “We should try to do things differently.” The market’s probably left you behind by that point.

Francisco: The main problem that most of you have hit upon is that innovation equals risk. But take a step back and look at the companies that are really doing well in the market. They’re doing really well because of one factor that differentiates them: they were not afraid to innovate. And because of that innovation they are getting their share of the market and gaining ground. Just look at the financial market. CBA was considered crazy a few years ago for all the investment they were doing in technology, social media, apps and so on. They got ahead. And now everybody is trying to do the same. The problem is they need to have a balance. Even with choosing vendors, companies will avoid risk. They will keep going with IBM because they don’t want to take a risk. Or they keep Oracle and let Oracle do everything for them because there might be a risk with moving to another vendor. If they already own a technology, it doesn’t matter if they are not getting good service. They think that for risk capacity, there is insurance. Sometimes you’re not saving money or not improving, but you don’t want to innovate simply because you don’t want to increase the risk to the business.

Tim: Innovation is risk management, this is it.

The discussion went further on this topic, and explored areas like the future of IT, security and more.

Interested in being a part of a discussion like this one? Velocity of Innovation is a series of thought-leadership events for senior IT management hosted by Pythian. Pythian invites leading IT innovators to participate in discussions about today’s disruptive technologies: big data, cloud, advanced analytics, DevOps, and more. These events are by invitation only.

If you are interested in attending an upcoming Velocity of Innovation event in a city near you, please contact us. To view our schedule of upcoming events, visit our Velocity of Innovation page.

Categories: DBA Blogs

The Value of Pythian’s Internship Program

Fri, 2016-02-19 07:35

I have been working as a software development intern at Pythian for more than three months. As a second year student in electrical engineering, it is both unsurprising and somewhat expected that I would be working in the software industry during my first co-op term. I’ve heard complaints from aspiring electrical engineers that most of the available co-op positions are for software. This is especially true for first or second year students, because the knowledge base required for advanced electronics projects has not yet been attained.

I would discourage students from adhering to this viewpoint. Knowledge of programming and software development in the 21st century is invaluable to any kind of engineering. Furthermore, with the high level of involvement and learning involved in working as a software developer at Pythian, I am confident that someone with a thirst for knowledge would find this to be a worthwhile experience.

It began at the interview. As soon as I began talking to my current manager and the human resources representative, I could tell that the position was one I’d be very interested in. They both struck me as very kind, understanding, and most importantly, knowledgeable individuals. It was also very clear to me that the goal of the interview was to determine certain key traits, other than the necessary technical requirements. They were seeking someone who had a drive to participate and learn, as well as someone who was sociable, and who could fit with the team dynamic. Needless to say, after learning what kind of work I would be doing on the team and receiving the job offer, I quickly accepted the position.

After two weeks of orientation, I was immediately tasked with a very interesting project. The project involved creating a script to upgrade the database used by the core code base, which would eventually be used on the actual product that is “Adminiscope”. It was very fulfilling to know that my work, at the beginning of my work term, would be used on a production system to apply new features.

After this project, I was assigned more interesting projects, which required even more learning. At this point of my co-op term, I have touched almost all aspects of the Adminiscope software, and my abilities in software development have grown to a significant degree. It is very important to note that learning all these new skills would have been much more difficult if not for the constant guidance of the friendly, knowledgeable, and experienced people I have had the pleasure to work with.

If what I have already stated is not enough to encourage you to work at Pythian, then their unique corporate culture should convince you. While working here, I never once felt that my work or input was less valuable than others’ just because I was a co-op student. Participation is actively encouraged. Learning more about the company as a whole is also supported through “BORG” meetings with various senior level staff. I have had a few conversations with the CEO himself, and have learned something of significance every time. It is also a very good feeling to be told by someone of such status in the company that the work you are doing is meaningful, appreciated, and most of all, why it’s important.

I have absolutely no regrets, and I am grateful that I have had the opportunity to work in such a stimulating environment. I can say without doubt that in my three and a half months at Pythian, I have learned more about programming and software development than I have in my entire academic career. It is truly a worthwhile learning experience for those seeking a challenge. I hope my story will encourage you to apply for Pythian’s internship program, and I wish you luck in getting the position.

Categories: DBA Blogs

Data Encryption at Rest

Thu, 2016-02-18 14:31

This blog post was co-authored by Peter Sylvester and Valerie Parham-Thompson

Introduced in version 10.1.3 (and with substantial changes in 10.1.4), the MariaDB data encryption at rest feature allows for transparent encryption at the tablespace level for various storage engines, including InnoDB and Aria.

Before now, there have been only two widely accepted encryption methods for MySQL/MariaDB: encryption at the file system level, or encryption at the column level. For comparison, we’ll do a brief overview of how these work, as well as the pros and cons typically associated with each option.

File System Encryption

This is performed by setting a file system to be encrypted at the block level within the operating system itself, and then specifying that the encrypted volume should be the location of the data directory for MySQL/MariaDB. You can also use encrypted volumes to store MariaDB binary logs.

  • Pro: One-time setup and then no additional management required.
  • Con: There is a large degree of overhead at the CPU level. Every time an InnoDB page/extent is retrieved and stored in the InnoDB buffer pool, the data has to be decrypted. The same issue occurs when dirty pages are flushed back to persistent storage, be it a data or a log file.
Column Encryption

You can encrypt data at the column level by using a binary data type (varbinary/BLOB) and then encrypting the data as it goes into or out of the page at the application or code level. Typically this is done using the AES_ENCRYPT and AES_DECRYPT functions in MySQL/MariaDB.

  • Pro: You’re only encrypting the data that needs to be secured. All other data has no encryption-related overhead.
  • Pro: This provides a higher degree of security than file system encryption. If the data is encrypted at the file system level or by the data encryption at rest feature, anyone who can get into the running MariaDB instance can still see the unencrypted version of the data. With column-level encryption, the data is stored in a secure fashion and you need to supply the encryption key every time it is accessed by the MariaDB instance.
  • Con: The crypt key needs to be stored somewhere that allows the application to easily provide it when running queries against MariaDB.
  • Con: You may be able to see the crypt key in statement-based binary logging, or in the process list.
  • Con: Data that is encrypted should not be used for reverse lookups. For example, if you are encrypting a column that stores a name, and you need to search that column for a specific name, you have to specify the search using the AES_DECRYPT function, which will force all the table records to be scanned, decrypted, and compared as part of the “where” operation of the query.
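The column-level pattern can be sketched as follows; the table, column, and key names here are hypothetical, not from the original post:

```sql
-- Encrypt on write; the application must supply the key each time.
INSERT INTO customers (name_enc)
VALUES (AES_ENCRYPT('Alice', 'app_secret_key'));

-- Decrypt on read. Filtering on the decrypted value forces every row
-- to be decrypted and compared, i.e. a full table scan.
SELECT AES_DECRYPT(name_enc, 'app_secret_key') AS name
FROM customers
WHERE AES_DECRYPT(name_enc, 'app_secret_key') = 'Alice';
```

Note how the key appears in the statement text itself, which is exactly why it can leak into statement-based binary logs or the process list.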
MariaDB Data Encryption at Rest

This solution sits somewhere between the aforementioned file system level and column level encryption, allowing you to encrypt data at the table level. This allows for encryption that is easier to manage and work with, while also allowing for a narrower focus so you are encrypting only the data or logs that you wish to encrypt. It should be noted, though, that as with file system encryption, if you can get into the running MariaDB instance, you can get access to the encrypted data.

Now let’s walk through a test of the functionality of the MariaDB data encryption at rest feature.


Preparation included cloning a CentOS 7 base VM in VirtualBox, adjusting the IP and hostname, and installing MariaDB 10.1.11 using their repository, with instructions here.

Create Keys

The first step was to create keys. The output of the openssl command below (with example output) was used to edit a new file /var/lib/mysql/keys.txt.

The command was:
openssl enc -aes-256-cbc -P -md sha1
enter aes-256-cbc encryption password:
Verifying - enter aes-256-cbc encryption password:

Sample output:

iv =C6A3F3625D420BD19AF04CEB9DA2D89B

Sample contents of keys.txt using that output:


(You can take the additional step of encrypting the keys, but that was not done here.)
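As an alternative sketch, you can generate a key directly. This assumes the plugin's plain-text key file format of `<key id>;<hex key>` (per the file_key_management documentation); the output path here is illustrative:

```shell
# Generate a random 256-bit key (64 hex chars) and write it as key id 1.
# Format assumption: one "<id>;<hex_key>" entry per line.
key=$(openssl rand -hex 32)
printf '1;%s\n' "$key" > /tmp/keys.txt
cat /tmp/keys.txt
```

Each key id referenced by a table's encryption_key_id must have a matching entry in this file.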

Don’t lose the key file, or you won’t be able to start the server:

2016-02-13 20:37:49 140334031026304 [ERROR] mysqld: File '/var/lib/mysql/keys.txt' not found (Errcode: 2 "No such file or directory") 
2016-02-13 20:37:49 140334031026304 [ERROR] Plugin 'file_key_management' init function returned error. 
2016-02-13 20:37:49 140334031026304 [ERROR] Plugin 'file_key_management' registration as a ENCRYPTION failed. 
2016-02-13 20:37:49 140334031026304 [ERROR] InnoDB: cannot enable encryption, encryption plugin is not available 
2016-02-13 20:37:49 140334031026304 [ERROR] Plugin 'InnoDB' init function returned error. 
2016-02-13 20:37:49 140334031026304 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 
2016-02-13 20:37:49 140334031026304 [Note] Plugin 'FEEDBACK' is disabled. 
2016-02-13 20:37:49 140334031026304 [ERROR] Unknown/unsupported storage engine: InnoDB 
2016-02-13 20:37:49 140334031026304 [ERROR] Aborting

You can of course remove the relevant configs and restart successfully, but we have found at least two issues when trying to remove this configuration after it has been put in place on the MariaDB instance.

  1. If you encrypt the InnoDB log files (redo, not binary logs), then remove the encryption configuration and restart MariaDB, it will not be able to start until you re-enable the data at rest encryption feature.
  2. If you enable default encryption by putting innodb-encrypt-tables in the my.cnf, and then create a table, remove the feature, and restart MariaDB, the server will crash irrecoverably when selecting data from the table (bug filed as
Install Plugin

Next step was to install the plugin and use this file. The clearest path to doing this is to add the following two lines in /etc/my.cnf within the [mysqld] section:
plugin-load-add = file_key_management
file-key-management-filename = /var/lib/mysql/keys.txt

Restart MariaDB, and confirm the plugin is installed. The file_key_management plugin should display as “active.”

show all_plugins like '%file%';
Testing Encrypted Tables

As the documentation indicates, you can encrypt all tables when they are created (specify innodb-encrypt-tables in the my.cnf) or individual tables (by adding the settings to a create or alter table statement). (See further below for result of using the third option, innodb-encrypt-tables=force.)
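For the create-time variant, the syntax looks like this (t2 and its columns are hypothetical; key id 1 must exist in the key file):

```sql
CREATE TABLE t2 (
  id INT PRIMARY KEY,
  payload VARCHAR(128)
) ENGINE=InnoDB `ENCRYPTED`=YES `ENCRYPTION_KEY_ID`=1;
```

The same two table options can also be applied later with ALTER TABLE, as shown next.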

Here are the results if you encrypt a single table.

First, create a table:

mysqlslap --concurrency=5 --number-int-cols=2 --number-char-cols=3 --auto-generate-sql --auto-generate-sql-write-number=1000 --no-drop

And encrypt it:

alter table mysqlslap.t1 encrypted=yes encryption_key_id=1;

Here’s the table definition after encrypting:

show create table mysqlslap.t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`intcol1` int(32) DEFAULT NULL,
`intcol2` int(32) DEFAULT NULL,
`charcol1` varchar(128) DEFAULT NULL,
`charcol2` varchar(128) DEFAULT NULL,
`charcol3` varchar(128) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 `encrypted`=yes `encryption_key_id`=1

Looking at the .ibd file directly via xxd, you can see some text data before encryption:

0084570: 0001 8000 0002 7661 6c65 7269 6570 6172 ......valeriepar
0084580: 6861 6d74 686f 6d70 736f 6e00 0000 0000 hamthompson.....

And after encryption:

0085470: fdf4 7c27 d9cb 5d33 59b1 824d 4656 b211 ..|'..]3Y..MFV..
0085480: 7243 9ce0 1794 7052 9adf 39a1 b4af c2fd rC....pR..9.....

Once that table was encrypted, to test moving encrypted tablespaces, the files were copied from the source to a destination server as follows. The destination server had no encryption plugin, configs, or key installed.

The following process is typical for moving tablespaces: create a similar empty table on the destination server, without encryption. (It throws an error on that unencrypted server if you try it with `encrypted`=yes `encryption_key_id`=1.)

create database mysqlslap;

use mysqlslap

create table t1 (
`intcol1` int(32) DEFAULT NULL,
`intcol2` int(32) DEFAULT NULL,
`charcol1` varchar(128) DEFAULT NULL,
`charcol2` varchar(128) DEFAULT NULL,
`charcol3` varchar(128) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Then start the move process. First, discard the tablespace on the destination server:

alter table t1 discard tablespace;

This leaves you with just the .frm file there.

-rw-rw----. 1 mysql mysql 65 Feb 13 13:15 db.opt
-rw-rw----. 1 mysql mysql 932 Feb 13 13:15 t1.frm

Prepare the table on the source server:

flush tables t1 for export;

Now you have a .cfg file on the source server:

-rw-rw----. 1 mysql mysql 65 Feb 13 13:13 db.opt
-rw-rw----. 1 mysql mysql 620 Feb 13 13:16 t1.cfg
-rw-rw----. 1 mysql mysql 976 Feb 13 13:14 t1.frm
-rw-rw----. 1 mysql mysql 557056 Feb 13 13:14 t1.ibd

Send the .cfg and .ibd files from the source to the destination server:

scp /var/lib/mysql/mysqlslap/t1.cfg root@
scp /var/lib/mysql/mysqlslap/t1.ibd root@

Now you are free to unlock the tables on the source server:

unlock tables;

You’ll get an error on import if you don’t make the files owned by the mysql user:

chown mysql:mysql /var/lib/mysql/mysqlslap/t1*

With the .cfg and .ibd files in place on the destination server, import the tablespace there:

alter table t1 import tablespace;

As intended, the encryption prevents importing the table:

MariaDB [mysqlslap]> alter table t1 import tablespace;
ERROR 1296 (HY000): Got error 192 'Table encrypted but decryption failed. This could be because correct encryption management plugin is not loaded, used encryption key is not available or encryption method does not match.' from InnoDB

If you set innodb-encrypt-tables=force in /etc/my.cnf, attempting to create a table with encryption=no fails:

create table t3 ( 
`intcol1` int(32) DEFAULT NULL, 
`intcol2` int(32) DEFAULT NULL, 
`charcol1` varchar(128) DEFAULT NULL, 
`charcol2` varchar(128) DEFAULT NULL, 
`charcol3` varchar(128) DEFAULT NULL 
) ENGINE=InnoDB DEFAULT CHARSET=latin1 `encryption_key_id`=1 `encrypted`=no;
ERROR 1005 (HY000): Can't create table `mysqlslap`.`t3` (errno: 140 "Wrong create options")

The error message could be more clear, but the setting would save future create statements from undoing desired encryption set up by a DBA.

Encrypted Binlogs

Binlogs can also be encrypted.

Start by adding this to the my.cnf, and restart the server:

encrypt-binlog = 1
Before encryption, the binlogs look like this:

008dfb0: 494e 5345 5254 2049 4e54 4f20 7431 2056 I NSERT INTO t1 V
008dfc0: 414c 5545 5320 2832 3132 3635 3538 3138 ALUES (212655818
008dfd0: 352c 3737 3332 3733 3731 382c 2759 3838 5,773273718,'Y88
008dfe0: 4e30 3774 6f30 3333 6d32 5845 497a 487a N07to033m2XEIzHz
008dff0: 4d4a 7348 7558 544c 3247 6543 6865 4334 MJsHuXTL2GeCheC4
008e000: 574a 7149 436c 4471 6f6c 3479 634d 7071 WJqIClDqol4ycMpq
008e010: 5a68 374b 3463 5a79 7442 4251 684e 4d42 Zh7K4cZytBBQhNMB
008e020: 6234 4c6e 7161 6457 425a 5366 7649 544c b4LnqadWBZSfvITL
008e030: 7a64 5a77 3536 7571 4835 4771 5466 7477 zdZw56uqH5GqTftw
008e040: 6a36 6a5a 5943 336b 6c4f 4e5a 616c 6d50 j6jZYC3klONZalmP
008e050: 454a 4c4a 5047 4161 4c49 4f27 2c27 6970 EJLJPGAaLIO','ip

After restarting the server with encryption, newly generated binlog files look like this:

011b860: 69c5 cc00 5cb0 1581 0217 2d3f 728c 77ff i...\.....-?r.w.
011b870: a6ca e6e3 a041 0f26 ee39 c398 eecd 4df9 .....A.&.9....M.
011b880: 5bef 53e0 bf0a 96bd 7b61 bfcc c074 6151 [.S.....{a...taQ
011b890: 208b 63fc 4efd ee91 b2bc 0a90 1009 76a1 .c.N.........v.
011b8a0: bf18 84e3 f444 82a1 e674 b44b 7754 2cc9 .....D...t.KwT,.
011b8b0: b63f 946c 821d 222a ae57 a251 451c 8332 .?.l.."*.W.QE..2
011b8c0: d030 1c5f 3997 db77 96f1 4da5 a03e 55a9 .0._9..w..M..>U.
011b8d0: a882 3980 f81f 9fa9 7b45 27c1 2f51 34ad ..9.....{E'./Q4.
011b8e0: b8bf e5e6 4b1e 6732 11a1 1b00 0000 c049 ....K.g2.......I
011b8f0: b2a9 ad08 ed95 4c5c 5541 05b4 a256 14d3 ......L\UA...V..
011b900: 045b e74f 2526 0000 009f 921c 1482 d621 .[.O%&.........!

Note that also you can’t use mysqlbinlog on encrypted binlogs:

mysqlbinlog /var/lib/mysql/maria101-bin.000006
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
# at 4
#160213 10:49:27 server id 1 end_log_pos 249 Start: binlog v 4, server v 10.1.11-MariaDB-log created 160213 10:49:27
# at 249
# Encryption scheme: 1, key_version: 1, nonce: 4caf0fe45894f796a234a764
# The rest of the binlog is encrypted!
# at 285
/*!50521 SET skip_replication=1*//*!*/;
#620308 22:02:57 server id 3337593713 end_log_pos 2396907567 Ignorable
# Ignorable event type 118 (Unknown)
# at 324
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 42, event_type: 204
ERROR: Could not read entry at offset 366: Error in log format or read error.
# End of log file
ROLLBACK /* added by mysqlbinlog */;

As a test of replication, encrypted binlogs were sent from an encrypted master to an unencrypted slave. The master had encrypted binlogs, but the slave had no encryption plugin, configs, or keys.

Nothing special in the replication setup, and replication did not break. No issues were detected in multi-master replication with replication filtering. Also, if the slave is set up for encryption, the encryption key in use on the slave does not need to be identical to that of the key that is in use on the master.

Of special note for security: while the master’s binlogs were encrypted, the slave’s relay logs were not. Change statements on an unencrypted slave are easily viewed at the file level or by running mysqlbinlog on the relay logs. Watch those user permissions! Relay logs on the slave can be encrypted by enabling the ‘encrypt-binlog’ setting on the slave, provided the slave has the plugin installed.

  1. Binlog encryption prevents viewing change statements in raw format or via mysqlbinlog.
  2. Replication from an encrypted master to a nonencrypted slave works. Note that the relay logs on the nonencrypted slave make the change statements visible.
  3. Encrypting a table prevents copying the tablespace to another server.
  4. Once implemented, the steps to remove encryption from multiple tables in your schema require careful planning. It is recommended that you test this feature carefully before implementing it in production.
Categories: DBA Blogs

Log Buffer #461: A Carnival of the Vanities for DBAs

Thu, 2016-02-18 14:08

This Log Buffer browses through Oracle, SQL Server and MySQL spaces and brings forth some of the useful blog posts for this week.


Oracle:

Connor throws it out of the park. As we all (hopefully) know, we should always deploy named program units (procedures/functions) within packages. It’s a great method of encapsulation of logic.

Pythian’s Gleb talks about RAC on Azure. Microsoft Azure provides an acceptable and affordable platform for a training environment.

There are some performance improvements that require physical storage options to be set on tables or indexes. One particular technique that I will take as an example for this article is index compression.

To call Oracle Reports from Oracle Forms application, the most secure approach is to use the RUN_REPORT_OBJECT built-in.

Introducing Asset Physical Inventory with PeopleSoft Asset Tracking

SQL Server:

Amazon Web Services (AWS) : Relational Database Services (RDS) for SQL Server

SSIS: Case sensitivity may expose issues with change deployment

Foreign Key Indexes and Binding

In SQL Server 2016 we have new compression functions to compress and uncompress specific data.

Database Continuous Integration


MySQL:

VividCortex is all about optimizing your queries. Many of our users asked us to analyse their queries, searching for common errors and mistakes.

RocksDB vs the world for loading Linkbench tables

Measuring Docker IO overhead

MariaDB 5.5.48 and Connector/J 1.3.5 now available

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

Categories: DBA Blogs

The Little Big Company With The Huge Heart

Wed, 2016-02-17 10:26

February is a drag. Maybe like me you have already experienced profound failure on following-up with New Year’s Resolutions. Instead of “taking yoga” I ended up “severely injuring my back”. Ah, c’est la vie. Sometimes it’s harder than other times to be positive, and when it’s hard to be positive it’s even harder to be grateful (another one of my New Year’s Resolutions). The good news (or bad news if you’re making excuses for your negativity) is you can practice gratitude from wherever you are, in whichever state of mind or body you’re in.

Pythian exists because of the people who work here. This isn’t a commentary on the Company’s success, the Company’s Brand or Culture, but rather the actual employees who make this place a great place to be, by doing one thing: practicing kindness.

This building is full of people who are going out of their way on a daily basis to:

a) be generous with their time (and their baking)

b) offer support to a colleague in need

c) tell a joke, take a joke, share a laugh

d) provide you with directions when you inevitably become confused and disoriented on the 5th or 6th floor.

Thankfully I have the life and work experience to know that this is not the norm. Something special is happening here. People are supporting each other, providing personal and professional insights, and I must say that while technical genius is the grease that allows the wheels to spin, there is a ton of emotional intelligence being exchanged by a ton of people, all the time.

If you wouldn’t characterize your experience here in the same light I encourage you to wander around. Get lost looking for finance and smile with the Accountant who keeps chocolate in his desk. Tell someone in L&D about a new hobby you’ve taken up and she might just tell you she’s proud of you. Stumble into the IT room and ask an inane question about your new cell phone that you can barely turn on, it will be met with patience. Tell our CRO a joke, the driest you’ve got, or a dad-joke, he’ll love it.

Talk to our CEO about heavy metal, art, poetry, books (highly recommended points of interest if like me you started here not ever having heard of “Unix”). Ask our VP of Transformation where the printer is, seriously he told me three times and I still don’t know so please also let me know when you know. Tell one of the DBAs that you’re learning French and then avoid them for the next month or so because you’re too nervous to say anything beyond “je ne comprends pas”.

I’ve been given hugs, have given out hugs, one of my colleagues actually made me lunch a few times when I was moving places and my life was in disarray. There are a lot of smiles here and they are contagious.

Happy to be here with all of you. Thanks for being kind.

Categories: DBA Blogs

How to build a Multi-domain cluster with AlwaysOn Availability Groups – SQL Server 2016

Wed, 2016-02-17 09:30

SQL Server 2016 is making lots of news, and promises to be a revolutionary SQL Server version. Speaking of AlwaysOn Availability Groups, a good number of improvements have already been announced, among them Basic Availability Groups. My friend Hemantgiri, a former Data Platform MVP, has already covered these in his post.

One of the improvements that got my attention was the ability to set up AlwaysOn Availability Groups on a multi-domain cluster, or even on a domain-detached cluster. I tested both, and they indeed work very well, but the maintenance is slightly different (read: more difficult).

We need to consider the following points:

  • The Failover Cluster Manager does not support this; everything must be done using PowerShell.
  • To perform the setup, you need to be connected as a local Administrator.
    • The “Administrator” user password should be the same on all involved nodes.
  • For some tasks, like SQL Server upgrade/update, you will need to run the setup using the local administrator account.
    • You don’t need to use the Administrator account to log in: with your regular account, hold the SHIFT key while you right-click the file, and select the Run as different user menu item.
  • You must be using Windows Server 2016 and SQL Server 2016.

The new capability is a joint effort between the SQL Server and Windows development teams and is, in my opinion, a very good option. I see the multi-domain capability as a better option than a domain-detached one, and I already have customers who can benefit very well from it.

The approach is the same whether you are building a multi-domain or a domain-detached cluster; however, each has some particular details:

For Domain-detached cluster

In this particular case, the Failover Cluster is created without any associated computer objects, and for this reason, each involved node needs to have a defined primary DNS suffix.


For Multi-domain cluster

Make sure that you can reach all the nodes, using IP, server name, and FQDN. Take care with firewall rules as this may influence the cluster creation. Make sure that the trust relationship between the domains is well defined.
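As a sketch of that reachability check, the loop below resolves and pings each node by FQDN. The node names are the ones from the lab cluster created later in this post; substitute your own.

```powershell
# Hypothetical pre-flight check; node names come from the lab setup below.
$nodes = 'W2016SRV06.SQLCAFE.local',
         'W2016SRV07.LISBON.local',
         'W2016SRV08.SQLCAFE.local'

foreach ($node in $nodes) {
    # Resolves the FQDN and performs a basic reachability test
    Test-NetConnection -ComputerName $node |
        Select-Object ComputerName, RemoteAddress, PingSucceeded
}
```

If name resolution or the ping fails for any node, fix DNS, the trust relationship, or the firewall rules before attempting to create the cluster.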


Creating the cluster

In my case, I created a multi-domain cluster, which was also a multi subnet one. The following command is the base to create the cluster:

New-Cluster -Name <Cluster Name> -Node <Nodes to Cluster> -StaticAddress <Cluster IP> -AdministrativeAccessPoint DNS

The catch here is the -AdministrativeAccessPoint DNS. Check the PowerShell New-Cluster command for more options.

To perform this command, open the PowerShell console as Administrator (you can either log in as local Administrator or open the console using the method that I explained earlier in this article), and execute the command. That’s it!

In my test laboratory, I created a cluster containing three nodes:

  • W2016SRV06.SQLCAFE.local
  • W2016SRV07.LISBON.local
  • W2016SRV08.SQLCAFE.local

I executed the following command:

New-Cluster -Name W2016CLT01 -Node W2016SRV06.SQLCAFE.local, W2016SRV07.LISBON.local, W2016SRV08.SQLCAFE.local -StaticAddress, -NoStorage -AdministrativeAccessPoint Dns


After the cluster is created, use PowerShell to perform all the administrative tasks. The following commands are useful:

  • Get-Cluster
  • Get-ClusterNode
  • Get-ClusterGroup
  • Get-ClusterGroup <Group Name> | Get-ClusterResource


Creating an Availability Group

The process of creating the availability group is the same as usual. Luckily, SQL Server Management Studio has everything we need to manage the AG, without the need to touch PowerShell; for basic management, SSMS is enough.

After the cluster is created, you can go ahead and enable the AlwaysOn Availability Groups feature. To do this, open SQL Server Configuration Manager and access the SQL Server engine properties. In the AlwaysOn High Availability tab, select the Enable AlwaysOn Availability Groups check box. Click OK and restart the SQL Server engine.


After enabling the feature, just follow the usual procedure and use the wizard in the SSMS, T-SQL, or PowerShell, to create the Availability Group.
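As a hedged T-SQL sketch of what the wizard produces: in a multi-domain setup the endpoint URLs must use FQDNs so each replica can resolve the others across domains. The AG name, database name, and port below are hypothetical; the server names come from my lab.

```sql
-- Hypothetical names; note the FQDN endpoint URLs spanning both domains
CREATE AVAILABILITY GROUP [AG_Lab]
FOR DATABASE [DemoDB]
REPLICA ON
    N'W2016SRV06' WITH (
        ENDPOINT_URL      = N'TCP://W2016SRV06.SQLCAFE.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC),
    N'W2016SRV07' WITH (
        ENDPOINT_URL      = N'TCP://W2016SRV07.LISBON.local:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = MANUAL);
```

The key detail is the ENDPOINT_URL values: a short server name would only resolve within its own domain.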


More Details

As this is a complex configuration, careful DNS and quorum configuration is recommended to ensure the highest service availability, and DNS replication might be needed. For the quorum, the new Cloud Witness (supported from Windows Server 2016 onwards) is recommended.
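Configuring the Cloud Witness is a PowerShell one-liner; the storage account name and access key below are placeholders, and the cluster name is the one from my lab.

```powershell
# Hypothetical Azure storage account; it must be reachable from all nodes
# (Cloud Witness is new in Windows Server 2016)
Set-ClusterQuorum -Cluster W2016CLT01 `
    -CloudWitness `
    -AccountName 'mystorageaccount' `
    -AccessKey '<storage-account-access-key>'
```

A Cloud Witness is attractive here because it does not belong to any of the domains involved, sidestepping cross-domain permission issues for the witness resource.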



When installed on Windows Server 2016, SQL Server 2016 gains more flexibility, including DTC support for AGs and Cloud Witness. Multi-domain and domain-detached clusters increase the architecture options when using SQL Server. This is just one of the improvements coming with SQL Server 2016. Stay tuned for more news!


Categories: DBA Blogs

Pythian-led Events: Embracing Disruptive Technologies

Tue, 2016-02-16 14:24

Today’s senior IT professionals are expected to be experts in everything from emerging technologies to new regulations and business innovation. Keeping up with the latest advances in technology, and understanding how they impact your business, can be challenging.

The impact of today’s disruptive technologies on how we do business led Pythian to introduce Velocity of Innovation, a series of events for senior IT professionals. These are Pythian’s exclusive thought-leadership events, where a moderator-led panel engages in lively conversations around today’s disruptive technologies.

From our Velocity of Innovation series, to CIO summits and CIO forums, Pythian is leading the conversation around technologies that can transform business. Pythian focuses on creating positive outcomes for our clients, such as better, faster software development; high-performing, cost-effective systems; data-driven insights; and the adoption of disruptive technologies for competitive advantage.

Pythian will continue to lead conversations around disruptive technologies, bringing our vendor-agnostic approach to a range of IT challenges. On March 16, Pythian will be participating in the Global Business Intelligence CIO UK Event.

This event brings senior IT professionals together for one day to focus on topics ranging from digital transformation to cyber security.


If you are interested in attending an upcoming Velocity of Innovation event in a city near you, please contact

Categories: DBA Blogs

Step-by-Step Guide to January 2016 PSU Patch Apply on 12c Grid and RDBMS Homes in Linux

Fri, 2016-02-12 07:52

The following step-by-step action plan is for a single-instance database stored on ASM, on Linux (OEL 6, 64-bit in this case).

Step 1: Update the OPATCH utility


For Database home:


$ unzip -d /u01/app/oracle/product/12.1.0/db_1

$ /u01/app/oracle/product/12.1.0/db_1/OPatch/opatch version


For Grid home:


$ unzip -d /u01/app/oracle/

$ /u01/app/oracle/ version

ETA: 15 min

Step 2: Create ocm.rsp file


Note: Press the Enter/Return key without providing any input, and answer Yes.


$ export ORACLE_HOME=/u01/app/oracle/

$ $ORACLE_HOME/OPatch/ocm/bin/emocmrsp -no_banner -output /stage/ocm.rsp

ETA: 5 min

Step 3: Validation of Oracle Inventory


Before beginning the patch application, check the consistency of the inventory information for the GI home and each database home to be patched. Run the following command as the respective Oracle home owner to check the consistency.


For database home:


$ /u01/app/oracle/product/12.1.0/db_1/OPatch/opatch lsinventory -detail -oh /u01/app/oracle/product/12.1.0/db_1


For Grid home:


$ /u01/app/oracle/ lsinventory -detail -oh /u01/app/oracle/


If this command succeeds, it lists the Oracle components that are installed in the home. Save the output so you have the status prior to the patch apply.

ETA: 5 min

Step 4: Stage the Patch


$ mkdir /stage/PSUpatch

$ cp /stage/ /stage/PSUpatch


Check that the directory is empty.

$ cd /stage/PSUpatch

$ ls


Unzip the patch as grid home owner.


$ unzip p22191349_121020_<platform>.zip

ETA: 5 min

Step 5: One-off Patch Conflict Detection and Resolution


Run the following as the root user:


/u01/app/oracle/ apply /stage/PSUpatch/22191349 -analyze -ocmrf /stage/ocm.rsp


It may ask you to roll back identical patches, like this:


Analyzing patch(es) on “/u01/app/oracle/” …

Patch “/stage/PSUpatch/22191349/21436941” is already installed on “/u01/app/oracle/”. Please rollback the existing identical patch first.

Patch “/stage/PSUpatch/22191349/21948341” is already installed on “/u01/app/oracle/”. Please rollback the existing identical patch first.

Patch “/stage/PSUpatch/22191349/21948344” is already installed on “/u01/app/oracle/”. Please rollback the existing identical patch first.

Patch “/stage/PSUpatch/22191349/21948354” is already installed on “/u01/app/oracle/”. Please rollback the existing identical patch first.


So first roll back the above four patches by issuing the following as the grid home owner, from the grid home:


opatch rollback -id 21948354 -local -oh /u01/app/oracle/ (Repeat for all 4 patches)


Note: In some cases, oddly, I had to shut down the HAS services as the root user before the patch rollback, using:


/u01/app/oracle/ stop has -f


After this, run the analyze command again:


/u01/app/oracle/ apply /stage/PSUpatch/22191349 -analyze -ocmrf /stage/ocm.rsp


If the analyze command fails, use the following as the root user:


$ORA_GI_HOME/crs/install/ -postpatch


It will start the HAS services too.


Then run the analyze command again, as given above:


It will show something like:


Analyzing patch(es) on “/u01/app/oracle/” …

Patch “/stage/PSUpatch/22191349/21436941” successfully analyzed on “/u01/app/oracle/” for apply.

Patch “/stage/PSUpatch/22191349/21948341” successfully analyzed on “/u01/app/oracle/” for apply.

Patch “/stage/PSUpatch/22191349/21948344” successfully analyzed on “/u01/app/oracle/” for apply.

Patch “/stage/PSUpatch/22191349/21948354” successfully analyzed on “/u01/app/oracle/” for apply.


Now you are good to apply the patch. Proceed to the next step.




ETA: 10 min

Step 6: Apply the Patch

Note: This should apply the patch in both the GI and RDBMS homes, but it is unreliable in that respect, so after it completes we need to check opatch lsinventory to make sure the patch was also applied in the RDBMS home.


As root user, execute the following command:


# /u01/app/oracle/ apply /stage/PSUpatch/22191349 -ocmrf /stage/ocm.rsp


If it does not apply the patch in the RDBMS home, then run:


/u01/app/oracle/product/12.1.0/db_1/OPatch/opatchauto apply /stage/PSUpatch/22191349 -oh /u01/app/oracle/product/12.1.0/db_1 -ocmrf /stage/ocm.rsp


Make sure the above applies both the OCW and PSU patches; you can verify this with opatch lsinventory. If only the OCW patch is present in the output and the PSU is not (which is likely the case), shut down the database and issue the following from the Oracle home, as the Oracle database owner:


/u01/app/oracle/product/12.1.0/db_1/OPatch/opatch apply -oh /u01/app/oracle/product/12.1.0/db_1 -local /stage/PSUpatch/22191349/21948354

ETA: 60 min

Step 7: Loading Modified SQL Files into the Database


% sqlplus /nolog

SQL> Connect / as sysdba

SQL> startup

SQL> quit

% cd $ORACLE_HOME/OPatch

% ./datapatch -verbose

ETA: 60 min

Step 8: Check for the list of patches applied to the database


SQL> select action_time, patch_id, patch_uid, version, status, bundle_series, description from dba_registry_sqlpatch;

ETA: 5 min

Categories: DBA Blogs