Skip navigation.

Pythian Group

Syndicate content
Love Your Data
Updated: 4 hours 18 min ago

Securing Oracle Monitoring and Backup Scripts

Fri, 2016-01-15 14:41

Almost every DBA writes and uses various custom scripts to monitor and backup their Oracle databases. However, finding the optimal and most secure way to connect to the database is often not prioritized.

The short summary is that having your script put the username/password (credentials) in any sort of variables or command arguments is the “bad way”. Using an “Oracle External Password Store” (SEPS) or Oracle Wallet is the “better way”. Yet this technology which has been around since Oracle 10gR2 and which does not require the Advanced Security Option is often not used.


Common Mistakes

Many DBAs will store credentials in their actual scripts. Sometimes obfuscating the actual password through some custom mechanism or script. For example, the following is a simplified version of an RMAN backup script found at a client site:

export DB_USER=backup_user
export DB_PASS=`~/.secure_pwd_extractor`

$ORACLE_HOME/bin/rman << EOF
   connect target $DB_USER/$DB_PASS
   shutdown immediate
   startup mount
   backup database;
   alter database open;


The client thought that it was somewhat secure as the actual password wasn’t used as a command line argument to RMAN and was stored in a Linux “hidden file” (starts with a period), which was protected by properly setting OS permissions. However, it dynamically extracted the password from a plain text file (based on the DB_USER environment variable). Another key problem was the fact that the environment variable was exported and hence was part of the environmental profile under which the database was started.

The exported environment variables in this case can be a little bit of a security risk in a couple of ways:

First of all, the complete operating environment including the exported environment variables under which the database is running are recorded by the listener when service is registered. Hence, they are visible in a listener “services” command with “set displaymode verbose“:


Secondly, they may be recorded in OS process files. For example, the pmon process’ operating environment or even the RMAN process’ while running:



But most significantly the credentials can be extracted by anyone with access to the script file and/or the underlying credentials file.


A Better Approach

A better way to store database credentials for monitoring or backup scripts is to use a “Secure External Password Store” (SEPS) which relies on having the necessary credentials securely stored in an Oracle Wallet file.

Typically a DBA might create their own plain text password listing file for use by scripts and batch jobs. Usually with three columns: 1) DB_NAME; 2) DB_USERNAME; 3) DB_PASSWORD. The Oracle Wallet is structured exactly the same way except:

  1. The file is protected with 3DES encryption.
  2. The DB_NAME is really an OracleNET Service Name meaning you can have multiple aliases for the same database.
  3. The passwords are never exposed.
  4. A separate Wallet password is required to manipulate the file’s contents.
  5. Control on whether the Oracle Wallet file is tied to the local machine or whether it can be copied to and used on other machines.


The advantages of this approach include:

  • No clear text password in any scripts or files.
  • No possible exposure of passwords by the listener or process operating environments.
  • Control on whether the Oracle Wallet file can be copied and used on another machine.


The last point is actually a complex one. A Wallet can be created as an “auto_login” wallet (done by default). To secure it to only work on the local server, it can be changed to “auto_login_local“. However, there are various issues, limitations, and 12c bugs with the additional functionality that Oracle provides. A separate article goes into this in detail.



Setting up a “Secure External Password Store” and Oracle Wallet is actually quite quick and easy:

1) Adjust the sqlnet.ora file to point to an Oracle Wallet location. For example, add the following to the sqlnet.ora file (assuming that the specified directory exists):

   (SOURCE =
     (METHOD = FILE)
       (DIRECTORY = /u01/app/oracle/wallet)


2) Create the Oracle Wallet files and add a credential. Two files will actually be created in the specified directory:

  • ewallet.p12 – the actual password protected Wallet file.
  • cwallet.sso – an additional file for auto-login credentials.


This can be done as either two separate commands or all in a single command:

$ mkstore -wrl "/u01/app/oracle/wallet" -create
Oracle Secret Store Tool : Version
Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.

Enter password:
Enter password again:

$ mkstore -wrl "/u01/app/oracle/wallet" -createCredential ORCL scott
Oracle Secret Store Tool : Version
Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.

Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Enter wallet password:
Create credential


Or as a single command:

$ mkstore -wrl "/u01/app/oracle/wallet" -create -createCredential ORCL scott
Oracle Secret Store Tool : Version
Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.

Enter password:
Enter password again:
Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Create credential


Notice that the Wallet is secured by a password. And then the SCOTT credentials are stored within the Wallet. The Wallet password is required to manipulate contents – not for scripts to access the stored credentials.

The first parameter after the “-createCredential” argument is an OracleNET Service Name. Just like with any database connection, here we can specify an OracleNET Service Name (from the tnsnames.ora file), or a full connection string, or an EZconnect string.

Hence, we could add a second and third connection to the same database as:

$ mkstore -wrl "/u01/app/oracle/wallet" -createCredential localhost:1521/ORCL monitoring_user
Oracle Secret Store Tool : Version
Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.

Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Enter wallet password:
Create credential

$ mkstore -wrl "/u01/app/oracle/wallet" -createCredential batch_reporting
Oracle Secret Store Tool : Version
Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.

Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Enter wallet password:
Create credential



And to list the contents of the Oracle Wallet:

$ mkstore -wrl "/u01/app/oracle/wallet" -listCredential
Oracle Secret Store Tool : Version
Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved.

Enter wallet password:
List credential (index: connect_string username)
3: batch_reporting
2: localhost:1521/ORCL monitoring_user
1: ORCL scott


Now any of the three can be used (from the same OS account: “oracle”) depending on which OracleNET Service Name is referenced:

$ sqlplus /@ORCL

SQL*Plus: Release Production on Wed Jan 13 08:59:12 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Wed Jan 13 2016 08:56:56 -07:00

Connected to:
Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show user
$ sqlplus /@localhost:1521/ORCL

SQL*Plus: Release Production on Wed Jan 13 08:59:41 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Wed Jan 13 2016 08:57:25 -07:00

Connected to:
Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show user

$ sqlplus /@

SQL*Plus: Release Production on Wed Jan 13 09:00:14 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Wed Jan 13 2016 08:43:44 -07:00

Connected to:
Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show user
$ sqlplus / as sysdba

SQL*Plus: Release Production on Wed Jan 13 09:01:12 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show user


However, one might challenge the fact that the Oracle Wallet file itself that must be secured just as a plain text password file would need to be and that the risks of either being obtained by a “bad actor” are the same. That being said, there are still some benefits of the SEPS method:

  • Passwords are never maintained in plain text in scripts or (hidden) password files.
  • No risk of password exposure at the operating system process list.
  • No risk of password exposure in operating system process environmental files.
  • No risk of exposure from any bad actor with access to the script.
  • No risk of password exposure in OS backups maintained by sysadmins or backup vendors (though the files themselves may still be usable).

Regardless, using the SEPS and Oracle Wallet shouldn’t make anything less secure. One could argue that security and risk is equal, but definitely not worse.

However they are a few operational disadvantages:

  • The OracleNET Service Name forms the primary key of the entries in the Wallet and hence must be unique. So if another user credential is required for the same DB, an OracleNET alias will be required (as trying to add another user using the same OracleNET Service Name will generate the “Credential already exists” error based on the OracleNET Service Name not being unique).
  • Doesn’t work after connecting to SQLPLUS using the “CONNECT” command.
  • Scripts are now dependent on listener availability for establishing a TNS connection instead of a BEQ. Though a workaround may be to connect directly to a DISPATCHER port.


Advantages Over OS Authentication

Using a SEPS and an Oracle Wallet may seem functionally similar to just using “OS Authentication”. However it does pose a few differences or advantages.

Most DBAs operate and implement scripts under the “Oracle software owner” account which is typically called “oracle” on Unix or Linux systems. And hence most are able to connect to the database using a SYSDBA connection. So one solution would be to use a dedicated OS user account specifically for monitoring or database backup scripts. And then rely on OS authentication for database connections. However this is often not done. And if running scripts from a remote server or centralized monitoring server then the REMOTE_OS_AUTHENT=TRUE parameter would have to be set which poses other security risks.

Instead, using a SEPS allows for non-OS authenticated connections to a dedicated and minimally privileged database account even from the software owner (Oracle) account locally, or from any other account remotely.


Other Questions

Q: Do I need to re-create my Wallet file as part of a 12c upgrade?
A: NO.

Q: Do I need to backup the Wallet files?
A: Absolutely. Just back them up as you would other Oracle environmental files, such as the sqlnet.ora. Otherwise they’ll need to be re-created.

Q: Is SEPS compatible with “Proxy Authenticated Connections”?
A: YES. See Proxy Authenticated Database connections

Q: Can the Wallet be used with RMAN?
A: YES. It definitely can and should be used by RMAN scripts.

Q: Can the Wallet be used with JDBC thin or ODBC connections?
A: YES to both. See MOS documents 1441745.1 and 1430666.1 for further details.

Q: Are SEPS and Wallet connections compatible with the CDB architecture and PDBs?
A: Of course. PDBs connect through Oracle Services and OracleNET Service Names.

Q: Can we tell from within the database whether a connected session referenced an Oracle Wallet and used SEPS?
A: NO. There doesn’t seem to be any indication from within the database. As far as the DB is concerned, it just seems to be a password authenticated connection. A SQLNET trace shows that the Wallet files are accessed but doesn’t transpose that information into any DB metric or SYS_CONTEXT USERENV data.


Preventing the Secure Password Store Wallet File from Being Moved to Another Host

Oracle suggests that we can add additional security by tying the Oracle Wallet file to a specific host. See MOS document 1114599.1. However, this poses some restrictions and bugs with specifically. As this discussion is complex, this follow-up article has been created.

A good workaround (and a key point from the follow-up article) is to simply use an EZconnect connection in the Oracle Wallet file and to specify localhost or


Summarizing Best Practices
  • Continue to properly secure the directory and file permissions of the Wallet files as you would a plain text password file. Further, why not make the entire wallet directory a hidden directory starting with a period. Remember that If the wallet file can be copied to another server then potentially the credentials within it can continue to be used. See the follow-up article for suggested techniques for securing access further.
  • Have scripts connect to the database using dedicated purpose based user accounts with minimal privileges (don’t use SYS or SYSTEM).
  • Use OracleNET Service Name aliases (not duplicate entries, but aliases to an existing entry) in the tnsnames.ora file to allow multiple credentials for the same database.
  • For additional security add Wallet entries based on EZconnect strings using localhost or instead of relying on OracleNET Service Names from the tnsnames.ora file.


Discover more about our expertise in the world of Oracle.

Categories: DBA Blogs

Log Buffer #457: A Carnival of the Vanities for DBAs

Fri, 2016-01-15 14:19

This Log Buffer Edition begins with some great blog posts from Oracle, goes through SQL Server and then ends with MySQL.


  • Ruby-oci8 is a ruby interface for an Oracle Database.
  • Another python graph – one wait event.
  • This article compares FBL and HDL – two of the commonly used data loading tools in Fusion HCM to highlight key differences and similarities.
  • Better Data Modeling: Customizing Oracle Sql Developer Data Modeler (#SQLDevModeler) to Support Custom Data Types.
  • Sample code: Oracle Grid Infrastructure action script for Windows.

SQL Server:

  • Being a database administrator can be very challenging at times when you have to troubleshoot performance issues.
  • Another Reason to Use NOEXPAND hints in Enterprise Edition.
  • Error: Microsoft .NET framework 3.5 service pack 1 is Required.
  • Removing Duplicates from Strings in SQL Server.
  • .NET Core is more interesting than the name might suggest. Whereas the .NET framework provides a consistent runtime for all the applications on a machine.


  • OpenSSH CVE-2016-0777: Details and Mitigation.
  • MySQL Group Replication for MySQL 5.7.10.
  • MySQL 5.7 auto-generated root password.
  • MySQL Support People – Those Who Were There First.
  • Planning the defaults for MySQL 5.8.


Learn more about Pythian’s expertise in Oracle SQL Server & MySQL.

Categories: DBA Blogs

Configure High Availability – Load Balancing for Hiveserver2

Tue, 2016-01-12 15:21
We are noticing a steady adoption of Hive adoption among our customers. Due to the increased workload we want to make sure that performance and availability of Hive is not compromised. The following steps will ensure the smooth functioning of Hive under increased workloads.

Configuring High Availability for Hive requires the following components to be fail proof:

1. Hive Metastore underlying RDBMS
2. Zookeeper
3. Hive Metastore Server
4. Hiveserver2

For the sake of simplicity this blog will focus on enabling HA for the Hive Metastore Server and HiveServer2. We recommend that the underlying Hive Metastore underlying RDBMS be configured for High Availability and we have configured multiple Zookeeper instances on the current cluster.

Enabling High Availability for Hive Metastore Server1. Log on to Cloudera Manager2. Click on HIVE > Hive Metastore Server. Locate the host for the Hive Metastore Server.


3. SSH to Hive Metastore Server.# vi /etc/hive/conf.cloudera.hive/hive-site.xmlExpected Output below.


4. On the Cloudera Manager Console click Hive > Configuration

Select Scope > Hive Metastore Server.
Select Category > Advanced.
Locate the Hive Metastore Delegation Token Store property.
Choose org.apache.hadoop.hive.thrift.DBTokenStore
Click Save Changes.


5. On the Cloudera Manager Console click Hive > Instances. Click on Add Role Instances.

Click on Select Hosts for Hive Metastore Server.


6. Choose multiple Hosts (at least 2 more to make a total of 3) to configure Hive Metastore Server on.

Click OK and Continue.


7. Click Finish. You should now see new hosts added as the Hive Metastore Server.Click on Restart the service (or the instance) for the changes to take effect.


8. Notice that hive.metastore.uris now has multiple instances of Hive Metastore Server.

Click on Restart Stale Service.


9. Click Restart Now.


10. Review Restart Messages.


11. Notice that you now have multiple instances of Hive Metastore Server.


12. SSH again to Hive Metastore Server.# vi /etc/hive/conf.cloudera.hive/hive-site.xmlExpected Output below. Note that new instances have been added.


So how do you know the settings are working? The following is the recommended plan for testing the High Availability of Hive MetaStore.1. SSH to any DataNode. Connect to Hiveserver2 using Beeline.

# beeline -u “jdbc:hive2://ip-10-7-176-204.ec2.internal:10000”


2. On the Cloudera Manager Console click Hive > Hive MetaStore Server.Stop the first Hive MetaStore Server in the list.


Issue “Show databases” command in the beeline shell of step 1. The command should work normally.

3. Stop the second Hive Metastore Server in the list. Issue Show databases command in the beeline shell of step 1.The command should still work normally.

4. Stop the third Hive Metastore Server in the list. Issue Show databases command in the beeline shell of step 1.This command should fail which is normal.


Expected Output from beeline below.


5. Now start a random Hive Metastore Server in the list. Issue Show databases command in the beeline shell of step 1.This command should start working normally again.

6. After testing it completed make sure you start all Hive Metastore Servers in the list.

Enabling Load Balancing and High Availability for Hiveserver2To provide high availability and load balancing for HiveServer2, Hive provides a function called dynamic service discovery where multiple HiveServer2 instances can register themselves with Zookeeper. Instead of connecting to a specific HiveServer2 directly, clients connect to Zookeeper which returns a randomly selected registered HiveServer2 instance.

1. Log on to Cloudera Manager.Click Hive > Instances. Click on Add Role Instances.

Click on Select Hosts for HiveServer2.


2. Choose multiple Hosts (at least 2 more to make a total of 3) to configure HiveServer2 on.

Click OK and Continue.


3. You should now see new hosts added as HiveServer2.

Choose the newly added instances and Choose Start.


4. Click on Close. The newly added HiveServer2 instances are now ready for use.


5. Open Hive -> Configuration -> Category -> Advanced.Find “HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml”
Add a new property as below:

Value: true


6. Go to the Cloudera Manager Home Page and Restart Hive Service.


7. You should now have multiple instances of HiveServer2.


So how do you know the settings are working? Following is the recommended plan for testing the Load Balancing for Hiveserver2.

1. As mentioned before HiveServer2 High Availability is managed through Zookeeper.


The clients connecting to HiveServer2 now go through Zookeeper. An example, JDBC connect string is as follows. Notice that the JDBC now points to a list of nodes that have Zookeeper on them.

beeline -u “jdbc:hive2://ip-10-7-176-204.ec2.internal:2181,ip-10-229-16-131.ec2.internal:2181,ip-10-179-159-209.ec2.internal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2”

2. SSH to any data node. Connect to Hiveserver2 using Beeline.# beeline -u “jdbc:hive2://ip-10-7-176-204.ec2.internal:2181,ip-10-229-16-131.ec2.internal:2181,ip-10-179-159-209.ec2.internal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2”


3. The connection gets routed to the HiveServer2 instances in a round robin fashion.

Issue the following command on the HiveServer2 nodes.

# tail -f /var/log/hive/hadoop-cmf-hive-HIVESERVER2-ip-10-7-176-204.ec2.internal.log.out

Issue the following command on the HiveServer2 nodes.


4. You may issue the beeline command from multiple sources and monitor the HiveServer2 logs.


So how do you know the settings are working? Following is the recommended plan for testing the High Availability for Hiveserver2.

1. On the Cloudera Manager Console click Hive > HiveServer2.Stop the first HiveServer2 in the list


Connection to Beeline using command below should work normally.

# beeline -u “jdbc:hive2://ip-10-7-176-204.ec2.internal:2181,ip-10-229-16-131.ec2.internal:2181,ip-10-179-159-209.ec2.internal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2”

2. Stop the second HiveServer2 in the list


Connection to Beeline using command below should still work normally.

3. Stop the third HiveServer2 in the list.


Connection to Beeline using command below should fail.


4. Start the third HiveServer2 in the list.


Connection to Beeline using command below should work normally again.

5. After the testing completes make sure you start all HiveServer2 in the list.


Discover more about our expertise in Big Data and Hadoop.

Categories: DBA Blogs

Log Buffer #456: A Carnival of the Vanities for DBAs

Tue, 2016-01-12 07:38

This Log Buffer Edition covers many aspects discussed this week in the realms of Oracle, SQL Server and MySQL.


  • Oracle and Informatica have a very close working relationship and one of the recent results of this collaboration is the joint project done by Informatica and our Oracle ISV Engineering team to test the performance of Informatica software with Oracle Database 12c In-memory on Oracle SPARC systems.
  • The only thing you can do easily is be wrong, and that’s hardly worth the effort.
  • Enterprise Manager 13c: What’s New in Database Lifecycle Management.
  • SnoopEE is a very interesting grass roots open source Java EE ecosystem project. Akin to NetFlixOSS Eureka it enables microservices discovery, lookup and registration.
  • Docker is an open source container technology that became immensely popular in 2014. Docker itself is written in Google’s programming language “Go” and supported on all major Linux distributions (RedHat, CentOS, Oracle Linux, Ubuntu etc.).

SQL Server:

  • This blog helps you understand Graphical Execution Plans in SQL Server.
  • New DAX functions in SQL Server 2016.
  • JSON support in SQL Server 2016.
  • PowerShell Tool Time: Building Help.
  • Datetime vs. Datetime2.


  • Peter Gulutzan discusses SQL qualified names.
  • It is not new that we can store a JSON content in a normal table text field. This has always been the case in the past. But two key features were missing: filtering based on JSON content attributes and indexing of the JSON content.
  • MySQL 5.7 Multi-Source Replication – Automatically Combining Data From Multiple Databases Into One.
  • OOM killer vs. MySQL 5.7.10 epic results with over-allocating memory.
  • Apache Spark with Air ontime performance data.


Learn more about Pythian’s expertise in Oracle SQL Server & MySQL.

Categories: DBA Blogs

Pythian – Through the Eyes of a CO-OP Student

Mon, 2016-01-11 15:32


I worked at Pythian as a Global Talent Acquisition Coordinator (co-op student) within the company’s human resources function in Ottawa, Canada. At the core of my role, I provided direct support to my colleagues in guiding job applicants through our hiring process. I also had the opportunity to take on and create small projects for myself during my four months there.

Since our past technical CO-OPs have written fantastic blogs about their experiences at Pythian (be sure to read them!), how about I write this one about the business side of working there? Here are my top three reasons why any business student would want to work at Pythian:

1. What better place is there to develop cross-cultural literacy?

With Pythian, I had the pleasure of working with remote and international colleagues for the first time. Top that with actively communicating with a global pool of job applicants on a daily basis. Succeeding in this kind of environment definitely requires you to be cross-culturally literate, which means that you understand how cultural differences—both inside and outside an organization—affect a business’s day-to-day practices.

In business school, we are often reminded about the importance of considering the external environment when a firm goes global (CDSTEP, anyone?), so it was quite eye-opening to see how my experience at Pythian really validated my studies. For example, processes that are of no legal concern in Canada might present a huge obstacle when hiring abroad, and pieces of information that North Americans prefer to keep private are quite openly discussed in other cultures. Talking to candidates from around the world definitely taught me how to think more critically about my communication—not only in terms of what I say, but also how I say it.

2. It feels nice not to be just “the CO-OP student”.

Upon my first day, I immediately felt that I would not be looked at as simply an intern. My team greeted me with open arms (already knowing my name!) and repeatedly emphasized the value of having me on board throughout my term. Within days, I could already understand the significance of my tasks and how they contributed not only to the team, but also to the organization as a whole.

Another great thing about not being just “the CO-OP student” is empowerment. At Pythian, you are really treated like a colleague rather than a subordinate. I never worked for my boss, but I always worked with her. During my term, my team enthusiastically invited me to explore our work processes and offer ideas to make things better. It was pretty cool to see my thoughts being listened to and put into action, even after my departure!

3.There’s nothing more rewarding than stepping out of your comfort zone.

One of the things I hated doing most before working at Pythian was using the phone. If you run a quick Google search, you will find many studies showing that Millennials are not as comfortable with making phone calls as their predecessors were—and I could speak to that, 100 percent! Want me to order a pizza? I’ll do it through the restaurant’s online platform. Am I due for a dentist appointment? I’ll ignore the receptionist’s voicemails until she sends me an e-mail (although my telephonophobia might not be the only reason for that one!).

My colleagues helped me overcome this discomfort by having me conduct reference checks. As embarrassing as it might sound, I actually had to take the greatest leap of faith to get my fingers dialling for the first time. Although I certainly had a mini heart-attack whenever I was asked to help with a reference, I eventually eased into the task with time. While I might still shy away from the telephone now and then, I really do feel accomplished in getting more comfortable with using less texting and more talking!

All in all, my experience at Pythian has been nothing less than fantastic. It has truly been a pleasure to work with a diverse group of people from around the world, and I would be thrilled to see my future take me back there one day. If you’re looking to intern for a company with a global focus and an information-sharing, empowering culture, then you would definitely love to join Pythian!

Categories: DBA Blogs

Use Case for SQL Server Table Partitioning

Mon, 2016-01-11 15:31


Often we are asked by our clients about Table Partitioning, and specifically, which tables will be good candidates to be partitioned?
Here are some of the main use cases for Table Partitioning:

  1. You have the Enterprise Edition of SQL Server 2005 or higher.
  2. The table contains or will contain more than 5-6 million rows and growing fast, or its size is growing by around 1GB per month or more.
  3. The FULL backup is taking too long or the backup file is too large and older data is not being updated (i.e.: users can only update data from the last 3 months).
  4. Data needs to be archived or purged on a regular basis, potentially the current archiving or deletion of data is causing blocks or deadlocks to other processes.
  5. There is a NOT NULL date column or another NOT NULL sequential column that the table can be partitioned upon.
  6. Better if most queries are including the partitioned column in WHERE clauses (i.e: between the date range).

When partitioning a table, here are few things to look into:

  1. Create a file and a filegroup per partition (even if the files are created in the same place). This way, it is easy to backup (i.e.: FILEGROUP backup), maintain and archive/purge.
  2. The best way to partition a table is by a date column because data is usually archived or purged by a date. If you do not have such a column, you may want to consider adding a column that will contain the current date/time when the row is created. This column can contain the default GETDATE(). Using an ID or a calculated column may cause too many headaches.
    If the application returns an error when adding the date column, consider using a view on top of the underlying partitioned table.
  3. Partitioning requires maintenance:
    1. To add files, filegroups and partitions on a regular basis and in advance.
    2. Monitor data growth and potentially modify the partition architecture (i.e.: move from a monthly partition to a weekly or daily partition).
    3. Archiving/purging partitions on a regular basis. Consider using a SWITCH partition for quick archiving and purging.
    4. You can defragment indexes per partition.
    5. Remember that statistics cannot be updated by partition, however, you may want to consider FILTERED INDEXES or FILTERED STATISTICS to avoid updating statistics on the entire table, as well as improving performance in specific cases.
  4. Consider using MAXDOP <> 1 on the instance level or for specific queries that span multiple partitions in order to take advantage of parallelism. Configure parallelism with caution.

Additional links:

There is a lot of information around SQL Server Table Partitioning. Here are some of the useful links:

SQL Server Database Partitioning Myths and Truths
Table Partitioning in SQL Server – The Basics
How To Decide if You Should Use Table Partitioning
Query Processing Enhancements on Partitioned Tables and Indexes


Discover more about our expertise in SQL Server.

Categories: DBA Blogs

Oracle RAC Scalability: Slides from Oracle OpenWorld 2015

Fri, 2016-01-08 10:57

Finally getting some time to post the slides from Open World 2015. This is a presentation about Oracle RAC Scalability. The presentation talks about the type of challenges on how to scale an application horizontally via Oracle RAC. Generally speaking – it’s quite easy and mostly “works”. There are only a few very specific, but quite common things to address, and mostly about write-write contention.

Here are the slides:

Christo kutrovsky oracle rac solving common scalability problems from Christo Kutrovsky
Categories: DBA Blogs

How to Deploy a Cluster

Tue, 2016-01-05 12:15


In this blog post I will talk about how to deploy a cluster, the methods I tried and my solution to resolving the prerequisites problem.

I’m fairly new to the big data field. Learning about Hadoop, I kept hearing the term “clusters”, deploying a cluster, and installing some services on namenode, some on datanode and so on. I also heard about Cloudera manager which helps me to deploy services on my cluster, so I set up a VM and followed several tutorials including the Cloudera documentation to install cloudera manager. However, every time I reached the “cluster installation” step my installation failed. I later found out that there are several prerequisites for a Cloudera Manager Installation, which was the reason for the failure to install.


Deploy a Cluster

Though I discuss 3 other methods in detail, ultimately I recommend method 4, provided below.

Method 1:

I ran my very first MapReduce job on “labianchin/hadoop” Docker image. This was easy to use as it comes with Hadoop pre-installed. However it did not contain any other services. I manually installed Hive and Oozie, but soon ran into configuration issues.

Method 2:

A lot of people recommend using Cloudera’s quickstart-VM. Sadly my system did not meet the memory requirements, which led to me to look for other methods.

Method 3:  

Cloudera Manager installation on Google cloud. I will be discussing this method in detail below.

Method 4:

Cloudera Docker image. Released on December 1st, 2015. This is by far the quickest and easiest method to use Cloudera services. All services are pre-installed and pre-configured. I recommend the following by installing a single Google compute engine VM instance, as described below then install Docker, pull image and run the container.


Step-by-Step Guide to Cloudera Manager Installation on Google Cloud (Method 3)

I will be creating a cluster consisting of 4 nodes. 1 node will run Cloudera Manager and remaining 3 nodes will run services.

Create a Google compute engine VM instance:

  1. From the Developers console, under Resources, select ‘Compute Engine’.
  2. Click on new instance on top of the page.
  3. Enter an instance name.
  4. Select a time zone.
  5. Select machine type as ‘2vCPUs 13GB RAM n1-highmem-2’ with 80 GB disk and CentOS 6.7 image (Change disk space according to requirements).
  6. Create.
  7. When the instance is ready, select it and open in Edit Mode.
  8. Scroll down to ssh keys and enter your public key here. If you do not have a public key run the following commands or generate one using PuTTY:
    • ssh keygen -t rsa
    • cat ~/.ssh/
  9. Save.
  10. Create 3 clones of this instance.
  11. Start all 4 instances.
  12. Follow steps from “Repartition a root persistent disk” to expand your disk to allotted size [Repeat on all instances].


  • To allow nodes to SSH to each other, edit /etc/hosts file to include hosts from each node. Below is an example [Repeat on all Instances]:

View the code on Gist.

  • Change swappiness to minimum value without disabling it [Repeat on all instances]:

View the code on Gist.

  • Disable iptables:

View the code on Gist.

  • Disable redhat_transparent_hugepage:

View the code on Gist.

  • Install MySQL:

View the code on Gist.

  • Install Java:

View the code on Gist.

  • Download mysql-connector [Repeat on all instances]:

View the code on Gist.

  • Install Cloudera manger:

View the code on Gist.

  • Create databases and users:

View the code on Gist.

  • Update database name, user and password in ‘’:

View the code on Gist.

  • Start cloudera-server and observe logs until “Jetty Server started” appears. This may take a while:

View the code on Gist.

  • Access cloudera manager from the browser to complete installation:
    • Install PuTTY.
    • Open your private key file in PuTTY pageant. By default this should be located in C:/users/username/.ssh/filename.ppk . PageAnt icon will appear in the system tray.
    • Fill in external IP of VM instance (of node where cloudera server is running) as hostname in PuTTY.
    • From the column on right, go to SSH > tunnels.
    • Enter the internal IP of VM instance in the destination textbox with port 7180. E.g.
    • For ease of remembering ports, set the source port as 7180 (Same as destination port). You can choose to redirect to another port if 7180 is not available. 7180 is the default port for Cloudera manager.
    • Apply and Open the connection.
    • Open the browser and go to “localhost:7180”.
    • Proceed with cluster installation using Cloudera manager.

Cluster Installation:

  • Login as “admin” , “admin”.
  • Accept the Terms and Conditions. Then continue.
  • Select ‘Cloudera Express” or “Cloudera Enterprise Data Hub Edition Trial”. Then continue.
  • Search for your machine’s hostnames. e.g.


  • On the “Cluster Installation” page continue with all default settings.
  • On the “JDK Installation Options” page select “Install Oracle Java SE Development Kit (JDK)” and “Install Java Unlimited Strength Encryption Policy Files”. Continue.
  • Do not select “Single User Mode”. Continue
  • On “Provide SSH Login credentials” page, select Login to all hosts as ‘Another User’ with Authentication method ‘All hosts accept same private key’. Enter the username from SSH key that was added to GCE instance (This is the same user that logged in to PuTTY session). Select the private key file stored on your local machine. Continue without passphrase.
  • On the next page, the cluster installation may take a while. (NOTE: If you install Cloudera manager without following the prerequisites, installation will fail at this step).



  • Once the installation is complete, continue to install parcels and inspect hosts. Finally, continue to the Cluster Setup.


  • Select ‘Core Hadoop’ when asked to select CDH5 services.
  • When Assigning roles, I like to assign all ‘Cloudera Management Service’ roles to one node (in my case ‘cloudera-cm’) and distribute all other roles evenly on the remaining 3 nodes. Here is one possible assignment of roles:


  • On the Database Setup page, set all ‘Database Host Name’ fields to the node running the Cloudera-server. Enter Database name, Username and Passwords that were created in MySQL earlier.


  • Review the changes. Now the cluster will be setup and services deployed. This is the final step.


  • You are now ready to use services directly from console or access Hue on port 8888. Good Luck :)


P.S. I would like to thank Manoj Kukreja for showing me the right way of deploying clusters.


Discover more about our expertise in Big Data.

Categories: DBA Blogs

SQL Server 2016 – AlwaysOn Basic Availability Group

Tue, 2016-01-05 11:46

BAG2Recently I was playing with the SQL Server 2016 CTP 2 and noticed that there was a new feature introduced – Basic Availability Group (BAG) for AlwaysOn. This feature is supported in SQL Server 2016 Standard or higher edition. With this feature we can create a fail-over solution for a single database, that means, if you want to add one or more database(s) it is not permitted. While you are doing a setup using the Basic Availability Group you will notice that you do not see an option that says “Readable Secondary” to “Yes”; this is because of the fact that Basic Availability Group replaces the Database Mirroring (DBM). And that means the secondary database can not be used to off-load the reporting traffic unlike you do in the SQL Server 2012 Enterprise edition.

This made me wonder. Why had MS made an improvement in the High Availability feature by adding BAG when it was already there? I re-read the feature lists and I found my answer – Standard edition of SQL Server 2012 does not have support for AlwaysOn Availability groups. Thus MS has added a support in the SQL Server 2016, and, it now allows a Single Database per Basic Availability Group. This also means that you can create multiple Basic Availability Groups, as per requirement. I can say this, because at the time of writing this I could create at least two such groups. Though, at this moment, I am unaware of how many such groups can be created.

To conclude, let me list the features and limitations for the Basic Availability Group (BAG) that I can think of at this moment.


  1. Supports Standard Edition and Higher.
  2. You can create BAG for single database.
  3. You can create multiple (At least two, while I am writing this post) BAG.













  1. Can have only one database.
  2. Limited to Primary and Secondary, meaning two replica only.
  3. No readable Replica, that means NO to offloading reporting to a secondary replica,
  4. Can not off-load backups as well.














Discover more about our expertise in SQL Server.

Categories: DBA Blogs

Big Data Co-op Experience at Pythian

Thu, 2015-12-31 11:40


I joined Pythian as a Big Data co-op for duration of 4 months. At the time I joined, I had only heard of Big Data as a buzzword in the technical community. So when I was selected for the job I was quite nervous and at the same time very excited and eager to learn.

During my first 2 weeks, I attended BORG sessions with department heads to learn about all departments and how company is run. My favourite one was the session with CEO Paul Vallée where he talked about how he started the company in his basement and his vision for the future. I was repeatedly hearing “Pythian hires the top 5% talent in the industry”, which was intimidating thinking that I’m surrounded by brilliant people and I didn’t know much. However, that feeling didn’t last very long once I got to know my coworkers. Everyone at Pythian is very helpful and there are several mediums for knowledge transfer among employees. Pythian is constantly hiring so if you think you’re new and feel hesitant at first, remember there will be someone newer than you in 2 weeks or less.

Some of the exciting events I encountered during my work-term included the Pythian days, Quarterly all hands meeting, Pythian Future Leaders session for career growth, the holiday party and of course the headquarter move from St-Laurent to Westboro. I also enjoyed the weekly Pythianology sessions which are a great way to learn about best practices for dealing with data.

Being part of the Big Data team, I had to learn to work in a distributed environment. I worked from the Ottawa headquarters while my team members worked remotely from other cities in Canada and worldwide. This is undoubtedly one of the most important skills I learned during my time at Pythian.

Coming to an end of my work-term, I wouldn’t say I’m an expert in Big Data and ready to deal with customers just yet, but I sure learned the type of challenges that customers face. I like to tackle problems on my own, so I strived to get things running with help from Google, Stackoverflow and the Pythian blog before I asked team members for help. Every now and then I would hit a roadblock in which case my team members would guide me in the right direction. An important lesson I learned is that theoretical knowledge is just as important as practical knowledge. As a programmer, I had trained myself to learn by practice so I wanted to get started with my project as quickly as I could. In doing so I skipped over some of the reading I should have focused on. Due to this I struggled with errors that could have been avoided and saved me a lot of time. In my experience, dealing with Big Data technologies is very different from programming (at least, at first). Before you can get to the programming stage, you need to deal with A LOT OF configurations. I’m happy to say that I not only learned about technologies directly related to Big Data like hadoop, MapReduce, Hive, Oozie, Hue, CDH etc. but learned a lot more. In the last 4 months I have dealt with at least 4 flavours of Operating Systems including Windows 7, windows 10, Ubuntu and CentOs. I have worked with Virtual Machines running locally on my system and Virtual machines running on the cloud. I worked mostly on Google Cloud but at some point for a brief period of time explored Amazon Web Services and Microsoft Azure services. I also played around with docker images which I found to be pretty cool and plan to keep using them in future.

I confess that if I had explored Big Data technologies on my own time, I would have given up very early on, but being at Pythian allowed me to stay committed to achieving my goal. I was allowed to learn at my own pace while experimenting and failing repeatedly. I got help from my team members when needed, which helped me to achieve my goal for the work-term. Being new to Big Data can be overwhelming seeing how fast the field is growing and hearing about the large variety of technologies there are. An important piece of advice I received is to not try and learn too much at once but take one technology at a time and understand it well. Having gotten this far, I’m motivated to dive in deeper and to explore things further to improve my skills.

I highly recommend students to join Pythian for coop terms for a chance to learn from industry leaders worldwide. At Pythian you will not be treated as an intern but as a full-time employee.  You will gain knowledge about technologies that are not covered in school. You will get an opportunity to not only learn technologies relevant to your area of interest, but develop transferable skills which will be beneficial in several fields.

Categories: DBA Blogs

MySQL Benchmark in the Cloud

Thu, 2015-12-31 10:50


Testing functionalities and options for a database can be challenging at times, as a live production environment might be required. As I was looking for different options, I was directed by Derek Downey to this post in the Percona blog.

The blog discussed an interesting and fun tool from Percona, tpcc-mysql. I was interested in testing the tool so I decided to play around with it in an AWS EC2 server.

In this post I will expand on the Percona blog post, since the tool lacks documentation, as well as explain how I used it to create a MySQL Benchmark in AWS.

Why tpcc-mysql?

There are various reasons why tpcc-mysql could be a good option to use for a benchmarking project. The following points highlights most of them:


  • Mimics a full DB structure of a real warehouse.
  • Simulates a real life load on the server.
  • Options and flexibility.
  • Very light footprint on the system.


  • No documentation.
Getting the Server Started

You’ll probably need to launch a new EC2 server from the AWS Console, or use an existing one that you already have up an running. Either way, you had better save the current state of your database. Luckily, AWS EBS offers really good and convenient solution to achieve this.

It is possible to create and manage sanpshots of EBS volumes in the AWS Dashboard with some very basic steps. I personally prefer to setup the MySQL base and data directories together in a different volume from from the root volume. This allows me to swap between different versions and data-sets without having to reconfigure my tools every time I load a snapshot.


Writing a  good description helps when creating new volumes.


Possible suggestions come up as you start typing based on descriptions .

 Setting up the Benchmark

Once you have taken your snapshot and configured you MySQL, move on to setup. First we’ll need to setup the prerequisites.

tpcc-mysql uses mysql_config  is part of the libmysqlclient_dev package. We also need Bazaar. So we’ll go ahead and install that:

sudo apt-get install libmysqlclient_dev
sudo apt-get install bzr


Install & Compile spcc-mysql

Use following commands to download the tpcc-mysql source code and compile it:

bzr branch lp:~percona-dev/perconatools/tpcc-mysql
cd tpcc-mysql/src
make all


Prepare the Database & Create Required Tables

Once the the tpcc-mysql has been compiled, we will need to prepare the database for the benchmark. This will consist of running a few scripts to create the required database, tables, and generate random data to use during the testing process.

Following these steps will create the database and tables made for us, they are all part of the tpcc-mysql package:

cd ~/tpcc-mysql
# 1. Create Database to be load data in 
  mysql -u root -p -e "CREATE DATABASE tpcc1000;"
# 2. Create the required table definitions 
  mysql -u root -p tpcc1000 < create_table.sql
# 3. Add foreign keys and indexes  
  mysql -u root -p tpcc1000 < add_fkey_idx.sql

The following tables are created from the previous step:

$ mysql -u root -p tpcc1000 -e "SHOW TABLES;"
Enter password:
| Tables_in_tpcc1000 |
| customer           |
| district           |
| history            |
| item               |
| new_orders         |
| order_line         |
| orders             |
| stock              |
| warehouse          |

As you can see, tpcc-mysql mimics a warehouse’s database that tracks clients, items, orders, stock, … etc

Prepare the Database & Create Required Tables

The last step remaining before we can start our test is to populate some data into the tables. For that, tpcc-mysql has a script, tpcc_load, that does the job.

The tpcc_load script generates random dummy data in the tables created in the previous steps. The script also have a parameter that allows to specify how many warehouses you want to simulate.

The script usage is as follow:

tpcc_load [server] [DB] [user] [pass] [warehouse]

In our example, we’ll use the following:

./tpcc-mysql/tpcc_load tpcc1000 root "$pw" 2
Beginning the Benchmarking Process

This would be a good time to take a snapshot of your server/dataset, so you can come back to it. Also, before we get started, let’s get familiar with the script we need to use for starting the benchmarking process, tpcc_start. 

The script will start creating transactions that would execute various statements like SELECT, UPDATE, DELETE, and INSERT. The script will also be generating a detailed output of the progress and a summary in the end. You can redirect this output to a file to run some analysis, compare it later on, or use it to run an analysis.

The script comes with various parameters to give you flexibility to configure it as you desire:

tpcc_start -h[server] -P[port] -d[DB] -u[mysql_user] -p[mysql_password] -w[# of warehouses] -c[# of connections] -r[warmup_time] -l[running_time]

Now let’s get to the fun part!

We’ll be using the following command will start a simulation of warehouse transactions, and record the output in the file tpcc-output-01.log

./tpcc_start -h127.0.0.1 -dtpcc1000 -uroot -p -w2 -c16 -r10 -l1200 > ~/tpcc-output-01.log
Analyzing the Output

tpcc-mysql comes with different scripts that could be used for analysis. Check the tpcc-mysql/scripts folder. Example of some scripts are:

$ ls ~/tpcc-mysql/scripts/    

Visual Analysis of the Output

We can always take these tests a step further in many different directions. Since plotted data is a lot of fun, why not do a quick experiment with it?

The same blog post I used as my reference for this post also has a modified version of script that comes with tpcc-mysql. The script is named What this script does is that it extracts the time and # of transactions for each time block in a format that gnuplot can read for plotting the data. So let’s use the script on the output file:

./ tpcc-logs/tpcc-output-01.log tpcc-analyzed/time_tr_data_01.txt

To install gnuplot you simply run:

sudo apt-get install gnuplot

Then, we can create the plot using the  script (from here as well) as follows:

./ tpcc-analyzed/time_tr_data_01.txt tpcc-graphs/graph01.jpg

 And this generated the following plot for me:


I hope this was helpful. As you can see, there is a lot of potential of things that can be done using tpcc-mysql. If there is anything that you come up with or experiment with, I would love to hear it from you.


Discover more about our expertise in MySQL and the Cloud.

Categories: DBA Blogs

SQL On The Edge #6 – SQL AlwaysEncrypted

Thu, 2015-12-31 10:42

Security is on everyone’s mind these days in the IT (and the real) world. Either because they’re dealing with compliance, risks or mitigation, etc. at work or because they just saw on the news yet another item about some big leak/breach happening. It is said that it’s not a question of if your systems will be attacked but when. As part of the SQL product family, Microsoft has now released a new feature called AlwaysEncrypted to continue risk mitigation and strengthen the security story of the product. And I mentioned the SQL ‘product family’ instead of just SQL Server because this feature is also available on Azure SQL Database.


What is it?
AlwaysEncrypted is the latest in the set of features that enables encryption inside SQL Server. Let’s look at the list so far:

  • Column level encryption
    This targets specific columns in specific tables, with the encryption/decryption happening at the server.
  • Transparent Database Encryption (A.K.A TDE): This targets entire databases and is transparent to the calling application. It’s also transparent to any user with proper access to the data.
  • AlwaysEncrypted: This also targets specific columns in specific tables, with the encryption/decryption happening ON THE CLIENT.

This is the big difference of this new feature, that the operations to encrypt/decrypt happen on the client NOT on SQL Server. That means that if your SQL Server is compromised, the key pieces to reveal the data are NOT with the server. This means that even if your DBA wants to see the data, if they don’t have access to the CLIENT application then they won’t be able to see the values.


How Does it Work?
This feature can be enabled through T-SQL or through a wizard in Management Studio. The actual data manipulation is done by the latest version of the ADO .NET client and during configuration, the client will read all of the data, perform the encryption and send it back to SQL Server for storage. The latest 4.6 release of the .NET framework is required. There’s a Column Master Key that will have to be stored in a Windows certificate store, Azure Key Vault or other 3rd party key storage software. During normal application operation, the ADO client will read this master key and use it to decrypt and encrypt the values.

There are two options for this type of encryption:

  1. Randomized
    This will make the same source values encrypt into DIFFERENT encrypted values. Useful for columns that could be correlated by looking at them and won’t be used for searching.
  2. Deterministic: This will make the same source values encrypt into the SAME encrypted values, thus allowing for indexing and searching.


For the demo, check the video below where we’ll use the SSMS Wizard to enable AlwaysEncrypted on a column and will show the decryption happening in SSIS using the ADO .NET client!



Discover more about our expertise in SQL Server.

Categories: DBA Blogs

How to Add Two Ethernet Interfaces with Azure VM

Thu, 2015-12-31 09:55


Recently, working with my test VM on Azure, I needed a second network interface on my VM, but found no way to add one. I was using a standard A2 size Oracle Linux VM. I tried to search it in GUI interface and settings but could find no obvious way to add it. This surprised me so I continued to search and I found some blog posts on how to do it using Azure PowerShell. I discovered that there is no way to add the interface when the VM was already created and that you have to use Azure PowerShell for that. So, if you are a Mac user, as I am, and have only Azure CLI for Mac then you need to find a Windows box. I hope it will be fixed in future releases and that you will be able to manage networks from GUI or using any command line tool provided for Mac, Linux or any other platform. Thus, I will try to explain how you can create a new VM with 2 or more NIC.

First, we need to get a Windows machine to run Azure PowerShell commands. I created a small Windows box on Azure itself, explicitly to run the PowerShell when I need it. I’ve chosen basic A1 size Widows 2012 and installed the PowerShell there. It worked fine except you need to be careful if you use more than one monitor and rdp client for Mac. By default it was trying to use all monitors and in my case I got 1.5 screens (one was cut by half because it could not fit to my monitor). I removed check “Use all monitors” in my configuration for that connection in the rdp client. So, the first obstacle was resolved and I continued to work with the next steps.

Next, we will need “ImageName” to create a new machine. It can be checked using “Get-AzureVMImage” command. For Oracle linux it looks like:

PS C:\> Get-AzureVMImage | Where-Object { $_.Label -like "*Oracle Linux*" }
VERBOSE: 5:50:58 PM - Begin Operation: Get-AzureVMImage

ImageName : c290a6b031d841e09f2da759bbabe71f__Oracle-Linux-6-12-2014
OS : Linux
MediaLink :

Using the given ImageName we can now proceed. Keep in mind that you cannot create a VM with two or more NICs for an A2 size box. For two NICs you need at least a Large (A3) for 2 interfaces or an ExtraLarge(A4) if you need 4 NICs.
Let’s set up the image name:

PS C:\> $image = Get-AzureVMImage -ImageName "c290a6b031d841e09f2da759bbabe71f__Oracle-Linux-6-12-2014"

You need to setup your subscription ID for the session in PowerShell and a storage account:

PS C:\> Set-AzureSubscription -SubscriptionId "321265e2-ffb5-66f9-9e07-96079bd7e0a6" -CurrentStorageAccount "oradb5"

Create a custom config for our VM :

PS C:\> $vm = New-AzureVMConfig -name "oradb5" -InstanceSize "Large" -Image $image.ImageName
PS C:\> Add-AzureProvisioningConfig -VM $vm -Linux -LinuxUser "otochkin" -Password "welcome1"

I’ve created a virtual network “Multi-VNet” with two subnets for my VM and named the subnets as “Public” and “Private”. The Virtual network and subnets you can create in a GUI portal or using command line. I am going to use those subnets for my NICs.
Adding the first subnet to our VM configuration:

PS C:\> Set-AzureSubnet -SubnetName "Public" -VM $vm

Setting static IP for the network:

PS C:\> Set-AzureStaticVNetIP -IPAddress -VM $vm

Adding the second interface to the configuration.

PS C:\> Add-AzureNetworkInterfaceConfig -name "eth1" -SubnetName "Private" -StaticVNetIPAddress -VM $vm

And we can deploy our custom VM now:

PS C:\> New-AzureVM -ServiceName "test-1" -VNetName "Multi-VNet" -VM $vm
WARNING: No deployment found in service: 'test-1'.
VERBOSE: 6:59:03 PM - Begin Operation: New-AzureVM - Create Deployment with VM oradb5

OperationDescription OperationId OperationStatus
——————– ———– —————
New-AzureVM b7fcb2de-eac7-3684-aa8b-d1e9addc4587 Succeeded
VERBOSE: 7:01:08 PM – Completed Operation: New-AzureVM – Create Deployment with VM oradb5

The VM is created and you can check and connect to it. You don’t need your Windows box anymore and can shut it down to save money:

MacBook:~ otochkin$ azure vm list -v
info: Executing command vm list
verbose: Getting virtual machines
data: Name Status Location DNS Name IP Address
data: -------- ------------------ -------- --------------------- ----------
data: winman1 ReadyRole East US
data: oradb5 ReadyRole East US
info: vm list command OK
MacBook:~ otochkin$

You can connect to your freshly created VM and check the network:

[root@oradb5 ~]# ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0D:3A:11:A3:71
inet addr: Bcast: Mask:
inet6 addr: fe80::20d:3aff:fe11:a371/64 Scope:Link
RX packets:337 errors:0 dropped:0 overruns:0 frame:0
TX packets:353 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:54281 (53.0 KiB) TX bytes:49802 (48.6 KiB)

eth1 Link encap:Ethernet HWaddr 00:0D:3A:11:AC:92
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

lo Link encap:Local Loopback
inet addr: Mask:
inet6 addr: ::1/128 Scope:Host
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

[root@oradb5 ~]#

And that’s it. In summary I can say that it is not difficult to create more than one interface on an Azure VM, but I think it can be a good addition to GUI (Azure portal). In my next blog post I will try to check other aspects of using Oracle and Linux on Microsoft Azure. Stay tuned.


Discover more about our expertise in the Cloud.

Categories: DBA Blogs

Good Habits & Gratitude

Thu, 2015-12-31 09:40


One of my favourite books is The Power of Habit: Why We Do What We Do in Life and Business. With the New Year ahead and many of us focused on how we can continue to “up our game” by creating or changing our habits, it serves as a great read!

When most of us reflect on our habits, we tend to focus on our “bad” habits as opposed to the habits that are “good” or positive. I try to take a different approach by considering all the things that I do well, and incorporating those habits into my daily routine, including being grateful.

Two years ago I received the gift of a leather bound book. There were many use cases for the book, including note taking for work, collecting my favourite quotes, random sketches or jotting down ideas. I chose to use my book as a gratitude journal. At that time, a number of books about gratitude and the art of keeping a gratitude journal hit several bestseller lists. While I didn’t begin one then, I was keen on the new daily habit of documenting my gratitude.

As far as new habits go, this one was fairly easy to adopt:

  1. Find journal.
  2. Have journal in a convenient place.
  3. Pick a time of day to write in journal.
  4. Be grateful.
  5. Write it down.

My entries have covered everything from lessons I’ve learned, celebrating wins at work, special moments I’ve experienced, feelings I’ve felt, acknowledging good fortunes like health and wellness, etc. On days when I’m really pressed for time, I mindfully think about what I’m grateful for and log it in the next day. Sometimes the entries are short like a note about health, happiness, family, friends, a chocolate brownie, a great book, warm boots, etc.

This habit continues to help me remember and recognize the things that really matter to me. In times of stress or challenge my journal entries serve as a reminder that it’s important to take a few moments to find something to be grateful for. And while you don’t need a journal to be grateful, it’s wonderful to flip back and read what you were grateful for eight months ago, six weeks ago or even four days ago.

Today I’m grateful for the free speech that allows me to write this blog post, the winter tires on my car, the collective talents of the Pythian HR team, for the amazing 400+ colleagues in 36 countries who we get to work with every day and the opportunities that lay ahead for all of us in 2016.

What new habit(s) will you form in 2016?

Categories: DBA Blogs

Log Buffer #455: A Carnival of the Vanities for DBAs

Wed, 2015-12-30 13:45

What better to do during the holiday season than to read the Log Buffer? This log buffer edition is here to add some sparkle to Oracle, MySQL and SQL Server on your days off.


  • Ops Center version 12.3.1 has just been released. There are a number of enhancements here.
  • Oracle R Enterprise (ORE) 1.5 is now available for download on all supported platforms with Oracle R Distribution 3.2.0 / R-3.2.0. ORE 1.5 introduces parallel distributed implementations of Random Forest, Singular Value Decomposition (SVD), and Principal Component Analysis (PCA) that operate on ore.frame objects.
  • Create a SOA Application in JDeveloper 12c Using Maven SOA Plug-In by Daniel Rodriguez.
  • How reliable are the memory advisors?
  • Oracle Enterprise Manager offers a complete cloud solution including self-service provisioning balanced against centralized, policy-based resource management, integrated chargeback and capacity planning and complete visibility of the physical and virtual environments from applications to disk.

SQL Server:

  • SQL Server Data Tools (SSDT) and Database References.
  • Stairway to SQL Server Extended Events Level 1: From SQL Trace to Extended Events.
  • Advanced Mathematical Formulas using the M Language.
  • Liberating the DBA from SQL Authentication with AD Groups.
  • Enterprise Edition customers enjoy the manageability and performance benefits offered by table partitioning, but this feature is not available in Standard Edition.


  • Is MySQL X faster than MySQL Y? – Ask query profiler.
  • Usually when one says “SSL” or “TLS” it means not a specific protocol but a family of protocols.
  • The MariaDB project is pleased to announce the immediate availability of MariaDB 10.1.10, MariaDB Galera Cluster 5.5.47, and MariaDB Galera Cluster 10.0.23.
  • EXPLAIN FORMAT=JSON: everything about attached_subqueries, optimized_away_subqueries, materialized_from_subquery.
  • Use MySQL to store data from Amazon’s API via Perl scripts.


Learn more about Pythian’s expertise in Oracle SQL Server & MySQL.

Categories: DBA Blogs

Why the Python Data Model Could be for You

Wed, 2015-12-23 08:31


A great feature of Python is its simplicity. This simplicity allows developers to start coding immediately with very little formality. Python is also very good at doing ‘quick and dirty’ tasks. The result of this ability to get up and running with Python so quickly leads to many programmers using Python without necessitating a deeper understanding of the language. In many cases this is fine, but a deeper dive has some nice rewards.

Named Tuples
Named Tuples are a nice lightweight construct that makes things easier when dealing with data that might otherwise be stored in a regular tuple or dict.

In a recent project we encountered named tuples when returning table schema from Hive Thrift API:

# Define the fieldschema tuple
FieldSchema = collections.namedtuple('FieldSchema', ['comment','type', 'name'])

# Create some fields for a table schema
customer_table=[] customer_name = FieldSchema('customer name', 'varchar(100)','name')
customer_addr = FieldSchema('address', 'varchar(100)','addr1')
customer_dob = FieldSchema('birthdate ', 'int','dob')
last_updated = FieldSchema('date last updated', 'datetime','last_updated')

# create two customer tables that have slightly different schemas
customer_table_a=[customer_name, customer_addr, customer_dob] customer_table_b = [customer_name, customer_addr, customer_dob, last_updated]

The Python Data Model
In Python, everything is an object and each object has an identity, a type and a value. Built in methods like id(obj) returns the object’s identity and type(obj) returns the object’s type. The identity of an object can never change. Objects can be compared with the “is” keyword. Example: X is Y.

Special or “dunder” methods.
Special or ‘Magic’ method names are always written with leading and trailing double underscores (i.e., __getitem__). So when you use the syntax myobj[key] it is really calling the __getitem__ special method behind the scenes. So the interpreter calls obj.__getitem(key) in order to evaluate myobj[key]. Because of the leading and trailing double underscores, these methods are sometimes referred to as ‘dunder’ (double-underscore) methods. These ‘dunder’ methods are invoked by special syntax. For example using the index [] on a collections object invokes the __getitem__(key) special method. Example:

my_list = ['a','b','c'] print my_list[2] print my_list.__getitem__(2)
> c
> c

Generally, these ‘dunder’ methods are meant to be used by the interpreter, and not by you. But the advantage of knowing something about special methods is that it makes class design more fluid. Here’s an example where we override the __sub__ method (subtract) so we can compare dataframes by subtracting one from another:

import pandas as pd
class HiveTable:
def __init__(self,tabledef):
self.df = pd.DataFrame(tabledef)

def __sub__(self, other):
oj = other.df.merge(self.df, on=”name”, how=”outer”)
df_diff = oj[(pd.isnull(oj[“type_x”])) | (pd.isnull(oj[“type_y”]))] return df_diff

# Create Hive Tables
t = HiveTable([customer_name, customer_addr, customer_dob])
t2 = HiveTable([customer_name, customer_addr, customer_dob, last_updated])

# Compare them by ‘subracting’ them using the “-” operator.


date customer last updated


While Python is easy to get started with, it is also ‘deep all the way down’, continues to be interesting and reward with a deeper understanding. By implementing special methods in your own classes, your objects can behave more like the built in ones, allowing for ‘pythonic’ coding for your own classes.

Fluent Python by Luciano Ramalho



Categories: DBA Blogs

Log Buffer #454: A Carnival of the Vanities for DBAs

Fri, 2015-12-18 07:39

As the festive season of holidays draws near, there is a spring in the blogosphere. This Log Buffer edition anticipates that and picks some lively blog posts from Oracle, SQL Server and MySQL.


  • Truncate is one of those commands that cannot be undone trivially…In fact, you might well be looking at a database recovery in order to get a truncated table back to a “full” state.
  • It should be rare to see many TNS listener processes running on 1 OS. Some listeners are even Global Data Services listener.
  • The fact that an extension explicitly created by a user through DBMS_STATS can invalidate objects like packages is not new in 12c. It has been like that since the introduction of extensions in 11g.
  • Monitoring is a critical and necessary function to ensure that systems and process are running properly. Good monitoring practice can also be proactive in identifying and resolving potential problems before they occur.
  • Creating Multiple Tables in a Single Transaction.

SQL Server:

  • Execute these stored procedures to backup all your database daily and your transaction logs several time daily.
  • At some point you are going to experience issues with database performance.  You need to have a process to follow and certain tasks that you should perform.
  • PowerShell is like any computer language: you must understand the paradigms, the constructs, and the way it is designed to work to get the most value from it.
  • With the introduction of SQL Server 2016 you now have a new way to encrypt columns called Always Encrypted.
  • Azure SQL Database provides a number of benefits that leverage resiliency and redundancy built into the underlying cloud infrastructure.


  • Ahead of Oracle’s OpenWorld conference in 2013, the company first began to talk about a major new release of its open-source MySQL database. Now two years later, development on MySQL 5.7 is compete and general availability is set for October 26.
  • POWER8 Accelerated CRC32 merged in MariaDB 10.1
  • used_key_parts: EXPLAIN FORMAT=JSON provides insight into which part of multiple-column key is used.
  • MySQL 5.8 Planning: C++11 and Native Partitioning.
  • Knowing which privileges a given account has is easy – just issue SHOW GRANTS FOR user@host.  But what about when you need visibility into privileges from the other direction – which accounts can access specific data?


Learn more about Pythian’s expertise in Oracle SQL Server & MySQL.

Categories: DBA Blogs

Issues with Triggers in Cloudera Manager

Thu, 2015-12-17 09:47


Triggers in Cloudera Manager is a great feature. You may want to set them up in order to monitor tons of available metrics using tsquery language. But the documentation says:

Important: Because triggers are a new and evolving feature, backward compatibility between releases is not guaranteed at this time.

That’s happened to me when I tried to setup a new trigger to monitor dfs_capacity_used_non_hdfs metric in HDFS in CDH5.4.2 (verified that issue is still there for CDH5.5.0).

I used the Create Trigger button on the status page of HDFS service to create a new trigger, changed the default name, entered a metric, a value and changed the action to Mark as bad. It then showed that everything is ok and trigger was not fired. So I pressed Create Trigger and was now sure that it would send me an alert once non-dfs usage goes to high.

Well everything need to be tested. So I created another trigger with a lower value that should make this trigger fire. But I realised that noting happened. Having checked various things I figured out that the issue was that CM had created a trigger with a variable name $SERVICENAME instead of actual value. If you ever see the issue in the servicemonitor logs, it would be something like “Could not parse trigger expression: …”

The fix seems to be simple: replace it with HDFS and save. If you just do that it will complain about a manually changed expression for the trigger that was created in the editor. To prevent that you may want to remove expressionEditorConfig section. But the more consistent way is to remove the trigger before using the documented way from the Edit Trigger page. I prefer to do so as we can’t be sure that CM doesn’t keep any metadata somewhere else.

Another issue however is that you don’t have a link to this page in CM. It would usually appear at Health tests, if it was created without an initial issue. To get this Edit Trigger page you may use your browser history or build it manually. Just go to any health test from the service and replace the tail of the URL with healthTestName=alarm%3A<trigger name>. If you used spaces in <trigger name> replace them with plus sign.

The triggers are awesome but you should create them manually before the editor is fixed. Also if you have created triggers using editor, you may want to review if they are actually working. You should see them in the Health tests list. You shouldn’t see any parsing errors in the servicemonitor logs.


Discover more about our expertise in Big Data & Hadoop.

Categories: DBA Blogs

Is Oracle Smart Flash Cache a “SPOF”?

Thu, 2015-12-17 09:28


Oracle Smart Flash Cache (OSFC) is a nice feature that was introduced in Oracle 11g Release 2. As only recently I had a real use case for it, I looked into it with the main goal of determining if adding this additional caching layer would not introduce a new Single Point Of Failure (SPOF). This was a concern, because the solid-state cards/disks used for caching would normally have no redundancy to maximize the available space, and I couldn’t find what happens if any of the devices fail by looking in the documentation or My Oracle Support, so my decision was to test it!
The idea behind the OSFC is to provide a second level of “buffer cache” on solid-state devices that would have better response times compared to re-reading data blocks from spinning disks. When buffer cache runs out of space clean blocks (not “dirty”) would be evicted from it and written to the OSFC. The dirty blocks would be written by DBWR to the data files first, and only then would be copied to OSFC and evicted from the buffer cache. You can read more about what it is, how it works and how to configure OSFC in Oracle Database Administrator’s Guide for 11.2 and 12.1 and in this Oracle white paper “Oracle Database Smart Flash Cache“.

In my case the OSFC was considered for a database running on an Amazon AWS EC2 instance. We used EBS volumes for ASM disks for data files, and as EBS volumes are basically attached by networks behind the scenes, we wanted to remove that little bit of I/O latency by using the instance store (ephemeral SSDs) for the Smart Flash Cache. The additional benefit from using this would be reduction of IOPS done on the EBS volumes, and that’s a big deal, as it’s not that difficult to reach the IOPS thresholds on EBS volumes.



I did the testing on my VirtualBox VM, which ran Oracle Linux 7.2 and Oracle Database EE. In my case I simply added another VirtualBox disk, that I used for OSFC (reminder, not looking for performance testing here). The device was presented to the database via a separate ASM disk group named “FLASH”. Enabling the OCFS was done by setting the following parameters in the parameter file:

  • db_flash_cache_file=’+FLASH/flash.dat’
  • db_flash_cache_size=’8G’

The 1st surprise came when I bounced the database to enable the new settings, the DB didn’t start and an error was presented “ORA-00439: feature not enabled: Server Flash Cache”. Luckily, I found a known issue in a MOS note “Database Startup Failing With ORA-00439 After Enabling Flash Cache (Doc ID 1550735.1)”, and after forcefully installing two RPMs from OL5 (enterprise-release and redhat-release-5Server), the database came up.



The test I chose was a really simple. These are the preparation steps I did:

  • Reduced the buffer cache of the DB to approximately 700Mb.
  • Created table T1 of size ~1598Mb.
  • Set parameter _serial_direct_read=NEVER (to avoid direct path reads when scanning large tables. I really want to cache everything this time).

The next step was Full-scanning the table by running “select count(*) from T1”, and as I was also tracing the operation to see what was happening:

    • During the 1st execution I observed the following wait events (all multi-block reads from data files, as expected), however, I new the buffer cache was too small to fit all blocks, so a large volume of the blocks would end up in OSFC when they were flushed out from the buffer cache:
      WAIT #140182517664832: nam='db file scattered read' ela= 6057 file#=10 block#=90244 blocks=128 obj#=92736 tim=19152107066
      WAIT #140182517664832: nam='db file scattered read' ela= 4674 file#=10 block#=90372 blocks=128 obj#=92736 tim=19152113919
      WAIT #140182517664832: nam='db file scattered read' ela= 5486 file#=10 block#=90500 blocks=128 obj#=92736 tim=19152121510
      WAIT #140182517664832: nam='db file scattered read' ela= 4888 file#=10 block#=90628 blocks=128 obj#=92736 tim=19152129096
      WAIT #140182517664832: nam='db file scattered read' ela= 3754 file#=10 block#=90756 blocks=128 obj#=92736 tim=19152133997
      WAIT #140182517664832: nam='db file scattered read' ela= 8515 file#=10 block#=90884 blocks=124 obj#=92736 tim=19152143891
      WAIT #140182517664832: nam='db file scattered read' ela= 7177 file#=10 block#=91012 blocks=128 obj#=92736 tim=19152152344
      WAIT #140182517664832: nam='db file scattered read' ela= 6173 file#=10 block#=91140 blocks=128 obj#=92736 tim=19152161837
    • The 2nd execution of the query confirmed the reads from the OSFC:
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 989 p1=0 p2=0 p3=0 obj#=92736 tim=19288463835
      WAIT #140182517664832: nam='db file scattered read' ela= 931 file#=10 block#=176987 blocks=3 obj#=92736 tim=19288465203
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 589 p1=0 p2=0 p3=0 obj#=92736 tim=19288466044
      WAIT #140182517664832: nam='db file scattered read' ela= 2895 file#=10 block#=176991 blocks=3 obj#=92736 tim=19288469577
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 1582 p1=0 p2=0 p3=0 obj#=92736 tim=19288471506
      WAIT #140182517664832: nam='db file scattered read' ela= 1877 file#=10 block#=176995 blocks=3 obj#=92736 tim=19288473665
      WAIT #140182517664832: nam='db flash cache single block physical read' ela= 687 p1=0 p2=0 p3=0 obj#=92736 tim=19288474615


Crashing it?

Once the OSFC was in use I decided to “pull out the SSD” by removing the device /dev/asm-disk03-flash that I created using udev rules and that the FLASH disk group consisted of.
Once I did it, nothing happened, so I executed the query against the T1 table again, as it would access the data in OSFC. This is what I saw:

      1. The query didn’t fail, it completed normally. The OSFC was not used, and the query transparently fell back to the normal disk IOs.
      2. I/O errors for the removed disk were logged in the alert log, followed by messages about disabling of the Flash Cache. It didn’t crash the instance!
        Tue Dec 15 17:07:49 2015
        Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
        ORA-15025: could not open disk "/dev/asm-disk03-flash"
        ORA-27041: unable to open file
        Linux-x86_64 Error: 2: No such file or directory
        Additional information: 3
        Tue Dec 15 17:07:49 2015
        WARNING: Read Failed. group:2 disk:0 AU:8243 offset:1040384 size:8192
        path:Unknown disk
                 incarnation:0x0 synchronous result:'I/O error'
                 subsys:Unknown library krq:0x7f7ec93eaac8 bufp:0x8a366000 osderr1:0x0 osderr2:0x0
                 IO elapsed time: 0 usec Time waited on I/O: 0 usec
        WARNING: failed to read mirror side 1 of virtual extent 8191 logical extent 0 of file 256 in group [2.3848896167] from disk FLASH_0000  allocation unit 8243 reason error; if possible, will try another mirror side
        Tue Dec 15 17:07:49 2015
        Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
        ORA-15025: could not open disk "/dev/asm-disk03-flash"
        ORA-27041: unable to open file
        Linux-x86_64 Error: 2: No such file or directory
        Additional information: 3
        ORA-15081: failed to submit an I/O operation to a disk
        WARNING: Read Failed. group:2 disk:0 AU:8243 offset:1040384 size:8192
        path:Unknown disk
                 incarnation:0x0 synchronous result:'I/O error'
                 subsys:Unknown library krq:0x7f7ec93eaac8 bufp:0x8a366000 osderr1:0x0 osderr2:0x0
                 IO elapsed time: 0 usec Time waited on I/O: 0 usec
        WARNING: failed to read mirror side 1 of virtual extent 8191 logical extent 0 of file 256 in group [2.3848896167] from disk FLASH_0000  allocation unit 8243 reason error; if possible, will try another mirror side
        Tue Dec 15 17:07:49 2015
        Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
        ORA-15025: could not open disk "/dev/asm-disk03-flash"
        ORA-27041: unable to open file
        Linux-x86_64 Error: 2: No such file or directory
        Additional information: 3
        ORA-15081: failed to submit an I/O operation to a disk
        ORA-15081: failed to submit an I/O operation to a disk
        WARNING: Read Failed. group:2 disk:0 AU:8243 offset:1040384 size:8192
        path:Unknown disk
                 incarnation:0x0 synchronous result:'I/O error'
                 subsys:Unknown library krq:0x7f7ec93eaac8 bufp:0x8a366000 osderr1:0x0 osderr2:0x0
                 IO elapsed time: 0 usec Time waited on I/O: 0 usec
        WARNING: failed to read mirror side 1 of virtual extent 8191 logical extent 0 of file 256 in group [2.3848896167] from disk FLASH_0000  allocation unit 8243 reason error; if possible, will try another mirror side
        Tue Dec 15 17:07:49 2015
        Errors in file /u01/app/oracle/diag/rdbms/lab12c/LAB12c/trace/LAB12c_ora_24987.trc:
        ORA-15081: failed to submit an I/O operation to a disk
        ORA-15081: failed to submit an I/O operation to a disk
        ORA-15081: failed to submit an I/O operation to a disk
        Encounter unknown issue while accessing Flash Cache. Potentially a hardware issue
        Flash Cache: disabling started for file
        Flash cache: future write-issues disabled
        Start disabling flash cache writes..
        Tue Dec 15 17:07:49 2015
        Flash cache: DBW0 stopping flash writes...
        Flash cache: DBW0 garbage-collecting for issued writes..
        Flash cache: DBW0 invalidating existing flash buffers..
        Flash cache: DBW0 done with write disabling. Checking other DBWs..
        Flash Cache file +FLASH/flash.dat (3, 0) closed by dbwr 0


Re-enabling the OSFC

Once the OSFC was automatically disabled I wanted to know if it can be re-enabled without bouncing the database. I added back the missing ASM disk, but it didn’t trigger the re-enabling of the OSFC automatically.
I had to set the db_flash_cache_size=’8G’ parameter again, and then the cache was re-enabled, which was also confirmed by a message in the alert log:

Tue Dec 15 17:09:46 2015
Dynamically re-enabling db_flash_cache_file 0
Tue Dec 15 17:09:46 2015
ALTER SYSTEM SET db_flash_cache_size=8G SCOPE=MEMORY;

Good news! It appears to be safe (and also logical) to configure Oracle Smart Flash Cache on non-redundant solid-state devices, as their failures don’t affect the availability of the database. However, you may experience a performance impact at the time the OSFC is disabled. I did the testing on only, so this may behave differently in order versions.


Discover more about our expertise in the world of Oracle.

Categories: DBA Blogs

Recursion in Hive – Part 1

Wed, 2015-12-16 15:12


I am going to start this new series of blog posts talking about code migration use cases. We will talk about migration from RDBMS to Hive keeping the simplicity and flexibility of a SQL approach.

The first case is about recursive SQL. In most of the situations for RDBMS it covered by recursive queries by using a “with” clause. Though, unfortunately it’s not yet supported in Hive;.

Let’s consider the following scenario. We have PRODUCTs and STATEs. STATEs make the forest of trees structure. The facts are combinations of PRODUCTs and STATEs which may have some data. Here are the simplified DDLs:

create table t_product (
  product string);

create table t_state (
  state string,
  next_state string);

create table t_big_data (
  product string,
  state string,
  data string);

The task is: for an input set of pairs (PRODUCT, STATE) try to find the next available STATE with data in fact table or return NULL.

The input data is stored in t_input table:

create table t_input (
  product string,
  state string);

We need to populate t_output table:

create table t_output (
  product string,
  state string,
  found_state string,
  data string);

Here are various methods to solve this: procedural approach with recursive functions, recursive SQL, multi-joins (in case we know the max depth).

The most reasonable for the modern RDBMS supporting recursive queries would be something like this:

insert into t_output(product, state, found_state, data)
with rec (product, state1, state2, data) as (
  select i.product, i.state, i.state,
  from t_input i
  left join t_big_data d
    on d.product = i.product and d.state = i.state
  union all
  select r.product, r.state1, s.next_state,
  from rec r
  join t_state s
    on s.state = r.state2
  left join t_big_data d
    on d.product = r.product and d.state = s.next_state
  where is null
select product, state1 as state, state2 as found_state, data
from rec
where data is not null
   or state2 is null;

RDBMS can make a good execution plan for such queries especially if there are correct indexes on t_big_data table. We could do a multi-join approach in Hive but the cost for each big table scan it too high.

The trick we will use here is based on an observation that a tree structure is usually relatively small in comparison with a data table. So that we can easily “expand” a tree into flat denormalized structure: for each STATE from the initial table, we keep all STATES and path length on the way to root. For example, for the following simple tree:

----- ----------
S1    NULL
S2    S1
S3    S1
S4    S2

We would have:

------ ------ -----
S1     S1     0
S2     S2     0
S2     S1     1
S3     S3     0
S3     S1     1
S4     S4     0
S4     S2     1
S4     S1     2

Using the RDBMS recursive queries we would create this table as:

create table t_expand_state (
  state1 string,
  state2 string,
  lvl integer);

insert into t_expand_state (state1, state2, lvl)
with rec (state1, state2, lvl) as (
  select state, state, 0
  from t_state
  union all
  select r.state1, s.next_state, r.lvl + 1
  from rec r
  join t_state s
    on r.state2 = s.state
  where s.next_state is not null
select * from rec;

For Oracle DB we could do this with “connect by”:

select connect_by_root state state1, state as state2, level-1 lvl
from t_state
connect by prior next_state = state;

Having this t_expand_state table we can rewrite out query as:

insert into t_output(product, state, found_state, data)
select t.product, t.state,
       case when t.min_lvl_with_data is not null then t.state2 end,
from (
  select i.product, i.state, s.state2, s.lvl,,
         min(case when is not null then lvl end)
           over(partition by i.product_id, i.state) min_lvl_with_data
  from t_input i
  join t_expand_state s
    on s.state1 = i.state
  left join t_big_data d
    on d.product = i.product and d.state = s.state2) t
  where t.lvl = t.min_lvl_with_data
     or (t.lvl = 0 and t.min_lvl_with_data is null);

This solution has its specific edge cases of inefficiency:
— big t_state that produce abnormal t_expand_state table;
— dense t_big_data table: so that during the query execution it has to keep a lot of extra-rows with “data” for states we don’t need;
— big t_input: joining it by all “next states” would inflate dataset.
But for practical use t_input is usually relatively small and there isn’t much overhead for getting extra-data for the next states. Another advantage is that we scan t_big_data only once.

To reach our goal the only task left is: how to build t_expand_state in Hive without recursion? Well we surely may consider multi-joins once again, but my choice is to use: UDTF.

In order to make recursion more natural I implemented this function using Scala. In the ExpandTreeUDTF we store tree structure in a mutable map during the “process” method call. After that it expands this map using memoization.

class ExpandTree2UDTF extends GenericUDTF {
  var inputOIs: Array[PrimitiveObjectInspector] = null
  val tree: collection.mutable.Map[String,Option[String]] = collection.mutable.Map()

  override def initialize(args: Array[ObjectInspector]): StructObjectInspector = {
    inputOIs ={_.asInstanceOf[PrimitiveObjectInspector]}
    val fieldNames = java.util.Arrays.asList("id", "ancestor", "level")
    val fieldOI = primitive.PrimitiveObjectInspectorFactory.javaStringObjectInspector.asInstanceOf[ObjectInspector]
    val fieldOIs = java.util.Arrays.asList(fieldOI, fieldOI, fieldOI)
    ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);

  def process(record: Array[Object]) {
    val id = inputOIs(0).getPrimitiveJavaObject(record(0)).asInstanceOf[String]
    val parent = Option(inputOIs(1).getPrimitiveJavaObject(record(1)).asInstanceOf[String])
    tree += ( id -> parent )

  def close {
    val expandTree = collection.mutable.Map[String,List[String]]()
    def calculateAncestors(id: String): List[String] =
      tree(id) match { case Some(parent) => id :: getAncestors(parent) ; case None => List(id) }
    def getAncestors(id: String) = expandTree.getOrElseUpdate(id, calculateAncestors(id))
    tree.keys.foreach{ id => getAncestors(id).zipWithIndex.foreach{ case(ancestor,level) => forward(Array(id, ancestor, level)) } }

Having this we may compile it to jar, add it to Hive, create function and use it to build t_expand_state table.

create function expand_tree as 'com.pythian.nikotin.scala.ExpandTreeUDTF';

insert ovewrite table t_expand_state (state1, state2, lvl)
select expand_tree(state, next_state) from t_state;



Categories: DBA Blogs