Pythian Group

Subscribe to Pythian Group feed
Love Your Data
Updated: 6 hours 15 min ago

A Tale of Three Cities: Perspectives on innovation from New York, San Francisco and Sydney

Tue, 2016-03-22 12:29

Recently, Pythian hosted a number of Velocity of Innovation (Velocity) events. I moderated two of these: one last June in New York, and one in November in San Francisco. Another event in Sydney, Australia was moderated by Tom McCann, senior customer experience analyst with Forrester.

Our Velocity events have given us unique insights into what IT professionals in various regions see as their top priorities or concerns. And although we always framed our discussions with similar questions, it was interesting to see the different directions they took in each location — especially when it came to the topic of innovation.

So what makes a particular region fertile ground for innovation? And can you measure it?

The Global Innovation Index (GII) ranks countries based on a multitude of indicators of innovation. The United States ranks number 2 on the GII, behind Switzerland, while Australia is number 17, out of 141 countries. According to the GII website, the index aims to capture the multi-dimensional facets of innovation and provide the tools to assist in tailoring policies to promote long-term output growth, improved productivity and job growth.

The ideas discussed in the US and Australian locations seemed to align with the GII results, with US panelists expressing more positive attitudes and concrete ideas on how companies can improve agility and become more innovative. And while being at the forefront of technology in the Asia-Pacific region, the Australian panelists and audience members described more cautious approaches to achieving innovation.

Sydney: Cautiously moving forward

Early in the Sydney panel discussion, Chris Mendez, executive consultant big data and analytics from Industrie IT, sparked a lively discussion about innovation by asserting that innovation is lacking in that region.

“I actually don’t think there’s enough innovation in Australia, in particular. There’s a lot of talk about it, people are doing a lot of experiments, and there are some companies who’ve set up business purely based on tool sets that use data to innovate. But there are a few things that seem to be working against innovation, and I think one of those things is that it doesn’t stand on its own,” Mendez said.

According to Francisco Alvarez, vice president, APAC at Pythian, the risks associated with innovation might be holding companies back in Australia. “The main problem for most companies is that innovation equals risk,” Alvarez said.

Alvarez also commented on what it takes to make innovation work. “If you take a step back and look at the companies that are doing well in the market, you can see that there is one factor that differentiates them: they were not afraid to try to innovate. And because of that innovation they are getting their share of the market and gaining ground. Just look at the financial market. CBA was considered crazy a few years ago for all the investment they were making in technology, social media, apps and so on. They got ahead. And now everybody is trying to do the same,” he said.

Mendez thinks that innovation needs to start from the top. “I think there’s also a very big misunderstanding at board levels about innovation because boards are there to actually stop you changing your business. The fundamental tenant is: ‘We’ve got a great business model here, it’s running well, we’ve got to make sure that any change to it doesn’t damage that.’ There’s a natural caution at board levels and it’s totally understandable,” he said.

While cautious, the Sydney panelists expressed that they thought there is hope for more innovation in the future. They expressed a need to proceed slowly, watching what works for innovation leaders.

“The key is to have a balance,” Alvarez said.

New York: Early adopters

If you were to put our New York panelists on Geoffrey Moore’s Technology Adoption Lifecycle, you might classify them as early adopters, rather than true innovators. Not surprising, since New York’s competitive industries such as banking and publishing rely on innovative technologies, but they don’t create them.

According to New York panelist, Forrester Analyst Gene Leganza, what makes an enterprise agile is the ability to sense what’s going on in the marketplace and to quickly respond to it. But, he said that innovation comes at a cost. “The flip side of agility is innovation. An interesting aspect of innovation is getting really hot talent into your environment. Getting the right talent and doing smart things and being leading edge are challenges. You have to figure out what level to drop in on, where you are in the industry. You need to determine if you are a startup or a state organization that needs to be a fast follower,” Leganza said.

Otto Toth, CTO at Huffington Post warned that innovating quickly is not always in the best interest of the business, or it may not be the way to do it properly. He asserted that quick innovation can actually work against the business, and that instead of making your business faster, being very agile can slow everything down.

“Too many decision-makers just slow down the process. It’s better to have a few people or a core team who make the decisions and come up with new features,” he added.

Leganza went on to describe what it takes at various levels of the organization. He said that there’s a notion at the engineer level that agility means bureaucracy won’t get in their way. Then there’s agility at the enterprise level, which is about reducing risk and understanding how soon change can be in production.

“The higher up you go, the more people are going to be receptive to what improves the whole portfolio rather than one project. This is where architects come in. They have been hands-on, but have the credibility and knowledge to guide the organization more strategically,” Leganza said.

San Francisco: The innovators

In San Francisco the narratives on innovation were quite different. Although cities don’t have their own GII ranking, you might assume that the West Coast IT leaders are the innovators. And judging by the discussion at the San Francisco event, this assumption seemed to be true.

Cory Isaacson, CTO at RMS was one of our San Francisco panelists. His company runs catastrophe models for some of the world’s largest insurance companies, like scenarios that will tell what a disaster like an earthquake or hurricane might cost them. Isaacson has been working on bringing big data and scalable systems together to create a new cloud-based platform.

“At my company some of the things that we’re trying to do are, honestly, more advanced than most other things I’ve ever seen in my career. But when you’re doing innovation, it is risky. There’s no way around it. There is a lot to evaluate: from different algorithms to the risk models and the catastrophe models,” said Isaacson.

Sean Rich, director of IT at Mozilla added to the San Francisco discussion by talking about some of the concrete innovations his company is working on. They’re taking a partnership approach to enable agility.

“Innovation is doing something new. In an effort toward achieving agility, one of the things that we’re doing is enabling the agility of our business partners, by changing our own operating model. Instead of traditional IT where we run all the services and infrastructure necessary to drive the business, we’re taking more of an enabler or partnership approach,” Rich said.

“We’re now doing things like encouraging shadow IT, encouraging the use of SaaS applications and helping them really do that better through different service offerings like vendor management or change management of user adoption for certain platforms and data integration” he added.

“Overall, we’re looking at ourselves differently, and asking what new capabilities we need to develop, and what processes, tools and skills we need to enable agility for our marketing group or our product lines, as an example,” Rich said.

Aaron Lee, the Chief Data Officer at Pythian, runs a team that specializes in helping clients harness technology to deliver real outcomes. Usually they involve things like big data, DevOps, cloud, advanced analytics — he’s involved in some of the most leading edge initiatives for Pythian customers. He takes a practical approach to innovation with clients, and said that companies could improve innovation by looking at the root of the motivation for it.

“They need to ask: Why are we going down this path, trying to innovate something and what is the value of that thing we’re trying to innovate?

“If the shared goals around innovation opportunities aren’t defined in a way that actually lead to success over time, then the business is just like any other organism: it starts to get more risk averse. Then it becomes harder and harder to execute any kind of change agenda. Planning in a way that is likely to have a good long-term outcome, even at the outset of any sort of initiative, is one key success criteria that we put in place to help ourselves and our customers get to a good place,” Lee said.

Isaacson added that companies like Google have been known to allow an engineer to take a day a week or a day every two weeks to just look at things. “I think though, the challenge is you have to get your organization up to the point where this is an economically viable thing to do. Once we get more ahead of the curve, I think we could do that kind of thing,” he said.

Interested in being a part of a discussion like these? VELOCITY OF INNOVATION is a series of thought-leadership events for senior IT management hosted by Pythian. Pythian invites leading IT innovators to participate in discussions about today’s disruptive technologies: big data, cloud, advanced analytics, DevOps, and more. These events are by invitation only.

If you are interested in attending an upcoming Velocity of Innovation event in a city near you, please contact To view our schedule of upcoming events visit our Velocity of Innovation page.

Categories: DBA Blogs

Apache Cassandra 2.1 Incremental Repair

Mon, 2016-03-21 15:05

The “incremental repair” feature has been around since Cassandra’s 2.1. Conceptually the idea behind incremental repair is straightforward, but it can get complicated. The official Datastax document describes the procedure for migrating to incremental repair, but in my opinion, it doesn’t give a full picture. This post aims to fill in this gap by summarizing and consolidating the information of Cassandra incremental repair.

Note: this post assumes the reader has a basic understanding of Apache Cassandra, especially the “repair” concept within Cassandra.


1. Introduction

The idea of incremental repair is to mark SSTables that are already repaired with a flag (a timestamp called repairedAt indicating when it was repaired) and when the next run of repair operation begins, only previously unrepaired SSTables are scanned for repair. The goal of an “incremental repair” is two-fold:

1) It aims to reduce the big expense that is involved in a repair operation that sets out to calculate the “merkle tree” on all SSTables of a node;

2) It also makes repair network efficient because only rows that are marked as “inconsistent” will be sent across the network.

2. Impact on Compaction

“Incremental repair” relies on an operation called anticompaction to fulfill its purpose. Basically, anticompaction means splitting an SSTable into two: one contains repaired data and the other contains non-repaired data. With the separation of the two sets of SSTables, the compaction strategy used by Cassandra also needs to be adjusted accordingly. This is because we cannot merge/compact a repaired SSTable with an unrepaired SSTable together. Otherwise, we lose the repaired states.

Please note that when an SSTable is fully covered by a repaired range, no anticompaction will occur. It will just rewrite the repairedAt field in SSTable metadata.

SizeTiered compaction strategy takes a simple strategy. Size-Tiered compaction is executed independently on the two sets of SSTables (repaired and unrepaired), as the result of incremental repair Anticompaction operation.

For Leveled compaction strategy, leveled compaction is executed as usual on repaired set of SSTables, but for unrepaired set of SSTables, SizeTiered compaction will be executed.

For DateTiered compaction strategy, “incremental repair” should NOT be used.

3. Migrating to Incremental Repair

By default, “nodetool repair” of Cassandra 2.1 does a full, sequential repair. We can use “nodetool repair” with “-inc” option to enable incremental repair.

For Leveled compaction strategy, incremental repair actually changes the compaction strategy to SizeTiered compaction strategy for unrepaired SSTables. If a nodetool repair is executed for the first time on Leveled compaction strategy, it will do SizeTiered compaction on all SSTables because until the first incremental repair is done, Cassandra doesn’t know the repaired states. This is a very expensive operation and it is therefore recommended to migrate to incremental repair one node at a time, and follow the following procedure to migrate to incremental repair:

  1. Disable compaction on the node using nodetool disableautocompaction
  2. Run the default full, sequential repair.
  3. Stop the node.
  4. Use the tool sstablerepairedset to mark all the SSTables that were created before you disabled compaction.
  5. Restart cassandra
3.1 Tools for managing SSTable repaired/unrepaired state

Cassandra offers two utilities for SSTable repaired/unrepaired state management:

  • sstablemetadata is used to check repaired/unrepaired state of an SSTable. The syntax is as below:

             sstablemetadata <sstable filenames>

  • sstablerepairedset is used to manually mark if an SSTable is repaired or unrepaired. The syntax is as below. Note that this tool has to be used when Cassandra is stopped.

             sstablerepairedset [–is-repaired | –is-unrepaired] [-f <sstable-list> | <sstables>]

Please note that with utility sstablerepairedset, you can also stop incremental repair on Leveled compaction and restore the data to be leveled again with the “—is-unrepaired” option. Similarly, the node needs to be stopped first.

4. Other Considerations with Incremental Repair

There are some other things to consider when using incremental repair.

  • For Leveled compaction, once an incremental repair is used, it should be done so continuously. Otherwise, only SizeTiered compaction will be executed. It is recommended to run incremental repair daily and run full repairs weekly to monthly.
  • Recovering from missing data or corrupted SSTables require a non-incremental full repair.
  • “nodetool repair” –local option should be only used with full repair, not with incremental repair.
  • In C* 2.1, sequential repair and incremental repair does NOT work together.
  • With SSTable’s repaired states being tracked via it’s metadata, some Cassandra tools can impact the repaired states:
    1. Bulk loading will make loaded SSTables unrepaired, even if was repaired in a different cluster.
    2. If scrubbing causes dropped rows, new SSTables will be marked as unrepaired. Otherwise, SSTables will keep their original repaired state.
Categories: DBA Blogs

Pythian at Collaborate 16

Mon, 2016-03-21 14:27

Collaborate is a conference for Oracle power users and IT leaders to discuss and find solutions and strategies based on Oracle technologies. This many Oracle experts in one place only happens one per year, and Pythian is excited to be attending. If you are attending this year, make sure to register for some of the sessions featuring Pythian’s speakers, listed below.

Collaborate 16 is on April 10-14, 2016 at the Mandalay Bay Resort and Casino in Las Vegas, Nevada, US.


Pythian Collaborate 16 Speaker List:


Michael Abbey | Consulting Manager | Oracle ACE

Communications – the Good, the Bad, and the Best

Tues April 12 | 9:15 a.m. – 10:15 a.m. | North Convention, Room South Pacific D

Traditional DB to PDB: The Options

Tues April 12 | 2:15 p.m. – 3:15 p.m. | Room Jasmine A

Documentation – A Love/Hate Relationship (For Now)

Wed April 13 | 8:00 a.m. – 9:00 a.m. | Room Palm A


Nelson Caleroa | Database Consultant | Oracle ACE

Exadata Maintenance Tasks 101

Tues April 12 | 10:45 a.m. – 11:45 am | Room Palm C

Evolution of Performance Management: Oracle 12c Adaptive Optimization

Tues April 12 | 3:30 p.m. – 4:30 p.m | Room Jasmine A


Subhajit Das Chaudhuri | Team Manager

Deep Dive Into SSL Implementation Scenarios for Oracle Application E-Business Suite

Wed April 13 | 8:00 a.m. – 9:00 a.m. | Room Breakers E


Alex Gorbachev | CTO | Oracle ACE Director

Oaktable World: TED Talks

Wed April 13 | 12:00 p.m. – 12:30 p.m. | Room Mandalay Bay Ballroom

Oaktable World: Back of a Napkin Guide to Oracle Database in the Cloud

Wed April 13 | 4:15 p.m. – 5:15 p.m. | Room Mandalay Bay Ballroom


Gleb Otochkin | Principal Consultant

Two towers or story about data migration. Story about moving data and upgrading databases.

Mon April 11 | 4:30 p.m. – 5:30 p.m. | Room Jasmine A


Simon Pane | ATCG Principal Consultant | Oracle Certified Expert

Oracle Database Security: Top 10 Things You Could & Should Be Doing Differently

Mon April 11 | 2 p.m. – 3 p.m. | Room Palm A

Time to get Scheduling: Modernizing your DBA scripts with the Oracle Scheduler (goodbye CRON)

Tues April 12 | 10:45 a.m. – 11:45 a.m. | Room Palm A


Roopesh Ramklass | Principal Consultant

Oracle Certification Master Exam Prep Workshop

Sun April 10 | 9:00 a.m. – 3:00 p.m. | Room Jasmine C

Fast Track Your Oracle Database 12c Certification

Wed April 13 | 8:00 a.m. – 9:00 a.m. | Room Jasmine A


Categories: DBA Blogs

The 5 Best Things That Will Happen to DBAs When SQL Server Moves to Linux

Fri, 2016-03-18 14:50


In the second half of 2017, SQL Server will start calling Linux its second home. Azure Data Lake for Ubuntu was the sign that Microsoft was serious about going Linux. Private preview is already available for SQL Server on Linux, and this scribe is not part of it but let me wildly guess what would warm the hearts of those DBAs who have played with Oracle on Linux and SQL Server on Windows:


  1. Cleanup won’t require sifting through registry entries and cleaning it up. The uninstall would be quick, simple, and a breeze.
  2. No confusion about Windows authentication or SQL Server authentication.
  3. A much much better system utilities for monitoring and root cause analysis.
  4. Boasting and bragging rights about managing an enterprise database on Linux.


Probably the most serious thing that will happen with SQL Server on Linux is that more and more Oracle DBAs who prefer to use Linux, will start taking an interest in managing SQL Server.

Pythian is perfectly poised to leverage this change in technology from Microsoft. We have world class SQL Server DBAs, Linux gurus, and some magnificent Oracle DBAs. Existing clients as well as new clients can count on these resources to get their SQL Server databases supported or migrated to Linux.

And of course, it doesn’t matter really whether these database are on cloud or not come 2017, because Pythian has already covered that too.

Categories: DBA Blogs

Pythian Tech Chatter

Fri, 2016-03-18 13:23

At Pythian, the DevOps, Big Data, and Data Science teams use Slack for our IM system. We’re a diverse group drawn to a wide range of technologies so there’s some interesting and valuable chatter about what folks are reading. Here are some subjects that came across our channels the past couple of weeks:


Pythian engineers support Solr and Elasticsearch for a number of clients. Here’s a great summary of the important tunables in elasticsearch:

We often are called into companies to help manage deployment infrastructure, and it is quite common to encounter an unwieldy monolithic application that has been cobbled together over the years. Our head of DevOps pointed out this book as a great resource for strategies to decompose those into microservices:

Big Data

As Data Lake architectures mature, we’re seeing more comprehensive offerings from vendors. This blog talks about Microsoft’s offerings. I like the integration of active directory for strong security and the U-SQL approach to pulling data from data lakes.  While I’m not a huge fan of C#, the concept of having a library of extractors and outputters is a nice nut and bolt approach:

Another thread from the Data Lakes discussion highlights the critical importance of Data Governance. Waterline’s Data Inventory tool is a strong player for MetaData/Governance automation:

Data Science

Facebook released its implementation of deep learning neural nets last year. The Data Science team has been spending some time with it as they evaluate and build AI tools:

The team has also been using some great Java tools for natural language processing from Stanford:

Categories: DBA Blogs

GoldenGate 12.2 Big Data Adapters: part 2 – FLUME

Thu, 2016-03-17 12:36

In this blog post we continue our review of the new Oracle GoldenGate Big Data adapters. In the first part of the series I tested the basic HDFS adapter and checked how it worked with some DML and DDL. In this article I will try the Flume adapter and see how it works.

A quick reminder on what Flume is. It is not a topic about the popular Australian musician. Today we are talking about Apache Flume. In short, it is a pipeline or a kind of streaming system that allows you to move large amount of data. It has simple architecture and, in general, there are three main components:

a) Source: where data enters into Flume from an outside system.
b) Sink: responsible for passing data to the destination system whether it is the final destination, or another flow.
c) Channel: connects the Source and Sink.

I know that it is a rather simplistic description, but the main subject of this article is not what Flume can do, but how we can pass our data from Oracle to Flume using GoldenGate.
My first post discussed how you set up an Oracle source system, and how to start GoldenGate initial load and extract. I am not repeating it here. Let’s assume we have the source system. It is Oracle database, replicating DML and DDL for one particular schema, and GGTEST using Oracle GoldenGate 12.2 to trail files on our box where we already have GoldenGate for Big Data. Have a look at the first part to see how to set up the GoldenGate for Big Data (OGG BD).

So, we have our OGG BD setup and the manager up and running.

GGSCI (sandbox.localdomain) 1> info manager

Manager is running (IP port sandbox.localdomain.7839, Process ID 18521).

GGSCI (sandbox.localdomain) 2>

What we need now is to prepare our Flume agent to accept messages from OGG. I’ve already set up my Flume’s agent-ng service on my Linux box, and now we need to prepare the configuration file for the agent to handle the income stream, and pass it to the destination system. We will set up our source to “avro” and sink will be writing to HDFS. The source can be either avro or thrift. According to oracle documentation the Flume handler can stream data from a trail file to Avro or Thrift RPC Flume sources.

I have to admit that the destination as HDFS looks quite artificial since we have a special adapter for HDFS and don’t need a Flume to write there. But such a configuration can help us compare different adapters and what they can do.
I used Flume version 1.6.0:

[oracle@sandbox flume-ng]$ bin/flume-ng version
Flume 1.6.0
Source code repository:
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
[oracle@sandbox flume-ng]$

Here is my configuration file for the Flume agent:

# Name/aliases for the components on this agent
agent.sources = ogg1
agent.sinks = hdfs1
agent.channels = ch1
#Avro source
agent.sources.ogg1.type = avro
agent.sources.ogg1.bind =
agent.sources.ogg1.port = 4141

# Describe the sink
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.hdfs.path = hdfs://sandbox/user/oracle/ggflume
#agent.sinks.hdfs1.type = logger

# Use a channel which buffers events in memory
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 100000
agent.channels.ch1.transactionCapacity = 10000

# Bind the source and sink to the channel
agent.sources.ogg1.channels = ch1 = ch1

I’ve made the configuration simple and clear. You may change agent.sources.ogg1.port and agent.sinks.hdfs1.hdfs.path  depending on your system.

On the target HDFS we have to create directory as it was defined in our sink configuration.

[oracle@sandbox ~]$ hadoop fs -mkdir /user/oracle/ggflume
[oracle@sandbox ~]$ hadoop fs -ls /user/oracle/ggflume
[oracle@sandbox ~]$

We can start our Flume agent now.

[root@sandbox conf]# service flume-ng-agent start
Starting Flume NG agent daemon (flume-ng-agent):           [  OK  ]
[root@sandbox conf]# service flume-ng-agent status
Flume NG agent is running                                  [  OK  ]
[root@sandbox conf]#
[root@sandbox conf]# tail /var/log/flume-ng/flume.log
25 Feb 2016 11:56:37,113 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:120)  - Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean.
25 Feb 2016 11:56:37,121 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:96)  - Component type: CHANNEL, name: ch1 started
25 Feb 2016 11:56:37,122 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:173)  - Starting Sink hdfs1
25 Feb 2016 11:56:37,123 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:184)  - Starting Source ogg1
25 Feb 2016 11:56:37,139 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.source.AvroSource.start:228)  - Starting Avro source ogg1: { bindAddress:, port: 4141 }...
25 Feb 2016 11:56:37,146 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:120)  - Monitored counter group for type: SINK, name: hdfs1: Successfully registered new MBean.
25 Feb 2016 11:56:37,147 INFO  [lifecycleSupervisor-1-2] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:96)  - Component type: SINK, name: hdfs1 started
25 Feb 2016 11:56:38,114 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.register:120)  - Monitored counter group for type: SOURCE, name: ogg1: Successfully registered new MBean.
25 Feb 2016 11:56:38,115 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.instrumentation.MonitoredCounterGroup.start:96)  - Component type: SOURCE, name: ogg1 started
25 Feb 2016 11:56:38,116 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.source.AvroSource.start:253)  - Avro source ogg1 started.
[root@sandbox conf]#

Flume is ready, and we can now prepare our OGG configuration. We have examples for the Flume adapter configuration files in $OGG_HOME/AdapterExamples/big-data/flume/ :

[oracle@sandbox oggbd]$ ll AdapterExamples/big-data/flume/
total 12
-rw-r--r--. 1 oracle oinstall 107 Dec  9 12:56
-r-xr-xr-x. 1 oracle oinstall 812 Dec  9 12:56 flume.props
-rw-r--r--. 1 oracle oinstall 332 Dec  9 12:56 rflume.prm
[oracle@sandbox oggbd]$

We can copy the examples to our configuration directory and adjust them to our needs:

[oracle@sandbox oggbd]$ cp AdapterExamples/big-data/flume/* dirprm/

Here is configuration file for our adapter:

[oracle@sandbox oggbd]$ cat dirprm/flume.props

gg.handlerlist = flumehandler
#gg.handler.flumehandler.maxGroupSize=100, 1Mb
#gg.handler.flumehandler.minGroupSize=50, 500 Kb




javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

You will need to adjust your gg.classpath depending on your system, as it has to include Flume java classes and a file with Flume source properties ( file).
Here is my example for the file which will be used by OGG adapter to connect to the flume-ng agent. I’ve placed it to the dirprm directory along with other parameters.

[oracle@sandbox oggbd]$ cat dirprm/

As you can see, my flume-ng agent is on the same host as the OGG which may not be the same for you. In your case you may need to provide hostname and port for your running glume-ng agent. We need to prepare the configuration file for our initial load. The OGG trail file is located in the dirdat/ directory and has name initld.

<pre></pre> <pre>Here is our parameter file for initial load using passive replicat: [oracle@sandbox oggbd]$ cat dirprm/irflume.prm --initial REPLICAT irflume -- Command to run REPLICAT in passive mode -- ./replicat paramfile dirprm/irflume.prm reportfile dirrpt/irflume.rpt SPECIALRUN END RUNTIME EXTFILE /u01/oggbd/dirdat/initld --DDLERROR default discard DDL include all TARGETDB LIBFILE SET property=dirprm/flume.props REPORTCOUNT EVERY 1 MINUTES, RATE GROUPTRANSOPS 10000 MAP ggtest.*, TARGET bdtest.*; 

Let’s run the load and see what we get in the end:

[oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irflume.prm reportfile dirrpt/irflume.rpt

The command completed successfully and we got three new files on HDFS. The first 2 files had the schema description and the 3-d one had the data for the replicated tables.

[root@sandbox ~]# hadoop fs -ls /user/oracle/ggflume
Found 12 items
-rw-r--r--   1 flume oracle       1833 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634620
-rw-r--r--   1 flume oracle       1762 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634621
-rw-r--r--   1 flume oracle       1106 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634622
[root@sandbox ~]# hadoop fs -tail /user/oracle/ggflume/FlumeData.1457626634620
        "name" : "PK_ID",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "PK_ID_isMissing",
        "type" : "boolean"
      }, {
        "name" : "RND_STR",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "RND_STR_isMissing",
        "type" : "boolean"

[root@sandbox ~]# hadoop fs -tail /user/oracle/ggflume/FlumeData.1457626634621
  }, {
    "name" : "primary_keys",
    "type" : {
      "type" : "array",
      "items" : "string"
  }, {
    "name" : "tokens",
    "type" : {
      "type" : "map",
      "values" : "string"
[root@sandbox ~]# hadoop fs -tail /user/oracle/ggflume/FlumeData.1457626634622
:?v??8?????	SaQm?"BDTEST.TEST_TAB_1Ñ??
                                          ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-03-10T11:17:14.448000(00000000-10000002012
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-03-10T11:17:14.459000(00000000-10000002155
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-03-10T11:17:14.459001(00000000-10000002298
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-03-10T11:17:14.460000(00000000-10000002441
PK_ID4RXZT5VUN&2013-09-04:23:32:56HW82LI73&2014-05-11:05:23:23"BDTEST.TEST_TAB_2?????"BDTEST.TEST_TAB_2I42016-02-16 19:17:40.76289942016-03-10T11:17:14.466000(00000000-10000002926

The initial load has succeeded, and now we can create and start the proper ongoing replication to HDFS through Flume.
Let’s prepare a new parameter file for our permanent Flume replicat and starting it up.

GGSCI (sandbox.localdomain) 2> edit param rflume
-- Trail file for this example is located in "dirdat/" directory
-- Command to add REPLICAT
-- add replicat rflume, exttrail dirdat/or
TARGETDB LIBFILE SET property=dirprm/flume.props
MAP ggtest.*, TARGET bdtest.*;

GGSCI (sandbox.localdomain) 1> add replicat rflume, exttrail dirdat/or, begin now

GGSCI (sandbox.localdomain) 2> start replicat rflume

Sending START request to MANAGER ...

Let’s insert a row and see what we get on the target system.

orclbd> insert into ggtest.test_tab_1
  2  values (7,dbms_random.string('x', 8), sysdate-(7+dbms_random.value(0,1000)),
  3  dbms_random.string('x', 8), sysdate-(6+dbms_random.value(0,1000))) ;

1 row inserted.

orclbd> commit;

Commit complete.


As soon as commit had been executed we received a couple of new files on HDFS where the first had the schema for the changed table, and the second had the data for the transaction or  “payload”.

[root@sandbox ~]# hadoop fs -ls /user/oracle/ggflume
-rw-r--r--   1 flume oracle       1833 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634620
-rw-r--r--   1 flume oracle       1762 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634621
-rw-r--r--   1 flume oracle       1106 2016-03-10 11:17 /user/oracle/ggflume/FlumeData.1457626634622
-rw-r--r--   1 flume oracle       1833 2016-03-10 12:43 /user/oracle/ggflume/FlumeData.1457631817021
-rw-r--r--   1 flume oracle        605 2016-03-10 12:43 /user/oracle/ggflume/FlumeData.1457631817022
[root@sandbox ~]#

[root@sandbox ~]# hadoop fs -cat /user/oracle/ggflume/FlumeData.1457631817021
SEQ!"]B?9?k??	]kTSa?m??{
  "type" : "record",
  "name" : "TEST_TAB_1",
  "namespace" : "BDTEST",
  "fields" : [ {
    "name" : "table",
    "type" : "string"
  }, {
    "name" : "op_type",
    "type" : "string"
  }, {
[root@sandbox ~]# hadoop fs -cat /user/oracle/ggflume/FlumeData.1457631817022
  "type" : "record",
  "name" : "generic_wrapper",
  "namespace" : "oracle.goldengate",
  "fields" : [ {
    "name" : "table_name",
    "type" : "string"
  }, {
    "name" : "schema_hash",
    "type" : "int"
  }, {
    "name" : "payload",
    "type" : "bytes"
  } ]
                            ?"BDTEST.TEST_TAB_1I42016-03-10 17:43:31.00169042016-03-10T12:43:33.464000(00000000080001408270

I prepared and executed a small regression testing of inserts and updates to the table using jmeter, and started to push inserts and updates with a rate about 29 transaction per second.Even with one flume channel and my small Hadoop environment, it had a pretty good response time without trashing any errors. Flume put about 900 transactions per a HDFS file.

-rw-r--r--   1 flume oracle     123919 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485465
-rw-r--r--   1 flume oracle      35068 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485466
-rw-r--r--   1 flume oracle     145639 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485467
-rw-r--r--   1 flume oracle     178943 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485468
-rw-r--r--   1 flume oracle     103285 2016-03-10 14:52 /user/oracle/ggflume/FlumeData.1457639485469
[oracle@sandbox Downloads]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457639485467 | wc -l
[oracle@sandbox Downloads]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457639485468 | wc -l
[oracle@sandbox Downloads]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457639485469 | wc -l
[oracle@sandbox Downloads]$

I’ve also tried the “thrift” datasource for Flume and it worked well too. To switch from “avro” to “thrift” I changed the value in the parameter agent.sources.ogg1.type in the flume.conf and restarted the flume agent. You also have to change client.type from default to thrift in your file. It worked fine, and I was able to get the information from the trail and write to the hdfs.

[oracle@sandbox oggbd]$ ./replicat paramfile dirprm/irflume.prm reportfile dirrpt/irflume.rpt
[oracle@sandbox oggbd]$ hadoop fs -ls /user/oracle/ggflume
Found 3 items
-rw-r--r--   1 flume oracle       1833 2016-02-25 16:05 /user/oracle/ggflume/FlumeData.1456434311892
-rw-r--r--   1 flume oracle       1762 2016-02-25 16:05 /user/oracle/ggflume/FlumeData.1456434311893
-rw-r--r--   1 flume oracle       1106 2016-02-25 16:05 /user/oracle/ggflume/FlumeData.1456434311894
[oracle@sandbox oggbd]$
[oracle@sandbox oggbd]$ hadoop fs -cat  /user/oracle/ggflume/FlumeData.1456434311892
  "type" : "record",
  "name" : "TEST_TAB_1",
  "namespace" : "BDTEST",
  "fields" : [ {
    "name" : "table",
    "type" : "string"
  }, {

[oracle@sandbox oggbd]$ hadoop fs -cat  /user/oracle/ggflume/FlumeData.1456434311894
                                                                                                              ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-02-25T16:05:11.480000(00000000-10000002012
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-02-25T16:05:11.498000(00000000-10000002155
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-02-25T16:05:11.498001(00000000-10000002298
                                                                                   ?"BDTEST.TEST_TAB_1I42016-02-16 19:17:40.74669942016-02-25T16:05:11.499000(00000000-10000002441
PK_ID4RXZT5VUN&amp;2013-09-04:23:32:56HW82LI73&amp;2014-05-11:05:23:23"BDTEST.TEST_TAB_2?????"BDTEST.TEST_TAB_2I42016-02-16 19:17:40.76289942016-02-25T16:05:11.505000(00000000-10000002926
PK_IDRND_STR_1ACC_DATE7IJWQRO7T&amp;2013-07-07:08:13:52[oracle@sandbox oggbd]$

You can see from the output that in the FlumeData.1456434311894 file we are getting the schema description and in the FlumeData.1456434311894 we have the data from the tables TEST_TAB_1 and TEST_TAB_2.

Let’s try some simple DDL commands.
If we truncate a table:

orclbd> truncate table ggtest.test_tab_1;

Table GGTEST.TEST_TAB_1 truncated.


It is not going to be replicated. If we are altering the table, we are not seeing it as a separate command, but it is going to be reflected in the new schema definition for any new transaction replicated to HDFS. You will get a file with new schema definition and the transaction itself in a next file.

orclbd> alter table ggtest.test_tab_1 add (new1 varchar2(10));

Table GGTEST.TEST_TAB_1 altered.

orcl> insert into ggtest.test_tab_1
  2  values (7,dbms_random.string('x', 8), sysdate-(7+dbms_random.value(0,1000)),
  3  dbms_random.string('x', 8), sysdate-(6+dbms_random.value(0,1000)),'new_col' );

1 row created.

orcl> commit;

Commit complete.

[oracle@sandbox oggbd]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457117136700
  "type" : "record",
  "name" : "TEST_TAB_1",
  "namespace" : "BDTEST",
  "fields" : [ {
    "name" : "table",
    "type" : "string"
        "name" : "NEW1",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "NEW1_isMissing",
        "type" : "boolean"
[oracle@sandbox oggbd]$ hadoop fs -cat /user/oracle/ggflume/FlumeData.1457117136701
}SB???"BDTEST.TEST_TAB_1?????"BDTEST.TEST_TAB_1I42016-03-04 18:45:30.00131442016-03-04T13:45:34.156000(00000000000000014363

As I’ve mentioned in my previous post, the deeper investigation of supported DDL changes is going to be the subject of a dedicated blog post. Here we can conclude that the adapter worked as expected, and supported the flow of transactions from our Oracle database down to Flume using Avro and Thrift sources. Of course it is not production implementation, and serves only as a basic functional and elementary regression testing. For a serious production workflow we need to develop appropriate architecture.

In my next few posts I plan to check Kafka and HBASE adapters and see how they work. Stay tuned!

Categories: DBA Blogs

Connection timeout parameters in MySQL

Thu, 2016-03-17 08:09
  • wait_timeout
  • interactive_timeout
  • net_read_timeout
  • net_write_timeout

What do these timeouts do in MySQL? If you search the web for one or more of these, you may find complaints that no comprehensive explanation exists for all of these timeouts in one place (besides the obvious documentation of dynamic server system variables in MySQL). This blog post seeks to provide a central documentation source for timeouts and provide some practical explanation.

Knowing what timeouts do helps in a troubleshooting effort. It’s good to understand when an issue is timeout related and when it’s not, and to know the right reasons for changing timeout variables, or the right time to ask the developer or ad-hoc user to please tune the variables in the session, instead of asking the DBA to change the global variables.

Before diving into the meat of this topic, here are a few introductory concepts that I will touch on.

I mentioned session variables. The four timeout variables we are discussing: interactive_timeout, wait_timeout, net_read_timeout, net_write_timeout can all be set within the context of the session using the SET command. e.g. “set session wait_timeout=3600;“.  A general rule of thumb that I always try to follow: Once I have what I think are correct timeouts to protect my mission critical application, I do not want to change them unless there is a really good reason. If I’m encountering a client disconnection issue and it’s not coming from my main application, I need to see if the problem can be alleviated by adjusting timeouts on the session level using SET commands.

Second, it’s important to note that timeouts are there to protect your server and critical application. You don’t want too many clients connected to your server doing nothing (they can crowd out connections that need to do something). You don’t want clients that are in a failed state due to a network connection or other interruptive problem to continue to consume resources such as locks on your database. Timeouts should be set very thoughtfully, based on the unique environment. If you’re unsure, leave at defaults and adjust when needed.

Finally, I created a perl script to illustrate wait_timeout situations. I made a great effort to also incorporate net_read_timeout and net_write_timeout into the script and you’ll see empty functions there as place-holders. There are many layers to the “MySQL client/server onion” involving buffers on the system, client, and server layers. In lieu of providing a direct scripting examples of net_read_timeout and net_write_timeout, I hope you’ll find the other methods of explanation included below, useful.


According to the MySQL manual, Interactive_timeout is only used for connections from interactive clients like the command line MySQL clients.

The technical explanation is: “clients utilizing the “CLIENT_INTERACTIVE option to mysql_real_connect().” Check the specific documentation to your favorite client to see whether it fits this description.

If your client fits this behavior, wait_timeout is set to the value in interactive_timeout. The only benefit to this parameter is flexibility. If your command line or other interactive clients have a vastly different requirement than your application, feel free to set this differently than wait_timeout.


Wait timeout is simply there to protect you in the common case of clients sitting there doing nothing but absorbing a connection. You want to set this to the lowest acceptable number of seconds in order to protect your server against an application malfunction or some other event that may cause too many connections to the database to be opened, crowding out other clients. If a client is doing nothing for wait_timeout seconds, the MySQL server will terminate the connection. The proper setting for this variable depends on the particular environment.

An environment I worked in for 3+ years set this to 120 seconds, because jboss connection pooling (and associated parameters) took care of making sure the mission critical application in the environment had available connections.

A common misconception and common misuse is to try to adjust wait_timeout for a situation in which a query has been interrupted. Wait timeout only applies to idle sessions. If the connection is in any other state besides idle, wait_timeout does not apply. This is why utilities like pt-kill exist. It does not help that an idle connection being interrupted by KILL produces the same error message.

Example using my perl script and the MySQL processlist:

jscott@js-trusty1:~$ ./ -K
Doing Killable Wait Timeout Test
ATTN: You have 20 seconds to kill the MySQL thread!!

in another shell

jscott@js-trusty1:~$ mysql -e "show processlist"
| Id | User | Host | db | Command | Time | State | Info |
| 36 | js | | NULL | Sleep | 28514 | | NULL |
| 79 | root | localhost:43350 | timeout_test | Sleep | 4 | | NULL |
| 80 | root | localhost | NULL | Query | 0 | NULL | show processlist |
jscott@js-trusty1:~$ mysql -e "kill 79";

back to the perl script output.

DBD::mysql::st execute failed: MySQL server has gone away at ./ line 134.
Err: 2006
ErrStr: MySQL server has gone away
State: HY000


Note that we got MySQL error code 2006 “MySQL server has gone away”.

Now we’ll allow wait_timeout to trigger instead of killing a process.

jscott@js-trusty1:~$ ./ -W
Doing Wait Timeout Test
DBD::mysql::st execute failed: MySQL server has gone away at ./ line 151.
Err: 2006
ErrStr: MySQL server has gone away
State: HY000

The fact that a killed idle process and a process receiving wait_timeout get the same error message and code (2006 “MySQL server has gone away”) causes much confusion. Rest assured, wait_timeout only kills idle connections, even if the error messages are the same.  If there is a running query, you will typically get error code 2013: “Lost connection to MySQL server during query”.

The most common reason a query is killed, in my experience, is a backup. Many backup utilities for MySQL have the default behavior of killing long-running queries.


net_read_timeout is the number of seconds mysql will wait between bytes as you are sending it data. As mentioned above, I tried very hard to synthetically cause a net_read_timeout with a perl script and was unsuccessful.

The best way to explain net_read_timeout is to use the example of an ETL (Extract, Transform, Load) job. At the beginning of an ETL job, you select data from one data source, then transform and load it into another data source, such as a MySQL data warehouse. Tools like “Pentaho Data Integration” (cited below) provide a GUI view of ETL.

Pentaho ETL Process

Pentaho ETL Process

If your ETL job is complex, and you use something similar to Pentaho Kettle, you have very little visibility into what is happening behind the scenes.

If the steps in between the initial query in an ETL job and an output step consume more than net_read_timeout seconds, then an error may occur due to incomplete statements being sent to the “output” steps.  In mid-transaction, if the MySQL server fails to receive data within the timeout interval, the connection will be terminated.

Another way net_read_timeout can occur: Normally all database connections used in a Pentaho ETL transformation are opened at the beginning of the transformation. If you have “use result streaming cursor” turned off on the database connection used for “table input”, the result set may take longer than “net_read_timeout” seconds to receive, causing an error on the database connection being used for the output steps.

Have you ever executed a simple query (with a large result set) on the MySQL command line and wondered why there is a pause of several seconds or even minutes prior to beginning to receive the results? Your client actually has begun receiving the results, but they simply have not been presented to you. The reason for this is, it’s generally better for a MySQL server to be “finished” with a query as soon as possible; therefore, the default behavior of the MySQL client libraries on a read-only query is to receive all the results first, before passing the results onto the program. The default behavior is called “mysql_store_result“.  The MySQL reference manual does a good job of explaining this in detail. As the manual states, you should not change the default behavior unless you are doing a minimal amount of processing on each result row.

Homemade ETL programs in other languages can have the same issue. Take care to use auto_reconnect features and/or streaming cursors / “mysql_use_result”, when appropriate.

ETLs and scripts are good examples of clients which can be modified with SET commands.  “SET SESSION net_read_timeout=<longer value>;” should be considered for jobs outside of your main application.

Lastly, net_read_timeout is also often caused by bad network connections. This is the default explanation offered in many blog posts on this topic.

The Error returned to the client will be (2013) “Lost connection to MySQL server during query”.


net_write_timeout is the number of seconds MySQL will wait for the client to receive additional data before terminating a connection. Using the example of the ETL job above, if steps in the middle of the transformation cause the “input” steps to pause for longer than net_write_timeout seconds, it’s possible to receive an error. I’ve never seen net_write_timeout hit because of an ETL job. The MySQL client has default behavior of buffering result sets.  Also, I try to use the “use result streaming cursor” in Pentaho or the “mysql_use_result” option in the Perl DBI, sparingly.

The most typical case of receiving net_write_timeout is a backup such as a mysqldump. During a long data transfer such as a mysqldump, a myriad of factors could come into play, not the least of which is network connectivity. Remember, if you run into problems with net_write_timeout on a mysqldump backup, mysqldump has its own section of the my.cnf. You can set an option file on a client or a server to specifically set this and other variables for a mysqldump session.

Categories: DBA Blogs

Log Buffer #465: A Carnival of the Vanities for DBAs

Wed, 2016-03-16 12:17

This Log Buffer Edition covers some tricks, tips, workarounds, and tech-dives covered in various blog posts from Oracle, SQL Server and MySQL.


SQLcl can run Ruby script!!.. JRuby to be precise.

Financial Information Discovery Integration with Oracle Assets

Data Vault Modeling and Snowflake Elastic Data Warehouse

Couchbase Bucket Index Status in NetBeans IDE 8.1

Make SQL Developer Run Faster

SQL Server:

Find a String in a String

Microsoft cloud platform allows you to provide resiliency for your on-premises SQL Server deployments. Marcin Policht describes this functionality, available as part of the Azure Site Recovery solution.

Trust, or Verify – How FKs Can (or Cannot) Help Performance

10 New Features Worth Exploring in SQL Server 2016

Validating a Windows Cluster Prior to Installing SQL Server 2014


When you think of Cloud services for database administration, Database as a Service (DBaaS) is what tends to comes to mind, but one of the ways that the Cloud has had an enormous impact on our work productivity is in the area of collaboration.

Introduction to MySQL Troubleshooting Webinar: Q & A for How to Create a Test Setup

MariaDB is a community-developed fork of MySQL and aims to be an enhanced, drop-in replacement for it.

The MySQL binary package have been growing a lot the last GA versions.

Indexing JSON documents via Virtual Columns

Categories: DBA Blogs

Breaking Down Barriers with DevOps

Wed, 2016-03-16 09:53

In a recently recorded webinar,  How To Deliver Faster, Better Software Releases with DevOps, Pythian’s Chief Digital Officer, Aaron Lee and guest speaker Amy DeMartine, Senior Research Analyst at Forrester Research, discussed DevOps and the important role it plays in the race to deliver the best customer experience with faster and better software releases.

One of the webinar participants asked about the critical success factors when implementing a rapid release software model. 

The case for adopting agile operations is very compelling. Rapid release cycles can reduce costs, provide better and more stable solutions, increase customer and employee satisfaction, and most importantly, outpace competition and drive revenue.

Driving both efficiency and effectiveness of operations is easier said than done. DeMartine highlighted 7 habits of effective DevOps, but stressed that culture is the number one enemy.

She said, “this fear of change is a huge cultural gap. DevOps is at the point of cultural change, and this is one of the biggest ones that IT has to break”.

Historically, IT teams have held large and lengthy change boards, requiring a strong reliance on rigid approval processes. Combine that history with the complexity and degree of unknowns that operations faces, and it isn’t surprising that some teams are risk averse and consensus driven.

While new technologies, processes, and automation are critical to implementing rapid release models and enhancing DevOps, both DeMartine and Lee agreed that it boils down to the people, and the way Dev and Ops interact with each other.

“It’s clear what the responsibilities of a product manager are. It’s clear where the developers responsibilities start and stop,” said Lee. Often this leads to operations “picking up everything else” and facing an increased amount of unknowns.

Part of the solution is understanding how to embrace and maximize the opportunity for change, as opposed to minimizing the occurrence of change.

Most importantly, Dev and Ops need to know and understand each other. By setting and balancing clear boundaries and expectations, each team can understand the role the other is playing, and begin to see the environment under a common goal.

For more on this topic, download the on-demand webinar below:

devopswebinar_CTA (1)

Categories: DBA Blogs

Converting Hortonworks Sandbox to run on Hyper-V

Tue, 2016-03-15 10:58

It looks like Hortonworks recently decided to stop hosting a version of their Sandbox VM for Windows Hyper-V. I only see VirtualBox and VMware versions listed.

What if, like me, your primary learning lab machine runs Hyper-V?

Well, you can convert it fairly easily. My method is to use VirtualBox to do this.

I run VirtualBox on my Mac because it’s free, it has free conversion tools and I usually only run 1-2 VMs on it, but my Mac isn’t my learning lab. This tip WILL work on a Windows machine that has VirtualBox installed.

Note that VirtualBox and Hyper-V may not get along well if installed on the same device, hence my using two machines to do this.

In order to convert it, here’s what you need to do.

  • Download the VirtualBox Sandbox VM here.
  • Follow Hortonworks’ instructions to import the appliance into VirtualBox.
  • Find the disk that it created by looking at the properties of the VM you just created.
  • Open a terminal and navigate to that directory.
  • From that directory, run this:

VBoxManage clonehd Hortonworks_sanbox_with_hdp_2_4_virtualbox-disk1.vmdk HDP2.4.vhd --format vhd

This process runs for a bit and creates a copy in VHD format, which you can copy onto, and run from; any Hyper-V machine.

Simply create a new Hyper-V machine, as you normally would, but instead of creating a new disk, choose this one and fire it up.

On the subject of VM Config, you should give it access to your internal network so that you can access it via browser, a couple processors and on memory, a word of caution: when I did this with dynamic memory enabled, the VM took all of my available system memory, so you may want to limit consumption to a number that reserves some computing power for the host and any other VMs you may want to run in parallel.

After mounting and starting my new Hyper-V, VM I found that I hadn’t allocated enough RAM or processor and it was “dying” on boot so I upped the RAM to 6 GB and 4 processors from 2 GB and 1 respectively.

Next up, eth0 wasn’t found on boot so I checked what Google had to say and found this article.

I edited the first file, and upon checking the second (/etc/sysconfig/network-scripts/ifcfg-eth0) I found that the MAC address was not recorded so didn’t have anything to do.

I saved, rebooted, and watched and eth0 was found at this time – of course the VirtualBox add-in failed at boot, but that isn’t a big deal.

When the VM came up, it instructed me to connect to which didn’t work. I looked up the IP assigned by my router put that IP (without a port) into my browser and was able to connect without issue.

Happy learning!

Categories: DBA Blogs

There’s more to IT than just coding

Tue, 2016-03-15 10:00


March 2, 2016 was officially the midpoint of the Technovation Challenge in Ottawa. The 2016 competition started on Sunday January 17, at Carleton University, where Anar Simpson, Global Ambassador for Technovation, kicked off the program.

Technovation is a global technology entrepreneurship competition for young women that sets out to prove that there’s more than just coding in the technology sector. The program is designed to inspire and educate young women to pursue a career in technology by showing them all aspects of starting a technology business.

Regional Technovation Chapters contact local high schools to introduce the program and recruit teams of high school girls. Thanks to the efforts of Jennifer Francis, chair of Women Powering Technology, an Ottawa Chapter of Technovation that started up in January 2015. The pilot was such a success, participation in 2016 has doubled with over 100 high school and middle school girls participating and 30 female mentors from the Ottawa tech sector.

In addition to IBM, Shopify, and L-Spark, Pythian is a proud sponsor of the 2016 competition. Having just announced the Pythia program , it was a natural fit for Pythian to sponsor Technovation. The Pythia program focuses on increasing the percentage of talented women who work at Pythian, especially in tech roles. It also encourages and supports the participation of girls and women in STEM fields, which is exactly what Technovation is all about.

The support of the sponsors allows the teams to meet weekly at the  sponsor’s facilities. Here the teams, along with their mentors, guest speakers and instructor from Carleton University’s Technology Innovation Management (TIM) program, focus on identifying a problem facing their community, creating an App to solve the problem, code the App, build a company, and pitch their business plan to experts in the field! It’s pretty impressive considering the high school girls squeeze this competition in on top of their day-to-day school classes and after-school activities. They are all committed and dedicated – a great sign of future leaders!

“My views of working in the technology sector have changed, since it feels like something anyone can be a part of, whereas it was a distant idea before,” said 17-year-old Doris Feng, a student at Merivale High School and member of the team Women With Ambition. “I came in with the notion that we would be coding during the first week, but it turns out much of the development takes place off screen, with many hours dedicated to brainstorming, surveying users, drawing a paper prototype, and mulling over the ideas with team members.”

I couldn’t have said it better Doris! This is exactly what happens in the real world.

Technovation is a program designed to inspire women to pursue the entrepreneurial spirit in all of us. For more information on Technovation and starting your own local chapter, visit Technovation online. Globally, Technovation is sponsored by Adobe Foundation, Google, Verizon, CA Technologies, Intel and Oracle, in partnership with UN Women, UNESCO and MIT Media Lab.

Categories: DBA Blogs

How to run OpenTSDB with Google Bigtable

Mon, 2016-03-14 12:49

In a previous post (OpenTSDB and Google Cloud Bigtable) we discussed OpenTSDB, an open source distributed database specifically designed for storing timeseries data. We also explained how OpenTSDB relies on Apache HBase for a reliable and scalable data backend. However, deployment and administration of an HBase cluster is not a trivial task, as it requires a full Hadoop setup. This means that it takes a big data engineer (or better a team of them) to plan for the cluster sizing, provision the machines and setup the Hadoop nodes, configure all services and tune them for optimal performance. If this is not enough, Operations teams have to constantly monitor the cluster, deal with hardware and service failures, perform upgrades, backup regularly, and a ton of other tasks that make maintenance of a Hadoop cluster and OpenTSDB a challenge for most organizations.

With the release of Google Bigtable as a cloud service and its support for the HBase API, it was obvious that if we managed to integrate OpenTSDB with Google Bigtable, we would enable more teams to have access to the powerful functionality of OpenTSDB by removing the burden from maintaining an HBase cluster.

Nevertheless, integration of OpenTSDB with Bigtable was not as seamless as dropping a few jars in its release directory. This happened because the OpenTSDB developers went over and above the standard HBase libraries, by implementing their very own asynchbase library. Asynchbase is a fully asynchronous, non-blocking, thread-safe, high-performance HBase API. And no one can put it better than the asynchbase developers themselves who claim that ‘This HBase client differs significantly from HBase’s client. Switching to it is not easy as it requires one to rewrite all the code that was interacting with any HBase API.’

This meant that integration with Google Bigtable required OpenTSDB to switch back to the standard HBase API. We saw the value of such an effort here at Pythian and set about developing this solution.

The asyncbigtable library

Today, we are very happy to announce the release of the asyncbigtable library. The asyncbigtable library is a 100% compatible implementation of the great asynchbase library that can be used as a drop in replacement and enable OpenTSDB to use Google Bigtable as a storage backend.

Thanks to support from the OpenTSDB team, the asyncbigtable code is hosted in the OpenTSDB GitHub repository.


To create asyncbigtable we had to overcome two great challenges. The first one was that OpenTSDB assumes that the underlying library (until now asynchbase) performs asynchronous and non-blocking operations. On the other hand, the standard HBase API only supports synchronous and blocking calls. As a workaround for this, we used the BufferedMutator  implementation that collects all Mutation operations in a buffer and performs them in batches, allowing for mutations with an extremely low latency.

The second challenge stemmed from the fact that the OpenTSDB project has a very limited set of jar dependencies, that are explicitly defined in Makefiles. Contrary to this spartan approach, HBase and Bigtable client libraries have a significant number of transitive dependencies. Since, adding those dependencies one-by-one in the OpenTSDB build process would complicate its dependency management, we decided to  package all asyncbigtable dependencies in an uber-jar using the Maven assembly plugin. Therefore, building OpenTSDB with asyncbigtable support is now as simple as downloading a single beefy jar.

Build stepsBefore you start

Before you build OpenTSDB with Google Bigtable support, you must complete the following required steps:

  1. Create a Google Bigtable cluster (
  1. Install HBase shell with access to the Google Bigtable cluster (
  1. Download and install the required tools for compiling OpenTSDB from source (
Build and run OpenTSDB
  1. Clone and build the modified source code from the Pythian github repository:

git clone -b bigtable
cd opentsdb

  1. Create OpenTSDB tables

OpenTSDB provides a script that uses HBase shell to create its tables.  To create the tables run the following command:
env COMPRESSION=NONE HBASE_HOME=/path/to/hbase-1.1.2 \

  1. Run OpenTSDB

export HBASE_CONF=/path/to/hbase-1.1.2/conf
mkdir -p <tmp_dir>
./build/tsdb tsd --port=4242 --staticroot=build/staticroot \

Future work

By all means our work on asyncbigtable does not stop here. We are putting great effort towards improving the library to achieve the high quality standards of the rest of OpenTSDB code. Our first priority is to test the library against most real world scenarios and achieve the highest quality. In the future, we plan to benchmark the performance of OpenTSDB with Bigtable and compare how it competes against HBase.

We are also working on building a true asynchronous implementation of the asyncbigtable library by integrating deeper with the Google Bigtable API.


We would like to thank the OpenTSDB developers (Benoît Sigoure and Chris Larsen) for their brilliant work in building such great software and for embracing the asyncbigtable library. Their insights and code contributions helped us deal with some serious issues. Also, we would like to thank the Google Cloud Bigtable team because they expressed genuine interest in this project and they were very generous in providing us with cloud infrastructure and excellent support.

Categories: DBA Blogs

Bug in Ointment: ORA-600 in Online Datafile Move

Mon, 2016-03-14 09:02

Instead of using ‘fly in ointment’, I have used ‘Bug in Ointment’ because in this prolonged Australian summer, my backyard is full of bugs (to the sheer delight of my bug-loving son, at the same time causing much anxiety among the rest of us). When your backyard is full of bugs and you get bugs in a database, it’s only natural to customize the idioms.

Oracle 12c has been warming up the hearts of database aficionados in various ways with its features. One of the celebrated features is the online datafile moving and renaming. Lots has been written about it and suffice to say that we don’t need any down time in order to move, rename, or copy the data files anymore. It’s an online operation with zero down time incurring a slight performance overhead.

I was playing with this feature on my test system with Oracle 12.1 on OEL 6, and when moving a datafile in a pluggable database I got this error:

ORA-600 [kpdbGetOperLock-incompatible] from ALTER PLUGGABLE DATABASE .. DATAFILE ALL ONLINE

Well, I tried searching for this error using ORA-600 look up tool, but it didn’t turn up anything and simply informed me:

An Error document for ORA-600 [kpdbgetoperlock-incompatible] is not registered with the tool.

Digging more in My Oracle Support pulled out following associated bug:

Bug 19329654 – ORA-600 [kpdbGetOperLock-incompatible] from ALTER PLUGGABLE DATABASE .. DATAFILE ALL ONLINE (Doc ID 19329654.8)

The good news was that the bug was fixed in the (Oct 2014) Database Patch Set Update. And it’s true, after applying this PSU, everything was hunky-dory.

Categories: DBA Blogs

Changes to DBCA Patch Application Behaviour Causes PDB Cloning to Fail

Fri, 2016-03-11 07:23

A test upgrade from 11g to 12c and conversion to a container and pluggable database recently pointed out some important 12c behavior differences with respect to the DBCA and whether or not it automatically applies PSUs installed in the Oracle Home.

The original objective was to take an existing database and upgrade it to and convert it to a PDB.

From a high level the procedure was:

  • Install the Oracle software and apply the latest PSU (in this case the JAN2016 PSU).
  • Create a new CDB to house the upgraded database.
  • Upgrade the database to in-place using the DBUA.
  • Convert the upgraded 12c database to a PDB (via the clone through DB link method).

Seemed pretty straight forward. However as part of the PDB conversion (running of the noncdb_to_pdb.sql script), the following error was encountered:

  2     threads pls_integer := &&1;
  3  BEGIN
  4     utl_recomp.recomp_parallel(threads);
  5  END;
  6  /
ERROR at line 1:
ORA-04045: errors during recompilation/revalidation of SYS.DBMS_SQLPATCH
ORA-00600: internal error code, arguments: [kql_tab_diana:new dep], [0x0CF59D0B8], [0x7F1525B91DE0], [1], [2], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_UTILITY", line 1294
ORA-06512: at line 1


The noncdb_to_pdb.sql script can only be run once so at this point the PDB conversion has failed and must be restarted. But first we must understand what went wrong or what steps we missed.

Root Cause: DBCA no longer automatically applies PSUs

It’s obvious from the ORA-04045 error that the issue is related to patching. But the question is still what was missed in the process since the 12c Oracle Home was fully patched before creating or upgrading any databases?

The problem is that DBAs have maybe become complacent with respect to PSU applications after creating databases. With Oracle Database 11g whenever we created a database via the DBCA, the latest PSU was automatically applied. It doesn’t matter if we created the database from a template or used a custom install. Regardless of which DBCA method was used, after DB creation we’d see something similar to:

SQL> select comments, action_time from dba_registry_history
  2  where bundle_series like '%PSU' order by 2;

COMMENTS                       ACTION_TIME
------------------------------ ------------------------------
PSU            04-MAR-16 PM



Clearly the latest PSU (JAN2016 in this case) installed in the Oracle Home was applied automatically by the DBCA. And of course this is reflected in the official README documentation (in this example for DB PSU patch 21948347 [JAN2016] – requires a My Oracle Support login to view) which states:

There are no actions required for databases that have been upgraded or created after installation of PSU


However this functionality has completely changed with Oracle Database 12c! The change in behaviour is documented in My Oracle Support (MOS) Note: “12.1:DBCA (Database Creation) does not execute “datapatch” (Doc ID 2084676.1)” which states:

DBCA does not execute datapatch in Oracle 12.1.0.X. The solution is to apply the SQL changes manually after creating a new Database


Similarly the 12c JAN2016 DB PSU (patch 21948354) README documentation states:

You must execute the steps in Section 3.3.2, “Loading Modified SQL Files into the Database” for any new or upgraded database.


This is a significant change in behaviour and is the root cause of the PDB creation error!


Resolving the “ORA-00600 [kql_tab_diana:new dep]” error

Back to the CDB creation error, the first logical place to check whenever experiencing plug-in or PDB creation errors is the PDB_PLUG_IN_VIOLATIONS view:

SQL> CREATE PLUGGABLE DATABASE MY_PROD FROM NON$CDB@clone_link FILE_NAME_CONVERT=('/u01/app/oracle/oradata/MY_PROD','/u01/app/oracle/oradata/CPRD1/MY_PROD');

Pluggable database created.

SQL> SELECT name, type, status, message, action FROM pdb_plug_in_violations ORDER BY 1,2;

NAME     TYPE      STATUS    MESSAGE                                  ACTION
-------- --------- --------- ---------------------------------------- ----------------------------------------
MY_PROD  ERROR     PENDING   PDB plugged in is a non-CDB, requires no Run noncdb_to_pdb.sql.
                             ncdb_to_pdb.sql be run.

MY_PROD  WARNING   PENDING   CDB parameter compatible mismatch: Previ Please check the parameter in the curren
                             ous '' Current ''    t CDB

MY_PROD  WARNING   PENDING   Service name or network name of service  Drop the service and recreate it with an
                             MY_PROD in the PDB is invalid or conflic  appropriate name.
                             ts with an existing service name or netw
                             ork name in the CDB.



Nothing there is really concerning yet. It’s pretty much what we’d expect to see at this point. However, taking the next step in the PDB clone process encounters the error:


Session altered.

SQL> @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
  2     threads pls_integer := &&1;
  3  BEGIN
  4     utl_recomp.recomp_parallel(threads);
  5  END;
  6  /
ERROR at line 1:
ORA-04045: errors during recompilation/revalidation of SYS.DBMS_SQLPATCH
ORA-00600: internal error code, arguments: [kql_tab_diana:new dep],
[0x062623070], [0x7FB582065DE0], [1], [2], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_UTILITY", line 1294
ORA-06512: at line 1

Disconnected from Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options


Now looking in the PDB_PLUG_IN_VIOLATIONS view the error is evident:

SQL> SELECT name, type, status, message, action FROM pdb_plug_in_violations ORDER BY 1,2;

NAME     TYPE      STATUS    MESSAGE                                            ACTION
-------- --------- --------- -------------------------------------------------- --------------------------------------------------
MY_PROD  ERROR     PENDING   SQL patch ID/UID 22139226/19729684 (Database PSU 1 Call datapatch to install in the PDB or the CDB
                   , Oracle JavaVM Component (Jan2016))
                             : Installed in the PDB but not in the CDB.

MY_PROD  ERROR     PENDING   PDB plugged in is a non-CDB, requires noncdb_to_pd Run noncdb_to_pdb.sql.
                             b.sql be run.

MY_PROD  WARNING   RESOLVED  Service name or network name of service MY_PROD in Drop the service and recreate it with an appropria
                              the PDB is invalid or conflicts with an existing  te name.
                             service name or network name in the CDB.

MY_PROD  WARNING   RESOLVED  CDB parameter compatible mismatch: Previous '11.2. Please check the parameter in the current CDB
                             0.4.0' Current ''

MY_PROD  WARNING   PENDING   Database option OLS mismatch: PDB installed versio Fix the database option in the PDB or the CDB
                             n NULL. CDB installed version

MY_PROD  WARNING   PENDING   Database option DV mismatch: PDB installed version Fix the database option in the PDB or the CDB
                              NULL. CDB installed version

6 rows selected.



At this point since the CDB clone has failed and since the noncdb_to_pdb.sql script cannot be run twice, the new PDB should be dropped. Resolving the root cause of the error by patching and then repeating the clone is necessary.

Applying the PSU

Fortunately the fix is conceptually simple: apply the PSU patch into the database. Though the catch is that I actually had installed the “Combo of OJVM PSU and DB PSU (Jan 2016)” bundle patch (22191659) into the Oracle Home. This combo includes the DB PSU (patch 21948354) plus the OJVM PSU (patch 22139226). And while the DB PSU can be applied without outage, the OJVM patch cannot. Instead for the OJVM patch or the combo, the CDB and the PDBs must all be restarted in UPGRADE mode.

Restarting in UPGRADE mode is fine in this case study where the CDB was just recently created to house the newly upgraded PDB. But if trying to plug the new database into an existing CDB with other applications running in production, shutting down the entire CDB to run datapatch may cause a problem.

Following the README documentation for the just the JAN2016 DB PSU (patch 21948354) doesn’t help. It states that the patch can be applied the database and pluggable databases open (section “3.3.2 Loading Modified SQL Files into the Database“). However because I’ve installed the combo patch into the Oracle Home, trying to patch with the database open will cause the patching to fail:

$ ./datapatch -verbose
SQL Patching tool version on Fri Mar  4 15:45:27 2016
Copyright (c) 2015, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_3260_2016_03_04_15_45_27/sqlpatch_invocation.log

Connecting to database...OK
Note:  Datapatch will only apply or rollback SQL fixes for PDBs
       that are in an open state, no patches will be applied to closed PDBs.
       Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
       (Doc ID 1585822.1)
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 22139226 (Database PSU, Oracle JavaVM Component (Jan2016)):
  Installed in the binary registry only
Bundle series PSU:
  ID 160119 in the binary registry and not installed in any PDB

Adding patches to installation queue and performing prereq checks...
Installation queue:
  For the following PDBs: CDB$ROOT PDB$SEED
    Nothing to roll back
    The following patches will be applied:
      22139226 (Database PSU, Oracle JavaVM Component (Jan2016))
      21948354 (Database Patch Set Update : (21948354))

Error: prereq checks failed!
  patch 22139226: The pluggable databases that need to be patched must be in upgrade mode
Prereq check failed, exiting without installing any patches.

Please refer to MOS Note 1609718.1 and/or the invocation log
for information on how to resolve the above errors.

SQL Patching tool complete on Fri Mar  4 15:45:52 2016


The solution to this error is to start the CDB and PDBs in UPGRADE mode (as per the OJVM patch documentation) and then re-run datapatch:

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup upgrade;
ORACLE instance started.

Total System Global Area 2097152000 bytes
Fixed Size                  2926320 bytes
Variable Size             603982096 bytes
Database Buffers         1476395008 bytes
Redo Buffers               13848576 bytes
Database mounted.
Database opened.
SQL> alter pluggable database all open upgrade;

Pluggable database altered.

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

$ ./datapatch -verbose
SQL Patching tool version on Fri Mar  4 15:50:59 2016
Copyright (c) 2015, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_5137_2016_03_04_15_50_59/sqlpatch_invocation.log

Connecting to database...OK
Note:  Datapatch will only apply or rollback SQL fixes for PDBs
       that are in an open state, no patches will be applied to closed PDBs.
       Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
       (Doc ID 1585822.1)
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 22139226 (Database PSU, Oracle JavaVM Component (Jan2016)):
  Installed in the binary registry only
Bundle series PSU:
  ID 160119 in the binary registry and not installed in any PDB

Adding patches to installation queue and performing prereq checks...
Installation queue:
  For the following PDBs: CDB$ROOT PDB$SEED
    Nothing to roll back
    The following patches will be applied:
      22139226 (Database PSU, Oracle JavaVM Component (Jan2016))
      21948354 (Database Patch Set Update : (21948354))

Installing patches...
Patch installation complete.  Total patches installed: 8

Validating logfiles...
Patch 22139226 apply (pdb CDB$ROOT): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/22139226/19729684/22139226_apply_CPRD1_CDBROOT_2016Mar04_15_51_23.log (no errors)
Patch 21948354 apply (pdb CDB$ROOT): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/21948354/19553095/21948354_apply_CPRD1_CDBROOT_2016Mar04_15_51_24.log (no errors)
Patch 22139226 apply (pdb PDB$SEED): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/22139226/19729684/22139226_apply_CPRD1_PDBSEED_2016Mar04_15_51_28.log (no errors)
Patch 21948354 apply (pdb PDB$SEED): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/21948354/19553095/21948354_apply_CPRD1_PDBSEED_2016Mar04_15_51_29.log (no errors)
SQL Patching tool complete on Fri Mar  4 15:51:31 2016


Now retrying the CDB cloning process:


Session altered.

SQL> @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql

SQL> alter session set container = "&pdbname";

Session altered.

SQL> -- leave the PDB in the same state it was when we started
  2    execute immediate '&open_sql &restricted_state';
  5    BEGIN
  6      IF (sqlcode  -900) THEN
  7        RAISE;
  8      END IF;
  9    END;
 10  END;
 11  /

PL/SQL procedure successfully completed.


Warning: PDB altered with errors.

SQL> connect / as sysdba
SQL> SELECT name, type, status, message, action FROM pdb_plug_in_violations ORDER BY 1,2;

NAME     TYPE      STATUS    MESSAGE                                            ACTION
-------- --------- --------- -------------------------------------------------- --------------------------------------------------
MY_PROD  ERROR     RESOLVED  PDB plugged in is a non-CDB, requires noncdb_to_pd Run noncdb_to_pdb.sql.
                             b.sql be run.

MY_PROD  ERROR     PENDING   PSU bundle patch 160119 (Database Patch Set Update Call datapatch to install in the PDB or the CDB
                              : (21948354)): Installed in the C
                             DB but not in the PDB.

MY_PROD  WARNING   RESOLVED  Service name or network name of service MY_PROD in Drop the service and recreate it with an appropria
                              the PDB is invalid or conflicts with an existing  te name.
                             service name or network name in the CDB.

MY_PROD  WARNING   PENDING   Database option OLS mismatch: PDB installed versio Fix the database option in the PDB or the CDB
                             n NULL. CDB installed version

MY_PROD  WARNING   PENDING   Database option DV mismatch: PDB installed version Fix the database option in the PDB or the CDB
                              NULL. CDB installed version

MY_PROD  WARNING   RESOLVED  CDB parameter compatible mismatch: Previous '11.2. Please check the parameter in the current CDB
                             0.4.0' Current ''

6 rows selected.



Note that first time the error was related to the OJVM PSU patch and stated that the PDB was patched but the CDB was not. Now after patching the CDB the error message states that the DB PSU patch is installed in the CDB but not the PDB.

Again the solution is to run datapatch one more time. Fortunately since we’re only patching a PDB, we no longer need to worry about starting the CDB and PDBs in UPGRADE mode to apply the OJVM patch. The OJVM patch does not apply to the PDBs.  Hence we can patch successfully with both the CDB and PDBs open:

$ ./datapatch -verbose
SQL Patching tool version on Fri Mar  4 16:19:06 2016
Copyright (c) 2015, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_9245_2016_03_04_16_19_06/sqlpatch_invocation.log

Connecting to database...OK
Note:  Datapatch will only apply or rollback SQL fixes for PDBs
       that are in an open state, no patches will be applied to closed PDBs.
       Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
       (Doc ID 1585822.1)
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 22139226 (Database PSU, Oracle JavaVM Component (Jan2016)):
  Installed in binary and CDB$ROOT PDB$SEED MY_PROD
Bundle series PSU:
  ID 160119 in the binary registry and ID 160119 in PDB CDB$ROOT, ID 160119 in PDB PDB$SEED

Adding patches to installation queue and performing prereq checks...
Installation queue:
  For the following PDBs: CDB$ROOT PDB$SEED
    Nothing to roll back
    Nothing to apply
  For the following PDBs: MY_PROD
    Nothing to roll back
    The following patches will be applied:
      21948354 (Database Patch Set Update : (21948354))

Installing patches...
Patch installation complete.  Total patches installed: 1

Validating logfiles...
Patch 21948354 apply (pdb MY_PROD): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/21948354/19553095/21948354_apply_CPRD1_MY_PROD_2016Mar04_16_19_31.log (no errors)
SQL Patching tool complete on Fri Mar  4 16:19:32 2016


And finally the cloned PDB can be opened successfully:


Pluggable database altered.


Pluggable database altered.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 MY_PROD                        READ WRITE NO

SQL> SELECT name, type, status, message, action FROM pdb_plug_in_violations ORDER BY 1,2;

NAME     TYPE      STATUS    MESSAGE                                            ACTION
-------- --------- --------- -------------------------------------------------- --------------------------------------------------
MY_PROD  ERROR     RESOLVED  PDB plugged in is a non-CDB, requires noncdb_to_pd Run noncdb_to_pdb.sql.
                             b.sql be run.

MY_PROD  ERROR     RESOLVED  PSU bundle patch 160119 (Database Patch Set Update Call datapatch to install in the PDB or the CDB
                              : (21948354)): Installed in the C
                             DB but not in the PDB.

MY_PROD  WARNING   RESOLVED  Service name or network name of service MY_PROD in Drop the service and recreate it with an appropria
                              the PDB is invalid or conflicts with an existing  te name.
                             service name or network name in the CDB.

MY_PROD  WARNING   PENDING   Database option OLS mismatch: PDB installed versio Fix the database option in the PDB or the CDB
                             n NULL. CDB installed version

MY_PROD  WARNING   PENDING   Database option DV mismatch: PDB installed version Fix the database option in the PDB or the CDB
                              NULL. CDB installed version

MY_PROD  WARNING   RESOLVED  CDB parameter compatible mismatch: Previous '11.2. Please check the parameter in the current CDB
                             0.4.0' Current ''

6 rows selected.


The warnings marked as “PENDING” can be safely ignored.


What started out as an issue when cloning a non-CDB into a PDB led to some learning about patching with Oracle Database 12c.

The most important take-away is that Oracle Database 12c introduces a change in behaviour when it comes to patch applications through the DBCA. This change is well documented in both the patch and MOS documents. So if a DBA reads through the documentation thoroughly, they won’t have a problem.  However if the DBA is used to doing things the “old way” and only skims through the documentation they may unexpectedly get caught with errors such as the ORA-00600 encountered when creating a PDB through cloning.


My Oracle Support (MOS) Documents:

  • 12.1:DBCA(Database Creation) does not execute “datapatch” (Doc ID 2084676.1)
  • How to Convert Non PDB to PDB Database in 12c – Testcase (Doc ID 2012448.1)
  • How to migrate an existing pre12c database(nonCDB) to 12c CDB database ? (Doc ID 1564657.1)
  • Complete Checklist for Upgrading to Oracle Database 12c Release 1 using DBUA (Doc ID 1516557.1)

Pythian Blogs:


Categories: DBA Blogs

Centralized authentication with Red Hat Directory Server for Linux systems

Thu, 2016-03-10 15:14

User management on Linux systems can be tedious, and when you add in more than 10 systems the chances are it is going to take a good amount of time for you to manage user accounts on all systems individually.

There are various tools available to overcome this, and all of these use LDAP in some way.

The same goes for Red Hat Directory Server, which is an extension of LDAP by Red Hat to provide centralized user management. Though I have primarily demonstrated integration with Red Hat Directory Server with Linux systems, it can be used on all systems which supports LDAP authentication.

You can find the official Red Hat Directory Server installation guide here.

For our test scenario I used two RHEL 5 servers named as Server101 which is the Red Hat Directory Server and Server201 which is the client.

For RHEL based systems you need to make sure that you are subscribed to RHDS repo for installing Red Hat Directory Server. If you are using CentOS or other derivatives you can use 389-Directory Server which is upstream for Red Hat Directory Server.

Once you have the prerequisite ready you can start with installation.

Installing Red Hat Directory Server

I have designated server101 as Red Hat Directory Server.

Below are the steps to Install packages required for Red Hat Directory Server.

[root@server101 ~]#yum install redhat-ds -y

yum install redhat-ds

Installing RHDS




Once the installation is complete we can move to configuring Red Hat Directory Server.


Configuring Red Hat Directory Server


[root@server101 ~]#

Once you run this command you will be prompted for inputs by the setup script which are mostly straight forward.

But there are few things that need to be taken care of before we proceed with the configuration.

We want to run the ldap service as ldap user, so create ldap user and group if its not there.

Then open the below ports on your firewall/iptables so that directory server can work properly.

  • 389 for LDAP service
  • 686 for secure LDAP service
  • 9830 for directory server admin console connectivity

You should also increase the number of file descriptors as it can help Red Hat Directory Server access files more efficiently. Editing the maximum number of file descriptors the kernel can allocate can also improve file access speeds.

  • First, check the current limit for file descriptors in  /proc/sys/fs/file-max
  • If the setting is lower than 64000, edit the /etc/sysctl.conf file, and reset the fs.file-max parameter and set it to 64000 or up.
  • Then increase the maximum number of open files on the system by editing the /etc/security/limits.conf configuration file. Add the following entry
    *        -        nofile        8192


After this we can start configuring Red Hat Directory Server with command.

Once it is executed it will prompt for inputs which are mostly self explanatory, like below. Mostly we will accept the default values, as this is a fresh installation. We will only change the system user and group which will run ldap service from nobody to ldap user we created earlier. Don’t forget to make a note of passwords for admin and Directory Manager as it will be used to login to Admin Console.


[root@server101 ~]# -k

This program will set up the Red Hat Directory and Administration Servers.

It is recommended that you have “root” privilege to set up the software.
Tips for using this program:
– Press “Enter” to choose the default and go to the next screen
– Type “Control-B” then “Enter” to go back to the previous screen
– Type “Control-C” to cancel the setup program

Would you like to continue with set up? [yes]: yes


Do you agree to the license terms? [no]: yes

Your system has been scanned for potential problems, missing patches,
etc. The following output is a report of the items found that need to
be addressed before running this software in a production

Red Hat Directory Server system tuning analysis version 10-AUGUST-2007.

NOTICE : System is i686-unknown-linux2.6.18-308.el5 (1 processor).

WARNING: 502MB of physical memory is available on the system. 1024MB is recommended for best performance on large production system.

NOTICE: The net.ipv4.tcp_keepalive_time is set to 7200000 milliseconds
(120 minutes). This may cause temporary server congestion from lost
client connections.

WARNING: There are only 1024 file descriptors (hard limit) available, which
limit the number of simultaneous connections.

WARNING: There are only 1024 file descriptors (soft limit) available, which
limit the number of simultaneous connections.

Would you like to continue? [no]: yes

Choose a setup type:

1. Express
Allows you to quickly set up the servers using the most
common options and pre-defined defaults. Useful for quick
evaluation of the products.

2. Typical
Allows you to specify common defaults and options.

3. Custom
Allows you to specify more advanced options. This is
recommended for experienced server administrators only.

To accept the default shown in brackets, press the Enter key.

Choose a setup type [2]:

Enter the fully qualified domain name of the computer
on which you’re setting up server software. Using the form

To accept the default shown in brackets, press the Enter key.

Computer name []:

The servers must run as a specific user in a specific group.
It is strongly recommended that this user should have no privileges
on the computer (i.e. a non-root user). The setup procedure
will give this user/group some permissions in specific paths/files
to perform server-specific operations.

If you have not yet created a user and group for the servers,
create this user and group using your native operating
system utilities.

System User [nobody]: ldap
System Group [nobody]: ldap

Server information is stored in the configuration directory server.
This information is used by the console and administration server to
configure and manage your servers.  If you have already set up a
configuration directory server, you should register any servers you
set up or create with the configuration server. To do so, the
following information about the configuration server is required: the
fully qualified host name of the form
<hostname>.<domainname>(e.g., the port number
(default 389), the suffix, the DN and password of a user having
permission to write the configuration information, usually the
configuration directory administrator, and if you are using security
(TLS/SSL). If you are using TLS/SSL, specify the TLS/SSL (LDAPS) port
number (default 636) instead of the regular LDAP port number, and
provide the CA certificate (in PEM/ASCII format).

If you do not yet have a configuration directory server, enter ‘No’ to
be prompted to set up one.

Do you want to register this software with an existing
configuration directory server? [no]:

Please enter the administrator ID for the configuration directory
server. This is the ID typically used to log in to the console.  You
will also be prompted for the password.

Configuration directory server
administrator ID [admin]:
Password (confirm):

The information stored in the configuration directory server can be
separated into different Administration Domains. If you are managing
multiple software releases at the same time, or managing information
about multiple domains, you may use the Administration Domain to keep
them separate.

If you are not using administrative domains, press Enter to select the
default. Otherwise, enter some descriptive, unique name for the
administration domain, such as the name of the organization
responsible for managing the domain.

Administration Domain []:

The standard directory server network port number is 389. However, if
you are not logged as the superuser, or port 389 is in use, the
default value will be a random unused port number greater than 1024.
If you want to use port 389, make sure that you are logged in as the
superuser, that port 389 is not in use.

Directory server network port [389]:

Each instance of a directory server requires a unique identifier.
This identifier is used to name the various
instance specific files and directories in the file system,
as well as for other uses as a server instance identifier.

Directory server identifier [server101]:

The suffix is the root of your directory tree.  The suffix must be a valid DN.
It is recommended that you use the dc=domaincomponent suffix convention.
For example, if your domain is,
you should use dc=example,dc=com for your suffix.
Setup will create this initial suffix for you,
but you may have more than one suffix.
Use the directory server utilities to create additional suffixes.

Suffix [dc=example, dc=com]:

Certain directory server operations require an administrative user.
This user is referred to as the Directory Manager and typically has a
bind Distinguished Name (DN) of cn=Directory Manager.
You will also be prompted for the password for this user. The password must
be at least 8 characters long, and contain no spaces.
Press Control-B or type the word “back”, then Enter to back up and start over.

Directory Manager DN [cn=Directory Manager]:
Password (confirm):

The Administration Server is separate from any of your web or application
servers since it listens to a different port and access to it is

Pick a port number between 1024 and 65535 to run your Administration
Server on. You should NOT use a port number which you plan to
run a web or application server on, rather, select a number which you
will remember and which will not be used for anything else.

Administration port [9830]:

The interactive phase is complete.  The script will now set up your
servers.  Enter No or go Back if you want to change something.

Are you ready to set up your servers? [yes]:
Creating directory server . . .
Your new DS instance ‘server101’ was successfully created.
Creating the configuration directory server . . .
Beginning Admin Server creation . . .
Creating Admin Server files and directories . . .
Updating adm.conf . . .
Updating admpw . . .
Registering admin server with the configuration directory server . . .
Updating adm.conf with information from configuration directory server . . .
Updating the configuration for the httpd engine . . .
Starting admin server . . .
The admin server was successfully started.
Admin server was successfully created, configured, and started.
Exiting . . .
Log file is ‘/tmp/setupZa3jGe.log’

[root@server101 ~]#







Now that we have installed and configured Red Hat Directory Server its not set to autostart during system boot.

So we need to make Red Hat directory service and redhat directory console admin service to start at boot.

[root@server101 ~]# chkconfig dirsrv-admin --list 
dirsrv-admin   0:off1:off2:off3:off4:off5:off6:off 
[root@server101 ~]# chkconfig dirsrv --list 
dirsrv         0:off1:off2:off3:off4:off5:off6:off 
[root@server101 ~]# chkconfig dirservrv on 
[root@server101 ~]# chkconfig dirsrv-admin on 
[root@server101 ~]#

Now that we have our server ready, we need to add a user to it.

We will use Directory Server admin console to connect to the GUI and will create ldap user from there.

We can invoke directory server admin console gui with redhat-idm-console. It will open a GUI like below.

Directory Server Admin Console GUI

Directory Server Admin Console GUI

The user id is directory manager which was created during directory server setup, mostly it will be default as cn=Directory Manager. Now put your password and Administration url as http://server101:9830.

Directory Server Admin Console

Directory Server Admin Console

Once you login you will be presented with console screen as below.


Now click on Users and Groups tab and then click on create button, there select user from the menu.

RHDS031  RHDS032

Now Select organizational unit, mostly we will use the default and will select people from the list as below.


It will open Create User menu.


Now we will create ldapuser account as shown below. Fill in required details. Also select posix user tab as we need the account for unix system login. Fill up required details for posix account as well.

RHDS035   RHDS036


Now that we have created user account we can start configuring client.


Configuring Linux client for LDAP login

I have created server201 which we will configure for LDAP login.

For that we need to execute authconfig-tui from console.

It will open a terminal ui to configure authconfig to use LDAP.

[root@server201 pam.d]# authconfig-tui


Select Use LDAP for user information.


Select Use LDAP Authentication.


After this we need to make sure when user login on the server with LDAP authentication the home dir is created automatically, which is not enabled by default.

We can do this by executing below command at console.

[root@server201 pam.d]# authconfig –enablemkhomedir –update


Once this is done you can now use your ldap user to login to client server.

Now that we have created LDAP, we can use it to centralized login for all linux systems in the environment.

The user management is easy from single location.

We can also configure TLS and do replication for redundancy.

We can define schema and policies as well but that is for another time.


Categories: DBA Blogs

Log Buffer #464: A Carnival of the Vanities for DBAs

Thu, 2016-03-10 13:42

This Log Buffer Edition delves deep into the realms of Oracle, SQL Server and MySQL while gathering up some nifty blog posts for this week.


Speed, Security, and Best Practices in the Cloud: Oracle Releases Market-Leading Retail Demand Forecasting Solution

OBIEE 12c – Your Answers After Upgrading

Using the SQL ACCESS Advisor PL/SQL interface

How has JD Edwards EnterpriseOne 9.2 Transformed your Business?

In the article you will have a look at an example of configuring Fast Start Failover (FSFO).

SQL Server:

How to show Quarters Dynamically in SQL

Azure SQL Data Warehouse is a fully-managed and scalable cloud service. It is still in preview, but solid.

The occasional problems that you can get with POST and GET are typical of the difficulties of separating any command and query operations.

4 Convenient Ways To Run PowerShell Scripts

10 New Features Worth Exploring in SQL Server 2016


Maintaining mission critical databases on our pitchfork wielding brother, the “Daemon” of FreeBSD, seems quite daunting, or even absurd, from the perspective of a die-hard Linux expert, or from someone who has not touched it in a long time.

Planets9s: Sign up for our best practices webinar on how to upgrade to MySQL 5.7

Using jemalloc heap profiling with MySQL

Sometimes a Variety of Databases is THE Database You Need

Taking the new MySQL 5.7 JSON features for a test drive

Categories: DBA Blogs

Debugging Kibana using Chrome developer tools

Tue, 2016-03-08 17:53

Amazon Elasticsearch Service is a managed service to implement Elasticsearch in AWS. Underlying instances are managed by AWS and interaction with the service is available through API and AWS GUI.

Kibana is also integrated with Amazon Elasticsearch Service. We came across an issue which caused Kibana4 to show the following error message, when searching for *.

Courier Fetch: 10 of 60 shards failed.

Error is not very descriptive.

As Amazon Elasticsearch service is an endpoint only and we do not have direct access to the instances. We also have access to few API tools.

We decided to see what can be found from the chrome browser.

The Chrome Developer Tools (DevTools) contains lots of useful debugging possibilities.

DevTools can be started using several methods.

1. Right click and click Inspect.
2. From Menu -> More Tools -> Developer Tools
3. Press F12

Network tab under DevTools can be used to debug wide variety of issues. It records every requests made when a web page is loading. It captures wide range of information about every request like HTTP access Method, status and time took to complete the request etc.

By clicking on any of the requested resource, we will be able to get more information on the request.

In this case, the interesting bit was under the Preview tab. The Preview tab captures the data chrome got back from the search and store it as objects.

A successful query would look like the image below captured from Kibana3 of public website


We checked “_msearch?timeout=3000..” and received following errors messages under the nested values (For example “responses” -> “0” -> “_shards” -> “failures” -> “0”)

{index: “logstash-2016.02.24”, shard: 1, status: 500,…}index: “logstash-2016.02.24″reason: “RemoteTransportException[[Leech][inet[/]][indices:data/read/search[phase/query]]]; nested: ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [5143501209/4.7gb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [5143501209/4.7gb]]; nested: CircuitBreakingException[[FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [5143501209/4.7gb]]; “shard: 1status: 500

So the issue is clear, fielddata usage is above the limit.

As per Amazon documentation,

Field Data Breaker –
Percentage of JVM heap memory allowed to load a single data field into memory. The default value is 60%. We recommend raising this limit if you are uploading data with large fields.
For more information, see Field data in the Elasticsearch documentation.

Following url documents the supported Amazon Elasticsearch operations.

On checking the current heap usage (second column) of the data nodes, we can see that heap usage is very high,

$ curl -XGET “”
host ip heap.percent ram.percent load node.role master name
x.x.x.x   10   85   0.00   –   m   Drax the Destroyer
x.x.x.x   7   85   0.00   –   *   H.E.R.B.I.E.
x.x.x.x   78   64   1.08   d   –   Black Cat
x.x.x.x   80   62   1.41   d   – Leech
x.x.x.x   7   85   0.00   –   m   Alex
x.x.x.x   78   63   0.27   d   –   Saint Anna
x.x.x.x   80   63   0.28   d   –   Martinex
x.x.x.x   78   63   0.59   d   –   Scorpio

Following command can be used to increase the indices.breaker.fielddata.limit value. This can be used as a workaround.

$ curl -XPUT -d ‘{ “persistent” : { “indices.breaker.fielddata.limit” : “89%” } }’

Running the command allowed the kibana search to run without issues and show the data.

The real solution would be to increase the number of nodes or reduce the amount of field data that need to be loaded by limiting number of indexes.

AWS Lamda can be used to to run a script to cleanup indices as a scheduled event.

Categories: DBA Blogs

Can The Public Cloud Meet the Needs of Your Enterprise Applications?

Tue, 2016-03-08 12:19


Any applications your company runs on premise can also be run in the public cloud. But does that mean they should be?

While the cloud offers well-documented benefits of flexibility, scalability, and cost efficiency, some applications — and especially business-critical enterprise applications — have specific characteristics that can make them tricky to move into a public cloud environment.

That’s not to say you shouldn’t consider the cloud as an option, but you should be aware of the following enterprise application needs before you make any migration decisions:

1. Highly customized infrastructure

Enterprise applications often rely on software components that are uniquely configured: they may need very specific storage layouts and security settings or tight integration with certain third-party tools. That makes it hard to replace them with generic platform-as-a-service (PaaS) alternatives in the cloud.
The same is true on the infrastructure side: application software components often need particular network configurations and controls that aren’t available from a typical infrastructure-as-a-service (IaaS) offering. (An example would be the way Oracle Real Application Clusters have to allow the cluster software to manipulate network settings, such as controlling IP addresses and network interfaces.)

2. Tightly coupled components

Today’s cloud application architectures are based on “microservices” — collections of services that perform specific tasks. When combined, these answer the whole of the application requirements. With enterprise applications, there are so many interdependencies between the various software components that it can be extremely difficult to change, upgrade, move, or scale an individual component without having a huge impact on the rest of the system.

3. Siloed IT departments

Enterprise applications are usually supported by siloed enterprise IT operations — DBAs, system administrators, storage administrators, network administrators and the like — each with their own responsibilities. Cloud deployment, on the other hand, requires much greater focus on collaboration across the IT environment. This means breaking down traditional silos to create full-stack teams with vertical application ownership. Some teams are likely to resist this change as they could end up with significantly less work and responsibility once the management of application components has shifted to the cloud vendor. So migrating to the cloud isn’t just a technical decision; it has people-process implications, too.

4. Costly infrastructure upgrades

Every company knows upgrading enterprise applications is a major undertaking and can often cause downtime and outages. This is true when the application stays inside your own data center — and doubly so when it moves to a cloud provider due to how long it takes to move massive amounts of data through the Internet and risks associated with unknown issues on the new virtual platform. For these reasons, significant financial commitment is often required to build and maintain an IT team with the right skills to do upgrades quickly and effectively as well as maintain the system.

5. Inflexible licensing models

The components used in enterprise applications are often proprietary products with licensing models that are not compatible with the elasticity of the cloud. For example, many Oracle licenses are for legacy applications and can used only on particular systems. Transferring those licenses to a cloud-based infrastructure is not an easy task.

In addition, perpetual software licenses are often not portable to the typical pay-as-you-go model used by most cloud providers. Plus, most software vendors don’t have any incentive to transition their customers from locked-in perpetual licenses with a steady maintenance revenue stream to a model that allows them to switch to a competitive product at any time.

Even though the nature of enterprise applications makes them difficult to migrate to the cloud, the benefits of doing so — in costs savings, availability, and business agility — still make it a very compelling proposition. In my next blog, I’ll take a look at some of the paths available to you should you decide to move your enterprise applications to the public cloud.

For more on this topic, check out our white paper on Choosing the Right Public Cloud Platform For Your Enterprise Applications Built on Oracle Database.



Categories: DBA Blogs

Sources of Inspiration on International Women’s Day

Tue, 2016-03-08 08:58

True inspiration comes in many forms and I consider myself fortunate to be inspired every day by the women around me. International Women’s Day is the perfect opportunity to reflect on the women in our lives who positively influence us.

This post is my heartfelt thank you to the women in my ‘circle’ who have made an indelible mark in my life. They are women who continue to inspire, challenge and motivate me.

The women on Pythian HR’s team: These women continually teach me valuable life lessons. They are mothers, partners, sisters, care providers, aunts, cousins, and friends to many. They are strong, spirited, supportive and have generous natures that are contagious. They demonstrate an unwavering commitment to working hard, they’re incredibly talented and they have a steady focus on doing what’s best for our employees. These women go above and beyond and approach every puzzle with optimism.

My mother:  My mother is the most positive and ‘glass half full’ person that I know. She is a person who never fails to find the bright side to life’s most thought-provoking issues and one of her favourite questions to ask her loved ones is “Are You Happy?” (Spoiler alert: she’s not satisfied unless the answer is a truthful “yes”). Her love, guidance and support have helped sustain me through so much and over the years she has evolved into my BFF.

My friend, Jen:  Jen is a breast cancer survivor who decided to fight back and co-found Vixens Victorious. In October 2015, the dynamic duo of Vixens Victorious successfully launched Lights! Camera! CURE! which showcases female film makers from Canada and the proceeds go to support the Ottawa Regional Cancer Society. Jen’s positive spirit and take charge attitude empowers everyone who meets her.

My friend, Kate:  Kate moved to Canada with her three month old daughter to start a new journey with her husband. She took the initiative to make new friends, develop a network and often navigate a new city on her own when her partner travelled for work. Kate isn’t one to complain about adversities in life; she is courageous and gratefully embraces her adventure.

My fitness trainer JulesJules gets out of bed every morning, puts on her workout gear and travels across Ottawa to provide the most fun and effective workouts to her clients. She generously shares her own personal health journey and always finds a way to connect with her clients so they can experience the one on one attention they need. She is full of greatness.

Our family physician, Dr. Judy:  Dr. Judy’s medical practice is thriving because of her commitment to patient care. She ensures you are her priority in the moments that you are with her. She makes each of her patients feel important, cared for and heard. Dr. Judy emulates a kind and caring nature that everyone could benefit from.

My neighbor, Anne Marie:  In her late forties, Anne Marie taught herself to swim so she could begin competing in triathlons. She now travels internationally to compete in races. I’m inspired by her hard work, determination and strategic ability to set and meet goals.

The influences (sometimes subtle) of these women make an impact on how I choose to live my life. I am thankful for all of them.

On this International Women’s Day, I encourage you to think about who inspires you and why. Bonus points if you honour them with words of appreciation!

Categories: DBA Blogs

SQL Server on Linux – No April fools

Mon, 2016-03-07 20:58

Years ago, I got burned by an “April Fools” joke published by Steve Jones on He republished it as one of his favorites here.

Naturally, I had to rub my eyes today when I read that Microsoft announced that SQL Server 2016 would be coming to Linux.

There were mixed reactions on the internal SQL Server teams. I was afraid to respond to the thread, fearing I would get burned again. I quickly checked the date to confirm that the article hadn’t been resurrected.

One of the sentiments expressed in our internal chatter was that some of the DBAs love “Satya’s Microsoft” and I agree. I like what they’re doing, but I am very skeptical about the move to port SQL Server onto Linux.

I doubt this will enable new markets and new customer bases. I don’t think there are any large organizations who will suddenly decide to adopt the product because it will run on Linux.

One theory was that this move was to attract new developers who want to deploy multi-platform tech stacks. That could certainly be right, but I think PaaS satisfies that and many of the startup natures.

Other articles I read theorized it was a move towards SQL Server on Linux-powered containers.

I’m wondering what this will mean for future features. Will PowerShell or .NET be ported to Linux? What will change in the security model? Will clustering be available? Will a more RAC-like feature be available?

These are very interesting times and while this wasn’t a move that I was pining for, or even expected, I am excited to see where this is going.

I “applied” to test one of the early versions, and you can too.

What do you think? Are you excited about running SQL Server on Linux? When would you choose Linux over Windows?

Categories: DBA Blogs