Official Pythian Blog - Love Your Data

Log Buffer #375, A Carnival of the Vanities for DBAs

Fri, 2014-06-13 07:57

What’s better: watching the Football World Cup or reading Log Buffer? Log Buffer, of course, but please also take out some time to watch the matches, as this tournament comes only once every four years. No? OK, as you say. So read along, then.


Oracle:

Alan Hargreaves reasons why you should patch NTP.

This post examines the results of a recent database protection survey conducted by Database Trends and Applications (DBTA) Magazine.

Keep jobs active with screen command.

ORE Getting Connected: ore.connect and other commands.

End-to-End ODI12c ETL on Oracle Big Data Appliance Pt.4 : Transforming Data using Python & Hive Streaming

SQL Server:

Enabling and Configuring Reporting and Logging for Maintenance Plans in SQL Server 2012

The articles collected here will help you understand the theories and methodologies behind every stage of the database delivery pipeline, starting when database changes are checked in, and ending when they’re deployed to production.

Stairway to Database Source Control Level 2: Getting a Database into Source Control .

What does it take to become a database administrator, and what kinds of traits should you look for when hiring a DBA? Those traits can be summarized in two categories: technical and personal.

Display all errors thrown within a catch block by a restore statement in a stored procedure using extended events.


MySQL:

RHEL7 & the transition from MySQL to MariaDB: A first look.

FairWarning Privacy Monitoring Solutions Rely on MySQL to Secure Patient Data

MariaDB 5.5.38 Overview and Highlights

Recruiters Looking for MySQL DBAs and MySQL Developers

MariaDB Galera Cluster 10.0.11 now available

On-disk/block-level encryption for MariaDB

Categories: DBA Blogs

Internet Scale Design: Part One

Thu, 2014-06-12 07:52

Whether your company is a small, bootstrapped startup or a rapidly growing, well-funded tier 2 or 3 business, everyone dreams of having ‘Internet scale’ systems which are highly automated, anti-fragile, self-healing, and inexpensive to maintain.

The problem, however, is that those types of complex systems are only within the reach of well-funded companies who can afford to hire elite technical resources and expensive, leading edge technology systems, right?


Recently, I was working with one of my smaller retail clients. Their business is highly seasonal, doing the vast majority of their business during two separate two-month periods each year. During these periods, the demand on their system is extremely high, and downtime would be catastrophic because of the amount of their annual revenue generated during those times.

They run on a LAMP stack (Linux, Apache, MySQL, PHP) and they’re hosted on a mix of shared VMs and colocation equipment. Their hosting costs are fixed year-round, even though for most of the year they don’t need two-thirds of it.

They were beginning their annual budget review when we brought up what the next steps were for them. About a year ago, I began introducing the idea of Amazon Web Services (AWS) cloud, taking more of a site reliability engineering (SRE) approach to their business, so we scheduled a conference call with their Chief Financial Officer, a few VPs, and one of my SREs.

“We would like to move you 100% into the Amazon Cloud,” I said. “In doing so, we will automate all of your systems using image and configuration management, and set up detailed monitoring, graphics, and trending systems. We will work with your developers to redesign your applications to take advantage of the platform and its flexibility. The end result will be a system that is considerably more anti-fragile, runs in multiple regions, offers significantly faster recovery in the event of major failures, and can scale down to very little and up to larger scale in a moment’s notice.” We assured them that the systems could be self-healing and would require very little management.

The Sr. VP of Product Management’s response surprised me.

“Bill, this sounds awesome. But we’re a pretty small company, and we couldn’t possibly afford that sort of system.” I laughed a bit and responded, “Actually, not only can you afford it, but it will save you tens of thousands of dollars.”

I explained the benefits of auto-scaling and the cost savings it would bring to their business model. We discussed how those cost savings would fund the expenses of having Pythian build and manage the new service.

His response? “This is amazing – why isn’t everyone doing this?”

The answer is twofold. I’ll be sharing my insights in part two, so stay tuned…

Categories: DBA Blogs

Availability Group – Delay with Listener Connectivity After Failover

Wed, 2014-06-11 07:45

The beauty of working for multiple clients from different industries is that you get exposed to a myriad of environment setups and configurations. Every company has its own standards for network and server configuration, as well as different hardware vendors. This introduces its own kinks and excitement to your everyday work, half of which you’ll likely never encounter if you are working in-house on the same hardware.

The past week we encountered a rare and interesting issue with an Availability Group. The issue was two-fold: first, it was initially not failing over automatically on one node, though that was not as exciting as the second part. Once it was able to fail over correctly, our client was experiencing delays with the availability of the Listener name outside its own subnet after failover, automatic or otherwise. The Listener was reachable within its own subnet, but took more than thirty minutes to become reachable outside of it, even though the failover itself happened smoothly and without error.

The first part was fairly straightforward. Checking the cluster logs and event logs, the automatic failover was throwing the error below when trying to fail over on one of the nodes.

Cluster network name resource 'Listener_DNS_NAME' failed registration of one or more associated DNS name(s) for the following reason:
DNS operation refused.

Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.

The error is as it says: the computer object does not have the appropriate permissions in the domain to register the DNS name resource for the Listener. For the cluster to perform this operation smoothly, “Authenticated Users” should have Read/Write All permissions on the computer objects for the cluster, its nodes, and the Listener DNS name. To grant these, log in to the Active Directory server:

  1. Open Active Directory Users and Computers.
  2. On the View menu, select Advanced Features.
  3. Right-click the object  and then click Properties.
  4. On the Security tab, click Advanced to view all of the permission entries that exist for the object.
  5. Verify that the Authenticated Users is in the list and has the permission to Read and Write All. Add the required permissions then Save the changes.

After doing that and testing the failover, we encountered a different error, a Kerberos-related one, shown below.

The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server ComputerName$. The target name used was HTTP/ This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server. This error can also happen if the target service account password is different than what is configured on the Kerberos Key Distribution Center for that target service. Ensure that the service on the server and the KDC are both configured to use the same password. If the server name is not fully qualified, and the target domain (Domain.COM) is different from the client domain (Doamin.COM), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.

Ah, the often-overlooked SPN. Setting the SPN should be part of your installation process. To keep the story short so we can get to the exciting part, you can refer here for detailed instructions on how to configure the SPN for SQL Server. Aside from registering the SPN for each of the nodes as specified in the MSDN link, you’ll also need to register the SPN for the Listener, assuming 1433 is the port used by your SQL Server:

setspn -A MSSQLSvc/ DOMAIN/SQLServiceAccount

This enables Kerberos for client connections to the Availability Group Listener and addresses the errors above. After configuring the SPN for the servers, automatic failover was running smoothly, or so we thought.

The client came back to us saying it was taking some time for the application to connect to the Listener name. Checking the cluster logs and SQL Server’s end, everything appeared to be in order: no errors being thrown, and the server was accessible. Now it gets interesting. A ping test within the database subnet was successful, but a ping test outside of it was timing out, although connections did succeed after a substantial amount of time. After a few more tests, the behavior was the same on both nodes: it takes more than thirty minutes for the name to become reachable outside of the database subnet.

After involving the network admin, we found that a MAC address conflict was happening. That was our “Aha!” moment. Windows Server 2003 and later issue gratuitous ARP (GARP) requests during failover, but some switches/devices do not forward gratuitous ARP by default. This leaves the devices on the other end of the switch without the correct MAC address associated with the name, causing the failure. It often corrects itself when the router detects the failure, does a broadcast, and gets the correct value, which is why the Listener becomes accessible after some time. This KB details the issue. To address it, changes must be made to the switch configuration; you’ll need to check with your hardware vendor for this.

Now, that could have been the end of our problems, but after enabling the switches to forward GARP, we found that the server itself was not sending GARP requests. This is a server configuration issue and requires a registry change. Open the registry on the server and locate the key below:


From there, check whether there is a value named ArpRetryCount; if there is, make sure it is not set to 0 (valid values range from 0 to 3). After changing this and restarting the servers, everything worked perfectly.

These last two issues are a bit rare, and something I wouldn’t have experienced if the client weren’t using that particular hardware and that particular standard configuration.

Categories: DBA Blogs

2014 Hadoop Summit Summary

Mon, 2014-06-09 07:41


Last week I was at the 2014 Hadoop Summit in San Jose, trying to keep abreast of the ever-changing Apache landscape: what projects are up-and-coming, what projects are ready for production, and most importantly which projects can solve problems for our clients. It was also a great chance to hear about real, production deployments – both in the halls and in some of the presentations. And Doug Cutting and Arun Murthy had an excellent panel about the future of the platform, both from a business and technology perspective.


Security

Hadoop security was an incredibly popular topic this year, reflecting the fact that Hadoop deployments are growing up and fulfilling their stated purpose: to consolidate organizations’ data and make it visible to everyone. “Visible to everyone” is a noble goal, but in reality PII and other sensitive information needs to be guarded, and access needs to be audited and limited. Apache Knox makes it possible to audit all user interactions with the cluster, tying user access to an existing identity management system. Cloudera Sentry provides fine-grained user permissions for Hive, HBase and Search, similar to existing RDBMSes. During the conference Cloudera also announced their acquisition of Gazzang, who make a platform for key management and data encryption at rest in Hadoop (similar to Intel’s Project Rhino).

Booz Allen Hamilton also put on an excellent presentation about a real client system storing sensitive data on EMR using ephemeral storage – I strongly recommend looking at this as an example of what’s possible now, and also how difficult it is to implement cell or row-level security policies in Hadoop.


YARN

YARN is the new “data operating system” responsible for all computations running on your cluster. It handles container placement and resource allocation, allowing multiple frameworks like MapReduce, Tez, and Spark to coexist on the same nodes without competing for resources. Applications can also be written to run directly on YARN, opening up the Hadoop cluster to support more general-purpose tasks (Yahoo is apparently encoding video on YARN with low latency, although details were scarce) and making it easier for developers to provide distributed, fault-tolerant applications. Early adopters have been using YARN in production for a while, but now every major vendor includes it in their distribution, and features like the High-Availability ResourceManager (or “HARMful YARN”) are available.

Many talks from different sources (Twitter, Yahoo, HortonWorks) focused on different aspects of YARN: new features, production deployment hints, and the general architecture.


Storm and Spark

I thought that, with Spark becoming so popular and widely supported – in every major distribution – Spark Streaming would supplant Storm as the leading complex event processing engine. Visiting Hadoop Summit, however, it seems Storm has plenty of momentum. It’s been ported to YARN to work seamlessly within your cluster, and multiple presentations demonstrated real-world systems running on Storm right now, as well as integrations with other technologies like R and Pig. Spark overall had nearly as many presentations, but they were more technical and theoretical: it might be another year before we see many presentations about Spark and Spark Streaming applications deployed at scale.


Falcon

Apache Falcon had two talks this summit, and it’s been incubating since last year. It caught my attention as an open-source project aiming to supplant existing proprietary tools. Falcon allows you to declaratively define ETL flows in terms of sources, sinks, and transformations, and schedule them on a regular basis. Flows are monitored and idempotent, and late data can be handled according to user-defined rules. Right now the emphasis is on power: an XML config coordinates Hive, Pig, Oozie, and distcp, but more user-friendly features like libraries of transformations and a web UI for visualizing flows will bring Falcon closer to the feature set of commercial ETL tools.

SQL on Hadoop

This space has settled down a lot since last year, when Stinger and Impala seemed to invade every track and time slot. Yahoo still put on a number of Hive-on-Tez architecture and performance reviews, and less established projects like Apache Tajo (incubating), BlinkDB, Actian’s Vortex and Facebook’s Presto made appearances. Even though performance has been increasing year over year, SQL-on-Hadoop engines are still wildly variable in their features and performance, and there aren’t any clear winners right now – new entrants still have a chance to make their mark. If you’re curious about choosing a SQL-on-Hadoop engine, check out my presentation this year surveying the landscape.

More to Watch

There were so many great presentations, it was hard to choose for every time slot. Once the videos are released I also recommend watching:

  • Ted Dunning’s stunningly simple anomaly detection
  • Jagane Sundar (of WanDisco) explaining Paxos
  • Koji Noguchi (of Yahoo) with tips about stabilising your Hadoop clusters

Were you at Hadoop Summit? What were your favourite presentations and what trends did you notice?

Categories: DBA Blogs

Log Buffer #374, A Carnival of the Vanities for DBAs

Fri, 2014-06-06 07:51

This high-spirited Log Buffer Edition covers some of the best blog posts of the previous week from Oracle, SQL Server and MySQL bloggers from across the planet.


Oracle:

GoldenGate 12c Trail Encryption and Credentials with Oracle Wallet.

Convert ddply {plyr} to Oracle R Enterprise, or use with Embedded R Execution.

New Rapid Install StartCD for EBS 12.2 Now Available.

Oracle XE 11.2 for Windows 64 bit is here.

Oracle Accelerate is Oracle’s approach for providing simple to deploy, packaged, enterprise-class software solutions to growing midsize organizations through its network of expert partners.

SQL Server:

The ubiquity of databases and the potentially valuable information stored in them makes them attractive targets for people who want to steal data or harm its owner by tampering with it.

Who Truncated That Table & What Do I Do Now?

SQL Server Hardware Configuration Best Practices.

The DBA Team tackles data corruption.

T-SQL User-Defined Functions: Ten Questions You Were Too Shy To Ask.


MySQL:

Percona Replication Manager – Renaming Cluster Hostnames.

Performance impact of MySQL query cache on modern hardware.

How To Install Nginx With PHP5 (And PHP-FPM) And MySQL Support On CentOS 6.5.

Build a MySQL Fabric Farm in one step using AWS CloudFormation.

MariaDB 10 is a Sandbox killjoy?

Categories: DBA Blogs

A Look Back on SREcon14

Thu, 2014-06-05 07:57

The landscape of the systems administration industry is shifting. As the trend continues towards Infrastructure as a Service (IaaS), Software as a Service (SaaS), and other *aaS offerings, the traditional role of systems administration is being challenged. While this traditional model will remain (well beyond its expiration date) in large enterprises that have invested heavily in their hardware, processes, and infrastructure, more and more companies will be looking to maximize their investment and get the most business value out of their resources.

The industry is being called on with increasing frequency to reinvent systems administration to meet the needs and the demands of a modern business and technology stack.

The traditional way we have been doing systems administration is on the decline, as the desire to break down the old silos between operations and development (to maximize business value, inter-group communication, and collaboration) forces both sides to evolve new skills and, at their core, adopt new philosophies.

One such philosophy is Site Reliability Engineering, or SRE for short.

Generally accepted to have started at Google, the SRE movement has now spread well beyond to other companies such as Dropbox, Netflix, Twitter, Facebook, LinkedIn, and others.

As my colleague Bill Lincoln will explain in an upcoming post, although this has started with internet scale organizations, SRE is a philosophy put into action that companies of all sizes can benefit from.

At its core, the prime directive of SRE is the reliability of a service as a whole. This subtle yet important paradigm shift is what is driving change within the systems administration and software development industries, toward a place where both groups share a unified objective of reliability and the differences between SRE and SWE become subtle and fluid.

I have been a strong advocate for the SRE philosophy as a major emerging trend in the Systems Administration space with the Pythian Leadership and was thrilled to be able to attend the USENIX Site Reliability Engineering Conference (SRECon14) which was held on Friday, May 30, 2014 in Santa Clara California USA along with two of my colleagues from the Pythian Enterprise Infrastructure Services Group.

It was a single day, but from the first keynote delivered by Ben Treynor, Vice President, Engineering and Google Site Reliability Tsar, to the final Talk by Michael “Mikey” Dickerson on how Silicon Valley’s SREs saved, the information delivered was packed full of value, and a good amount of inspiration.

With a prime directive of “reliability” the talks delivered ran the entire lifecycle of an IT Service from Designing for Reliability, Deploying at Scale, Metrics and Monitoring for Reliability, Cascading Failure of a Service and Disaster Preparedness.

The call to action was also clear: you are absolutely within your rights to not like that it is happening, but there is no denying that change is coming. We (sysadmins and software engineers) can choose to evolve, to challenge ourselves and “up our game”, or we can run the very real risk of being left behind.

SRECon14 was a great success and I look forward to attending the event again. I would enthusiastically recommend it to my friends and colleagues who are in systems administration / software engineering roles.

Finally, I end with a quote from Ben Treynor: “Let the packets flow, and the pager remain silent.”

Categories: DBA Blogs

Expanding the Couchbase Collector for Diamond

Wed, 2014-06-04 08:24
The code

For the impatient, the Couchbase collector can be found on GitHub:

Couchbase Collector

Follow the instructions in the README file to get it installed under your diamond!


If you have been involved with metrics collection at any point, you may have heard of Brightcove’s Diamond.

Diamond is a little gem when it comes to metrics collection. With its modular architecture, it can collect metrics from a large collection of operating system and software components.

In addition, it can ship those metrics to a diverse range of trending software, including Graphite, RRD, or anything that supports StatsD.

While recently working with Couchbase, I needed to collect and ship metrics using Diamond, and a GitHub project doing exactly that was brought to my attention.

Unfortunately, the author, zooldk, has only one entry in the commit history, listed as “Initial skeleton of collector”, and the only statistic currently collected is itemCount from basicStats.

Luckily, the Python code is quite simple and straightforward, so I went ahead and extended it.

First let’s have an overview of the metrics you can expect to see in Graphite after installing the collector.

What did we achieve?

The target is to graph as many of the useful Couchbase metrics as possible.

After installing the extended Couchbase Collector this is what we can expect to see in Graphite:


Here is a plot of memory used by Couchbase on my (memory starved) vm:

A bit of theory: (Data) Buckets and Couchbase cluster metrics

Couchbase is a NoSQL database using JSON documents. It is highly scalable, and it is very easy to create a cluster.

For the sake of working on extending the above mentioned metrics collector, I installed the Couchbase server, community edition on two VMs.

My VMs have IP addresses: and

I mostly used the default parameters in the setup and installed both demo databases, “beer-sample” and “gamesim-sample”.

My Couchbase user interface now looks like:

Metrics in Couchbase

Collecting metrics from Couchbase Buckets is as simple as executing a GET request:

For example:

$ curl -s

Now, this is not very readable, so let’s reformat it using Python’s json library. I am only pasting the output that is useful for metric collection.

$ curl -s | python -mjson.tool
{
    "basicStats": {
        "dataUsed": 49454080,
        "diskFetches": 0.0,
        "diskUsed": 50731634,
        "itemCount": 7303,
        "memUsed": 35406928,
        "opsPerSec": 0.0,
        "quotaPercentUsed": 33.76667785644531
    },
    "name": "beer-sample",
    "nodes": [
        {
            "clusterCompatibility": 131072,
            "clusterMembership": "active",
            "couchApiBase": "",
            "hostname": "",
            "interestingStats": {
                "cmd_get": 0.0,
                "couch_docs_actual_disk_size": 138325417,
                "couch_docs_data_size": 137479323,
                "couch_views_actual_disk_size": 637700,
                "couch_views_data_size": 616830,
                "curr_items": 7888,
                "curr_items_tot": 7889,
                "ep_bg_fetched": 0.0,
                "get_hits": 0.0,
                "mem_used": 99496472,
                "ops": 0.0,
                "vb_replica_curr_items": 1
            },
            "mcdMemoryAllocated": 1501,
            "mcdMemoryReserved": 1501,
            "memoryFree": 932651008,
            "memoryTotal": 1968685056,
            "os": "x86_64-unknown-linux-gnu",
            "otpNode": "ns_1@",
            "ports": {
                "direct": 11210,
                "proxy": 11211
            },
            "replication": 0.0,
            "status": "healthy",
            "systemStats": {
                "cpu_utilization_rate": 18.0,
                "mem_free": 932651008,
                "mem_total": 1968685056,
                "swap_total": 855629824,
                "swap_used": 112218112
            },
            "thisNode": true,
            "uptime": "353144",
            "version": "2.2.0-837-rel-community"
        }
    ],
    "quota": {
        "ram": 104857600,
        "rawRAM": 104857600
    }
}

So what are interesting statistics to collect? The basicStats object sounds like a good candidate, as it contains keys like:

'diskUsed', 'memUsed', 'diskFetches', 'quotaPercentUsed', 'opsPerSec', 'dataUsed', 'itemCount'

All of those sound like great values to graph, so we will keep/collect them.
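Inside the collector, keeping those keys can be as simple as a whitelist that maps each one to a Graphite-style dotted metric name. This is a sketch of the idea; the extended collector’s exact naming may differ:

```python
# Keys from basicStats that are worth graphing.
BASIC_STATS_KEYS = ['diskUsed', 'memUsed', 'diskFetches',
                    'quotaPercentUsed', 'opsPerSec', 'dataUsed', 'itemCount']

def basic_stats_metrics(stats):
    """Map each whitelisted basicStats key to a dotted metric name,
    skipping keys the server did not report."""
    return {'basicStats.%s' % key: stats[key]
            for key in BASIC_STATS_KEYS if key in stats}
```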

Then there is the quota object, showing ram which is useful to graph as well, so we keep this too.

Finally, there is nodes, which is an array: it includes statistics for each node forming the cluster. If the bucket does not span more than one node, there will be a single entry in this array.

In my setup, the gamesim-sample Bucket spans across two virtual machines, hence ‘nodes’ contains two items in its array corresponding to each vm.

Below I show, side by side, the keys present in each member of the nodes array (note that this is for the gamesim-sample bucket):

nodes[0]                nodes[1]
====================    ====================
clusterCompatibility    clusterCompatibility 
clusterMembership       clusterMembership    
couchApiBase            couchApiBase         
hostname                hostname             
interestingStats        interestingStats     
mcdMemoryAllocated      mcdMemoryAllocated   
mcdMemoryReserved       mcdMemoryReserved    
memoryFree              memoryFree           
memoryTotal             memoryTotal          
os                      os                   
otpNode                 otpNode              
ports                   ports                
replication             replication          
status                  status               
systemStats             systemStats          
uptime                  uptime               
version                 version

thisNode is a boolean that tells us which array member corresponds to the machine we are querying.

In this case I got those stats from:


To determine exactly which stats refer to which node, the couchApiBase key can be used for more detail:



This further confirms that nodes[0] refers to my second vm ( and nodes[1] to the first vm.
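Programmatically, the thisNode flag makes picking the local entry out of the nodes array trivial. A small sketch:

```python
def local_node(nodes):
    """Return the nodes[] entry describing the machine that answered
    the query, i.e. the one flagged "thisNode": true; None if no
    entry carries the flag."""
    for node in nodes:
        if node.get("thisNode"):
            return node
    return None
```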

Installing/Configuring the Couchbase collector on Diamond

Get the Couchbase Collector and copy it under:


Edit the Python file and enter your host, port, bucket name (‘path’), username, and password; mine looks like this:

class CouchBaseCollector(diamond.collector.Collector):

    def get_default_config(self):
        config = super(CouchBaseCollector, self).get_default_config()
        config.update({
            'host': 'localhost',
            'port': 8091,
            'path': 'beer-sample',
            'username': 'Administrator',
            'password': 'obfuscated',
        })
        return config

You will also need to create a config file under:


With the contents:

$ cat CouchBaseCollector.conf
enabled = True

Cluster Metrics

The collector is smart enough to present only the node statistics that are applicable to the node it polls.
For clustered Couchbase environments, every node will run a Diamond collector of its own.

This is how Graphite presents the two nodes of the cluster, corresponding to my two vm’s:

Categories: DBA Blogs

Pythian Named 2014 Global Outsourcing 100 Rising Star

Wed, 2014-06-04 07:50

Pythian has once again been named a 2014 Global Outsourcing 100® Rising Star by the International Association of Outsourcing Professionals® (IAOP®). The list is an annual ranking of the world’s best outsourcing service providers and highlights on-the-rise companies in the outsourcing space.

“As companies continue to accumulate more data than ever before, IT teams struggle to manage all of that information in-house,” said Rob Hamel, Vice President of Service Delivery at Pythian. “More businesses are rethinking their internal data infrastructure management processes and rightsourcing to find flexible, cost-effective solutions to optimize performance, reduce IT burdens, and improve productivity. Pythian brings together top-tier talent, proven processes, and a customer-centric business model to help our clients preserve the wellness of their valuable data infrastructures.”

See the full press release here.

Categories: DBA Blogs

Log Buffer #373, A Carnival of the Vanities for DBAs

Fri, 2014-05-30 07:22

There are many ways to express what you have learned, but nothing is better than a comprehensive and concise blog post. This Log Buffer Edition gathers such blog posts from across Oracle, SQL Server, and MySQL.


Oracle:

It’s All About CPU But There Is NO CPU Bottleneck! What?

Understanding the Value of SOA

Java Magazine: Developer Tools and More

The Evolution of Oracle Forms Survey: Results are In!

Australian Oracle job market status update is out.

FAST Refresh of Materialized View Returns ORA-12004

SQL Server:

Learn how relationships work in Powerpivot workbooks and how you can build them using the DAX language.

Branching and Merging: Ten Pretty-Good Practices

Survey: Which new database platforms are you adopting?

Stairway to SQLCLR Level 3: Security (General and SAFE Assemblies)

Free eBook: SQL Server Execution Plans, Second Edition

Transparent Data Encryption (TDE) in SQL Server


MySQL:

MariaDB 10.0.11 Overview and Highlights.

Webinar-on-demand: Set up & operate real-time data loading into Hadoop

Using MySQL Sandbox to setup a MySQL Fabric cluster in development

How MySQL ‘queries’ and ‘questions’ are measured

From zero to full visibility of MySQL in 3 minutes with Percona Cloud Tools

There is one new feature in MySQL 5.6 that didn’t get the attention it deserved

Categories: DBA Blogs

List of SQL Server Databases in an Availability Group

Fri, 2014-05-23 11:07

After migrating some databases to SQL Server 2012 and creating Availability Groups for them, we noticed that some of our maintenance scripts were failing with the following error:

The target database ('MyDB1') is in an availability group and is currently accessible for connections when the application intent is set to read only. For more information about application intent, see SQL Server Books Online.

Databases that are part of an Availability Group and play the secondary role can be read-only replicas, and are therefore not writable.

Those databases cannot be part of some of the out-of-the-box maintenance plans or other coded maintenance procedures.

For the out-of-the-box Maintenance Plans, there is an option in the Plan to choose the preferred replica. (Link)

Any code that requires writes in the database will not work.

How do we get the list of those databases so that we can exclude them/include them in our code?

So first, how do we know whether this instance is part of one or more Availability Groups?

SELECT AS [AvailabilityGroupName],
ISNULL(agstates.primary_replica, '') AS [PrimaryReplicaServerName],
ISNULL(arstates.role, 3) AS [LocalReplicaRole]
FROM master.sys.availability_groups AS AG
LEFT OUTER JOIN master.sys.dm_hadr_availability_group_states as agstates
    ON AG.group_id = agstates.group_id
INNER JOIN master.sys.availability_replicas AS AR
    ON AG.group_id = AR.group_id
INNER JOIN master.sys.dm_hadr_availability_replica_states AS arstates
    ON AR.replica_id = arstates.replica_id AND arstates.is_local = 1


Name       PrimaryReplicaServerName  LocalReplicaRole (1=primary, 2=secondary, 3=none)
AvGroup_1  MyServer01                2
AvGroup_2  MyServer02                1


How do we get some information about the databases in the Availability Group:

SELECT AG.name AS [AvailabilityGroupName],
ISNULL(agstates.primary_replica, '') AS [PrimaryReplicaServerName],
ISNULL(arstates.role, 3) AS [LocalReplicaRole],
dbcs.database_name AS [DatabaseName],
ISNULL(dbrs.synchronization_state, 0) AS [SynchronizationState],
ISNULL(dbrs.is_suspended, 0) AS [IsSuspended],
ISNULL(dbcs.is_database_joined, 0) AS [IsJoined]
FROM master.sys.availability_groups AS AG
LEFT OUTER JOIN master.sys.dm_hadr_availability_group_states as agstates
   ON AG.group_id = agstates.group_id
INNER JOIN master.sys.availability_replicas AS AR
   ON AG.group_id = AR.group_id
INNER JOIN master.sys.dm_hadr_availability_replica_states AS arstates
   ON AR.replica_id = arstates.replica_id AND arstates.is_local = 1
INNER JOIN master.sys.dm_hadr_database_replica_cluster_states AS dbcs
   ON arstates.replica_id = dbcs.replica_id
LEFT OUTER JOIN master.sys.dm_hadr_database_replica_states AS dbrs
   ON dbcs.replica_id = dbrs.replica_id AND dbcs.group_database_id = dbrs.group_database_id
ORDER BY AG.name ASC, dbcs.database_name


AvailabilityGroupName  PrimaryReplicaServerName  LocalReplicaRole  DatabaseName  SynchronizationState  IsSuspended  IsJoined
AvGroup_1              MyServer01                2                 MyDB1         2                     0            1
AvGroup_1              MyServer01                2                 MyDB2         2                     0            1
AvGroup_1              MyServer01                2                 MyDB3         2                     0            1

So, for example, if we would like to get the databases that are secondary in the Availability Group, to be excluded when writes are required:

SELECT dbcs.database_name AS [DatabaseName]
FROM master.sys.availability_groups AS AG
LEFT OUTER JOIN master.sys.dm_hadr_availability_group_states as agstates
   ON AG.group_id = agstates.group_id
INNER JOIN master.sys.availability_replicas AS AR
   ON AG.group_id = AR.group_id
INNER JOIN master.sys.dm_hadr_availability_replica_states AS arstates
   ON AR.replica_id = arstates.replica_id AND arstates.is_local = 1
INNER JOIN master.sys.dm_hadr_database_replica_cluster_states AS dbcs
   ON arstates.replica_id = dbcs.replica_id
LEFT OUTER JOIN master.sys.dm_hadr_database_replica_states AS dbrs
   ON dbcs.replica_id = dbrs.replica_id AND dbcs.group_database_id = dbrs.group_database_id
WHERE ISNULL(arstates.role, 3) = 2 AND ISNULL(dbcs.is_database_joined, 0) = 1
ORDER BY dbcs.database_name


DatabaseName
MyDB1
MyDB2
MyDB3
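As one possible way to wire that list into code, here is a minimal T-SQL sketch that loops over only the databases that are writable on the local instance. The per-database procedure name dbo.IndexMaintenance is hypothetical (stand-in for whatever write-dependent maintenance you run), as is the policy of skipping the four system databases:

```sql
-- Build the list of local secondary-replica databases (role = 2, as in the
-- query above) and skip them when running write-dependent maintenance.
DECLARE @db SYSNAME, @sql NVARCHAR(MAX);

DECLARE db_cursor CURSOR FAST_FORWARD FOR
    SELECT d.name
    FROM sys.databases AS d
    WHERE d.database_id > 4                 -- skip system databases
      AND d.state_desc = 'ONLINE'
      AND d.name NOT IN (
            SELECT dbcs.database_name
            FROM sys.dm_hadr_availability_replica_states AS arstates
            INNER JOIN sys.dm_hadr_database_replica_cluster_states AS dbcs
               ON arstates.replica_id = dbcs.replica_id
            WHERE arstates.is_local = 1 AND arstates.role = 2);

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- dbo.IndexMaintenance is a hypothetical per-database maintenance proc.
    SET @sql = N'EXEC ' + QUOTENAME(@db) + N'.dbo.IndexMaintenance;';
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM db_cursor INTO @db;
END
CLOSE db_cursor;
DEALLOCATE db_cursor;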

Related Links:

Monitor Availability Groups (Transact-SQL)

AlwaysOn Availability Groups Dynamic Management Views and Functions (Transact-SQL)

AlwaysOn Availability Groups Connectivity Cheat Sheet (describing the types of AG replicas and whether connections or reads are allowed on a secondary DB)

Backups on secondary replicas – Always ON Availability Groups

Categories: DBA Blogs

Life at Pythian as a MySQL DBA

Fri, 2014-05-23 08:00

Several people have asked me in the last year what it is like to work for Pythian. There are many reasons for a question like this, depending on who is asking.

Some people are concerned about the fact I am working for a Canadian-based company.

This typically comes from some of my xenophobic North American peers. But let me assure you, it’s not bad! Pythian is based out of Canada, but has employees around the globe in more than 25 countries. It’s pretty impressive, because Pythian must comply with the work laws of each country.

When you’re first hired at Pythian, you’ll be invited to visit their global headquarters in Ottawa, Canada. This is a great opportunity to get to know your peers, as well as the history of the company and how it operates. The country is beautiful, even if you’re lucky enough to be hired during their winter. Overall, it’s not very different compared to working for any other company in my country, aside from the extreme cold weather and all the talk about hockey and curling.

Besides, I actually like hockey.

Some people are curious about what it’s like working from home.

Pythian is not my first experience working remotely. I have been telecommuting since 2005. I tell these people that it’s not for everyone. Working remotely takes hard work and self-discipline.

When I first transitioned from office life to working remotely, it was brutal. My productivity plummeted; I rarely began real work before noon. You typically don't have your boss hovering over your shoulder at home. (If you want this, feel free to add a monitor on the wall behind you and make sure your boss uses a webcam.)

A remote employee must treat the home office like a real office. That means no wandering to your desk in your pajamas, half asleep. Make a concerted effort to dress for work, just as if you’re going into the office. If you have to, take a drive around the block as part of your commute!

If you have family or friends living with you, make sure they know that while you may be physically in the building, you are off limits except in emergencies.

Communication with colleagues can be challenging, and despite technology, your organization must develop a culture of dealing with remote employees. At my first company I was one of only two people working remotely, and staying in the loop was like pulling teeth. Pythian, on the other hand, is built with a large portion of its workforce remote. The company is growing rapidly, and so must its policies. It is a major focus to ensure that all employees are kept up to date and in the loop. Communication lines are kept open using appropriate technologies such as Skype and Google Hangouts, ensuring that team members stay engaged with each other.

Some people are interested in the type of work I do as a MySQL consultant.

This is the conversation I love to have. Most often it comes from someone I met on the internet, and that’s ok. The best thing about working at Pythian is the sheer amount of technologies I get to work with on a daily basis.

As a MySQL Consultant at Pythian, I deal with customers running MySQL 4.1 through 5.6, Percona Server and MariaDB variations, Galera, Tungsten Replicator and many other solutions to solve some of the internet’s toughest problems. Our clients run datasets from a few gigabytes to multiple terabytes. Our clients run on different operating systems: Solaris, FreeBSD, all flavors of Linux, and even Windows in some cases. Our clients use filesystems ranging from ext2 to xfs and zfs, although we try really hard to persuade against ext2. We provide consulting for environments running on many of the cloud providers and hosting providers. We develop proof of concepts, migration and upgrade plans, performance tuning, capacity planning, high availability and disaster recovery plans, and much more.

Let's face it: The open source world is ever-changing, and new technologies are always being created to interact with the MySQL database or even NoSQL technologies such as MongoDB. There are relatively few places to work that can offer exposure and experience to such environments.

Do you have what it takes to be a “Pythianite”?

Pythian is dedicated to developing their employees, also known as Pythianites. Pythian provides ample opportunity for career growth, but this work is not for everyone. It’s very fast paced and at times stressful. There are high expectations for our consultants, and we genuinely have to love your data.

If you think you have what it takes to work at Pythian, check out our current opportunities.

Categories: DBA Blogs

Should AUD$ Table Exist on System Tablespace During DB Upgrade?

Fri, 2014-05-23 07:59

I see the following message in all My Oracle Support notes that talk about Oracle database manual upgrade steps.

4.16 Verify the existence of the AUD$ tables
Ensure that if the aud$ table exists that it is in the SYS schema and in the SYSTEM tablespace.
If the AUD$ table is not in SYSTEM tablespace and not owned by the SYS user then before doing the upgrade put it back to the SYSTEM tablespace and it should be owned by SYS .

When I came across the same kind of message even for the latest database version, 12cR1, in MOS note 1503653.1, I thought of checking the true purpose of this warning. The doubt seems especially valid now that we have a feature named "DBMS_AUDIT_MGMT", which can be used to relocate and purge data for the aud$ table from Oracle RDBMS version 10gR1 onwards.
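For reference, a minimal sketch of that supported relocation approach (assuming 11g or later and an existing USERS tablespace; the tablespace name is just an example) looks like this, rather than a manual ALTER TABLE ... MOVE:

```sql
-- Relocate the standard audit trail (AUD$ and its LOB segments) with the
-- supported DBMS_AUDIT_MGMT API instead of moving the table by hand.
BEGIN
  DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION(
    audit_trail_type           => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    audit_trail_location_value => 'USERS');
END;
/
```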

I created a database named "test" using RDBMS version 11gR2 and enabled database auditing. After a few sessions of testing, I saw records in the aud$ table. I then disabled db audit, moved the table and its associated LOB segments, and enabled auditing again once the activity was complete.

SQL> SELECT comp_id, status, SUBSTR(version,1,10) version, comp_name FROM dba_registry ORDER BY 1;

——- ——– —————————————- ———————————–
CATALOG VALID Oracle Database Catalog Views
CATPROC VALID Oracle Database Packages and Types

SQL> show parameter audit_trail

———————————— ———– ——————————
audit_trail string NONE

SQL> alter table sys.aud$ move tablespace users
2 LOB (sqltext) STORE AS lobsegment1 (TABLESPACE users )
3 LOB (sqlbind) STORE AS lobsegment2 (TABLESPACE users );

Table altered.

SQL> select count(1) from aud$;


SQL> select distinct tablespace_name from dba_segments where segment_name in ('AUD$','SYS_IL0000000384C00040$$','SYS_IL0000000384C00041$$');


SQL> ALTER SYSTEM SET audit_trail=db, extended scope=spfile;

System altered.

SQL> alter system set audit_sys_operations=TRUE scope=spfile;

System altered.

I configured an Oracle home of version 12cR1 and executed the pre-upgrade script "preupgrd.sql" against the 11gR2 Oracle home. I didn't notice any error messages related to this table living in a different tablespace. I reviewed the steps required as per MOS note 1503653.1, omitting many points (including step 4.16) due to the nature of this database. I never faced any issues with the upgrade, and I noticed the aud$ table still existed in the USERS tablespace after the upgrade.

SQL> SELECT comp_id, status, SUBSTR(version,1,10) version, comp_name FROM dba_registry ORDER BY 1;

——- ——– —————————————- ———————————–
CATALOG VALID Oracle Database Catalog Views
CATPROC VALID Oracle Database Packages and Types
XDB VALID Oracle XML Database

SQL> select owner,count(*) from dba_objects where status='INVALID' group by owner;

no rows selected

SQL> select count(1) from sys.aud$;


SQL> select tablespace_name from dba_segments where segment_name='AUD$';


SQL> select owner,count(*) from dba_objects where status='INVALID' group by owner;

no rows selected

SQL> show parameter compatible

———————————— ———– ——————————
compatible string 12.0.0

SQL> show parameter audit

———————————— ———– ——————————
audit_file_dest string /u02/app/oracle/product/12.1.0/dbhome_1/rdbms/audit
audit_sys_operations boolean TRUE
audit_syslog_level string
audit_trail string DB, EXTENDED

Checking a few other MOS notes, it seems they really only look for the aud$ table's existence in the SYS schema.

FAQ : Database Upgrade And Migration (Doc ID 1352987.1)
Which schema should the AUD$ table exist in while doing the upgrade ?
AUD$ table should exist in SYS schema while doing the upgrade.

So this blog's question remains valid. For those who keep a bigger aud$ table in a dedicated tablespace for better database performance, this relaxation means saving 1-3 hours of production database downtime. Is it time to ask Oracle Support to revise point 4.16 to check only the owner of the aud$ table?

Note: This testing was carried out only from RDBMS version 11gR2 to 12cR1. Please test this behavior in your own test environment before you prepare the action plan for a production upgrade.

Categories: DBA Blogs

Log Buffer #372, A Carnival of the Vanities for DBAs

Fri, 2014-05-23 07:58

The Log Buffer Carnival lets readers see into the minds of database bloggers as they convert their thought processes into blog posts. This Log Buffer Edition is part of the same chain. Enjoy.


Tanel Poder is combining Bloom Filter Offloading and Storage Indexes on Exadata.

Randolf talks about 12c Hybrid Hash Distribution with Skew Detection / Handling – Failing.

Kyle talks about 5 Databases issues that are costing you time and money.

Frits Hoogland unfolds as how Exadata Smart Scan works.

A Framework Approach to Building an Oracle WebCenter Intranet, Extranet, or Portal.

SQL Server:

Microsoft adds forecasting capabilities to Power BI for O365

Capping CPU using Resource Governor – The Concurrency Mathematics

Dell Doubles Application Speeds, Processes Transactions 9X Faster with In-Memory OLTP

Architecture of the Microsoft Analytics Platform System

Introducing the AzureCAT PPI Theater at PASS BA


Dean Ellish blogs about using Go with MariaDB.

MySQL 5.6 has added support for EXPLAIN FORMAT=JSON. The basic use case for that feature is that one can look at the JSON output and see more details about the query plan. More advanced/specific use cases are difficult, though.

Real-Time Data Movement: The Key to Enabling Live Analytics With Hadoop.

MySQL Enterprise Monitor 2.3.17 is now available for download on the My Oracle Support (MOS) web site.

If you run multiple MySQL environments on multiple servers it’s a good habit to set your MySQL prompt to double check which server you are on.

Categories: DBA Blogs

Microsoft Analytics Platform System: Name Overhaul in Big Data War!

Wed, 2014-05-21 08:18

I had the chance to attend a course about what used to be called Parallel Data Warehouse (PDW). PDW was introduced a few years ago with the offering of SQL Server 2008 R2 Parallel Data Warehouse, something very few people could get their hands on. The appliance cost a ton of money, had too many parts, and was only offered by HP or Dell in a pre-configured setup.

With SQL Server 2012, Microsoft made many enhancements, slashing the size of the appliance (and consequently the cost) almost in half and improving performance as well.

Just while I was attending the course, Microsoft announced a name change and the introduction of a fancy new brand name: Microsoft Analytics Platform System. CEO Satya Nadella announced the new platform along with other products as well. I suggest reading the blog post written by Satya himself.

I'm sharing some of my personal (again, personal) opinions about the platform and the appliance. So let's take a step back and explore some of the basics.

What is PDW (aka APS)?

Microsoft loves those three-letter acronyms, although certifications may be an exception. In short, it's a Massively Parallel Processing (MPP) "APPLIANCE" optimized for large-scale enterprise data warehouses. The appliance is a logical entity encompassing servers, switches, and storage enclosed in a rack. Think of it as your typical environment of servers, switches, and storage all brought together in one huge rack appliance. The idea behind the appliance is simple: we do all the dirty work for you and give you something "optimized" to handle huge amounts of data without the hassle of configuration, tuning, and license worries. Of course, the word "optimized" is on Microsoft's terms.


The appliance is not only about SQL Server: it also incorporates Hadoop and an engine, PolyBase, to simplify talking to Hadoop using conventional SQL Server T-SQL, and it can also tap HDInsight to reach out to data stored in the cloud.

The appliance is only available through a few vendors; it used to be only HP and Dell, but the list now also includes Quanta. Please check the resources section for more information about the Microsoft Analytics Platform.

The following video vividly tries to demonstrate the usage of APS:
Why a PDW/APS appliance?

Customers with huge amounts of data spanning heterogeneous sources always want to get meaningful information out of that data. The more data they have, the harder it is and the longer it takes to extract key information. Appliances are tuned machines with massive resources that help analyze, aggregate, and join data much faster. Conventional SMP machines can work up to a certain level with much-needed tuning and optimization that may not always succeed. Vendors take on this tuning and optimization responsibility and present you with a sleek machine that is supposed to overcome multiple design and resource limitations. Some examples of existing appliances are Oracle Exadata, the Teradata Data Warehouse Appliance, and IBM PureData and Netezza.

Am I saying that a conventional SQL Server setup can't achieve this? Not entirely. Think of it like car models that share the same chassis, where the high-end models have more powerful engines, more features, and better performance. Although SQL Server keeps bringing enhancements like updatable clustered columnstore indexes and In-Memory OLTP in SQL Server 2014, the PDW/APS appliance differs from conventional SMP in the following areas:

  1. The PDW/APS appliance is a bundled hardware and software offering customized to scale out. You can add (and, with much more difficulty, remove) nodes to scale out to your data needs. Each "node" runs on a separate server with its own SQL Server and hardware resources, managed by a "control" node that distributes the workload. You can read the APS Solution Brief here.
  2. You can't buy a PDW SQL Server licence and install it in your environment, and you cannot assemble the product yourself even if you have the blueprint; you just get it from one of the vendors and plug-and-play it.
  3. PolyBase plugs in Hadoop. You may be able to connect an existing SQL Server with Hadoop, but PolyBase provides easy-to-use T-SQL constructs to extract data from Hadoop, giving you an almost immediate ability to query Hadoop without a long learning curve.
  4. Many conventional SQL Server features are suppressed. Yes, I can see your raised eyebrows, but the idea is that Microsoft wanted to remove areas that can introduce bottlenecks, such as CLR, extended DLLs, and even SQL Server Agent. If you need to do something that can't be done inside PDW, such as scheduled jobs, move it to another tier.
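As a taste of point 3, here is a hedged sketch of the kind of T-SQL that PolyBase enables. The syntax shown is the later SQL Server 2016 form rather than the original APS dialect, and all object names, locations, and columns are hypothetical:

```sql
-- Register a Hadoop cluster and a delimited-file layout, then expose an
-- HDFS directory as a queryable external table.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://namenode:8020');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

CREATE EXTERNAL TABLE dbo.WebClicks (
    user_id INT, url NVARCHAR(400), clicked_at DATETIME2)
WITH (LOCATION = '/logs/clicks/',
      DATA_SOURCE = HadoopCluster,
      FILE_FORMAT = CsvFormat);

-- Join relational and Hadoop data with plain T-SQL:
SELECT c.CustomerName, COUNT(*) AS clicks
FROM dbo.Customers AS c
JOIN dbo.WebClicks AS w ON w.user_id = c.CustomerID
GROUP BY c.CustomerName;
```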

Where does PDW/APS fit?

The cost of buying and running the appliance suggests that it's not for everyone. The available resources, and the bundling of PolyBase to connect to Hadoop, show that it's for an enterprise with huge and heterogeneous amounts of data spread around. Bringing this data together with the least customization is the goal of the appliance.

PDW/APS can help bring data together from following areas:

  • Social apps
  • Mobile
  • Sensors
  • Web
  • Other data sources such as RDBMS

The appliance fits a segment of industries, notably:

  • Retail
  • Healthcare
  • Financials
  • Manufacturing
  • Social media

Microsoft has some case studies about clients deploying PDW and realizing up to 100X performance gains. Here are the case studies:

  1. MEC -Media Firm Uncovers the Value of Digital Media with Parallel Data Warehouse
  2. The Royal Bank of Scotland – Leading UK Bank Gains Rapid Insight into Economic Trends with Analytics Platform System
  3. Progressive Insurance – Progressive Data Performance Grows by Factor of Four, Fueling Business Growth Online Experience

I believe Microsoft has to do more to get the appliance to more customers, and I think the brand rename is part of this push. Training, support, and deployment materials are also needed, since there are not many resources online.
Is PDW worth it for clients?

Microsoft competes against Teradata, Netezza, and Oracle Exadata. Cost is a very big factor: licence and support. MS tends to do fairly well with MS-shop customers and its SQL Server client base. However, while the first version of PDW was notoriously expensive and bulky, the current appliance is almost half the price and half the size it used to be. Expertise still seems scarce, and this is what MS is working on.

Microsoft's pitch is: instead of investing too much, for too long, in creating the same technology by trial and error and spending much time on tuning, here's a working appliance that we have tuned for you, so you can just focus on your business. Per the following chart, MS claims the cheapest price per TB compared to other vendors.


Regardless, the appliance makes the most sense for SQL Server clients; however, I still see clients sticking to solutions from the vendor of the main RDBMS technology they already run, since importing data will be easier and the learning curve less steep.

The appliance will make a case for mid-size to large enterprises accumulating new terabytes of data each month, including unstructured data. SMP SQL Server and APS may overlap in the region of a few to tens of terabytes, but once we talk about hundreds of terabytes of data, including unstructured data, APS starts to make sense.
PDW Against Cloud?

There are a few reasons clients may opt for an on-premise appliance, including:

  1. Some data is too sensitive to trust to the cloud.
  2. The amount of data is too huge to upload to the cloud.
  3. The cloud is not yet mature enough to support all features.

Most cloud vendors are not yet ready to offer such an expensive appliance. Amazon, however, has something similar called Redshift. They actually make a case against on-premise solutions, saying you don't need those bulky, expensive appliances that can break when you can use Amazon's cloud solution and pay "less". However, there are a few points:

  1. The MS appliance tries to play in the lower cost segment.
  2. You don't need lots of DBAs to manage it. In fact, I was surprised that MS took out most of the parts that can introduce problems with the appliance: no CLR, no extended modules, and many other features disabled. They applied best practices and locked it down so you cannot tamper with the appliance. I was told that the only thing that can cause APS to fail is a hardware failure, and with built-in redundancy even that is less probable.
  3. Not everyone wants their data in the cloud, mostly for security. I was told about a customer who wanted to destroy the hard drives with a hammer after testing the appliance. It took a few weeks to zero-write the drives a few times instead.
  4. Transferring a lot of data to a public cloud is not that fast, unless you are already hosting your data in the same location, such as on Amazon.

APS vs. Hadoop

Hadoop excels at unstructured data such as text, sensor data, web crawls, etc. Whether you already have Hadoop clusters running or plan to, you may still have valuable relational data stored in your existing SQL Server instances. APS makes it easy to bridge the gap between the two and use T-SQL to join data from both sources without worrying much about the design of the Hadoop cluster.

APS region share

The USA is the top region, followed by the EU, then Asia.
APS and SQL 2014

The appliance still runs a customized version of SQL Server 2012; no word yet on when SQL Server 2014 will be introduced. Upgrading the cluster is supported, but it is not something the end customer can do yet.
PDW support

Many operations, like failing back and downsizing, are still only supported through Microsoft Customer Service and Support (CSS), but they are trying to automate some tasks so experienced DBAs can do them.
DBA role with APS

Don't hold your breath! As outlined before, much of the configuration and optimization is done for you in what MS believes is the best balance. You cannot do many of the tasks a DBA does, like changing tempdb configuration, max degree of parallelism, etc. That's the idea behind the appliance, in fact: focus on bringing data to the appliance and modifying your design to extract information.

When I worked with the appliance, I didn't find it particularly complex from an operations point of view. Most of the work is about bringing data in and figuring out the best way to join data together without shuffling it around, and that's about understanding the data and the business logic.

You can help in the following areas though:

  • Exporting/importing data from existing data sources into APS
  • Advising on changes to the design and business logic to comply with appliance requirements
  • Advising on how to bring data together for analysis
  • Designing and implementing data extraction practices
  • Troubleshooting analysis routines and schedules
  • Troubleshooting long-running queries and schedules. APS has a web portal where you can see all running processes; you can also use DMVs to get this information, and some of them are unique to APS.
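As an example of those appliance-specific DMVs, a hedged sketch of a running-requests query (sys.dm_pdw_exec_requests is a PDW-family DMV; availability and columns may vary by appliance version):

```sql
-- List currently active requests on the appliance, longest-running first.
SELECT request_id, session_id, status, command, total_elapsed_time
FROM sys.dm_pdw_exec_requests
WHERE status IN ('Running', 'Suspended')
ORDER BY total_elapsed_time DESC;
```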


The APS appliance targets customers with huge amounts of data spanning heterogeneous sources who need a ready-to-plug solution.

With the new brand for the PDW appliance, Microsoft is getting more serious about big data and analytics. However, many vendors are well established here, and it's still a long race in a market that's expected to explode, if it hasn't already.

Happy reading!

Categories: DBA Blogs

Everyone Should Have Firefighter Mentality

Wed, 2014-05-21 07:43

Australia, being the most arid country in the world, is known for its bushfires. The part of Australia where I live doesn't have many bushfires, but we do have building fires, like any other part of the world.

Firefighter is one occupation many kids want to have when they grow up. Kids are inspired by the shiny, clean, smart outward appearance of firefighter crews passing majestically, sirens ringing, in their immaculately clean trucks.

While volunteering for one of the fire stations in my suburb on a quiet day, I found those firefighters doing nothing but polishing their trucks, cleaning their stuff, making sure everything is in optimal order, and waiting for the emergency on their toes.

No matter what field you are in or what profession you follow, no matter which step of the corporate ladder you are on, and whether you are a full-time employee, a contractor, or a business owner, it is a fact that there are ample quiet times: small chunks of hushed, still periods during each workday.

Those still moments are the “firefighter” time. Don’t let that time go to waste. Clean your desk and your tools. Polish your skills, and think about yourself and your career. Identify your areas for improvement, and grow your network. Read, write, or help out others.

In other words, get ready for that fire.

Categories: DBA Blogs

Log Buffer #371, A Carnival of the Vanities for DBAs

Fri, 2014-05-16 07:56

Fueled by the massive growth of data and propelled by the mammoth future potential of its applications, bloggers across the database technologies are finding new things to explore in light of previous knowledge. This Log Buffer Edition covers it all.


To configure an instrumentation watch, you first need to know what instrumentation is, and how to instrument applications or servers.

Why Choose to Deploy Agile PLM in the Cloud?

One of the things that makes JSF different from other approaches to server-side Java web development is that it offers a much higher level of abstraction and transparently maintains UI state.

Step by step instructions for setting up a development environment for doing development with Hippo CMS.

Oracle Solaris 11.2 at the Atlanta OpenStack Summit

SQL Server:

RAID and Its Impact on your SQL Performance.

Microsoft Azure Diagnostics Part 1: Introduction

Using Encrypted Backups in SQL Server 2014

A new plug in for Management Studio from Red Gate is free. It will give you access to all the scripts at SQLServerCentral, including your own briefcase.

Validate File Properties with PowerShell Prior to Import into SQL Server


Benchmark: SimpleHTTPServer vs pyclustercheck (twisted implementation)

Archival and Analytics – Importing MySQL data into Hadoop Cluster using Sqoop

Cross your Fingers for Tech14, see you at OSCON

New Tungsten Replicator 2.2.1 now available

MySQL May Newsletter is Available!

Categories: DBA Blogs

Oracle Passwords: How to Use Punctuation Symbols

Wed, 2014-05-14 07:04

You can never make a password too strong, but at least you can try. With at least one upper-case character, one lower-case character, one number, one punctuation mark or symbol, and more than 8 characters overall, you have a password that can be considered reasonably safe.

In Oracle, if you embed punctuation marks in your password, you have to use single quotes around that password when using the orapwd utility from the command line (so the shell does not interpret characters like !). If you are altering the password from the sqlplus utility, you need to use double quotes around the password.

Examples of both are given below, changing the sys password:

From orapwd:

$ORACLE_HOME/dbs $ orapwd file=orapwtest password="My1s!pass" entries=40 force=y
-bash: !pass": event not found

$ORACLE_HOME/dbs $ orapwd file=orapwtest password='My1s!pass' entries=40 force=y

Changed successfully.

From sqlplus:

SQL> alter user sys identified by 'My1s!pass';
alter user sys identified by 'My1s!pass'
ERROR at line 1:
ORA-00988: missing or invalid password(s)

SQL> alter user sys identified by "My1s!pass";

User altered.
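Another way to sidestep quoting altogether (assuming you have the privilege to change the password; exact behavior can vary by version) is the SQL*Plus PASSWORD command, which prompts for the new password without echoing it:

```sql
SQL> password sys
Changing password for sys
New password:
Retype new password:
Password changed
```

Since the password never appears on the command line, neither shell history expansion nor SQL quoting rules get in the way.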


Categories: DBA Blogs

The Hadoop and the Hare

Mon, 2014-05-12 07:38

I speak to a lot of people who are terribly concerned with “real-time”. Can data get into the warehouse in real-time? Can we record everything the user does on the site in real-time? Real-time is a magic phrase, because it really means “so fast that nobody cares”. In a business sense, it usually means that nobody has really thought about the latency requirements, so they’re going to push for the lowest latency possible. This is a dangerous habit, and you need to fight against it at every turn. Some of the best, cheapest, most scalable solutions will only be available when you accept that you can wait for an hour (or even hours!) to see some results.

Case in point, a client asking for a NoSQL store to hold user event data. It would have to handle millions of concurrent users, all generating dozens of events within minutes. They were (rightly) concerned about write performance and scaling this NoSQL store horizontally to handle the volume of writes. When asked why they needed the events, they didn't know: they wanted to do exploratory analytics and see if there were interesting metrics to be mined. I proposed dumping the events on a queue – Kafka, Kinesis – or just writing logs for Flume to pick up. Rather than asking a data store to make all of these messages available for querying immediately, do things asynchronously, run some batch jobs to process data in Hadoop, and see if interesting data comes out. The Hadoop cluster and the ingestion pipeline are an order of magnitude less expensive and risky than the cool, real-time solution, which confers no immediate benefits.

Maybe this client will decide that batch analytics are good enough, and they’ll stick with using Hive or Impala for querying. Maybe they discover a useful metric, or a machine learning technique they want to feed back into their user-facing application. Many specific metrics can be stored as simple counters per user, in a store like Redis, once they’ve been identified as valuable. Machine learning techniques are interesting because (besides cold start situations) they should be stable: the output shouldn’t change dramatically for small amounts of new data, so new information only needs to be considered periodically. In both cases, the requirements should be narrowed down as far as possible before deciding to invest in “real-time”.

In another case, a client presented the classic data warehousing problem: store large volumes of events with fine grain, and slice and dice them in every imaginable way. Once again, the concept of reducing latency came up. How quickly can we get events into the warehouse so people can start querying them? The answer is that we can make anything happen, but it will be needlessly expensive and perform poorly. The main goal of the project was to provide historical reporting: there was no indication that clients would want to do the same kinds of pivots and filtering on data only from the past minute. The low-latency application would be cool and interesting to develop, but it would be much simpler to find which reports users want to be low-latency, then precompute them and store them in a more appropriate way.

That more appropriate way is Dynamo, Cassandra, or your preferred NoSQL key-value store. Precompute the aggregates you know the user wants with very low latency, and store them keyed on time at a very fine grain: you get the benefit of high write throughput, at the cost of very limited query complexity. Once the data is no longer interesting – it's visible in the warehouse with much more grain along the various dimensions – drop it from the NoSQL store.
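A sketch of that precompute-and-expire pattern, with a dict standing in for the key-value store; the 60-second grain and the metric names are assumed examples, not anything from the client engagement:

```python
BUCKET_SECONDS = 60  # one precomputed aggregate row per metric per minute

class TimeSeriesAggregates:
    """Sketch of aggregates keyed on (metric, time bucket). Writes are cheap
    upserts -- the key-value store's strength -- and queries are limited to
    point lookups by bucket, which is the trade-off described above."""

    def __init__(self):
        self._rows = {}

    @staticmethod
    def bucket(ts: float) -> int:
        return int(ts) // BUCKET_SECONDS * BUCKET_SECONDS

    def add(self, metric: str, ts: float, value: int) -> None:
        key = (metric, self.bucket(ts))
        self._rows[key] = self._rows.get(key, 0) + value

    def read(self, metric: str, ts: float) -> int:
        return self._rows.get((metric, self.bucket(ts)), 0)

    def expire_before(self, cutoff_ts: float) -> None:
        """Drop buckets older than the cutoff once the warehouse has them."""
        cut = self.bucket(cutoff_ts)
        self._rows = {k: v for k, v in self._rows.items() if k[1] >= cut}

agg = TimeSeriesAggregates()
agg.add("page_views", 1000, 1)
agg.add("page_views", 1010, 2)   # same minute bucket as ts=1000
agg.add("page_views", 1090, 5)   # next minute bucket
agg.expire_before(1085)          # old bucket now lives only in the warehouse
```

In a real Cassandra or Dynamo deployment the expiry would typically be a per-row TTL rather than an explicit sweep, but the lifecycle is the same: precompute, serve fast, then let the warehouse take over.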

Starting with a relatively slow batch platform gives very high flexibility at low cost and with little development effort. Once your users – internal or external – have explored the data set and start complaining about latency, that is the time to build specialized pipelines for those specific use cases. Since Kafka, Kinesis, and Flume all support streaming computation as well as batch, they will serve as good connection points for your "real-time" pipeline.
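That "one log, two consumers" connection point can be sketched as follows; an in-memory list stands in for a durable topic such as Kafka's, with offsets deliberately simplified:

```python
class Topic:
    """In-memory stand-in for a durable log (e.g. a Kafka topic): records are
    appended once, and the same data can be read by a batch job (full scan)
    and by a streaming consumer (tail from an offset), so the ingestion
    pipeline built for batch later feeds the real-time path unchanged."""

    def __init__(self):
        self._log = []

    def append(self, record: dict) -> None:
        self._log.append(record)

    def scan(self) -> list:
        """Batch consumer: replay the whole log."""
        return list(self._log)

    def tail(self, from_offset: int):
        """Streaming consumer: read only records after a committed offset;
        returns the new records and the next offset to commit."""
        return list(self._log[from_offset:]), len(self._log)

topic = Topic()
for i in range(3):
    topic.append({"event": "click", "n": i})
batch_view = topic.scan()               # e.g. the nightly Hadoop job
stream_view, next_offset = topic.tail(1)  # e.g. a later low-latency consumer
```

The design point is that nothing about the batch-first pipeline has to be thrown away when a real-time use case is finally justified: the new consumer just attaches at a different offset.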

Categories: DBA Blogs

Log Buffer #370, A Carnival of the Vanities for DBAs

Fri, 2014-05-09 07:51

With promising clouds hovering over the skies of database technology, this Log Buffer Edition gathers some selective droplets for Oracle, SQL Server and MySQL.

ACM: organizing the chaos and flexing rigid process structures through adaptive case management by Lucas Jellema.

How to Build GlassFish 4 from Source.

Make Your Oracle Database Run Faster on Oracle Linux.

How to: Run Repository Creation Utility (RCU) on Solaris.

A Customer Centric Approach To Supply Chain: Supply Chain Journey Mapping (SCJM).

SQL Server:

Stairway to PowerPivot and DAX – Level 1: Getting Started with PowerPivot and DAX.

Free eBook: SQL Server Backup and Restore.

12 SQL Server XML Questions You Were Too Shy To Ask.

Context Is Everything – Check Your Settings.

How column COLLATION can affect SQL Server query performance.


Many Sphinx users utilize MySQL as a data source. Divendo is one of them.

Another reason for Xtrabackup error “log block numbers mismatch”.

MySQL Audit Plugin now available in Percona Server 5.5 and 5.6.

New Release Webinar on May 13th: Introducing ClusterControl 1.2.6 – Live Demo.

The Perfect Server – Ubuntu 14.04 (nginx, BIND, MySQL, PHP, Postfix, Dovecot and ISPConfig 3).

Categories: DBA Blogs

Concurrent Processing Issues with Distributed Transactions Glitch

Thu, 2014-05-08 10:58

This is my first post on the Pythian blog, and I wanted to share an interesting use case we dealt with recently. I believe others may run into similar cases in the future, so here are the troubleshooting steps to get to the source of the problem and apply a workaround. The root problem was that concurrent processing wouldn't start up, and the Internal Manager log file reported the error below (the same error was also thrown by Oracle Forms when trying to access Concurrent -> Administer):

CP-GENERIC ORACLE ERROR (ERROR=ORA-01591: lock held by in-dou) (ROUTINE=get_lk_handle)

The ORA-01591 error clearly points to a lock held by an in-doubt distributed transaction; however, the DBA views DBA_2PC_PENDING and DBA_2PC_NEIGHBORS return no rows (sys.pending_trans$ and sys.pending_sessions$ are empty as well). If any of these views returned a row, the transaction could most probably be committed, rolled back, or purged, and the problem would be resolved. That wasn't the case here – so this must be some kind of glitch.

The first document to review is Master Note for Troubleshooting Oracle Managed Distributed Transactions (Doc ID 100664.1), but it gives no valuable information, and neither does the much more detailed Manually Resolving In-Doubt Transactions: Different Scenarios (Doc ID 126069.1). The next step is to find out where ORA-01591 is coming from. Here, a database trace is a very useful tool. I was lucky this time, as the error is thrown in Forms, so getting a trace is two clicks away; if it's not so easy in your case, the DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE procedure can be used to enable tracing instead. In my case the trace showed:

PARSING IN CURSOR #139830878745608 len=66 dep=1 uid=0 oct=3 lid=0 tim=1398844275853403 hv=1493897133 ad='1c121bb90' sqlid='19x1189chq3xd'
PARSE #139830878745608:c=0,e=137,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1398844275853402
BINDS #139830878745608:
oacdty=01 mxl=128(128) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1206001 frm=01 csi=31 siz=128 off=0
kxsbbbfp=7f2ce9db2d80 bln=128 avl=11 flg=05
ERROR #139830878745608:err=1591 tim=1398844275855101

Next, trying to execute the same SQL manually in sqlplus I got the following error:

ERROR at line 1:
ORA-01591: lock held by in-doubt distributed transaction 35.15.13775

So now I have some details on the ORA-01591 error: I know which transaction is holding the lock – but it isn't displayed in any of the DBA_2PC_ views. Trying to purge, commit, or roll back this transaction also ends in an error, as there is no complete record of the transaction:

ERROR at line 1:
ORA-06510: PL/SQL: unhandled user-defined exception
ORA-06512: at "SYS.DBMS_TRANSACTION", line 96
ORA-06512: at line 1
SQL> COMMIT FORCE '35.15.13775';
COMMIT FORCE '35.15.13775'
ERROR at line 1:
ORA-02058: no prepared transaction found with ID 35.15.13775
SQL> rollback force '35.15.13775';
rollback force '35.15.13775'
ERROR at line 1:
ORA-02058: no prepared transaction found with ID 35.15.13775

Digging deeper, MoS note DBMS_LOGSTDBY.BUILD Seems to Hang And Does Not Return To SQL prompt (Doc ID 747495.1) caught my attention. A standby database is not used in this configuration, but the note points out that the x$ktuxe internal view contains information about transactions, and a query against it soon gave results:

[x$ktuxe query output truncated; the query returned a single row for transaction 35.15.13775]

This is exactly the transaction we identified earlier in the trace (ORA-01591: lock held by in-doubt distributed transaction 35.15.13775). Once the transaction is identified, there are a few ways to get rid of it:

Using the same technique, I inserted dummy data into pending_trans$ and pending_sessions$. Then DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY solved the case: the dummy entry was removed, along with the references in x$ktuxe (see the examples in the sources mentioned above). According to the oracle-l comments, this is an Oracle Support-provided workaround for a particular SR – so most probably it is not a supported way to deal with such cases in general.

In my case, we didn't know what had created the half-purged transaction, and this test system was needed ASAP for training, so this method really saved the day.

Categories: DBA Blogs