Pythian Group

Official Pythian Blog - Love Your Data

Pythian Named 2014 Global Outsourcing 100 Rising Star

Wed, 2014-06-04 07:50

Pythian has once again been named a 2014 Global Outsourcing 100® Rising Star by the International Association of Outsourcing Professionals® (IAOP®). The list is an annual ranking of the world’s best outsourcing service providers and highlights on-the-rise companies in the outsourcing space.

“As companies continue to accumulate more data than ever before, IT teams struggle to manage all of that information in-house,” said Rob Hamel, Vice President of Service Delivery at Pythian. “More businesses are rethinking their internal data infrastructure management processes and rightsourcing to find flexible, cost-effective solutions to optimize performance, reduce IT burdens, and improve productivity. Pythian brings together top-tier talent, proven processes, and a customer-centric business model to help our clients preserve the wellness of their valuable data infrastructures.”

See the full press release here.

Categories: DBA Blogs

Log Buffer #373, A Carnival of the Vanities for DBAs

Fri, 2014-05-30 07:22

There are many ways to share what you have learned, but few are better than a comprehensive yet concise blog post. This Log Buffer Edition gathers such blog posts from across Oracle, SQL Server, and MySQL.

Oracle:


It’s All About CPU But There Is NO CPU Bottleneck! What?

Understanding the Value of SOA

Java Magazine: Developer Tools and More

The Evolution of Oracle Forms Survey: Results are In!

Australian Oracle job market status update is out.

FAST Refresh of Materialized View Returns ORA-12004

SQL Server:

Learn how relationships work in PowerPivot workbooks and how you can build them using the DAX language.

Branching and Merging: Ten Pretty-Good Practices

Survey: Which new database platforms are you adopting?

Stairway to SQLCLR Level 3: Security (General and SAFE Assemblies)

Free eBook: SQL Server Execution Plans, Second Edition

Transparent Data Encryption (TDE) in SQL Server

MySQL:


MariaDB 10.0.11 Overview and Highlights.

Webinar-on-demand: Set up & operate real-time data loading into Hadoop

Using MySQL Sandbox to setup a MySQL Fabric cluster in development

How MySQL ‘queries’ and ‘questions’ are measured

From zero to full visibility of MySQL in 3 minutes with Percona Cloud Tools

There is one new feature in MySQL 5.6 that didn’t get the attention it deserved

Categories: DBA Blogs

List of SQL Server Databases in an Availability Group

Fri, 2014-05-23 11:07

After migrating some databases to SQL Server 2012 and creating Availability Groups for some of them, we noticed that some of our maintenance scripts were failing with the following error:

The target database ('MyDB1') is in an availability group and is currently accessible for connections when the application intent is set to read only. For more information about application intent, see SQL Server Books Online.

Databases that are part of an Availability Group and play the secondary role can be read-only replicas and are therefore not writable.

Those databases cannot be part of some of the out-of-the-box maintenance plans or other coded maintenance procedures.

For the out-of-the-box Maintenance Plans, there is an option in the Plan to choose the preferred replica. (Link)

Any code that requires writes in the database will not work.

How do we get the list of those databases so that we can exclude them from, or include them in, our code?

So first, how do we know that this instance is part of one or more Availability Groups?

SELECT AG.name AS [Name],
ISNULL(agstates.primary_replica, '') AS [PrimaryReplicaServerName],
ISNULL(arstates.role, 3) AS [LocalReplicaRole]
FROM master.sys.availability_groups AS AG
LEFT OUTER JOIN master.sys.dm_hadr_availability_group_states as agstates
    ON AG.group_id = agstates.group_id
INNER JOIN master.sys.availability_replicas AS AR
    ON AG.group_id = AR.group_id
INNER JOIN master.sys.dm_hadr_availability_replica_states AS arstates
    ON AR.replica_id = arstates.replica_id AND arstates.is_local = 1


Name       PrimaryReplicaServerName   LocalReplicaRole (1=primary, 2=secondary, 3=none)
AvGroup_1  MyServer01                 2
AvGroup_2  MyServer02                 1


How do we get some information about the databases in the Availability Group:

SELECT AG.name AS [AvailabilityGroupName],
ISNULL(agstates.primary_replica, '') AS [PrimaryReplicaServerName],
ISNULL(arstates.role, 3) AS [LocalReplicaRole],
dbcs.database_name AS [DatabaseName],
ISNULL(dbrs.synchronization_state, 0) AS [SynchronizationState],
ISNULL(dbrs.is_suspended, 0) AS [IsSuspended],
ISNULL(dbcs.is_database_joined, 0) AS [IsJoined]
FROM master.sys.availability_groups AS AG
LEFT OUTER JOIN master.sys.dm_hadr_availability_group_states as agstates
   ON AG.group_id = agstates.group_id
INNER JOIN master.sys.availability_replicas AS AR
   ON AG.group_id = AR.group_id
INNER JOIN master.sys.dm_hadr_availability_replica_states AS arstates
   ON AR.replica_id = arstates.replica_id AND arstates.is_local = 1
INNER JOIN master.sys.dm_hadr_database_replica_cluster_states AS dbcs
   ON arstates.replica_id = dbcs.replica_id
LEFT OUTER JOIN master.sys.dm_hadr_database_replica_states AS dbrs
   ON dbcs.replica_id = dbrs.replica_id AND dbcs.group_database_id = dbrs.group_database_id
ORDER BY AG.name ASC, dbcs.database_name


AvailabilityGroupName  PrimaryReplicaServerName  LocalReplicaRole  DatabaseName  SynchronizationState  IsSuspended  IsJoined
AvGroup_1              MyServer01                2                 MyDB1         2                     0            1
AvGroup_1              MyServer01                2                 MyDB2         2                     0            1
AvGroup_1              MyServer01                2                 MyDB3         2                     0            1

So, for example, if we would like to get the databases that are secondary in the Availability Group, to be excluded when writes are required:

SELECT dbcs.database_name AS [DatabaseName]
FROM master.sys.availability_groups AS AG
LEFT OUTER JOIN master.sys.dm_hadr_availability_group_states as agstates
   ON AG.group_id = agstates.group_id
INNER JOIN master.sys.availability_replicas AS AR
   ON AG.group_id = AR.group_id
INNER JOIN master.sys.dm_hadr_availability_replica_states AS arstates
   ON AR.replica_id = arstates.replica_id AND arstates.is_local = 1
INNER JOIN master.sys.dm_hadr_database_replica_cluster_states AS dbcs
   ON arstates.replica_id = dbcs.replica_id
LEFT OUTER JOIN master.sys.dm_hadr_database_replica_states AS dbrs
   ON dbcs.replica_id = dbrs.replica_id AND dbcs.group_database_id = dbrs.group_database_id
WHERE ISNULL(arstates.role, 3) = 2 AND ISNULL(dbcs.is_database_joined, 0) = 1
ORDER BY dbcs.database_name


DatabaseName
MyDB1
MyDB2
MyDB3
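As a hedged sketch of how that exclusion list might be plugged into maintenance code (the temp table and the maintenance procedure named here are hypothetical, not part of the original scripts):

```sql
-- Collect local secondary (read-only) databases into a temp table,
-- then skip them in a maintenance loop that requires writes.
IF OBJECT_ID('tempdb..#ReadOnlySecondaries') IS NOT NULL
    DROP TABLE #ReadOnlySecondaries;

SELECT dbcs.database_name AS DatabaseName
INTO #ReadOnlySecondaries
FROM master.sys.availability_groups AS AG
INNER JOIN master.sys.dm_hadr_availability_replica_states AS arstates
    ON AG.group_id = arstates.group_id AND arstates.is_local = 1
INNER JOIN master.sys.dm_hadr_database_replica_cluster_states AS dbcs
    ON arstates.replica_id = dbcs.replica_id
WHERE ISNULL(arstates.role, 3) = 2
  AND ISNULL(dbcs.is_database_joined, 0) = 1;

DECLARE @db sysname;
DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases
    WHERE database_id > 4   -- skip system databases
      AND name NOT IN (SELECT DatabaseName FROM #ReadOnlySecondaries);

OPEN dbs;
FETCH NEXT FROM dbs INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- EXEC dbo.MyMaintenanceProc @Database = @db;  -- hypothetical write-requiring task
    FETCH NEXT FROM dbs INTO @db;
END;
CLOSE dbs;
DEALLOCATE dbs;
```

The same NOT IN filter can be added to the WHERE clause of any home-grown maintenance query that performs writes.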

Related Links:

Monitor Availability Groups (Transact-SQL)

AlwaysOn Availability Groups Dynamic Management Views and Functions (Transact-SQL)

AlwaysOn Availability Groups Connectivity Cheat Sheet (describing types of AG replicas and if connection or reads are allowed on secondary DB)

Backups on secondary replicas – Always ON Availability Groups

Categories: DBA Blogs

Life at Pythian as a MySQL DBA

Fri, 2014-05-23 08:00

Several people have asked me in the last year what it is like to work for Pythian. There are many reasons for a question like this, depending on who is asking.

Some people are concerned about the fact I am working for a Canadian-based company.

This typically comes from some of my xenophobic North American peers. But let me assure you, it’s not bad! Pythian is based out of Canada, but has employees around the globe in more than 25 countries. It’s pretty impressive, because Pythian must comply with the labour laws of each country.

When you’re first hired at Pythian, you’ll be invited to visit the global headquarters in Ottawa, Canada. This is a great opportunity to get to know your peers, as well as the history of the company and how it operates. The country is beautiful, even if you’re lucky enough to be hired during their winter. Overall, it’s not very different from working for any other company in my country, aside from the extreme cold weather and all the talk about hockey and curling.

Besides, I actually like hockey.

Some people are curious about what it’s like working from home.

Pythian is not my first experience working remotely. I have been telecommuting since 2005. I tell these people that it’s not for everyone. Working remotely takes hard work and self-discipline.

When I first transitioned from office life to working remotely, it was brutal. My productivity plummeted; I rarely began real work before noon. You typically don’t have your boss hovering over your shoulder at home. If you want that, feel free to add a monitor on the wall behind you and make sure your boss uses a webcam.

A remote employee must treat the home office like a real office. That means no wandering to your desk in your pajamas, half asleep. Make a concerted effort to dress for work, just as if you’re going into the office. If you have to, take a drive around the block as part of your commute!

If you have family or friends living with you, make sure they know that while you may be physically in the building, you are off limits except in emergencies.

Communication with colleagues can be challenging, and despite the technology available, your organization must develop a culture of dealing with remote employees. At my first company I was one of only two people working remotely, and staying in the loop was like pulling teeth. Pythian, on the other hand, is built with a large portion of its workforce being remote. The company is growing rapidly, and so must its policies. Keeping all employees up-to-date and in the loop is a major focus. Communication lines are kept open using appropriate technologies such as Skype and Google Hangouts, ensuring that team members stay engaged with each other.

Some people are interested in the type of work I do as a MySQL consultant.

This is the conversation I love to have. Most often it comes from someone I met on the internet, and that’s ok. The best thing about working at Pythian is the sheer amount of technologies I get to work with on a daily basis.

As a MySQL Consultant at Pythian, I deal with customers running MySQL 4.1 through 5.6, Percona Server and MariaDB variations, Galera, Tungsten Replicator, and many other solutions to solve some of the internet’s toughest problems. Our clients run datasets from a few gigabytes to multiple terabytes. Our clients run on different operating systems: Solaris, FreeBSD, all flavors of Linux, and even Windows in some cases. Our clients use filesystems ranging from ext2 to xfs and zfs, although we try really hard to persuade them against ext2. We provide consulting for environments running on many of the cloud and hosting providers. We develop proofs of concept, migration and upgrade plans, performance tuning and capacity planning recommendations, high availability and disaster recovery plans, and much more.

Let’s face it: The open source world is ever-changing, and new technologies are always being created to interact with the MySQL database or even NoSQL technologies, such as MongoDB. There are relatively few places to work that can offer exposure and experience to such environments.

Do you have what it takes to be a “Pythianite”?

Pythian is dedicated to developing their employees, also known as Pythianites. Pythian provides ample opportunity for career growth, but this work is not for everyone. It’s very fast paced and at times stressful. There are high expectations for our consultants, and we genuinely have to love your data.

If you think you have what it takes to work at Pythian, check out our current opportunities.

Categories: DBA Blogs

Should AUD$ Table Exist on System Tablespace During DB Upgrade?

Fri, 2014-05-23 07:59

I see the following message on all My Oracle Support notes that talk about Oracle database manual upgrade steps.

4.16 Verify the existence of the AUD$ tables
Ensure that if the aud$ table exists that it is in the SYS schema and in the SYSTEM tablespace.
If the AUD$ table is not in SYSTEM tablespace and not owned by the SYS user then before doing the upgrade put it back to the SYSTEM tablespace and it should be owned by SYS .
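A quick pre-upgrade sanity check along the lines the note asks for (a simple sketch, not taken from the MOS note itself):

```sql
-- Check owner and tablespace of the audit trail table before upgrading
SELECT owner, tablespace_name
FROM dba_segments
WHERE segment_name = 'AUD$';
```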

When I came across the same kind of message even for the latest database version, 12cR1, in MOS note 1503653.1, I decided to check whether this warning is really still necessary. The doubt seems especially valid now that we have the DBMS_AUDIT_MGMT package, which can be used to relocate and purge aud$ data from Oracle RDBMS version 10gR1 onwards.
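For comparison, the DBMS_AUDIT_MGMT package offers a supported way to relocate the standard audit trail; a minimal sketch, assuming the target tablespace is USERS:

```sql
-- Move the standard audit trail (AUD$) out of SYSTEM using DBMS_AUDIT_MGMT
BEGIN
  DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION(
    audit_trail_type           => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    audit_trail_location_value => 'USERS');
END;
/
```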

I created a database named “test” on the source rdbms version and enabled database auditing. After a few test sessions, I could see records in the aud$ table. I then disabled db audit, moved the table and its associated LOB segments, and re-enabled auditing once the activity completed.

SQL> SELECT comp_id, status, SUBSTR(version,1,10) version, comp_name FROM dba_registry ORDER BY 1;

COMP_ID  STATUS  VERSION     COMP_NAME
-------- ------- ----------- -----------------------------------
CATALOG  VALID               Oracle Database Catalog Views
CATPROC  VALID               Oracle Database Packages and Types

SQL> show parameter audit_trail

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
audit_trail                          string      NONE

SQL> alter table sys.aud$ move tablespace users
2 LOB (sqltext) STORE AS lobsegment1 (TABLESPACE users )
3 LOB (sqlbind) STORE AS lobsegment2 (TABLESPACE users );

Table altered.

SQL> select count(1) from aud$;


SQL> select distinct tablespace_name from dba_segments where segment_name in ('AUD$','SYS_IL0000000384C00040$$','SYS_IL0000000384C00041$$');


SQL> ALTER SYSTEM SET audit_trail=db, extended scope=spfile;

System altered.

SQL> alter system set audit_sys_operations=TRUE scope=spfile;

System altered.

I configured the Oracle home of the target version and executed the pre-upgrade script “preupgrd.sql” against the 11gR2 Oracle home. I didn’t notice any error messages related to this table living in a different tablespace. I reviewed the steps required per MOS note 1503653.1, omitting many points, including step 4.16, due to the nature of this database. I never faced any issues with the upgrade, and the aud$ table still existed in the USERS tablespace after the upgrade.

SQL> SELECT comp_id, status, SUBSTR(version,1,10) version, comp_name FROM dba_registry ORDER BY 1;

COMP_ID  STATUS  VERSION     COMP_NAME
-------- ------- ----------- -----------------------------------
CATALOG  VALID               Oracle Database Catalog Views
CATPROC  VALID               Oracle Database Packages and Types
XDB      VALID               Oracle XML Database

SQL> select owner,count(*) from dba_objects where status='INVALID' group by owner;

no rows selected

SQL> select count(1) from sys.aud$;


SQL> select tablespace_name from dba_segments where segment_name='AUD$';


SQL> select owner,count(*) from dba_objects where status='INVALID' group by owner;

no rows selected

SQL> show parameter compatible

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
compatible                           string      12.0.0

SQL> show parameter audit

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
audit_file_dest                      string      /u02/app/oracle/product/12.1.0/dbhome_1/rdbms/audit
audit_sys_operations                 boolean     TRUE
audit_syslog_level                   string
audit_trail                          string      DB, EXTENDED

Checking a few other MOS notes, it seems they really only check that the aud$ table exists in the SYS schema.

FAQ : Database Upgrade And Migration (Doc ID 1352987.1)
Which schema should the AUD$ table exist in while doing the upgrade ?
AUD$ table should exist in SYS schema while doing the upgrade.

So this blog’s question remains valid. For those who keep a bigger aud$ table on a dedicated tablespace for better database performance, this relaxation could mean saving 1-3 hours of production database downtime. Is it time to ask Oracle Support to revise point 4.16 so that it checks only the owner of the aud$ table?

Note: This testing was carried out only from rdbms version 11gR2 to 12cR1. Please test this behavior in your own environment before preparing an action plan for a production upgrade.

Categories: DBA Blogs

Log Buffer #372, A Carnival of the Vanities for DBAs

Fri, 2014-05-23 07:58

The Log Buffer Carnival lets readers see into the minds of database bloggers as they convert their thought processes into blog posts. This Log Buffer Edition is part of that same chain. Enjoy.

Oracle:


Tanel Poder is combining Bloom Filter Offloading and Storage Indexes on Exadata.

Randolf talks about 12c Hybrid Hash Distribution with Skew Detection / Handling – Failing.

Kyle talks about 5 Databases issues that are costing you time and money.

Frits Hoogland unfolds as how Exadata Smart Scan works.

A Framework Approach to Building an Oracle WebCenter Intranet, Extranet, or Portal.

SQL Server:

Microsoft adds forecasting capabilities to Power BI for O365

Capping CPU using Resource Governor – The Concurrency Mathematics

Dell Doubles Application Speeds, Processes Transactions 9X Faster with In-Memory OLTP

Architecture of the Microsoft Analytics Platform System

Introducing the AzureCAT PPI Theater at PASS BA

MySQL:


Dean Ellish blogs about using Go with MariaDB.

MySQL 5.6 has added support for EXPLAIN FORMAT=JSON. The basic use case for that feature is that one can look at the JSON output and see more details about the query plan. More advanced/specific use cases are difficult, though.

Real-Time Data Movement: The Key to Enabling Live Analytics With Hadoop.

MySQL Enterprise Monitor 2.3.17 is now available for download on the My Oracle Support (MOS) web site.

If you run multiple MySQL environments on multiple servers it’s a good habit to set your MySQL prompt to double check which server you are on.

Categories: DBA Blogs

Microsoft Analytics Platform System: Name Overhaul in Big Data War!

Wed, 2014-05-21 08:18

I had the chance to attend a course about what used to be called Parallel Data Warehouse (PDW). PDW was introduced a few years ago with the offering of SQL Server 2008 R2 Parallel Data Warehouse, something very few people could get their hands on. The appliance used to cost a ton of money, had too many parts, and was only offered by HP or Dell in a pre-configured setup.

With SQL Server 2012, Microsoft made many enhancements, slashing the size of the appliance almost in half, and consequently the cost, while improving performance as well.

Just as I was attending the course, Microsoft announced a name change and the introduction of a fancy new brand name: Microsoft Analytics Platform System. CEO Satya Nadella announced the new platform along with other products. I suggest reading the blog post written by Satya himself.

I’m sharing some of my personal (again, personal) opinions here about the platform and the appliance. So let’s take a step back and explore some of the basics here.

What is PDW (aka APS)?

Microsoft loves those three-letter acronyms, although certifications may be an exception. Not to go to great lengths: it’s a Massively Parallel Processing (MPP) “APPLIANCE” that is optimized for large-scale enterprise data warehouses. The appliance is a logical entity that encompasses servers, switches, and storage enclosed in a rack. Think of it as your typical environment of servers, switches, and storage, all brought together in one huge rack appliance. The idea behind the appliance is simple: we do all the dirty work for you and give you something “optimized” to handle huge amounts of data without the hassle of configuration, tuning, and license worries. Of course, the word “optimized” is according to Microsoft’s terms.


The appliance is not only about SQL Server; it also incorporates Hadoop and an engine, PolyBase, to simplify talking to Hadoop using conventional SQL Server T-SQL, and it can also tap HDInsight to reach data stored in the cloud.

The appliance is only available through a few vendors: it used to be only HP and Dell, but the list now also includes Quanta. Please check the resources section for more information about the Microsoft Analytics Platform System.

The following video vividly tries to demonstrate the usage of APS:
Why a PDW/APS appliance?

Customers with huge amounts of data spanning heterogeneous sources always want to get meaningful information out of that data. The more data they have, the harder it is and the longer it takes to extract key information. Appliances are tuned machines with massive resources that help analyze, aggregate, and join data much faster. Conventional SMP machines can work up to a certain level, with much-needed tuning and optimization that may not always succeed. Vendors take on this tuning and optimization responsibility and present you with a sleek machine that is supposed to overcome multiple design and resource limitations. Some examples of existing appliances are Oracle Exadata, Teradata Data Warehouse Appliance, and IBM PureData and Netezza.

Are you saying that a conventional SQL Server setup can’t achieve this? Not entirely. Think of it as car upgrades: models may be based on the same chassis, but high-end models have more powerful engines, more features, and better performance. Although SQL Server keeps bringing enhancements like updatable clustered columnstore indexes and In-Memory OLTP in SQL Server 2014, the PDW/APS appliance differs from conventional SMP in the following areas:

  1. The PDW/APS appliance is a bundle of hardware and software customized to scale out. You can add nodes (and, with much more difficulty, remove them) to scale out to your data needs. Each “node” runs on a separate server with a separate SQL Server instance and its own hardware resources, and is managed by a “control” node that distributes the workload. You can read the APS Solution Brief here.
  2. You can’t buy a PDW SQL Server licence and install it in your environment, and you cannot assemble the product yourself even if you have the blueprint; you just get it from one of the vendors and plug-and-play it.
  3. PolyBase plugs in Hadoop. You may be able to connect an existing SQL Server with Hadoop, but PolyBase provides easy-to-use T-SQL functions to extract data from Hadoop, giving you an almost immediate ability to query Hadoop without a long learning curve.
  4. Many conventional SQL Server features are suppressed. Yes, I can see your raised eyebrows, but the idea is that Microsoft wanted to remove areas that can introduce bottlenecks, such as CLR, extended DLLs, and even SQL Server Agent. If you need to do something that can’t be done inside PDW, such as scheduled jobs, then move it to another tier.
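To give a feel for the PolyBase idea, here is a hedged sketch of querying Hadoop data with plain T-SQL. The exact DDL varies between PolyBase versions, and every object name, location, and column here is illustrative, not taken from the appliance:

```sql
-- Define a Hadoop data source and a file format (illustrative names/locations)
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://headnode:8020');

CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- Expose a Hadoop directory of delimited files as a table
CREATE EXTERNAL TABLE dbo.WebClicks (
    UserId    INT,
    Url       NVARCHAR(400),
    ClickTime DATETIME2
)
WITH (LOCATION = '/clickstream/',
      DATA_SOURCE = HadoopCluster,
      FILE_FORMAT = TextFileFormat);

-- Join Hadoop data with a regular SQL Server table using plain T-SQL
SELECT c.CustomerName, COUNT(*) AS Clicks
FROM dbo.Customers AS c
JOIN dbo.WebClicks AS w ON w.UserId = c.CustomerId
GROUP BY c.CustomerName;
```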

Where does PDW/APS fit?

The cost of buying and running the appliance suggests that it’s not for everyone. The available resources, and the bundling of PolyBase to connect to Hadoop, show that it’s aimed at enterprises with huge, heterogeneous amounts of data spread around. Bringing this data together with the least customization is the goal of the appliance.

PDW/APS can help bring data together from following areas:

  • Social apps
  • Mobile
  • Sensors
  • Web
  • Other data sources such as RDBMS

The appliance fits a segment of industries, notably:

  • Retail
  • Healthcare
  • Financials
  • Manufacturing
  • Social media

Microsoft has some case studies about clients deploying PDW and realizing up to 100x performance gains. Here are the case studies:

  1. MEC -Media Firm Uncovers the Value of Digital Media with Parallel Data Warehouse
  2. The Royal Bank of Scotland – Leading UK Bank Gains Rapid Insight into Economic Trends with Analytics Platform System
  3. Progressive Insurance – Progressive Data Performance Grows by Factor of Four, Fueling Business Growth Online Experience

I believe Microsoft has to do more to get the appliance to more customers, and I think the brand rename is part of this push. Training, support, and deployment materials are also needed, since there are not many resources online.
Is PDW worth it for clients?

Microsoft competes against Teradata, Netezza, and Oracle Exadata. Cost is a very big factor: licence and support. MS tends to do fairly well with MS-shop customers and SQL Server-based clients. The first version of PDW, however, tended to be expensive and bulky, while the current appliance is almost half the price and half the size it used to be. Expertise still seems scarce, and this is what MS is working on.

Microsoft’s pitch is that instead of investing too much, for too long, in creating the same technology by trial and error and spending much time on tuning, here is a working appliance they have already tuned for you, so you can just focus on your business. Per the following chart, MS claims the cheapest price per TB compared to other vendors.


Regardless, the appliance makes the most sense for SQL Server clients; however, I still see clients sticking to solutions from the vendor of the main RDBMS technology they run, since importing data will be easier and the learning curve less steep.

The appliance will make a case for mid-to-large enterprises adding terabytes of new data each month, including unstructured data. SMP SQL and APS may overlap in the region of a few to tens of terabytes, but once we talk about hundreds of terabytes, including unstructured data, APS starts to make sense.
PDW Against the Cloud?

There are a few reasons clients may opt for an on-premise appliance, including:

  1. Some data is too sensitive to trust to the cloud.
  2. The amount of data is too huge to upload to the cloud.
  3. The cloud is not yet mature enough to support all features.

Most cloud vendors are not yet ready to offer such an expensive appliance. Amazon, however, has something similar called Redshift. They actually make a case against on-premise solutions, saying you don’t need those bulky, expensive appliances that can break when you can use Amazon’s cloud solution and pay “less”. However, there are a few counterpoints:

  1. The MS appliance tries to play in the lower cost segment.
  2. You don’t need lots of DBAs to manage it. In fact, I was surprised that MS took out most of the parts that can introduce problems with the appliance: no CLR, no extended modules, and many other features disabled. They applied best practices and locked the appliance down so you cannot tamper with it. I was told that the only thing that can cause APS to fail is a hardware failure, and there is already redundancy, so a failure is even less probable.
  3. Not everyone wants their data in the cloud, mostly for security reasons. I was told about a customer who wanted to destroy the hard drives with a hammer after testing the appliance. It took a few weeks to zero-write the drives, a few times over.
  4. Transferring a lot of data to a public cloud is not that fast, unless you are already hosting your data in the same location, such as on Amazon.

APS vs. Hadoop

Hadoop excels at non-structured data such as text, sensor data, web crawling, etc. Whether you already have Hadoop clusters running or plan to, you may still have valuable relational data stored in your existing SQL Server instances. APS makes it easy to bridge the gap between the two and use T-SQL to join data from both sources without worrying much about the design of the Hadoop cluster.

APS region share

The USA leads in adoption, followed by the EU and then Asia.
APS and SQL 2014

The appliance still runs a customized version of SQL Server 2012; there is no word yet on when SQL Server 2014 will be introduced. Upgrading the cluster is supported, but it is not yet something the end customer can do.
PDW support

Many scenarios, like failing back and downsizing, are still supported only by Microsoft Customer Service and Support (CSS), but they are trying to automate some of these tasks so that experienced DBAs can perform them.
DBA role with APS

Don’t hold your breath! As outlined before, much of the configuration and optimization is done for you in what MS believes is the best balance. You cannot do many of the tasks a DBA usually does, like changing tempdb configuration or the max degree of parallelism. That’s the idea behind the appliance, in fact: focus on bringing the data to the appliance and modifying your design to extract information.

When I worked with the appliance, I didn’t find it particularly complex from an operations point of view. Most of the work is about bringing data in and figuring out the best way to join data together without shuffling it around, and that is about understanding the data and the business logic.

You can help in the following areas though:

  • Exporting/importing data from existing data sources into APS
  • Advising on changes to the design and business logic to comply with appliance requirements
  • Advising on how to bring data together for analysis
  • Designing and implementing data extraction practices
  • Troubleshooting analysis routines and schedules
  • Troubleshooting long-running queries and schedules. APS has a web portal where you can see all running processes; you can also use DMVs to get this information, some of which are unique to APS.
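As an example of the APS-specific DMVs mentioned above, here is a hedged sketch using sys.dm_pdw_exec_requests, the PDW/APS view of submitted requests; verify column availability on your appliance version:

```sql
-- List currently running requests on the appliance, longest-running first
SELECT request_id, status, submit_time, total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
WHERE status = 'Running'
ORDER BY total_elapsed_time DESC;
```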


The APS appliance targets customers with huge amounts of data spanning heterogeneous sources who need a ready-to-plug solution.

With the rebranding of the PDW appliance, Microsoft is getting more serious about big data and analytics. However, many vendors are already well-established here, and it’s still a long run in a market that’s expected to explode, if it hasn’t started already.

Happy reading!

Categories: DBA Blogs

Everyone Should Have Firefighter Mentality

Wed, 2014-05-21 07:43

Australia, being the most arid country in the world, is known for its bushfires. The part of Australia where I live doesn’t have many bushfires, but we do have several building fires, like any other part of the world.

Firefighting is one occupation many kids want to pursue when they grow up. Kids are inspired by the shiny, clean, smart outward appearance of the firefighter crews passing majestically, sirens ringing, in their immaculately clean trucks.

While volunteering at one of the fire stations in my suburb on a quiet day, I found the firefighters doing nothing but polishing their trucks, cleaning their equipment, making sure everything was in optimal order, and waiting on their toes for an emergency.

No matter what field you are in or what profession you follow; no matter which step of the corporate ladder you are on; whether you are a full-time employee, a contractor, or a business owner, it is a fact that there are ample quiet times: small chunks of hushed, still periods during each workday.

Those still moments are the “firefighter” time. Don’t let that time go to waste. Clean your desk and your tools. Polish your skills, and think about yourself and your career. Identify your areas for improvement, and grow your network. Read, write, or help out others.

In other words, get ready for that fire.

Categories: DBA Blogs

Log Buffer #371, A Carnival of the Vanities for DBAs

Fri, 2014-05-16 07:56

Fueled by the massive growth of data and propelled by the mammoth future potential of its applications, bloggers across the database technologies are finding new things to explore in the light of previous knowledge. This Log Buffer Edition covers it all.

Oracle:


To configure an instrumentation watch, you first need to know what instrumentation is, and how to instrument applications or servers.

Why Choose to Deploy Agile PLM in the Cloud?

One of the things that makes JSF different from other approaches to server-side Java web development is that it offers a much higher level of abstraction and transparently maintains UI state.

Step by step instructions for setting up a development environment for doing development with Hippo CMS.

Oracle Solaris 11.2 at the Atlanta OpenStack Summit

SQL Server:

RAID and Its Impact on your SQL Performance.

Microsoft Azure Diagnostics Part 1: Introduction

Using Encrypted Backups in SQL Server 2014

A new plug-in for Management Studio from Red Gate is free. It will give you access to all the scripts at SQLServerCentral, including your own briefcase.

Validate File Properties with PowerShell Prior to Import into SQL Server

MySQL:


Benchmark: SimpleHTTPServer vs pyclustercheck (twisted implementation)

Archival and Analytics – Importing MySQL data into Hadoop Cluster using Sqoop

Cross your Fingers for Tech14, see you at OSCON

New Tungsten Replicator 2.2.1 now available

MySQL May Newsletter is Available!

Categories: DBA Blogs

Oracle Passwords: How to Use Punctuation Symbols

Wed, 2014-05-14 07:04

You can’t make a password strong enough, but at least you can try. With at least one upper case character, one lower case character, one number, one punctuation mark or symbol, and more than 8 characters, you have a password that can be considered decently safe. Probably.

In Oracle, if you embed punctuation marks within a password, you have to use single quotes around that password when using the orapwd utility from the command line. If you are altering the password from SQL*Plus, you need to use double quotes around the password.

Examples of both are given below, changing the sys password:

From orapwd:

$ORACLE_HOME/dbs $ orapwd file=orapwtest password="My1s!pass" entries=40 force=y
-bash: !pass": event not found

$ORACLE_HOME/dbs $ orapwd file=orapwtest password='My1s!pass' entries=40 force=y

Changed successfully.

From sqlplus:

SQL> alter user sys identified by 'My1s!pass';
alter user sys identified by 'My1s!pass
ERROR at line 1:
ORA-00988: missing or invalid password(s)

SQL> alter user sys identified by "My1s!pass";

User altered.
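If you script such password changes, the quoting can be automated rather than remembered. A minimal Python sketch (the orapwd invocation simply mirrors the example above):

```python
import shlex

password = "My1s!pass"

# shlex.quote wraps the value in single quotes (escaping any embedded
# single quotes), so bash passes it through literally: no history
# expansion of "!" and no word splitting.
cmd = "orapwd file=orapwtest password={} entries=40 force=y".format(shlex.quote(password))
print(cmd)  # orapwd file=orapwtest password='My1s!pass' entries=40 force=y
```

Better still, skip the shell entirely and pass the arguments as a list to subprocess, so no quoting layer is involved at all.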


Categories: DBA Blogs

The Hadoop and the Hare

Mon, 2014-05-12 07:38

I speak to a lot of people who are terribly concerned with “real-time”. Can data get into the warehouse in real-time? Can we record everything the user does on the site in real-time? Real-time is a magic phrase, because it really means “so fast that nobody cares”. In a business sense, it usually means that nobody has really thought about the latency requirements, so they’re going to push for the lowest latency possible. This is a dangerous habit, and you need to fight against it at every turn. Some of the best, cheapest, most scalable solutions will only be available when you accept that you can wait for an hour (or even hours!) to see some results.

Case in point, a client asking for a NoSQL store to hold user event data. It would have to handle millions of concurrent users, all generating dozens of events within minutes. They were (rightly) concerned about write performance and scaling this NoSQL store horizontally to handle the volume of writes. When asked why they needed the events, they didn’t know: they wanted to do exploratory analytics and see if there were interesting metrics to be mined. I proposed dumping the events on a queue – Kafka, Kinesis – or just writing logs for Flume to pick up. Rather than asking a data store to make all of these messages available for querying immediately, do things asynchronously, run some batch jobs to process data in Hadoop, and see if interesting data comes out. The Hadoop cluster and the ingestion pipeline is an order of magnitude less expensive and risky than the cool, real-time solution which confers no immediate benefits.
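The shape of that proposal – append now, analyze later – can be sketched in a few lines. This is only an illustration: a local file stands in for Kafka, Kinesis, or a Flume-tailed log, and the batch job stands in for a Hadoop job.

```python
import json
from collections import Counter
from pathlib import Path

LOG = Path("events.log")     # stand-in for a Kafka topic or a Flume-tailed log
LOG.unlink(missing_ok=True)  # start fresh for this demo

def emit(event: dict) -> None:
    """Producer side: an append is cheap and never blocks on a query store."""
    with LOG.open("a") as f:
        f.write(json.dumps(event) + "\n")

def batch_job() -> Counter:
    """Consumer side: a periodic batch pass over the log (what Hadoop does at scale)."""
    counts = Counter()
    with LOG.open() as f:
        for line in f:
            counts[json.loads(line)["type"]] += 1
    return counts

emit({"user": "u1", "type": "click"})
emit({"user": "u2", "type": "click"})
emit({"user": "u1", "type": "purchase"})
print(batch_job())  # Counter({'click': 2, 'purchase': 1})
```

The point of the sketch is the decoupling: producers never wait on the analytical store, and the batch side can be rerun, changed, or thrown away cheaply.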

Maybe this client will decide that batch analytics are good enough, and they’ll stick with using Hive or Impala for querying. Maybe they discover a useful metric, or a machine learning technique they want to feed back into their user-facing application. Many specific metrics can be stored as simple counters per user, in a store like Redis, once they’ve been identified as valuable. Machine learning techniques are interesting because (besides cold start situations) they should be stable: the output shouldn’t change dramatically for small amounts of new data, so new information only needs to be considered periodically. In both cases, the requirements should be narrowed down as far as possible before deciding to invest in “real-time”.

In another case, a client presented the classic data warehousing problem: store large volumes of events at a fine grain, and slice and dice them in every imaginable way. Once again, the concept of reducing latency came up: how quickly can we get events into the warehouse so people can start querying them? The answer is that we can make anything happen, but it will be needlessly expensive and perform poorly. The main goal of the project was historical reporting; there was no indication that clients would want to do the same kind of pivots and filtering on data only from the past minute. The low-latency application would be cool and interesting to develop, but it would be much simpler to find which reports users want to be low-latency, then precompute them and store them in a more appropriate way.

A more appropriate way is Dynamo, Cassandra, or your preferred NoSQL key-value store. Precompute the aggregates you know the user wants with very low latency, and store them keyed on time with a very fine grain: you get the benefit of high write throughput, at the cost of very limited query complexity. Once the data is no longer interesting – it’s visible in the warehouse with much more grain along the various dimensions – drop it from NoSQL.
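That write pattern can be sketched with a plain dict standing in for the key-value store; the metric name and key scheme here are purely illustrative:

```python
from collections import defaultdict
from datetime import datetime, timezone

store = defaultdict(int)  # stand-in for Dynamo/Cassandra: key -> counter

def record_event(user_id: str, metric: str, ts: datetime) -> None:
    """Increment precomputed per-minute aggregates at write time.

    Writes stay cheap (one increment per key), but you can only query
    the aggregates you chose to precompute.
    """
    minute = ts.strftime("%Y-%m-%dT%H:%M")
    store[(metric, minute)] += 1           # overall count per minute
    store[(metric, user_id, minute)] += 1  # per-user count per minute

ts = datetime(2014, 5, 12, 7, 38, 15, tzinfo=timezone.utc)
record_event("u1", "page_view", ts)
record_event("u2", "page_view", ts)
print(store[("page_view", "2014-05-12T07:38")])  # 2
```

Expiring old keys once the warehouse has caught up completes the picture.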

Starting with a relatively slow batch platform gives very high flexibility at low cost and with little development effort. Once your users – internal or clients – have explored the data set and started complaining about latency, then is the time to build specialized pipelines for those specific use cases. Since Kafka, Kinesis, and Flume all support streaming computation as well as batch, they will serve as good connection points for your “real-time” pipeline.

Categories: DBA Blogs

Log Buffer #370, A Carnival of the Vanities for DBAs

Fri, 2014-05-09 07:51

With promising clouds hovering over the skies of database technology, this Log Buffer Edition gathers some selective droplets for Oracle, SQL Server and MySQL.

ACM: organizing the chaos and flexing rigid process structures through adaptive case management by Lucas Jellema.

How to Build GlassFish 4 from Source.

Make Your Oracle Database Run Faster on Oracle Linux.

How to: Run Repository Creation Utility (RCU) on Solaris.

A Customer Centric Approach To Supply Chain: Supply Chain Journey Mapping (SCJM).

SQL Server:

Stairway to PowerPivot and DAX – Level 1: Getting Started with PowerPivot and DAX.

Free eBook: SQL Server Backup and Restore.

12 SQL Server XML Questions You Were Too Shy To Ask.

Context Is Everything – Check Your Settings.

How column COLLATION can affect SQL Server query performance.


MySQL:

Many Sphinx users utilize MySQL as a data source. Divendo is one of them.

Another reason for Xtrabackup error “log block numbers mismatch”.

MySQL Audit Plugin now available in Percona Server 5.5 and 5.6.

New Release Webinar on May 13th: Introducing ClusterControl 1.2.6 – Live Demo.

The Perfect Server – Ubuntu 14.04 (nginx, BIND, MySQL, PHP, Postfix, Dovecot and ISPConfig 3).

Categories: DBA Blogs

Concurrent Processing Issues with Distributed Transactions Glitch

Thu, 2014-05-08 10:58

This is my first post on the Pythian blog, and I wanted to share an interesting use case we dealt with recently. Someone may well see a similar case in the future, so here are the troubleshooting steps to get to the source of the problem and apply a workaround. The root problem was that concurrent processing wouldn’t start up, and the Internal Manager log file reported an error (the same error was also thrown by Oracle Forms when trying to access the Concurrent -> Administer form):

CP-GENERIC ORACLE ERROR (ERROR=ORA-01591: lock held by in-dou) (ROUTINE=get_lk_handle)

The ORA-01591 error clearly identifies a lock held by an in-doubt distributed transaction; however, the DBA views DBA_2PC_PENDING and DBA_2PC_NEIGHBORS return no rows (and neither do sys.pending_trans$ and sys.pending_sessions$). If any of these views returned rows, the transactions could most probably be committed, rolled back, or purged, and the problem would be resolved. That wasn’t the case here, so this must be some kind of glitch.

The first document to review is Master Note for Troubleshooting Oracle Managed Distributed Transactions (Doc ID 100664.1), but it gives no valuable information, and neither does the much more detailed Manually Resolving In-Doubt Transactions: Different Scenarios (Doc ID 126069.1). The next step is to find out where ORA-01591 is coming from, and here a database trace is a very useful tool. I was lucky this time, as the error is thrown in Forms, so getting a trace was two clicks away. If it’s not so easy in your case, the SERV_MOD_ACT_TRACE_ENABLE procedure can be used to get a trace. In my case it showed:

PARSING IN CURSOR #139830878745608 len=66 dep=1 uid=0 oct=3 lid=0 tim=1398844275853403 hv=1493897133 ad='1c121bb90' sqlid='19x1189chq3xd'
PARSE #139830878745608:c=0,e=137,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1398844275853402
BINDS #139830878745608:
oacdty=01 mxl=128(128) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1206001 frm=01 csi=31 siz=128 off=0
kxsbbbfp=7f2ce9db2d80 bln=128 avl=11 flg=05
ERROR #139830878745608:err=1591 tim=1398844275855101

Next, trying to execute the same SQL manually in sqlplus I got the following error:

ERROR at line 1:
ORA-01591: lock held by in-doubt distributed transaction 35.15.13775

So now I have some details on the ORA-01591 error: I know which transaction is holding the lock, but it’s not displayed in any of the DBA_2PC_ views. Trying to purge, commit, or roll back this transaction also ends in an error, as there is no full data about it:

ERROR at line 1:
ORA-06510: PL/SQL: unhandled user-defined exception
ORA-06512: at "SYS.DBMS_TRANSACTION", line 96
ORA-06512: at line 1
SQL> COMMIT FORCE '35.15.13775';
COMMIT FORCE '35.15.13775'
ERROR at line 1:
ORA-02058: no prepared transaction found with ID 35.15.13775
SQL> rollback force '35.15.13775';
rollback force '35.15.13775'
ERROR at line 1:
ORA-02058: no prepared transaction found with ID 35.15.13775

Digging deeper, the MoS note DBMS_LOGSTDBY.BUILD Seems to Hang And Does Not Return To SQL prompt. (Doc ID 747495.1) got my attention. A standby database is not used in this configuration, but the x$ktuxe internal view contains information about transactions, and a query against it soon gave some results:

(query output trimmed; x$ktuxe showed a row for transaction 35.15.13775)

This is the exact transaction we identified earlier with the trace (ORA-01591: lock held by in-doubt distributed transaction 35.15.13775). Once the transaction is identified there is a way to get rid of it, and there are already a few options:

Using the same technique, I inserted dummy data into the pending_trans$ and pending_sessions$ tables. Then purge_lost_db_entry solved the case, and the dummy entry was removed, along with the references in x$ktuxe (see the examples in the sources mentioned above). According to the oracle-L comments, this is an Oracle Support provided workaround for a particular SR case, so most probably it is not a supported way to deal with such situations.

In my case, we didn’t know what created the half-purged transaction, and this test system was needed ASAP for training, so this method really saved the day.

Categories: DBA Blogs

12c Multi-tenant Environment and DBRM for Container Database

Wed, 2014-05-07 09:15

In a multi-tenant environment, Database Resource Manager (DBRM) at the container database (CDB) level enables us to manage resources such as CPU usage and the number of parallel execution servers among the pluggable databases (PDBs). Within each PDB, DBRM then enables us to manage CPU and parallelism, and to rein in runaway queries that exceed thresholds on CPU, physical I/O, logical I/O, or elapsed time.

In the 12c version of the Oracle database, a CDB plan is implemented at the CDB level. This CDB plan allocates resources to the PDBs. A CDB plan is made up of directives, with each directive aimed at a single PDB. The directive controls the allocation of CPU and parallel execution servers. The default CDB plan is DEFAULT_CDB_PLAN. A CDB plan deals in share values and utilization limits.

Shares work like weights: the more shares a PDB has, the more resources it enjoys. The utilization limit, in turn, caps resource allocation for a PDB. By default, the share value for each PDB is 1 and the utilization limit is 100. The utilization limit restrains the system resource usage of a specific PDB, while the parallel_server_limit directive attribute and the PARALLEL_SERVERS_TARGET parameter are used to limit parallel servers.
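As a back-of-the-envelope illustration of how shares and utilization limits interact – a deliberately simplified model, not Oracle’s actual scheduler:

```python
def cpu_entitlement(shares: dict, limits: dict) -> dict:
    """Approximate each PDB's guaranteed CPU percentage under a CDB plan.

    A PDB's guaranteed fraction is shares / total shares, further capped
    by its utilization limit (a percentage). Simplified model only.
    """
    total = sum(shares.values())
    return {
        pdb: min(s / total * 100, limits.get(pdb, 100.0))
        for pdb, s in shares.items()
    }

# PROD: 2 shares, 100% limit; DEV: 1 share, 50% limit (as in the example below).
result = cpu_entitlement({"PROD": 2, "DEV": 1}, {"PROD": 100.0, "DEV": 50.0})
print(result)  # PROD gets ~66.7%, DEV ~33.3% (DEV's 50% cap is not binding here)
```

If DEV were idle, PROD could use up to its utilization limit; the shares only matter under contention.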

Example of a CDB Plan:

The following example enforces a CDB plan 'cdb_plan' for two databases, PROD and DEV. PROD has 2 shares and a 100% utilization limit, whereas DEV has half the shares, i.e. 1, and a 50% utilization limit. The pending area is just a staging area in which plans are created, edited, and validated.


plan    => ‘cdb_plan’,
comment => ‘CDB plan for CDB’);

plan                  => ‘cdb_plan’,
pluggable_database    => ‘Prod’,
shares                => 2,
utilization_limit     => 100,
parallel_server_limit => 100);

plan                  => ‘cdb_plan’,
pluggable_database    => ‘Dev’,
shares                => 1,
utilization_limit     => 50,
parallel_server_limit => 50);



You may use the DBA_CDB_RSRC_PLANS view to display all of the CDB resource plans defined in the CDB.

Categories: DBA Blogs

Dream DBA

Mon, 2014-05-05 08:01

When I was 6, I dreamed about dinosaurs – the bigger ones, like the T-Rex. I loved them. By the time I was 16, I dreamed about the girl next door, and everything else was simply invisible. When I was 26, I was already a DBA of a few years, going through the final stretch of being a rookie, and I dreamed about becoming a true DBA – a night dream and a day dream, a dream I am still living day in and day out. Now that I have crossed 36, the journey of dreams goes on.

I believe that if you want to be a champion DBA, you need to be a dreamer. What databases are today, and what they will be tomorrow, will only be shaped by the database dreamers. Without these dreamers, databases would be just a bland labor of bits and bytes. Dreamers live in an ideal world where they strive to create and nurture the things around them, so a dreamer’s database is the stainless vision of its lofty ideals.

A dreamer DBA cherishes his visions. A dreamer DBA hears only the music that stirs the chords of the heart. A dreamer DBA adopts only what is most beautiful for his database. He drapes the configuration of his darling database with the purest of parameters, for out of them grow all delightful conditions and a heavenly optimized environment; and when he remains true to it, an ideal database system gets created.

So dream big. Dream ideally, because a database dream is the prophecy which will dawn one day, and dawn it will.

Everything is a dream initially. A rose is at first a dream in a seed. A Rainbow Lorikeet sits like a dream in an egg for some time. At the start, every database is a bunch of files and processes. Let your dreams flourish and evolve positively, and channel them into great implementations. From dreams originate realities.

If you are a DBA mired in crisis-hit databases, you only need to worry if you don’t dream of improvement and resolution. If, in difficult circumstances, you are dreaming about refinements, solutions, and elegance, only then will you achieve them. To aspire is to achieve. To dream is to get. Dreams are a kind of thought: keep them positive and hopeful, your acts will align accordingly, and you will be the victor; a true champion DBA.

If you are looking at today’s big names in databases and wondering in awe, that’s alright. But don’t attribute their success to luck, fortune, or chance. Don’t say how lucky those cool DBA kids are, because they are not. They dreamed big, then followed their dreams, and the dreams enabled them to do great things. They had their share of struggles and failures, but they persisted. They put in the effort and got the results. They were able to do that because they didn’t want their dreams to crash.

You will become what your dreams are.

Categories: DBA Blogs

Log Buffer #369, A Carnival of the Vanities for DBAs

Fri, 2014-05-02 08:23

The Log Buffer carries on with the sparkling blog posts gathered from all across Oracle, SQL Server, and MySQL.

The VISIBILITY column in USER_INDEXES is used to check that each change has worked.

Oracle Utilities Application Framework V4. has been released with Oracle Utilities Customer Care and Billing V2.

There has been a good deal of activity in the Enhancement Request (ER) Community which is being piloted in Procurement.

A personalized customer experience is already beginning to overtake price and product as the key brand differentiator.

A common request is to produce query zones which are aligned with the base product zones.

SQL Server:

How to centralize your SQL Server Event Logs.

Use Extended Events to Get More Information About failed SQL Server Login Attempts.

DAX: Topn is not a Filter

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.

Stairway to SQL PowerShell Level 6: PowerShell Modules.


MySQL:

Name Change: MySQL Connect Becomes “MySQL Central @ OpenWorld”.

How to identify and cure MySQL replication slave lag.

MySQL, the original brand, the one developed by the MySQL team at Oracle, is steadily evolving.

Labor Day: Let’s upgrade a Galera Cluster.

Upgrading temporal columns from MySQL 5.5 to MySQL 5.6 format.

Categories: DBA Blogs

What to Consider When Selecting a Data Service Provider

Fri, 2014-05-02 07:41

Today’s blog post completes our series dedicated to data infrastructure outsourcing, with excerpts from our latest white paper. In the first two posts, we discussed the growing trend toward data outsourcing, and conducted a high level analysis of the strengths and weaknesses of the three sourcing options.

To protect the wellness of an organization’s data infrastructure, there are several criteria that should be evaluated. One critical element is the quality of talent.

The more knowledge and skills team members can bring to the table, the more they can add value to any project they’re involved with. Both breadth and depth of experience are critical to ensure they have a solid understanding of the client’s environment and know how to get the most value out of it right from the start.

Top DBAs and infrastructure experts can command high fees. However, the higher short-term cost associated with investing in quality is more than made up for by the productivity and efficiency gains they deliver. For example, the more experience team members have with systems similar to a new client’s, the faster they can identify processes that can be easily automated, resulting in immediate cost savings and freeing them up to focus on more strategic activities.

To access the full list of criteria, download the rest of our white paper, Data Infrastructure Outsourcing.

Read the first blog post, The Growing Trend Toward Data Infrastructure Outsourcing.

Read the second blog post, Developing a Data Infrastructure Sourcing Strategy.

Categories: DBA Blogs

Watch: Advanced Hadoop Security Features

Wed, 2014-04-30 07:41

As a follow up to a previous video, Pythian CTO Alex Gorbachev gives an overview of the advanced security features within Hadoop.

Categories: DBA Blogs

Watch: Hadoop in the Cloud

Tue, 2014-04-29 08:01

The Pythian team has received many questions about big data in the cloud, and specifically about Hadoop. Pythian CTO, Alex Gorbachev shares some of his recommendations in our latest video:

Categories: DBA Blogs

Watch: In-Memory Option for Oracle 12c

Mon, 2014-04-28 08:04

Back in September at Oracle OpenWorld 2013, Larry Ellison announced the Oracle Database In-Memory Option to Oracle 12c. Today, one of Pythian’s Advanced Technology Consultants, Christo Kutrovsky shares his thoughts on the new feature in our latest video. Stay tuned for updates while it’s being tested out in the field.

Categories: DBA Blogs