Skip navigation.

Pythian Group

Syndicate content
Official Pythian Blog - Love Your Data
Updated: 4 hours 58 min ago

Pillars of Powershell #2: Commanding

Tue, 2015-03-31 07:28

This is the second blog post as a continuance in the series on the Pillars of PowerShell. In the initial blog post we went over the various interfaces that can be used to work with PowerShell. In this blog post we are going to start out by going through a few terms you might find when you start reading up on PowerShell. Otherwise, I will go over three of the cmdlets you will find can be used to discover and get documentation on the cmdlets available to you in PowerShell.

Pillar 2: Commanding

The following are a few terms I will use throughout this series, and ones you might find referenced in any reading material, which I wanted to introduce so we start out on the same page:

  • Session
    When you open PowerShell.exe or PowerShell_ISE.exe it will create a session for you, essentially a blank slate for you to build and create. You can think of this as a query window within the SQL Server Management Studio.
  • Cmdlets
    Pronounced “command-lets”, these are the bread and butter of PowerShell that allow you to do things from getting information or manipulating it. Microsoft has coined the format of Verb-Noun for the cmdlets and it has pretty much stuck. Each version of PowerShell as it is released comes with additional cmdlets, and then product teams like SQL Server and Active Directory are also releasing cmdlets to allow you to interact with them through PowerShell. Each time you open a session with PowerShell there are a set of core cmdlets that are automatically loaded for you.
  • Module
    A module is basically just a set of cmdlets that can be added within your session of PowerShell. When you load a module into your session, the commands are then made available to you. If you close that session and open a new one, you will then have to reload that module to access the commands again.
  • Objects
    PowerShell is based on .NET, this is what is behind the scenes more or less, thus with .NET being an object oriented language PowerShell treats the data that is returned as objects. So, if I use a cmdlet to return the processes running on a machine, each process that it returns is as an object.
  • The Pipeline
    This is named after the symbol used to connect cmdlets together, “|” (vertical bar on your keyboard). You can think of this like a train, each train car represents a set of objects and each car you pass it through can do something to each object until you reach the end.
  • PowerShell Profile
    This is basically the ability to customize your PowerShell session each time you open or start PowerShell. You can do things such as pre-load modules, create custom bits of PowerShell code for reuse or easy access, and many other things. I would compare this to your profile in Windows that keeps up with things like the icons or applications you have pinned to the taskbar or the default browser.

Now I want to take you through a few core cmdlets that are used most commonly to discover what is available in the current version of PowerShell or the module you might be working with in it. I tend to use these commands almost every time I open PowerShell. I do not try to memorize everything, especially when I can look it up so quickly in PowerShell.


This cmdlet does exactly what you think it does, gets a list of commands that are available in your current PowerShell session. You can use the parameters of this cmdlet to filter the list down to what interest you, say all the “get” cmdlets:

Get-Command -Verb Get -CommandType Cmdlet
p2_get-command-verb-get Get-Help

Now you might be wondering where the documentation is for all of the cmdlets you saw using Get-Command? Where is the Books Online equivalent to what you get with SQL Server? Well unlike SQL Server you can actually get the documentation via the cmdlet Get-Help. This cmdlet can return the information to you or you can use a parameter to open it up in the browser, if it is available. So for example one of the best things to look up documentation on initially is the Get-Help cmdlet itself:


The output of this command is good to read through but the main items I want to pull out are three particular parameters:

  1. Online: This will take you to the TechNet page of the documentation for the cmdlet. This may not work with every cmdlet you come across but if Microsoft owns it there should be something.
  2. Examples: This is going to provide a few examples and descriptions of how you can use the cmdlet and the more common parameters.
  3. Full: This will show you pretty much what document is online. This just keeps you in PowerShell instead of view it in the browser.

So let me try bringing up the examples of the Get-Help cmdlet itself:

Get-Help Get-Help -Examples

If you are using Windows 7 OS or higher, you may receive something similar to this screenshot. This is something that was added in PowerShell version 3.0, the cmdlet Update-Help. This cmdlet is used to actually update the help files on a computer as needed. In the event Microsoft updates the help files, or the online TechNet page, you can use this to download a current version of it. Microsoft has moved to this method in place of trying to do the updates locally with cumulative updates or service packs. It does require Internet access to execute the cmdlet. If your machine is not on the Internet you can download them from Microsoft’s download center. In order to fix the above message I just need to issue the command: Update-Help.

You should see a progress bar as shown below while it is running through updating all the files (which that progress bar is actually done using PowerShell):


I ended up getting two errors for certain modules and this is because I am not running the cmdlet with elevated privileges. If you open PowerShell.exe with the “Run As Administrator” option and then execute the cmdlet again it will be able to update all help files without error.

Now if you run the previous command again you should see the actual examples, although if you notice it can be annoying to try and scroll back up to read all that information. A tidbit I did not know about right away was that there is a parameter in Get-Help that opens up a separate window, which makes it easier to read called, “-ShowWindow“. It is basically the “-Full” output but with the option to filter out sections that do not interest you.

Get-Help Get-Help -ShowWindow

You actually can use Get-Help to search for cmdlets as well. I tend to do this more than trying to use Get-Command just because it is a bit quicker. You can just issue something like this to find all the Get” cmdlets:

Get-Help get*

One more thing about the help system in PowerShell, it also includes things called “about files” that are basically concept topics that go deeper into certain areas. They offer a wealth of information and you can also get to these online. Something for you to try on your own to see what is available is just issue this command:

Get-Help about*

This cmdlet is a little gem that you will use more than anything. If you pipe any cmdlet (or one-liner) to Get-Member it will provide you a list of the properties or methods available to you for the object(s) passed. This cmdlet also includes a filter of “-MemberType” that I can use to only return the properties available to me. The properties are those that we can “select” to return as output or pass to other cmdlets down the pipe.

Get-Command | Get-Member -MemberType Property

I am only going to touch on this cmdlet. It can be used to output objects into a table like view, that also offers some filtering attributes too. There are a few different Out-* cmdlets that are available to you for outputting information to various destinations. You can find these using the Get-Command or Get-Help cmdlets. To use Out-GridView on a Windows Server OS you will have to add the PowerShell ISE feature. You will get an error stating as much if you do not.

Get-Command | Out-GridView

get-comand-get-member-out-gridview Summary

The three cmdlets Get-CommandGet-Help, and Get-Member that I spoke on above are ones I think you should become very familiar with and explore them deeply. Once you master using these it will provide you the ability to find out anything and everything about a cmdlet or module that you are trying to use. When you start working with various modules such as Azure or SQL Server PowerShell (SQLPS) these cmdlets are quite useful in discovering what is available.

Categories: DBA Blogs

Pillars of PowerShell #1: Interacting

Tue, 2015-03-31 06:55

PowerShell is a tool that if adopted can be used to help automate and standardize processes in your Windows and SQL Server environment (among other things). This blog series is intended to show you some of the basics (not all of them) that will get you up and running with PowerShell. I say not all of them, because there are areas in PowerShell that you can go pretty deep in, just like SQL Server. I want to just give you the initial tools to get you on your way to discovering the awesomeness within PowerShell. I decided to go with a Greek theme, and just break this series up into pillars. In this first blog post I just wanted to show you the tools that are available to allow you to interact with PowerShell itself.

Pillar 1: Interacting

Interacting with PowerShell is most commonly issuing commands directly on the command line interface (CLI), the step above that would be building out a script that contains multiple commands. The first two options are available “out-of-the-box” on a Windows machine that has PowerShell installed. After this, you have a few third party options available to you that I will point out.

  1. PowerShell.exe
    This is the command prompt (or console as some may call it) that most folks will spend their day-to-day life entering what are referred to as “one-liners”. This is the CLI for PowerShell. You can access this in Windows by going through the Start Menu, or just type it into the Run prompt.
  2. PowerShell_ISE.exe
    This is the PowerShell Integrated Scripting Environment and is included in PowerShell 2.0 and up. This tool gives you the ability to have a script editor and CLI in one place. You can find out more about this tool and the various features that come with each version here. You can access this similar to the same way you would PowerShell.exe. In Windows Server 2008 R2 and above though this it is a Windows Feature that has to be added or activated before you can use it.
  3. Visual Studio (VS) 2013 Community Edition + PowerShell Tools for Visual Studio 2013
    VS 2013 Community is the free version of Visual Studio that includes the equivalent functionality of Visual Studio Professional Edition. Microsoft opened up the door for many things when they did this, the main one being that you can now develop PowerShell scripts along side your C# or other .NET projects. Adam Driscoll (PowerShell MVP) developed and released an add-on specifically for VS 2013 Community that you can get from GitHub, here.
  4. Third party ISE/Editors
    The following are the main players in the third party offerings for PowerShell ISE or script editors. I have tried all of them before, but since they only exist on the machine you install them on I tend to stick with what is in Windows. They have their place and if you begin to develop PowerShell heavily (e.g. full project solutions) they can be very useful in the management of your scripts:


This was a fairly short post that just started out with showing you what your options are to start working and interacting with PowerShell. PowerShell is a fun tool to work with and discover new things that it can do for you. In this series I will typically stick with using the CLI (PowerShell.exe) for examples.

One more thing to point out is the versions of PowerShell currently released (as of this blog post) are 2.0, 3.0, and 4.0. The basic commands I am going to go over will work in any version, but where specific nuances exist between each version I will try to point out if needed.

Categories: DBA Blogs

PowerShell Script to Manipulate SQL Server Backup Files

Tue, 2015-03-31 06:39

I use Ola Hallengren’s famous backup solution to back up my SQL Server databases. The destination for full backups is a directory on local disk; let’s say D:\SQLBackup\

If you are familiar with Ola’s backup scripts, you know the full path for backup file looks something like:


Where InstanceName is a placeholder for the name of the SQL server instance, similarly, DatabaseName is for the Database Name.


Depending upon my retention period settings, I may have multiple copies of full backup files under the said directory. The directory structure is complicated too (backup file for each database is under two parent folders). I want to copy the latest backup file (only) for each database to a UNC share and rename the backup file scrubbing everything but the database name.

Let’s say the UNC path is \\RemoteServer\UNCBackup. The end result would have the latest full backup file for all the databases copied over to \\RemoteServer\UNCBackup with files containing their respective database names only.


I wrote a PowerShell script to achieve the solution. This script can be run from a PowerShell console or PowerShell ISE. The more convenient way would be to use PS subsystems and schedule a SQL Server agent job to run this PowerShell script. As always, please run this on a test system first and use at your own risk. You may want to tweak the script depending upon your requirement.




Script Name: CopyLatestBackupandRename.ps1                       

Author     : Prashant Kumar                           

Date       : March 29th, 2015


Description: The script is useful for those using Ola Hallengren’s backup solution.

             This script takes SQL Server full backup parent folder as an input,

             a remote UNC path as another input and copies the latest backup file

             for each database, renames the backup file to the remote UNC path.



This Sample Code is provided for the purpose of illustration only and is not

intended to be used in a production environment. THIS SAMPLE CODE AND ANY






#Clear screen



#Specify Parent folder where Full backup files are originally being taken

$SourcePath = ‘D:\SQLBackup\InstanceName’


#Specify UNC path ot network share where backup files has to be copied

$UNCpath = ‘\\RemoteServer\UNCBackup’


#Browse thru subfolders (identical to database names) inside $SourcePath

$SubDirs = dir $SourcePath -Recurse | Where-Object {$_.PSIsContainer} | ForEach-Object -Process {$_.FullName}


#Browse through each sub-drorectory inside parent folder

ForEach ($Dirs in $SubDirs)


    #List recent file (only one) within sub-directories

       $RecentFile = dir $Dirs | Where-Object {!$_.PSIsContainer} | Sort-Object {$_.LastWriteTime} -Descending | Select-Object -First 1


    #Perform operation on each file (listed above) one-by-one

       ForEach ($File in $RecentFile)



              $FilePath = $File.DirectoryName

              $FileName = $File.Name


        $PathToCopy=($filepath -replace [regex]::Escape($SourcePath), $UNCpath)+‘\’


        #Forecfully create the desired directory structure at destination if one doesn’t exist

        New-Item -ItemType Dir -Path $PathToCopy -Force


        #Copy the backup file

        Copy-Item $FileToCopy $PathToCopy


        #Trim the date time from the copied file name, store in a variable

        $DestinationFile = $PathToCopy+$FileName

        $RenamedFile = ($DestinationFile.substring(0,$DestinationFile.length-20))+‘.bak’


        #Rename the copied file

        Rename-Item $DestinationFile $RenamedFile








Categories: DBA Blogs

SQL Server 2012 SP2 Cummulative Update 5

Tue, 2015-03-31 06:20

Hey folks,

Microsoft released the 5th Cummulative Update for SQL Server 2012 SP2. This update package contains fixes for 27 different issues, distributed as follows:




One very important issue that was fixed on this CU release was KB3038943 –   Error 4360 when you restore the backup of secondary replica to another server in AlwaysOn Availability Groups.

If you use SQL Server 2012 SP2 Always On and you offload your log backups to the secondary node it is recommended that you apply this patch!

The full Cummulative Update release and the download links can be found here:


Categories: DBA Blogs

Log Buffer #416, A Carnival of the Vanities for DBAs

Mon, 2015-03-30 12:29

This log buffer edition sprouts from the beauty, glamour and intelligence of various blog posts from Oracle, SQL Server, and MySQL.


Oracle Exadata Performance: Latest Improvements and Less Known Features

Exadata Storage Index Min/Max Optimization

Oracle system V shared memory indicated deleted

12c Parallel Execution New Features: Concurrent UNION ALL

Why does index monitoring makes Connor’s scratch his head and charges off to google so many times.

SQL Server:

Learn how to begin unit testing with tSQLt and SQL Server.

‘Temporal’ tables contain facts that are valid for a period of time. When they are used for financial information they have to be very well constrained to prevent errors getting in and causing incorrect reporting.

As big data application success stories (and failures) have appeared in the news and technical publications, several myths have emerged about big data. This article explores a few of the more significant myths, and how they may negatively affect your own big data implementation.

When effective end dates don’t align properly with effective start dates for subsequent rows, what are you to do?

In order to automate the delivery of an application together with its database, you probably just need the extra database tools that allow you to continue with your current source control system and release management system by integrating the database into it.


Ronald Bradford on SQL, ANSI Standards, PostgreSQL and MySQL.

How to Manage the World’s Top Open Source Databases: ClusterControl 1.2.9 Features Webinar Replay

A few interesting findings on MariaDB and MySQL scalability, multi-table OLTP RO

MariaDB: The Differences, Expectations, and Future

How to Tell If It’s MySQL Swapping

Categories: DBA Blogs

Pythian at Collaborate 15

Fri, 2015-03-27 15:05

Make sure you check out Pythian’s speakers at Collaborate 15. Stop by booth #1118 for a chance meet some of Pythian’s top Oracle experts, talk shop, and ask questions. This many Oracle experts in one place only happens once a year, have a look at our list of presenters, we think you’ll agree.

Click here to view a PDF of our presenters


Pythian’s Collaborate 15 Presenters | April 12 – 16 | Mandalay Bay Resort and Casino, Las Vegas


Christo Kutrovsky | ATCG Senior Consultant | Oracle ACE


Maximize Exadata Performance with Parallel Queries

Wed. April 15 | 10:45 AM – 11:45 AM | Room Banyan D


Big Data with Exadata

Thu. April 16 | 12:15 PM – 1:15 PM | Room Banyan D


Deiby Gomez Robles | Database Consultant | Oracle ACE


Oracle Indexes: From the Concept to Internals

Tue. April 14 | 4:30 PM – 5:30 PM | Room Palm C


Marc Fielding | ATCG Senior Consultant


Ensuring 24/7 Availability with Oracle Database Application Continuity

Mon. April 13 | 2:00 PM – 3:00 PM | Room Palm D


Using Oracle Multi-tenant to Efficiently Manage Development and Test Databases

Tue. April 14 | 11:00 AM – 12:00 PM | Room Palm C


Maris Elsins | Oracle Application DBA | Oracle ACE

Mining the AWR: Alternative Methods for Identification of the Top SQLs in Your Database

Tue. April 14 | 3:15 PM – 4:15 PM | Room Palm B


Ins and Outs of Concurrent Processing Configuration in Oracle e-Business Suite

Wed. April 15 | 8:00 AM – 9:00 AM | Room Breakers B


DB12c: All You Need to Know About the Resource Manager

Thu. April 16 | 9:45 AM – 10:45 AM | Room Palm A


Alex Gorbachev | CTO | Oracle ACE Director


Using Hadoop for Real-time BI Queries

Tue, April 14 | 9:45 AM – 10:45 AM | Room Jasmine E


Using Oracle Multi-tenant to Efficiently Manage Development and Test Databases

Tue, April 14 | 11:00 AM – 12:00 PM | Room Palm C


Anomaly Detection for Database Monitoring

Thu, April 16 | 11:00 AM – 12:00 PM | Room Palm B


Subhajit Das Chaudhuri | Team Manager


Deep Dive: Integration of Oracle Applications R12 with OAM 11g, OID 11g , Microsoft AD and WNA

Tue, April 14 | 3:15 PM – 4:15 PM | Room Breakers D


Simon Pane | ATCG Senior Consultant | Oracle Certified Expert


Oracle Service Name Resolution – Getting Rid of the TNSNAMES.ORA File!

Wed, April 15 | 9:15 AM – 10:15 AM | Room Palm C


René Antunez | Team  Manager | Oracle ACE


Architecting Your Own DBaaS in a Private Cloud with EM12c

Mon. April 13 | 9:15 AM – 10:15 AM | Room Reef F


Wait, Before We Get the Project Underway, What Do You Think Database as a Service Is…

Mon, Apr 13 | 03:15 PM – 04:15 PM | Room Reef F


My First 100 days with a MySQL DBMS

Tue, Apr 14 | 09:45 AM – 10:45 AM | Room Palm A


Gleb Otochkin | ATCG Senior Consultant | Oracle Certified Expert


Your Own Private Cloud

Wed. April 15 | 8:00 AM – 9:00 AM | Room Reef F


Patching Exadata: Pitfalls and Surprises

Wed. April 15 | 12:00 PM – 12:30 PM | Room Banyan D


Second Wind for Your exadata

Tue. April 14 | 12:15 PM – 12:45 PM | Room Banyan C


Michael Abbey | Team Manager, Principal Consultants | Oracle ACE


Working with Colleagues in Distant Time Zones

Mon, April 13 | 12:00 PM – 12:30 PM | Room North Convention, South Pacific J


Manage Referential Integrity Before It Manages You

Tue, April 14 | 2:00 PM – 3:00 PM | Room Palm C


Nothing to BLOG About – Think Again

Wed, April 15 | 7:30 PM – 8:30 PM | Room North Convention, South Pacific J


Do It Right; Do It Once. A Roadmap to Maintenance Windows

Thu, April 16 | 11:00 AM – 12:00 PM | Room North Convention, South  Pacific J

Categories: DBA Blogs

Oracle Database 12c In-Memory Q&A Webinar

Thu, 2015-03-26 09:21

Today I will be debating Oracle 12c’s In-Memory option with Maria Colgan of Oracle (aka optimizer lady, now In-Memory lady).

This will be in a debate form with lots of Q&A from the audience. Come ask the questions you always wanted to ask.

Link to register and attend:

Starts at 12:00pm EDT.

Categories: DBA Blogs

Free Apache Cassandra Training Event in Cambridge, MA March 23

Fri, 2015-03-20 14:24

I’ll be speaking, along with DataStax and Microsoft representatives at Cassandra Essentials Day this coming Monday (March 23) in Cambridge. MA. This free training event will cover the basics of Apache Cassandra and show you how to try it out quickly, easily, and free of charge on the Azure cloud. Expect to learn about the unique aspects of Cassandra and DataStax Enterprise and to dive into real-world use cases.

Space is limited, so register online to reserve a spot.

Categories: DBA Blogs

My Co-op Experience at Pythian

Fri, 2015-03-20 06:30
That's me in front of our office. I promise there is a bigger Pythian logo!

That’s me in front of our office. I promise there is a bigger Pythian logo!

Unlike most other engineering physics students at Carleton who prefer to remain within the limits of engineering, I had chosen to apply for a software developer co-op position at Pythian in 2014. For those of you who do not know much about the engineering physics program (I get that a lot and so I will save you the trip to Google and tell you), this is how Stanford University describes their engineering physics program: “Engineering Physics prepares students to apply physics to tackle 21st century engineering challenges and to apply engineering to address 21st century questions in physics.” As you can imagine, very little to do with software development. You might ask, then why apply to Pythian?

Programming is changing the way our world functions. Look at the finance sectors: companies rely on complicated algorithms to determine where they should be investing their resources which in turn determines the course of growth for the company. In science and technology, algorithms help us make sense of huge amounts of unstructured data which would otherwise take us years to process, and help us understand and solve many or our 21st century problems. Clearly, learning how to write these algorithms or code cannot be a bad idea, rather, one that will be invaluable. A wise or a not so wise man once said, (you will know what I mean if you have seen the movie iRobot): “If you cannot solve a problem, make a program that can.” In a way, maybe I intend to apply physics to tackle all of 21st century problems by writing programs. (That totally made sense in my head).

Whatever it might be, my interest in programming or my mission to somehow tie physics, engineering, and programming together, I found myself looking forward to an interview with Pythian. I remember having to call in for a Skype interview. While waiting for my interviewers to join the call, I remember thinking about all the horror co-op stories I had heard: How you will be given piles of books to read over your work term (you might have guessed from this blog so far, not much of a reader, this one. If I hit 500 words, first round’s on me!). Furthermore, horror stories of how students are usually labeled as a co-op and given no meaningful work at all.

Just as I was drifting away in my thoughts, my interviewers joined the call. And much to my surprise they were not the traditional hiring managers in their formal dresses making you feel like just another interviewee in a long list of interviewees. Instead they were warm and friendly people who were genuinely interested in what I could offer to the company as a co-op student. The programming languages I knew, which one was my favourite, the kind of programs I had written, and more. They clearly stated the kind of work I could expect as a co-op student, which was exactly the same kind of work that the team was going to be doing. And most importantly, my interviewers seemed to be enjoying the kind of work they do and the place they work at.

So, when I was offered the co-op position at Pythian. I knew I had to say yes!

My pleasant experience with Pythian has continued ever since. The most enjoyable aspect of my work has been the fact that I am involved in a lot of the team projects which means I am always learning something new and gaining more knowledge each day, after each project. I feel that in an industry like this, the best way to learn is by experience and exposure. At Pythian that is exactly what I am getting.

And if those are not good enough reasons to enjoy working for this company, I also have the privilege of working with some extremely experienced and knowledgeable people in the web development industry. Bill Gates had once suggested that he wants to hire the smartest people at Microsoft and surround himself with them. This would create an environment where everyone would learn from each other and excel in their work. And I agree with that. Well now if you are the next Bill Gates, go ahead, create your multibillion dollar company and hire the best of the best and immerse yourself in the presence of all that knowledge and intelligence. But I feel I have found myself a great alternative, a poor man approach, a student budget approach or whatever you want to call it, take full advantage of working with some really talented people and learn as much as you can.

Today, five months into my yearlong placement with Pythian, I could not be more sure and proud of becoming a part of this exciting company, becoming a Pythianite. And I feel my time spent in this company has put me well in course to complete my goal of tying physics, engineering and programming together.

Categories: DBA Blogs

Log Buffer #415, A Carnival of the Vanities for DBAs

Fri, 2015-03-20 06:25

This Log Buffer Edition covers the Oracle, SQL Server and MySQL with a keen look on the novel ideas.


The case was to roll forward a physical standby with an RMAN SCN incremental backup taken from primary.

Oracle Database 12c: Smart upgrade

This blog covers how to specify query parameters using the REST Service Editor.

Production workloads blend Cloud and On-Premise Capabilities


SQL Server:

Mail Fails with SQLCMD Error

How to get Database Design Horribly Wrong

Using the ROLLUP, CUBE, and GROUPING SETS Operators

The Right and Wrong of T-SQL DML TRIGGERs (SQL Spackle)

How converting extensive, repetitive code to a data-driven approach resolved a maintenance headache and helped identify bugs


Distributing innodb tables made simpler!

Choosing a good sharding key in MongoDB (and MySQL)

Update a grails project from version 2.3.8 to version 2.4.4

MySQL Enterprise Backup 3.12.0 has been released

If table is partitioned it makes it easy to maintain. Table has grown so huge and the backups are just keep running long then probably you need to think of archival or purge.

Categories: DBA Blogs

Using strace to debug application errors in linux

Fri, 2015-03-20 06:24

strace is a very useful tool which traces system calls and signals for a running process. This helps a lot while debugging application level performance issues and bugs. Aim of this post is to demonstrate the power of strace in pinning down an application bug.

I came across an issue in which nagios was sending the following alerts for a RHEL6 system.

***** Nagios ***** Notification Type: PROBLEM Service: NTP Host: xxxxx Address: xx.xx.xx.xx State: UNKNOWN Date/Time: Tue Feb 17 10:08:36 EST 2015 Additional Info: cant create socket connection

On manually executing the nagios plugin on the affected system, we can see that the command is not running correctly.

# /usr/lib64/nagios/plugins/check_ntp_time -H localhost -w 1 -c 2
can’t create socket connection

I ran strace on the command. This would create a file /tmp/strace.out with strace output.

# strace -xvtto /tmp/strace.out /usr/lib64/nagios/plugins/check_ntp_time -H localhost -w 1 -c 2

Following are the options which I passed.

-x Print all non-ASCII strings in hexadecimal string format.
-v Print unabbreviated versions of environment, stat, termios, etc. calls. These structures
are very common in calls and so the default behavior displays a reasonable subset of struc?
ture members. Use this option to get all of the gory details.
-tt If given twice, the time printed will include the microseconds.
-o filename Write the trace output to the file filename rather than to stderr. Use if -ff
is used. If the argument begins with `|’ or with `!’ then the rest of the argument is
treated as a command and all output is piped to it. This is convenient for piping the
debugging output to a program without affecting the redirections of executed programs.

Time stamps displayed with -tt option is not very useful in this example, but it is very useful while debugging application performance issues. -T which shows the time spend in each system call is also useful for those issues.

From strace output,

10:26:11.901173 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
10:26:11.901279 connect(3, {sa_family=AF_INET, sin_port=htons(123), sin_addr=inet_addr(“″)}, 16) = 0
10:26:11.901413 getsockname(3, {sa_family=AF_INET, sin_port=htons(38673), sin_addr=inet_addr(“″)}, [16]) = 0
10:26:11.901513 close(3) = 0
10:26:11.901621 socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
10:26:11.901722 connect(3, {sa_family=AF_INET6, sin6_port=htons(123), inet_pton(AF_INET6, “::1″, &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable) <—————-
10:26:11.901830 close(3) = 0
10:26:11.901933 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
10:26:11.902033 connect(3, {sa_family=AF_INET, sin_port=htons(123), sin_addr=inet_addr(“″)}, 16) = 0
10:26:11.902130 socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP) = 4
10:26:11.902238 connect(4, {sa_family=AF_INET6, sin6_port=htons(123), inet_pton(AF_INET6, “::1″, &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable) <—————-
10:26:11.902355 fstat(1, {st_dev=makedev(0, 11), st_ino=3, st_mode=S_IFCHR|0620, st_nlink=1, st_uid=528, st_gid=5, st_blksize=1024, st_blocks=0, st_rdev=makedev(136, 0), st_atime=2015/02/17-10:26:11, st_mtime=2015/02/17-10:26:11, st_ctime=2015/02/17-10:16:32}) = 0
10:26:11.902490 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc5a8752000
10:26:11.902608 write(1, “can’t create socket connection”, 30) = 30

Let us have a deeper look,

You can see that socket() is opening a socket with PF_INET (IP v4) domain and IPPROTO_IP (tcp) protocol. This returns file descriptor 3. Then connect() is connecting to the socket using the same file descriptor and connects to ntp port (123) in localhost. Then it calls getsockname and closes the file descriptor for the socket.

10:26:11.901173 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
10:26:11.901279 connect(3, {sa_family=AF_INET, sin_port=htons(123), sin_addr=inet_addr(“″)}, 16) = 0
10:26:11.901413 getsockname(3, {sa_family=AF_INET, sin_port=htons(38673), sin_addr=inet_addr(“″)}, [16]) = 0
10:26:11.901513 close(3) = 0

Next it does the same but with PF_INET6 (IP v6) domain. But you can see that connect() fails with ENETUNREACH.

10:26:11.901621 socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
10:26:11.901722 connect(3, {sa_family=AF_INET6, sin6_port=htons(123), inet_pton(AF_INET6, “::1″, &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable) <—————-
10:26:11.901830 close(3)

From connect man page,

Network is unreachable.

This process is repeated with IPPROTO_UDP (udp) protocol as well.

On checking the system, I see that that only IPv4 is enabled. ‘inet6 addr’ line is missing.

[root@pbsftp ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:90:2E:31
inet addr:xx.xx.xx.xx Bcast:xx.xx.xx.xx Mask:xx.xx.xx.xx <——————–
RX packets:5494691 errors:0 dropped:0 overruns:0 frame:0
TX packets:4014672 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:5877759230 (5.4 GiB) TX bytes:5608605924 (5.2 GiB)

IPv6 is disabled in the system using following /etc/sysctl.conf entries.

net.ipv6.conf.all.disable_ipv6 = 1

This behavior of nagios plugin is wrong as it should not die when one of the connect fails.

Issue is fixed in upstream patch.

Enabling IPv6 by removing following entries from /etc/sysctl.conf and running ‘sysctl -p’ would act as a workaround.

net.ipv6.conf.all.disable_ipv6 = 1

To fix the issue, the upstream patch need to be either backported manually to create an rpm or a support ticket need to be opened with the operating system vendor to backport the patch in their product release.

Categories: DBA Blogs

MySQL Sounds Like Fun

Mon, 2015-03-16 07:48

I love finding out new things about MySQL. Last week, I stumbled on a query that had the phrase “SOUNDS LIKE” in it. Sounds made-up, right? Turns out MySQL is using a known “soundex” algorithm common to most databases, and popular in use cases in geneaology.

The basic idea is that words are encoded according to their consonants. Consonants that sound similar (like M and N) are given the same code. Here’s a simple example:

(“soundex” and “sounds like” are different ways of doing the same thing in these queries)

MariaDB> select soundex("boom");
| soundex("boom") |
| B500            |

MariaDB> select soundex("bam");
| soundex("bam") |
| B500           |

MariaDB> select soundex("bin");
| soundex("bin") |
| B500           |

This simple example isn’t terribly useful, but if you were trying to find similar, but differently spelled, names across continents, it could be helpful:

MariaDB> select soundex("William");
| soundex("William") |
| W450               |

MariaDB> select soundex("Walaam");
| soundex("Walaam") |
| W450              |

MariaDB> select soundex("Willem");
| soundex("Willem") |
| W450              |

MariaDB> select soundex("Williama");
| soundex("Williama") |
| W450                |

And you could probably agree these variations match as well:

MariaDB> select soundex("Guillaume");
| soundex("Guillaume") |
| G450                 |

MariaDB> select soundex("Uilleam");
| soundex("Uilleam") |
| U450               |

MariaDB> select soundex("Melhem");
| soundex("Melhem") |
| M450              |

MariaDB> select soundex("Uilliam");
| soundex("Uilliam") |
| U450               |

Well, that’s pretty neat. Of course, I want to try the silliest word I can think of:

MariaDB> select soundex("supercalifragilisticexpealidocious");
| soundex("supercalifragilisticexpealidocious") |
| S162416242321432                              |

So the algorithm doesn’t stop at 3 digits; good to know.

What does the algorithm do? Luckily MySQL is open source, and so we can look in the source code:

This looks like the raw mapping. And then this is called into a function that loops through the characters in the word.

/* :::::::::::::::::::::::::: */
const char *soundex_map= "01230120022455012623010202";

Note that even though it’s called “sounds like” it is really simply a character mapping based on an agreement by the developers’ ears which characters sounds similar. That’s of course an oversimplification, and I see in the code comments the following:

* *
* The basic Algorithm source is taken from EDN Nov. *
* 14, 1985 pg. 36. *

But despite hitting up several librarians, I can’t seem to get a copy of this. Someone out there has a copy sitting around, right?

As a side note, this is obviously specific to the English language. I found references to German and other languages having soundex mappings, and would be curious to see those and hear of any language-specific ways to do this.

Curiosity aside, here’s a real use.

I pulled down some government climate data. Let’s say the location field has some of my favorite misspellings of “Durham” in it:

MariaDB [weather]> select distinct(two), count(two) from weather.temps group by two;
| two                                        | count(two) |
| NULL                                       |          0 |
| DRM                                        |         51 |
| DURHAM                                     |    1101887 |
| DURM                                       |         71 |
| NCSU                                       |    1000000 |
| RDU AIRPORT                                |    1000000 |

A “LIKE” clause won’t work terribly well here.

I confirmed the misspellings would match as I expected:

MariaDB [weather]> select soundex("Durham"), soundex("Durm"), soundex("DRM");
| soundex("Durham") | soundex("Durm") | soundex("DRM") |
| D650              | D650            | D650           |

So instead of manually creating a query like:

MariaDB [weather]> select count(two) from weather.temps where two='DRM' or two='DURHAM' or two='DURM';
| count(two) |
|    1102009 |

I can simply do this:

MariaDB [weather]> select count(two) from weather.temps where two sounds like 'Durham';
| count(two) |
|    1102009 |

There are more than several ways to do string comparisons, but I enjoyed finding this one.

(Bonus points will be granted to the first person who comments that RDU is also Durham and submits a unique query to include it in the count.)

Categories: DBA Blogs

Monitoring Cassandra with Grafana and Influx DB

Mon, 2015-03-16 07:37


In this post I will explain how to set up Cassandra monitoring with influxDB and Grafana. This can also be used to connect to other monitoring systems (Graphite, Collectd, etc…) but since both influxDB and Grafana are hot topics at the moment I decided to follow the trend! I was asked why I was doing this when a tool like OpsCenter is available, but sometimes you want to have all your systems reporting to a single dashboard. And if your dashboard is Grafana and your Backend is influxDB then you will learn how to connect Cassandra to it!

– You are running a Linux system (This post is based on CentOS 7)
– You are using Cassandra 1.2+ (I’m using 2.1.3 in this case)

  • Cassandra Installation
  • Graphite Metrics Jar
  • influxDB –
  • Grafana –
  • Apache (Any webserver would do)
Installing and configure influxDB

This one is dead easy, once you have the package install it (rpm -i, dpkg -i). Start the service:

service influxdb start

Once the service is running, go to the configuration (/opt/influxdb/shared/config.toml) and edit the file so that under [input_plugins] it looks like this:

# Configure the graphite api
enabled = true
# address = "" # If not set, is actually set to bind-address.
port = 2003
database = "cassandra-metrics" # store graphite data in this database
udp_enabled = true

Save the file, reload the service:

service influxdb reload

Now go to your browser localhost:8083, click connect (no credentials should be needed), and after you logged in, enter in a database name (use cassandra-metrics) and click Create (This should be your only option). Now you can click the database, and add an user to it (and make it admin). Now create another database, with name “grafana”, create an admin for that database also.
Now you are done with influxDB.

Installing Grafana

Grafana is a bit more tricky, since it is needed to configure a webserver also. Let’s assume apache is installed, and the home directory for www is /var/www/html.

So get the grafana package and extract it to /var/www/html. So the end result should be something like /var/www/html/grafana.

Now do the following:

cd /var/www/html/grafana
cp config.sample.js config.js

Now let’s configure the connection between influXDB and Grafana. Open for edit the new copied file config.js and edit it so it looks like this:

datasources: {
  influxdb: {
    type: 'influxdb',
    url: "http://localhost:8086/db/cassandra-metrics",
    username: 'admin',
    password: 'admin',
  grafana: {
    type: 'influxdb',
    url: "http://localhost:8086/db/grafana",
    username: 'admin',
    password: 'admin',
    grafanaDB: true

Now redirect your browser to localhost/grafana and you will have the Grafana default dashboard.

Preparing Cassandra

Now the final piece of the puzzle. Now we follow more or less the Cassandra guide that exists here, but we need to make some changes to make it more valuable (and allow multiple nodes to provide metrics).

So, first of all, put the metrics-graphite-2.2.0.jar in all the Cassandra nodes /lib directory.
Now create a yaml file with similar to the Datastax example, lets call it influx-reporting.yaml and store it on /conf directory. Now edit the file again so it looks like this:

  period: 60
  timeunit: 'SECONDS'
  prefix: 'Node1'
  - host: 'localhost'
    port: 2003
    color: "white"
    useQualifiedName: true
    - ".*"

What did we change here, first we added a prefix field, this will allow us to identify the node that is providing the metrics. It must be different for every node, otherwise the metrics will overwrite/mix with each other. Then we decided to allow all patterns (“.*”), this means that Cassandra will push out all the metrics into influxDB. You can decide whether or not this is too much and just enable the metrics you want (find out more about it here).

Now edit the so that it will read the yaml file to provide the metrics. Add the following line to the end of the file:

JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=influx-reporting.yaml"

If all is done correctly, you can restart the Cassandra node (or nodes, but don’t do it all at the same time, 2min between each is recommended) and if the log file has the following message:

INFO [main] YYYY-MM-DD HH:MM:SS,SSS - Trying to load metrics-reporter-config from file: inf
INFO [main] YYYY-MM-DD HH:MM:SS,SSS - Enabling GraphiteReporter to localhost:2003

All is good!


Grafana is not that difficult to use, so once you start exploring a bit (And reading the documentation) you will find out doing nice graphs. This could be a long post only about graphing out, so I’m just go and post some images of the graphs I’m getting out of Grafana so that you can see how it can be powerful and help you on keeping your Cassandra Healthy.

Grafana_cassandra-test3 Grafana_cassandra-test2 Grafana_cassandra-test1
Categories: DBA Blogs

Cassandra 101 : Understanding What Cassandra Is

Mon, 2015-03-16 07:35

As some of you may know, in my current role at Pythian, I am tackling OSDB and currently Cassandra is on my radar. So one of the things I have been trying to do is learn what Cassandra is, so in this series, I’m going to share a bit of what I have been able to learn.

According to the whitepaper “Solving Big Data Challenges for Enterprise Application Performance Management” , Cassandra is a “distributed key value store developed at Facebook. It was designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service without single point of failure allowing replication even across multiple data centers as well as for choosing between synchronous or asynchronous replication for each update.”

Cassandra, in layman’s terms, is a NoSQL database developed in JavaOne. One of Cassandra’s many benefits is that it’s an open source DB with deep developer support. It is also a fully distributed DB, meaning that there is no master DB, unlike Oracle or MySQL, so this allows this database to have no point of failure. It also touts being linearly scalable, meaning that if you have 2 nodes and a throughput of 100,000 transactions per second, and you added 2 more nodes, you would now get 200,000 transactions per second, and so forth.


Cassandra is based on 2 core technologies, Google’s Big Table and Amazon’s Dynamo, which Facebook uses to power their Inbox Search feature and released it as an open source project on Google Code and then incubated at Apache, and is nowadays a Top-Level-Project. Currently there exists 2 versions of Cassandra:

Since Cassandra is a distributed system, it follows the CAP Theorem, which is awesomely explained here, and it states that, in a distributed system, you can only have two out of the following three guarantees across a write/read pair:

  • Consistency.- A read is guaranteed to return the most recent write for a given client.
  • Availability.-A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).
  • Partition Tolerance.-The system will continue to function when network partitions occur.

Also Cassandra is a BASE (Basically Available, Soft state, Eventually consistent) type system, not an ACID (Atomicity, Consistency, Isolation, Durability) type system, meaning that the system is optimistic and accepts that the database consistency will be in a state of flux, not like ACID which is pessimistic and it forces consistency at the end of every transaction.

Cassandra stores data according to the column family data model where:

  • Keyspace is the container for your application data, similar to a schema in a relational database. Keyspaces are used to group column families together. Typically, a cluster has one keyspace per application.It also defines the replication strategy and data objects belong to a single keyspace
  • Column Family is a set of  one,two or more individual rows with a similar structure
  • Row is a collection of sorted columns, it is the the smallest unit that stores related data in Cassandra, and any component of a Row can store data or metadata
    •  Row Key uniquely identifies a row in a column family

      •  Column key uniquely identifies a column value in a row
      •  Column value stores one value or a collection of values

Also we need to understand the basic architecture of Cassandra, which has the following key structures:

  • Node is one Cassandra instance and is the basic infrastructure component in Cassandra. Cassandra assigns data to nodes in the cluster, each node is assigned a part of the database based on the Row Key. Usually corresponds to a host, but not necessarily, specially in Dev or Test environments.
  • Rack is a logical set of nodes
  • Data Center is a logical set of Racks, a data center can be a physical data center or virtual data center. Replication is set by data center
  • Cluster contains one or more data centers and is the full set of nodes which map to a single complete token ring


Hopefully this will help you understand the basic Cassandra concepts. In the next series, I will go over architecture concepts of what a Seed node is, the purpose of the Snitch and topologies, the Coordinator node, replication factors, etc

Note 1:

André Araújo, a great friend of mine and previous Pythianite, wrote about his first experience with Cassandra : My First Experience with Cassandra – Part 1

Note 2:

This post was originally published in my personal blog:

Categories: DBA Blogs

Log Buffer #414, A Carnival of the Vanities for DBAs

Mon, 2015-03-16 07:22

This Log Buffer Edition picks the sea shells from Blogs across the seas of Oracle, SQL Server and MySQL and arrange them for you in this Edition. Enjoy.


12c Parallel Execution New Features: Concurrent UNION ALL

Visualizing Statspack Performance Data in SQL Developer

Optimizer statistics – Gathering Statistics and Histograms

Big Data Made Actionable with Omar TawaKol at SXSW

Mobile backend with REST services and JSON payload based on SOA Suite 12c

SQL Server:

Setting Different Colors for Connections in SSMS

Defusing Database Time Bombs: Avoiding the Need to Refactor Databases

This article shows a step by step tutorial to create a virtual machine in 15 min on Windows Azure.

What SQL Statements Are Currently Using The Transaction Logs?

SQL Server Random Sorted Result Set


Oracle Linux 7.1 and MySQL 5.6

MySQL Workbench 6.3.2 RC has been released

MariaDB CONNECT storage engine now offers access to JSON

Avoiding MySQL ERROR 1052 by prefixing column names in multi table queries

MySQL 5.7.6 DMR: Packages, Repos, Docker Images

Categories: DBA Blogs

Log Buffer #413, A Carnival of the Vanities for DBAs

Mon, 2015-03-09 21:15

This Log Buffer Editions scours the Internet and brings some of the fresh blog posts from Oracle, SQL Server and MySQL.


Most of Kyles’ servers tend to be Linux VMs on VMware ESX without any graphics desktops setup, so it can be disconcerting trying to install Oralce with it’s graphical “runInstaller” being the gate way we have to cross to achieve installation.

Working around heatbeat issues caused by tracing or by regexp

APEX 5 EA Impressions: Custom jQuery / jQuery UI implementations

Introduction to the REST Service Editor, Generation (PART 2)

Due to recent enhancements and importance within Oracle’s storage portfolio, StorageTek Storage Archive Manager 5.4 (SAM-QFS) has been renamed to Oracle Hierarchical Storage Manager (Oracle HSM) 6.0.

SQL Server:

There are different techniques to optimize the performance of SQL Server queries but wouldn’t it be great if we had some recommendations before we started planning or optimizing queries so that we didn’t have to start from the scratch every time? This is where you can use the Database Engine Tuning Advisor utility to get recommendations based on your workload.

Data Mining Part 25: Microsoft Visio Add-Ins

Stairway to Database Source Control Level 3: Working With Others (Centralized Repository)

SQL Server Hardware will provide the fundamental knowledge and resources you need to make intelligent decisions about choice, and optimal installation and configuration, of SQL Server hardware, operating system and the SQL Server RDBMS.

Questions About SQL Server Transaction Log You Were Too Shy To Ask


The post shows how you can easily read the VCAP_SERVICES postgresql credentials within your Java Code using the maven repo. This assumes your using the ElephantSQL Postgresql service. A single connection won’t be ideal but for demo purposes might just be all you need.

MariaDB 5.5.42 Overview and Highlights

How to test if CVE-2015-0204 FREAK SSL security flaw affects you

Using master-master for MySQL? To be frankly we need to get rid of that architecture. We are skipping the active-active setup and show why master-master even for failover reasons is the wrong decision.

Resources for Highly Available Database Clusters: ClusterControl Release Webinar, Support for Postgres, New Website and More

Categories: DBA Blogs

Recovering an Oracle Database with Missing Redo

Mon, 2015-03-09 21:14

I ran into a situation where we needed to recover from an old online backup which (due to some issues with the RMAN “KEEP” command) was missing the archived redo log backups/files needed to make the backup consistent.  The client wasn’t concerned about data that changed during the backup, they were interested in checking some very old data from long before this online backup had started.

Visualizing the scenario using a timeline (not to scale):

  t0      t1                 t2        t3                 t4
          Data is added                                   Present

The client thought that some data had become corrupted and wasn’t sure when but knew that it wasn’t recently so the flashback technologies were not an option.  Hence they wanted a restore of the database into a new temporary server as of time t1 which was in the distant past.

An online (hot) backup was taken between t2 and t3 and was considered to be old enough or close enough to t1 however the problem was that all archived redo log backups were missing. The client was certain that the particular data they were interested in would not have change during the online backup.

Hence the question is: without the necessary redo data to make the online backup consistent (between times t2 and t3) can we still open the database to extract data from prior to when the online backup began?  The official answer is “no” – the database must be made consistent to be opened.  And with an online backup the redo stream is critical to making the backed up datafiles consistent.  So without the redo vectors in the redo stream, the files cannot be made consistent with each other and hence the database cannot be opened.  However the unofficial, unsupported answer is that it can be done.

This article covers the unsupported and unofficial methods for opening a database with consistency corruption so that certain data can be extracted.

Other scenarios can lead to the same situation.  Basically this technique can be used to open the Oracle database any time the datafiles cannot be made consistent.


Demo Setup

To illustrate the necessary steps I’ve setup a test 12c non-container database called NONCDB.  And to simulate user transactions against it I ran a light workload using the Swingbench Order Entry (SOE) benchmark from another computer in the background.

Before beginning any backups or recoveries I added two simple tables to the SCOTT schema and some rows to represent the “old” data (with the words “OLD DATA” in the C2 column):

SQL> create table scott.parent (c1 int, c2 varchar2(16), constraint parent_pk primary key (c1)) tablespace users;

Table created.

SQL> create table scott.child (c1 int, c2 varchar2(16), foreign key (c1) references scott.parent(c1)) tablespace soe;

Table created.

SQL> insert into scott.parent values(1, 'OLD DATA 001');

1 row created.

SQL> insert into scott.parent values(2, 'OLD DATA 002');

1 row created.

SQL> insert into scott.child  values(1, 'OLD DETAILS A');

1 row created.

SQL> insert into scott.child  values(1, 'OLD DETAILS B');

1 row created.

SQL> insert into scott.child  values(1, 'OLD DETAILS C');

1 row created.

SQL> insert into scott.child  values(2, 'OLD DETAILS D');

1 row created.

SQL> commit;

Commit complete.



Notice that I added a PK-FK referential integrity constraint and placed each table is a different tablespace so they could be backed up at different times.

These first entries represent my “old data” from time t1.


The Online Backup

The next step is to perform the online backup.  For simulation purposes I’m adjusting the steps a little bit to try to represent a real life situation where the data in my tables is being modified while the backup is running.  Hence my steps are:

  • Run an online backup of all datafiles except for the USERS tablespace.
  • Add some more data to my test tables (hence data going into the CHILD table is after the SOE tablespace backup and the data into the PARENT table is before the USERS tablespace backup).
  • Record the current archived redo log and then delete it to simulate the lost redo data.
  • Backup the USERS tablespace.
  • Add some post backup data to the test tables.

The actual commands executed in RMAN are:

$ rman

Recovery Manager: Release - Production on Thu Feb 26 15:59:36 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target

connected to target database: NONCDB (DBID=1677380280)

RMAN> backup datafile 1,2,3,5;

Starting backup at 26-FEB-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=46 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00005 name=/u01/app/oracle/oradata/NONCDB/datafile/SOE.dbf
input datafile file number=00001 name=/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_system_b2k8dsno_.dbf
input datafile file number=00002 name=/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_sysaux_b2k8f3d4_.dbf
input datafile file number=00003 name=/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_undotbs1_b2k8fcdm_.dbf
channel ORA_DISK_1: starting piece 1 at 26-FEB-15
channel ORA_DISK_1: finished piece 1 at 26-FEB-15
piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T155942_bgz9ol3g_.bkp tag=TAG20150226T155942 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:11:16
Finished backup at 26-FEB-15

Starting Control File and SPFILE Autobackup at 26-FEB-15
piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/autobackup/2015_02_26/o1_mf_s_872698259_bgzb0647_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 26-FEB-15

RMAN> alter system switch logfile;

Statement processed

RMAN> commit;

Statement processed

RMAN> alter system switch logfile;

Statement processed

RMAN> insert into scott.parent values (3, 'NEW DATA 003');

Statement processed

RMAN> insert into scott.child  values (3, 'NEW DETAILS E');

Statement processed

RMAN> commit;

Statement processed

RMAN> select sequence# from v$log where status='CURRENT';


RMAN> alter system switch logfile;

Statement processed

RMAN> alter database backup controlfile to '/tmp/controlfile_backup.bkp';

Statement processed

RMAN> backup datafile 4;

Starting backup at 26-FEB-15
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_users_b2k8gf7d_.dbf
channel ORA_DISK_1: starting piece 1 at 26-FEB-15
channel ORA_DISK_1: finished piece 1 at 26-FEB-15
piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T165814_bgzdrpmk_.bkp tag=TAG20150226T165814 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 26-FEB-15

Starting Control File and SPFILE Autobackup at 26-FEB-15
piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/autobackup/2015_02_26/o1_mf_s_872701095_bgzdrrrh_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 26-FEB-15

RMAN> alter database backup controlfile to '/tmp/controlfile_backup.bkp';

Statement processed

RMAN> insert into scott.parent values (4, 'NEW DATA 004');

Statement processed

RMAN> insert into scott.child  values (4, 'NEW DETAILS F');

Statement processed

RMAN> commit;

Statement processed

RMAN> exit

Recovery Manager complete.


Notice that in the above steps that since I’m using Oracle Database 12c I’m able to execute normal SQL commands from RMAN – this is a RMAN 12c new feature.


Corrupting the Backup

Now I’m going to corrupt my backup by removing one of the archived redo logs needed to make the backup consistent:

SQL> set pages 999 lines 120 trims on tab off
SQL> select 'rm '||name stmt from v$archived_log where sequence#=68;

rm /u01/app/oracle/fast_recovery_area/NONCDB/archivelog/2015_02_26/o1_mf_1_68_bgzcnv04_.arc

SQL> !rm /u01/app/oracle/fast_recovery_area/NONCDB/archivelog/2015_02_26/o1_mf_1_68_bgzcnv04_.arc



Finally I’ll remove the OLD data to simulate the data loss (representing t4):

SQL> select * from scott.parent order by 1;

        C1 C2
---------- ----------------
         1 OLD DATA 001
         2 OLD DATA 002
         3 NEW DATA 003
         4 NEW DATA 004

SQL> select * from scott.child order by 1;

        C1 C2
---------- ----------------
         1 OLD DETAILS A
         1 OLD DETAILS B
         1 OLD DETAILS C
         2 OLD DETAILS D
         3 NEW DETAILS E
         4 NEW DETAILS F

6 rows selected.

SQL> delete from scott.child where c2 like 'OLD%';

4 rows deleted.

SQL> delete from scott.parent where c2 like 'OLD%';

2 rows deleted.

SQL> commit;

Commit complete.



Attempting a Restore and Recovery

Now let’s try to recover from our backup on a secondary system so we can see if we can extract that old data.

After copying over all of the files, the first thing to do is to try a restore as per normal:

$ rman target=/

Recovery Manager: Release - Production on Mon Mar 2 08:40:12 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database (not started)

RMAN> startup nomount;

Oracle instance started

Total System Global Area    1577058304 bytes

Fixed Size                     2924832 bytes
Variable Size                503320288 bytes
Database Buffers            1056964608 bytes
Redo Buffers                  13848576 bytes

RMAN> restore controlfile from '/tmp/controlfile_backup.bkp';

Starting restore at 02-MAR-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=12 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=/u01/app/oracle/oradata/NONCDB/controlfile/o1_mf_b2k8d9nq_.ctl
output file name=/u01/app/oracle/fast_recovery_area/NONCDB/controlfile/o1_mf_b2k8d9v5_.ctl
Finished restore at 02-MAR-15

RMAN> alter database mount;

Statement processed
released channel: ORA_DISK_1

RMAN> restore database;

Starting restore at 02-MAR-15
Starting implicit crosscheck backup at 02-MAR-15
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=12 device type=DISK
Crosschecked 4 objects
Finished implicit crosscheck backup at 02-MAR-15

Starting implicit crosscheck copy at 02-MAR-15
using channel ORA_DISK_1
Crosschecked 2 objects
Finished implicit crosscheck copy at 02-MAR-15

searching for all files in the recovery area
cataloging files...
cataloging done

using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /u01/app/oracle/oradata/NONCDB/datafile/o1_mf_system_b2k8dsno_.dbf
channel ORA_DISK_1: restoring datafile 00002 to /u01/app/oracle/oradata/NONCDB/datafile/o1_mf_sysaux_b2k8f3d4_.dbf
channel ORA_DISK_1: restoring datafile 00003 to /u01/app/oracle/oradata/NONCDB/datafile/o1_mf_undotbs1_b2k8fcdm_.dbf
channel ORA_DISK_1: restoring datafile 00005 to /u01/app/oracle/oradata/NONCDB/datafile/SOE.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T155942_bgz9ol3g_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T155942_bgz9ol3g_.bkp tag=TAG20150226T155942
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:01:46
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to /u01/app/oracle/oradata/NONCDB/datafile/o1_mf_users_b2k8gf7d_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T165814_bgzdrpmk_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/NONCDB/backupset/2015_02_26/o1_mf_nnndf_TAG20150226T165814_bgzdrpmk_.bkp tag=TAG20150226T165814
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 02-MAR-15



Notice that it did restore the datafiles from both the SOE and USERS tablespaces, however we know that those are inconsistent with each other.

Attempting to do the recovery should give us an error due to the missing redo required for consistency:

RMAN> recover database;

Starting recover at 02-MAR-15
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 67 is already on disk as file /u01/app/oracle/fast_recovery_area/NONCDB/archivelog/2015_02_26/o1_mf_1_67_bgzcn05f_.arc
archived log for thread 1 with sequence 69 is already on disk as file /u01/app/oracle/fast_recovery_area/NONCDB/archivelog/2015_02_26/o1_mf_1_69_bgzdqo9n_.arc
Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_system_bh914cx2_.dbf'

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 03/02/2015 08:44:21
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 68 and starting SCN of 624986 found to restore



As expected we got the dreaded ORA-01547, ORA-01194, ORA-01110 errors meaning that we don’t have enough redo to make the recovery successful.


Attempting a Recovery

Now the crux of the situation. We’re stuck with the common inconsistency error which most seasoned DBAs should be familiar with:

Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/NONCDB/datafile/o1_mf_system_bh914cx2_.dbf'

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 03/02/2015 08:44:21
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 68 and starting SCN of 624986 found to restore


And of course we need to be absolutely positive that we don’t have the missing redo somewhere.  For example in an RMAN backup piece on disk or on tape somewhere from an archive log backup that can be restored.  Or possibly still in one of the current online redo logs.  DBAs should explore all possible options for retrieving the missing redo vectors in some form or another before proceeding.

However, if we’re absolutely certain of the following we can continue:

  1. We definitely can’t find the missing redo anywhere.
  2. We absolutely need to extract data from prior to the start of the online backup.
  3. Our data definitely wasn’t modified during the online backup.


The natural thing to check first when trying to open the database after an incomplete recovery is the fuzziness and PIT (Point In Time) of the datafiles from SQLPlus:

SQL> select fuzzy, status, checkpoint_change#,
  2         to_char(checkpoint_time, 'DD-MON-YYYY HH24:MI:SS') as checkpoint_time,
  3         count(*)
  4    from v$datafile_header
  5   group by fuzzy, status, checkpoint_change#, checkpoint_time
  6   order by fuzzy, status, checkpoint_change#, checkpoint_time;

----- ------- ------------------ -------------------- ----------
NO    ONLINE              647929 26-FEB-2015 16:58:14          1
YES   ONLINE              551709 26-FEB-2015 15:59:43          4



The fact that there are two rows returned and that not all files have FUZZY=NO indicates that we have a problem and that more redo is required before the database can be opened with the RESETLOGS option.

But our problem is that we don’t have that redo and we’re desperate to open our database anyway.


Recovering without Consistency

Again, recovering without consistency is not supported and should only be attempted as a last resort.

Opening the database with the data in an inconsistent state is actually pretty simple.  We simply need to set the “_allow_resetlogs_corruption” hidden initialization parameter and set the undo management to “manual” temporarily:

SQL> alter system set "_allow_resetlogs_corruption"=true scope=spfile;

System altered.

SQL> alter system set undo_management='MANUAL' scope=spfile;

System altered.

SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.


Now, will the database open? The answer is still: “probably not”.  Giving it a try we get:

SQL> alter database open resetlogs;
alter database open resetlogs
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2663], [0], [551715], [0], [562781], [], [], [], [], [], [], []
Process ID: 4538
Session ID: 237 Serial number: 5621



Doesn’t look good, right?  Actually the situation is not that bad.

To put it simply this ORA-00600 error means that a datafile has a recorded SCN that’s ahead of the database SCN.  The current database SCN is shown as the 3rd argument (in this case 551715) and the datafile SCN is shown as the 5th argument (in this case 562781).  Hence a difference of:

562781 - 551715 = 11066

In this example, that’s not too large of a gap.  But in a real system, the difference may be more significant.  Also if multiple datafiles are ahead of the current SCN you should expect to see multiple ORA-00600 errors.

The solution to this problem is quite simple: roll forward the current SCN until it exceeds the datafile SCN.  The database automatically generates a number of internal transactions on each startup hence the way to roll forward the database SCN is to simply perform repeated shutdowns and startups.  Depending on how big the gap is, it may be necessary to repeatedly shutdown abort and startup – the gap between the 5th and 3rd parameter to the ORA-00600 will decrease each time.  However eventually the gap will reduce to zero and the database will open:

SQL> connect / as sysdba
Connected to an idle instance.
SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
Database opened.


Now presumably we want to query or export the old data so the first thing we should do is switch back to automatic undo management using a new undo tablespace:

SQL> create undo tablespace UNDOTBS2 datafile size 50M;

Tablespace created.

SQL> alter system set undo_tablespace='UNDOTBS2' scope=spfile;

System altered.

SQL> alter system set undo_management='AUTO' scope=spfile;

System altered.

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2924832 bytes
Variable Size             503320288 bytes
Database Buffers         1056964608 bytes
Redo Buffers               13848576 bytes
Database mounted.
Database opened.


Finally the database is opened (although the data is inconsistent) and the “old” data can be queried:

SQL> select * from scott.parent;

        C1 C2
---------- ----------------
         1 OLD DATA 001
         2 OLD DATA 002
         3 NEW DATA 003

SQL> select * from scott.child;

        C1 C2
---------- ----------------
         1 OLD DETAILS A
         1 OLD DETAILS B
         1 OLD DETAILS C
         2 OLD DETAILS D



As we can see, all of the “old” data (rows that begin with “OLD”) that were from before the backup began (before t2) is available.  And only part of the data inserted during the backup (rows where C1=3) as would be expected – that’s our data inconsistency.

We’ve already seen that we can SELECT the “old” data.  We can also export it:

$ expdp scott/tiger dumpfile=DATA_PUMP_DIR:OLD_DATA.dmp nologfile=y

Export: Release - Production on Mon Mar 2 09:39:11 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 12c Enterprise Edition Release - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Starting "SCOTT"."SYS_EXPORT_SCHEMA_02":  scott/******** dumpfile=DATA_PUMP_DIR:OLD_DATA.dmp nologfile=y
Estimate in progress using BLOCKS method...
Total estimation using BLOCKS method: 640 KB
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/COMMENT
. . exported "SCOTT"."CHILD"                             5.570 KB       4 rows
. . exported "SCOTT"."PARENT"                            5.546 KB       3 rows
Master table "SCOTT"."SYS_EXPORT_SCHEMA_02" successfully loaded/unloaded
Dump file set for SCOTT.SYS_EXPORT_SCHEMA_02 is:
Job "SCOTT"."SYS_EXPORT_SCHEMA_02" successfully completed at Mon Mar 2 09:39:46 2015 elapsed 0 00:00:34



At this point we’ve either queried or extracted that critical old data which was the point of the exercise and we should immediately discard the restored database.  Remember it has data inconsistency which may include in internal tables an hence shouldn’t be used for anything beyond querying or extracting that “old” data.  Frequent crashes or other bizarre behavior of this restored database should be expected.  So get in, get the data, get out, and get rid of it!



If “desperate times call for desperate measures” and if you’re in that situation described in detail above where you need the data, are missing the necessary redo vectors, and are not concerned about the relevant data being modified during the backup then there options.

The “more redo needed for consistency” error stack should be familiar to most DBAs:

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent

And they may also be somewhat familiar with the “_allow_resetlogs_corruption” hidden initialization parameter.  However don’t let the resulting ORA-00600 error make the recovery attempt seem unsuccessful:

ORA-00600: internal error code, arguments: [2663], [0], [551715], [0], [562781], [], [], [], [], [], [], []

This error is overcome-able and the database likely can still be opened so the necessary data can be queried or extracted.

Note: this process has been tested with Oracle Database 10g, Oracle Database 11g, and Oracle Database 12c.

Categories: DBA Blogs

Log Buffer #412, A Carnival of the Vanities for DBAs

Fri, 2015-02-27 10:58

This Log Buffer Edition makes it way through the realms of Oracle, SQL Server and MySQL and brings you some of the blog posts.


Introducing Oracle Big Data Discovery Part 3: Data Exploration and Visualization


Base64 Encode / Decode with Python (or WebLogic Scripting Tool) by Frank Munz

Why I’m Excited About Oracle Integration Cloud Service – New Video

Reminder: Upgrade Database to by July 2015

SQL Server:

An article about how we underestimate the power of joins and degrade our query performance by not using proper joins

Most large organizations have implemented one or more big data applications. As more data accumulates internal users and analysts execute more reports and forecasts, which leads to additional queries and analysis, and more reporting.

How do you develop and deploy your database?

A database must be able to maintain and enforce the business rules and relationships in data in order to maintain the data model.

Error handling with try-catch-finally in PowerShell for SQL Server


MySQL Enterprise Monitor 3.0.20 has been released

MySQL Cluster 7.4 is GA!

Connector/Python 2.1.1 Alpha released with C Extension

Worrying about the ‘InnoDB: detected cycle in LRU for buffer pool (…)’ message?

MySQL Cluster 7.4 GA: 200 Million QPS, Active-Active Geographic Replication and more

Categories: DBA Blogs

Virtual CPUs with Google Compute Engine

Wed, 2015-02-25 12:12

Continuing on my series of virtual CPU behavior in Amazon Web Services, Amazon Web Service HVM instances, and Microsoft Azure, I’m taking a look at Google Compute Engine (GCE). GCE is a relative newcomer to the public cloud world, become generally available in December 2013. It does have some interesting features, including transparent maintenance through live migration, and automatic sustained-use discounts without upfront commitments.

Unlike Amazon or Microsoft, Google is very upfront about their vCPU definition.

For the n1 series of machine types, a virtual CPU is implemented as a single hyperthread on a 2.6GHz Intel Sandy Bridge Xeon or Intel Ivy Bridge Xeon (or newer) processor. This means that the n1-standard-2 machine type will see a whole physical core.

I still believe calling such a hyperthread a “virtual CPU” is misleading. When creating a virtual machine in a non-cloud VM platform, 1 virtual CPU = 1 physical core. Plain and simple. But when using a cloud platform, I need 2 virtual CPUs to get that same physical core.


Anyways, off to run some CPU tests. n1-standard-4 is a close match to the m3.xlarge instances previously tested, so I’ll try that.

Getting set up on Google Compute Engine

I already signed up with Google Compute Engine’s free trial and created a project I’m calling marc-cpu-test. Installing the gcloud compute command-line tools.

[marc@quartz ~]$ gcloud auth login --no-launch-browser
Go to the following link in your browser:

Enter verification code: (redacted)
Saved Application Default Credentials.

You are now logged in as [].
Your current project is [None].  You can change this setting by running:
  $ gcloud config set project PROJECT
[marc@quartz ~]$ gcloud config set project marc-cputest
[marc@quartz ~]$ gcloud config set compute/zone us-central1-a
[marc@quartz ~]$ gcloud compute instances create cpu-test-n4 --image centos-6 --machine-type "n1-standard-4" --zone us-central1-a
Created [].
cpu-test-n4 us-central1-a n1-standard-4 RUNNING
[marc@quartz ~]$ gcloud compute ssh cpu-test-n4
WARNING: You do not have an SSH key for Google Compute Engine.
WARNING: [/usr/bin/ssh-keygen] will be executed to generate a key.
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/marc/.ssh/google_compute_engine.
Your public key has been saved in /home/marc/.ssh/
The key fingerprint is:
Updated [].
Warning: Permanently added '' (RSA) to the list of known hosts.
Warning: Permanently added '' (RSA) to the list of known hosts.

OK, instance all set and connected. As a CentOS 6 image it doesn’t allow SSH root logins by default, so attempting to set up a gcloud environment as a root user will get you “permission denied” errors on SSH. Serves me right for trying to run these tools as root in the first place :-).

Looking around

Checking what they got us:

[marc@cpu-test-n4 ~]$ egrep '(processor|model name|cpu MHz|physical id|siblings|core id|cpu cores)' /proc/cpuinfo
processor       : 0
model name      : Intel(R) Xeon(R) CPU @ 2.60GHz
cpu MHz         : 2600.000
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
processor       : 1
model name      : Intel(R) Xeon(R) CPU @ 2.60GHz
cpu MHz         : 2600.000
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
processor       : 2
model name      : Intel(R) Xeon(R) CPU @ 2.60GHz
cpu MHz         : 2600.000
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
processor       : 3
model name      : Intel(R) Xeon(R) CPU @ 2.60GHz
cpu MHz         : 2600.000
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2

Google has redacted the exact CPU model numbers, but has clearly marked this as a 2-core system with core IDs 0 and 1.

The single-CPU case
[marc@cpu-test-n4 ~]$ taskset -pc 0 $$
pid 1558's current affinity list: 0-3
pid 1558's new affinity list: 0
[marc@cpu-test-n4 ~]$  dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null
2170552320 bytes (2.2 GB) copied, 14.3427 s, 151 MB/s
[marc@cpu-test-n4 ~]$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null & done
2170552320 bytes (2.2 GB) copied, 29.3081 s, 74.1 MB/s
2170552320 bytes (2.2 GB) copied, 29.3065 s, 74.1 MB/s

We get a nice boost in raw CPU numbers as compared to the 120 MB/s I saw in AWS. With two processes sharing this CPU, see a tiny bit less than half the throughput.

Sharing the cores (or trying to)
[marc@cpu-test-n4 ~]$ taskset -pc 0,1 $$
pid 1558's current affinity list: 0
pid 1558's new affinity list: 0,1
[marc@cpu-test-n4 ~]$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c &gt; /dev/null &amp; done
[1] 1803
[2] 1805
[marc@cpu-test-n4 ~]$ 2170552320 bytes (2.2 GB) copied, 14.6959 s, 148 MB/s
2170552320 bytes (2.2 GB) copied, 14.7139 s, 148 MB/s

This is interesting; we see almost full-speed throughput in spite of processors 0 and 1 said to have a shared core. With processors 0 and 2 the situation is the same.

[marc@cpu-test-n4 ~]$ taskset -pc 0,2 $$
pid 1558's current affinity list: 0,1
pid 1558's new affinity list: 0,2
[marc@cpu-test-n4 ~]$ for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c &gt; /dev/null &amp; done
[1] 1830
[2] 1833
[marc@cpu-test-n4 ~]$ 2170552320 bytes (2.2 GB) copied, 14.6683 s, 148 MB/s
2170552320 bytes (2.2 GB) copied, 14.6692 s, 148 MB/s

Is the CPU scheduler ignoring my taskset commands? Running mpstat 2-second samples during the test to see actual CPU usage:

[marc@cpu-test-n4 ~]$ mpstat -P ALL 2
06:08:44 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
06:08:46 PM  all   46.31    0.00    3.75    0.00    0.00    0.00    0.00    0.00   49.94
06:08:46 PM    0   93.50    0.00    6.50    0.00    0.00    0.00    0.00    0.00    0.00
06:08:46 PM    1   92.00    0.00    8.00    0.00    0.00    0.00    0.00    0.00    0.00
06:08:46 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
06:08:46 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
06:08:52 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
06:08:54 PM  all   46.75    0.00    3.25    0.00    0.00    0.00    0.00    0.00   50.00
06:08:54 PM    0   93.47    0.00    6.53    0.00    0.00    0.00    0.00    0.00    0.00
06:08:54 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
06:08:54 PM    2   93.50    0.00    6.50    0.00    0.00    0.00    0.00    0.00    0.00
06:08:54 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

So the taskset commands are working: when we ask for CPUs 0 and 1, we are getting them, but throughput shows that cores aren’t being shared. It means that the CPUs in the virtual machine are not statically bound to hardware threads as seen under AWS. I’d call it a win as it gets more consistent performance even if the guest operating system is forced to makes poor CPU scheduling decisions as in this case.

[marc@cpu-test-n4 ~]$ taskset -pc 0-3 $$
pid 1558's current affinity list: 0,2
pid 1558's new affinity list: 0-3
[marc@cpu-test-n4 ~]$ for i in {1..4}; do dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null & done
2170552320 bytes (2.2 GB) copied, 22.9823 s, 94.4 MB/s
2170552320 bytes (2.2 GB) copied, 22.9914 s, 94.4 MB/s
2170552320 bytes (2.2 GB) copied, 22.9915 s, 94.4 MB/s
2170552320 bytes (2.2 GB) copied, 23.1333 s, 93.8 MB/s

This is more the throughput we would expect with two shared cores.

Lessons learned

Over the course of these tests, I’ve discovered a few things:

  • Although they share virtual CPUs like competitors, Google is very upfront about this behavior.
  • Actual throughput for a simple gzip workload is excellent.
  • Google Compute Engine has an abstraction layer in front of CPUs that dynamically schedules tasks between CPU threads, in addition to the regular scheduler in the virtual machine. In my testing, it allocates tasks efficiently across CPU cores, even when the OS scheduler is configured suboptimally.
Categories: DBA Blogs

Exadata X5 – A Practical Point of View of the New Hardware and Licensing

Wed, 2015-02-25 12:10

Oracle recently announced its latest iteration of Exadata – X5-2. It includes a refresh of the hardware to the most recent Xeon® E5-2699 v3 CPUs. These new CPUs boost the total cores count in a full rack to 288. This is higher than the current 8 socket “big machine” version X4-8, which has only 240 cores.

But the most exciting part is the all flash version of Exadata. In the previous generation – X4 – Oracle had to switch from 15K drives to 10K drives in order to boost capacity from 600 GB to 1200 GB per hard drive to keep disk space higher than flash cache size. At that time of X4 announcements, we were already wondering why Oracle was still offering high-speed disks and not switching to all flash, and now we know why. Because that type of high-performance flash wasn’t quite ready.

Maintaining high IO rates over long periods of times needed some changes to the ILOM in order to maintain cooling fans speed based on many individual temperature sensors inside the flash cards (details). Removing the SAS controller and using the new NVMe connectivity resulted in much higher bandwidth per hard drive – 3.2 GBytes/sec vs. the old 1.2 GBytes/sec SAS.

With temperature and bandwidth sorted out, we now have a super-high performance option (EF – Extreme Flash) for Exadata which delivers the stunning 263 GB/sec uncompressed scan speed in a full rack. The difference in performance between the High Capacity and High Performance EF flash option is now much higher. The high-performance option in Exadata X5 is now viable. In Exadata X4 it made so little difference, that it was pointless.

x4 vs x5

The one thing I wonder with the X5 announcement is why the X5-2 storage server still uses the very old and quite outdated 8 core CPUs. I’ve seen many cases where a Smart Scan on an HCC table is CPU bound on the storage server even when reading from spinning disk. I am going to guess that there’s some old CPU inventory to cleanup. But that may not end up being such a problem (see “all columnar” flash cache feature).

But above all, the most important change was the incremental licensing option. With 36 cores per server, even the 1/8th rack configuration was in the multi-million dollars in licenses, and in many cases was too much for the problem in hand.

The new smallest configuration is:

  • 1/8th rack, with 2 compute nodes
  • 8 cores enabled per compute node (16 total)
  • 256 GB RAM per node (upgradable to 768 GB per node)
  • 3 storage servers with only half the cores, disks and flash enabled

Then you can license additional cores as you need them, 2 cores at a time. Similar to how ODA licensing option worked. You cannot reduce licensed cores.

The licensing rules changes go even further. Now you can mix & match compute and storage servers to create even more extreme options. Some non-standard examples:

  • Extreme Memory – more compute nodes with max RAM, reduced licensed cores
  • Extreme Storage – replace compute node with storage nodes, reduced licensed cores

x5 custom
Link to video

In conclusion, Oracle Exadata X5 configuration options and the changes it brings to licensing allows an architect to craft a system that will meet any need and allow for easy, small step increments in the future, potentially without any hardware changes.

There are many more exciting changes in Oracle 12c, Exadata X5 and the new storage server software which I may cover in the future as I explore them in detail.

Categories: DBA Blogs