Skip navigation.

Pythian Group

Syndicate content
Official Pythian Blog - Love Your Data
Updated: 12 hours 47 min ago

TechTalk v5.0 – The Age of Big Data with Alex Morrise

Mon, 2014-07-28 14:06

Who: Hosted by Blackbird, with a speaking session by Alex Morrise, Chief Data Scientist at Pythian.

What: TechTalk presentation, beer, wine, snacks and Q&A

Where: Blackbird HQ – 712 Tehama Street (corner of 8th and Tehama) San Francisco, CA

When: Thursday July 31, 2014 from 6:00-8:00 PM

How: RSVP here!

TechTalk v5.0 welcomes to the stage, Alex Morrise, Chief Data Scientist at Pythian. Alex previously worked with Idle Games, Quid, and most recently Beats Music where he led the development of an adaptive, contextual music recommendation server.  Alex earned a PhD in Theoretical Physics from UC Santa Cruz.

This edition of TechTalk will be based on how the age of big data allows statistical inference on an unprecedented scale. Inference is the process of extracting knowledge from data, many times uncovering latent variables unifying seemingly diverse pieces of information. As data grows in complexity and dimension, visualization becomes increasingly difficult. How do we represent complex data to discover implicit and explicit relationships? We discuss how to Visualize Inference in some interesting data sets that uncover topics as diverse as the growth of technology, social gaming, and music.

You won’t want to miss this event, so be sure to RSVP.

 

Categories: DBA Blogs

Unexpected Shutdown Caused by ASR

Mon, 2014-07-28 13:45

In past few days I had two incidents and an outage, for just a few minutes. However, outage in a production environment is related to cost relatively and strictly. The server that had outage was because of failing over and then failing back about 4 to 5 times in 15 minutes. I was holding pager, and was then involved in investigating root cause for this fail-over and failed-back. Looking at the events in SQL Server error logs did not give me any clue towards what was happening, or why so I looked at the Windows Event View’s System log. I thought, “Maybe I have something there!”

There were two events that came to my attention:

Event Type:        Error

Event Source:    EventLog

Event Category:                None

Event ID:              6008

Date:                     7/24/2014

Time:                     1:14:12 AM

User:                     N/A

Computer:          SRV1

Description:

The previous system shutdown at 1:00:31 AM on 7/24/2014 was unexpected.

 

Event Type:        Information

Event Source:    Server Agents

Event Category:                Events

Event ID:              1090

Date:                     7/24/2014

Time:                     1:15:16 AM

User:                     N/A

Computer:          SRV1

Description:

System Information Agent: Health: The server is operational again.  The server has previously been shutdown by the Automatic Server Recovery (ASR) feature and has just become operational again.

 

 

The errors are closely related to the feature called Automatic Server Recovery (ASR) which is mainly configured with the server, and comes with the hardware. In our case, HP Blade, ProLiant server. There has been some resources/threads already discussed around similar topic. Most of the hardware vendor has somewhat similar software with similar functionality made available for servers.

In my case, my understanding was that maybe firmware are out of date and requiring updating, or the servers are aged. Further, I have sent my findings to customer with an incident report.  In a couple of hours, I had a reply and the feedback I received was just what I was expecting, the hardware was aged.  This may be the case with you when you see a message in event viewer which reads like “System Information Agent: Health: The server is operational again.  The server has previously been shutdown by the Automatic Server Recovery (ASR) feature and has just become operational again.”  Go check with your system administrator. The root cause of this unexepcted shutdown may not be related or caused by the SQL Server, rather, the system itself.  Please keep in mind that this could be one of the reasons, and certainly not the only.

References:

Automatic System Recovery

 

Categories: DBA Blogs

Logging for Slackers

Mon, 2014-07-28 07:41

When I’m not working on Big Data infrastructure for clients, I develop a few internal web applications and side projects. It’s very satisfying to write a Django app in an afternoon and throw it on Heroku, but there comes a time when people actually start to use it. They find bugs, they complain about downtime, and suddenly your little side project needs some logging and monitoring infrastructure. To be clear, the right way to do this would be to subscribe to a SaaS logging platform, or to create some solution with ElasticSearch and Kibana, or just use Splunk. Today I was feeling lazy, and I wondered if there wasn’t an easier way.

Enter Slack

Slack is a chat platform my team already uses to communicate – we have channels for different purposes, and people subscribe to keep up to date about Data Science, our internal Hadoop cluster, or a bunch of other topics. I already get notifications on my desktop and my phone, and the history of messages is visible and searchable for everyone in a channel. This sounds like the ideal lazy log repository.

Slack offers a rich REST API where you can search, work with files, and communicate in channels. They also offer an awesome (for the lazy) Incoming WebHooks feature – this allows you to POST a JSON message with a secret token, which is posted to a pre-configured channel as a user you can configure in the web UI. The hardest part of setting up a new WebHook was choosing which emoji would best represent application errors – I chose a very sad smiley face, but the devil is also available.

The Kludge

Django already offers the AdminEmailHandler, which emails log messages to the admins listed in your project. I could have created a mailing list, added it to the admins list, and let people subscribe. They could then create a filter in their email to label the log messages. That sounds like a lot of work, and there wouldn’t be a history of the messages except in individual recipients’ inboxes.

Instead, I whipped up this log handler for Django which will post the message (and a stack trace, if possible) to your Slack endpoint:

from logging import Handler
import requests, json, traceback
class SlackLogHandler(Handler):
   def __init__(self, logging_url="", stack_trace=False):
      Handler.__init__(self)
      self.logging_url = logging_url
      self.stack_trace = stack_trace
   def emit(self, record):
      message = '%s' % (record.getMessage())
      if self.stack_trace:
         if record.exc_info:
            message += '\n'.join(traceback.format_exception(*record.exc_info))
            requests.post(self.logging_url, data=json.dumps({"text":message} ))

There you go: install the requests library, generate an Incoming WebHook URL at api.slack.com, stick the SlackLogHandler in your Django logging configuration, and your errors will be logged to the Slack channel of your choice. Stack traces are optional – I’ve also been using this to post hourly reports of active users, etc. to the channel under a difference username.

For reference, here’s a log configuration for the Django settings.py. Now go write some code, you slacker.

LOGGING = {
    'version':1,
    'disable_existing_loggers':False,
    'handlers': {
        'console': {
            'level':'DEBUG',
            'class':'logging.StreamHandler',
        },
        'slack-error': {
            'level':'ERROR',
            'class':'SlackLogHandler.SlackLogHandler',
            'logging_url':'<SuperSecretWebHookURL>',
            'stack_trace':True
        }
    }
    'loggers': {
        'django': {
            'level': 'INFO',
            'handlers': ['console', 'slack-error']
        }
    }
}
Categories: DBA Blogs

Log Buffer #381, A Carnival of the Vanities for DBAs

Fri, 2014-07-25 08:22

Thy rhythm of blog posts regarding database technology has remained consistent throughout the week. Few of those posts have been plucked by this Log Buffer Edition for your pleasure.

Oracle:

Sayan has shared a Standalone sqlplus script for plans comparing.

Gartner Analysis: PeopleSoft Update Manager Delivers Significant Improvements to the Upgrade Tools and Processes.

Timely blackouts, of course, are essential to keeping the numbers up and (more importantly) preventing Target Down notifications from being sent out.

Are you experiencing analytics pain points?

Bug with xmltable, xmlnamespaces and xquery_string specified using bind variable.

SQL Server:

SQL Server 2012 introduced columnstore indexes, which can immensely improve the performance of OLAP queries.

Restoring the SQL Server Master Database Even Without a Backup .

There times when you need to write T-SQL code that creates specific T-SQL Code and executes it. When you do this you are creating dynamic T-SQL code.

A lot of numbers that we use everyday such as Bank Card numbers, Identification numbers, and ISBN codes, have check digits.

SQL-only ETL using a bulk insert into a temporary table (SQL Spackle).

MySQL:

How MariaDB makes Stored Procedures usable.

DBaaS, OpenStack and Trove 101: Introduction to the basics.

MySQL Fabric is a tool included on MySQL Utilities that helps you to manage your MySQL instances.

Showing all available MySQL data types when creating a new table with MySQL for Excel.

Why TokuDB hates Transparent HugePages.

Categories: DBA Blogs

Happy System Administrator Appreciation Day

Fri, 2014-07-25 07:59

Today is our day. July 25, 2014 marks the 15th annual System Administrator Appreciation Day. On this day we pause and take a moment to forget the impossible tasks, nonexistent budgets, and often unrealistic timelines to say thank you to those people who keeps everything working — system administrators.

So much of what has become a part of everyday life, from doing our jobs, to playing games online, shopping, and connecting with friends and family around the world is only possible due in large part to the tireless efforts of the system administrators who are in the trenches every hour of every day of the year keeping the tubes clear and the packets flowing. The fact that technology has become so common place in our lives, and more often than not “just works” has afforded us the luxury of forgetting (or not evening knowing) the immense infrastructure complexity which the system administrator works with to deliver the services we have come to rely on.

SysAdmin Appreciation Day started 15 years ago thanks to Ted Kekatos. According to Wikipedia, “Kekatos was inspired to create the special day by a Hewlett-Packard magazine advertisement in which a system administrator is presented with flowers and fruit-baskets by grateful co-workers as thanks for installing new printers. Kekatos had just installed several of the same model printers at his workplace.” Ever since then, SysAdmin Appreciation Day has been celebrated on the last Friday in July.

At Pythian, I have the privilege of being part of the Enterprise Infrastructure Services group.  We are a SysAdmin dream team of the best of the best, from around the globe. Day in and day out, our team is responsible for countless servers, networks, and services that millions of people use every day.

To all my colleagues and to anyone who considers themselves a SysAdmin, regardless of which flavour – thank you, and know that you are truly doing work that matters.

Categories: DBA Blogs

Exploring Options of Using RMAN Configure to Simplify Backup

Thu, 2014-07-24 14:06

I am a simple person who likes simple things, especially RMAN backup implementation.

I have yet to understand why RMAN backup implementation does not use configure command, and if you have a good explanation, please share.

Examples for RMAN configure command

configure device type disk parallelism 2 backup type to compressed backupset;
configure channel device type disk format '/oradata/backup/%d_%I_%T_%U' maxopenfiles 1;
configure channel 1 device type disk format '/oradata/backup1/%d_%I_%T_%U' maxopenfiles 1;
configure archivelog deletion policy to backed up 2 times to disk;
configure backup optimization on;

Do you know if backup is using parallelism?
Where is the backup to?
Is the backup to tape?

RMAN> show all;

RMAN configuration parameters for database with db_unique_name SAN are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/oradata/backup/%d_%F.ctl';
CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT   '/oradata/backup/%d_%I_%T_%U' MAXOPENFILES 1;
CONFIGURE CHANNEL 1 DEVICE TYPE DISK FORMAT   '/oradata/backup1/%d_%I_%T_%U' MAXOPENFILES 1;
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 2 TIMES TO DISK;
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_san.f'; # default

RMAN>

Simple RMAN script.

set echo on;
connect target;
show all;
backup incremental level 0 check logical database filesperset 1 tag "fulldb"
plus archivelog filesperset 8 tag "archivelog";

Simple RMAN run.

$ rman @simple.rman

Recovery Manager: Release 11.2.0.4.0 - Production on Thu Jul 24 11:12:19 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

RMAN> set echo on;
2> connect target;
3> show all;
4> backup incremental level 0 check logical database filesperset 1 tag "fulldb"
5> plus archivelog filesperset 8 tag "archivelog";
6>
echo set on

connected to target database: SAN (DBID=2792912513)

using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name SAN are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/oradata/backup/%d_%F.ctl';
CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT   '/oradata/backup/%d_%I_%T_%U' MAXOPENFILES 1;
CONFIGURE CHANNEL 1 DEVICE TYPE DISK FORMAT   '/oradata/backup1/%d_%I_%T_%U' MAXOPENFILES 1;
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 2 TIMES TO DISK;
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_san.f'; # default


Starting backup at 2014-JUL-24 11:12:21
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=20 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=108 device type=DISK
channel ORA_DISK_1: starting compressed archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=326 RECID=337 STAMP=853758742
channel ORA_DISK_1: starting piece 1 at 2014-JUL-24 11:12:24
channel ORA_DISK_1: finished piece 1 at 2014-JUL-24 11:12:25
piece handle=/oradata/backup1/SAN_2792912513_20140724_8dpe6koo_1_1 tag=ARCHIVELOG comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 2014-JUL-24 11:12:25

Starting backup at 2014-JUL-24 11:12:25
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting compressed incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/oradata/SAN/datafile/o1_mf_undotbs1_9oqwsjk6_.dbf
channel ORA_DISK_1: starting piece 1 at 2014-JUL-24 11:12:26
channel ORA_DISK_2: starting compressed incremental level 0 datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00008 name=/oradata/SAN/datafile/o1_mf_user_dat_9wvp8s78_.dbf
channel ORA_DISK_2: starting piece 1 at 2014-JUL-24 11:12:26
channel ORA_DISK_1: finished piece 1 at 2014-JUL-24 11:13:01
piece handle=/oradata/backup1/SAN_2792912513_20140724_8epe6koq_1_1 tag=FULLDB comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:35
channel ORA_DISK_1: starting compressed incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/oradata/SAN/datafile/o1_mf_system_9oqwr5tm_.dbf
channel ORA_DISK_1: starting piece 1 at 2014-JUL-24 11:13:04
channel ORA_DISK_1: finished piece 1 at 2014-JUL-24 11:13:29
piece handle=/oradata/backup1/SAN_2792912513_20140724_8gpe6kpu_1_1 tag=FULLDB comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25
channel ORA_DISK_1: starting compressed incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=/oradata/SAN/datafile/o1_mf_sysaux_9oqwrv2b_.dbf
channel ORA_DISK_1: starting piece 1 at 2014-JUL-24 11:13:30
channel ORA_DISK_1: finished piece 1 at 2014-JUL-24 11:13:45
piece handle=/oradata/backup1/SAN_2792912513_20140724_8hpe6kqp_1_1 tag=FULLDB comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting compressed incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00005 name=/oradata/SAN/datafile/o1_mf_ggs_data_9or2h3tw_.dbf
channel ORA_DISK_1: starting piece 1 at 2014-JUL-24 11:13:45
channel ORA_DISK_1: finished piece 1 at 2014-JUL-24 11:13:48
piece handle=/oradata/backup1/SAN_2792912513_20140724_8ipe6kr9_1_1 tag=FULLDB comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting compressed incremental level 0 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00006 name=/oradata/SAN/datafile/o1_mf_testing_9rgp1q31_.dbf
channel ORA_DISK_1: starting piece 1 at 2014-JUL-24 11:13:49
channel ORA_DISK_1: finished piece 1 at 2014-JUL-24 11:13:52
piece handle=/oradata/backup1/SAN_2792912513_20140724_8jpe6krc_1_1 tag=FULLDB comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_2: finished piece 1 at 2014-JUL-24 11:14:44
piece handle=/oradata/backup/SAN_2792912513_20140724_8fpe6koq_1_1 tag=FULLDB comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:02:18
Finished backup at 2014-JUL-24 11:14:44

Starting backup at 2014-JUL-24 11:14:44
current log archived
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting compressed archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=327 RECID=338 STAMP=853758885
channel ORA_DISK_1: starting piece 1 at 2014-JUL-24 11:14:46
channel ORA_DISK_1: finished piece 1 at 2014-JUL-24 11:14:47
piece handle=/oradata/backup1/SAN_2792912513_20140724_8kpe6kt6_1_1 tag=ARCHIVELOG comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 2014-JUL-24 11:14:47

Starting Control File Autobackup at 2014-JUL-24 11:14:48
piece handle=/oradata/backup/SAN_c-2792912513-20140724-05.ctl comment=NONE
Finished Control File Autobackup at 2014-JUL-24 11:14:55

Recovery Manager complete.

-----

$ ls -l backup*
backup:
total 501172
-rw-r-----. 1 oracle oinstall 505167872 Jul 24 11:14 SAN_2792912513_20140724_8fpe6koq_1_1
-rw-r-----. 1 oracle oinstall   8028160 Jul 24 11:14 SAN_c-2792912513-20140724-05.ctl

backup1:
total 77108
-rw-r-----. 1 oracle oinstall   237056 Jul 24 11:12 SAN_2792912513_20140724_8dpe6koo_1_1
-rw-r-----. 1 oracle oinstall  1236992 Jul 24 11:12 SAN_2792912513_20140724_8epe6koq_1_1
-rw-r-----. 1 oracle oinstall 39452672 Jul 24 11:13 SAN_2792912513_20140724_8gpe6kpu_1_1
-rw-r-----. 1 oracle oinstall 34349056 Jul 24 11:13 SAN_2792912513_20140724_8hpe6kqp_1_1
-rw-r-----. 1 oracle oinstall  2539520 Jul 24 11:13 SAN_2792912513_20140724_8ipe6kr9_1_1
-rw-r-----. 1 oracle oinstall  1073152 Jul 24 11:13 SAN_2792912513_20140724_8jpe6krc_1_1
-rw-r-----. 1 oracle oinstall    67072 Jul 24 11:14 SAN_2792912513_20140724_8kpe6kt6_1_1

If this does not hit the nail on the head, then I don’t know what will.

Imagine someone, maybe me or yourself, deleting archivelog accidentally.

RMAN> delete noprompt archivelog all;

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=108 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=20 device type=DISK
RMAN-08138: WARNING: archived log not deleted - must create more backups
archived log file name=/oradata/SAN/archivelog/arc_845895297_1_326.dbf thread=1 sequence=326
RMAN-08138: WARNING: archived log not deleted - must create more backups
archived log file name=/oradata/SAN/archivelog/arc_845895297_1_327.dbf thread=1 sequence=327

RMAN>

-----

RMAN> configure archivelog deletion policy to none;

old RMAN configuration parameters:
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 2 TIMES TO DISK;
new RMAN configuration parameters:
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE;
new RMAN configuration parameters are successfully stored

RMAN> delete noprompt archivelog all;

released channel: ORA_DISK_1
released channel: ORA_DISK_2
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=108 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=20 device type=DISK
List of Archived Log Copies for database with db_unique_name SAN
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - --------------------
337     1    326     A 2014-JUL-24 11:04:17
        Name: /oradata/SAN/archivelog/arc_845895297_1_326.dbf

338     1    327     A 2014-JUL-24 11:12:21
        Name: /oradata/SAN/archivelog/arc_845895297_1_327.dbf

deleted archived log
archived log file name=/oradata/SAN/archivelog/arc_845895297_1_326.dbf RECID=337 STAMP=853758742
deleted archived log
archived log file name=/oradata/SAN/archivelog/arc_845895297_1_327.dbf RECID=338 STAMP=853758885
Deleted 2 objects


RMAN>

Will you be using configure for your next RMAN implementation?

Categories: DBA Blogs

Using SaltStack for Configuration Management

Wed, 2014-07-23 12:40

In my last blog post I mentioned that SaltStack is a fully featured configuration management solution, but we never looked into using the tool in that way. Today we will begin to explore some basic examples of configuration management with SaltStack.  We will look at two aspects of configuration management, installing a package, and will manage a service.

The scenario

A great repeatable task which can be automated with configuration management, and one which is faced by many systems administrators is having to add more capacity to an existing front end webserver pool.

Without a configuration management solution, you generally have to rely on an install document that is maintained by your systems administration team. One of those admins gets the job of preparing the new box, and follows the steps in that document to install all of the required packages and configure all of the required services to make that box a “webserver”

This method introduces a high potential for human error. The person following the document might miss step #17 on page 3, and you end up with a webserver in the pool that delivers content to your users in a strange and inconsistent way.  Depending on the maturity of your infrastructure, you also may or may not have the tools in place to even identify that the webserver is acting strangely due to this misconfiguration until clients begin to complain that your service delivers an unreliable experience.

From a resourcing point of view, this task can tie up 2 resources. The person doing the box install, and a second person you need to “QA” the box after the install is done to catch the fact that the first person missed step #17 on page 3.

Using a configuration management tool you define what your box should look like (a model) at a higher, abstracted level and the tool knows what is required to bring the server in line with it’s desired state. The tool does not need to be told that on a RedHat based system you use “yum” to install a package and on Debian systems you use “apt” as the operator you just say that the systems needs to have the package and the tool takes it from there.

By modelling your systems the tool can then provide accurate repeatability of the task of bringing your systems into line with the defined specifications of the model. And while this does shift the responsibility of eliminating any human error within the model itself, once it has been tested and validated the result is that each subsequent execution will be done programmatically without error.

Using SaltStack to install a package and manage a service

The first thing that we will need to do is tell the salt master that we would like to start using it for configuration management. We do this by uncommenting, or adding the following to our /etc/salt/master config:


file_roots:

base:
- /srv/salt

in the /srv directory as root make a “salt” subdir.

mkdir -p /srv/salt

Everything else, from this point forward will be written under the assumption that you are working in the /srv/salt dir.

Salt formulas

In SALT the set of instructions, or “model” that you define is known as a formula. Salt uses PyYALM as it’s configuration syntax. The first thing that we need to defile a base formula called “top.sls”


base:

'*':
- motd
'web*':
- apache
- webserver

This tells salt that all boxes should have the motd formula and that minions with hostnames starting with “web” should also get the apache formula.

Our Apache formula (apache.sls) is very basic for the purposes of this post:


httpd:

pkg:
- installed
service:
- running
- require:
- pkg: httpd

This tells the minion that it needs to install the package named httpd (remember the minion knows how to do this) and that the service should be running and that the service has a dependency on the package being installed. That is to say, you can’t manage the service unless the package that provides that server also is there.

When we apply the formula you can see that the minion receives the instruction. The minion installs the package and it’s dependant packages. Then it starts the service.


[root@ip-10-0-0-170 salt]# salt '*' state.sls apache

ip-10-0-0-171.ec2.internal:
----------
ID: httpd
Function: pkg.installed
Result: True
Comment: The following packages were installed/updated: httpd.
Changes:
----------
apr:
----------
new:
1.5.0-2.11.amzn1
old:

apr-util:
----------
new:
1.4.1-4.14.amzn1
old:

apr-util-ldap:
----------
new:
1.4.1-4.14.amzn1
old:

httpd:
----------
new:
2.2.27-1.2.amzn1
old:

httpd-tools:
----------
new:
2.2.27-1.2.amzn1
old:

mailcap:
----------
new:
2.1.31-2.7.amzn1
old:

----------
ID: httpd
Function: service.running
Result: True
Comment: Started Service httpd
Changes:
----------
httpd:
True

Summary
------------
Succeeded: 2
Failed: 0
------------
Total: 2

On subsequent runs, you can see that the package is already installed and the service is already running.


[root@ip-10-0-0-170 salt]# salt '*' state.sls apache

ip-10-0-0-171.ec2.internal:
----------
ID: httpd
Function: pkg.installed
Result: True
Comment: Package httpd is already installed
Changes:
----------
ID: httpd
Function: service.running
Result: True
Comment: The service httpd is already running
Changes:

Summary
------------
Succeeded: 2
Failed: 0
------------
Total: 2

If either was not true, if I were to go onto the box and stop the service:


[root@ip-10-0-0-171 ~]# service httpd stop

Stopping httpd: [ OK ]
[root@ip-10-0-0-171 ~]#

The next salt run would start the service again bringing the box back into compliance with my defined model.


ip-10-0-0-171.ec2.internal:

----------
ID: httpd
Function: pkg.installed
Result: True
Comment: Package httpd is already installed
Changes:
----------
ID: httpd
Function: service.running
Result: True
Comment: Started Service httpd
Changes:
----------
httpd:
True

Summary
------------
Succeeded: 2
Failed: 0
------------
Total: 2
[root@ip-10-0-0-171 ~]# service httpd status
httpd (pid 2493) is running...
[root@ip-10-0-0-171 ~]#

This becomes a powerful auditing tool which can allow you to quickly ensure that all boxes of a specific type match each other, and eliminates the above mentioned problem of missing step #17 on page 3 of your install doc.  With the heavy lifting of this task moved from human operators to the tool, and knowing that each node will be built identical to the others you can now scale up much quicker in response to your changing business needs, a task which previously could take a few days is now done in minutes.

 

Categories: DBA Blogs

How To Correlate Oracle Database Transaction with GoldenGate

Mon, 2014-07-21 13:19

So there I was troubleshooting GoldenGate issue and was puzzled as to why GoldenGate transactions were not seen from Oracle database.

I had the transaction XID correct; however, I was filtering by ACTIVE transaction from Oracle which was causing the issue.

Please allow me to share a test case so that you don’t get stumped like I did.

Identify current log and update table

ARROW:(SOE@san):PRIMARY> select max(sequence#)+1 from v$log_history;

MAX(SEQUENCE#)+1
----------------
             196

ARROW:(SOE@san):PRIMARY> update INVENTORIES set QUANTITY_ON_HAND=QUANTITY_ON_HAND-10 where PRODUCT_ID=171 and WAREHOUSE_ID=560;

1 row updated.

ARROW:(SOE@san):PRIMARY>

From GoldenGate, find opened transactions for duration of 10 minutes

$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.21 18343248 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_140404.1029_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr  4 2014 15:18:36

Copyright (C) 1995, 2014, Oracle and/or its affiliates. All rights reserved.



GGSCI (arrow.localdomain) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     ESAN        00:00:00      00:00:05
EXTRACT     STOPPED     PSAN_LAS    00:00:00      68:02:14
REPLICAT    STOPPED     RLAS_SAN    00:00:00      68:02:12


GGSCI (arrow.localdomain) 2> send esan, status

Sending STATUS request to EXTRACT ESAN ...


EXTRACT ESAN (PID 2556)
  Current status: Recovery complete: At EOF

  Current read position:
  Redo thread #: 1
  Sequence #: 196
  RBA: 5861376
  Timestamp: 2014-07-21 10:52:59.000000
  SCN: 0.1653210
  Current write position:
  Sequence #: 7
  RBA: 1130
  Timestamp: 2014-07-21 10:52:52.621948
  Extract Trail: /u01/app/ggs01/dirdat/ss



GGSCI (arrow.localdomain) 3> send esan, showtrans duration 10m

Sending showtrans request to EXTRACT ESAN ...


Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 196, RBA 4955152

------------------------------------------------------------
XID:                  3.29.673
Items:                1
Extract:              ESAN
Redo Thread:          1
Start Time:           2014-07-21:10:41:41
SCN:                  0.1652053 (1652053)
Redo Seq:             196
Redo RBA:             4955152
Status:               Running


GGSCI (arrow.localdomain) 4>

Note the Redo Seq: 196 matches the sequence when the update was performed from Oracle database.
Also, note XID: 3.29.673

Let’s find the transaction from the database an notice the XID matches between GoldenGate and Oracle database.

ARROW:(SYS@san):PRIMARY> @trans.sql

START_TIME           XID              STATUS          SID    SERIAL# USERNAME           STATUS   SCHEMANAME         SQLID              CHILD
-------------------- ---------------- -------- ---------- ---------- ------------------ -------- ------------------ ------------- ----------
07/21/14 10:41:39    3.29.673         INACTIVE        105          9 SOE                INACTIVE SOE                6cmmk52wfnr7r          0

ARROW:(SYS@san):PRIMARY> @xplan.sql
Enter value for sqlid: 6cmmk52wfnr7r
Enter value for child: 0
SQL_ID  6cmmk52wfnr7r, child number 0
-------------------------------------
update INVENTORIES set QUANTITY_ON_HAND=QUANTITY_ON_HAND-10 where
PRODUCT_ID=171 and WAREHOUSE_ID=560

Plan hash value: 2141863993

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |              |       |       |     3 (100)|          |
|   1 |  UPDATE            | INVENTORIES  |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| INVENTORY_PK |     1 |    14 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PRODUCT_ID"=171 AND "WAREHOUSE_ID"=560)


20 rows selected.

ARROW:(SYS@san):PRIMARY>

For fun, switched logfile and perform another update.

ARROW:(MDINH@san):PRIMARY> select max(sequence#)+1 from v$log_history;

MAX(SEQUENCE#)+1
----------------
             196

ARROW:(MDINH@san):PRIMARY> alter system switch logfile;

System altered.

ARROW:(MDINH@san):PRIMARY> /

System altered.

ARROW:(MDINH@san):PRIMARY> /

System altered.

ARROW:(MDINH@san):PRIMARY> /

System altered.

ARROW:(MDINH@san):PRIMARY> select max(sequence#)+1 from v$log_history;

MAX(SEQUENCE#)+1
----------------
             200

ARROW:(MDINH@san):PRIMARY> update SOE.INVENTORIES set QUANTITY_ON_HAND=QUANTITY_ON_HAND-10 where PRODUCT_ID=170;

883 rows updated.

ARROW:(MDINH@san):PRIMARY>

Check GoldenGate transactions to find 2 open transactions, one from Redo Seq: 196 and one from Redo Seq: 200

GGSCI (arrow.localdomain) 1> send esan, showtrans

Sending SHOWTRANS request to EXTRACT ESAN ...


Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 196, RBA 4955152

------------------------------------------------------------
XID:                  3.29.673
Items:                1
Extract:              ESAN
Redo Thread:          1
Start Time:           2014-07-21:10:41:41
SCN:                  0.1652053 (1652053)
Redo Seq:             196
Redo RBA:             4955152
Status:               Running


------------------------------------------------------------
XID:                  4.20.516
Items:                883
Extract:              ESAN
Redo Thread:          1
Start Time:           2014-07-21:11:03:20
SCN:                  0.1654314 (1654314)
Redo Seq:             200
Redo RBA:             5136
Status:               Running


GGSCI (arrow.localdomain) 2>

Let’s kill the transaction by SOE user.

ARROW:(SYS@san):PRIMARY> @trans.sql

START_TIME           XID              STATUS          SID    SERIAL# USERNAME           STATUS   SCHEMANAME         SQLID              CHILD
-------------------- ---------------- -------- ---------- ---------- ------------------ -------- ------------------ ------------- ----------
07/21/14 10:41:39    3.29.673         INACTIVE        105          9 SOE                INACTIVE SOE                6cmmk52wfnr7r          0
07/21/14 11:03:19    4.20.516         INACTIVE         18         53 MDINH              INACTIVE MDINH              a5qywm8993bqg          0

ARROW:(SYS@san):PRIMARY> @xplan.sql
Enter value for sqlid: a5qywm8993bqg
Enter value for child: 0
SQL_ID  a5qywm8993bqg, child number 0
-------------------------------------
update SOE.INVENTORIES set QUANTITY_ON_HAND=QUANTITY_ON_HAND-10 where
PRODUCT_ID=170

Plan hash value: 1060265186

------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT  |                |       |       |    28 (100)|          |
|   1 |  UPDATE           | INVENTORIES    |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| INV_PRODUCT_IX |   900 | 12600 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PRODUCT_ID"=170)


20 rows selected.

ARROW:(SYS@san):PRIMARY> alter system kill session '105,9' immediate;

System altered.

ARROW:(SYS@san):PRIMARY> @trans.sql

START_TIME           XID              STATUS          SID    SERIAL# USERNAME           STATUS   SCHEMANAME         SQLID              CHILD
-------------------- ---------------- -------- ---------- ---------- ------------------ -------- ------------------ ------------- ----------
07/21/14 11:03:19    4.20.516         INACTIVE         18         53 MDINH              INACTIVE MDINH              a5qywm8993bqg          0

ARROW:(SYS@san):PRIMARY>

Verify transaction from killed session is removed from GoldenGate

GGSCI (arrow.localdomain) 1> send esan, status

Sending STATUS request to EXTRACT ESAN ...


EXTRACT ESAN (PID 2556)
  Current status: Recovery complete: At EOF

  Current read position:
  Redo thread #: 1
  Sequence #: 200
  RBA: 464896
  Timestamp: 2014-07-21 11:06:40.000000
  SCN: 0.1654584
  Current write position:
  Sequence #: 7
  RBA: 1130
  Timestamp: 2014-07-21 11:06:37.435383
  Extract Trail: /u01/app/ggs01/dirdat/ss



GGSCI (arrow.localdomain) 2> send esan, showtrans

Sending SHOWTRANS request to EXTRACT ESAN ...


Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 200, RBA 5136

------------------------------------------------------------
XID:                  4.20.516
Items:                883
Extract:              ESAN
Redo Thread:          1
Start Time:           2014-07-21:11:03:20
SCN:                  0.1654314 (1654314)
Redo Seq:             200
Redo RBA:             5136
Status:               Running


GGSCI (arrow.localdomain) 3>

-- trans.sql
set lines 200 pages 1000
col xid for a16
col username for a18
col schemaname for a18
col osuser for a12
select t.start_time, t.xidusn||'.'||t.xidslot||'.'||t.xidsqn xid, s.status,
s.sid,s.serial#,s.username,s.status,s.schemaname,
decode(s.sql_id,null,s.prev_sql_id) sqlid, decode(s.sql_child_number,null,s.prev_child_number) child
from v$transaction t, v$session s
where s.saddr = t.ses_addr
order by t.start_time
;

 

Categories: DBA Blogs

Changing Failgroup of ASM Disks in Exadata

Mon, 2014-07-21 08:12

There was a discrepancy in the failgroups of couple of ASM disks in Exadata. In Exadata, the cell name corresponds to the failgroup name. But there were couple of disks with different failgroup names. Using the following plan to rectify the issue online without any downtime:


1) Check disks and their failgroup:

col name format a27
col path format a45

SQL> select path,failgroup,mount_status,mode_status,header_status,state from v$asm_disk order by failgroup, path;

o/100.100.00.000/DBFSDG_CD_09_mycellnet0    mycellNET0             CACHED  ONLINE  MEMBER      NORMAL
o/100.100.00.000/DATA_CD_08_mycellnet0      mycell_NET0             CACHED  ONLINE  MEMBER      NORMAL

2) Drop Disks:

ALTER DISKGROUP DATA DROP DISK  DATA_CD_08_mycellnet0 REBALANCE POWER 32;

3) Wait for rebalanacing to finish

select * from gv$asm_operation;

4) Add the disks to the correct failgroups

ALTER DISKGROUP DATA ADD failgroup mycellNET0 DISK ‘o/100.100.00.000/DATA_CD_08_mycellnet0′ rebalance power 32;

– Wait for rebalance to complete.

5) select * from v$asm_operation;

6) Verify the incorrect failgroup has gone

select name,path,failgroup from v$asm_disk  where failgroup in (‘mycell_NET0′) order by name;

select path,failgroup,mount_status,mode_status,header_status,state from v$asm_disk order by failgroup, path;

Categories: DBA Blogs

Small Files on MapR-FS

Mon, 2014-07-21 08:11

One of the well-known best practices for HDFS is to store data in few large files, rather than a large number of small ones. There are a few problems related to using many small files but the ultimate HDFS killer is that the memory consumption on the name node is proportional to the number of files stored in the cluster and it doesn’t scale well when that number increases rapidly.

MapR has its own implementation of the Hadoop filesystem (called MapR-FS) and one of its claims to fame is to scale and work well with small files. In practice, though, there are a few things you should do to ensure that the performance of your map-reduce jobs does not degrade when they are dealing with too many small files, and I’d like to cover some of those.

The problem

I stumbled upon this when investigating the performance of a job in production that was taking several hours to run on a 40-node cluster. The cluster had spare capacity but the job was progressing very slowly and using only 3 of the 40 available nodes.

When I looked into the data that was being processed by the active mappers, I noticed that vast majority of the splits being read by the mappers were in blocks that were replicated into the same 3 cluster nodes. There was a significant data distribution skew towards those 3 nodes and since the map-reduce tasks prefer to execute on nodes where the data is local, the rest of the cluster sat idle while those 3 nodes were IO bound and processing heavily.

MapR-FS architecture

Differently from HDFS, MapR-FS doesn’t have name nodes. The file metadata is distributed across different data nodes instead. This is the key for getting rid of the name node memory limitation of HDFS, and let MapR-FS handle a lot more files, small or large, than a HDFS cluster.

Files in MapR-FS have, by default, blocks of 256MB. Blocks are organised in logical structures called “containers”. When a new block is created it is automatically assigned to one existing container within the volume that contains that file. The container determines the replication factor (3 by default) and the nodes where the replicas will be physically stored. Containers are bound to a MapR volume and cannot span multiple volumes.

There’s also a special container in MapR-FS called a “name container”, which is where the volume namespace and file chunk locations are stored. Besides the metadata, the name container always stores the first 64KB of the file’s data.

Also, there’s only a single name container per MaprFS volume. So the metadata for all the files in a volume, along with the files’ first 64KB of data, will be all stored in the same name container. The larger the number of files in a volume, the more data this container will be replicating across the same 3 cluster nodes (by default).

So, if your data set is comprised of a very large number of small files (with sizes around 64KB or less) and is all in the sae volume, most of the data will be stored in the same 3 cluster nodes, regardless of the cluster size. Even if you had a very large cluster, whenever you ran a map-reduce job to process those files, the job’s tasks would be pretty much allocated on only 3 nodes of the cluster due to data locality. Those 3 nodes would be under heavy load while the rest of the cluster would sit idle.

Real impact

To give you an idea of the dimension of this problem, the first time I noticed this in production was due to a Hive query that was causing high load only in 3 nodes of 40-node cluster. The job took 5 hours to complete. When I looked into the problem I found that the table used by the Hive query had tens of thousands of very small files, many of them smaller than 64K, due to the way the data was being ingested.

We coalesced the table to combine all those small files into a much smaller number of bigger ones. The job ran again after that, without any changes, and completed in just 15 minutes!! To be completely fair, we also changed the table’s file format from SequenceFile to RCFile at the same time we coalesced the data, which certainly brought some additional performance improvements. But, from the 3-node contention I saw during the first job run, I’m fairly convinced that the main issue in this case was the data distribution skew due to the large amount of small files.

Best practices

This kind of problem is mitigated when large files are used, since only a small fraction of the data (everything below the 64KB mark) will be stored in the name container, with the rest distributed across other containers and, therefore, other nodes. We’ll also have a smaller number of files (for a similar data volume), which reduces the problem even more.

If your data is ingested in a way that creates many small files, plan to coalesce those files into larger ones on a regular basis. One good tool for that is Edward Capriolo’s File Crusher. This is also (and especially) applicable to HDFS.

Best practice #1: Keep you data stored into large files. Pay special attention to incremental ingestion pipelines, which may create many small files, and coalesce them on a regular basis.

A quick and dirty workaround for the 3-node contention issue explained above would be to increase the replication factor for the name container. This would allow more nodes to run map-reduce tasks on that data. However, it would also use a lot more disk space just to achieve the additional data locality across a larger number of nodes. This is NOT an approach I would recommend to solve this particular problem.

Instead, the proper way to solve this in Mapr-FS is to split your data across different volumes. Especially if you’re dealing with a large number of small files that cannot be coalesced, splitting them across multiple volumes will keep the number of files per volume (and per name container) under control and it will also spread the small files’ data evenly across the cluster, since each volume will have its own name container, replicate across a different set of nodes.

The volumes may, or may not, follow your data lifecycle, with monthly, weekly or even daily volumes, depending on the amount of data being ingested and files being created.

Best practice #2: Use Mapr-FS volumes to plan your data distribution and keep the number of files per volume under control.

References:
  1. MapR Architecture Guide
Categories: DBA Blogs

Cloudera Challenge 2014

Mon, 2014-07-21 08:09

Yesterday, Cloudera released the score reports for their Data Science Challenge 2014 and I was really ecstatic when I received mine with a “PASS” score! This was a real challenge for me and I had to put a LOT of effort into it, but it paid off in the end!

Note: I won’t bother you in this blog post with the technical details of my submission. This is just an account of how I managed to accomplish it. If you want the technical details, you can look here.

Once upon a time… I was a DBA

I first learned about the challenge last year, when Cloudera ran it for the first time. I was intrigued, but after reading more about it I realised I didn’t have what it would be required to complete the task successfully.

At the time I was already delving into the Hadoop world, even though I was still happily working as an Oracle DBA at Pythian. I had studied the basics and the not-so-basics of Hadoop, and the associated fauna and had just passed my first Hadoop certifications (CCDH and CCAH). However, there was (and is) still so much to learn! I knew that to take the challenge I would have to invest a lot more time into my studies.

“Data Science” was still a fuzzy buzzword for me. It still is, but at the time, I had no idea about what was behind it. I remember reading this blog post about how to become a data scientist. A quick look at the map in that post turned me off: apart from the “Fundamentals” track in it, I had barely idea what the rest of the map was about! There was a lot of work to do to get there.

There’s no free lunch

But as I started reading more about Data Science, I started to realise how exciting it was and how interesting were the problems it could help tackle. By now I had already put my DBA career on hold and joined the Big Data team. I felt a huge gap between my expertise as a DBA and my skills as a Big Data engineer, so I put a lot of effort in studying the cool things I wanted to know more about.

The online courses at Coursera, Edx, Stanford and the like were a huge help and soon I started wading through courses and courses, sometime many at once: Scala, R, Python, more Scala, data analysis, machine learning, and more machine learning, etc… That was not easy and it was a steep learning curve for me. The more I read and studied I realised there was many times more to learn. And there still is…

The Medicare challenge

But when Cloudera announced the 2014 Challenge, early this year, I read the disclaimer and realised that this time I could understand it! Even though I had just scratched the surface of what Data Science is meant to encompass, I actually had tools to attempt tackling the challenge.

“Studies shall not stop!!!”, I soon found, as I had a lot more to learn to first pass the written exam (DS-200) and then tackle the problem proposed by the challenge: to detect fraudulent claims in the US Medicare system. It was a large undertaking but I took it one step at a time, and eventually managed to complete a coherent and comprehensive abstract to submit to Cloudera, which, as I gladly found yesterday, was good enough to give me a passing score and the “CCP: Data Scientist” certification from Cloudera!

I’m a (Big Data) Engineer

What’s next now? I have only one answer: Keep studying. There’s so much cool stuff to learn. From statistics (yes, statistics!) to machine learning, there’s still a lot I want to know about and that keeps driving me forward. I’m not turning into a Data Scientist, at least not for a while. I am an Engineer at heart; I like to fix and break things at work and Data Science is one more of those tools I want to have to make my job more interesting. But I want to know more about it and learn how to use it properly, at least to avoid my Data Scientist friends cringing away every time I tell tell I’m going to run an online logistic regression!

Categories: DBA Blogs

SQL Server 2014 Delayed Durability from an Application Perspective

Mon, 2014-07-21 08:06

The idea of this blog post is to describe what the delayed durability feature is in SQL Server 2014 and to describe a use case from an application development perspective.

With every new SQL Server release we get a bunch of new features and delayed durability of transactions really caught my attention. Most of the relational database engines are used to handle transactions with the write ahead log method(http://en.wikipedia.org/wiki/Write-ahead_logging), basically a transaction comes into the database, and in order to successfully commit a piece of information it will flush the pages from the memory, then write to the transaction log and finally to the datafile, always following a synchronous order, since the transaction log is pretty much a log of each transactions, recovery methods can even try to get the data from logs in case the data pages were never committed to the datafile, so as a summary this is a data protection method used to handle transactions, MSDN calls this a transaction with FULL DURABILITY.

So what is Delayed Transaction Durability?

To accomplish delayed durability in a transaction, asynchronous log writes happens from the buffers to the disk. Information is kept in memory until either the buffer is full or a flush takes place. This means instead of flushing from memory, then to log and then to datafile, the data will just wait in memory and the control of the transaction will be restored to the requestor app faster. If a transaction initially only hits memory and avoid going through the disk heads, it will for sure complete faster as well.

But when is the data really stored in disk?

SQL Server will handle this depending on how busy/full the memory is and will then execute asynchronous transactions to finally store the information in disk. You can always force this to happen with this stored procedure “sp_flush_log”.

Ok But there is a risk, right?

Correct, since the original data protection method is basically skipped, in the event of a system disruption such as SQL Server doing a failover or simply “unexpectedly shutting down”, some data can be lost in the so called LIMBO that is somewhere between the application pool and the network cable.

Why would I want to use this?

Microsoft recommends to use this feature only if you can tolerate a data loss, if you are experiencing a bottleneck or performance issue related to log writes or if your workload have a high contention rate(processes waiting for locks to be released.)

How do I Implement it?

To use delayed transactions you should enable this as a database property. You can used FORCED option which will try to handle all transactions as delayed durable, you can use ALLOWED which will let you use delayed durable transactions, which you then need to specify in your TSQL(this is called atomic block level control), see a sample taken from MSDN below:


CREATE PROCEDURE …
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS BEGIN ATOMIC WITH
(
DELAYED_DURABILITY = ON,
TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'English'

)
END

For more syntax and details I invite you to check the so full of wisdom MSDN Library.

Enough of the background information, and let’s take this puppy for a ride, shall we?

Consider the following scenario: You manage a huge application, probably some application between an ERP and a Finance module. The company has developed this application from scratch, each year more and more features are added in this app. The company decides that they want to standardize procedures and want to have more control over the events of the application. They realize they do not have enough audit traces, if someone deletes data, if a new deal or customer information is inserted, management needs to have a track record of almost anything that happens. They have some level of logging, but is implemented differently depending on the developer taste and mood.

So, Mr MS Architect decides they will implement enterprise library logging block, and will handle both exceptions and custom logging with this tool. After adding all this logging to the events, the system begins to misbehave and the usual slow is now officially really slow. Mr Consultant then comes in and suggest that the logging data is moved to a separate database, also this database should use Delayed durability, by doing so, transactions related to logging events will have less contention and will return the control faster to the application, some level of data loss can be tolerated which also makes the decision even better.

Let’s build a proof of concept and test it..

You can find a sample project attached: WebFinalNew

You need to have enterprise library installed in your visual studio. For this sample I am using Visual Studio 2010.

You need to create 2 databases, DelayedDB and NormalDB (Of Course we need to use SQL Server 2014)

 

1

Use the attached script LoggingDatabase (which is part of the scripts of Enterprise library), it will create all the objects needed for the application log block.

 

In the DelayedDB, edit the properties and set the Delayed Durability to FORCED, this will make all transactions to have delayed durability(please note some transactions will never be delayed durable such as system transactions, cross-database transactions, and operations involving FileTable, Change Tracking and Change Data Capture)

2

You need to create a windows web project, it should have a web.config , if not you can manually add a configuration file:

3

 

Make sure you add all the application block references(Logging Block)

references

Now right click over the web.config or app.config file and edit your enterprise library configuration

4

 

In the database Settings block, add 2 new connections to your database(one for NormalDB and the other for DelayedDB), make sure to specify the connection in the form of a connection string like the picture below:

5

 

In the Logging block, create a new category called DelayedLogging, this will point to the database with delayed durability enabled.

6

Then add 2 database Trace listeners, configure General Category to point to “Database Trace Listener” and then configure DelayedLogging Category to point to “Database Trace Listener 2”. Configure each listener to point to the corresponding database(one to each database previously configured in the Database block)

7

 

Save all changes and go back to the application, configure the design layout with something like below

8

 

Add a codebehind to the button in the upper screen and build a code that will iterate and send X amount of commands to each database, track the time it takes to send the transaction and regain control of the application into a variable, check the attached project for more details, but use logwriter.write and pass as parameter the category you configured to connect to DelayedDB(DelayedLogging) and the general category(default, no parameter) to connect to NormalDB. See a sample of how a logging transaction is fired below:

 


logWriter.Write("This is a delayed transaction","DelayedLogging");
logWriter.Write("This is a transaction");

This code will call the logging block and execute a transaction on each database, the “normal” database and the durable one, it will also track milliseconds it takes to return the control to the application, additionally I will have performance monitor and query statistics from the database engine to see the difference in behavior.

 

Quick Test

Batch insert of 1000 rows, a normal database took 1 millisecond more in average per transaction to return the control to the application:9

 

What information we have from sys.dm_io_virtual_file_stats?

Database io_stall_read_ms num_of_writes num_of_bytes_written io_stall_write_ms io_stall size_on_disk_bytes DelayedDB 47 5126 13843456 4960 5007 1048576 Normal 87 5394 14492160 2661 2748 1048576

We can see that the same amount of data was sent to both databases(last column size_on_disk_bytes), interesting observation are the stalls, in a delayed durable database the stall will be higher for writing, this means despite the fact that the transaction is executed “faster”, what really means is that it returns the control to the application faster, but the time it takes to actually store the data to disk can be higher since is done in async mode.

 

Let’s see a quick graphic of the performance impact

Delayed Durability

10 11

With a Delayed Durability the disk queue length average is higher, since it will wait to fill the buffer and then execute the write. You can appreciate the yellow peak(within the red circle) after the transaction completes, it will execute pending writes( moment where I issue a “sp_flush_log”.).

 

Full Durability

1213

 

With a Full Durability the disk queue length average is lower, since it will sequentially execute the writes there will be less pending transactions in memory.

 

Conclusion

Delayed Durability feature is definitely a great addition to your DBA toolbelt, it needs to be used taking in consideration all the risks involved, but if properly tested and implemented it can definitely improve the performance  and architecture of certain applications. Is important to understand this is not a turbo button(like some people does with the nolock hint) and it should be used for certain types of transactions and tables. Will this change your design methods and make you plan for a separate delayed durable database? or plan to implement certain modules with delayed durable transactions? This for sure will have an interesting impact on software design and architecture.

Categories: DBA Blogs

Log Buffer #380, A Carnival of the Vanities for DBAs

Fri, 2014-07-18 07:48

There are plethora of ideas sprouting in technology arena on daily basis and bloggers are catching up with them pretty nicely. This Log Buffer Edition skims some of them.

Oracle:

Building Dynamic Branded Digital Experiences with Oracle WebCenter

ORA-19909: datafile 1 belongs to an orphan incarnation

Opatchauto for OEM Management Servers

Get more out of Product Information Management with PIM Training Resources

ADF Mobile 12c (MAF) Support for Master-Detail Data Control

SQL Server:

Convert Rows into Columns

Developing a Custom SSIS Source Component

Stairway to SQL Server Security Level 3: Principals and Securables

Automate Registering and Maintaining Servers in SQL Server Management Studio (SSMS)

Implementation of partition switching within SSIS

MySQL:

High Availability with mysqlnd_ms on Percona XtraDB Cluster

Oracle Critical Patch Update for MySQL

MySQL Slave Scaling and more

ClouSE 1.0 is generally available

Install Apache2, PHP5 And MySQL Support On CentOS 7 (LAMP)

Categories: DBA Blogs

Auditing Files in Linux

Tue, 2014-07-15 08:25

Stat command in Linux can be used to display a file or a file system status.

I came across an issue in RHEL4 where a file’s ‘Change time’ is far ahead than the ‘Modification time’ without a change in uid, gid and mode.

# stat /etc/php.ini
File: `/etc/php.ini'
Size: 45809 Blocks: 96 IO Block: 4096 regular file
Device: 6801h/26625d Inode: 704615 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-06-25 13:22:15.000000000 -0400
Modify: 2012-10-01 13:21:41.000000000 -0400
Change: 2014-06-01 20:06:35.000000000 -0400 

To explain why this can be considered unusual, I will start by explaining the time values associated with a file:

  • Access (atime) – Time the file was last accessed. This involves syscalls like open(). For example, running cat command on the file would update this.
  • Modify    (mtime) – Time the file content was last modified. For example, if a file is edited and some content is added this value would change.
  • Change (ctime) – When any of the inode attributes in the file changes this value changes. Stat command would notice change if inode attributes except access time is changed. Following are the rest of the inode attributes – mode, uid, gid, size and modification time.

So ctime would get updated with mtime and file size would get updated with a mtime. So if a file’s ctime is changed from mtime without a change in mode, uid, and gid, the behaviour can be considered unexpected.

On checking the stat upstream (coreutils) source, I came across a known issue. Running chmod on a file without changing the file permissions can alter inode and cause the same behaviour. It is documented in TODO of coreutils upstream source.

Modify chmod so that it does not change an inode's st_ctime
when the selected operation would have no other effect.
First suggested by Hans Ecke  in

http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/2920

Discussed more recently on http://bugs.debian.org/497514.

This behaviour is not fixed in upstream.

Now we can assume that a process or user ran a chmod command which actually did not changed the attributes of php.ini. This would change ctime and not other attributes.

I can reproduce the same behaviour in my Fedora system as well.

For example,

# stat test
File: ‘test’
Size: 0             Blocks: 0          IO Block: 4096   regular empty file
Device: 803h/2051d    Inode: 397606      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-07-14 16:26:10.996128678 +0530
Modify: 2014-07-14 16:26:10.996128678 +0530
Change: 2014-07-14 16:26:10.996128678 +0530
Birth: -
# chmod 644 test
# stat test
File: ‘test’
Size: 0             Blocks: 0          IO Block: 4096   regular empty file
Device: 803h/2051d    Inode: 397606      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-07-14 16:26:10.996128678 +0530
Modify: 2014-07-14 16:26:10.996128678 +0530
Change: 2014-07-14 16:26:41.444377623 +0530 
Birth: -

But this is just an assumption. For getting a conclusive answer on what is causing this behaviour in this specific system, we would need to find what process is causing this.

auditd in linux can be used for watching a file and capturing audit records on that file to /var/log/audit/.

To watch the file, I edited /etc/audit.rules and added following.

-w /etc/php.ini

Then restarted auditd,

# service auditd start
Starting auditd:                                           [  OK  ]
# chkconfig auditd on

Running a cat command on the php.ini file would give following logs.

type=SYSCALL msg=audit(1404006436.500:12): arch=40000003 syscall=5 success=yes exit=3 a0=bff88c10 a1=8000 a2=0 a3=8000 items=1 pid=19905 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 comm="cat" exe="/bin/cat"
type=FS_WATCH msg=audit(1404006436.500:12): watch_inode=704615 watch="php.ini" filterkey= perm=0 perm_mask=4
type=FS_INODE msg=audit(1404006436.500:12): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:01 inode_rdev=00:00
type=CWD msg=audit(1404006436.500:12):  cwd="/root"
type=PATH msg=audit(1404006436.500:12): name="/etc/php.ini" flags=101 inode=704615 dev=68:01 mode=0100644 ouid=0 ogid=0 rdev=00:00

ausearch command is available for searching through the audit logs. Following command would display the audit entries from 6th July related to /etc/php.ini file.

# ausearch -ts 7/6/2014 -f /etc/php.ini | less

When I noticed the ctime changed again, I ran ausearch. I saw multiple events on the file. Most of the access are from syscall=5, which is the open system call.

Following entries seem to be pointing to the culprit. You can see that the system call is 271.

type=SYSCALL msg=audit(1404691594.175:37405): arch=40000003 syscall=271 success=yes exit=0 a0=bff
09b00 a1=bff07b00 a2=7beff4 a3=bff0a1a0 items=1 pid=9830 auid=4294967295 uid=0 gid=0 euid=0 suid=
0 fsuid=0 egid=0 sgid=0 fsgid=0 comm="bpbkar" exe="/usr/openv/netbackup/bin/bpbkar"
type=FS_WATCH msg=audit(1404691594.175:37405): watch_inode=704615 watch="php.ini" filterkey= perm
=0 perm_mask=2
type=FS_INODE msg=audit(1404691594.175:37405): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:
01 inode_rdev=00:00
type=FS_WATCH msg=audit(1404691594.175:37405): watch_inode=704615 watch="php.ini" filterkey= perm
=0 perm_mask=2
type=FS_INODE msg=audit(1404691594.175:37405): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:
01 inode_rdev=00:00
type=CWD msg=audit(1404691594.175:37405):  cwd="/etc"
type=PATH msg=audit(1404691594.175:37405): name="/etc/php.ini" flags=1 inode=704615 dev=68:01 mod
e=0100644 ouid=0 ogid=0 rdev=00:00

Using ausearch you can search based on system calls also. You can see that there is only one record with system call number 271. Another advantage of ausearch is that it would convert the time stamps to human readable form.

# ausearch -ts 7/6/2014 -sc 271 -f /etc/php.ini 

You can see time in the start of each block of search outputs.

----
time->Sun Jul  6 20:06:34 2014
type=PATH msg=audit(1404691594.175:37405): name="/etc/php.ini" flags=1 inode=704615 dev=68:01 mod
e=0100644 ouid=0 ogid=0 rdev=00:00
type=CWD msg=audit(1404691594.175:37405):  cwd="/etc"
type=FS_INODE msg=audit(1404691594.175:37405): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:
01 inode_rdev=00:00
type=FS_WATCH msg=audit(1404691594.175:37405): watch_inode=704615 watch="php.ini" filterkey= perm
=0 perm_mask=2
type=FS_INODE msg=audit(1404691594.175:37405): inode=704615 inode_uid=0 inode_gid=0 inode_dev=68:
01 inode_rdev=00:00
type=FS_WATCH msg=audit(1404691594.175:37405): watch_inode=704615 watch="php.ini" filterkey= perm
=0 perm_mask=2
type=SYSCALL msg=audit(1404691594.175:37405): arch=40000003 syscall=271 success=yes exit=0 a0=bff
09b00 a1=bff07b00 a2=7beff4 a3=bff0a1a0 items=1 pid=9830 auid=4294967295 uid=0 gid=0 euid=0 suid=
0 fsuid=0 egid=0 sgid=0 fsgid=0 comm="bpbkar" exe="/usr/openv/netbackup/bin/bpbkar"

The time stamps matches.

# stat /etc/php.ini
File: `/etc/php.ini'
Size: 45809         Blocks: 96         IO Block: 4096   regular file
Device: 6801h/26625d    Inode: 704615      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2014-07-07 01:06:47.000000000 -0400
Modify: 2012-10-01 13:21:41.000000000 -0400
Change: 2014-07-06 20:06:34.000000000 -0400

From RHEL4 kernel source code we can see that syscall 271 is utimes.

# cat ./include/asm-i386/unistd.h |grep 271
#define __NR_utimes        271

utimes is a legacy syscall that can change a file’s last access and modification times. utimes is later deprecated and replaced with utime from RHEL5.

netbackup process bpbkar is doing a utimes syscall on the file, possibly modifying the mtime to the already existing time resulting in the change.

This example shows us the power of Linux Auditing System. Auditing is a kernel feature which provides interface to daemons like auidtd to capture events related to system and user space processes and log it.

Categories: DBA Blogs

Lock Timeout Error While Creating Database

Mon, 2014-07-14 08:07

Recently I worked on a issue where a third-party application was failing during the installation. Below was the error returned by the Application.

Lock request time out period exceeded. (Microsoft SQL Server, Error: 1222)

The application was failing while it was trying to create a database. The application seemed to have a default timeout setting which was about 60 seconds after which it was failing, as the command had not yet returned any results.

I tried creating a test database directly from the SQL Server Management Studio and noticed that it was taking long time as well. Once I checked the sys.sysprocesses, I found that the created database was having IO related waits.

Some of the reasons why it might be taking more time while creating the database are

  • IO bottleneck on the drive where we are creating the database files
  • Large size of Model database
  • Instant File Initialization is not enabled

I verified the size of the model database files and found that the model database data file is 5 GB and log file is 1 GB. I have reduced the size of the model database to 1 GB and log file size to 512 MB and then I was able to create the Test database quickly.

Now we started the installation of the Application and it completed successfully as well.

Categories: DBA Blogs

Script to Get the Create Date of an Object from Multiple Databases

Mon, 2014-07-14 08:04

As a DBA, it is common to get a request to run scripts against production databases. However, there can be environments where there are multiple databases on same instance, where the script needs to be run against. I have seen a environment where there were 50+ databases which have same schema with different data, and each database serving different customers.

When we get a request to run large scripts against many databases, at times with user over sight, it may be possible to miss running the script against one or more databases. The requirement comes to verify if the script was executed against all the databases. One way to verify if the script is executed against all databases is to pick an object (stored procedure, table, view, function) which was created as part of script execution, get the create date of that object and verify if it is showing the date and time when we ran the script. The challenge is to get the create date of a specific object from all databases at a time with little work.

Below is the code which will help in fetching the create date of the specified object (Stored Procedure, Table, View, Function) from all user databases on a instance. Pass the object name @ObjName in the 7th line of the code. Run the code and verify the create date from the output and make sure that the script was executed and created the object during the time the script was run.

-- Script Originally Written By: Keerthi Deep | http://www.SQLServerF1.com/

Set NOCOUNT ON
Declare @ObjName nvarchar(300)
declare @dbn nvarchar(200)

Set @ObjName = 'Object_Name' -- Specify the name of the Stored Procedure/ Table/View/Function

create table #DatabaseList(dbname nvarchar(2000)) 

Insert into #DatabaseList select name from sys.sysdatabases
where name not in ('master', 'msdb', 'model','tempdb')
order by name asc 

--select * from #DatabaseList
Create table #Output_table (DB nvarchar(200), crdate datetime, ObjectName nvarchar(200))
declare c1 cursor for select dbname from #DatabaseList open c1
Fetch next from c1 into @dbn
WHILE @@FETCH_STATUS = 0
BEGIN

declare @Query nvarchar(2048)
Set @Query = 'select ''' + @dbn + ''' as DBName, crdate, [name] from ' + @dbn + '.sys.sysobjects where name = ''' + @ObjName + ''''
--print @Query
Insert into #Output_table Exec sp_executesql @Query

FETCH NEXT FROM c1 into @dbn
END
CLOSE c1
DEALLOCATE c1

select * from #Output_table
Drop table #Output_table
Drop table #DatabaseList

Limitations:
This will work only if the object is created using create command, but will not work if Alter command is used.

Any suggestions are welcome.

Categories: DBA Blogs

A Ringleader Proxy for Sporadically-Used Web Applications

Fri, 2014-07-11 08:46

As you might already know, I come up with my fair share of toy web applications.

Once created, I typically throw them on my server for a few weeks but, as the resources of good ol’ Gilgamesh are limited, they eventually have to be turned off to make room for the next wave of shiny new toys. Which is a darn shame, as some of them can be useful from time to time. Sure, running all webapps all the time would be murder for the machine, but there should be a way to only fire up the application when it’s needed.

Of course there’s already a way of doing just that. You might have heard of it: it’s called CGI. And while it’s perfectly possible to run PSGI applications under CGI, it’s also… not quite perfect. The principal problem is that since there is no persistence at all between requests (of course, with the help of mod_perl there could be persistence, but that would defeat the purpose), so it’s not exactly snappy. Although, to be fair, it’d probably be still fast enough for most small applications. But still, it feels clunky. Plus, I’m just plain afraid that if I revert to using CGI, Sawyer will burst out of the wall like a vengeful Kool-Aid Man and throttle the life out of me. He probably wouldn’t, but I prefer not to take any chances.

So I don’t want single executions and I don’t want perpetual running. What I’d really want is something in-between. I’d like the applications to be disabled by default, but if a request comes along, to be awaken and ran for as long as there is traffic. And only once the traffic has abated for a reasonable amount of time do I want the application to be turned off once more.

The good news is that it seems that Apache’s mod_fastcgi can fire dynamic applications upon first request. If that’s the case, then the waking-up part of the job comes for free, and the shutting down is merely a question of periodically monitoring the logs and killing processes when inactivity is detected.

The bad news is that I only heard that after I was already halfway done shaving that yak my own way. So instead of cruelly dropping the poor creature right there and then, abandoning it with a punk-like half-shave, I decided to go all the way and see how a Perl alternative would look.

It’s all about the proxy

My first instinct was to go with Dancer (natch). But a quick survey of the tools available revealed something even more finely tuned to the task at hand: HTTP::Proxy. That module does exactly what it says on the tin: it proxies http requests, and allows you to fiddle with the requests and responses as they fly back and forth.

Since I own my domain, all my applications run on their own sub-domain name. With that setting, it’s quite easy to have all my sub-domains point to the port running that proxy and have the waking-up-if-required and dispatch to the real application done as the request comes in.


use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

my $proxy = HTTP::Proxy->new( port => 3000 );

my $wait_time = 5;
my $shutdown_delay = 10;

my %services = (
    'foo.babyl.ca' => $foo_config,
    'bar.babyl.ca' => $bar_config,

);

$proxy->push_filter( request => 
    HTTP::Proxy::HeaderFilter::simple->new( sub {

            my( $self, $headers, $request ) = @_;

            my $uri = $request->uri;
            my $host = $uri->host;

            my $service = $services{ $host } or die;

            $uri->host( 'localhost' );
            $uri->port( $service->port );

            unless ( $service->is_running ) {
                $service->start;
                sleep 1;
            }

            # store the latest access time
            $service->store_access_time(time);
    }),
);

$proxy->start;

With this, we already have the core of our application, and only need a few more pieces, and details to iron out.

Enter Sandman

An important one is how to detect if an application is running, and when it goes inactive. For that I went for a simple mechanism. Using CHI to provides me with a persistent and central place to keep information for my application. As soon as an application comes up, I store the time of the current request in its cache, and each time a new request comes in, I update the cache with the new time. That way, the existence of the cache tells me if the application is running, and knowing if the application should go dormant is just a question of seeing if the last access time is old enough.


use CHI;

# not a good cache driver for the real system
# but for testing it'll do
my $chi = CHI->new(
    driver => 'File',
    root_dir => 'cache',
);

...;

# when checking if the host is running
unless ( $chi->get($host) ) {
    $service->start;
    sleep 1;
}

...;

# and storing the access time becomes
$chi->set( $host => time );

# to check periodically, we fork a sub-process 
# and we simply endlessly sleep, check, then sleep
# some more

sub start_sandman {
    return if fork;

    while( sleep $shutdown_delay ) {
        check_activity_for( $_ ) for keys %services;
    }
}

sub check_activity_for {
    my $s = shift;

    my $time = $chi->get($s);

    # no cache? assume not running
    return if !$time or time - $time <= $shutdown_delay;

    $services{$s}->stop;

    $chi->remove($s);
}

Minding the applications

The final remaining big piece of the puzzle is how to manage the launching and shutting down of the applications. We could do it in a variety of ways, beginning by using plain system calls. Instead, I decided to leverage the service manager Ubic. With the help of Ubic::Service::Plack, setting a PSGI application is as straightforward as one could wish for:


use Ubic::Service::Plack;

Ubic::Service::Plack->new({
    server => "FCGI",
    server_args => { listen => "/tmp/foo_app.sock",
                     nproc  => 5 },
    app      => "/home/web/apps/foo/bin/app.pl",
    port     => 4444,
});

Once the service is defined, it can be started/stopped from the CLI. And, which is more interesting for us, straight from Perl-land:


use Ubic;

my %services = (
    # sub-domain      # ubic service name
    'foo.babyl.ca' => 'webapp.foo',
    'bar.babyl.ca' => 'webapp.bar',
);

$_ = Ubic->service($_) for values %services;

# and then to start a service
$services{'foo.babyl.ca'}->start;

# or to stop it
$services{'foo.babyl.ca'}->stop;

# other goodies can be gleaned too, like the port...
$services{'foo.babyl.ca'}->port;

Now all together

And that’s all we need to get our ringleader going. Putting it all together, and tidying it up a little bit, we get:


use 5.20.0;

use experimental 'postderef';

use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

use Ubic;

use CHI;

my $proxy = HTTP::Proxy->new( port => 3000 );

my $wait_time      = 5;
my $shutdown_delay = 10;

my $ubic_directory = '/Users/champoux/ubic';

my %services = (
    'foo.babyl.ca' => 'webapp.foo',
);

$_ = Ubic->service($_) for values %services;

# not a good cache driver for the real system
# but for testing it'll do
my $chi = CHI->new(
    driver => 'File',
    root_dir => 'cache',
);


$proxy->push_filter( request => HTTP::Proxy::HeaderFilter::simple->new(sub{
            my( $self, $headers, $request ) = @_;
            my $uri = $request->uri;
            my $host = $uri->host;

            my $service = $services{ $host } or die;

            $uri->host( 'localhost' );
            $uri->port( $service->port );

            unless ( $chi->get($host) ) {
                $service->start;
                sleep 1;
            }

            # always store the latest access time
            $chi->set( $host => time );
    }),
);

start_sandman();

$proxy->start;

sub start_sandman {
    return if fork;

    while( sleep $shutdown_delay ) {
        check_activity_for( $_ ) for keys %services;
    }
}

sub check_activity_for {
    my $service = shift;

    my $time = $chi->get($service);

    # no cache? assume not running
    return if !$time or time - $time <= $shutdown_delay;

    $services{$service}->stop;

    $chi->remove($service);
}

It’s not yet completed. The configuration should go in a YAML file, we should have some more safeguards in case the cache and the real state of the application aren’t in sync, and the script itself should be started by Unic too to make everything Circle-of-Life-perfect. Buuuuut as it is, I’d say it’s already a decent start.

Categories: DBA Blogs

Log Buffer #379, A Carnival of the Vanities for DBAs

Fri, 2014-07-11 07:34

During this summer time in Northern hemisphere, and winter time in Southern hemisphere, the bloggers are solving key problems either by sitting besides the bonfire, or enjoying that bbq. This Log Buffer Edition shares both of these with them.


Oracle:

3 Key Problems To Solve If You Want A Big Data Management System

OpenWorld Update: Content Catalog NOW LIVE!

Interested in Showcasing your Solutions around Oracle Technologies at Oracle OpenWorld?

GoldenGate and Oracle Data Integrator – A Perfect Match in 12c… Part 4: Start Journalizing!

What You Need to Know about OBIEE 11.1.1.7

SQL Server:

Interoperability between Microsoft and SOA Suite 12c

This article describes a way to speed up various file operations performed by SQL Server.

The Mindset of the Enterprise DBA: Creating and Applying Standards to Our Work

Stairway to T-SQL: Beyond The Basics Level 8: Coding Shortcuts using += and -= Operators

Microsoft Azure Diagnostics Part 2: Basic Configuration of Azure Cloud Service Diagnostics

MySQL:

MySQL Enterprise Monitor 2.3.18 has been released

Harnessing the power of master/slave clusters to operate data-driven businesses on MySQL

NoSQL Now! Conference – coming to San Jose, CA this August!

Manually Switch Slaves to new Masters in mySQL 5.6 (XTRADB 5.6)

How to Configure ClusterControl to run on nginx

Categories: DBA Blogs

Comparing CPU Throughput of Azure and AWS EC2

Thu, 2014-07-10 08:11

After observing CPU core sharing with Amazon Web Services EC2, I thought it would be interesting to see if Microsoft Azure platform exhibits the same behavior.

Signing up for Azure’s 30-day trial gives $200 in credit to use over the next 30-day period: more than enough for this kind of testing. Creating a new virtual machine, using the “quick create” option with Oracle Linux, and choosing a 4-core “A3″ standard instance.

I must say I like the machine naming into built-in “clouadpp.net” DNS that Azure uses: no mucking around with IP addresses. The VM provisioning definitely takes longer than AWS, though no more than a few minutes. And speaking of IP addresses, both start with 191.236. addresses assigned to Microsoft’s Brazilian subsidiary through the Latin American LACNIC registry, due to the lack of north american IP addresses.

Checking out the CPU specs as reported to the OS:

[azureuser@marc-cpu ~]$ egrep '(processor|model name|cpu MHz|physical id|siblings|core id|cpu cores)' /proc/cpuinfo
processor       : 0
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.990
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
processor       : 1
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.990
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
processor       : 2
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.990
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
processor       : 3
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.990
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4

2.2GHz rather than 2.6GHz, but otherwise the same family and architecture as the E5-2670 under AWS. Identified as a single-socket, 4-core processor, without hyperthreads at all.

Running the tests
[azureuser@marc-cpu ~]$ taskset -pc 0 $$
pid 1588's current affinity list: 0-3
pid 1588's new affinity list: 0
[azureuser@marc-cpu ~]$ dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null
2170552320 bytes (2.2 GB) copied, 36.9319 s, 58.8 MB/s
[azureuser@marc-cpu ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
2170552320 bytes (2.2 GB) copied, 72.8379 s, 29.8 MB/s
2170552320 bytes (2.2 GB) copied, 73.6173 s, 29.5 MB/s

Pretty low; that’s half the throughput we saw on AWS, albeit with a slower clock speed here.

[azureuser@marc-cpu ~]$ taskset -pc 0,1 $$
pid 1588's current affinity list: 0
pid 1588's new affinity list: 0,1
[azureuser@marc-cpu ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
[azureuser@marc-cpu ~]$ 2170552320 bytes (2.2 GB) copied, 36.4285 s, 59.6 MB/s
2170552320 bytes (2.2 GB) copied, 36.7957 s, 59.0 MB/s

[azureuser@marc-cpu ~]$ taskset -pc 0,2 $$
pid 1588's current affinity list: 0,1
pid 1588's new affinity list: 0,2
[azureuser@marc-cpu ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
[azureuser@marc-cpu ~]$ 2170552320 bytes (2.2 GB) copied, 36.3998 s, 59.6 MB/s
2170552320 bytes (2.2 GB) copied, 36.776 s, 59.0 MB/s

Pretty consistent results, so no core sharing, but running considerably slower than we saw with AWS.

Kicking off 20 runs in a rows:

[azureuser@marc-cpu ~]$ taskset -pc 0-3 $$
pid 1588's current affinity list: 0,2
pid 1588's new affinity list: 0-3
[azureuser@marc-cpu ~]$ for run in {1..20}; do
>  for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2>> output | gzip -c > /dev/null & done
> wait
> done
...
[azureuser@marc-cpu ~]$ cat output | awk '/copied/ {print $8}' | sort | uniq -c
      1 59.1
      4 59.2
      1 59.3
      2 59.4
      2 59.5
      8 59.6
     12 59.7
      7 59.8
      3 59.9

We get very consistent results, between 59.1 and 59.9 mB/sec

Results from “top” while running:

cat > ~/.toprc <<-EOF
RCfile for "top with windows"           # shameless braggin'
Id:a, Mode_altscr=0, Mode_irixps=1, Delay_time=3.000, Curwin=0
Def     fieldscur=AEHIOQTWKNMbcdfgjplrsuvyzX
        winflags=25913, sortindx=10, maxtasks=2
        summclr=1, msgsclr=1, headclr=3, taskclr=1
Job     fieldscur=ABcefgjlrstuvyzMKNHIWOPQDX
        winflags=62777, sortindx=0, maxtasks=0
        summclr=6, msgsclr=6, headclr=7, taskclr=6
Mem     fieldscur=ANOPQRSTUVbcdefgjlmyzWHIKX
        winflags=62777, sortindx=13, maxtasks=0
        summclr=5, msgsclr=5, headclr=4, taskclr=5
Usr     fieldscur=ABDECGfhijlopqrstuvyzMKNWX
        winflags=62777, sortindx=4, maxtasks=0
        summclr=3, msgsclr=3, headclr=2, taskclr=3
EOF
[azureuser@marc-cpu ~]$  top -b -n20 -U azureuser
...
top - 14:38:41 up 2 min,  2 users,  load average: 2.27, 0.78, 0.28
Tasks: 205 total,   3 running, 202 sleeping,   0 stopped,   0 zombie
Cpu0  : 95.4%us,  4.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 94.4%us,  5.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu3  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1606 azureuse  20   0  4292  800  400 R 97.0  0.0   0:03.49 gzip
 1604 azureuse  20   0  4292  796  400 R 96.7  0.0   0:03.50 gzip

top - 14:38:44 up 2 min,  2 users,  load average: 2.25, 0.80, 0.29
Tasks: 205 total,   3 running, 202 sleeping,   0 stopped,   0 zombie
Cpu0  : 94.4%us,  5.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 72.3%us,  3.9%sy,  0.0%ni, 23.4%id,  0.0%wa,  0.0%hi,  0.4%si,  0.0%st
Cpu3  : 12.0%us,  0.7%sy,  0.0%ni, 85.6%id,  1.4%wa,  0.0%hi,  0.4%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1604 azureuse  20   0  4292  796  400 R 96.8  0.0   0:06.42 gzip
 1606 azureuse  20   0  4292  800  400 R 96.4  0.0   0:06.40 gzip

top - 14:38:47 up 2 min,  2 users,  load average: 2.25, 0.80, 0.29
Tasks: 205 total,   3 running, 202 sleeping,   0 stopped,   0 zombie
Cpu0  : 94.9%us,  5.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  9.7%us,  0.3%sy,  0.0%ni, 89.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  : 51.8%us,  2.8%sy,  0.0%ni, 45.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 17.9%us,  1.4%sy,  0.0%ni, 80.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1604 azureuse  20   0  4292  796  400 R 96.5  0.0   0:09.34 gzip
 1606 azureuse  20   0  4292  800  400 R 95.5  0.0   0:09.29 gzip

It’s using full CPUs and all from gzip, so no large system overhead here. Also, “%st”, time reported “stolen” by the hypervisor, is zero. We’re simply getting half the throughput of AWS.

Basic instances

In addition to standard instances, Microsoft makes available basic instances, which claim to offer “similar machine configurations as the Standard tier of instances offered today (Extra Small [A0] to Extra Large [A4]). These instances will cost up to 27% less than the corresponding instances in use today (which will now be called “Standard”) and do not include load balancing or auto-scaling, which are included in Standard” (http://azure.microsoft.com/blog/2014/03/31/microsoft-azure-innovation-quality-and-price/)

Having a look at throughput here, by creating a basic A3 instance “marc-cpu-basic” that otherwise matches exactly marc-cpu created earlier.

[azureuser@marc-cpu-basic ~]$ egrep '(processor|model name|cpu MHz|physical id|siblings|core id|cpu cores)' /proc/cpuinfo
processor       : 0
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.993
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
processor       : 1
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.993
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
processor       : 2
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.993
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
processor       : 3
model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
cpu MHz         : 2199.993
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4

CPU specs are identical to marc-cpu. Running the same tests:

[azureuser@marc-cpu-basic ~]$ taskset -pc 0 $$
pid 1566's current affinity list: 0-3
pid 1566's new affinity list: 0
[azureuser@marc-cpu-basic ~]$  dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null
2170552320 bytes (2.2 GB) copied, 54.6678 s, 39.7 MB/s
for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
[azureuser@marc-cpu-basic ~]$ for i in {1..2}; do (dd if=/dev/zero bs=1M count=2070 2> >(grep bytes >&2 ) | gzip -c > /dev/null &) done
2170552320 bytes (2.2 GB) copied, 107.73 s, 20.1 MB/s
2170552320 bytes (2.2 GB) copied, 107.846 s, 20.1 MB/s

Now that’s very slow: even with the identical stated CPU specs as marc-cpu, marc-cpu-basic comes in with 33% less throughput.

Doing 20 runs in a rows:

[azureuser@marc-cpu-basic ~]$ taskset -pc 0-3 $$
pid 1566's current affinity list: 0
pid 1566's new affinity list: 0-3
[azureuser@marc-cpu-basic ~]$ for run in {1..20}; do
> for i in {1..2}; do dd if=/dev/zero bs=1M count=2070 2>> output | gzip -c > /dev/null & done
> wait
> done
...
[azureuser@marc-cpu-basic ~]$ cat output | awk '/copied/ {print $8}' | sort | uniq -c
      4 40.4
     15 40.5
     14 40.6
      7 40.7

Very consistent results, but consistently slow. They do show that cores aren’t being shared, but throughput is lower than even a shared core under AWS.

Wrapping up Comparison chart

Under this simple gzip test, we are testing CPU integer performance. The Azure standard instance got half the throughput of the equivalent AWS instance, in spite of a clock speed only 15% slower. But the throughput was consistent: no drops when running on adjacent cores. The basic instance was a further 33% slower than a standard instance, in spite of having the same CPU configuration.

Under Azure, we simply aren’t getting a full physical core’s worth of throughput. Perhaps the hypervisor is capping throughput, and capping even lower for basic instances? Or maybe the actual CPU is different than the E5-2660 reported? For integer CPU-bound workloads like our gzip test, we would need to purchase at least twice as much capacity under Azure than AWS, making Azure considerably more expensive as a platform.

Categories: DBA Blogs

Making it Easier to Graph Your Infrastructure’s Performance Data

Tue, 2014-07-08 07:46

Today I would like to share a story with you about the development of a Puppet module for Grafana and its release on the Puppet Forge. But first, I’d like to provide some context so you can understand where Grafana fits in and why I feel this is important.

Those of you that know me have likely heard me talk about the importance of data-driven decision making, and more specifically some of the tools that can be used to help enable individuals to make smart decisions about their IT infrastructure. A common approach is to deploy a graphing system such as Graphite, which stores performance data about your infrastructure to aide you in performing a number of functions including problem diagnosis, performance trending, capacity planning, and data analytics.

If you are unfamiliar with the software, I’ll briefly describe its architecture. Graphite consists of a daemon, called carbon, which listens for time series data and writes it to a fixed-size database called whisper. It also provides a web application to expose the data and allow the user to create and display graphs on demand using a powerful API.

While Graphite does a good job of storing time series data and providing a rich API for visualizing it, one of the things it does not really focus on is providing a dashboard for the data. Thankfully we have Grafana to fill this role and it happens to do it quite well.

If you have ever worked with the ELK stack (Elasticsearch, Logstash, and Kibana) before, Grafana’s interface should be familiar to you, as it is based on Kibana. It is a frontend for Graphite or InfluxDB, and runs as a client side application in your browser. Its only (optional) external dependency is Elasticsearch, as it can use it to store, load and search for dashboards.

Below are some of Grafana’s most notable features (see its feature highlights for a more comprehensive list):

  • Dashboard search
  • Templated dashboards
  • Save / load from Elasticsearch and / or JSON file
  • Quickly add functions (search, typeahead)
  • Direct link to Graphite function documentation
  • Graph annotation
  • Multiple Graphite or InfluxDB data sources
  • Ability to switch between data sources
  • Show graphs from different data sources on the same dashboard

We like to make use of IT automation software whenever possible to deploy tools for our clients. Most tools already have Puppet modules or Chef cookbooks available for them, including the other components of the graphing system: Graphite itself, and a great Python-based collector named Diamond. Grafana, however, had no Puppet module available so I decided to rectify the situation by creating one and publishing it to the Puppet Forge.

The module would be pretty simple: all that is required is to download and extract Grafana into an installation directory, and ensure appropriate values for the Elasticsearch, Graphite and InfluxDB servers / data sources are inserted into its configuration.

I decided to offload the work of downloading and extracting the software to another module, namely gini/archive. And managing the configuration file, config.js, would be done with a combination of module parameters and ERB template.

The only real complication arose when it came time to test serving Grafana with a web server such as Apache or Nginx. I decided not to have my module manage the web server in any way, so I would leverage Puppet Labs’ own Apache module for this purpose.

My test environment consisted of a CentOS virtual machine provisioned by Vagrant and Puppet, with Graphite and Grafana on the same server. I decided to use Daniel Werdermann’s module to deploy Graphite on my virtual machine as it had worked well for me in the past.

I quickly ran into problems with duplicate resources, however, due to the Graphite module managing Apache for creation of its virtual host etc. I moved to separate virtual machines for Graphite and Grafana, and that made my life easier. If you do decide to run both pieces of software on the same server, and are also using Daniel’s module, you can work around the problem by setting gr_web_server to ‘none’ like this:

class { 'graphite':
  gr_web_server			=> 'none',
  gr_web_cors_allow_from_all	=> true,
}

Since my module does not manage Apache (or Nginx), it is necessary to add something like the following to your node’s manifest to create a virtual host for Grafana:

# Grafana is to be served by Apache
class { 'apache':
  default_vhost   => false,
}

# Create Apache virtual host
apache::vhost { 'grafana.example.com':
  servername      => 'grafana.example.com',
  port            => 80,
  docroot         => '/opt/grafana',
  error_log_file  => 'grafana-error.log',
  access_log_file => 'grafana-access.log',
  directories     => [
    {
      path            => '/opt/grafana',
      options         => [ 'None' ],
      allow           => 'from All',
      allow_override  => [ 'None' ],
      order           => 'Allow,Deny',
    }
  ]
}

And the Grafana declaration itself:

class { 'grafana':
  elasticsearch_host  => 'elasticsearch.example.com',
  graphite_host       => 'graphite.example.com',
}

Now that my module was working, it was time to publish it to the Puppet Forge. I converted my Modulefile to metadata.json, added a .travis.yml file to my repository and enabled integration with Travis CI, built the module and uploaded it to the Forge.

Since its initial release, I have updated the module to deploy Grafana version 1.6.1 by default, including updating the content of the config.js ERB template, and have added support for InfluxDB. I am pretty happy with the module and hope that you find it useful.

I do have plans to add more capabilities to the module, including support of more of Grafana’s configuration file settings, having the module manage the web server’s configuration similar to how Daniel’s module does it, and adding a stronger test suite so I can ensure compatibility with more operating systems and Ruby / Puppet combinations.

I welcome any questions, suggestions, bug reports and / or pull requests you may have. Thanks for your time and interest!

Project page: https://github.com/bfraser/puppet-grafana
Puppet Forge URL: https://forge.puppetlabs.com/bfraser/grafana

Categories: DBA Blogs