Skip navigation.

Pythian Group

Syndicate content
Official Pythian Blog - Love Your Data
Updated: 15 hours 18 min ago

The Smallest Details are the Ones that Make you Better

Fri, 2013-04-05 09:22

One of the things I love about my career (Oracle DataBase Administrator) , is how fragile it is on a day in and day out basis. I can say with certainty , that every day I turn on my laptop to start working, it can probably be my last day working as a DBA, and this is not because I want it to be, it’s because unlike many jobs out there, the smallest mistake I make, can be my last mistake in my current job.

Believe me, I am not under appreciating other jobs or professions out there, but the level of stress that this job has at its peak moments, is comparable to being a surgeon performing an open heart surgery or an air traffic controller with two airplanes that are heading for a collision.

Just imagine this, if we make a mistake, a surgeon might not have the correct data he needs for the surgery, or the traffic controller may not have the electric power to co-ordinate his or her manoeuvres. Putting it more lightly, we are responsible to having your bank account statements up to date, basically your money. Of course we are the ones who have helped you maintain the history of  those online games you played, so you can claim yourself the king of Warcraft or Madden.

What I want to get to, is if you are starting in this job, or you have been doing it for so long, you are only as good as your last 5 minutes, because the last mistake can be your last mistake. So what you want to do is take the necessary steps to avoid any mistake in the environment you are working, this is what we at Pythian call being a FIT-ACER DBA.

The best way that I have found to achieve or being closer to this, is by doing the following in the tools I work with:

PUTTY

The following configuration will help you put your words in Red with a black background, helping you distinguish a Development to a Production environment

Windows --> Colours --> Default Foreground
Red ->255, GREEN ->0 , BLUE->0

Toad

This tool is a little more discreet , but the best thing you can do is in your connection window and search for the “Color” column, and switch all your production connections to the color red.

SQLPLUS

The best thing that you can do is in the glogin.sql, that is in $ORACLE_HOME/sqlplus/admin, add the following line

SET SQLPROMPT '&_connect_identifier >'

Also you can do what Uwe Hesse uses

Put this is a script called whoami.sql

set serveroutput on
begin
dbms_output.put_line('USER: '||sys_context('userenv',-
'session_user'));
dbms_output.put_line('SESSION ID: '||sys_context('userenv',-
'sid'));
dbms_output.put_line('CURRENT_SCHEMA: '||sys_context('userenv', -
'current_schema'));
dbms_output.put_line('INSTANCE NAME: '||sys_context('userenv', -
'instance_name'));
dbms_output.put_line('DATABASE ROLE: '||sys_context('userenv', -
'database_role'));
dbms_output.put_line('OS USER: '||sys_context('userenv', -
'os_user'));
dbms_output.put_line('CLIENT IP ADDRESS: '||sys_context('userenv', -
'ip_address'));
dbms_output.put_line('SERVER HOSTNAME: '||sys_context('userenv', -
'server_host'));
dbms_output.put_line('CLIENT HOSTNAME: '||sys_context('userenv', -
'host'));
end;
/

And your result will be the following:

USER: SYS
SESSION ID: 23
CURRENT_SCHEMA: SYS
INSTANCE NAME: TESTDB
DATABASE ROLE: PRIMARY
OS USER: oracle
CLIENT IP ADDRESS:
SERVER HOSTNAME: servidor1.localdomain
CLIENT HOSTNAME: servidor1.localdomain

UNIX

For us who handle unix and linux environments, one of our best friends to help us identify where we are and which is the current ORACLE_SID we have set in our environment is with the PS1 variable. With the example below, you can see several things which can help you identifying where you are at:

PS1="
`echo ${LOGNAME}`@`hostname` [\${ORACLE_SID}] \${PWD}
`echo ${LOGNAME}` $ "
export PS1
  • Example without PS1 environment set as above
[pythian@servidor1.localdomain ~]$ echo "Hi"
Hi
  • Example with PS1 environment set as above
pythian@servidor1.localdomain [TESTDB] /home/pythian/working/antunez
pythian $ echo "Hi"
Hi

SQL Developer

This Oracle tool allows you to put all your connections into folders, which will allow you to easily identify which type of connections you have

Right Click on the connection name --> Add To Folder

All of these tips might not seem like a lot, but believe me that when you are facing one of those stressful moments , you want to make sure that you are working where you should be working, and have faith in me that this is a little grain of salt that will help you maintain your job, at least for one more day :)

Categories: DBA Blogs

Log Buffer #314, A Carnival of the Vanities for DBAs

Fri, 2013-04-05 06:26

It is an industry norm, supported by analyst researches that blogs are the best informal medium to share the information.  This unique medium provides readers with increased leverage in negotiating the hard turns in their technological lives. Log Buffer makes that leverage more powerful.

Oracle:

Alan Gardner is deploying Cloudera Impala on EC2 with Example Live Demo.

If you want to follow the real news and insight into the online learning revolution, then e-Literate has always been the blog to follow.

Tuula Fai wants to do things, and when he cannot he reverts to a little help.

Here is another quick post about yet another new feature in PS6 . As many of you know we have supported identity propagation for a long time .

With the explosion of connected devices and the ubiquity of high speed wireless connectivity, Internet of Things takes on a new relevance.

SQL Server:

As per Microsoft Books Online and SQL Server Security best practice white paper, it is recommended to periodically review privileges granted to public role, Basit serves SQL Server tip.

You can get backups of your WASD databases. But, if you want them to be transactionally consistent, you need to do a little extra work. Scary DBA writes.

How I prepared myself for the MCSE certification, Koen Verbeeck shares.

Ever wanted to know all there is to know about SQL Server, and afraid to ask?  Pearl Knows.

If you want to make you backup faster, you can try BUFFERCOUNT parameter with compression backup. See the footprint of James.

MySQL:

MariaDB 5.5 is now inside of OpenBSD, the free, functional & secure OS. Colin Charles blogs.

How InnoDB accidentally reserved only 1 bit for table format, Jeremy Cole tells.

Artem Livshits gives an inside perspective of MySQL 5.6

Simon Mudd has a good post about warming up a MySQL 5.6 server.

The new MySQL 5.6 GA release delivers a host of new capabilities to support developers releasing new services faster, with more agility, performance and security.

Categories: DBA Blogs

Which Partitions Does the SQL Statement Access?

Fri, 2013-04-05 01:02

This is a short blog post on how one can prove that a particular partition of the table or index is accessed by a specific query.

The solution to this question seemed interesting enough to share it with you all. Here’s a short description of what I had:

  • A SQL with sql_id = ’9kuvj1g38hja2′ is the TOP 1 statement in “User I/O” section of AWR report.
  • The statement queries data from multiple tables, one of them is a fairly large table named “ALERT”. It is range partitioned by date and each year has it’s own partition (Y2013, Y2012, …)
  • The SQL takes 5 – 10 seconds to complete and is executed 500 – 5000 times each hour (sometimes even ~10 of them are running simultaneously)
  • ~95% of elapsed time is spent on db file sequential reads
  • Diagnostic pack is licensed (it is required for this method as ASH is used)

We also had a statement that none of the queries should be reading data older then 1 year, but looking at the statistics I suspected there was something wrong. So here’s how I found out older data from the ALERT table was accessed:

  1. As the query is executed very often it has many appearances in ASH
  2. As the statement does lots of  db file sequential reads, there are lot of p1 (file_id) and p2 (block_id) combinations available from ASH
  3. file_id and block_id can be joined to DBA_EXTENTS to find out the segment which the block belongs to. That is where the partition names being accessed by the SQL where found out.

The challenge with this approach was the fact that I had more then 100K samples for ’9kuvj1g38hja2′ waiting on db file sequential reads in DBA_HIST_ACTIVE_SESS_HISTORY and there were millions of extents in the database. How do I join them efficiently? I can fetch only the segments we’re interested in from DBA_EXTENTS before joining it all to DBA_HIST_ACTIVE_SESS_HISTORY (Thanks Yury for this great hint)! So this is the query that solved the mystery:

with
exts as (select /*+ materialize */ partition_name, file_id, block_id, blocks
         from dba_extents where owner='SECRET' and segment_name='ALERT'),
stats as (select /*+ materialize */ p1, p2, count(*) cnt
          from DBA_HIST_ACTIVE_SESS_HISTORY where sql_id='9kuvj1g38hja2'
           and event='db file sequential read' group by p1, p2)
select partition_name, sum(s.cnt) read_count
  from exts e, stats s
 where e.file_id=s.p1
   and s.p2 between e.block_id and e.block_id+e.blocks-1
 group by partition_name
 order by 2;

It’s short and simple, and it also answered the question I asked. Is old data accessed by the SQL statement? Yes.

PARTITION_NAME                 READ_COUNT
------------------------------ ----------
Y2006                                 111
Y2007                                5645
Y2008                                5698
Y2013                               10388
Y2009                               13473
Y2010                               21600
Y2011                               37186
Y2012                               56184

 8 rows selected

Be careful interpreting the results! If the SQL does not report a partition as being accessed, it doesn’t mean it was not accessed. ASH is based on sampling – maybe we got “lucky” and samples didn’t cover reads from that partition? Maybe all blocks belonging from to the partition were in buffer cache and no reads were needed? But, if the SQL reported a partition as being accessed, we can be 100% sure it really was, and for me that was enough.

Categories: DBA Blogs

Deploying Cloudera Impala on EC2 with Example Live Demo

Wed, 2013-04-03 10:48

A little while ago I blogged about (and open sourced) an Impala-powered soccer visualization demo, designed to demonstrate just how responsive Impala queries can be. Since not everyone has the time or resources to run the project themselves, we’ve decided to host it ourselves on an EC2 instance. You can try the visualization; we’ve also opened up the Impala web interface, where you can see query profiles and performance numbers, and Hue (username and password are both ‘test’), where you can run your own queries on the dataset.

Deploying Impala on EC2

While there are many tools to deploy a Hadoop cluster on EC2 – like Apache Whirr, or even Cloudera Manager – I only wanted to use a single instance for the entire cluster. Starting from the base Ubuntu (Precise)  image, I added Cloudera’s apt repos, and installed the single node configuration. Impala doesn’t support using Derby for the Hive metastore, so I installed MySQL and configured Hive to use it instead. Then I installed Impala using Cloudera’s instructions. Impala, and all of the Hadoop daemons, are running comfortably on one M3 2XLarge EC2 instance. Given our modest demands, this may actually be overkill; I over-specced the server trying to find a (now-obvious) performance problem involving short-circuit reads.

Short-Circuit Reads

On the Pythian cluster, we could consistently  return a query in around half a second. On EC2 queries took closer to 5 seconds. A bit of investigation showed that in getting the server up and running, I had disabled short-circuit reads, which slows down Impala considerably. While Impala isn’t supposed to start without short-circuit reads, it only throws an error if you have short-circuit reads enabled but misconfigured. If short-circuits are off in the hdfs-site configuration, it will happily start and run very slowly. With the default DEB install, the libhadoop library isn’t installed to the LD_PATH on Ubuntu, which prevents short-circuit read from working. The easiest solution was to create symlinks for libhadoop to /usr/lib/, then run ldconfig:

ln -s /usr/lib/hadoop/lib/native/libhadoop.so /usr/lib/
ln -s /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0 /usr/lib/
ldconfig

To confirm whether your cluster has short-circuit reads enabled, you can visit the Impala web interface (by default, port 25000 on any system running impalad) and click on the ‘/varz’ tab. Search for ‘dfs.client.read.shortcircuit’ – it should be set to ‘true’.

Partitioning

With libhadoop installed and short-circuit reads enabled, the next greatest performance improvement came from partitioning the table on the sensor id. Since all of our web interface queries filter by sensor id, Impala can perform some serious partition elimination: looking at the query profiles, partitioning the table reduced the amount of data read from HDFS from 4GB to 50MB, and the query time from 2.6s to 130ms. The README on Github has instructions on how to use dynamic partitioning in Hive to quickly partition the soccer data; these steps can be generalized to any dataset.

Categories: DBA Blogs

COLLABORATE 13 à la Pythian …

Tue, 2013-04-02 10:51

As per many previous IOUG/OAUG/Quest shows, Pythian is will be in Denver next week! It was a sunny day in the fall of 1991 when I gave my first paper at International Oracle User Week (IOUW), a pre-cursor to COLLABORATE and a few earlier incarnations called IOUG-Live and IOUG-Alive! It has been a whirlwind of papers and sessions since then and I for one am still speaking mainly about the good old database. It is the heart of my skill set and always will be part of it. Outside of our participation at the speaker level, Pythian is also in the exhibit hall in booth 606 with prime real estate close to the main entrance. Just hang a left and we are the 5th or 6th booth on your right. Pythian rock stars will be at the booth off and on all week. You never know who you will meet there as the social network following and interacting with Pythian is vast and deep.

Speaking of vast … we look forward to your attendance at one of our sessions as per the details next:

Maris Elsins (@MarisELsins) & Yury Velikanov (@yvelik) speak on Oracle Cluster 11GR2 with Oracle Applications on Sun. Apr. 7 from 1:15 pm – 2:15 pm as well as Running E-Business Suite Database on Oracle Database Appliance on Mon. Apr. 8 from 9:45 am – 10:45 am.

Alex Gorbachev (@alexgorbachev) presents Practical Hadoop by Example on Mon. Apr. 8 from 1:15 pm – 2:15 pm in addition to Under The Hood of Oracle Pluggable Databases on Mon. Apr. 8 from 2:30 pm – 3:30 pm.

A popular recent arrival at Pythian René Antunez (@AntunezRene) brings you How to survive a disaster with RMAN on Mon. Apr. 8 from 1:15 pm – 2:15 pm.

The “Big Data Lady”, a.k.a. Gwen Shapira (@gwenshap) cannot be missed at RAC Experts Panel on Mon. Apr. 8 from 3:45 pm – 4:45 pm. Her prowess also brings you Engineered Systems Curriculum: Databases in a solid state world: how Exadata X3 and other database systems leverage the performance of flash on Mon. Apr. 8 from 5:00 pm – 6:00 pm. You will not be disappointed at Gwen’s Building the Integrated Data Warehouse With Oracle Database and Hadoop on
Tue. Apr. 9 from 10:45 am – 11:45 am. Catch Gwen as well at Big Data SIG Meeting on Wed. Apr. 10 from 3:00 pm – 4:00 pm and last but not least, Queues, Pools and Caches – The Right Way to Scale OLTP on Thu. Apr. 11 from 11:00 am – 12:00 pm.

The dynamic duo of Paul Vallée (@paulvallee) and Alex Gorbachev (@alexgorbachev) can be seen at IT’s own Last Mile Problem: Privileged Access Surveillance on Mon. Apr. 8 from 5:00 pm – 6:00 pm.

Our very own, well-known and loved in the Twitter world Yury Velikanov (@yvelik) delivers You most probably don’t need RMAN catalog database on Wed. Apr. 10 from 9:30 am – 10:30 am.

Christo Kutrovsky (@kutrovsky) discusses Delivering Big Data with Exadata on Wed. Apr. 10 from 9:30 am – 10:30 am .

One of our many Apps specialist Vasu Balla (@vasuballa) presents Anatomy of Autoconfig in Oracle E-Business Suite on Wed. April 10 from 11:00 am – 12:00 pm as well as Staged Patching Approach in Oracle E-Business Suite on Thu. Apr. 11 from 12:15 pm – 1:15 pm

Yours truly(@michaelabbeyCAN) discusses Pluggable Databases – An Intro on Mon. Apr. 8 from 11:00 am – 12:00 pm as well RMAN: New Features on Wed. Apr. 10 from 8:15 am – 9:15 am.

The list is endless but NOT the talent. Pythian employs well-known, well-respected and very talented people and, oh ya just in case you are wondering, we are hiring :). See you on Denver, the 1.6 kilometre high city as we call it in Canada. If you are thinking about getting involved in a user group you will find representatives all over the show in Denver and please speak to Alex Gorbachev (serving IOUG board member) or yours truly who was on and off the board between 1993 and 2007.

Vanessa Simmons, Pythian’s Director of Business Development will be at the show and would love to meet and speak with any/all interested. Please get her as @pythiansimmons on Twitter or simmons@pythian.com.

Categories: DBA Blogs

Meet Pythian’s Oracle Apps Experts at Collaborate 13!

Mon, 2013-04-01 18:54
The stars have aligned

There’s a rare event happening next week: All three of Pythian’s Oracle Apps tech leads will be in the same place. Vasu Balla, Maris Elsins, and recently-minted Oracle Ace Director Yury Velikanov will all be at Collab 13 this year. Since we’re a globally distributed, round-the-clock team, this is not something that can happen very often; in fact, Yury and Vasu will be meeting face-to-face for the first time.

This isn’t a team-building exercise, though. Our 3 biggest Apps brains have 5 presentations for the community this year, from large-scale systems (EBS on 11gR2 clusters and ODA) to nitty-gritty details (Autoconfig and Apps patching), to essential fundamentals (RMAN).

The presentations

I’ve had an opportunity to review several of the presentations by the Pythian EBS teams, so I can assure you that you’re in for a treat.

Maris and Yury are presenting on complex configurations on the database side of Oracle E-Business Suite:

Yury also has a presentation for our Core DBA compatriots, and it’s good stuff for Apps DBAs as well:

Vasu will be talking about some of the nuts and bolts of the E-Business Suite tech stack:

Other ways to connect

Since you’re reading this blog, you already know we have a pretty strong commitment to social media in these parts. If you miss them at their presentations, the best way to find Maris, Yury (“Mr. Social Media”), and Vasu is on Twitter: @yvelik, @MarisElsins, and @vasuballa. We’ll have several meetups arranged via Twitter throughout the conference. Also, consider stopping by Pythian’s booth (#606) to meet some of our other knowledgeable and entertaining colleagues! We look forward to the opportunity to meeting our clients, colleagues and friends (new and old). See you at the show!

BTW, this post is supposed to be about our rockstar presenters, but just a note: I’ll be there too. Find me (John) at @jpiwowar. :)

Categories: DBA Blogs

How I Put My Collaborate Agenda Together

Mon, 2013-04-01 16:57

It is less than a week before the Collaborate 2013 conference. Most of us are busy putting together our individual agendas for what is shaping up to be a very exciting week. There are so many interesting sessions to choose from! Conference organizers kindly introduced a Show Planner to make planning a bit easier, but in this blog post, I will share with you how I created my agenda for Collaborate.

Disclaimer: It is easy to miss some very interesting sessions. Sessions that I may be interested in will not meet your profile. Please use your own judgement and do not rely 100% on my approach.

Explore sessions from strong technical companies

Pythian is the company I have enjoyed working for almost 4 years now; in fact, I will celebrate 42 months working at Pythian during Collaborate :). However, there is something else that I would like to talk about here. Pythian is one of very few companies that supports and encourages its employees to stand up and speak about the exciting things we deal with on a day-to-day basis. As a result, 9 Pythian employees will deliver 16 sessions at Collaborate this year.  So if you, like me, are after great technical content from seasoned professionals, then add some Pythian sessions to your Collaborate 2013 agenda.

Pythian isn’t the only company that encourages their employees to share knowledge and that invest heavily in Oracle Community programs. There are many others; however, I would like to mention the 2 companies that are at the top of my list:

  • Dbvist – I am glad to see that Arjen (the founder) and his crew are part of Collaborate this year. The content they deliver is very good.
  • Enkitec – I personally love to listen to these speakers.

In the comments section of this blog post, please do not hesitate to share information about other good companies that contribute to the Oracle Community and are part of the Collaborate 2013.

Remember the past

Another tip for putting together a great agenda for Collaborate 2013 is to see if speakers you’ve enjoyed listening to at previous conferences are returning this year. You can find my memories about last year’s conference under the following blog post Collaborate 12: Speakers I’d like to hear again.

Attend RACSIG & RAC Attack events

I am glad to inform you that RAC Special Interest Group and RAC Attack are part of Collaborate 2013 again this year. Traditionally, RAC Attack has been a place where people with expert technical skills meetup. I don’t expect Collaborate 2013 to be any different. You can find information about RAC Attack sessions here.

RACSIG is organizing several sessions including RAC Experts PanelOracle Database Appliance Customer Panel and RAC SIG Birds of a Feather. I am part of the RACSIG board and we have put together a list of RAC-related sessions (here) to make your search efforts a bit easier.

Additional sources

There are several other sources I have used to build my Collaborate 2013 agenda. Some of them are listed below:

As with any blog post, it is very easy to miss mentioning someone. I did my best to avoid naming specific sessions with the exception of panels and SIG meetings. My intention was to give you some hints on how to build your Collaborate agenda. Please do not hesitate to add some additional references and tips at the end of this post.

See you soon in Denver folks,

Yury

Categories: DBA Blogs

How to Get the Most Out of Collaborate! Talking from Experience

Mon, 2013-04-01 16:37

First, the most important advice I can give you is to relax and have FUN!

Personal Experience

Until recently, most of the conferences I participated in were very stressful for me personally. My very first big conference was back in 2000 – the Oracle User Group conference in Copenhagen. Since then, I have participated in 1 or 2 conferences per year. I always do a lot of prep work to ensure I get the most value out of the conference. I try my best to participate in as many sessions as possible and get very upset if I miss a session for any reason. It’s only during this last year or so, after participating in some of the best conferences of my life, that I now understand how much I was missing before. Getting as stressed as I did prevented me from learning as much as I could.

My Advice to You

During the last few conferences, I have learned more from sessions and networking than ever before. Here’s what I learned.

Don’t worry if you deviate from your original agenda

Don’t get me wrong, it is a great idea to plan which sessions you want to attend before you get to the conference; however, don’t spend too much time planning and don’t worry if you don’t attend every session in your agenda. Yes, you are there to learn. Yes, you have paid a significant amount of money to be there, but you will learn a lot more if you aren’t too stressed about keeping to your original schedule. My best advice is to build a good agenda based on your interests and do your best to follow it. Don’t worry if you miss one or two sessions. Enjoy networking, and you will learn more during the next session. If you feel tired, take a break, relax, and ask yourself “What would I like to do next?” and then do it, even if it isn’t in your original agenda. Explore the exhibition, talk to someone, or maybe take a break from the conference entirely and get outside for 30 minutes or so.

As the saying goes, the only constant is change, so don’t worry about sticking to your agenda too strictly. You will get more value out of the conference if you network with people, listen to their advice, take a break, and then adjust your schedule.

Mix sessions with expert panels and SIG meetings

While you can learn a lot by attending sessions focused on a particular topic with one speaker or one presentation, you may get tired faster. I suggest mixing sessions with community discussions. Most conferences offer many panels and special interest group (SIG) meetings. I am sure you will find some that suit your needs and interests. Attend a few and get involved in the discussions. Build new connections and talk with people from your interest group. Ask them which sessions they are planning to attend. Chances are they can recommend a very good speaker or session you may not have considered when building your agenda. Tag along and try something you hadn’t planned on.

Use Twitter for conference communications

Many people think Twitter is a waste of time and sometimes it is, but not when it comes to staying up-to-date at conferences. For me, Twitter is the best communication media and way to stay in touch at any conference. It can help you to do many things, including:

  • Receive the latest news related to the conference (an upcoming event, a good giveaway, an appreciation event, a good person to meet, etc)
  • Ask conference-related questions (pose your question using the conference’s hashtag and someone will answer you)
  • Discuss ongoing sessions in real time and share information you just learned with other conference participants
  • Meet with other professionals from your industry
  • Use Twitter to find people (many use Twitter instead of a phone or any other communication media) who may want to share a beer with a stranger or know of a good party :)

Below are a few suggestions for getting started on Twitter once you’ve created an account:

  • Start by searching for tweets using the conference’s hashtag(s). For Collaborate 2013, the main hashtag is #C13DEN
  • If the main conference hashtag gets too noisy, start following a few folks who are tweeting about topics you are interested in
  • Often (and Collaborate isn’t an exception) conferences have sub-hashtags that may be less noisy and more focused on the areas you are  interested in. For example, #C13TCH is the hashtag for technical context at Collaborate 2013
  • Build your private “Twitter list” of people you would like to follow during the conference. You can add and remove people if those get too noisy.
  • It is easy to use any Twitter client and switch between lists, searches, and streams depending on your preferences. Try it and you will find the best for you in no time.

PS: Don’t forget to fully charge your mobile’s battery each evening (or even buy an external battery for recharging as you go)

Network with people

Twitter and some other communication media are great tools for getting information, but don’t forget that a conference is a place to meet PEOPLE. Do not hesitate to say “Hi” to the person standing next to you in a lunch queue or sitting beside you waiting for a session to start. Part of the challenge is finding the right people to talk to. You rarely talk about databases with your wife or husband, do you? But chances are the person sitting next to you has similar interests. They are there for exactly the same reasons you are. It’s a great opportunity to learn, listen, and share your experiences. Everybody wins.

Have fun

Learning and new opportunities are two of the main reasons for attending conferences, but another reason is to take a break from day-to-day life and have fun. As they say, a change is as good as a rest. Think about it! You get the chance to get away from work, routine, neighbours, commuting, etc. Use the change! Have fun, smile more, get a bit crazy. Do things that you may not get to do that often. Try new things. You don’t see these people on a daily basis. Leave them with the impression that you are a fun and friendly person!

Have fun, my friend, and see you very soon in Denver :)

Yury

You can follow me on Twitter: @yvelik

You can follow many other Collaborate conference participants here

Feel free to share your hints and give me your feedback using Twitter or the comment section below :)

Categories: DBA Blogs

Using Ansible to Secure Cloudera Manager Installation on a Hadoop Cluster

Thu, 2013-03-28 13:54

Building a secure Hadoop cluster requires protecting a number of services which comprise Hadoop infrastructure. If you are using CDH distribution, then Cloudera Manager (CM) is one of the components that needs to be secured. CM consists of Server, Agents running on all cluster machines and web UI. CM provides you with 3 levels of securing (good/better/best) it’s communication:

  • Transport Layer Security (TLS) Encryption for Cloudera Manager
  • TLS Authentication of Server to Agent and Users
  • TLS Authentication of Agents to Server

There is a good step by step guide in CM documentation, and it’s easy to follow for one server, but what when you have hundreds of them? There are different approaches to the problem of managing server’s configuration at scale, but I’d like to focus on Ansible which is a neat framework for parallel commands execution and complex rollouts. And it’s written in Python! Ansible is easy to install and requires only a couple of Python libraries on a “master” node and nothing else that python2.6 on “slaves”. You don’t have to setup ssh-keys and configures passwordless access across all machines, which is important if we are talking about security. To give you and idea of how Ansible command looks like, here is how you can restart CM Agents on all Hadoop data nodes:

ansible hadoop_data_nodes -m service\ 
 -a "name=cloudera-scm-agent state=restarted"  -k --sudo 

This command will read a list of hosts from your /etc/ansible/hosts file or any other file specified by ANSIBLE_HOSTS environment variable. It will find a section [hadoop_data_nodes] in the hosts file (which has ini-like structure) it and will execute given command for all servers in this section. Ansible will ask you for both ssh and sudo passwords only once and will use them to execute command on target servers. Ansible relies on modules (specified by -m option) to perform specific tasks like restarting services, executing shell commands or manipulating text files. Full list of modules can be found here. You can write your own, of course, if you need.

Below is the step by step guide on configuring all 3 levels of TLS for CM. In some aspects it repeats steps from CM documentation guide, but there are some important nuances that I have discovered. This guide is also scalable: you can apply it with very little modifications to clusters of any size.

Level I. TLS Encryption for Cloudera Manager

Following steps assume that you have installed Ansible on the same server as your CM Server and properly configured it to access all nodes in your cluster.

  1. Create a directory for CM keystore:
    mkdir /etc/cloudera-scm-server/keystore
  2. Generate a certificate for CM. You will be prompted for new Keystore password. Also make sure CN field that you will be asked to specify matches CM Server hostname. It’s not *your* name!
    keytool -validity 1095 -keystore\
     /etc/cloudera-scm-server/keystore/scm-keystore -alias jetty\
      -genkeypair -keyalg RSA

    note -validity option, new certificate will be valid for 3 years in this case

  3. Restrict permissions to the Keystore:
    chown -R cloudera-scm:cloudera-scm /etc/cloudera-scm-server/keystore/
    chmod o-r /etc/cloudera-scm-server/keystore/scm-keystore
  4. Enable TLS Encryptions for Agents, provide path and password to the Keystore in CM web UI. You can refer to CM documentation on how to do this.
  5. Next, we need to update CM Agent configuration files to set use_tls=1 option. For this Ansible lineinfile module can be used:
    ansible hadoop -m lineinfile -a\
      "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='use_tls.*'\
         line='use_tls=1'" -k -K

    There are several assumptions made here. First of all, your Ansible host list has a [hadoop] section in it, which covers all Hadoop cluster nodes, or it hase sections like [hadoop_namenodes], [hadoop_datanodes], etc. Ansible can recognise patterns. Second, your template for CM Agnet config.ini files has all security related options in it, but they are commented out. Maintaining a template for all configuration files is a good idea and Ansible can help you here as well, but it’s beyond the scope of this guide.

  6. Restart CM Server (again, I assume that you run Ansible commands from the same machine):
    sudo /etc/init.d/cloudera-scm-agent restart
  7. Restart CM Agents and check their health:
    ansible hadoop -m service -a\
      'name=cloudera-scm-agent state=restarted' -k -K;
    ansible hadoop -m shell -a\
      "tail /var/log/cloudera-scm-agent/cloudera-scm-agent.log" -k -K

    You have already seen an example with service Ansible module. New module used here is shell. It allows you to run arbitrary shell commands. It’s a good idea to verify that all agents started fine, so you can check the log files on all servers.

If all is fine, at this point you have encrypted communication between CM Server and CM Agents.

Level II. TLS Authentication of Server to Agent and Users

At this level you will force CM Agents to check a certificate for CM Server to make sure they are talking to the right machine.

  1. Export CM Server Certificate from Keystore and converting it to .der format:
    keytool -exportcert -keystore\
      /etc/cloudera-scm-server/keystore/scm-keystore -alias jetty\
      -file scm-server.der
    openssl x509 -out scm-server.pem -in scm-server.der -inform der
  2. Create new cert dirs on all agent servers:
    ansible hadoop -m shell -a "mkdir /etc/cloudera-scm-agent/cert" -k -K
  3. Copy server cert to agents:
    ansible hadoop -m copy -a "src=/tmp/scm-server.pem
          dest=/etc/cloudera-scm-agent/cert/ owner=root group=root" -k -K
  4. Change very_cert_file option in agent’s config.ini:
    ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini 
      state=present regexp='verify_cert_file' 
      line='verify_cert_file=/etc/cloudera-scm-agent/cert/scm-server.pem'" -k -K
  5. Enabled Use TLS Encryption for Admin Console in CM Web UI
  6. Restart CM server and Agents. See Steps 6-7 in Level I section.
Level III. TLS Authentication of Agents to Server

This is similar to previous level, but requires certificates to be generated for all agents, so they can authenticate to CM Server. Here I describe an approach with self-signed certificates which is fine for development or POC clusters. For production clusters you may need to comply with your organisation’s standards and obtain properly signed certificates.

    1. Generated password for Agent keys and copied it to all agents machines:
ansible hadoop -m shell -a\
 'echo PASSWORD > /etc/cloudera-scm-agent/cert/agent_cert.pwd' -k -K
ansible hadoop -m shell -a\
  'chmod o-rx /etc/cloudera-scm-agent/cert/agent_cert.pwd' -k -K
  1. Next step is to generate private key and certificate for each CM Agent. To automate this task I wrote a quick script — https://github.com/dazbur/morecerts. It takes list of hosts in a plain text file and produces private key and certificate for each host. Resulting files will have agent_HOSTNAME.key and agent_HOSTNAME.pem format. It also takes care of providing Distinguished Name options for keys-generating commands:
    • Create a text file with list of Agent’s IP adress. You can just copy it from Ansible hosts lists
    • Create agent key password and put it into a text file
    • Generate keys:
      ./morecerts.py -f agentiplist.file -p agent.pass.file gencerts
    • Add keys to CM Server keystore:
      sudo ./morecerts.py -f agentiplist.file -p agent.pass.file -k\
       /etc/cloudera-scm-server/keystore/scm-keystore\
          -w KEYSTORE_PASSOWORD addtokeystore
  2. Copy keys and certs to agent machines (assuming you have generated keys in your home directory):
    ansible hadoop -m copy -a "src=~/agent_$inventory_hostname.key
     dest=/etc/cloudera-scm-agent/cert/agent.key owner=root group=root"  -k -K
    ansible hadoop -m copy -a "src=~/agent_$inventory_hostname.pem
     dest=/etc/cloudera-scm-agent/cert/agent.pem owner=root group=root"  -k -K

    Here you can see another nice trick Ansible can do: $inventory_hostname variable corresponds to the server on which command will be executed. This allows me to copy specific files to specific servers.

  3. Update agent config.ini files to set client_key_file, client_keypw_file and client_cert_file options:
    ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini 
     state=present regexp='client_key_file.*' 
      line='client_key_file=/etc/cloudera-scm-agent/cert/agent.key'" -k -K
    
    ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini 
      state=present regexp='client_keypw_file.*' 
      line='client_keypw_file=/etc/cloudera-scm-agent/cert/agent_cert.pwd'" -k -K
    
    ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini 
      state=present regexp='client_cert_file.*' 
      line='client_cert_file=/etc/cloudera-scm-agent/cert/agent.pem'" -k -K
  4. Update CM configurtion via Web UI to set Use TLS Authentication of Agents to Server, Path to Truststore and Truststore Password. Truststore in our case is the same as Keystore.
  5. Restart CM Server and Agents

You are done now! Ansible can also wrap individual playbooks, so you can execute them on demand later. In general, I found Ansible to be a great tool to manage and execute commands on many servers and it is definitely worth exploring.

Categories: DBA Blogs

Oracle Temporary Tables Getting Closer to be Temporary

Thu, 2013-03-28 06:36

This is a very short blog post to share a good news I  learned last week at the great New Zealand Oracle User Group 2013 conference. During a keynote Tom Kyte mentioned that Oracle is  going to introduce a Temporary Undo.

Most of us know that for years Oracle temporary tables weren’t true temporary tables from technical perspective. As any change would generate UNDO data and UNDO would generate corresponding REDO. Therefore changes on temporary tables generated a REDO. Depending on an application a REDO stream generated from temporary tables could be significant. REDO volumes have a direct impact on database recovery. Database must apply all REDO data in case of recovery (including temporary tables REDO). This going to change soon.

Yury

BTW: I really enjoyed the conference. This was one of the very best if not the best conferences I have attended in the region. If you will have a chance to be part of it in the future then don’t think twice. Go for it :)

Categories: DBA Blogs

Log Buffer #313, A Carnival of the Vanities for DBAs

Thu, 2013-03-28 06:33

The answers to the questions like whether to patch now or wait a little? What quirks are there in that stunning new features? What are the limitations of that fancy index type, any working examples of a particular add-on, are best found in the blogs. This Log Buffer Edition provides you a window to those blogs out there.

Oracle:

Scott Wesley is dishing out the reasons you should upgrade your APEX environment.

In case you haven’t already noticed, PeopleTools 9.2 is out and available for download!

Since two thirds of sensitive data in most organizations resides in databases, consolidation onto private clouds represent an opportunity to improve information security and compliance.

BITeamwork has been out since Oracle Open World 2012 but since its involvement with several customers the product has evolved into a really amazing software offering.

Andrew Reid is giving simple example, tested on Oracle 11.2, shows how to use V$SQL_BIND_CAPTURE to see the value of bind variables used in a WHERE clause.

SQL Server:

Steve Bolton is sharing a rickety stairway to SQL Server Data Mining.

What to Do When You Can’t Wipe an Infected System? Brian Kelley answers.

Policy Based Management has 4 evaluation modes and if you are not already familiar with them you can go here to get more information.

James had someone asked him the other day if they are using SSIS 2012 Project Versions, is there any benefit to using TFS?

Did you know that when you deploy projects to the SSIS catalog in SSIS 2012, the previous version of the project is kept in the server?

MySQL:

Here is how to run your own Web SMS Portal with PointSMS.

Sven is blogging about flexible Fail-over policies using MySQL and Global Transaction Identifiers.

Daniel van Eeden is running Percona XtraDB cluster in a sandbox on Ubuntu.

What other pluggable authentication plugins would you like in MariaDB? Colin Charles asks.

Why MySQL Performance at Low Concurrency is Important? Peter Zaitsev questions.

Categories: DBA Blogs

Performance Settings of Concurrent Managers

Wed, 2013-03-27 08:02

This is the second article in a series about internals and performance of concurrent managers. In this post, we’ll take a look at three important settings that affect the performance of the concurrent managers: number of processes, “sleep seconds”, and “cache size”.  This article might be a bit on the theoretical side, but it should provide a good understanding of how these settings actually affect the behavior and performance of concurrent managers. Most of the statements in this article build off of information from my previous post:  The Internal Workflow of e-Business Suite Concurrent Manager Process. It may be helpful to take a look at it before continuing with this one.

Life cycle of a Concurrent Request

The interesting thing about tuning concurrent managers is the fact that we don’t tune a particular query or a running process, but we actually tune the pending time of concurrent requests. The goal of the tuning is to make sure concurrent requests start executing soon enough after the time they have been scheduled for. Let’s take a look at the life cycle of a concurrent request:

The Lifecycle of a Concurrent Request

The Life cycle of a Concurrent Request (CR)

Based on the diagram above, the pending time of the request is the interval between the time the request was scheduled to start and the time it actually started. This time can be split in two parts:

  1. Pending for Conflict Resolution Manager (CRM) – Here the CRM checks the incompatibility rules effective for the pending concurrent request against other running requests. The CRM allows the request to execute only when all incompatible requests have completed.
  2. Pending for Concurrent Manager (CM) – This is the time spent waiting for an available concurrent manager process. It also includes the time the CM process takes to fetch the request from FND_CONCURRENT_REQUESTS table and to start executing it. “Pending for CM” is the interval that can be tuned by altering the number of manager processes, “sleep seconds” and the “cache size” settings.
Requirements

Understanding the requirements is a mandatory step for any tuning attempt. Otherwise, it’s hard to know when to stop tuning as can become unclear if the performance is sufficient or not. When you’re tuning concurrent managers the requirements can be defined by answering a simple question: How long is the request allowed to stay pending after the scheduled start? You should also keep in mind the following items while thinking about the answer:

  • Be realistic – “Requests should start immediately” is not a valid answer. It’s simply not possible because of how concurrent managers work. If there is anything you have to run immediately – concurrent programs are not the correct way of doing it.
  • Think of groups of concurrent programs – If you have a similar requirements for a group of requests (i.e. a program printing invoices while the customer is waiting on-site should start executing in 10 seconds after it’s submitted), a dedicated concurrent manager should be implemented for them.
  • Unsure of requirements? – If the requirements are not known, ask the end users if they have experienced situations when requests stay in the queue for too long. Was it because the managers couldn’t cope with the amount of incoming requests? If so, the settings might be too low.
  • Work Shifts - If the requirements differ at different times of the day, concurrent manager Work Shifts can be used to define different settings depending on the time of the day.
Settings

The settings related to concurrent managers are explained in the documentation: Oracle E-Business Suite System Administrator’s Guide – Configuration. But I find the explanations are too often unclear to effectively tune the concurrent managers. In this chapter I’ll reveal the basic principles behind each of the three main settings and will describe how I utilize them to control the performance of concurrent managers.

Number of Processes

From the Documentation: “The number of operating system processes you want your work shift to run simultaneously. Each process can run a concurrent request.

  • It’s important to add that there is no coordination between processes of a concurrent manager. If all concurrent processes fetched from the FND_CONCURRENT_REQUESTS table at exactly the same time – they all would read exactly the same information about pending requests. A simple mechanism of row locking is utilized later to allow execution of a concurrent request on a single manager process.
  • Choosing the correct number of concurrent processes is not easy as the workloads are not constant and the number of pending requests can vary. Here are some clues you should consider (I’m also planning a future blog post about measuring the actual utilization levels of concurrent managers):
    • Don’t configure too many processes! Your hardware is limited. If you configure too many processes you’ll exhaust the server resources. For example, if you run lots of big Oracle Reports (CPU-intensive workload) and you have 4 CPUs on your only web/forms/concurrent node, configuring 8 processes for the concurrent manager is risky. If 8 reports where to be generated at the same time, the users could experience web/forms slowness. The same applies to the database tier. Additionally, if too many processes are configured they can use a significant amount of resources even when they are idle. This is especially important for RAC configurations. (This too will be explained in one of my upcoming blog posts)
    • Don’t configure too few processes! If the number of processes is insufficient, the requests will start queuing. If that becomes an issue, increase the number of processes slightly. Consider defining a Work Shift with a different number of processes at different times of the day to accommodate your requirements.
    • Don’t be afraid of queuing! Queuing is normal, especially if all requests still manage to start as expected based on the requirements. If you see recurring queuing at particular times, check with the users if that is going to be a problem.
    • Start low! If you’re unsure of what setting should be used, start with a low number of processes. Check if any queueing occurs and if the users start complaining.
Sleep Seconds

From the Documentation: ”The sleep time for your manager during this work shift. Sleep time is the number of seconds your manager waits between checking the list of pending concurrent requests (concurrent requests waiting to be started)

  • The documentation is inaccurate – It should clearly state that it’s effective for each manager process – i.e. if you have 5 concurrent manager processes for the Standard Manager and the “Sleep Seconds” setting is set to 30, then the average time between checks for pending requests (if all managers are idle) is 30 / 5 = 6 seconds.
  • A manager process sleeps only when there are no pending requests - The manager process checks the requests queue immediately after it has processed the last request it fetched. A common misconception is that if the rate of incoming requests is very high, the “Sleep Seconds” should be low to process all of them quickly. Not true! If the rate of incoming requests is high, there is a good chance some requests will be executing at any given time. So, when they complete, the requests queue will be checked immediately and the new requests will be started.
  • What value to use? - Calculate it! Three parameters are important to estimate the “Sleep Seconds” (S) setting: the number of manager processes (N), the average utilization level (U) of concurrent managers (this setting will be explained later), and the average time of how long the request is allowed to be pending (T). As “Sleep Seconds” are effective only for the idle processes, it can be calculated using the following: S = N * (1 – U) * T.
    • Example 1: if N = 5 processes, U= 20%, T=20 seconds – let’s calculate the “Sleep Seconds” setting: S = 5 * (1 – 0.2) * 20 = 5 * 0.8 * 20 = 80 seconds. It seems high, but think about it – if the average utilization of 5 processes is 20%, then there are 4 idle processes at any given time. Each of these will have a sleep interval of 80 seconds, so on average the requests queue will be checked every 20 seconds.
    • Example 2: if N = 3 processes, U= 90%, T=20 seconds: S = 3 * (1 – 0.9) * 20 = 3 * 0.1 * 20 = 6 seconds.  This example reveals a problem as the calculated “Sleep Seconds” are lower than the requirement we have set – this means the requirement can’t be reached with the number of running processes. Think about it – we have 3 processes each utilized 90% of time, it’s impossible to meet the 20 seconds goal because most of the time all managers are busy. There simply aren’t enough processes to execute the incoming requests. The defined requirements can be reached only if at least one manager process is idle – This scenario also describes a “perfect world”, where all but one manager is busy, so all new requests are picked up in time and the processing overhead of the idle manager processes is minimal.
Cache Size

From the Documentation: “The number of requests your manager remembers each time it reads which requests to run

  • Almost useless setting - Unless you have a manager with only one running process. If multiple manager processes are running, there is a good chance that most of the cached requests will be processed (remember, the processes don’t coordinate the work – they compete the work) by other manager processes while the first request is running.
    • Example, there are 10 manager processes and 10 requests are submitted. One of the manager processes starts executing the 1st request, the other managers start running the remaining requests. So, by the time the request completes all the cached requests will be obsolete, but the manager process will try to lock the corresponding rows in FND_CONCURRENT_REQUESTS table anyway, and will fail for all 9 requests. It will then immediately query the queue to check if more requests are pending.
    • I think it’s best to set “Cache Size” setting to 1 so the hardware resources aren’t spent on trying to lock the processed requests, but rather on checking the requests queue.
  • Request priorities are cached too - If you have a cache size greater than 1, keep in mind the request priorities are cached too. If the priority is changed for a cached request, the manager process will not notice it.
Summary

I have to admit this article turned out to be much more complicated to write than I expected, even if there are just a few settings to describe. The problem is that the concurrent processing environment is changing all the time. At one moment it’s completely idle, and then suddenly there is a spike of incoming requests. It makes it impossible to tune for all situations.  There will always be instances when any configuration works better and others when it makes things worse, but I hope I was able to outline the significance of each configurable parameter so the overall picture is clearer.

As I promised, two other articles are lined up for the future – one to describe the actual utilization levels of concurrent managers and the other to look into overhead of an idle concurrent processing environment. Stay tuned!

Categories: DBA Blogs

REMINDER – MySQL Community Dinner at Pedro’s 2013

Tue, 2013-03-26 14:28

Friends,

I want to remind you, to let us know if you would like to come for the dinner at Pedro’s on Tuesday April, 23.

Actually ALL of you should come !

Pedro’s is asking us to have a better idea of the number of participants, so please go to this page “MySQL Community Dinner at Pedro’s 2013

and subscribe.

We want to have all of you enjoy the evening with us, subscribe & come!

Categories: DBA Blogs

Log Buffer #312, A Carnival of the Vanities for DBAs

Fri, 2013-03-22 11:32

There are no rules for blogging. There cannot be any, because you cannot trap the wind in your hands. It’s innovation, it’s creativity, and it’s right out of the core of the technology from the bleeding edge. This Log Buffer drips into that, and brings you some of the finest posts.

Oracle:

Yes its not a fad. Big Data is the commercial supercomputing in the age of Datafication.

Tanel Poder is listing Exadata storage cells and their configuration info from V$CELL_CONFIG.

How well do you know your hints? When Jonathan asks, it stays asked.

Rather than just describing the contents of the trace file, Dough thought it might be a good idea to tie the various sections into how they might help you solve Real WorldTM problems.

Timur Akhmadeev produces yet another gem of a blog post.

SQL Server:

Michelle Ufford is blogging about conversion issues upgrading to PowerShell 3.0.

Mark Broadbent is figuring out as just how to upload SQL Server 2012 Enterprise/ Developer Edition installation files to his Windows Azure Virtual Machine.

Chris Shaw needs the opinion regarding the sessions for the PASS summit.

Chris Webb writes about UK/US Date Format Bug in PowerPivot and SSAS Tabular.

Thomas LaRock is having a SQL Server Query Performance Tuning: A 12 Step Program.

MySQL:

Among many others, here is yet another live post from Percona 2013.

Petri Virsunen opines about Zero-Downtime MySQL Maintenance And Schema Operations.

ip.access Relies on MySQL to Support Mobile Traffic Growth.

Wanted: Evaluators to Try MongoDB with Fractal Tree Indexing

Zmanda Recovery Manager for MySQL – What’s New in Version 3.5

Categories: DBA Blogs

The Internal Workflow of e-Business Suite Concurrent Manager Process

Thu, 2013-03-21 14:41

Concurrent processing is one of the key elements of any e-Business Suite system. It provides scheduling and queueing functionality for background jobs and it’s used by most of the applications modules. As many things depend on concurrent processing it’s important to make sure the configuration is tuned for your requirements and hardware specification.

This is the first article in a series about performance of concurrent processing. We’ll take a closer look at the internals of concurrent managers, the settings that affect their performance and the ways of diagnosing performance and configuration issues. Today we’ll start with an overview of the internal workflow of a concurrent manager process. Enjoy the reading!

There are few things that need to be clear before we start:

  1. This is only about how the concurrent processing framework (the concurrent managers) works and it’s not about the concurrent requests executed in the framework.
  2. There are multiple types of concurrent managers in eBS – internal manager, conflict resolution manager, workflow agent listener service, standard manager, etc. Their roles are different and in this post I’ll discuss only managers that pick up scheduled concurrent requests from the “queue” and execute them – or specifically all managers that have type of “Concurrent Manager” set in Concurrent managers’ definition form. Typical examples of these managers are “Standard Manager”, “Inventory Manager” and probably your own custom concurrent managers created for processing specific types of concurrent requests.
"Concurrent Managers" Form

“Concurrent Managers” Form

It is important to understand the internal workflow of a concurrent manager because otherwise it’s hard to realize how a configuration change actually affects the system. Several years ago I had to implement an online change of a specialization rule and it triggered a bounce of all Standard Manager processes – that’s when I realized I had to understand how it worked and since then I have spent lots of hours looking into internals of concurrent managers. I’m not saying everything is 100% clear for me now – there are too many little things that matter in certain situations. This series of posts will be more about concepts and I hope you’ll find it useful.

So how does a concurrent manager process work? Here is a diagram I created to explain it:

Internal workflow of a concurrent manager process

Internal workflow of a concurrent manager process

I’ve numbered each step of the diagram to provide a bit more details about them:

  1. This is where the story begins. There is no EXIT state in the diagram as the managers normally process requests in an infinite loop. Obviously, there is a way for a concurrent manager process to receive the command to quit when the managers need to be shut down, but that’s not included here for simplicity.
  2. Internal Concurrent Manager (ICM) requests the Service Manager (FNDSM) to start up the Concurrent Manager process. For the Standard Manager processes the binary executable FNDLIBR is started, for the Inventory Manager it’s  INVLIBR, and there are others too.
  3. The manager process connects to the database and reads the settings (e.g profile options, sleep seconds, cache size).
  4. The process saves information about itself in FND_CONCURRENT_PROCESSES table (os process id, database name, instance name, DB session identifiers, logfile path and name and others). It also updates FND_CONCURRENT_QUEUES by increasing the value of RUNNING_PROCESSES.
  5. The concurrent manager process collects information from the database to build the SQL for querying the FND_CONCURRENT_REQUESTS table. The query will be used every time the manager process will look for scheduled concurrent requests.  This is the only time the manager process reads the Specialization Rules (which programs it is allowed to execute and which not) from the database. Keep in mind, if the specialization rules are changed while the managers are running, they are bounced without warning as that is the only way to update the specialization rules cached by the manager process.
  6. The SQL (from step 4) is executed to collect information about pending concurrent requests from FND_CONCURRENT_REQUESTS table.
  7. The results are checked to verify if any requests are pending for execution.
  8. If no requests are pending for execution the manager process sleeps and then goes to step 5. The “Sleep Seconds” parameter of the  ”Work Shifts” settings of the concurrent manager determines how long the process sleeps before FND_CONCURRENT_REQUESTS table is queried again. This is the only time the “sleep seconds” setting is used.
  9. If there is at least one concurrent request pending for execution the concurrent manager process caches rowids for the FND_CONCURRENT_REQUESTS rows of pending concurrent requests. The “Cache Size” setting of the concurrent manager specifies how many rowids to cache.
  10. The cached list of rowids is checked to verify if there are any unprocessed concurrent requests (rows in FND_CONCURRENT_REQUESTS table) left .If none are left – the processing returns to step 5 and the FND_CONCURRENT_REQUESTS table is queried again.
  11. The next unprocessed rowid is picked from the process cache and the processing starts.
  12. Concurrent manager process executes a SELECT-for-UPDATE statement to lock the STATUS_CODE in FND_CONCURRENT_PROCESSES for the request it’s about to process. This is the mechanism to ensure that each concurrent request is executed only once and only by one manager process even if many processes are running simultaneously. The SELECT-for-UPDATE statement can complete with “ORA-00054: resource busy and acquire with NOWAIT specified” or “0 rows updated” if another manager process has started processing the request already.
  13. If the STATUS_CODE of the request was locked successfully the concurrent manager executes the concurrent request. The processing moves to step 9 where the cached list of concurrent requests (rowids) is being checked again.

The workflow is not very complex, but it’s important to remember there are normally multiple concurrent manager processes running at the same time and they are competing for the requests to run. This competition introduces another dimension of tuning for settings, like number of concurrent manager processes, sleep seconds or cache size. Stay tuned for the next post in the series to find out more!

Categories: DBA Blogs

The Pythian chalkboard wall comes together.

Thu, 2013-03-21 11:29

Categories: DBA Blogs

Oracle Trace Files in User Directories

Thu, 2013-03-21 11:24

I encountered an unusual situation recently where Oracle was writing trace files into the user directory instead of the usual oracle diagnostic destinations. Our monitoring software indicated that the “/home” directory was filling up. When I checked the space, I saw it had become 85% full.

$ df /home
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/home 20642428 16503552 3090300 85% /home

In digging further, I found the pythian user was using a large amount of space. This is very unusual as the monitoring software we use has a very small footprint and the log files rotate so as not to use a lot of disk space.

$ cd home
$ du -sh *
392K bturner
16K lost+found
....
4.5G pythian
...

Digging further, I found an unfamiliar directory “oradig_pythian” had been created. I had never seen this directory before and it is not related to our monitoring software.

$ cd pythian
$ du -sh *
2.4M avail
11M logs
4.3G oradiag_pythian
...

Drilling down the “oradiag_pythian”, there was an usually named directory “host_1460457767_80″ which was full of subdirectories that are similar in name to the oracle diagnostic directories.

$ pwd
/home/pythian/oradiag_pythian/diag/clients/user_pythian/host_1460457767_80
$ du -sh *
2.9G alert
4.0K cdump
4.0K incident
4.0K incpkg
4.0K lck
276K metadata
4.0K metadata_dgif
4.0K metadata_pv
4.0K stage
4.0K sweep
1.4G trace

The only file in the “trace” directory was a sqlnet.log file. This 1.4GB file was being written to many times per minute complaining of a missing oracle diagnostic directory.

$ ls -ltr trace | tail -3
total 1466924
-rw-rw---- 1 pythian pythian 1500659093 Mar 13 09:36 sqlnet.log
$ cd trace
[pythian@bsn-s-ora11g-1 trace]$ tail -10 sqlnet.log
Wed Mar 13 09:42:04 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]
Wed Mar 13 09:42:04 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]
Wed Mar 13 09:42:05 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]
Wed Mar 13 09:42:05 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]
Wed Mar 13 09:42:05 2013
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

In addition, the “alert” directory was full of “log.xml” files.

$ ls -ltr alert | tail -3
-rw-r—– 1 pythian pythian 10485849 Mar 10 14:43 log_293.xml
-rw-r—– 1 pythian pythian 10485849 Mar 12 19:05 log_294.xml
-rw-r—– 1 pythian pythian 2915094 Mar 13 09:36 log.xml
$ ls -lt alert | tail -3
-rw-r—– 1 pythian pythian 10485849 Jul 16 2012 log_3.xml
-rw-r—– 1 pythian pythian 10485849 Jul 15 2012 log_2.xml
-rw-rw—- 1 pythian pythian 10485895 Jul 14 2012 log_1.xml

In checking the contents of the most recent log.xml file, it also shows that log files are being written to and being filled up with messages of the missing directory.

$ tail -40 log.xml
host_addr='10.20.30.40'>
Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

In checking for directory, the first directory “/u01/app/oracle/product/11.2.0/db_1/log” did exist. The second directory “/u01/app/oracle/product/11.2.0/db_1/log/diag/clients” did not exist.

$ ls -l /u01/app/oracle/product/11.2.0/db_1/log
total 8
drwxr-xr-t 3 oracle oinstall 4096 May 10 2012 client_hostname
drwxrwxr-x 3 oracle oinstall 4096 May 10 2012 diag

$ ls -l /u01/app/oracle/product/11.2.0/db_1/log/diag/clients
ls: /u01/app/oracle/product/11.2.0/db_1/log/diag/clients: No such file or directory

$ cd /u01/app/oracle/product/11.2.0/db_1/log/diag
$ ls -l
total 8
-rw-r—– 1 oracle oinstall 16 Mar 6 14:32 adrci_dir.mif
drwxr-x— 2 oracle oinstall 4096 Nov 18 22:21 rdbms

I decided to create the directory and see what happens. Nothing happened. And by “nothing”, I mean “nothing” in a good way. Oracle stopped writing to the sqlnet.log file and the log.xml files to indicate the directory was missing.
At the same time, nothing was created in the new clients directory.

$ cd /u01/app/oracle/product/11.2.0/db_1/log/diag
$ mkdir clients
$ cd clients
$ ls

Seems like it was spending a lot of time complaining about a missing directory that it did not need to use. So why did Oracle create the “oradiag_pythian” directory and start complaining about the missing “client” directory ?

I was not able to find anything on metalink that would explain this behaviour. Thankfully I still had the FIRST log.xml file that was created and there was some clue there as to why these files were created.

$ head -40 log_1.xml

Create Relation ADR_CONTROL

Create Relation ADR_INVALIDATION

Create Relation INC_METER_IMPT_DEF

Create Relation INC_METER_PK_IMPTS

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log]
[/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log]
[/u01/app/oracle/product/11.2.0/db_1/log/diag/clients]

<msg org_id=’oracle’ /=” time=’2012-06-18T13:41:01.773-04:00′
type=’UNKNOWN’ level=’16′ host_id=’10.20.30.40′

Using the information in this first log file, I was able to find a similar incident that had occurred for someone else and they had been able to track it down to Oracle bug 58917 ( which I could not access on Metalink ).

In the related incident ( https://savannah.cern.ch/bugs/?58917 ), they noted that this was occured following an ORA-24550, and that

For the record: the Oracle signal handler producing these ORA-24550 can probably be completed disabled (e.g. if one wants to use the ROOT signal handler) by setting DIAG_SIGHANDLER_ENABLED=FALSE

Categories: DBA Blogs

Tuning using v$mystat

Thu, 2013-03-21 11:20

When reviewing the performance of some queries, it is sometimes useful to review the sessions statistics for each execution of the query. These session statistics complement the wait events found in ASH and trace files. The wait events do not show this information. Many of these session level statistics can be quite revealing. They can show some execution differences that are otherwise difficult, if not impossible for some, to see with trace files and ASH data.

I had a situation that required to look at these stats so I could see why one query would run fast and sometimes much slower. ASH data was not exposing this issue clearly. In order to facilitate the comparison of the data from run to run, I wrote a simple wrapper ksh shell script for the query. It saves the session statistics in a table before and after the execution of the query and then prints out the statistics in a pivot report. This turned out to be very handy to me and therefore I chose to share it with the world :)

Below is a sample output of the report. It prints only statistics that have changed and it shows all execution stats in separate columns. The executions are labeled like “run#-HH:MI:SS.ssss”. Those labels show the query’s execution sequence and elapsed time.

NAME                                  '1-00:00:00.0107' '2-00:00:00.0085' '3-00:00:00.0050' '4-00:00:00.0051' '5-00:00:00.0053'
------------------------------------- ----------------- ----------------- ----------------- ----------------- -----------------
CPU used by this session                              1                                   2                 1                 1
calls to get snapshot scn: kcmgss                     2                 3                 2                 2                 2
calls to kcmgcs                                                         4
consistent gets                                                        17
consistent gets from cache                                             17
consistent gets from cache (fastpath)                                  17
cursor authentications                                                                    1
enqueue releases                                      1                 1
enqueue requests                                      1                 1
execute count                                         2                 3                 2                 2                 2
global enqueue gets sync                              2                 2                 2
global enqueue releases                               2                 2                 2
logical read bytes from cache                                      139264
no work - consistent read gets                                         13
non-idle wait count                                   1
opened cursors cumulative                             2                 3                 2                 2                 2
opened cursors current                                1                 1                 1                 1                 1
parse count (hard)                                    1                 1
parse count (total)                                   2                 2                 2                 2                 2
pinned cursors current                                1                 1                 1                 1                 1
recursive calls                                       6                 9                 5                 5                 5
recursive cpu usage                                   1                                   2                 1                 1
session cursor cache hits                                               1
session logical reads                                                  17
session pga memory                                65536            196608            196608            196608            131072
session pga memory max                            65536            196608            196608            196608            131072
session uga memory                                                  65488             65488             65488             65488
session uga memory max                           123512            123512            123512            123512            123512
table scan blocks gotten                                               13
table scan rows gotten                                               1276
table scans (short tables)                                              1
workarea executions - optimal                         1                 1                 1                 1                 1
workarea memory allocated                                              50                37                37                37

Note that the execution time in the labels is the elaspe time difference between the capture of the before and after session stats. Not the real query execution elapse. The difference should be very low, like less than 0.01s as you can see from the above durations. Also, note that the PLSQL block that runs the query does not fetch the data. It only opens the cursor, which executes the query. This shows you how long the query takes to get the resultset ready and removes the network and application related overhead.

Here is the code:

#!/bin/sh

export v_sql_text=$(cat sql_runstat_source_query.sql)

echo 'Enter username:';read username
echo 'Please enter your password: ';stty -echo;read passwd;stty echo

sqlplus -S /nolog<
prompt connecting...
connect $username/$passwd

set echo off feed off verify off
set serveroutput on size 2000
col name for a50

prompt verifying the existence of the table query_stats...
declare
 table_exists number;
begin
 select count(*) into table_exists from dba_tables where owner = user and table_name='QUERY_STATS';
 if table_exists = 0 then
    execute immediate 'CREATE TABLE '||user||'.query_stats AS SELECT systimestamp as timestamp, '' '' before, 0 run, name, value FROM v\$statname NATURAL JOIN v\$mystat where 1=2';
    dbms_output.put_line('Table '||user||'.query_stats created.');
 else
    dbms_output.put_line('Warning! Table '||user||'.query_stats exists already. Appending...');
 end if;
end;
/
drop type query_stats_table_type;
create or replace type query_stats_type
as object
(TIMESTAMP TIMESTAMP(6) WITH TIME ZONE,
 BEFORE    CHAR(1),
 RUN       NUMBER,
 NAME      VARCHAR2(64),
 VALUE     NUMBER
);
/
create or replace type query_stats_table_type
as table of query_stats_type;
/
prompt executing the query and saving pre and post session stats...
declare
  next_run     number;
  stats_before query_stats_table_type;
  stats_after  query_stats_table_type;
  type cur_type is ref cursor;
  c            cur_type;
begin
  -- capturing pre-stats
  select nvl(max(run),-1)+1 into next_run from query_stats;
  SELECT query_stats_type(systimestamp, 'Y', next_run, name, value) bulk collect into stats_before FROM v\$statname NATURAL JOIN v\$mystat;
  -- executing the query from the var v_sql_text, which is loaded from file sql_runstat_source_query.sql
  open c for $v_sql_text;
  -- capturing post-stats
  next_run := next_run + 1;
  SELECT query_stats_type(systimestamp, 'N', next_run, name, value) bulk collect into stats_after FROM v\$statname NATURAL JOIN v\$mystat;
  -- saving stats
  insert into query_stats 
  select * from table(cast(stats_before as query_stats_table_type))
  union all
  select * from table(cast(stats_after as query_stats_table_type))
  ;
  commit;
end;
/

prompt getting the run id list...
col max_run new_value run noprint
select max(run) max_run from query_stats;

-- get the run ids and their durations and join them as CSVs to pass to the pivot query
col runlist new_value runs noprint
select listagg(run,',') within group (order by run) runlist
  from
       (select ''''||ceil(run/2)||'-'||duration_H_M_S||'''' run
          from ( SELECT distinct run,
                        to_char(extract(HOUR   FROM timestamp-lag(timestamp) OVER (ORDER BY name, run)),'FM00')||':'||
                        to_char(extract(MINUTE FROM timestamp-lag(timestamp) OVER (ORDER BY name, run)),'FM00')||':'||
                        to_char(extract(SECOND FROM timestamp-lag(timestamp) OVER (ORDER BY name, run)),'FM00.0000') AS duration_H_M_S
                   FROM query_stats
               )
         where duration_H_M_S is not null
           and mod(run,2)<>0
         order by 1
       )
;

prompt pivoting the run statistics where there is a value difference...
set linesize 200 trim on
select *
  from (
   select ceil(run/2)||'-'||duration_H_M_S as run,
          name,
          value_diff
     from (
           SELECT run,
                  name,
                  value,
                  value-lag(value) OVER (partition by name ORDER BY name, run) as value_diff,
                  to_char(extract(HOUR   FROM timestamp-lag(timestamp) OVER (ORDER BY name, run)),'FM00')||':'||
                  to_char(extract(MINUTE FROM timestamp-lag(timestamp) OVER (ORDER BY name, run)),'FM00')||':'||
                  to_char(extract(SECOND FROM timestamp-lag(timestamp) OVER (ORDER BY name, run)),'FM00.0000') AS duration_H_M_S
             FROM query_stats
           )
   where mod(run,2)<>0
     and value_diff > 0
   )
pivot (sum(value_diff) for run in (&runs))
order by name
;
prompt done!
exit
EOF

Some notes about it:

1- The ksh script runs an anonymous plsql block that gets dynamically modified to incorporate the statement you want to test.

2- The statement you want to test needs to be stored in a file called “sql_runstat_source_query.sql” located in the same folder as the ksh script.

3- Do not add a semi-colon at the end of your statement in the sql_runstat_source_query.sql file.

4- The query_stats table contains all the session statistics before and after each executions, not just those that changed and get printed by the pivot table. It also has the timestamp of the captures. You can therefore get more information from this data if needed. For example, you could tailor the column label to show you the full timestamp, or write your own analytics query to suite your needs.

5- The run numbers in the query_stats table are actually a serial number for the statistics capture, starting at 0. 0=first “before” capture, 1=first “after” capture, 2=second “before capture, and so on.

6- The script will create the query_stats table if it is missing, else it appends to it. Two user TYPEs are also created. None of these get dropped after execution so it’s up to you to remove them if you don’t want to keep them in your schema.

7- Remember to either truncate, rename or drop the query_stats table before testing a different statement.

8- You may need to alter the script to set your ORACLE environment variables.

Enjoy!
Marc

Categories: DBA Blogs

Virtual IOUG Collaborate 13 for only $299

Wed, 2013-03-20 16:42

For only $299 you can access Virtual IOUG Collaborate 13 individually or setup a conference room at your company for the whole team. There will be two tracks broadcasted so if you have demand for both tracks, it make sense to purchase two access passes and setup two tracks broadcasted in parallel in your office so that members of your team can choose individually what to attend. I think that’s a steal even if you want to purchase this package individually.

Virtual attendance doen’t completely replace the physical presence but if you can’t come to Denver in April, this is the next best thing. Relationships that you gain from meeting your peers and speakers face to face are extremely valuable and you will cary them throughout your whole career. I’ve been just recently discussing with Rene Antunez (who’s just joined Pythian) how social media and twitter revolutionized our professional networking, and we agreed that things like Twitter alone won’t cut it for many of us. While some folks only know most of their peers online, the real relationships are established face to face — discussing a presentation with the speaker during lunch, sharing a drink at the reception or having a heated discussion in a pub near the convention center. With such relationships, social media becomes a natural extended communication channel.

But I digress… so those of you who can’t come to Denver for one reason or another, do consider taking Virtual IOUG Collaborate 2013 Pass and enjoy the session in tracks “High Availability, Disaster Recovery, Manageability” and “Performance, Scalability and Internals”. Here are some of the speakers you would have a chance to see over 4 days of sessions broadcasting:

  • Kuassi Mensah, Oracle
  • John Beresniewicz, Oracle
  • Michael Abbey, Pythian
  • Yury Velikanov, Pythian
  • Craig Shallahamer, OraPub
  • Karl Arao, Enkitec
  • Carlos Sierra, Oracle
  • Mark W Farnham, Rightsizing
  • Guy Harrison, Dell
  • Frits Hoogland, VX Company
  • Tim Gorman, Evergreen Database Technologies
  • Kyle Hailey, Delphix
  • Gwen Shapira, Pythian

My favorite virtual track is Performance, Scalability and Internals but I will be at the conference physically so I will also be attending lots of sessions on Big Data and Data Science. By the way, as virtual attendee, you will get access to the recordings as well so even if you miss certain session live, you can watch it later at your convenience.

If your team is dispersed geographically, the members would need a pass each to watch broadcast live but, again, I still think it’s a steal at $299!

Looking forward to see some of you in Denver and some of you on Twitter talking about what you saw at the virtual conference. If you don’t come to Denver and not going to be virtual attendee — tell me why and what else can we do to plug you into Collaborate — I will relay your message to the IOUG Board of Directors and the Conference Committee.

Categories: DBA Blogs

Pythian Days!!!

Wed, 2013-03-20 13:55
Categories: DBA Blogs