DBA Blogs
What’s your take on RDBMS and NoSQL?
Oracle Forums due for Upgrade
This is the announcement.
This is the discussion thread. (requires login to forums.oracle.com)
.
.
.
NoSQL and Oracle, Sex and Marriage
COLLABORATE13 Day One
Another Saturday night and I’ve got lots buddies … a take-off on a Sam Cooke ditty from the 60′s. Denver came alive once I arrived at the Hyatt and went full tilt until the wee hours of the morning. Saturday was all about catching up with friends. Some of these people I have been hanging out with for 22 years at these conferences and they are more like family. First task today was registering which was a breeze. There have been mega improvements in this process as the IT solutions become more mature. The whole process including bag pickup took less than 10 minutes.
Next stop was a high availability session provided by Larry Carpenter and Joe Meeks from Oracle. I have heard these two speak together and apart many times and as usual they were intoxicating. The material they covered was familiar and I was fluent in most of it. We got bogged down when discussing ASM and they gentlemen up front eventually spearheaded a redirection of the discussion to get back on topic. Joe is a perfect example of why we like presenting at Oracle Open World so much. A number of years ago, I gave a session called Stand by Your Standby and even played Patsy Cline’s Standby by Your Man as people filed into the room. At the end of the session, a handful of black shirts came up front to chat. I noticed they were from Oracle and asked what SBU they were and and was told data guard and high availability. I asked how I did and if I misled their clientel during my session. Joe in particular gave ms some positive feedback and told me it was fine. Sigh of relief … but that type of immediate feedback is invaluable and welcome.
Lunch was on 16th street with a handful of colleagues and then back to the convention centre for a session in the leadership track sponsored by the IOUG. It was enlightening and the discussions fruitful. The day was great and met my expectations except for one thing. The internet connectivity in this convention centre is abysmal. Our connections are on and off and it is very annoying. I noticed the same problem when I was in for RMOUG and it was equally annoying. I am not sure if this is deliberate or just the way it is. We smartly try to stay off the local cell networks as the roaming and out-of-home data charges are ridiculous.
Tonight starts with an IOUG 20th anniversary celebration from 5-7pm then off to the ACE dinner at Maggione’s. Tomorrow starts early with a trip to Mile High ballroom 2A to test my Air’s compatibility with the software we are using for the COLLABORATE virtual conference. At the end of my rman session on Wednesday I want to demo table level recovery. I need to take advantage of the magnifying engine on my Air since the code listing will be the size of the head of a pin from the third row of the session room not to mention the back. If I cannot use my Air I cannot do the demo.
COLLABORATE13 Day Zero
Day one, sort of, for #C13DEN. Ever since I started using Twitter in 2011, I have conjured up imaginative reasons for the hash sign prefix to tags. I continue to settle on one rather odd explanation. One of the pseudonyms for that character on the keyboard is the pound sign. Since that is a unit of measurement, I think back to Marty in Back to the Future calling everything “heavy”. Heavy = pound means everything said on Twitter is potentially heavy.
I flew out of Pearson mid-afternoon which was wise. That is probably the least busy time of day and I was in luck. The Air Canada kiosk let me down a different way than I had experienced yet. I went through the lengthy boring dialogue with Martha (my name for the kiosk). We got to the document being used for entry into the USA question and I proudly chose my Nexus. I then placed the card in the scanner as instructed and the machine told the name on my ticket (MichaelS Abbey) was not the same as that registered with Nexus. That was news to me. So we keep going and Martha’s next prompt wants the number from my travel document. I assumed by then that she had already refused to use Nexus so we must be back at the defaul choice – passport. She asks for the number on my travel document so I enter the first two alpha characters of said document and they do not register on the screen. I enter the 6 numbers which do then gaze at the screen showing a 6-digit number rather than an 8-digit alphanumeric string. Silly keyboard – I try again with the same result. I abandon the silly machine and inquire of an attendant who instructs me to proceed to the G counters for assistance.
The line at G is short but still annoyingly long. I have probably had serveral million flights in my time so it seemed especially odd that this time I could not check-in at the kiosk. About 20 minutes later I was on my way to customs. I entered the 2-person Nexus line rather than the 2,000 person non-Nexus line and the machine there froze part way through the dialogue that terminates in the receipt of a piece of paper that allows you into the USA. I re-started the dialogue with a machine beside the one frozen and that worked fine. I figured that Martha had contacted the machine on the right informing her to not allow me into the USA after the position I put Martha in during Air Canada electronic check-in. Fortunately Martha did not have enough time to contact ALL her Nexus counterparts so I was on my way to the USA, card in hand.
I toddled off to gate F34 after handing my entry card to a uniformed USA customs agent and discovered, suprise surprise, that that gate was the furthest away possible. That always reminds me of two other situations where one is ALWAYS plopped somewhere that is the furthest physically away from where one is going. When the front desk person checks you into a hotel, the computer chooses the room furthest away from the elevator. Secondly, when selecting a gate at an arriving airport, Martha’s cousing Jenn selects the gate furthest away from baggage claim. Jenn is so sophisticated that she knows if this is your final destination. If not, she still places you as far away as possible from your connecting gate – preferrably in a different lettered terminal from where you arrived.
So I arrive at F34, some 11.5 kilometres from the customs person who gladly took my card. There were lot of moving walkways to use and I am always intrigued by the fact that nobody gets gobbled up by the treads as they disappears into the floor. I have an hour until the flight leaves. About 15 minutes later an Air Canada person appears. I count to 178 (a random numbr) after his appearance then toddle off to the counter to ask a question that ranks as odd with both the asker and the askee … is there any exit row available? This question is ridiculous from the viewpoint of the person at the gate who has a 212 person plane to load and some idiot wants to talk seat change. It is ridiculus from my standpoint as one almost always gets a “no sorry” from the counter attendant. As ridiculous as it is from both ends, I am overwhelmed with excitement as the attendant utters two of my favourite words when flying – “aisle or window.” Say what? I am in 19F and loving it. I have always wondered why the seats in a 4-seat span are called A, B, E and F. Have C and D been dropped from the English language?
I sleep most of the way to DEN and start writing this very text that is a testimony to the trip I am on to COLLABORATE 13. I have been going to these tech shows sponsored/facilitated by the IOUG since 1990 in Anaheim CA. The trip into town will be taxi and hopefully I will run into someone I know so we can share. The journey from DEN into the city is long, the reason why travellers get there up to a day before a show begins. Getting there so far in advance guarantees that the trip from the airport to the hotel will end before the event starts.
So what will this week bring? My prime mission is two-fold. First is my desire to support this wonderful company I work for and do whatever I can to enhance our profile in the community and market our services and personnel to a hungry customer base. My second is to learn and network. Networking is the primary community based need that all shows foster especially those run by the user groups. Tomorrow is packed with education with a bevvy of well-respected and loved presenters. The show officially starts Monday and my first session is late morning. Stay tuned for more musings … have you had memorable experiences as of late with carriers in North America? It seems that the catch-all phrase for used to “blame” for poor service is that this is the post-911 world and our hands are tied. Agreed or not?
Don’t miss a suite of sessions at COLLABORATE13 from strong Pythian presenters. A synopsis can be views here. See you at the show … please stop one of us in the hall to chat; we’d love to meet each and every one of you.
Nasty ORA 4030 in 11.2.0.3.5 using text index ( makes me grumpy! )
It choked eventually and terminated with ORA 4030 ... have not seen one of those in a really long time.
Apparently Oracle has had a fix identified and available from August 2012 that ( somehow ) they have not seen as important enough to include in last several rounds of database PSU's. Wow ... thanks so much Oracle support!
Looks like the bug / 4030 will only hit things that do large numbers of changes ( inserts also ? dunno ) to text index column ... but still ... not happy here.
Problem Key
ORA-4030
[404008]
[druhopen:hash]
[druhopen:hash table]
Testing of the fix one-off looks ok so far and again ... found a while back ( p14367567_112030_Linux-x86-64.zip ).
The following are the bugs fixed by this patch:
14367567: EXCESSIVE PGA MEMORY GROWTH LEADING TO HIGH SWAPPING AND PERFORMANCE IMPACTS
A Look at SQL Plan Management
All of us have experienced times when the runtime of a regularly executed report or query changes and not always for the better, inexplicably. The impact of these runtime changes can range from a simple annoyance to something that may severely impact business-critical operations. An organization requires consistent behaviors from their systems with little to no regression in runtime. Database queries that have inconsistent response times in many cases exhibit what is known as plan instability. Oracle Database Enterprise Edition 11g introduces SQL Plan Management to address plan stability issues.
The Components of SQL Plan Management
The goal of SQL Plan Management is to provide a consistent execution plan regardless of changes made to the database environment. While consistency is great in the face of changes to an environment, it is important not to miss out when changes would improve the runtime execution. SQL Plan Management has three components that meet this goal.
-
SQL Plan Baseline Capture
-
Create a baseline and a plan history for repeatable SQL
-
-
SQL Plan Baseline Selection
-
Only plans known to be performant are used
-
-
SQL Plan Baseline Evolution
-
Find plans in the plan in history that are better and add them to the baseline
-
A SQL Plan Baseline is an accepted execution plan(s) for SQL statements under plan management. Only accepted plans in the baseline are used for SQL statements under plan management. If a new plan is created, that plan is added to the plan history. All of this information is stored in SQL Management Base in the SYSAUX tablespace.
SQL Plan Baseline Capture
Baseline Capture can be done automatically or via bulk loads. In the case of automatic capture, only SQL statements that have been repeated are captured. Automatic capture can be set at the session or system level by setting the parameter
OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES to true (default is false). Setting this parameter at the session level is recommended as you can control the scope of the collection to a particular session. Below is a typical sequence of events when using automatic capture.
-
Set OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES to true at the session level
-
Execute the query to put into plan management two times
-
Exit the session
After the second execution of the query, the SQL Plan Baseline is created and the first plan is added as both ENABLED and ACCEPTED. In order for a plan to be selected from the baseline, the plan has to be both ENABLED and ACCEPTED. All future executions of the query will use the plan from the baseline, even if a new plan was generated. Any new plans generated will be added to the history.
It is also possible to load plans in bulk from sources such as the cursor cache, stored outlines, other databases, and if you have the license for the Diagnostics and Tuning Packs, you can load plans from SQL Tuning Sets and AWR. When loading plans manually or via bulk methods, the plans are enabled and accepted by default.
SQL Plan Baseline Selection
Once the baselines are created, plan selection is the next component of SQL Plan Management. When using SQL Plan Management, before a query is executed a check is first made to see if a SQL Plan Baseline exists.
If no baseline exists, the query executes as it would before SQL Plan Management–that is the plan generated is used to execute the query. If, however, a baseline does exist, a plan from the baseline is used to execute the query and not the plan generated. Here is where plan stability is ensured. Only plans in the baseline that are ACCEPTED and ENABLED are used.
When new plans are generated, the plans are added to baseline as ENABLED, but not ACCEPTED. There is a very subtle point in this action that really needs to be understood. When a query is under plan management, every plan generated by the query, regardless if used or not, is documented and maintained. These collected plans can be added to the base through the process of SQL Plan Baseline Evolution.
SQL Plan Baseline Evolution
Over the normal course of system operations, changes will be made that provide better plans. We want to be able to take advantage of new plans made in response to these change that produce faster results, but we want to avoid plans that produce slower results.
Using the DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE API plans in the history can be marked as ACCEPTED. Three things can be done with this API.
-
Automatically ACCEPT the plan if it performs better than the existing SQL plan baseline
-
ACCEPT the plan without doing a performance verification
-
Do a performance comparison and generate a report but do not evolve a new plan
Conclusion
SQL Plan Management provides plan stability and only allows new plans to be used if they perform better than the baseline. With plan stability, a production system exhibits a consistent level of service to the business users. While those features alone are remarkable, what is more remarkable is that if you have a license for the Enterprise Edition of the Oracle database you have the ability to use SQL Plan Management now as it is included with Enterprise Edition.
Lessons from preparing my Exadata talk
Well, I’m giving this talk related to Exadata at the Collaborate 13 usergroup conference on Monday. I’ve spent a lot of time – probably too much time – preparing the slides and practicing the talk. I first gave this talk a year ago at our office and then again last October at a conference in North Carolina. But, I’ve never been happy with it. I feel a lot better about it now because the slides and the associated notes have a lot of information on them. I got a lot of great feedback from several people and I’ve made changes accordingly.
But having spent too much time reviewing this I think there are a small number of important concepts that I’m really trying to get across:
- Exadata Smart Scans bypass the block buffer cache
- Exadata Smart Scans happen instead of FULL scans
- Make indexes invisible or increase optimizer_index_cost_adj to encourage FULL scans
- Exadata Smart Scans tend to be part of a HASH JOIN
- HASH JOINs can be sped up by adding PGA memory – pga_aggregate_target
- You can free memory by reducing the size of the block buffer cache – sga_max_size
So, this is the short version of my 45 minute talk More PGA, less SGA – my talk in four words!
- Bobby
The Smallest Details are the Ones that Make you Better
One of the things I love about my career (Oracle DataBase Administrator) , is how fragile it is on a day in and day out basis. I can say with certainty , that every day I turn on my laptop to start working, it can probably be my last day working as a DBA, and this is not because I want it to be, it’s because unlike many jobs out there, the smallest mistake I make, can be my last mistake in my current job.
Believe me, I am not under appreciating other jobs or professions out there, but the level of stress that this job has at its peak moments, is comparable to being a surgeon performing an open heart surgery or an air traffic controller with two airplanes that are heading for a collision.
Just imagine this, if we make a mistake, a surgeon might not have the correct data he needs for the surgery, or the traffic controller may not have the electric power to co-ordinate his or her manoeuvres. Putting it more lightly, we are responsible to having your bank account statements up to date, basically your money. Of course we are the ones who have helped you maintain the history of those online games you played, so you can claim yourself the king of Warcraft or Madden.
What I want to get to, is if you are starting in this job, or you have been doing it for so long, you are only as good as your last 5 minutes, because the last mistake can be your last mistake. So what you want to do is take the necessary steps to avoid any mistake in the environment you are working, this is what we at Pythian call being a FIT-ACER DBA.
The best way that I have found to achieve or being closer to this, is by doing the following in the tools I work with:
PUTTY
The following configuration will help you put your words in Red with a black background, helping you distinguish a Development to a Production environment
Windows --> Colours --> Default Foreground Red ->255, GREEN ->0 , BLUE->0
Toad
This tool is a little more discreet , but the best thing you can do is in your connection window and search for the “Color” column, and switch all your production connections to the color red.
SQLPLUS
The best thing that you can do is in the glogin.sql, that is in $ORACLE_HOME/sqlplus/admin, add the following line
SET SQLPROMPT '&_connect_identifier >'
Also you can do what Uwe Hesse uses
Put this is a script called whoami.sql
set serveroutput on
begin
dbms_output.put_line('USER: '||sys_context('userenv',-
'session_user'));
dbms_output.put_line('SESSION ID: '||sys_context('userenv',-
'sid'));
dbms_output.put_line('CURRENT_SCHEMA: '||sys_context('userenv', -
'current_schema'));
dbms_output.put_line('INSTANCE NAME: '||sys_context('userenv', -
'instance_name'));
dbms_output.put_line('DATABASE ROLE: '||sys_context('userenv', -
'database_role'));
dbms_output.put_line('OS USER: '||sys_context('userenv', -
'os_user'));
dbms_output.put_line('CLIENT IP ADDRESS: '||sys_context('userenv', -
'ip_address'));
dbms_output.put_line('SERVER HOSTNAME: '||sys_context('userenv', -
'server_host'));
dbms_output.put_line('CLIENT HOSTNAME: '||sys_context('userenv', -
'host'));
end;
/
And your result will be the following:
USER: SYS SESSION ID: 23 CURRENT_SCHEMA: SYS INSTANCE NAME: TESTDB DATABASE ROLE: PRIMARY OS USER: oracle CLIENT IP ADDRESS: SERVER HOSTNAME: servidor1.localdomain CLIENT HOSTNAME: servidor1.localdomain
UNIX
For us who handle unix and linux environments, one of our best friends to help us identify where we are and which is the current ORACLE_SID we have set in our environment is with the PS1 variable. With the example below, you can see several things which can help you identifying where you are at:
PS1="
`echo ${LOGNAME}`@`hostname` [\${ORACLE_SID}] \${PWD}
`echo ${LOGNAME}` $ "
export PS1
- Example without PS1 environment set as above
[pythian@servidor1.localdomain ~]$ echo "Hi" Hi
- Example with PS1 environment set as above
pythian@servidor1.localdomain [TESTDB] /home/pythian/working/antunez pythian $ echo "Hi" Hi
SQL Developer
This Oracle tool allows you to put all your connections into folders, which will allow you to easily identify which type of connections you have
Right Click on the connection name --> Add To Folder
All of these tips might not seem like a lot, but believe me that when you are facing one of those stressful moments , you want to make sure that you are working where you should be working, and have faith in me that this is a little grain of salt that will help you maintain your job, at least for one more day :)
Log Buffer #314, A Carnival of the Vanities for DBAs
It is an industry norm, supported by analyst researches that blogs are the best informal medium to share the information. This unique medium provides readers with increased leverage in negotiating the hard turns in their technological lives. Log Buffer makes that leverage more powerful.
Oracle:
Alan Gardner is deploying Cloudera Impala on EC2 with Example Live Demo.
If you want to follow the real news and insight into the online learning revolution, then e-Literate has always been the blog to follow.
Tuula Fai wants to do things, and when he cannot he reverts to a little help.
Here is another quick post about yet another new feature in PS6 . As many of you know we have supported identity propagation for a long time .
With the explosion of connected devices and the ubiquity of high speed wireless connectivity, Internet of Things takes on a new relevance.
SQL Server:
As per Microsoft Books Online and SQL Server Security best practice white paper, it is recommended to periodically review privileges granted to public role, Basit serves SQL Server tip.
You can get backups of your WASD databases. But, if you want them to be transactionally consistent, you need to do a little extra work. Scary DBA writes.
How I prepared myself for the MCSE certification, Koen Verbeeck shares.
Ever wanted to know all there is to know about SQL Server, and afraid to ask? Pearl Knows.
If you want to make you backup faster, you can try BUFFERCOUNT parameter with compression backup. See the footprint of James.
MySQL:
MariaDB 5.5 is now inside of OpenBSD, the free, functional & secure OS. Colin Charles blogs.
How InnoDB accidentally reserved only 1 bit for table format, Jeremy Cole tells.
Artem Livshits gives an inside perspective of MySQL 5.6
Simon Mudd has a good post about warming up a MySQL 5.6 server.
The new MySQL 5.6 GA release delivers a host of new capabilities to support developers releasing new services faster, with more agility, performance and security.
Which Partitions Does the SQL Statement Access?
This is a short blog post on how one can prove that a particular partition of the table or index is accessed by a specific query.
The solution to this question seemed interesting enough to share it with you all. Here’s a short description of what I had:
- A SQL with sql_id = ’9kuvj1g38hja2′ is the TOP 1 statement in “User I/O” section of AWR report.
- The statement queries data from multiple tables, one of them is a fairly large table named “ALERT”. It is range partitioned by date and each year has it’s own partition (Y2013, Y2012, …)
- The SQL takes 5 – 10 seconds to complete and is executed 500 – 5000 times each hour (sometimes even ~10 of them are running simultaneously)
- ~95% of elapsed time is spent on db file sequential reads
- Diagnostic pack is licensed (it is required for this method as ASH is used)
We also had a statement that none of the queries should be reading data older then 1 year, but looking at the statistics I suspected there was something wrong. So here’s how I found out older data from the ALERT table was accessed:
- As the query is executed very often it has many appearances in ASH
- As the statement does lots of db file sequential reads, there are lot of p1 (file_id) and p2 (block_id) combinations available from ASH
- file_id and block_id can be joined to DBA_EXTENTS to find out the segment which the block belongs to. That is where the partition names being accessed by the SQL where found out.
The challenge with this approach was the fact that I had more then 100K samples for ’9kuvj1g38hja2′ waiting on db file sequential reads in DBA_HIST_ACTIVE_SESS_HISTORY and there were millions of extents in the database. How do I join them efficiently? I can fetch only the segments we’re interested in from DBA_EXTENTS before joining it all to DBA_HIST_ACTIVE_SESS_HISTORY (Thanks Yury for this great hint)! So this is the query that solved the mystery:
with
exts as (select /*+ materialize */ partition_name, file_id, block_id, blocks
from dba_extents where owner='SECRET' and segment_name='ALERT'),
stats as (select /*+ materialize */ p1, p2, count(*) cnt
from DBA_HIST_ACTIVE_SESS_HISTORY where sql_id='9kuvj1g38hja2'
and event='db file sequential read' group by p1, p2)
select partition_name, sum(s.cnt) read_count
from exts e, stats s
where e.file_id=s.p1
and s.p2 between e.block_id and e.block_id+e.blocks-1
group by partition_name
order by 2;
It’s short and simple, and it also answered the question I asked. Is old data accessed by the SQL statement? Yes.
PARTITION_NAME READ_COUNT ------------------------------ ---------- Y2006 111 Y2007 5645 Y2008 5698 Y2013 10388 Y2009 13473 Y2010 21600 Y2011 37186 Y2012 56184 8 rows selected
Be careful interpreting the results! If the SQL does not report a partition as being accessed, it doesn’t mean it was not accessed. ASH is based on sampling – maybe we got “lucky” and samples didn’t cover reads from that partition? Maybe all blocks belonging from to the partition were in buffer cache and no reads were needed? But, if the SQL reported a partition as being accessed, we can be 100% sure it really was, and for me that was enough.
Deploying Cloudera Impala on EC2 with Example Live Demo
A little while ago I blogged about (and open sourced) an Impala-powered soccer visualization demo, designed to demonstrate just how responsive Impala queries can be. Since not everyone has the time or resources to run the project themselves, we’ve decided to host it ourselves on an EC2 instance. You can try the visualization; we’ve also opened up the Impala web interface, where you can see query profiles and performance numbers, and Hue (username and password are both ‘test’), where you can run your own queries on the dataset.
Deploying Impala on EC2While there are many tools to deploy a Hadoop cluster on EC2 – like Apache Whirr, or even Cloudera Manager – I only wanted to use a single instance for the entire cluster. Starting from the base Ubuntu (Precise) image, I added Cloudera’s apt repos, and installed the single node configuration. Impala doesn’t support using Derby for the Hive metastore, so I installed MySQL and configured Hive to use it instead. Then I installed Impala using Cloudera’s instructions. Impala, and all of the Hadoop daemons, are running comfortably on one M3 2XLarge EC2 instance. Given our modest demands, this may actually be overkill; I over-specced the server trying to find a (now-obvious) performance problem involving short-circuit reads.
Short-Circuit ReadsOn the Pythian cluster, we could consistently return a query in around half a second. On EC2 queries took closer to 5 seconds. A bit of investigation showed that in getting the server up and running, I had disabled short-circuit reads, which slows down Impala considerably. While Impala isn’t supposed to start without short-circuit reads, it only throws an error if you have short-circuit reads enabled but misconfigured. If short-circuits are off in the hdfs-site configuration, it will happily start and run very slowly. With the default DEB install, the libhadoop library isn’t installed to the LD_PATH on Ubuntu, which prevents short-circuit read from working. The easiest solution was to create symlinks for libhadoop to /usr/lib/, then run ldconfig:
ln -s /usr/lib/hadoop/lib/native/libhadoop.so /usr/lib/
ln -s /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0 /usr/lib/
ldconfig
To confirm whether your cluster has short-circuit reads enabled, you can visit the Impala web interface (by default, port 25000 on any system running impalad) and click on the ‘/varz’ tab. Search for ‘dfs.client.read.shortcircuit’ – it should be set to ‘true’.
PartitioningWith libhadoop installed and short-circuit reads enabled, the next greatest performance improvement came from partitioning the table on the sensor id. Since all of our web interface queries filter by sensor id, Impala can perform some serious partition elimination: looking at the query profiles, partitioning the table reduced the amount of data read from HDFS from 4GB to 50MB, and the query time from 2.6s to 130ms. The README on Github has instructions on how to use dynamic partitioning in Hive to quickly partition the soccer data; these steps can be generalized to any dataset.
I’m Back !! Oracle Index Internals Seminar Scheduled in Europe For June.
COLLABORATE 13 à la Pythian …
As per many previous IOUG/OAUG/Quest shows, Pythian is will be in Denver next week! It was a sunny day in the fall of 1991 when I gave my first paper at International Oracle User Week (IOUW), a pre-cursor to COLLABORATE and a few earlier incarnations called IOUG-Live and IOUG-Alive! It has been a whirlwind of papers and sessions since then and I for one am still speaking mainly about the good old database. It is the heart of my skill set and always will be part of it. Outside of our participation at the speaker level, Pythian is also in the exhibit hall in booth 606 with prime real estate close to the main entrance. Just hang a left and we are the 5th or 6th booth on your right. Pythian rock stars will be at the booth off and on all week. You never know who you will meet there as the social network following and interacting with Pythian is vast and deep.
Speaking of vast … we look forward to your attendance at one of our sessions as per the details next:
Maris Elsins (@MarisELsins) & Yury Velikanov (@yvelik) speak on Oracle Cluster 11GR2 with Oracle Applications on Sun. Apr. 7 from 1:15 pm – 2:15 pm as well as Running E-Business Suite Database on Oracle Database Appliance on Mon. Apr. 8 from 9:45 am – 10:45 am.
Alex Gorbachev (@alexgorbachev) presents Practical Hadoop by Example on Mon. Apr. 8 from 1:15 pm – 2:15 pm in addition to Under The Hood of Oracle Pluggable Databases on Mon. Apr. 8 from 2:30 pm – 3:30 pm.
A popular recent arrival at Pythian René Antunez (@AntunezRene) brings you How to survive a disaster with RMAN on Mon. Apr. 8 from 1:15 pm – 2:15 pm.
The “Big Data Lady”, a.k.a. Gwen Shapira (@gwenshap) cannot be missed at RAC Experts Panel on Mon. Apr. 8 from 3:45 pm – 4:45 pm. Her prowess also brings you Engineered Systems Curriculum: Databases in a solid state world: how Exadata X3 and other database systems leverage the performance of flash on Mon. Apr. 8 from 5:00 pm – 6:00 pm. You will not be disappointed at Gwen’s Building the Integrated Data Warehouse With Oracle Database and Hadoop on
Tue. Apr. 9 from 10:45 am – 11:45 am. Catch Gwen as well at Big Data SIG Meeting on Wed. Apr. 10 from 3:00 pm – 4:00 pm and last but not least, Queues, Pools and Caches – The Right Way to Scale OLTP on Thu. Apr. 11 from 11:00 am – 12:00 pm.
The dynamic duo of Paul Vallée (@paulvallee) and Alex Gorbachev (@alexgorbachev) can be seen at IT’s own Last Mile Problem: Privileged Access Surveillance on Mon. Apr. 8 from 5:00 pm – 6:00 pm.
Our very own, well-known and loved in the Twitter world Yury Velikanov (@yvelik) delivers You most probably don’t need RMAN catalog database on Wed. Apr. 10 from 9:30 am – 10:30 am.
Christo Kutrovsky (@kutrovsky) discusses Delivering Big Data with Exadata on Wed. Apr. 10 from 9:30 am – 10:30 am .
One of our many Apps specialist Vasu Balla (@vasuballa) presents Anatomy of Autoconfig in Oracle E-Business Suite on Wed. April 10 from 11:00 am – 12:00 pm as well as Staged Patching Approach in Oracle E-Business Suite on Thu. Apr. 11 from 12:15 pm – 1:15 pm
Yours truly(@michaelabbeyCAN) discusses Pluggable Databases – An Intro on Mon. Apr. 8 from 11:00 am – 12:00 pm as well RMAN: New Features on Wed. Apr. 10 from 8:15 am – 9:15 am.
The list is endless but NOT the talent. Pythian employs well-known, well-respected and very talented people and, oh ya just in case you are wondering, we are hiring :). See you on Denver, the 1.6 kilometre high city as we call it in Canada. If you are thinking about getting involved in a user group you will find representatives all over the show in Denver and please speak to Alex Gorbachev (serving IOUG board member) or yours truly who was on and off the board between 1993 and 2007.
Vanessa Simmons, Pythian’s Director of Business Development will be at the show and would love to meet and speak with any/all interested. Please get her as @pythiansimmons on Twitter or simmons@pythian.com.
Meet Pythian’s Oracle Apps Experts at Collaborate 13!
There’s a rare event happening next week: All three of Pythian’s Oracle Apps tech leads will be in the same place. Vasu Balla, Maris Elsins, and recently-minted Oracle Ace Director Yury Velikanov will all be at Collab 13 this year. Since we’re a globally distributed, round-the-clock team, this is not something that can happen very often; in fact, Yury and Vasu will be meeting face-to-face for the first time.
This isn’t a team-building exercise, though. Our 3 biggest Apps brains have 5 presentations for the community this year, from large-scale systems (EBS on 11gR2 clusters and ODA) to nitty-gritty details (Autoconfig and Apps patching), to essential fundamentals (RMAN).
The presentationsI’ve had an opportunity to review several of the presentations by the Pythian EBS teams, so I can assure you that you’re in for a treat.
Maris and Yury are presenting on complex configurations on the database side of Oracle E-Business Suite:
- Session #11900 – Oracle Cluster 11GR2 with Oracle Applications (Sunday, April 7, 1:15 pm – 2:15 pm ): Everything you need to know about setting up EBS to work properly with SCAN in an 11gR2 database cluster, and some of the challenges of managing concurrent processing in that environment
- Session #11792 – Running E-Business Suite Database on Oracle Database Appliance (Monday, April 8 9:45 am – 10:45 am): Engineered systems like ODA make a lot of things super-simple, but if you’re an Apps DBA, you’re probably used to the idea that EBS always needs a little “something special.” Get all of those “special” details in this talk!
Yury also has a presentation for our Core DBA compatriots, and it’s good stuff for Apps DBAs as well:
- Session #362 – You most probably don’t need RMAN catalog database (Wednesday, April 10 9:30 am – 10:30 am ): Is a catalog database critical to an effective use of RMAN? Can you get away without using one? Challenge your assumptions.
Vasu will be talking about some of the nuts and bolts of the E-Business Suite tech stack:
- Session #12112 – Anatomy of Autoconfig in Oracle E-Business Suite (Wednesday, April 10 11:00 am – 12:00 pm): Get the whole story on what Autonconfig does, and learn how to set it up properly, so you never again have to utter that dreaded phrase, “Yeah, the last Autoconfig run broke XYZ”
- Session #12125 – Staged Patching Approach in Oracle E-Business Suite (Thursday, April 11 12:15 pm – 1:15 pm): Save some of your eneergy for the last day of the conference, because Vasu has good stuff about the benefits of staged patching, and a sneak peek at what’s coming in release 12.2!
Since you’re reading this blog, you already know we have a pretty strong commitment to social media in these parts. If you miss them at their presentations, the best way to find Maris, Yury (“Mr. Social Media”), and Vasu is on Twitter: @yvelik, @MarisElsins, and @vasuballa. We’ll have several meetups arranged via Twitter throughout the conference. Also, consider stopping by Pythian’s booth (#606) to meet some of our other knowledgeable and entertaining colleagues! We look forward to the opportunity to meeting our clients, colleagues and friends (new and old). See you at the show!
BTW, this post is supposed to be about our rockstar presenters, but just a note: I’ll be there too. Find me (John) at @jpiwowar. :)
How I Put My Collaborate Agenda Together
It is less than a week before the Collaborate 2013 conference. Most of us are busy putting together our individual agendas for what is shaping up to be a very exciting week. There are so many interesting sessions to choose from! Conference organizers kindly introduced a Show Planner to make planning a bit easier, but in this blog post, I will share with you how I created my agenda for Collaborate.
Disclaimer: It is easy to miss some very interesting sessions. Sessions that I may be interested in will not meet your profile. Please use your own judgement and do not rely 100% on my approach.
Explore sessions from strong technical companiesPythian is the company I have enjoyed working for almost 4 years now; in fact, I will celebrate 42 months working at Pythian during Collaborate :). However, there is something else that I would like to talk about here. Pythian is one of very few companies that supports and encourages its employees to stand up and speak about the exciting things we deal with on a day-to-day basis. As a result, 9 Pythian employees will deliver 16 sessions at Collaborate this year. So if you, like me, are after great technical content from seasoned professionals, then add some Pythian sessions to your Collaborate 2013 agenda.
Pythian isn’t the only company that encourages their employees to share knowledge and that invest heavily in Oracle Community programs. There are many others; however, I would like to mention the 2 companies that are at the top of my list:
- Dbvist – I am glad to see that Arjen (the founder) and his crew are part of Collaborate this year. The content they deliver is very good.
- Enkitec – I personally love to listen to these speakers.
In the comments section of this blog post, please do not hesitate to share information about other good companies that contribute to the Oracle Community and are part of the Collaborate 2013.
Remember the pastAnother tip for putting together a great agenda for Collaborate 2013 is to see if speakers you’ve enjoyed listening to at previous conferences are returning this year. You can find my memories about last year’s conference under the following blog post Collaborate 12: Speakers I’d like to hear again.
Attend RACSIG & RAC Attack eventsI am glad to inform you that RAC Special Interest Group and RAC Attack are part of Collaborate 2013 again this year. Traditionally, RAC Attack has been a place where people with expert technical skills meetup. I don’t expect Collaborate 2013 to be any different. You can find information about RAC Attack sessions here.
RACSIG is organizing several sessions including RAC Experts Panel, Oracle Database Appliance Customer Panel and RAC SIG Birds of a Feather. I am part of the RACSIG board and we have put together a list of RAC-related sessions (here) to make your search efforts a bit easier.
Additional sourcesThere are several other sources I have used to build my Collaborate 2013 agenda. Some of them are listed below:
- E-Business Suite Technology Sessions at OAUG Collaborate 13 – A list of Oracle Apps ATG sessions kindly put together by Steven Chan’s team (the author of the blog post is Max Arderius).
- COLLABORATE 13 – IOUG Forum Virtual Conference – The conference organizers are running a virtual conference this year. I bet they have invested some effort to put together a good agenda for it.
- Collaborate speakers/participants on Twitter – Use Twitter to set up in-person meetings with people you’ve met via social media.
- Oracle ACEs presenting at Collaborate – Oracle ACEs are known for their contribution to Oracle Community. They tend to deliver a good content ;) {added on 2013.04.02}
As with any blog post, it is very easy to miss mentioning someone. I did my best to avoid naming specific sessions with the exception of panels and SIG meetings. My intention was to give you some hints on how to build your Collaborate agenda. Please do not hesitate to add some additional references and tips at the end of this post.
See you soon in Denver folks,
Yury
How to Get the Most Out of Collaborate! Talking from Experience
First, the most important advice I can give you is to relax and have FUN!
Personal ExperienceUntil recently, most of the conferences I participated in were very stressful for me personally. My very first big conference was back in 2000 – the Oracle User Group conference in Copenhagen. Since then, I have participated in 1 or 2 conferences per year. I always do a lot of prep work to ensure I get the most value out of the conference. I try my best to participate in as many sessions as possible and get very upset if I miss a session for any reason. It’s only during this last year or so, after participating in some of the best conferences of my life, that I now understand how much I was missing before. Getting as stressed as I did prevented me from learning as much as I could.
My Advice to YouDuring the last few conferences, I have learned more from sessions and networking than ever before. Here’s what I learned.
Don’t worry if you deviate from your original agendaDon’t get me wrong, it is a great idea to plan which sessions you want to attend before you get to the conference; however, don’t spend too much time planning and don’t worry if you don’t attend every session in your agenda. Yes, you are there to learn. Yes, you have paid a significant amount of money to be there, but you will learn a lot more if you aren’t too stressed about keeping to your original schedule. My best advice is to build a good agenda based on your interests and do your best to follow it. Don’t worry if you miss one or two sessions. Enjoy networking, and you will learn more during the next session. If you feel tired, take a break, relax, and ask yourself “What would I like to do next?” and then do it, even if it isn’t in your original agenda. Explore the exhibition, talk to someone, or maybe take a break from the conference entirely and get outside for 30 minutes or so.
As the saying goes, the only constant is change, so don’t worry about sticking to your agenda too strictly. You will get more value out of the conference if you network with people, listen to their advice, take a break, and then adjust your schedule.
Mix sessions with expert panels and SIG meetingsWhile you can learn a lot by attending sessions focused on a particular topic with one speaker or one presentation, you may get tired faster. I suggest mixing sessions with community discussions. Most conferences offer many panels and special interest group (SIG) meetings. I am sure you will find some that suit your needs and interests. Attend a few and get involved in the discussions. Build new connections and talk with people from your interest group. Ask them which sessions they are planning to attend. Chances are they can recommend a very good speaker or session you may not have considered when building your agenda. Tag along and try something you hadn’t planned on.
Use Twitter for conference communicationsMany people think Twitter is a waste of time and sometimes it is, but not when it comes to staying up-to-date at conferences. For me, Twitter is the best communication media and way to stay in touch at any conference. It can help you to do many things, including:
- Receive the latest news related to the conference (an upcoming event, a good giveaway, an appreciation event, a good person to meet, etc)
- Ask conference-related questions (pose your question using the conference’s hashtag and someone will answer you)
- Discuss ongoing sessions in real time and share information you just learned with other conference participants
- Meet with other professionals from your industry
- Use Twitter to find people (many use Twitter instead of a phone or any other communication media) who may want to share a beer with a stranger or know of a good party :)
Below are a few suggestions for getting started on Twitter once you’ve created an account:
- Start by searching for tweets using the conference’s hashtag(s). For Collaborate 2013, the main hashtag is #C13DEN
- If the main conference hashtag gets too noisy, start following a few folks who are tweeting about topics you are interested in
- Often (and Collaborate isn’t an exception) conferences have sub-hashtags that may be less noisy and more focused on the areas you are interested in. For example, #C13TCH is the hashtag for technical context at Collaborate 2013
- Build your private “Twitter list” of people you would like to follow during the conference. You can add and remove people if those get too noisy.
- It is easy to use any Twitter client and switch between lists, searches, and streams depending on your preferences. Try it and you will find the best for you in no time.
PS: Don’t forget to fully charge your mobile’s battery each evening (or even buy an external battery for recharging as you go)
Network with peopleTwitter and some other communication media are great tools for getting information, but don’t forget that a conference is a place to meet PEOPLE. Do not hesitate to say “Hi” to the person standing next to you in a lunch queue or sitting beside you waiting for a session to start. Part of the challenge is finding the right people to talk to. You rarely talk about databases with your wife or husband, do you? But chances are the person sitting next to you has similar interests. They are there for exactly the same reasons you are. It’s a great opportunity to learn, listen, and share your experiences. Everybody wins.
Have funLearning and new opportunities are two of the main reasons for attending conferences, but another reason is to take a break from day-to-day life and have fun. As they say, a change is as good as a rest. Think about it! You get the chance to get away from work, routine, neighbours, commuting, etc. Use the change! Have fun, smile more, get a bit crazy. Do things that you may not get to do that often. Try new things. You don’t see these people on a daily basis. Leave them with the impression that you are a fun and friendly person!
Have fun, my friend, and see you very soon in Denver :)
Yury
You can follow me on Twitter: @yvelik
You can follow many other Collaborate conference participants here
Feel free to share your hints and give me your feedback using Twitter or the comment section below :)
Oracle 12c Gives Fresh Life to the Relational Database Movement
Using Ansible to Secure Cloudera Manager Installation on a Hadoop Cluster
Building a secure Hadoop cluster requires protecting a number of services which comprise Hadoop infrastructure. If you are using CDH distribution, then Cloudera Manager (CM) is one of the components that needs to be secured. CM consists of Server, Agents running on all cluster machines and web UI. CM provides you with 3 levels of securing (good/better/best) it’s communication:
- Transport Layer Security (TLS) Encryption for Cloudera Manager
- TLS Authentication of Server to Agent and Users
- TLS Authentication of Agents to Server
There is a good step by step guide in CM documentation, and it’s easy to follow for one server, but what when you have hundreds of them? There are different approaches to the problem of managing server’s configuration at scale, but I’d like to focus on Ansible which is a neat framework for parallel commands execution and complex rollouts. And it’s written in Python! Ansible is easy to install and requires only a couple of Python libraries on a “master” node and nothing else that python2.6 on “slaves”. You don’t have to setup ssh-keys and configures passwordless access across all machines, which is important if we are talking about security. To give you and idea of how Ansible command looks like, here is how you can restart CM Agents on all Hadoop data nodes:
ansible hadoop_data_nodes -m service\ -a "name=cloudera-scm-agent state=restarted" -k --sudo
This command will read a list of hosts from your /etc/ansible/hosts file or any other file specified by ANSIBLE_HOSTS environment variable. It will find a section [hadoop_data_nodes] in the hosts file (which has ini-like structure) it and will execute given command for all servers in this section. Ansible will ask you for both ssh and sudo passwords only once and will use them to execute command on target servers. Ansible relies on modules (specified by -m option) to perform specific tasks like restarting services, executing shell commands or manipulating text files. Full list of modules can be found here. You can write your own, of course, if you need.
Below is the step by step guide on configuring all 3 levels of TLS for CM. In some aspects it repeats steps from CM documentation guide, but there are some important nuances that I have discovered. This guide is also scalable: you can apply it with very little modifications to clusters of any size.
Level I. TLS Encryption for Cloudera ManagerFollowing steps assume that you have installed Ansible on the same server as your CM Server and properly configured it to access all nodes in your cluster.
- Create a directory for CM keystore:
mkdir /etc/cloudera-scm-server/keystore
- Generate a certificate for CM. You will be prompted for new Keystore password. Also make sure CN field that you will be asked to specify matches CM Server hostname. It’s not *your* name!
keytool -validity 1095 -keystore\ /etc/cloudera-scm-server/keystore/scm-keystore -alias jetty\ -genkeypair -keyalg RSA
note -validity option, new certificate will be valid for 3 years in this case
- Restrict permissions to the Keystore:
chown -R cloudera-scm:cloudera-scm /etc/cloudera-scm-server/keystore/ chmod o-r /etc/cloudera-scm-server/keystore/scm-keystore
- Enable TLS Encryptions for Agents, provide path and password to the Keystore in CM web UI. You can refer to CM documentation on how to do this.
- Next, we need to update CM Agent configuration files to set use_tls=1 option. For this Ansible lineinfile module can be used:
ansible hadoop -m lineinfile -a\ "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='use_tls.*'\ line='use_tls=1'" -k -KThere are several assumptions made here. First of all, your Ansible host list has a [hadoop] section in it, which covers all Hadoop cluster nodes, or it hase sections like [hadoop_namenodes], [hadoop_datanodes], etc. Ansible can recognise patterns. Second, your template for CM Agnet config.ini files has all security related options in it, but they are commented out. Maintaining a template for all configuration files is a good idea and Ansible can help you here as well, but it’s beyond the scope of this guide.
- Restart CM Server (again, I assume that you run Ansible commands from the same machine):
sudo /etc/init.d/cloudera-scm-agent restart
- Restart CM Agents and check their health:
ansible hadoop -m service -a\ 'name=cloudera-scm-agent state=restarted' -k -K; ansible hadoop -m shell -a\ "tail /var/log/cloudera-scm-agent/cloudera-scm-agent.log" -k -K
You have already seen an example with service Ansible module. New module used here is shell. It allows you to run arbitrary shell commands. It’s a good idea to verify that all agents started fine, so you can check the log files on all servers.
If all is fine, at this point you have encrypted communication between CM Server and CM Agents.
Level II. TLS Authentication of Server to Agent and UsersAt this level you will force CM Agents to check a certificate for CM Server to make sure they are talking to the right machine.
- Export CM Server Certificate from Keystore and converting it to .der format:
keytool -exportcert -keystore\ /etc/cloudera-scm-server/keystore/scm-keystore -alias jetty\ -file scm-server.der openssl x509 -out scm-server.pem -in scm-server.der -inform der
- Create new cert dirs on all agent servers:
ansible hadoop -m shell -a "mkdir /etc/cloudera-scm-agent/cert" -k -K
- Copy server cert to agents:
ansible hadoop -m copy -a "src=/tmp/scm-server.pem dest=/etc/cloudera-scm-agent/cert/ owner=root group=root" -k -K - Change very_cert_file option in agent’s config.ini:
ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='verify_cert_file' line='verify_cert_file=/etc/cloudera-scm-agent/cert/scm-server.pem'" -k -K
- Enabled Use TLS Encryption for Admin Console in CM Web UI
- Restart CM server and Agents. See Steps 6-7 in Level I section.
This is similar to previous level, but requires certificates to be generated for all agents, so they can authenticate to CM Server. Here I describe an approach with self-signed certificates which is fine for development or POC clusters. For production clusters you may need to comply with your organisation’s standards and obtain properly signed certificates.
- Generated password for Agent keys and copied it to all agents machines:
ansible hadoop -m shell -a\ 'echo PASSWORD > /etc/cloudera-scm-agent/cert/agent_cert.pwd' -k -K ansible hadoop -m shell -a\ 'chmod o-rx /etc/cloudera-scm-agent/cert/agent_cert.pwd' -k -K
- Next step is to generate private key and certificate for each CM Agent. To automate this task I wrote a quick script — https://github.com/dazbur/morecerts. It takes list of hosts in a plain text file and produces private key and certificate for each host. Resulting files will have agent_HOSTNAME.key and agent_HOSTNAME.pem format. It also takes care of providing Distinguished Name options for keys-generating commands:
- Create a text file with list of Agent’s IP adress. You can just copy it from Ansible hosts lists
- Create agent key password and put it into a text file
- Generate keys:
./morecerts.py -f agentiplist.file -p agent.pass.file gencerts
- Add keys to CM Server keystore:
sudo ./morecerts.py -f agentiplist.file -p agent.pass.file -k\ /etc/cloudera-scm-server/keystore/scm-keystore\ -w KEYSTORE_PASSOWORD addtokeystore
- Copy keys and certs to agent machines (assuming you have generated keys in your home directory):
ansible hadoop -m copy -a "src=~/agent_$inventory_hostname.key dest=/etc/cloudera-scm-agent/cert/agent.key owner=root group=root" -k -K ansible hadoop -m copy -a "src=~/agent_$inventory_hostname.pem dest=/etc/cloudera-scm-agent/cert/agent.pem owner=root group=root" -k -K
Here you can see another nice trick Ansible can do: $inventory_hostname variable corresponds to the server on which command will be executed. This allows me to copy specific files to specific servers.
- Update agent config.ini files to set client_key_file, client_keypw_file and client_cert_file options:
ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='client_key_file.*' line='client_key_file=/etc/cloudera-scm-agent/cert/agent.key'" -k -K ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='client_keypw_file.*' line='client_keypw_file=/etc/cloudera-scm-agent/cert/agent_cert.pwd'" -k -K ansible hadoop -m lineinfile -a "dest=/etc/cloudera-scm-agent/config.ini state=present regexp='client_cert_file.*' line='client_cert_file=/etc/cloudera-scm-agent/cert/agent.pem'" -k -K
- Update CM configurtion via Web UI to set Use TLS Authentication of Agents to Server, Path to Truststore and Truststore Password. Truststore in our case is the same as Keystore.
- Restart CM Server and Agents
You are done now! Ansible can also wrap individual playbooks, so you can execute them on demand later. In general, I found Ansible to be a great tool to manage and execute commands on many servers and it is definitely worth exploring.
Yet another Exadata slides update
Link to the latest version of my Exadata talk slides: zip
Here are the details of my presentation time and place if you are going to Collaborate 13 in Denver:
Session title: Exadata Distinctives
Room: Mile High Ballroom 1C
Date and Time: 1:15PM-2:15PM
I look forward to seeing you there!
- Bobby
PS. Minor revision today 04/04/2013. Got to practice with a few coworkers listening. Feel like this is it for the slides.


