
Feed aggregator

Exploring DevOps with Chef and WebLogic Server

Steve Button - Wed, 2014-12-10 20:58
I'm about to embark on a journey that explores the use of WebLogic Server within a DevOps regime.  My first port of call for this journey will be using Chef.

A loose travel itinerary is:
  • Setting up an environment to explore the basic operations of Chef - using the Chef Development Kit (ChefDK)
  • Exploring the basics of how Chef works to install Java and WebLogic Server on a single node
  • Installing and examining some of the existing cookbooks that are available for Java and WebLogic Server
  • Extending the environment to provision multiple nodes to create a typical multiple machine clustered WebLogic Server environment
I've started working on the first task, where I've also explored using Docker to create an isolated, reusable and easily shareable environment that contains the ChefDK.

The Docker project is here on GitHub:
I also tried a quick experiment with using Oracle Linux as the base docker image:
The Dockerfile contains the set of instructions required to install the ChefDK and the necessary utilities into the docker image when it is built.

#
# Dockerfile for Chef 4 WLS Environment
#

FROM ubuntu

MAINTAINER Steve Button <>

ENV DEBIAN_FRONTEND noninteractive

# Install Utilities
RUN apt-get update
RUN apt-get install -yq wget
RUN apt-get install -yq curl
RUN apt-get install -yq git

# Install Chef
RUN wget https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/12.04/x86_64/chefdk_0.3.5-1_amd64.deb
RUN dpkg -i chefdk*.deb

# Verify and Setup Chef
RUN chef verify
RUN echo 'eval "$(chef shell-init bash)"' >> ~/.bashrc

...

CMD ["/bin/bash"]

With this Dockerfile, a build operation can be performed to produce a docker image, which can then be run to provide an environment in which to start exploring Chef.

$ docker build -t buttso/chef4wls .

$ docker run -ti buttso/chef4wls

oracle@5481a3330f27:~$ which chef-client
/opt/chefdk/embedded/bin/chef-client
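
As a taste of the next step, here is a minimal Chef recipe sketch (purely hypothetical: the package name, paths and cookbook name are not from the actual project) that installs a JDK package and prepares a staging directory for the WebLogic installer:

# cookbooks/wls_base/recipes/default.rb -- hypothetical sketch only
# Install a JDK from the distribution's package repository
package 'openjdk-7-jdk' do
  action :install
end

# Create a staging directory for the WebLogic installer
# (assumes an 'oracle' user already exists on the node)
directory '/opt/installers' do
  owner     'oracle'
  group     'oracle'
  mode      '0755'
  recursive true
  action    :create
end

Something like chef-client --local-mode --override-runlist 'recipe[wls_base]' could then be used to converge a recipe of this shape inside the container.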

This is just a brief outline - I will describe this first task in more detail once I get a bit further along and can verify everything has been installed and works correctly.

UKOUG 2014 - Middleware Day 3

Yann Neuhaus - Wed, 2014-12-10 15:05

Today there were no more sessions on middleware, but a lot of ADF, APEX and development sessions that look really interesting! Unfortunately I can't attend each one, but here are some I selected!

The picture that paints a thousand words: Data Visualization (ADF) Duncan Mills (Oracle)

In this session Duncan was talking about my favorite topic: data visualization. I am quite used to ADF, as I developed an interface for my Capacity Planning project. It was sometimes hard to get something clean and well built, and it was a little bit confusing…

But that time has ended! With the new Alta interface!

Instead of building yet another stack on top of hundreds of stacks, Oracle preferred to re-imagine and rebuild the whole interface. What a good idea. Then Alta was born: clearer, more precise and more "current".

It has been rewritten with a mobile-first approach, as that is the trend. But a new design is not sufficient; it also includes more interactions and animations. The UI is designed to be more responsive, clean and simplified. One goal is to build the UI around progressive disclosure: when you click something in a chart, the rest disappears and you drill into a more detailed view of what you clicked. In this way the UI leads the user through the flow of the information. The user "searches" for the info along a path which builds the context of that information.

There are now a lot of slides and effects to reinforce this flow. For example, there is a new chart built as a multi-layered circle: when you click on a part of the circle it drills into and opens that part, then builds another whole circle to create a new level of information.

There are more than 50 graph and chart types, all with the same kind of interactions so the user gets consistency between charts: zooming, zooming out, selection, a redesigned time axis, improved data labeling and so on.

One of the new charts is called NBox. It's a chart used for categorizing things or people. You have boxes, and you put people into them according to a filter such as the company sector. You can drag and drop easily; it's really handy for classifying things.

ADF is getting really beautiful thanks to this new UI design and these interactions. Enabling more of a mobile-app look and feel is a good thing, as more and more processes in the industry can be monitored directly from smartphones.

SQL injection in APEX - More Attacks (& Defences) Tim Austwick and Nathan Catlow (Recx)

This session was more about security and SQL injection than about APEX itself, but they did some demos showing how APEX can ease the exploitation of SQL injection.

Tim started by introducing his company, Recx, which built a tool called ApexSec that can point out SQL injection spots in an APEX application; they said it can find about 90% of breaches. Thanks to their tool they have heavily improved the security of Oracle Application Express, an Oracle tool that helps build applications quite fast.

For Tim and Nathan, a SQL injection risk can be spotted when you don't execute the SQL directly but first generate the SQL and then send it to the Oracle database. In fact there are several kinds of breaches, such as dynamic SQL or query substitutions; these are the most commonly found. It happens most of the time when the writing and the execution of the SQL are separated.

With SQL injections you can gather more data by adding columns or table joins; this is a "select" injection. You can also corrupt data; this is called an "update" injection. Then you have more serious issues when you can insert code between BEGIN and END in a PL/SQL function, as you can call other functions or procedures, including Oracle-supplied procedures, which can be harmful for your application or database.

For example, Tim did a live demo of how he could inject a call to the LDAP package and point it at his own LDAP server on his machine. The result? He managed to get credentials directly in his LDAP console! That could be a serious issue, couldn't it?

Vulnerabilities can appear if you append a variable while building your query (for instance, when you put it at the end), or when you create a function that returns a SQL command. You had better use bind variables instead of substitutions.
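
To make that advice concrete, here is a hedged sketch (the table, column and procedure names are invented for illustration) contrasting the vulnerable string-concatenation style with the bind-variable style in PL/SQL:

-- Vulnerable: the user-supplied value becomes part of the statement text,
-- so a crafted input can change the structure of the query.
create or replace procedure get_orders_bad (p_customer in varchar2) as
  l_cur sys_refcursor;
begin
  open l_cur for
    'select order_id from orders where customer_name = ''' || p_customer || '''';
end;
/

-- Safer: the value is supplied as a bind variable and can only ever be data.
create or replace procedure get_orders_good (p_customer in varchar2) as
  l_cur sys_refcursor;
begin
  open l_cur for
    'select order_id from orders where customer_name = :cust'
    using p_customer;
end;
/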

Tim also presented another demo where, through a short JavaScript script, he managed to push a hidden button that submitted a text area. Thanks to a breach in the SQL behind the text area he could get the database version. With a simple search he found a known issue in that version. He then wrote a PL/SQL script and injected it through the text area. He managed to grant the Java permission to the schema, then wrote a file containing a script directly on the file system. The file could be executed as sysdba since it was on the local system. The script gave the schema DBA privileges, and then he could get the passwords.

It was an amazing session.

Their advice to avoid SQL injection is the following:

- Avoid substitutions directly in SQL and prefer bind variables
- Ensure that APEX objects are protected
- Use APEX 4.2.1 or above
- Avoid dynamic SQL

Conclusion

This is the last day of UKOUG 2014, and we clearly see that the middleware trends are the Cloud, mobile applications and of course security. I can't wait to see more sessions at the next UKOUG!

The Increasing Role of ERP in Customer Experience and B2B E-Commerce

Linda Fishman Hoyle - Wed, 2014-12-10 11:23

A Guest Post by Justin King, B2B E-Commerce Strategist (pictured left)

When most people in the e-commerce industry discuss the digital B2B buyer, they often talk about the consumerization of B2B. Consumerization of B2B means that your buyers bring expectations to your site because they have been conditioned in their home buying experiences. That conditioning creates expectations that you need to meet in the online customer experience.

Consumerization is a big deal. But maybe not the biggest.

In most B2B businesses, there exists tremendous complexity. There are complex ordering processes; complex products with complex attributes; and finally elaborate back-end systems with their own complexity. So, while companies want to create a “consumer-like experience in B2B”, they must do that in the context of the inherent complexity that exists.

The Increasing Role of the ERP System

Of all the backend systems, the complexity of the ERP system is one of the key differences between B2B e-commerce and B2C e-commerce. The ERP is the lifeblood of many B2B organizations. The reality is that B2B companies have to focus a tremendous amount of resources on integrating the ERP to plug into that lifeblood. Think about this:

  • Integrating to the ERP system(s) will most likely be one of the biggest initial and ongoing costs in a company's B2B e-commerce project.
  • Most B2B companies have more than one ERP system because of multiple acquisitions and the time and cost involved in consolidating into a single ERP.
  • Many of the B2B features that exist on websites today are really just about making ERP functions customer facing.

It is natural then to ask, "With all of that complexity, you see the role of the ERP system increasing?"

Yes.

ERP—The Foundation of a Great Customer Experience

In an article by Manufacturing Business Technology magazine, results of a survey describe the significant role of the ERP system in the Customer Experience.

The ERP has a significant role in streamlining interactions with suppliers and customers:

  • Nearly 40 percent of all Industrial Machinery and Equipment (IME) manufacturers surveyed believe the ERP is as vital as the platform that connects the back office and front office
  • Nearly 85 percent of large IME organizations with 5,000 employees or more indicated that their ERP is a vital platform for delivering a good customer experience
  • IME manufacturers have come to realize that providing superior customer experience has to be one of their top initiatives

To improve the Front Office (customer experience), you must consider the Back Office (ERP).

When you think about it, most B2B e-commerce sites are simply about exposing specific functions of their ERP system to customers.

Take for example the shopping cart. By providing an online shopping cart, the online buyer controls order-entry instead of an inside salesperson keying the order into the ERP. By providing online pricing based on the terms and conditions of the contract, pricing moves from the ERP system to the online buyer.

The Customer-Facing ERP

Consider these B2B best practices: providing order status, inventory availability by location, ATP (available to promise), taking payments, and even the online catalog. All of these best practices are about giving B2B buyers more transparency into the back office. Previously, product content was housed only in marketing catalogs and cryptic codes in the ERP; now there is customer-facing product content.

This transparency started very basic, but take a look at the backlog of change requests for your e-commerce site. Are there any patterns related to your ERP? We don't think that is going to stop. If the secret of B2B e-commerce is about helping B2B buyers do their job easier, then that means giving them more and more access to complete self service. That equates to more transparency into the ERP—the new Customer-Facing ERP.

Webcast: Next-Gen Enterprise Content Management in the Cloud

WebCenter Team - Wed, 2014-12-10 10:42
Key to Digital Business Transformation: Getting Content Management Right in the Cloud


Digital business is the catalyst for the next wave of revenue growth, business efficiency and service excellence. Business success depends on collaboration both inside and outside of the organization – with customers, partners, suppliers, remote employees – at anytime, anywhere, on any device.

At the forefront are organizations that are going beyond 1st generation Content Cloud solutions and adopting the Next-Gen of ECM in the Cloud to:
  • Speed decisions by quickly and easily liberating enterprise content and “mobilizing” information in the cloud
  • Deliver best-in-class customer experiences that are frictionless, innovative and secure
  • Avoid unnecessary security risks and proliferation of new information silos associated with first-gen content cloud offerings
Join our live webcast to discover the unique benefits Oracle Hybrid Enterprise Content Management provides for today's digital business.

Live Webcast: December 11, 2014, 10:00 a.m. PT / 1:00 p.m. ET. Register now.

Oracle University Expert Summit 2015 in Dubai

The Oracle Instructor - Wed, 2014-12-10 08:45

Oracle University Expert Summit Dubai 2015

Together with Craig Shallahamer, Lucas Jellema, Mark Rittman, Martin Bach, Pete Finnigan and Tim Fox, I will be presenting in Dubai. My topic is Minimizing Downtime with Rolling Upgrade using Data Guard

Click on the picture for details, please!

Hope to see you there :-)


Categories: DBA Blogs

Taskset: a useful tool to set CPU affinity

Yann Neuhaus - Wed, 2014-12-10 05:14

Today at #ukoug_tech14 I had the chance to attend Christo Kutrovsky's (@kutrovsky, Pythian) session about "Measuring Performance in Oracle Solaris & Oracle Linux". This session aimed to present how to measure system utilization on the Linux and Oracle Solaris operating systems and how to use this information for tuning and capacity planning. During this session we had a very good introduction to a bunch of performance monitoring tools that can be grouped into four categories (non-exhaustive list of tools):


1. CPU:

  • top
  • vmstat
  • time
  • mpstat

2. Memory - RAM

2.1 Global

  • /proc/meminfo
  • vmstat
  • ipcs

2.2 Per process

  • pmap -x

3. Disk

  • vmstat
  • iostat
  • iotop
  • dtrace

4. Network

  • ifconfig
  • netstat
  • nicstat
  • iftop
  • iperf
  • tcpdump

Regarding this list, I would like to point out a specific command which could be useful in the context of a performance test. This command is taskset. According to the manual page of taskset: taskset is used to set or retrieve the CPU affinity of a running process given its PID, or to launch a new COMMAND with a given CPU affinity. CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs.
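
As a quick illustration of both forms described above (the script name and PID are just examples), taskset can either launch a new command with a given affinity or change the affinity of a process that is already running:

# Launch a new command bound to CPUs 0 and 1
taskset -c 0,1 ./innodb/runallinnodb5.ksh

# Show the current affinity of a running process (PID is an example)
taskset -p 22110

# Restrict a running process to CPU 3 only
sudo taskset -pc 3 22110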

Let's run a short test with sysbench and MySQL in order to see how taskset works. This test consists of running sysbench with 4096 threads, the first time without taskset and the second time with taskset, setting CPU affinity to CPU number 3.

 

1. Sysbench test without CPU affinity

1.1 Let's start the benchmark with sysbench:

mysql@ThinkPad-T540p:/home/mysql/ [mysqld1] ./innodb/runallinnodb5.ksh

 

1.2 In order to prove that sysbench is running 4096 threads, we can execute "top + C":

22110 mysql     20   0 4747720 334964   1548 S   5,3  2,1   0:06.55 sysbench --db-driver mysql --test oltp --num-threads 4096 --mysql-user sbtest --mysql-password sbtest --mysql-db sysbench .....

 

1.3 Let's now have a look at CPU usage with the command "top + 1":

top - 11:47:38 up 42 min,  4 users,  load average: 5,97, 15,54, 16,48
Tasks: 238 total,   2 running, 236 sleeping,   0 stopped,   0 zombie
%Cpu0  : 25,5 us,  1,7 sy,  0,0 ni, 69,9 id,  3,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu1  : 15,1 us,  0,7 sy,  0,0 ni, 78,9 id,  5,4 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu2  : 21,3 us,  1,3 sy,  0,0 ni, 77,4 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  : 15,8 us,  1,0 sy,  0,0 ni, 83,2 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu4  : 18,6 us,  1,3 sy,  0,0 ni, 79,8 id,  0,3 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu5  : 15,4 us,  1,0 sy,  0,0 ni, 83,7 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu6  : 27,8 us,  1,0 sy,  0,0 ni, 71,2 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu7  : 14,8 us,  0,0 sy,  0,0 ni, 85,2 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem:  16309072 total,  4207804 used, 12101268 free,   206348 buffers
KiB Swap: 16651260 total,        0 used, 16651260 free.  1628872 cached Mem

 

1.4 As we could expect all CPUs are in use because no CPU affinity has been set:

steulet@ThinkPad-T540p:~$ sudo taskset -pc 22110

pid 21767's current affinity list: 0-7

 

2. Sysbench test with CPU affinity

2.1 Let's start by setting CPU Affinity of mysql process:

 

steulet@ThinkPad-T540p:~$ sudo taskset -pc 3 22110

pid 22110's current affinity list: 0-7
pid 22110's new affinity list: 3

 

2.2 We can now restart the benchmark:

 

mysql@ThinkPad-T540p:/home/mysql/ [mysqld1] ./innodb/runallinnodb5.ksh

 

2.3 Let's now have a look at CPU usage using the command "top + 1":

Tasks: 240 total,   2 running, 238 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1,7 us,  0,3 sy,  0,0 ni, 97,7 id,  0,0 wa,  0,0 hi,  0,3 si,  0,0 st
%Cpu1  :  0,3 us,  0,0 sy,  0,0 ni, 99,7 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu2  :  0,3 us,  0,0 sy,  0,0 ni, 99,7 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  : 98,0 us,  1,7 sy,  0,0 ni,  0,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu4  :  2,0 us,  0,3 sy,  0,0 ni, 97,3 id,  0,3 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu5  :  0,7 us,  0,0 sy,  0,0 ni, 99,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu6  :  0,3 us,  0,7 sy,  0,0 ni, 97,7 id,  1,3 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu7  :  0,7 us,  0,0 sy,  0,0 ni, 95,7 id,  3,7 wa,  0,0 hi,  0,0 si,  0,0 st

 

Processor 3 owns all the MySQL load, as expected. Thanks to taskset you can also get a better understanding of Hyper-Threaded CPU load. In order to better understand the problems related to monitoring Hyper-Threaded CPUs and CPU affinity, I suggest you read the following blog post: http://www.pythian.com/blog/virtual-cpus-with-amazon-web-services/

A few numbers from MapR

DBMS2 - Wed, 2014-12-10 00:55

MapR put out a press release aggregating some customer information; unfortunately, the release is a monument to vagueness. Let me start by saying:

  • I don’t know for sure, but I’m guessing Derrick Harris was incorrect in suspecting that this release was a reaction to my recent post about Hortonworks’ numbers. For one thing, press releases usually don’t happen that quickly.
  • And as should be obvious from the previous point — notwithstanding that MapR is a client, I had no direct involvement in this release.
  • In general, I advise clients and other vendors to put out the kind of aggregate of customer success stories found in this release. However, I would like to see more substance than MapR offered.

Anyhow, the key statement in the MapR release is:

… the number of companies that have a paid subscription for MapR now exceeds 700.

Unfortunately, that includes OEM customers as well as direct ones; I imagine MapR’s direct customer count is much lower.

In one gesture to numerical conservatism, MapR did indicate by email that it counts by overall customer organization, not by department/cluster/contract (i.e., not the way Hortonworks does).

The MapR press release also said:

As of November 2014, MapR has one or more customers in eight vertical markets that have purchased more than one million dollars of MapR software and services.  These vertical markets are advertising/media, financial services, healthcare, internet, information technology, retail, security, and telecom.

Since the word “each” isn’t in that quote, we don’t even know whether MapR is referring to individual big customers or just general sector penetration. We also don’t know whether the revenue is predominantly subscription or from some other kind of relationship.

MapR also indicated that the average customer more than doubled its annualized subscription rate vs. a year ago; the comparable figure — albeit with heavy disclaimers — from Hortonworks was 25%.

Categories: Other

File Encoding in the Next Generation Outline Extractor

Tim Tow - Tue, 2014-12-09 20:11
We had a couple of issues reported with the output of the Next Generation Outline Extractor where the exported file did not work properly as a load file. After some investigation, we found that the file encoding was incorrect. We were encoding the files using the Unicode/UTF-8 format. We chose this encoding so that we could write all characters in all languages, but we did not consider that UTF-8 is only valid for loading Unicode databases in Essbase.

To resolve this issue, we decided to add a configuration to the Next Generation Outline Extractor to allow users to select the file encoding. Here is a screenshot showing the new configuration setting.



As of yesterday, December 8, 2014, the updated Next Generation Outline Extractor is available on our website. The first version to feature this functionality is version 2.0.2.692, which is available for all Essbase versions from Essbase 9.3.1 forward. We are also happy to announce that this version of the Next Generation Outline Extractor is the first version to support the recently released Essbase 11.1.2.3.505.

If you encounter any issues with the Next Generation Outline Extractor, please don't hesitate to contact our support team at support@appliedolap.com.

Categories: BI & Warehousing

Parse Time

Jonathan Lewis - Tue, 2014-12-09 17:53

Here’s a little query I wrote some time ago to see where my time went while running a query. It’s nothing sophisticated, just one of those simple things you can do with v$active_session_history (or dba_hist_active_sess_history, if you don’t get to the crime scene in time).


set null CPU

select
        sql_exec_id, in_parse, in_hard_parse, event, count(*)
from
        v$active_session_history
where
        sql_id = '{your choice here}'
group by
        sql_exec_id, in_parse, in_hard_parse, event
order by
        sql_exec_id, in_parse, in_hard_parse, count(*)
;

SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
   40341649 N N db file sequential read                                68
   40341649 N N CPU                                                    21
   40341649 N N db file scattered read                                  9
            Y Y CPU                                                     7

I had run the query that I was tracking exactly once, but my ASH query allows for, and separates, multiple executions of the same query by aggregating on sql_exec_id (the thing that the SQL Monitor also uses). The last row looks a little odd, though: it doesn’t have a value for sql_exec_id; that’s because those are ASH samples taken while the query is being optimized, not being executed – note that I’ve reported the columns in_parse and in_hard_parse – and both are set to “Y” for that row.

So (statistically speaking) it’s probably taken about 7 CPU seconds for Oracle to optimise the statement, and from the rest of the results you can see that it’s taken about 21 CPU seconds to run, with 68 seconds spent on random I/Os and 9 seconds spent on multiblock reads, for a total of about 105 seconds elapsed.

Seven seconds sounds like quite a lot of time for parsing – but it was a fairly complex statement. However, the reason I’d been running the statement on a test system (a fairly good clone of production) was that I’d been seeing something stranger on production and I needed to get a baseline on the test system before I starting trying to fix the problem. Here’s the equivalent ASH reports for the same statement when it had run on production at a time that allowed me to capture its ASH samples.


SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
  514257929 N N CPU                                                    21
  514257929 Y Y latch: row cache objects                                1
            Y Y CPU                                                   119

Note the 119 CPU seconds spent parsing to run a 22 second query ! But that wasn’t the worst of it – sometimes the results looked more like this:


SQL_EXEC_ID I I EVENT                                            COUNT(*)
----------- - - ---------------------------------------------- ----------
  523748347 N N db file sequential read                                 1
  523748347 N N CPU                                                    32
            Y Y resmgr:cpu quantum                                     11
            Y Y latch: row cache objects                               38
            Y Y CPU                                                   415

That’s a pretty catastrophic optimsation time – especially since the statement can be optimised in seven seconds in another environment. You might note the resource manager kicking in there, the session is exceeding the CPU limit set for its resource group – though not very often given how infrequently it seems to be waiting on “resmgr:cpu quantum”. But there’s another important wrinkle to this report – which you can see when compare v$active_session_history with v$sql.


  1  select
  2     sql_id,
  3     round(cpu_time/1000000,2) cpu,
  4     round(elapsed_time/1000000,2) ela from v$sql
  5  where
  6     sql_text like '{some identifying text}'
  7* and        sql_text not like '%v$sql%'
SQL> /

SQL_ID               CPU        ELA
------------- ---------- ----------
2atyuc3vtpswy     285.81     516.13

The figures from v$sql don’t match very well with the summed results from ASH which has a total sample of 497 seconds and a CPU sample of 447 seconds. I think I can live with a statistical error of 4% ((516-497)/516) in a random sample for total time, but how do you explain the 36% error in the CPU time ?

The samples reporting “resmgr:cpu quantum” are a clue: the machine is overloaded; it’s trying to use far more CPU time than is available. As a result a process that gets pushed off the CPU by the operating system scheduler while it’s running can spend a long time in the run queue waiting to start running again. And if it’s an Oracle process that got pre-empted it doesn’t “know” that it’s not running, it didn’t put itself into a wait state so all it “knows” is that it’s not in a wait state.

So how do ASH and v$sql differ ? The code that derives the cpu_time for v$sql issues a call to the O/S asking “how much CPU have I used”. The code that takes an ASH sample says: “is this session active, if so is it in a wait state and if it’s not in a wait state then it’s either on the CPU or in the run queue waiting for the CPU”. So when we compare v$sql with ASH the difference in CPU is (statistically speaking) time spent in the run queue. So of our 447 seconds of CPU recorded by ASH, we spent 161 seconds in the CPU run queue waiting for CPU.

We still have to account for the difference between the 7 CPU seconds on a test system and the variation between 119 CPU seconds and 415 CPU seconds in optimisation on production. In a word – concurrency. Apart from everything else going on at the time there were, in the worst case, 16 slightly different versions of the same statement being called at the same time (on a machine with 24 cores) – all 16 statements were competing violently for the same resources at the same time, and the escalating conflict as more sessions joined in produced an exponential growth in time spent competing for resources rather than doing the job. (I managed to demonstrate the effect quite nicely during the evening by limiting the batch to 4 concurrent executions – and got a typical parse time of 40 CPU seconds).

I’ve often warned people about the problems of concurrency and encouraged them to think about how much time is being spent in competition rather then doing the job; I think this is the most extreme case I’ve seen in a production system. Given how dramatic the variation is, I can’t help wondering if the problem has been exaggerated by some corner case of sessions spinning for mutexes or latches; perhaps even an error in the code that allows resource management to put a session into “resmgr:cpu quantum” while it’s holding a latch or mutex. (I wasn’t able to emulate such an extreme display of the problem on a slightly newer version of Oracle, but I was able to construct a test that demonstrated the effect with a much smaller wastage of CPU.)

The solution (almost certainly): the statements are extremely similar, varying in just one predicate that is using a literal constant. It ought to be a relatively safe and simple change to make the query use a bind variable in that predicate. If the solution is adopted I’d expect to see the (once only) parse time on production drop back to about 7 seconds. Of course, if any other session tries to call the same statement at the same time it ought to end up reporting 7 seconds waiting on “cursor: pin S wait on X” before it starts executing – but that 7 seconds wait is a lot better than an extra 493 CPU seconds trying to optimise the “same” statement at the same time.
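
To illustrate the kind of change (the table and column names here are purely hypothetical, not the real statement), the rewrite is simply a matter of replacing the varying literal with a bind variable so that all the concurrent sessions share a single cursor:

-- Before: each session embeds its own literal, forcing separate hard parses
select count(*)
from   big_table
where  batch_id = 1234;

-- After: one shared statement, with the value supplied as a bind variable
variable b_batch_id number
exec :b_batch_id := 1234

select count(*)
from   big_table
where  batch_id = :b_batch_id;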

Footnote:

Running a minor variation on my ASH query to report the sql_plan_hash_value along with the sql_exec_id, I found that the “fix-up” code that updates older ASH rows with information that only becomes available later (e.g. plan_hash_value when optimising, or long time_waited values for a single wait event) only goes back 255 rows – so when I queried ASH for the statements that took 500 seconds to optimise, only 255 of the in_parse rows showed the final sql_plan_hash_value.

 


Call for Papers for the O’Reilly MySQL Conference

Pythian Group - Tue, 2014-12-09 14:35

The call for papers for the O’Reilly MySQL Conference is now open, and closes October 25th.  Submit your proposal now at http://en.oreilly.com/mysql2011/user/proposal/propose/cfp/126!

Categories: DBA Blogs

User Experience and context-driven development: Shape and ShipIt Design Jam

Usable Apps - Tue, 2014-12-09 12:13

User Experience Developer Sarahi Mireles writes:

On November 4 and 5, I had the opportunity to take part in the internal Shape and ShipIt Design Jam held at Oracle HQ. There, various members of the User Experience team got together to research and innovate on enterprise mobile solutions.

The goal of all this? To learn more about the concept of context-driven development, which results in a more natural and intuitive interaction between users and the enterprise solutions they use every day.

Participants Cindy Fong, Sarahi Mireles, Tony Orciuoli, and Thao Nguyen [photo: Karen Scipi]

We worked in teams for two days, and I must say it was a lot of fun (who says work can't be fun?). In that time we brainstormed, refined our ideas, built our own wireframes based on use cases, and finally started coding.


Participants Luis Galeana, Julian Orr, Raymond Xie, Thao Nguyen, and Anthony Lai [photo: Karen Scipi]

The result? Enterprise solutions that are easy to understand, easy to use, and relevant, giving users the information they need at the most opportune moment, which translates into a simply amazing user experience.


Team ASCII_kerz! presenting their solution to the judges (judges (seated) Jeremy Ashley and Bill Kraus; participants (standing) Cindy Fong, Sarahi Mireles, and Tony Orciuoli) [photo: Karen Scipi]

If you want to learn more about Oracle Applications User Experience, visit the Usable Apps site, and the theappslab.com blog to find out more about what Jake Kuramoto's (@jkuramot) team is doing. And of course, if you want to learn more about the Oracle MDC (Mexico Development Center), take a look at our Facebook page.

Final Videos of Open DB Camp Online:

Pythian Group - Tue, 2014-12-09 12:11

The final videos from Open DB Camp back in May in Sardinia, Italy are now online.  The full matrix of sessions, videos and slides can be found on the schedule page.

Hands on JDBC by Sandro Pinna – video

“MySQL Plugins, What are They? How you can use them to do wonders” by Sergei Golubchek of MariaDB – video

The State of Open Source Databases by Kaj Arnö of SkySQL – video

Coming soon, videos from OSCon Data!

Categories: DBA Blogs

Postgresql 9.1 – Part 1: General Features

Pythian Group - Tue, 2014-12-09 12:00
General scope

PostgreSQL 9.1 runs under the theme “features, innovation and extensibility”, and it really delivers. This version was born to overcome PostgreSQL 9.0’s limitations and known bugs in replication. If you are developing on 9.0, it’s time to think seriously about preparing your code for PostgreSQL 9.1.

The intent of this series of posts is not to be yet another release-features post. I offer a view based on my personal experience and focus on the features that I found exciting for most of the projects I’m involved in. If you want to read an excellent general article about the new features of this version, see [2].

At the time of this post, the latest PostgreSQL version is 9.1.1. It includes 11 commits to fix GiST memory leaks, VACUUM improvements, catalog fixes and others. A description of the minor release can be checked at [3].

The main features included are:

  • Synchronous Replication
  • Foreign Data Support
  • Per Column collation support
  • True SSI (Serializable Snapshot Isolation)
  • Unlogged tables for ephemeral data
  • Writable Common Table Expressions
  • K-nearest-neighbor added to GIST indexes
  • SE-Linux integration with the SECURITY LABEL command
  • Update the PL/Python server-side language
  • To come: PGXN Client for installing extensions easily from the command line. More information: http://pgxn.org/faq/  The source will be on https://github.com/pgxn/pgxn-client

Some of these features could be considered minor, but many think they are very cool while using 9.1  in their environments.

Considerations before migrating

If you are a long-time Pg user, you may already know the migration risks listed below. Still, I advise that you note and carefully learn about these risks. Many users freeze their developments on older versions simply because they don’t know how to solve new issues. The most notable case was when 8.3 stopped using implicit casts for some datatypes and many queries stopped working as a result.

There are some important changes that could affect your queries, so take a pen and note:

  • standard_conforming_strings is now turned on by default. That means that backslashes are treated as normal characters (which is the SQL-standard behavior). So, if you rely on backslash escapes in your SQL code, you must use E’’ strings. For example: E’Don\’t’
  • Function-style and attribute-style data type casts were disallowed for composite types. If you have code like value_composite.text or text(value_composite), you will need to use CAST or :: operator.
  • Domains based on arrays now have their constraints rechecked when the values are updated; previously these checks were skipped.
  • The string_to_array function now returns an empty array for a zero-length string (before it returned NULL). The same function splits the string into characters if you use a NULL separator.
  • The inclusion of the INSTEAD OF action for triggers will require you to recheck the logic of your triggers.
  • If you are a current 9.0 replication user, you may know that in 9.1 you can control the side effects of VACUUM operations during big query executions on the standby. This is a really important improvement. Basically, if you run a big query on the slave server and the master starts a VACUUM process, the slave server can request that the master postpone the cleanup of dead rows that are still being used by the query.
Brief description of main features

Don’t worry about the details, we’ll cover each feature in future posts.

  • Synchronous Replication
    • This feature enhances the durability of the data. Only one server can be synchronous with the master; the rest of the replicated servers will be asynchronous. If the current synchronous server goes down, another server will become synchronous (using the list of servers in synchronous_standby_names). Failover is not automatic, so you must use external tools to activate the standby sync server; one of the most popular is pgpool [4].
  • Foreign Data Support
    • The Foreign Data Wrappers feature has been included since 8.4, but now it is possible to reach data from any database for which a plugin exists. Included in the contribs is a module called file_fdw, which connects CSV files to a linked table. Basically it provides an interface to connect to external data. In my opinion, this is perhaps one of the most useful features of this version, especially if you have different data sources in your environment.
  • Serializable Snapshot Isolation
    • This new serialization level is the strictest. Postgres now supports READ COMMITTED, REPEATABLE READ (the old serializable) and SERIALIZABLE. It uses predicate locking to detect cases where a write would have an impact on the result. You will not need explicit locks to use this level, due to the automatic protection provided.
  • Unlogged tables
    • Postgres uses the WAL to log all data changes in order to prevent data loss and guarantee consistency in the event of a crash, but this consumes resources, and sometimes we have data that is ephemeral or can be recovered from other sources. In these cases, creating unlogged tables allows the database to have tables that are not logged in the WAL, reducing writes to disk. However, this data will not be replicated, due to the replication mechanism used by Postgres (WAL record shipping).
  • Writable Common Table Expressions
    • CTEs were included in version 8.4, but in this version they have been improved to allow writes inside the CTE (WITH clause). This could save a lot of code in your functions (a minimal sketch follows after this list).
  • K-nearest-neighbor added to GIST indexes
    • Postgres supports multiple types of indexes; one of them is GiST (Generalized Search Tree). With 9.1, we can define a ‘distance’ for datatypes and use it with a GiST index. Right now, this feature is implemented for point, the pg_trgm contrib and other btree_gist datatypes. The operator for distance is <-> . Another feature you will enjoy is that the LIKE and ILIKE operators can use the trgm index without scanning the whole table.
  • SE-Linux integration
    • Postgres is now the first database to be fully integrated with military-grade security. SECURITY LABEL applies a security label to a database object. This facility is intended to allow integration with label-based mandatory access control (MAC) systems such as SE-Linux, instead of the more traditional discretionary access control (DAC) with users and groups.
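
As an example of the writable CTE mentioned above, here is a minimal sketch (the table names are hypothetical) that archives processed rows in a single statement:

-- Move processed rows from a work queue into an archive table atomically
WITH moved AS (
    DELETE FROM work_queue
    WHERE  processed
    RETURNING *
)
INSERT INTO work_queue_archive
SELECT * FROM moved;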

References:

[1] http://www.postgresql.org/docs/9.1/static/release-9-1.html
[2] http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.1
[3] http://www.postgresql.org/docs/9.1/static/release-9-1-1.html
[4] http://pgpool.projects.postgresql.org/

Categories: DBA Blogs

Linux cluster sysadmin — Parallel command execution with PDSH

Rittman Mead Consulting - Tue, 2014-12-09 11:40

In this series of blog posts I’m taking a look at a few very useful tools that can make your life as the sysadmin of a cluster of Linux machines easier. This may be a Hadoop cluster, or just a plain simple set of ‘normal’ machines on which you want to run the same commands and monitoring.

Previously we looked at using SSH keys for intra-machine authorisation, which is a pre-requisite for what we’ll look at here — executing the same command across multiple machines using PDSH. In the next post of the series we’ll see how we can monitor OS metrics across a cluster with colmux.

PDSH is a very smart little tool that enables you to issue the same command on multiple hosts at once, and see the output. You need to have set up ssh key authentication from the client to host on all of them, so if you followed the steps in the first section of this article you’ll be good to go.

The syntax for using it is nice and simple:

  • -w specifies the addresses. You can use numerical ranges [1-4] and/or comma-separated lists of hosts. If you want to connect as a user other than the current user on the calling machine, you can specify it here (or as a separate -l argument)
  • After that is the command to run.

For example run against a small cluster of four machines that I have:

robin@RNMMBP $ pdsh -w root@rnmcluster02-node0[1-4] date

rnmcluster02-node01: Fri Nov 28 17:26:17 GMT 2014
rnmcluster02-node02: Fri Nov 28 17:26:18 GMT 2014
rnmcluster02-node03: Fri Nov 28 17:26:18 GMT 2014
rnmcluster02-node04: Fri Nov 28 17:26:18 GMT 2014

PDSH can be installed on the Mac under Homebrew (did I mention that Rittman Mead laptops are Macs, so I can do all of this straight from my work machine… :-) )

brew install pdsh

And if you want to run it on Linux from the EPEL yum repository (RHEL-compatible, but packages for other distros are available):

yum install pdsh

You can run it from a cluster node, or from your client machine (assuming your client machine is mac/linux).

Example – install and start collectl on all nodes

I started looking into pdsh when it came to setting up a cluster of machines from scratch. One of the must-have tools I like to have on any machine that I work with is the excellent collectl. This is an OS resource monitoring tool that I initially learnt of through Kevin Closson and Greg Rahn, and provides the kind of information you’d get from top etc – and then some! It can run interactively, log to disk, run as a service – and it also happens to integrate very nicely with graphite, making it a no-brainer choice for any server.

So, instead of logging into each box individually I could instead run this:

pdsh -w root@rnmcluster02-node0[1-4] yum install -y collectl
pdsh -w root@rnmcluster02-node0[1-4] service collectl start
pdsh -w root@rnmcluster02-node0[1-4] chkconfig collectl on

Yes, I know there are tools out there like puppet and chef that are designed for doing this kind of templated build of multiple servers, but the point I want to illustrate here is that pdsh enables you to do ad-hoc changes to a set of servers at once. Sure, once I have my cluster built and want to create an image/template for future builds, then it would be daft if I were building the whole lot through pdsh-distributed yum commands.

Example – setting up the date/timezone/NTPD

Often the accuracy of the clock on each server in a cluster is crucial, and we can easily do this with pdsh:

Install packages

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] yum install -y ntp ntpdate

Set the timezone:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] ln -sf /usr/share/zoneinfo/Europe/London /etc/localtime

Force a time refresh:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] ntpdate pool.ntp.org
rnmcluster02-node03: 30 Nov 20:46:22 ntpdate[27610]: step time server 176.58.109.199 offset -2.928585 sec
rnmcluster02-node02: 30 Nov 20:46:22 ntpdate[28527]: step time server 176.58.109.199 offset -2.946021 sec
rnmcluster02-node04: 30 Nov 20:46:22 ntpdate[27615]: step time server 129.250.35.250 offset -2.915713 sec
rnmcluster02-node01: 30 Nov 20:46:25 ntpdate[29316]: 178.79.160.57 rate limit response from server.
rnmcluster02-node01: 30 Nov 20:46:22 ntpdate[29316]: step time server 176.58.109.199 offset -2.925016 sec

Set NTPD to start automatically at boot:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] chkconfig ntpd on

Start NTPD:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] service ntpd start

Example – using a HEREDOC (here-document) and sending quotation marks in a command with PDSH

Here documents (heredocs) are a nice way to embed multi-line content in a single command, enabling the scripting of a file creation rather than the clumsy instruction to “open an editor and paste the following lines into it and save the file as /foo/bar”.

Fortunately heredocs work just fine with pdsh, so long as you remember to enclose the whole command in quotation marks. And speaking of which, if you need to include quotation marks in your actual command, you need to escape them with a backslash. Here’s an example of both, setting up the configuration file for my ever-favourite gnu screen on all the nodes of the cluster:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] "cat > ~/.screenrc <<EOF
hardstatus alwayslastline \"%{= RY}%H %{kG}%{G} Screen(s): %{c}%w %=%{kG}%c  %D, %M %d %Y  LD:%l\"
startup_message off
msgwait 1
defscrollback 100000
nethack on
EOF
"

Now when I login to each individual node and run screen, I get a nice toolbar at the bottom:

Combining commands

To combine commands together that you send to each host you can use the standard bash operator semicolon ;

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] "date;sleep 5;date"
rnmcluster02-node01: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node03: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node04: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node02: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node01: Sun Nov 30 20:57:11 GMT 2014
rnmcluster02-node03: Sun Nov 30 20:57:11 GMT 2014
rnmcluster02-node04: Sun Nov 30 20:57:11 GMT 2014
rnmcluster02-node02: Sun Nov 30 20:57:11 GMT 2014

Note the use of the quotation marks to enclose the entire command string. Without them the bash interpreter will take the ; as the delimiter of the local commands, and try to run the subsequent commands locally:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] date;sleep 5;date
rnmcluster02-node03: Sun Nov 30 20:57:53 GMT 2014
rnmcluster02-node04: Sun Nov 30 20:57:53 GMT 2014
rnmcluster02-node02: Sun Nov 30 20:57:53 GMT 2014
rnmcluster02-node01: Sun Nov 30 20:57:53 GMT 2014
Sun 30 Nov 2014 20:58:00 GMT

You can also use && and || to run subsequent commands conditionally if the previous one succeeds or fails respectively:

robin@RNMMBP $ pdsh -w root@rnmcluster02-node[01-4] "chkconfig collectl on && service collectl start"

rnmcluster02-node03: Starting collectl: [  OK  ]
rnmcluster02-node02: Starting collectl: [  OK  ]
rnmcluster02-node04: Starting collectl: [  OK  ]
rnmcluster02-node01: Starting collectl: [  OK  ]

Piping and file redirects

Similar to combining commands above, you can pipe the output of commands, and you need to use quotation marks to enclose the whole command string.

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] "chkconfig|grep collectl"
rnmcluster02-node03: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node01: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node04: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node02: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off

However, you can pipe the output from pdsh to a local process if you want:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] chkconfig|grep collectl
rnmcluster02-node02: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node04: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node03: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node01: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off

The difference is that you’ll be shifting the whole of the pipe across the network in order to process it locally, so if you’re just grepping etc this doesn’t make any sense. For use of utilities held locally and not on the remote server though, this might make sense.

File redirects work the same way – within quotation marks and the redirect will be to a file on the remote server, outside of them it’ll be local:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] "chkconfig>/tmp/pdsh.out"
robin@RNMMBP ~ $ ls -l /tmp/pdsh.out
ls: /tmp/pdsh.out: No such file or directory

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] chkconfig>/tmp/pdsh.out
robin@RNMMBP ~ $ ls -l /tmp/pdsh.out
-rw-r--r--  1 robin  wheel  7608 30 Nov 19:23 /tmp/pdsh.out

Cancelling PDSH operations

As you can see from above, the precise syntax of pdsh calls can be hugely important. If you run a command and it appears ‘stuck’, or if you have that heartstopping realisation that the shutdown -h now you meant to run locally you ran across the cluster, you can press Ctrl-C once to see the status of your commands:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] sleep 30
^Cpdsh@RNMMBP: interrupt (one more within 1 sec to abort)
pdsh@RNMMBP:  (^Z within 1 sec to cancel pending threads)
pdsh@RNMMBP: rnmcluster02-node01: command in progress
pdsh@RNMMBP: rnmcluster02-node02: command in progress
pdsh@RNMMBP: rnmcluster02-node03: command in progress
pdsh@RNMMBP: rnmcluster02-node04: command in progress

and press it twice (or within a second of the first) to cancel:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] sleep 30
^Cpdsh@RNMMBP: interrupt (one more within 1 sec to abort)
pdsh@RNMMBP:  (^Z within 1 sec to cancel pending threads)
pdsh@RNMMBP: rnmcluster02-node01: command in progress
pdsh@RNMMBP: rnmcluster02-node02: command in progress
pdsh@RNMMBP: rnmcluster02-node03: command in progress
pdsh@RNMMBP: rnmcluster02-node04: command in progress
^Csending SIGTERM to ssh rnmcluster02-node01
sending signal 15 to rnmcluster02-node01 [ssh] pid 26534
sending SIGTERM to ssh rnmcluster02-node02
sending signal 15 to rnmcluster02-node02 [ssh] pid 26535
sending SIGTERM to ssh rnmcluster02-node03
sending signal 15 to rnmcluster02-node03 [ssh] pid 26533
sending SIGTERM to ssh rnmcluster02-node04
sending signal 15 to rnmcluster02-node04 [ssh] pid 26532
pdsh@RNMMBP: interrupt, aborting.

If you’ve got threads yet to run on the remote hosts, but want to keep running whatever has already started, you can use Ctrl-C, Ctrl-Z:

robin@RNMMBP ~ $ pdsh -f 2 -w root@rnmcluster02-node[01-4] "sleep 5;date"
^Cpdsh@RNMMBP: interrupt (one more within 1 sec to abort)
pdsh@RNMMBP:  (^Z within 1 sec to cancel pending threads)
pdsh@RNMMBP: rnmcluster02-node01: command in progress
pdsh@RNMMBP: rnmcluster02-node02: command in progress
^Zpdsh@RNMMBP: Canceled 2 pending threads.
rnmcluster02-node01: Mon Dec  1 21:46:35 GMT 2014
rnmcluster02-node02: Mon Dec  1 21:46:35 GMT 2014

NB the above example illustrates the use of the -f argument to limit how many threads are run against remote hosts at once. We can see the command is left running on the first two nodes and returns the date, whilst the Ctrl-C – Ctrl-Z stops it from being executed on the remaining nodes.

PDSH_SSH_ARGS_APPEND

By default, when you ssh to new host for the first time you’ll be prompted to validate the remote host’s SSH key fingerprint.

The authenticity of host 'rnmcluster02-node02 (172.28.128.9)' can't be established.
RSA key fingerprint is 00:c0:75:a8:bc:30:cb:8e:b3:8e:e4:29:42:6a:27:1c.
Are you sure you want to continue connecting (yes/no)?

This is one of those prompts that the majority of us just hit enter at and ignore; if that includes you then you will want to make sure that your PDSH call doesn’t fall in a heap because you’re connecting to a bunch of new servers all at once. PDSH is not an interactive tool, so if it requires input from the hosts it’s connecting to it’ll just fail. To avoid this SSH prompt, you can set up the environment variable PDSH_SSH_ARGS_APPEND as follows:

export PDSH_SSH_ARGS_APPEND="-q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"

The -q makes failures less verbose, and the -o passes in a couple of options, StrictHostKeyChecking to disable the above check, and UserKnownHostsFile to stop SSH keeping a list of host IP/hostnames and corresponding SSH fingerprints (by pointing it at /dev/null). You’ll want this if you’re working with VMs that are sharing a pool of IPs and get re-used, otherwise you get this scary failure:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
00:c0:75:a8:bc:30:cb:8e:b3:8e:e4:29:42:6a:27:1c.
Please contact your system administrator.

For both of the above options, make sure you’re aware of the security implications that you’re opening yourself up to. For a sandbox environment I just ignore them; for anything where security is of importance, make sure you are aware of exactly which server you are connecting to by SSH, and protect yourself from MitM attacks.

PDSH Reference

You can find out more about PDSH at https://code.google.com/p/pdsh/wiki/UsingPDSH

Summary

When working with multiple Linux machines I would first and foremost make sure SSH keys are set up in order to ease management through password-less logins.

After SSH keys, I would recommend pdsh for parallel execution of the same SSH command across the cluster. It’s a big time saver particularly when initially setting up the cluster given the installation and configuration changes that are inevitably needed.

In the next article of this series we’ll see how the tool colmux is a powerful way to monitor OS metrics across a cluster.

So now your turn – what particular tools or tips do you have for working with a cluster of Linux machines? Leave your answers in the comments below, or tweet them to me at @rmoff.

Categories: BI & Warehousing

PalominoDB Percona Live: London Slides are up!

Pythian Group - Tue, 2014-12-09 11:35

Percona Live: London was a rousing success for PalominoDB.  I was sad that I could not attend, but I got a few people who sent “hellos” to me via my coworkers.  But on to the most important stuff — slides from our presentations are online!

René Cannao spoke about MySQL Backup and Recovery Tools and Techniques (description) – slides (PDF)

 

Jonathan delivered a 3-hour tutorial about Advanced MySQL Scaling Strategies for Developers (description) – slides (PDF)

Enjoy!

Categories: DBA Blogs

UKOUG 2014 - Middleware Day 2

Yann Neuhaus - Tue, 2014-12-09 11:06

Today the sessions were more “high level”, so don’t expect deep information or concrete implementations.

Roadmap to Cloud Infrastructure and Service Integration with Cloud Application Foundation and SOA Suite Frances Zhao-Perez and Simone Greib (Oracle)

Here Frances was talking about CAF (Cloud Application Foundation), which groups products like WebLogic, Coherence and so on. She introduced Oracle's list of strategic investments for this topic:

- #1 Mobile
- Fusion Middleware and applications
- Continuous availability
- Multitenancy for Density/Utilization
- Cloud management and operations

She also talked about future features in 12cR2 such as:

- Multi Datacenters support
- Coherence federated caching
- Recoverable caching
- Full Java EE7
- Java SE8
- Available in Cloud

Frances also briefly talked about the ODA and OHS roadmaps, but only from the marketing side :)

Then Simone took the lead and made a recap of key SOA features such as:

- Operation made simple (startup acceleration, tuned profiles…)
- Developer productivity (Debugger and tester, Java database instead of full big one…)
- Mobile standards (REST, Json…)
- Cloud: SAP/JDE adapters

A new feature in the cloud is MFT (Managed File Transfer) for file-based integrations.

She also reminded us how simple it is to upgrade from SOA Suite 11g to 12c, and moved on to new incoming features such as:

- Business Insight: Business Process Monitoring (Business process simplified without JDeveloper)
- Internet of Things (IoT): Events driven actions
- BAM predictive analytics & BI Integration: it can build trends using our business data. For example, it could predict the market for the next few weeks.

Mobile enabling your enterprise with Oracle SOA Suite Simone Greib and Yogesh Sontakke (Oracle)

This session was more oriented towards the mobile part of SOA. Yogesh and Simone explained that you can support SOAP and REST on mobile devices, and they demonstrated how simple it is to build the UI and the business logic behind it by exposing it as a service.

They talked about the architecture of mobile UIs and their integration with a lot of adapters for different products. They took “Kiloutou”, in France, as an example of a mobile application user, as they use an application to manage their stock, orders and services.

They also gave a live demo of how to use JSON or XML to manage events and communication between components or services.

Maximum Availability in the Cloud: Oracle WebLogic Server and Oracle Coherence Frances Zhao-Perez (Oracle)

This session was heavily oriented towards MAA (Maximum Availability Architecture), and Frances strongly underlined that Oracle is investing in maximum availability.

The goals of MAA are the following:

- Prevent business interruption
- Prevent data loss
- Deliver adequate response time
- Cost: Reduce deployment, management and support costs
- Risk: Consistently achieve required service level

Here are the High Availability requirements for Oracle:

- Outage protection
- Recovery time
- Testing frequency
- Typical data loss
- Complexity in deployments
- ROI (Return on Investment)

Frances talked about multi-datacenter MAA solutions such as stretch clusters/domains, cache safety, transaction logs (TLogs) in the database, database Global Data Services, federated caching, recoverable caching and storage replication.

She briefly introduced Oracle Site Guard, which provides recovery automation, and talked about features coming in the next versions.

12.1.3:

- No TLog option - Phase 1 (further phases will be implemented in each new release)
- XA transaction recovery across sites

12.2.1 (will be a huge update next year):

- JTA HA
- Cross site Txn recovery
- Density for GridLink deployments
- Application continuity (Optimize connection harvesting on down events)

She finished with Coherence cache recovery, which allows data to be recovered directly from the cache.

10128 trace to see partition pruning

Bobby Durrett's DBA Blog - Tue, 2014-12-09 10:57

I am working on an SR with Oracle support and they asked me to do a 10128 trace to look at how the optimizer is doing partition pruning.  I did some quick research on this trace and wanted to pass it along.

Here are the names of the two Oracle support documents that I found most helpful:

How to see Partition Pruning Occurred? (Doc ID 166118.1)

Partition Pruning Min/Max Optimization Fails when Parallel Query Run in Serial (Doc ID 1941770.1)

The first was the one Oracle support recommended.  However, the SR said to run both a level 2 and a level 7 trace, and the first document did not mention level 7.  The second document has an example of a level 7 trace and more details on how to set it up.
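
For reference, a minimal sketch of setting the event at the session level looks something like this – the connection details and query are placeholders, the level (2 or 7) is whatever the SR asks for, and the MOS notes above cover any version-specific setup:

# placeholder credentials - connect as the user that queries the partitioned table
sqlplus -s scott/tiger@mydb <<'EOF'
alter session set events '10128 trace name context forever, level 2';
-- placeholder statement - run the query you want the pruning trace for
select count(*) from sales where sale_date = date '2014-12-01';
alter session set events '10128 trace name context off';
exit
EOF

The trace output then shows up in the session's normal trace file under the diag trace directory.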

I also found these two non-Oracle sites or blog posts:

http://cbohl.blogspot.com/2006/10/verify-that-partition-pruning-works.html

http://www.juliandyke.com/Diagnostics/Events/EventReference.html#10128

I do not have time to delve into this further now but if you are trying to understand partition pruning then the 10128 trace may help you understand how it works.

– Bobby


Categories: DBA Blogs

Cedar’s new website is live – get ready for the blog!

Duncan Davies - Tue, 2014-12-09 08:00

I’m really pleased that Cedar have got our new website live – just in time for UKOUG Apps 14. As you would expect it highlights the services that Cedar provides – both Oracle Cloud (Fusion and Taleo) and obviously PeopleSoft implementation, hosting and support. It contains details of our people and locations (we’ve offices in Kings Cross, London, plus India, Switzerland and Australia). It also contains case studies of some of the project successes that we’ve had, and some of the nice things that clients have said about us.

One of the things I’m most excited about is the blog. Make sure you add it to your feed reader as we’re going to be sharing some good content there from all of the practices within our company (plus the occasional post of us doing fun things!).

The new website can be found here: http://www.cedarconsulting.co.uk/


Linux cluster sysadmin — SSH keys

Rittman Mead Consulting - Tue, 2014-12-09 05:34

In this short series of blog posts I’m going to take a look at a few very useful tools that can make your life as the sysadmin of a cluster of Linux machines easier. This may be a Hadoop cluster, or just a plain simple set of ‘normal’ machines on which you want to run the same commands and monitoring.

To start with, we’re going to use the ever-awesome ssh keys to manage security on the cluster. After that we’ll look at executing the same command across multiple machines at the same time using PDSH, and then monitoring OS metrics across a cluster with colmux.

In a nutshell, ssh keys enable us to do password-less authentication in a secure way. You can find a detailed explanation of them in a previous post that I wrote, tips and tricks for OBIEE Linux sysadmin. Beyond the obvious time-saving function of not having to enter a password each time we connect to a machine, having SSH keys in place enables the use of the tools we discuss later, pdsh and colmux.

Working with SSH keys involves taking the public key from a pair, and adding that to another machine in order to allow the owner of the pair’s private key to access that machine. What we’re going to do here is generate a unique key pair that will be used as the identity across the cluster. So each node will have a copy of the private key, in order to be able to authenticate to any other node, which will be holding a copy of the public key (as well as, in turn, the same private key).

In this example I’m going to use my own client machine to connect to the cluster. You could easily use any of the cluster nodes too if a local machine would not be appropriate.
As a side-note, this is another reason why I love the fact that the Rittman Mead standard-issue laptop is a MacBook: just under the covers of Mac OS is a *nix-based command line, meaning that a lot of sysadmin work can be done natively without needing the additional tools that you would on Windows (e.g. PuTTY, WinSCP, Pageant, etc.).

SSH key strategy

We’ve several ways we could implement the SSH keys. Because it’s a purely sandbox cluster, I could use the same SSH key pair that I generate for the cluster on my machine too, so the same public/private key pair is distributed across my machine and all of the nodes.

If we wanted a bit more security, a better approach might be to distribute my personal SSH key’s public key across the cluster too, and leave the cluster’s private key to truly identify cluster nodes alone. An additional benefit of this approach is that the client does not need to hold a copy of the cluster’s SSH private key, and instead just continues to use their own.

For completeness, the extreme version of the key strategy would be for each machine to have its own ssh key pair (i.e. its own security identity), with the corresponding public keys distributed to the other nodes in the cluster.

But anyway, here we’re using the second option – a unique keypair used across the cluster and the client’s public ssh key distributed across the cluster too.

Generating the SSH key pair

First, we need to generate the key. I’m going to create a folder to hold it first, because in a moment we’re going to push it and a couple of other files out to all the servers in the cluster and it’s easiest to do this from a single folder.

mkdir /tmp/rnmcluster02-ssh-keys

Note that in the ssh-keygen command below I’m specifying the target path for the key with the -f argument; if you don’t then watch out that you don’t accidentally overwrite your own key pair in the default path of ~/.ssh.

The -q -N "" flags instruct the key generation to use no passphrase for the key and to not prompt for it either. This is the lowest friction approach (you don’t need to unlock the ssh key with a passphrase before use) but also the least secure. If you’re setting up access to a machine where security matters then bear in mind that without a passphrase on an ssh key anyone who obtains it can therefore access any machine to which the key has been granted access (i.e. on which its public key has been deployed).

ssh-keygen -f /tmp/rnmcluster02-ssh-keys/id_rsa -q -N ""

This generates two files in the folder – the private and public (.pub) keys of the pair:

robin@RNMMBP ~ $ ls -l /tmp/rnmcluster02-ssh-keys
total 16
-rw-------  1 robin  wheel  1675 30 Nov 17:28 id_rsa
-rw-r--r--  1 robin  wheel   400 30 Nov 17:28 id_rsa.pub
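
Coming back to the passphrase trade-off above: if you would rather protect the key with a passphrase, ssh-agent will cache it for your shell session so you still avoid repeated prompts. A minimal sketch, not part of the original workflow:

eval "$(ssh-agent -s)"                      # start an agent for this shell session
ssh-add /tmp/rnmcluster02-ssh-keys/id_rsa   # prompts for the passphrase once and caches it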

Preparing the authorized_keys file

Now we’ll prepare the authorized_keys file, which is where the public SSH key of any identity permitted to access the machine is stored. Note that each user on a machine has their own authorized_keys file, in ~/.ssh/. So for example, the root user has the file in /root/.ssh/authorized_keys and any public key listed in that file will be able to connect to the server as the root user. Be aware of the American [mis-]spelling of “authorized” – spell it [correctly] as “authorised” and you’ll not get any obvious errors, but the ssh key login won’t work either.

So we’re going to copy the public key of the unique pair that we just created for the cluster into the authorized_keys file. In addition we will copy in our own personal ssh key (and any other public key that we want to give access to all the nodes in the cluster):

cp /tmp/rnmcluster02-ssh-keys/id_rsa.pub /tmp/rnmcluster02-ssh-keys/authorized_keys
# [optional] Now add any other keys (such as your own) into the authorized_keys file just created
cat ~/.ssh/id_rsa.pub >> /tmp/rnmcluster02-ssh-keys/authorized_keys
# NB make sure the previous step is a double >> not > since the double appends to the file, a single overwrites.

Distributing the SSH artefacts

Now we’re going to push this set of SSH files out to the .ssh folder of the target user on each node, which in this case is the root user. From a security point of view it’s probably better to use a non-root user for login and then sudo as required, but we’re keeping things simple (and less secure) to start with here. So the files in our folder are:

  • id_rsa – the private key of the key pair
  • id_rsa.pub – the public key of the key pair. Strictly speaking this doesn’t need distributing to all nodes, but it’s conventional and handy to hold it alongside the private key.
  • authorized_keys – this is the file that the sshd daemon on each node checks an incoming login request’s key against, and so it needs to hold the public key of anyone who is allowed to access the machine as this user.

To copy the files we’ll use scp, but how you get them in place doesn’t really matter so much, so long as they get to the right place:

scp -r /tmp/rnmcluster02-ssh-keys root@rnmcluster02-node01:~/.ssh

At this point you’ll need to enter the password for the target user, but rejoice! This is the last time you’ll need to enter it as subsequent logins will be authenticated using the ssh keys that you’re now configuring.

Run the scp for all nodes in the cluster. If you’ve four nodes in the cluster your output should look something like this:

$ scp -r /tmp/rnmcluster02-ssh-keys/ root@rnmcluster02-node01:~/.ssh
root@rnmcluster02-node01's password:
authorized_keys                                                  100%  781     0.8KB/s   00:00
id_rsa                                                           100% 1675     1.6KB/s   00:00
id_rsa.pub                                                       100%  400     0.4KB/s   00:00
$ scp -r /tmp/rnmcluster02-ssh-keys/ root@rnmcluster02-node02:~/.ssh
Warning: Permanently added the RSA host key for IP address '172.28.128.7' to the list of known hosts.
root@rnmcluster02-node02's password:
authorized_keys                                                  100%  781     0.8KB/s   00:00
id_rsa                                                           100% 1675     1.6KB/s   00:00
id_rsa.pub                                                       100%  400     0.4KB/s   00:00
$ scp -r /tmp/rnmcluster02-ssh-keys/ root@rnmcluster02-node03:~/.ssh
root@rnmcluster02-node03's password:
authorized_keys                                                  100%  781     0.8KB/s   00:00
id_rsa                                                           100% 1675     1.6KB/s   00:00
id_rsa.pub                                                       100%  400     0.4KB/s   00:00
$ scp -r /tmp/rnmcluster02-ssh-keys/ root@rnmcluster02-node04:~/.ssh
root@rnmcluster02-node04's password:
authorized_keys                                                  100%  781     0.8KB/s   00:00
id_rsa                                                           100% 1675     1.6KB/s   00:00
id_rsa.pub                                                       100%  400     0.4KB/s   00:00
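
With more than a handful of nodes a small shell loop saves the repetition – a sketch that simply follows the node-naming convention used above:

for node in rnmcluster02-node0{1..4}; do
  scp -r /tmp/rnmcluster02-ssh-keys root@$node:~/.ssh
done

You will still be prompted for each node's password on this first pass; it is only once the keys are in place that the prompts go away.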

Testing login authenticated through SSH keys

The moment of truth. From your client machine, try to ssh to each of the cluster nodes. If you are prompted for a password, then something is not right – see the troubleshooting section below.

If you put your own public key in authorized_keys when you created it then you don’t need to specify which key to use when connecting because it’ll use your own private key by default:

robin@RNMMBP ~ $ ssh root@rnmcluster02-node01
Last login: Fri Nov 28 17:13:23 2014 from 172.28.128.1



[root@localhost ~]#

There we go – logged in automagically with no password prompt. If we’re using the cluster’s private key (rather than our own) we need to specify it with -i when we connect:

robin@RNMMBP ~ $ ssh -i /tmp/rnmcluster02-ssh-keys/id_rsa root@rnmcluster02-node01
Last login: Fri Nov 28 17:13:23 2014 from 172.28.128.1



[root@localhost ~]#
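
As an aside that isn't covered in the original post, an entry in the client's ~/.ssh/config avoids having to pass -i on every connection (the host pattern and key path here just follow the examples above):

cat >> ~/.ssh/config <<'EOF'
Host rnmcluster02-node*
    User root
    IdentityFile /tmp/rnmcluster02-ssh-keys/id_rsa
EOF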

Troubleshooting SSH key connections

SSH keys are one of the best things in a sysadmin’s toolkit, but when they don’t work they can be a bit tricky to sort out. The first thing to check is that on the target machine the authorized_keys file that does all the magic (by listing the ssh keys that are permitted to connect inbound on a host to the given user) is in place:

[root@localhost .ssh]# ls -l ~/.ssh/authorized_keys
-rw-r--r-- 1 root root 775 Nov 30 18:55 /root/.ssh/authorized_keys

If you get this:

[root@localhost .ssh]# ls -l ~/.ssh/authorized_keys
ls: cannot access /root/.ssh/authorized_keys: No such file or directory

then you have a problem.

One possible issue in this specific instance could be that the above pre-canned scp assumes that the user’s .ssh folder doesn’t already exist (since it doesn’t, on brand new servers) and so specifies it as the target name for the whole rnmcluster02-ssh-keys folder. However if it does already exist then it ends up copying the rnmcluster02-ssh-keys folder into the .ssh folder:

[root@localhost .ssh]# ls -lR
.:
total 12
-rw------- 1 root root 1675 Nov 22  2013 id_rsa
-rw-r--r-- 1 root root  394 Nov 22  2013 id_rsa.pub
drwxr-xr-x 2 root root 4096 Nov 30 18:49 rnmcluster02-ssh-keys

./rnmcluster02-ssh-keys:
total 12
-rw-r--r-- 1 root root  775 Nov 30 18:49 authorized_keys
-rw------- 1 root root 1675 Nov 30 18:49 id_rsa
-rw-r--r-- 1 root root  394 Nov 30 18:49 id_rsa.pub
[root@localhost .ssh]#

To fix this simply move the authorized_keys from rnmcluster02-ssh-keys back into .ssh:

[root@localhost .ssh]# mv ~/.ssh/rnmcluster02-ssh-keys/authorized_keys ~/.ssh/

Other frequent causes of problems are file/folder permissions that are too lax on the target user’s .ssh folder (which can be fixed with chmod -R 700 ~/.ssh) or the connecting user’s ssh private key (fix: chmod 600 id_rsa). The latter will show on connection attempts very clearly:

robin@RNMMBP ~ $ ssh -i /tmp/rnmcluster02-ssh-keys/id_rsa root@rnmcluster02-node01
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0777 for '/tmp/rnmcluster02-ssh-keys/id_rsa' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: /tmp/rnmcluster02-ssh-keys/id_rsa

Another one that has bitten me twice over time – and that eludes the troubleshooting I’ll demonstrate in a moment – is that SELinux gets stroppy about root access using ssh keys. I always just take this as a handy reminder to disable selinux (in /etc/selinux/config, set SELINUX=disabled), having never had cause to leave it enabled. But, if you do need it enabled you’ll need to hit the interwebs to check the exact cause/solution for this problem.
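
If you do have to leave SELinux enabled, resetting the security contexts on the .ssh folder is usually the first thing worth trying – this is general SELinux practice rather than something verified in this post:

restorecon -R -v /root/.ssh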

So to troubleshoot ssh key problems in general do two things. Firstly from the client side, specify verbosity (-v for a bit of verbosity, -vvv for most)

ssh -v -i /tmp/rnmcluster02-ssh-keys/id_rsa root@rnmcluster02-node01

You should observe ssh trying to use the private key, and if the server rejects it it’ll fall back to any other ssh private keys it can find, and then password authentication:

[...]
debug1: Offering RSA public key: /tmp/rnmcluster02-ssh-keys/id_rsa
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Next authentication method: password

Quite often the problem will be on the server side, so assuming that you can still connect to the server (eg through the physical console, or using password authentication) then go and check /var/log/secure where you’ll see all logs relating to attempted connections. Here’s the log file corresponding to the above client log, where ssh key authentication is attempted but fails, and then password authentication is used to successfully connect:

Nov 30 18:15:05 localhost sshd[13156]: Authentication refused: bad ownership or modes for file /root/.ssh/authorized_keys
Nov 30 18:15:15 localhost sshd[13156]: Accepted password for root from 172.28.128.1 port 59305 ssh2
Nov 30 18:15:15 localhost sshd[13156]: pam_unix(sshd:session): session opened for user root by (uid=0)

Now we can see clearly what the problem is – “bad ownership or modes for file /root/.ssh/authorized_keys”.
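
It can help to watch this log from a second terminal while retrying the connection (the path is for the RHEL/CentOS-style hosts used here; Debian-family systems log to /var/log/auth.log instead):

tail -f /var/log/secure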

The last roll of the troubleshooting dice is to get sshd (the ssh daemon that runs on the host we’re trying to connect to) to issue more verbose logs. You can either set LogLevel DEBUG1 (or DEBUG2, or DEBUG3) in /etc/ssh/sshd_config and restart the ssh daemon (service sshd restart), or you can actually run a (second) ssh daemon from the host with specific logging. This would be appropriate on a multi-user server where you can’t just go changing sshd configuration. To run a second instance of sshd you’d use:

/usr/sbin/sshd -D -d -p 2222

You have to run sshd from an absolute path (you’ll get told this if you try not to). The -D flag stops it running as a daemon and instead runs interactively, so we can see easily all the output from it. -d specifies the debug logging (-dd or -ddd for greater levels of verbosity), and -p 2222 tells sshd to listen on port 2222. Since we’re doing this on top of the existing sshd, we obviously can’t use the default ssh port (22) so pick another port that is available (and not blocked by a firewall).

Now on the client retry the connection, but pointing to the port of the interactive sshd instance:

ssh -v -p 2222 -i /tmp/rnmcluster02-ssh-keys/id_rsa root@rnmcluster02-node01

When you run the command on the client you should get both the client and host machine debug output go crackers for a second, giving you plenty of diagnostics to pore through and analyse the ssh handshake etc to get to the root of the issue.

Hopefully you’ve now sorted your SSH keys, because in the next article we’re going to see how we can use them to run commands against multiple servers at once using pdsh.

Summary

When working with multiple Linux machines I would first and foremost make sure SSH keys are set up in order to ease management through password-less logins.

We’ll see in the next couple of articles some other tools that are useful when working on a cluster:

  • pdsh
  • colmux

I’m interested in what you think – what particular tools or tips do you have for working with a cluster of Linux machines? Leave your answers in the comments below, or tweet them to me at @rmoff.

Categories: BI & Warehousing

UKOUG Tech14 slides – Exadata Security Best Practices

Dan Norris - Tue, 2014-12-09 04:54

I think 2 years is long enough to wait between posts!

Today I delivered a session about Oracle Exadata Database Machine Best Practices and promised to post the slides for it (though no one asked about them :). I’ve also posted them to the Tech14 agenda.

Direct download: UKOUG Tech14 Exadata Security slides