Skip navigation.

Feed aggregator

Call for Papers for the O’Reilly MySQL Conference

Pythian Group - Tue, 2014-12-09 14:35

The call for papers for the O’Reilly MySQL Conference is now open, and closes October 25th.  Submit your proposal now at http://en.oreilly.com/mysql2011/user/proposal/propose/cfp/126!

Categories: DBA Blogs

User Experience and context-driven development: Shape and ShipIt Design Jam

Usable Apps - Tue, 2014-12-09 12:13

User Experience Developer Sarahi Mireles writes:

This past November 4th and 5th, I had the opportunity to take part in the internal Shape and ShipIt Design Jam held at Oracle HQ. There, different members of the User Experience team got together to research and innovate on enterprise mobile solutions.

The goal of all this? To learn more about the concept of context-driven development, which results in a more natural and intuitive interaction between users and the enterprise solutions they use every day.

Participants Cindy Fong, Sarahi Mireles, Tony Orciuoli, and Thao Nguyen [photo: Karen Scipi]

We worked in teams for two days, and I have to say it was a lot of fun (who says work can't be fun?). In that time we brainstormed ideas, refined them, built our own wireframes based on use cases and finally started coding.


Participants Luis Galeana, Julian Orr, Raymond Xie, Thao Nguyen, and Anthony Lai [photo: Karen Scipi]

The result? Enterprise solutions that are easy to understand, easy to use and relevant, giving users the information they need at just the right moment, which translates into a simply amazing user experience.


Team ASCII_kerz! presenting their solution to the judges (judges (seated) Jeremy Ashley and Bill Kraus; participants (standing) Cindy Fong, Sarahi Mireles, and Tony Orciuoli) [photo: Karen Scipi]

If you want to learn more about Oracle Applications User Experience, visit the Usable Apps site, and the theappslab.com blog to find out more about what Jake Kuramoto's (@jkuramot) team is doing. And of course, if you want to know more about the Oracle MDC (México Development Center), take a look at our Facebook page.

Final Videos of Open DB Camp Online:

Pythian Group - Tue, 2014-12-09 12:11

The final videos from Open DB Camp back in May in Sardinia, Italy are now online.  The full matrix of sessions, videos and slides can be found on the schedule page.

Hands on JDBC by Sandro Pinna – video

“MySQL Plugins, What are They? How you can use them to do wonders” by Sergei Golubchek of MariaDB – video

The State of Open Source Databases by Kaj Arnö of SkySQL – video

Coming soon, videos from OSCon Data!

Categories: DBA Blogs

Postgresql 9.1 – Part 1: General Features

Pythian Group - Tue, 2014-12-09 12:00
General scope

Postgresql 9.1 was released under the theme “features, innovation and extensibility”, and it lives up to it. This version was born to overcome Postgresql 9.0's limitations and known bugs in replication. If you are developing on 9.0, it's time to think seriously about preparing your code for Postgresql 9.1.

The intent of this series of posts is not to be yet another release-notes recap. I offer a view based on my personal experience and focus on the features that I found exciting for most of the projects I'm involved in. If you want to read an excellent general article about the new features of this version, see [2].

At the time of this post, the latest PostgreSQL version is 9.1.1. It includes 11 commits to fix GiST memory leaks, VACUUM improvements, catalog fixes and other issues. A description of the minor release can be checked at [3].

The main features included are:

  • Synchronous Replication
  • Foreign Data Support
  • Per Column collation support
  • True SSI (Serializable Snapshot Isolation)
  • Unlogged tables for ephemeral data
  • Writable Common Table Expressions
  • K-nearest-neighbor added to GIST indexes
  • SE-Linux integration with the SECURITY LABEL command
  • Updated PL/Python server-side language
  • To come: PGXN client for installing extensions easily from the command line. More information at http://pgxn.org/faq/ and the source will be on https://github.com/pgxn/pgxn-client

Some of these features could be considered minor, but many users find them very useful once they are running 9.1 in their environments.

Considerations before migrating

If you are a long-time Pg user, you may already know the migration risks listed below. Still, I advise you to note them and learn about them carefully. Many users freeze their developments on older versions simply because they didn't know how to solve new issues. The most notable case was when 8.3 stopped using implicit casts for some datatypes and many queries stopped working as a result.

There are some important changes that could affect your queries, so take a pen and note:

  • standard_conforming_strings is now turned on by default. That means backslashes are treated as ordinary characters (which is the SQL-standard behavior). So, if you rely on backslash escapes in your SQL code, you must use E'' strings, for example E'Don\'t' (see the short example after this list).
  • Function-style and attribute-style data type casts were disallowed for composite types. If you have code like value_composite.text or text(value_composite), you will need to use CAST or the :: operator.
  • Domains based on arrays now have their constraints rechecked when the array is updated; previously these checks were skipped.
  • The string_to_array function now returns an empty array for a zero-length string (before it returned NULL). The same function now splits the string into individual characters if you pass a NULL separator.
  • The inclusion of the INSTEAD OF action for triggers will require you to recheck the logic of your triggers.
  • If you are a current 9.0 replication user, you may know that in 9.1 you can control the side effects of VACUUM operations during the execution of big queries on a replica. This is a really important improvement. Basically, if you run a big query on the slave server and the master starts a VACUUM process, the slave can ask the master to postpone the cleanup of dead rows that are still being used by the query.
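
A minimal sketch of the string-handling changes above, as run from psql (the literals are made up for illustration):

-- standard_conforming_strings on: backslashes in ordinary strings are literal
SELECT 'C:\temp';                      -- returns C:\temp
SELECT E'Don\'t';                      -- use an E'' string when you do want backslash escapes

-- string_to_array now returns an empty array for a zero-length string...
SELECT string_to_array('', ',');       -- {}
-- ...and splits into individual characters when the separator is NULL
SELECT string_to_array('abc', NULL);   -- {a,b,c}
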
Brief description of main features

Don’t worry about the details, we’ll cover each feature in future posts.

  • Synchronous Replication
    • This feature enhances the durability of the data. Only one server can be synchronous with the master; the rest of the replicated servers will be asynchronous. If the current synchronous server goes down, another server will become synchronous (using the list of servers in synchronous_standby_names). Failover is not automatic, so you must use external tools to activate the standby sync server; one of the most popular is pgpool [4].
  • Foreign Data Support
    • Foreign Data Wrappers have been included since 8.4, but now it is possible to reach data from any database for which a plugin exists. Included in the contribs is file_fdw, which connects CSV files to a linked table. Basically it provides an interface to connect to external data. In my opinion, this is perhaps one of the most useful features of this version, especially if you have different data sources in your environment.
  • Serializable Snapshot Isolation
    • This new isolation level is the strictest. Postgres now supports READ COMMITTED, REPEATABLE READ (the old serializable) and SERIALIZABLE. It uses predicate locking to detect when a concurrent write would affect the result of a query. You will not need explicit locks to use this level, due to the automatic protection provided.
  • Unlogged tables
    • Postgres uses the WAL to log all data changes in order to prevent data loss and guarantee consistency in the event of a crash, but this consumes resources, and sometimes we have data that can be recovered from other sources or that is simply ephemeral. In these cases, creating unlogged tables lets the database skip WAL logging for those tables, reducing writes to disk. However, this data will not be replicated, due to the replication mechanism used by Postgres (WAL record shipping).
  • Writable Common Table Expressions
    • CTEs were introduced in version 8.4, but in this version they have been improved to allow writes inside the CTE (WITH clause). This could save a lot of code in your functions (see the short examples after this list).
  • K-nearest-neighbor added to GIST indexes
    • Postgres supports multiple types of indexes; one of them is GiST (Generalized Search Tree). With 9.1, we can define a 'distance' for datatypes and use it with a GiST index. Right now, this feature is implemented for the point type, the pg_trgm contrib module and some btree_gist datatypes. The distance operator is <-> . Another feature you will enjoy is that the LIKE and ILIKE operators can use the trigram index without scanning the whole table.
  • SE-Linux integration
    • Postgres is now the first database to be fully integrated with military-grade security via SE-Linux. SECURITY LABEL applies a security label to a database object. This facility is intended to allow integration with label-based mandatory access control (MAC) systems such as SE-Linux, instead of the more traditional discretionary access control (DAC) based on users and groups.
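
To make a couple of these features more concrete, here are minimal sketches (table and column names are invented for illustration):

-- Unlogged table: skips WAL, so faster writes, but not crash-safe and not replicated
CREATE UNLOGGED TABLE session_cache (id int PRIMARY KEY, payload text);

-- Writable CTE: archive and purge old rows in a single statement
WITH moved AS (
    DELETE FROM events WHERE created < now() - interval '90 days'
    RETURNING *
)
INSERT INTO events_archive SELECT * FROM moved;

-- Synchronous replication is then largely driven by one postgresql.conf setting, e.g.
-- synchronous_standby_names = 'standby1'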

References:

[1] http://www.postgresql.org/docs/9.1/static/release-9-1.html
[2] http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.1
[3] http://www.postgresql.org/docs/9.1/static/release-9-1-1.html
[4] http://pgpool.projects.postgresql.org/

Categories: DBA Blogs

Linux cluster sysadmin — Parallel command execution with PDSH

Rittman Mead Consulting - Tue, 2014-12-09 11:40

In this series of blog posts I’m taking a look at a few very useful tools that can make your life as the sysadmin of a cluster of Linux machines easier. This may be a Hadoop cluster, or just a plain simple set of ‘normal’ machines on which you want to run the same commands and monitoring.

Previously we looked at using SSH keys for intra-machine authorisation, which is a pre-requisite for what we'll look at here — executing the same command across multiple machines using PDSH. In the next post of the series we'll see how we can monitor OS metrics across a cluster with colmux.

PDSH is a very smart little tool that enables you to issue the same command on multiple hosts at once, and see the output. You need to have set up ssh key authentication from the client to all of the hosts, so if you followed the steps in the first section of this article you'll be good to go.

The syntax for using it is nice and simple:

  • -w specifies the addresses. You can use numerical ranges [1-4] and/or comma-separated lists of hosts. If you want to connect as a user other than the current user on the calling machine, you can specify it here (or as a separate -l argument)
  • After that is the command to run.

For example, running it against a small cluster of four machines that I have:

robin@RNMMBP $ pdsh -w root@rnmcluster02-node0[1-4] date

rnmcluster02-node01: Fri Nov 28 17:26:17 GMT 2014
rnmcluster02-node02: Fri Nov 28 17:26:18 GMT 2014
rnmcluster02-node03: Fri Nov 28 17:26:18 GMT 2014
rnmcluster02-node04: Fri Nov 28 17:26:18 GMT 2014
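
The same call can also be written with a comma-separated host list and the user given via a separate -l argument, if that reads more clearly to you:

robin@RNMMBP $ pdsh -l root -w rnmcluster02-node01,rnmcluster02-node02,rnmcluster02-node03,rnmcluster02-node04 date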

PDSH can be installed on the Mac under Homebrew (did I mention that Rittman Mead laptops are Macs, so I can do all of this straight from my work machine… :-) )

brew install pdsh

And if you want to run it on Linux from the EPEL yum repository (RHEL-compatible, but packages for other distros are available):

yum install pdsh

You can run it from a cluster node, or from your client machine (assuming your client machine is mac/linux).
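
If you find yourself typing the same -w list repeatedly, pdsh can also read its target hosts from a file named in the WCOLL environment variable — a small sketch (the file path is arbitrary):

robin@RNMMBP ~ $ cat > /tmp/rnmcluster02-hosts <<EOF
rnmcluster02-node01
rnmcluster02-node02
rnmcluster02-node03
rnmcluster02-node04
EOF
robin@RNMMBP ~ $ export WCOLL=/tmp/rnmcluster02-hosts
robin@RNMMBP ~ $ pdsh -l root date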

Example – install and start collectl on all nodes

I started looking into pdsh when it came to setting up a cluster of machines from scratch. One of the must-have tools I like to have on any machine that I work with is the excellent collectl. This is an OS resource monitoring tool that I initially learnt of through Kevin Closson and Greg Rahn, and provides the kind of information you’d get from top etc – and then some! It can run interactively, log to disk, run as a service – and it also happens to integrate very nicely with graphite, making it a no-brainer choice for any server.

So, instead of logging into each box individually I could instead run this:

pdsh -w root@rnmcluster02-node0[1-4] yum install -y collectl
pdsh -w root@rnmcluster02-node0[1-4] service collectl start
pdsh -w root@rnmcluster02-node0[1-4] chkconfig collectl on

Yes, I know there are tools out there like puppet and chef that are designed for doing this kind of templated build of multiple servers, but the point I want to illustrate here is that pdsh enables you to do ad-hoc changes to a set of servers at once. Sure, once I have my cluster built and want to create an image/template for future builds, then it would be daft if I were building the whole lot through pdsh-distributed yum commands.

Example – setting up the date/timezone/NTPD

Often the accuracy of the clock on each server in a cluster is crucial, and we can easily do this with pdsh:

Install packages

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] yum install -y ntp ntpdate

Set the timezone:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] ln -sf /usr/share/zoneinfo/Europe/London /etc/localtime

Force a time refresh:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] ntpdate pool.ntp.org
rnmcluster02-node03: 30 Nov 20:46:22 ntpdate[27610]: step time server 176.58.109.199 offset -2.928585 sec
rnmcluster02-node02: 30 Nov 20:46:22 ntpdate[28527]: step time server 176.58.109.199 offset -2.946021 sec
rnmcluster02-node04: 30 Nov 20:46:22 ntpdate[27615]: step time server 129.250.35.250 offset -2.915713 sec
rnmcluster02-node01: 30 Nov 20:46:25 ntpdate[29316]: 178.79.160.57 rate limit response from server.
rnmcluster02-node01: 30 Nov 20:46:22 ntpdate[29316]: step time server 176.58.109.199 offset -2.925016 sec

Set NTPD to start automatically at boot:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] chkconfig ntpd on

Start NTPD:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] service ntpd start

Example – using a HEREDOC (here-document) and sending quotation marks in a command with PDSH

Here documents (heredocs) are a nice way to embed multi-line content in a single command, enabling the scripting of a file creation rather than the clumsy instruction to “open an editor and paste the following lines into it and save the file as /foo/bar”.

Fortunately heredocs work just fine with pdsh, so long as you remember to enclose the whole command in quotation marks. And speaking of which, if you need to include quotation marks in your actual command, you need to escape them with a backslash. Here’s an example of both, setting up the configuration file for my ever-favourite gnu screen on all the nodes of the cluster:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] "cat > ~/.screenrc <<EOF
hardstatus alwayslastline \"%{= RY}%H %{kG}%{G} Screen(s): %{c}%w %=%{kG}%c  %D, %M %d %Y  LD:%l\"
startup_message off
msgwait 1
defscrollback 100000
nethack on
EOF
"

Now when I log in to each individual node and run screen, I get a nice toolbar at the bottom.

Combining commands

To combine commands together that you send to each host you can use the standard bash operator semicolon ;

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] "date;sleep 5;date"
rnmcluster02-node01: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node03: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node04: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node02: Sun Nov 30 20:57:06 GMT 2014
rnmcluster02-node01: Sun Nov 30 20:57:11 GMT 2014
rnmcluster02-node03: Sun Nov 30 20:57:11 GMT 2014
rnmcluster02-node04: Sun Nov 30 20:57:11 GMT 2014
rnmcluster02-node02: Sun Nov 30 20:57:11 GMT 2014

Note the use of the quotation marks to enclose the entire command string. Without them the bash interpreter will take the ; as the delimiter between local commands, and try to run the subsequent commands locally:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node0[1-4] date;sleep 5;date
rnmcluster02-node03: Sun Nov 30 20:57:53 GMT 2014
rnmcluster02-node04: Sun Nov 30 20:57:53 GMT 2014
rnmcluster02-node02: Sun Nov 30 20:57:53 GMT 2014
rnmcluster02-node01: Sun Nov 30 20:57:53 GMT 2014
Sun 30 Nov 2014 20:58:00 GMT

You can also use && and || to run subsequent commands conditionally if the previous one succeeds or fails respectively:

robin@RNMMBP $ pdsh -w root@rnmcluster02-node[01-4] "chkconfig collectl on && service collectl start"

rnmcluster02-node03: Starting collectl: [  OK  ]
rnmcluster02-node02: Starting collectl: [  OK  ]
rnmcluster02-node04: Starting collectl: [  OK  ]
rnmcluster02-node01: Starting collectl: [  OK  ]

Piping and file redirects

Similar to combining commands above, you can pipe the output of commands, and you need to use quotation marks to enclose the whole command string.

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] "chkconfig|grep collectl"
rnmcluster02-node03: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node01: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node04: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node02: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off

However, you can pipe the output from pdsh to a local process if you want:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] chkconfig|grep collectl
rnmcluster02-node02: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node04: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node03: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off
rnmcluster02-node01: collectl           0:off   1:off   2:on    3:on    4:on    5:on    6:off

The difference is that you’ll be shifting the whole of the pipe across the network in order to process it locally, so if you’re just grepping etc this doesn’t make any sense. For use of utilities held locally and not on the remote server though, this might make sense.
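
One local utility that is worth knowing about here is dshbak, which ships with pdsh and reformats the interleaved output into per-host blocks (-c additionally coalesces hosts whose output is identical):

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] "chkconfig|grep collectl" | dshbak -c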

File redirects work the same way – within quotation marks and the redirect will be to a file on the remote server, outside of them it’ll be local:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] "chkconfig>/tmp/pdsh.out"
robin@RNMMBP ~ $ ls -l /tmp/pdsh.out
ls: /tmp/pdsh.out: No such file or directory

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] chkconfig>/tmp/pdsh.out
robin@RNMMBP ~ $ ls -l /tmp/pdsh.out
-rw-r--r--  1 robin  wheel  7608 30 Nov 19:23 /tmp/pdsh.out

Cancelling PDSH operations

As you can see from above, the precise syntax of pdsh calls can be hugely important. If you run a command and it appears 'stuck', or if you have that heartstopping realisation that the shutdown -h now you meant to run locally has just gone out across the cluster, you can press Ctrl-C once to see the status of your commands:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] sleep 30
^Cpdsh@RNMMBP: interrupt (one more within 1 sec to abort)
pdsh@RNMMBP:  (^Z within 1 sec to cancel pending threads)
pdsh@RNMMBP: rnmcluster02-node01: command in progress
pdsh@RNMMBP: rnmcluster02-node02: command in progress
pdsh@RNMMBP: rnmcluster02-node03: command in progress
pdsh@RNMMBP: rnmcluster02-node04: command in progress

and press it twice (or within a second of the first) to cancel:

robin@RNMMBP ~ $ pdsh -w root@rnmcluster02-node[01-4] sleep 30
^Cpdsh@RNMMBP: interrupt (one more within 1 sec to abort)
pdsh@RNMMBP:  (^Z within 1 sec to cancel pending threads)
pdsh@RNMMBP: rnmcluster02-node01: command in progress
pdsh@RNMMBP: rnmcluster02-node02: command in progress
pdsh@RNMMBP: rnmcluster02-node03: command in progress
pdsh@RNMMBP: rnmcluster02-node04: command in progress
^Csending SIGTERM to ssh rnmcluster02-node01
sending signal 15 to rnmcluster02-node01 [ssh] pid 26534
sending SIGTERM to ssh rnmcluster02-node02
sending signal 15 to rnmcluster02-node02 [ssh] pid 26535
sending SIGTERM to ssh rnmcluster02-node03
sending signal 15 to rnmcluster02-node03 [ssh] pid 26533
sending SIGTERM to ssh rnmcluster02-node04
sending signal 15 to rnmcluster02-node04 [ssh] pid 26532
pdsh@RNMMBP: interrupt, aborting.

If you’ve got threads yet to run on the remote hosts, but want to keep running whatever has already started, you can use Ctrl-C, Ctrl-Z:

robin@RNMMBP ~ $ pdsh -f 2 -w root@rnmcluster02-node[01-4] "sleep 5;date"
^Cpdsh@RNMMBP: interrupt (one more within 1 sec to abort)
pdsh@RNMMBP:  (^Z within 1 sec to cancel pending threads)
pdsh@RNMMBP: rnmcluster02-node01: command in progress
pdsh@RNMMBP: rnmcluster02-node02: command in progress
^Zpdsh@RNMMBP: Canceled 2 pending threads.
rnmcluster02-node01: Mon Dec  1 21:46:35 GMT 2014
rnmcluster02-node02: Mon Dec  1 21:46:35 GMT 2014

NB the above example illustrates the use of the -f argument to limit how many threads are run against remote hosts at once. We can see the command is left running on the first two nodes and returns the date, whilst the Ctrl-C – Ctrl-Z stops it from being executed on the remaining nodes.

PDSH_SSH_ARGS_APPEND

By default, when you ssh to new host for the first time you’ll be prompted to validate the remote host’s SSH key fingerprint.

The authenticity of host 'rnmcluster02-node02 (172.28.128.9)' can't be established.
RSA key fingerprint is 00:c0:75:a8:bc:30:cb:8e:b3:8e:e4:29:42:6a:27:1c.
Are you sure you want to continue connecting (yes/no)?

This is one of those prompts that the majority of us just hit enter at and ignore; if that includes you then you will want to make sure that your PDSH call doesn’t fall in a heap because you’re connecting to a bunch of new servers all at once. PDSH is not an interactive tool, so if it requires input from the hosts it’s connecting to it’ll just fail. To avoid this SSH prompt, you can set up the environment variable PDSH_SSH_ARGS_APPEND as follows:

export PDSH_SSH_ARGS_APPEND="-q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"

The -q makes failures less verbose, and the -o passes in a couple of options, StrictHostKeyChecking to disable the above check, and UserKnownHostsFile to stop SSH keeping a list of host IP/hostnames and corresponding SSH fingerprints (by pointing it at /dev/null). You’ll want this if you’re working with VMs that are sharing a pool of IPs and get re-used, otherwise you get this scary failure:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
00:c0:75:a8:bc:30:cb:8e:b3:8e:e4:29:42:6a:27:1c.
Please contact your system administrator.

For both of these options, make sure you're aware of the security implications that you're opening yourself up to. For a sandbox environment I just ignore them; for anything where security is of importance, make sure you know exactly which server you are connecting to over SSH, and protect yourself from MitM attacks.

PDSH Reference

You can find out more about PDSH at https://code.google.com/p/pdsh/wiki/UsingPDSH

Summary

When working with multiple Linux machines I would first and foremost make sure SSH keys are set up in order to ease management through password-less logins.

After SSH keys, I would recommend pdsh for parallel execution of the same SSH command across the cluster. It’s a big time saver particularly when initially setting up the cluster given the installation and configuration changes that are inevitably needed.

In the next article of this series we’ll see how the tool colmux is a powerful way to monitor OS metrics across a cluster.

So now your turn – what particular tools or tips do you have for working with a cluster of Linux machines? Leave your answers in the comments below, or tweet them to me at @rmoff.

Categories: BI & Warehousing

PalominoDB Percona Live: London Slides are up!

Pythian Group - Tue, 2014-12-09 11:35

Percona Live: London was a rousing success for PalominoDB.  I was sad that I could not attend, but a few people sent “hellos” to me via my coworkers.  But on to the most important stuff — slides from our presentations are online!

René Cannao spoke about MySQL Backup and Recovery Tools and Techniques (description) – slides (PDF)

 

Jonathan delivered a 3-hour tutorial about Advanced MySQL Scaling Strategies for Developers (description) – slides (PDF)

Enjoy!

Categories: DBA Blogs

UKOUG 2014 - Middleware Day 2

Yann Neuhaus - Tue, 2014-12-09 11:06

Today the sessions were more “high level”, so don't expect deep information or concrete implementations.

Roadmap to Cloud Infrastructure and Service Integration with Cloud Application Foundation and SOA Suite – Frances Zhao-Perez and Simone Greib (Oracle)

Here Frances was talking about CAF (Cloud Application Foundation), which groups products like WebLogic, Coherence and so on. She introduced Oracle's list of strategic investments for this area:

- #1 Mobile
- Fusion Middleware and applications
- Continuous availability
- Multitenancy for Density/Utilization
- Cloud management and operations

She also talked about future features in 12cR2 such as:

- Multi Datacenters support
- Coherence federated caching
- Recoverable caching
- Full Java EE7
- Java SE8
- Available in Cloud

Frances also briefly talked about the ODA and OHS roadmaps, but only from the marketing side :)

Then Simone took the lead and gave a recap of key SOA features such as:

- Operation made simple (startup acceleration, tuned profiles…)
- Developer productivity (debugger and tester, a Java database instead of a full-blown one…)
- Mobile standards (REST, Json…)
- Cloud: SAP/JDE adapters

A new feature in the cloud is MFT (Managed File Transfer) for file-based integrations.

She also reminded us how simple it is to upgrade from SOA Suite 11g to 12c, and moved on to new incoming features such as:

- Business Insight: Business Process Monitoring (Business process simplified without JDeveloper)
- Internet of Things (IoT): Events driven actions
- BAM predictive analytics & BI Integration: it can build trends using our business data. For example, it could predict market trends for the coming weeks.

Mobile enabling your enterprise with Oracle SOA Suite – Simone Greib and Yogesh Sontakke (Oracle)

This session was more oriented toward the mobile side of SOA. Yogesh and Simone explained that you can support SOAP and REST on mobile devices, and they demonstrated how simple it is to build the UI and the business logic behind it by exposing it as a service.

They talked about the architecture of the mobile UI and its integration, with a lot of adapters for different products. They took “Kiloutou”, in France, as an example of a mobile application user, as they use an application to manage their stock, orders and services.

They also gave a live demo of how to use JSON or XML to manage events and communications between elements or services.

Maximum Availability in the Cloud: Oracle WebLogic Server and Oracle Coherence – Frances Zhao-Perez (Oracle)

This session was heavily oriented on MAA (Maximum Availability Architecture) and Frances strongly underlined that Oracle is investing in maximum availability.

The goals of MAA are the following:

- Prevent business interruption
- Prevent data loss
- Deliver adequate response time
- Cost: Reduce deployments, managements and support costs
- Risk: Consistently achieve required service level

Here are the High Availability requirements for Oracle:

- Outage protection
- Recovery time
- Testing frequency
- Typical data loss
- Complexity in deployments
- ROI (Return on Investment)

Frances talked about multi-datacenter MAA solutions such as stretch clusters/domains, cache safety, transaction logs in the database, database Global Data Services, federated caching, recoverable caching and storage replication.

She quickly introduced Oracle Site Guard, which provides recovery automation, and talked about features coming in the next versions.

12.1.3:

- No TLog option - Phase 1 (further phases will be implemented with each new release)
- XA Transactions recovery across site

12.2.1 (will be a huge update next year):

- JTA HA
- Cross site Txn recovery
- Density for GridLink deployments
- Application continuity (Optimize connection harvesting on down events)

She finished with Coherence cache recovery, which allows recovering data directly from the cache.

10128 trace to see partition pruning

Bobby Durrett's DBA Blog - Tue, 2014-12-09 10:57

I am working on an SR with Oracle support and they asked me to do a 10128 trace to look at how the optimizer is doing partition pruning.  I did some quick research on this trace and wanted to pass it along.

Here are the names of the two Oracle support documents that I found most helpful:

How to see Partition Pruning Occurred? (Doc ID 166118.1)

Partition Pruning Min/Max Optimization Fails when Parallel Query Run in Serial (Doc ID 1941770.1)

The first was the one Oracle support recommended.  But, the SR said to run both a level 2 and a level 7 trace and the first document did not mention level 7.  But, the second document has an example of a level 7 trace and more details on how to set it up.

I also found these two non-Oracle sites or blog posts:

http://cbohl.blogspot.com/2006/10/verify-that-partition-pruning-works.html

http://www.juliandyke.com/Diagnostics/Events/EventReference.html#10128

I do not have time to delve into this further now but if you are trying to understand partition pruning then the 10128 trace may help you understand how it works.
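
For reference, this kind of event trace is typically enabled at the session level with ALTER SESSION — a minimal sketch (check the My Oracle Support notes above for the exact level you need and any prerequisites for the higher levels):

alter session set events '10128 trace name context forever, level 2';
-- run the query against the partitioned table, then switch the event off:
alter session set events '10128 trace name context off';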

– Bobby


Categories: DBA Blogs

Cedar’s new website is live – get ready for the blog!

Duncan Davies - Tue, 2014-12-09 08:00

I'm really pleased that Cedar have got our new website live – just in time for UKOUG Apps 14.

As you would expect, it highlights the services that Cedar provides – both Oracle Cloud (Fusion and Taleo) and obviously PeopleSoft implementation, hosting and support. It contains details of our people and locations (we've offices in Kings Cross, London, plus India, Switzerland and Australia). It also contains case studies of some of the project successes that we've had, and some of the nice things that clients have said about us.

One of the things I'm most excited about is the blog. Make sure you add it to your feed reader as we're going to be sharing some good content there from all of the practices within our company (plus the occasional post of us doing fun things!).

The new website can be found here: http://www.cedarconsulting.co.uk/


Linux cluster sysadmin — SSH keys

Rittman Mead Consulting - Tue, 2014-12-09 05:34

In this short series of blog posts I’m going to take a look at a few very useful tools that can make your life as the sysadmin of a cluster of Linux machines easier. This may be a Hadoop cluster, or just a plain simple set of ‘normal’ machines on which you want to run the same commands and monitoring.

To start with, we’re going to use the ever-awesome ssh keys to manage security on the cluster. After that we’ll look at executing the same command across multiple machines at the same time using PDSH, and then monitoring OS metrics across a cluster with colmux.

In a nutshell, ssh keys enable us to do password-less authentication in a secure way. You can find a detailed explanation of them in a previous post that I wrote, tips and tricks for OBIEE Linux sysadmin. Beyond the obvious time-saving of not having to enter a password each time we connect to a machine, having SSH keys in place enables the use of the tools we discuss later, pdsh and colmux.

Working with SSH keys involves taking the public key from a pair, and adding that to another machine in order to allow the owner of the pair’s private key to access that machine. What we’re going to do here is generate a unique key pair that will be used as the identity across the cluster. So each node will have a copy of the private key, in order to be able to authenticate to any other node, which will be holding a copy of the public key (as well as, in turn, the same private key).

In this example I’m going to use my own client machine to connect to the cluster. You could easily use any of the cluster nodes too if a local machine would not be appropriate.
As a side-note, this is another reason why I love the fact that the Rittman Mead standard-issue laptop is a MacBook: just under the covers of Mac OS is a *nix-based command line, meaning that a lot of sysadmin work can be done natively without the additional tools that you would need on Windows (e.g. PuTTY, WinSCP, Pageant, etc etc).

SSH key strategy

We've several ways we could implement the SSH keys. Because it's purely a sandbox cluster, I could use the same SSH key pair that I generate for the cluster on my machine too, so the same public/private key pair is distributed to my client machine and all the cluster nodes alike.

If we wanted a bit more security, a better approach might be to distribute my personal SSH key's public key across the cluster too, and leave the cluster's private key to truly identify cluster nodes alone. An additional benefit of this approach is that the client does not need to hold a copy of the cluster's SSH private key, instead just continuing to use its own.

For completeness, the extreme version of the key strategy would be for each machine to have its own ssh key pair (i.e. its own security identity), with the corresponding public keys distributed to the other nodes in the cluster.

But anyway, here we’re using the second option – a unique keypair used across the cluster and the client’s public ssh key distributed across the cluster too.

Generating the SSH key pair

First, we need to generate the key. I’m going to create a folder to hold it first, because in a moment we’re going to push it and a couple of other files out to all the servers in the cluster and it’s easiest to do this from a single folder.

mkdir /tmp/rnmcluster02-ssh-keys

Note that in the ssh-keygen command below I’m specifying the target path for the key with the -f argument; if you don’t then watch out that you don’t accidentally overwrite your own key pair in the default path of ~/.ssh.

The -q -N "" flags instruct the key generation to use no passphrase for the key and to not prompt for it either. This is the lowest friction approach (you don’t need to unlock the ssh key with a passphrase before use) but also the least secure. If you’re setting up access to a machine where security matters then bear in mind that without a passphrase on an ssh key anyone who obtains it can therefore access any machine to which the key has been granted access (i.e. on which its public key has been deployed).

ssh-keygen -f /tmp/rnmcluster02-ssh-keys/id_rsa -q -N ""

This generates two files in the tmp folder – the private and public (.pub) keys of the pair:

robin@RNMMBP ~ $ ls -l /tmp/rnmcluster02-ssh-keys
total 16
-rw-------  1 robin  wheel  1675 30 Nov 17:28 id_rsa
-rw-r--r--  1 robin  wheel   400 30 Nov 17:28 id_rsa.pub
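
If you do want the protection of a passphrase without being prompted on every connection, ssh-agent can cache the unlocked key for your shell session — a minimal sketch, assuming an agent is available on the client:

ssh-keygen -f /tmp/rnmcluster02-ssh-keys/id_rsa   # omit -q -N "" so that you are prompted for a passphrase
eval "$(ssh-agent -s)"                            # start an agent for this shell
ssh-add /tmp/rnmcluster02-ssh-keys/id_rsa         # unlock the key once; it stays cached for the session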

Preparing the authorized_keys file

Now we'll prepare the authorized_keys file, which is where the public SSH key of any identity permitted to access the machine is stored. Note that each user on a machine has their own authorized_keys file, in ~/.ssh/. So, for example, the root user has the file in /root/.ssh/authorized_keys, and any public key listed in that file will be able to connect to the server as the root user. Be aware of the American [mis-]spelling of “authorized” – spell it [correctly] as “authorised” and you'll not get any obvious errors, but the ssh key login won't work either.

So we’re going to copy the public key of the unique pair that we just created for the cluster into the authorized_keys file. In addition we will copy in our own personal ssh key (and any other public key that we want to give access to all the nodes in the cluster):

cp /tmp/rnmcluster02-ssh-keys/id_rsa.pub /tmp/rnmcluster02-ssh-keys/authorized_keys
# [optional] Now add any other keys (such as your own) into the authorized_keys file just created
cat ~/.ssh/id_rsa.pub >> /tmp/rnmcluster02-ssh-keys/authorized_keys
# NB make sure the previous step is a double >> not > since the double appends to the file, a single overwrites.

Distributing the SSH artefacts

Now we’re going to push this set of SSH files out to the .ssh folder of the target user on each node, which in this case is the root user. From a security point of view it’s probably better to use a non-root user for login and then sudo as required, but we’re keeping things simple (and less secure) to start with here. So the files in our folder are:

  • id_rsa – the private key of the key pair
  • id_rsa.pub – the public key of the key pair. Strictly speaking this doesn’t need distributing to all nodes, but it’s conventional and handy to hold it alongside the private key.
  • authorized_keys – this is the file that the sshd daemon on each node checks to validate the key offered by an incoming login request, and so needs to hold the public key of anyone who is allowed to access the machine as this user.

To copy the files we’ll use scp, but how you get them in place doesn’t really matter so much, so long as they get to the right place:

scp -r /tmp/rnmcluster02-ssh-keys root@rnmcluster02-node01:~/.ssh

At this point you’ll need to enter the password for the target user, but rejoice! This is the last time you’ll need to enter it as subsequent logins will be authenticated using the ssh keys that you’re now configuring.

Run the scp for all nodes in the cluster. If you’ve four nodes in the cluster your output should look something like this:

$ scp -r /tmp/rnmcluster02-ssh-keys/ root@rnmcluster02-node01:~/.ssh
root@rnmcluster02-node01's password:
authorized_keys                                                  100%  781     0.8KB/s   00:00
id_rsa                                                           100% 1675     1.6KB/s   00:00
id_rsa.pub                                                       100%  400     0.4KB/s   00:00
$ scp -r /tmp/rnmcluster02-ssh-keys/ root@rnmcluster02-node02:~/.ssh
Warning: Permanently added the RSA host key for IP address '172.28.128.7' to the list of known hosts.
root@rnmcluster02-node02's password:
authorized_keys                                                  100%  781     0.8KB/s   00:00
id_rsa                                                           100% 1675     1.6KB/s   00:00
id_rsa.pub                                                       100%  400     0.4KB/s   00:00
$ scp -r /tmp/rnmcluster02-ssh-keys/ root@rnmcluster02-node03:~/.ssh
root@rnmcluster02-node03's password:
authorized_keys                                                  100%  781     0.8KB/s   00:00
id_rsa                                                           100% 1675     1.6KB/s   00:00
id_rsa.pub                                                       100%  400     0.4KB/s   00:00
$ scp -r /tmp/rnmcluster02-ssh-keys/ root@rnmcluster02-node04:~/.ssh
root@rnmcluster02-node04's password:
authorized_keys                                                  100%  781     0.8KB/s   00:00
id_rsa                                                           100% 1675     1.6KB/s   00:00
id_rsa.pub                                                       100%  400     0.4KB/s   00:00

Testing login authenticated through SSH keys

The moment of truth. From your client machine, try to ssh to each of the cluster nodes. If you are prompted for a password, then something is not right – see the troubleshooting section below.

If you put your own public key in authorized_keys when you created it then you don’t need to specify which key to use when connecting because it’ll use your own private key by default:

robin@RNMMBP ~ $ ssh root@rnmcluster02-node01
Last login: Fri Nov 28 17:13:23 2014 from 172.28.128.1



[root@localhost ~]#

There we go – logged in automagically with no password prompt. If you're using the cluster's private key (rather than your own) you need to specify it with -i when you connect.

robin@RNMMBP ~ $ ssh -i /tmp/rnmcluster02-ssh-keys/id_rsa root@rnmcluster02-node01
Last login: Fri Nov 28 17:13:23 2014 from 172.28.128.1



[root@localhost ~]#

Troubleshooting SSH key connections

SSH keys are one of the best things in a sysadmin’s toolkit, but when they don’t work can be a bit tricky to sort out. The first thing to check is that on the target machine the authorized_keys file that does all the magic (by listing the ssh keys that are permitted to connect inbound on a host to the given user) is in place:

[root@localhost .ssh]# ls -l ~/.ssh/authorized_keys
-rw-r--r-- 1 root root 775 Nov 30 18:55 /root/.ssh/authorized_keys

If you get this:

[root@localhost .ssh]# ls -l ~/.ssh/authorized_keys
ls: cannot access /root/.ssh/authorized_keys: No such file or directory

then you have a problem.

One possible issue in this specific instance could be that the above pre-canned scp assumes that the user’s .ssh folder doesn’t already exist (since it doesn’t, on brand new servers) and so specifies it as the target name for the whole rnmcluster02-ssh-keys folder. However if it does already exist then it ends up copying the rnmcluster02-ssh-keys folder into the .ssh folder:

[root@localhost .ssh]# ls -lR
.:
total 12
-rw------- 1 root root 1675 Nov 22  2013 id_rsa
-rw-r--r-- 1 root root  394 Nov 22  2013 id_rsa.pub
drwxr-xr-x 2 root root 4096 Nov 30 18:49 rnmcluster02-ssh-keys

./rnmcluster02-ssh-keys:
total 12
-rw-r--r-- 1 root root  775 Nov 30 18:49 authorized_keys
-rw------- 1 root root 1675 Nov 30 18:49 id_rsa
-rw-r--r-- 1 root root  394 Nov 30 18:49 id_rsa.pub
[root@localhost .ssh]#

To fix this simply move the authorized_keys from rnmcluster02-ssh-keys back into .ssh:

[root@localhost .ssh]# mv ~/.ssh/rnmcluster02-ssh-keys/authorized_keys ~/.ssh/
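
If the target .ssh folder already exists, you can sidestep the problem in the first place by copying the individual files rather than the folder itself, for example:

scp /tmp/rnmcluster02-ssh-keys/* root@rnmcluster02-node01:~/.ssh/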

Other frequent causes of problems are file/folder permissions that are too lax on the target user’s .ssh folder (which can be fixed with chmod -R 700 ~/.ssh) or the connecting user’s ssh private key (fix: chmod 600 id_rsa). The latter will show on connection attempts very clearly:

robin@RNMMBP ~ $ ssh -i /tmp/rnmcluster02-ssh-keys/id_rsa root@rnmcluster02-node01
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0777 for '/tmp/rnmcluster02-ssh-keys/id_rsa' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: /tmp/rnmcluster02-ssh-keys/id_rsa

Another one that has bitten me twice over time – and that eludes the troubleshooting I’ll demonstrate in a moment – is that SELinux gets stroppy about root access using ssh keys. I always just take this as a handy reminder to disable selinux (in /etc/selinux/config, set SELINUX=disabled), having never had cause to leave it enabled. But, if you do need it enabled you’ll need to hit the interwebs to check the exact cause/solution for this problem.

So to troubleshoot ssh key problems in general do two things. Firstly from the client side, specify verbosity (-v for a bit of verbosity, -vvv for most)

ssh -v -i /tmp/rnmcluster02-ssh-keys/id_rsa root@rnmcluster02-node01

You should observe ssh trying to use the private key, and if the server rejects it it’ll fall back to any other ssh private keys it can find, and then password authentication:

[...]
debug1: Offering RSA public key: /tmp/rnmcluster02-ssh-keys/id_rsa
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Next authentication method: password

Quite often the problem will be on the server side, so assuming that you can still connect to the server (eg through the physical console, or using password authentication) then go and check /var/log/secure where you’ll see all logs relating to attempted connections. Here’s the log file corresponding to the above client log, where ssh key authentication is attempted but fails, and then password authentication is used to successfully connect:

Nov 30 18:15:05 localhost sshd[13156]: Authentication refused: bad ownership or modes for file /root/.ssh/authorized_keys
Nov 30 18:15:15 localhost sshd[13156]: Accepted password for root from 172.28.128.1 port 59305 ssh2
Nov 30 18:15:15 localhost sshd[13156]: pam_unix(sshd:session): session opened for user root by (uid=0)

Now we can see clearly what the problem is – “bad ownership or modes for file /root/.ssh/authorized_keys”.

The last roll of the troubleshooting dice is to get sshd (the ssh daemon that runs on the host we’re trying to connect to) to issue more verbose logs. You can either set LogLevel DEBUG1 (or DEBUG2, or DEBUG3) in /etc/ssh/sshd_config and restart the ssh daemon (service sshd restart), or you can actually run a (second) ssh daemon from the host with specific logging. This would be appropriate on a multi-user server where you can’t just go changing sshd configuration. To run a second instance of sshd you’d use:

/usr/sbin/sshd -D -d -p 2222

You have to run sshd from an absolute path (you’ll get told this if you try not to). The -D flag stops it running as a daemon and instead runs interactively, so we can see easily all the output from it. -d specifies the debug logging (-dd or -ddd for greater levels of verbosity), and -p 2222 tells sshd to listen on port 2222. Since we’re doing this on top of the existing sshd, we obviously can’t use the default ssh port (22) so pick another port that is available (and not blocked by a firewall).

Now on the client retry the connection, but pointing to the port of the interactive sshd instance:

ssh -v -p 2222 -i /tmp/rnmcluster02-ssh-keys/id_rsa root@rnmcluster02-node01

When you run the command on the client you should get both the client and host machine debug output go crackers for a second, giving you plenty of diagnostics to pore through and analyse the ssh handshake etc to get to the root of the issue.

Hopefully you’ve now sorted your SSH keys, because in the next article we’re going to see how we can use them to run commands against multiple servers at once using pdsh.

Summary

When working with multiple Linux machines I would first and foremost make sure SSH keys are set up in order to ease management through password-less logins.

We’ll see in the next couple of articles some other tools that are useful when working on a cluster:

  • pdsh
  • colmux

I’m interested in what you think – what particular tools or tips do you have for working with a cluster of Linux machines? Leave your answers in the comments below, or tweet them to me at @rmoff.

Categories: BI & Warehousing

UKOUG Tech14 slides – Exadata Security Best Practices

Dan Norris - Tue, 2014-12-09 04:54

I think 2 years is long enough to wait between posts!

Today I delivered a session about Oracle Exadata Database Machine Best Practices and promised to post the slides for it (though no one asked about them :). I’ve also posted them to the Tech14 agenda as well.

Direct download: UKOUG Tech14 Exadata Security slides

Going Beyond MapReduce for Hadoop ETL Pt.3 : Introducing Apache Spark

Rittman Mead Consulting - Tue, 2014-12-09 02:00

In the first two posts in this three part series on going beyond MapReduce for Hadoop ETL, we looked at why MapReduce and Hadoop 1.0 was only really suitable for batch processing, and how the new Apache Tez framework enabled by Apache YARN on the Hadoop 2.0 platform can be swapped-in for MapReduce to improve the performance of existing Pig and Hive scripts. Today though in the final post I want to take a look at Apache Spark, the next-gen compute framework that Cloudera are backing as the long-term successor to MapReduce.

Like Tez, Apache Spark supports DAGs that describe the entire dataflow process, not just individual map and reduce jobs. Like Pig, it has a concept of datasets (Pig's aliases and relations), but crucially these datasets (RDDs, or “resilient distributed datasets”) can be cached in-memory, fall back gracefully to disk, and can be rebuilt using a lineage graph that records how to reconstruct them. With Tez, individual jobs in the DAG can now hand off their output to the next job in-memory rather than having to stage it in HDFS, but Spark uses memory for the actual datasets and is a much better choice for the types of iterative, machine-learning tasks that you tend to do on Hadoop systems. Moreover, Spark arguably has a richer API, and it pairs well with Scala, a functional-programming-oriented language that can use Java libraries and whose collections framework maps well onto the types of operations you'd want to make use of in dataflow-type applications on a cluster.

Spark can run standalone, on YARN or on other cluster management platforms, and comes with a handy command-line interpreter that you can use to interactively load, filter, analyse and work with RDDs. Cloudera CDH5.2 comes with Spark 1.0.1 and can either be configured standalone or to run on YARN, with Spark as a service added to nodes in the cluster using Cloudera Manager. 
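
As a quick illustration of the RDD behaviour described above, here's a minimal spark-shell sketch (the HDFS path is the same log directory used in the full example later; the filter conditions are just for illustration) showing a dataset being transformed, cached in memory and reused:

// from spark-shell, where sc is the ready-made SparkContext
val logs   = sc.textFile("/user/mrittman/rm_logs")          // build an RDD over the raw log files
val errors = logs.filter(line => line.contains(" 404 "))    // a simple transformation, lazily evaluated
errors.cache()                                              // ask Spark to keep this RDD in memory
println(errors.count())                                     // first action reads from HDFS and populates the cache
println(errors.filter(_.contains("robots.txt")).count())    // second action reuses the cached data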


So looking back at the Pig example, we created the dataflow using a number of aliases that we progressively filter, transform, join together and then aggregate to get to the final top-ten set of pages from the website logs. Translating that dataflow to Spark, we end up with a similar set of RDDs that take our initial set of logs, apply transformations and join the datasets to store the final aggregated output back on HDFS.


Spark supports in-memory sharing of data within a single DAG (i.e. RDD to RDD), but also between DAGs running in the same Spark instance. As such, Spark becomes a great framework for doing iterative and cyclic data analysis, and can make much better use of the memory on cluster servers whilst still using disk for overflow data and persistence.

Moreover, Spark powers a number of higher-level tools built on the core Spark engine to provide features like real-time loading and analysis (Spark Streaming), SQL access and integration with Hive (Spark SQL), machine learning (MLlib) and so forth. In fact, as well as Hive and Pig being reworked to run on Tez, there are also projects underway to port them both to Spark, though to be honest they're both at early stages compared to Tez integration, and most probably you'll be using Scala, Java or Python to work with Spark for now.


So taking the Pig script we had earlier and translating that to the same dataflow in Spark and Scala, we end up with something like this:

package com.cloudera.analyzeblog
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql.SQLContext
case class accessLogRow(host: String, identity: String, user: String, time: String, request: String, status: String, size: String, referer: String, agent: String)
case class pageRow(host: String, request_page: String, status: String, agent: String)
case class postRow(post_id: String, title: String, post_date: String, post_type: String, author: String, url: String, generated_url: String)
object analyzeBlog {
        def getRequestUrl(s: String): String = {
        try {
                s.split(' ')(1)
        } catch {
                case e: ArrayIndexOutOfBoundsException => { "N/A" }
        }
}
        def main(args: Array[String]) {
val sc = new SparkContext(new SparkConf().setAppName("analyzeBlog"))
val sqlContext = new SQLContext(sc)
import sqlContext._
val raw_logs = "/user/mrittman/rm_logs"
//val rowRegex = """^([0-9.]+)\s([\w.-]+)\s([\w.-]+)\s(\[[^\[\]]+\])\s"((?:[^"]|\")+)"\s(\d{3})\s(\d+|-)\s"((?:[^"]|\")+)"\s"((?:[^"]|\")+)"$""".r
val rowRegex = """^([\d.]+) (\S+) (\S+) \[([\w\d:/]+\s[+\-]\d{4})\] "(.+?)" (\d{3}) ([\d\-]+) "([^"]+)" "([^"]+)".*""".r

val logs_base = sc.textFile(raw_logs) flatMap {
                        case rowRegex(host, identity, user, time, request, status, size, referer, agent) =>
                                Seq(accessLogRow(host, identity, user, time, request, status, size, referer, agent))
                        case _ => Nil
                                }
val logs_base_nobots = logs_base.filter( r => ! r.request.matches(".*(spider|robot|bot|slurp|bot|monitis|Baiduspider|AhrefsBot|EasouSpider|HTTrack|Uptime|FeedFetcher|dummy).*"))

val logs_base_page = logs_base_nobots.map { r =>
  val request = getRequestUrl(r.request)
  val request_formatted = if (request.charAt(request.length-1).toString == "/") request else request.concat("/")
  (r.host, request_formatted, r.status, r.agent)
}

val logs_base_page_schemaRDD = logs_base_page.map(p => pageRow(p._1, p._2, p._3, p._4))

logs_base_page_schemaRDD.registerAsTable("logs_base_page")

val page_count = sql("SELECT request_page, count(*) as hits FROM logs_base_page GROUP BY request_page").registerAsTable("page_count")

val postsLocation = "/user/mrittman/posts.psv"

val posts = sc.textFile(postsLocation).map{ line =>
        val cols=line.split('|')

        postRow(cols(0),cols(1),cols(2),cols(3),cols(4),cols(5),cols(6).concat("/"))
}

posts.registerAsTable("posts")

val pages_and_posts_details = sql("SELECT p.request_page, p.hits, ps.title, ps.author FROM page_count p JOIN posts ps ON p.request_page = ps.generated_url ORDER BY hits DESC LIMIT 10")

pages_and_posts_details.saveAsTextFile("/user/mrittman/top_10_pages_and_author4")

        }
}

I'll do a code-walkthrough for this Spark application in a future post, but for now note the map and flatMap Scala collection functions used to transform RDDs, and the sql(“…”) function that lets us query RDDs registered as tables using SQL, including joining them to other RDDs registered as tables. For now though, let's run the application on CDH5.2 using YARN and see how long it takes to process the same set of log files (remember, the Pig script on this CDH5.2 cluster took around 5 minutes to run, and the Pig on Tez version on the Hortonworks cluster was around 2.5 minutes):

[mrittman@bdanode1 analyzeBlog]$ spark-submit --class com.cloudera.analyzeblog.analyzeBlog --master yarn target/analyzeblog-0.0.1-SNAPSHOT.jar 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/spark-assembly-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/12/06 19:18:25 INFO SecurityManager: Changing view acls to: mrittman
14/12/06 19:18:25 INFO SecurityManager: Changing modify acls to: mrittman
...
14/12/06 19:19:41 INFO DAGScheduler: Stage 0 (takeOrdered at basicOperators.scala:171) finished in 3.585 s
14/12/06 19:19:41 INFO SparkContext: Job finished: takeOrdered at basicOperators.scala:171, took 53.591560036 s
14/12/06 19:19:41 INFO SparkContext: Starting job: saveAsTextFile at analyzeBlog.scala:56
14/12/06 19:19:41 INFO DAGScheduler: Got job 1 (saveAsTextFile at analyzeBlog.scala:56) with 1 output partitions (allowLocal=false)
14/12/06 19:19:41 INFO DAGScheduler: Final stage: Stage 3(saveAsTextFile at analyzeBlog.scala:56)
14/12/06 19:19:41 INFO DAGScheduler: Parents of final stage: List()
14/12/06 19:19:41 INFO DAGScheduler: Missing parents: List()
14/12/06 19:19:41 INFO DAGScheduler: Submitting Stage 3 (MappedRDD[15] at saveAsTextFile at analyzeBlog.scala:56), which has no missing parents
14/12/06 19:19:42 INFO MemoryStore: ensureFreeSpace(64080) called with curMem=407084, maxMem=278302556
14/12/06 19:19:42 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 62.6 KB, free 265.0 MB)
14/12/06 19:19:42 INFO MemoryStore: ensureFreeSpace(22386) called with curMem=471164, maxMem=278302556
14/12/06 19:19:42 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 21.9 KB, free 264.9 MB)
14/12/06 19:19:42 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on bdanode1.rittmandev.com:44486 (size: 21.9 KB, free: 265.3 MB)
14/12/06 19:19:42 INFO BlockManagerMaster: Updated info of block broadcast_5_piece0
14/12/06 19:19:42 INFO DAGScheduler: Submitting 1 missing tasks from Stage 3 (MappedRDD[15] at saveAsTextFile at analyzeBlog.scala:56)
14/12/06 19:19:42 INFO YarnClientClusterScheduler: Adding task set 3.0 with 1 tasks
14/12/06 19:19:42 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 215, bdanode5.rittmandev.com, PROCESS_LOCAL, 3331 bytes)
14/12/06 19:19:42 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on bdanode5.rittmandev.com:13962 (size: 21.9 KB, free: 530.2 MB)
14/12/06 19:19:42 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 215) in 311 ms on bdanode5.rittmandev.com (1/1)
14/12/06 19:19:42 INFO YarnClientClusterScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool 
14/12/06 19:19:42 INFO DAGScheduler: Stage 3 (saveAsTextFile at analyzeBlog.scala:56) finished in 0.312 s
14/12/06 19:19:42 INFO SparkContext: Job finished: saveAsTextFile at analyzeBlog.scala:56, took 0.373096676 s

It ran in just over a minute in the end, and most of that was around submitting the job to YARN – not bad. We'll be covering more of Spark on the blog over the next few weeks, including streaming and machine learning examples, and connecting it to ODI and OBIEE via Hive on Spark and Spark SQL's own Hive-compatible Thrift server. I'll also be taking a look at Pig on Spark (or "Spork"…) to see how well that works, and most interestingly how Pig and Hive on Spark compare to running them on Tez – watch this space, as they say.

Categories: BI & Warehousing

Video: Best Practices for Application Performance, Scalability, and Availability

Christopher Jones - Mon, 2014-12-08 23:50

Nancy Ikeda nails it in a great Oracle OpenWorld recording of her Best Practices for Application Performance, Scalability, and Availability session, now viewable on the Oracle Call Interface page.

The session covered:

Best practice coding samples and techniques show how to resolve connection management, statement execution, and data fetching inefficiencies in applications using APIs such as JDBC, OCI, ODBC, ODP.Net, or higher-level scripting languages. This session shows how the Automatic Workload Repository feature of Oracle Database and Automatic Database Diagnostic Monitor profiling tools help diagnose application design and coding issues. Specific solutions show how to resolve these and other issues to enhance applications for scalability and resilience. Among the solutions discussed are Oracle Database 12c’s new client configuration file. Developers or DBAs can use it to tune and configure applications without modifying code. Examples use JDBC and OCI but are applicable to all APIs.

Nancy is one of Oracle's senior developers working in the call interface group.

Changing The Number Of Oracle Database 12c Log Writers

In an Oracle Database 12c instance you will likely see multiple log writer (LGWR) background processes. When you first start the Oracle instance you will likely see a parent and two redo workers. This is a very big deal and something many of us have been waiting for - for many years!

While I'm excited about the change, if I can't control the number of LGWRs I could easily find myself once again constrained by the lack of LGWRs!

So, my question is: how do I manipulate the number of LGWRs from the default? And what is the default based on? It's these types of questions that led me on this quest. I hope you enjoy the read!


Serialization Is Death
Multiple LGWRs are great news because serialization is death to computing performance. Think of it like this: a computer program is essentially lines of code, and each line of code takes a little bit of time to execute. A CPU can only process N lines of code per second, which means every serially executing program has a maximum throughput. With a single log writer (LGWR) background process, the amount of redo that can be processed is similarly constrained.

An Example Of Serialization Throughput Limitation
Suppose a CPU can process 1000 instructions per millisecond. Also, assume that through some research a DBA determined it takes the LGWR 10 instructions to process 10 KB of redo. (I know DBAs who have taken the time to figure this stuff out.) Given these two pieces of data, how much redo can the CPU theoretically process per second?

? MB of redo/sec = (1000 instr / 1 ms) * (10 KB redo / 10 instr) * (1000 ms / 1 sec) * (1 MB / 1000 KB) = 1000 MB redo/sec

This is a best case scenario. As you can see, any sequential process can become a bottleneck. One solution to this problem is to parallelize.

Note: Back in April of 2010 I posted a series of articles about parallelism. If you are interested in this topic, I highly recommend you READ THE POSTS.

Very Cool! Multiple 12c LGWRs... But Still A Limit?
Since serialization is death... and parallelism is life, I was really excited when I saw that my 12c Oracle instance by default had two redo workers in addition to the "parent" log writer. On my Oracle version 12.1.0.2.0 Linux machine this is what I see:
$ ps -eaf|grep prod40 | grep ora_lg
oracle 54964 1 0 14:37 ? 00:00:00 ora_lgwr_prod40
oracle 54968 1 0 14:37 ? 00:00:00 ora_lg00_prod40
oracle 54972 1 0 14:37 ? 00:00:00 ora_lg01_prod40

This is important. While this is good news, unless Oracle or I have the ability to change and increase the number of LGWR redo workers, at some point the two redo workers will become saturated, bringing us back to the same serial LGWR situation. So, I want and need some control.
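
As an aside, you can get the same picture from inside the instance rather than from the OS. A minimal sketch (assuming V$PROCESS.PNAME is populated on your platform and version) would be:

select pname
from   v$process
where  pname = 'LGWR'
or     pname like 'LG0%';

This should return LGWR plus one row per redo worker, matching the ora_lgwr and ora_lgnn OS processes shown above.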

Going Back To Only One LGWR
Interestingly, starting in Oracle Database version 12.1.0.2.0 there is an instance parameter, _use_single_log_writer. I was able to REDUCE the number of LGWRs to only one by setting the instance parameter _use_single_log_writer=TRUE. But that's the opposite of the direction I want to go!
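
For reference, here is a minimal sketch of how that change might be applied (standard syntax for a quoted underscore parameter; as always, check with Oracle Support before setting hidden parameters on a real system):

-- Sketch only: underscore parameters must be quoted, and this one
-- likely needs an instance restart to take effect.
ALTER SYSTEM SET "_use_single_log_writer" = TRUE SCOPE = SPFILE;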

More Redo Workers: "CPU" Instance Parameters
I tried a variety of CPU related instance parameters with no success. Always two redo workers.

More Redo Workers: Set Event...
Using my OSM script listeventcodes.sql I scanned the Oracle events (not wait events) but was unable to find any related Oracle events. Bummer...

More Redo Workers: More Physical CPUs Needed?
While talking to some DBAs about this, one of them mentioned having heard that Oracle sets the number of 12c log writers based on the number of physical CPUs: not the number of CPU cores, but the number of physical CPUs. On a Solaris box with 2 physical CPUs (verified using the command psrinfo -pv), there were still only two redo workers at startup.

$ psrinfo -p
2
$ psrinfo -pv
The physical processor has 1 virtual processor (0)
UltraSPARC-III (portid 0 impl 0x14 ver 0x3e clock 900 MHz)
The physical processor has 1 virtual processor (1)
UltraSPARC-III (portid 1 impl 0x14 ver 0x3e clock 900 MHz)

More Redo Workers: Adaptive Behavior?
Looking closely at the Solaris LGWR trace file I repeatedly saw this:

Created 2 redo writer workers (2 groups of 1 each)
kcrfw_slave_adaptive_updatemode: scalable->single group0=375 all=384 delay=144 rw=7940

*** 2014-12-08 11:33:39.201
Adaptive scalable LGWR disabling workers
kcrfw_slave_adaptive_updatemode: single->scalable redorate=562 switch=23

*** 2014-12-08 15:54:10.972
Adaptive scalable LGWR enabling workers
kcrfw_slave_adaptive_updatemode: scalable->single group0=1377 all=1408 delay=113 rw=6251

*** 2014-12-08 22:01:42.176
Adaptive scalable LGWR disabling workers

It looks to me like Oracle has programmed in some sweeeeet logic to adapt the number of redo workers based on the redo load.

So I created six Oracle sessions that simply inserted rows into a table and ran all six at the same time. But it made no difference in the number of redo workers. No increase or decrease or anything! I let this DML load run for around five minutes. Perhaps that wasn't long enough, or the load was not what Oracle was looking for, or it was something else. But the number of redo workers always remained at two.
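
For what it's worth, a simple way to generate that kind of load from each session is something like the following sketch. The table name, row count and commit frequency are purely illustrative, not what was actually used:

-- Hypothetical scratch table: create table redo_gen (id number, pad varchar2(100));
begin
  for i in 1 .. 1000000 loop
    insert into redo_gen values (i, rpad('x', 100, 'x'));
    if mod(i, 1000) = 0 then
      commit;   -- commit in batches to keep redo flowing through the LGWR
    end if;
  end loop;
  commit;
end;
/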

Summary & Conclusions
It appears that at instance startup the default number of Oracle Database 12c redo workers is two. It also appears that Oracle has either already built, or is building, the ability to adapt to changing redo activity by enabling and disabling redo workers. Perhaps the number of physical CPUs (not CPU cores, but physical CPUs) plays a part in this algorithm.

While this was not my research objective, I did discover a way to set the number of redo workers back to the traditional single LGWR background process.

While I enjoyed doing the research for this article, it was disappointing that I was unable to influence Oracle to increase the number of redo workers. I sure hope Oracle either gives me control or the adaptive behavior actually works. If not, two redo workers won't be enough for many Oracle systems.

All the best in your Oracle performance endeavors!

Craig.


Categories: DBA Blogs

What Are SI Groups and How Can They Help a Customer User Administrator (CUA)?

Joshua Solomin - Mon, 2014-12-08 19:39
MOS Oracle Support Blog

If you are an administrator for your organization's Oracle software and systems, you may have been tasked with managing your employees' access to My Oracle Support, the central support hub for all of your Oracle products. Each user that accesses My Oracle Support has an assigned Support Identifier (SI) that links him or her to a particular piece of Oracle hardware or software. Support Identifiers define the resources available to users when they access My Oracle Support.

However, suppose your My Oracle Support users are scattered across a broad geographic area, or you have numerous Support Identifiers referencing dozens (or possibly hundreds) of software and hardware assets. Mapping users to the correct Support Identifier can become time consuming, especially if you need to align privileges and service request flows to specific projects, locations, or assets.

Support Identifier Groups (SIGs) simplify this process by allowing Customer User Administrators (CUAs) to group common SIs together. This makes it dramatically easier to manage users who share a common location or who work on a common software or hardware asset.

To use the new Support Identifier Groups, you will need to pre-plan how users and assets are best organized. Once defined you can set up SI Groups in My Oracle Support, adding users and assets logically the way you need them.

Simple. Easy. Maintainable.

When your organization adds new hardware or software (with an associated new SI), it can automatically be assigned to a designated default group. New assets added to a default SI Group are immediately available to the group's associated users; you do not have to set up or re-approve users you have already assigned.

Learn more about SI Groups and how they can help you.


SQL*Plus COPY Command is back as BRIDGE

Yann Neuhaus - Mon, 2014-12-08 15:38

Did you ever use the COPY command in SQL*Plus? It's very old, and the documentation says:
The COPY command is not being enhanced to handle datatypes or features introduced with, or after Oracle8i. The COPY command is likely to be deprecated in a future release.

Deprecated? But it is back, with a new name, in the new SQL Developer-based SQL*Plus (currently called sdsql, in beta).
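
As a reminder of what the old command looked like, here is a minimal sketch of the classic COPY syntax (the connect strings and table names are placeholders only):

-- Pull rows from db1 and create emp_copy in db2; the trailing "-" is
-- the SQL*Plus line-continuation character.
COPY FROM scott/tiger@db1 TO scott/tiger@db2 -
CREATE emp_copy USING SELECT * FROM emp;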

Cardinality Change

Jonathan Lewis - Mon, 2014-12-08 15:35

Here’s an entertaining little change across versions of Oracle, brought to my attention by Tony Hasler during UKOUG Tech 14. It’s a join cardinality estimate, so here are a couple of tables to demonstrate the issue – the only columns needed are the alpha_06 columns, but I reused some code from other demonstrations to create my test case, so there are lots of irrelevant columns in the create table script:


create table t1 nologging as
with generator as (
        select rownum id
        from dual
        connect by rownum <= 1000
)
select
        rownum                                          id,
        mod(rownum-1,200)                               mod_200,
        trunc(dbms_random.value(0,300))                 rand_300,
        mod(rownum-1,10000)                             mod_10000,
        trunc(sysdate) +
                trunc(dbms_random.value(0,1000))        date_1000,
        dbms_random.string('l',6)                       alpha_06,
        dbms_random.string('l',20)                      alpha_20
from
        generator,
        generator
where
        rownum <= 1e6
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

create table t2 nologging as select * from t1;
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 1')

I’m going to join t1 to t2 with a predicate based on the alpha_06 columns – using a LIKE predicate. Before I do so I’ll point out that there are 1,000,000 rows in the table, and (checking the column stats) 985,920 distinct values for alpha_06. Here’s my query, with the execution plan I got from 11.1.0.7:


select
        count(*)
from
        t1, t2
where
        t2.alpha_06 like t1.alpha_06
;

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1122M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |    50G|   651G|  1122M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1123   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   | 50000 |   341K|  1122   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The 50,000 cardinality estimate for t2 looks like the standard 5% guess for “column >= {unknown value}”, following which the join cardinality of 50G is the same 5% guess applied to the Cartesian join between t1 and t2 (1M * 1M * 0.05). It’s not a good estimate in my case because the right answer happens to be close to 1M rows, specifically 1,003,176. So let’s upgrade to 11.2.0.4 and see what we get instead:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1050M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |  2014K|    26M|  1050M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1051   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   |     2 |    14 |  1050   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The estimate has dropped from 50 Billion rows down to 2 Million – a factor of about 25,000: possibly an indicator that the algorithm has changed, and that a few people might find execution plans changing as they upgrade to a newer version of Oracle. The change occurred at 11.2.0.2 as revealed by fix control 9303766 which has the description: “use 1/NDV+1/NROWS for col1 LIKE col2 selectivities”.

Just as a quick check on the arithmetic: there are 1 million rows in table t2, with (as noted above) 985,920 distinct values in the column, so the selectivity should be: 1/1000000 + 1/985920 = 2.014281e-6. Multiply the selectivity by 1e6 and you get 2, the cardinality estimate for t2; multiply the selectivity by 1M*1M (the Cartesian join) and you get 2,014,281, the cardinality estimate of the join. QED.
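
If you want to check the same inputs on your own system, a query along these lines (a sketch against the standard dictionary views, assuming statistics have been gathered as above) returns the row count and NDV used in that calculation:

select  t.num_rows, c.num_distinct
from    user_tables t, user_tab_col_statistics c
where   t.table_name  = 'T2'
and     c.table_name  = t.table_name
and     c.column_name = 'ALPHA_06';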

There are workarounds, of course. One would be to reverse out the fix control, either as an initialisation parameter or in a session logon trigger; another might be to modify the SQL – I think the following would be equivalent:


select
        *
from    t1, t2
where
        t2.alpha_06 like substr(t1.alpha_06,1,length(t1.alpha_06))||'%'
and     t1.alpha_06 is not null
and     t2.alpha_06 is not null

This changes the critical predicate from the form "col1 like col2" to "col1 like {unknown value from function}", i.e. back to a case where the optimizer uses the 5% guess, and the cardinality estimates go back to the original values.
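
For the fix-control route mentioned above, the session-level version would look something like this (a sketch; consider the wider implications for your system before switching a fix off):

-- Revert to the pre-11.2.0.2 behaviour for "col1 like col2" selectivity
alter session set "_fix_control" = '9303766:off';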


Treasures From The Road

Floyd Teter - Mon, 2014-12-08 15:25
Reading departure signs in some big airport
Reminds me of the places I've been.
Visions of good times that brought so much pleasure
Makes me want to go back again.
If it suddenly ended tomorrow,
I could somehow adjust to the fall.
Good times and riches and son of a b*****s,
I've seen more than I can recall
                      — From Jimmy Buffett's "Changes in Latitudes, Changes in Attitudes"
I’ve been traveling lately.  In fact, since OpenWorld this year, I’ve been on the road around 75% of my time.  All in the continental USA.  From my viewpoint, that’s lots of travel.  But it’s been good - full of variety, working with lots of customers and partners.  Mobile, SaaS applications, BI, UX (including conducting an Oracle HCM UX workshop for Oracle Partners), keynoting at the East Coast Oracle conference (just got the feedback on my talk, and I’m really pleased with it).  I’m not complaining.  Now that I’m done for this calendar year, I have a chance to reflect on the treasures I’ve learned lately.
A really cool thing in all that travel has been talking and working with many of Oracle’s Higher Education customers.  I’m still getting to know that market, so it’s been an enriching experience.  And I’ve gathered some questions posed, observations collected, and commentary about Oracle Cloud Applications that seems to be pretty consistent across those Higher Education users. In fact, they seem pretty consistent in general.  So I thought I’d share them, along with my own thoughts about each, and see what y’all think.
1.  Comment:  My institution/enterprise/organization/firm really needs to get out from under the maintenance costs of our Oracle applications.  We’re continually burning resources with patching and upgrading.  We’re on a very expensive treadmill.  
My Response:  Software-as-a-Service was made for customers like you. With SaaS, Oracle works the patching and upgrading (on your schedule, by the way).  So your resources are now liberated from operational maintenance to work on projects and tasks related to your core mission, whatever that might be.
2.  Comment:  We have unique needs/customers/business processes and we can’t customize in the cloud.  
My Response:  Keep the things that differentiate you or make you unique on-premise.  Take the things that are necessary to do business, but aren’t part of your core mission, and move them into SaaS.  This allows you to stay unique at your core, but gives the operational headaches of the necessary but mundane things to someone else.  This approach is known as a “Hybrid” or “Co-Existence” deployment.  
Optional “Add-On” Response:  Best-practice business processes are baked into the cloud applications.  This is one of the unsung value-adds in SaaS.  It may be worthwhile to take a look at those best-practice business processes and consider whether they might work for you.
3.  Comment:  I’m worried about security in putting sensitive data out on the cloud.  
My Answer comes in two parts:  A) Is data security part of your core business?  It is part of Oracle’s core business.  As a result, they hire platoons of the best “A-Team” security experts to fend off thousands of attacks every day.  So perhaps your data might actually be better protected in the cloud than it is today?  B) Did you know that Oracle does not commingle customer data?  The data of every SaaS customer is physically and virtually separated from every other SaaS customer.  So, in terms of data separation, you won’t sacrifice anything by moving to the cloud.
4.  Observation:  The position of UX in the applications market has changed dramatically over the past year.  
Rather than being an optional value-add, UX is now a basic requirement for a seat at the table.  Ugly, complicated applications just don’t sell anymore.  Basic, packaged applications must meet the standards of elegant, consistent user interfaces and simple paths to results  just to enter the market.  And end users now expect custom, home-grown applications to meet those same standards.  If you can’t punch your UX ticket, you just can’t play…period.
5.  Question:  What’s the difference between Workday and Oracle Cloud Application Services?  
NOTE:  I knew this one would get your attention.  We're playing with fire now ;)
My Response:  I have a great deal of respect for what Workday is doing.  They’re designing and building elegant, clean, simple applications that have turned the entire enterprise applications market on its head.  I’m a fan.  That being said, I think Oracle has a differentiator with a deeper and richer set of features…you’ll find standard features that only exist in Oracle’s cloud applications.  That’s important because it allows Oracle to address a wider and more complex range of use cases and business processes.  There are other factors for and against both Workday and Oracle.  But, in my mind, that’s the big difference.
So those are the big treasures from my recent travels.  Thoughts? Feedback?  Comments please.

Contributions by Angela Golla, Infogram Deputy-Editor

Oracle Infogram - Mon, 2014-12-08 13:34
Contributions by Angela Golla, Infogram Deputy-Editor

My Oracle Support Essentials Series
The My Oracle Support Essentials Series brings interactive expertise straight to your desktop. The goal of this program is to communicate with our customers and partners, provide tips and tricks on how to effectively work with Oracle Support, and take advantage of the PROACTIVE tools that are provided with your Oracle Support contract.

Learn more at Doc ID:553747.1.

Six Months an ACE…”Boy, that escalated quickly”

Rittman Mead Consulting - Mon, 2014-12-08 12:34

Boy, that escalated quickly. I mean, my path to becoming an Oracle ACE since joining Rittman Mead!

When I began with Rittman Mead back in March 2012, I wasn’t planning on joining the ACE program. In fact, I really wasn’t sure what an Oracle ACE even was. If you’re not familiar, the Oracle ACE program “highlights excellence within the global Oracle community by recognizing individuals who have demonstrated both technical proficiency and strong credentials as community enthusiasts and advocates”. That’s quite a mouthful! Now let’s find out how I got to this point in my career.

I wrote my first blog post ever shortly after joining Rittman Mead, sharing an innovative way to integrate Oracle Data Integrator and GoldenGate for data warehousing. This led to submitting my first abstracts to several Oracle User Group conferences. My goal was to challenge myself to grow as a public speaker, something I had never been very good at in the past, and to share my data integration know-how, of course. To my amazement, the first abstract I submitted, for UKOUG 2012, was accepted. Then the other conferences continued to accept my abstracts! Over the first couple of years I kept blogging and speaking, wrote an article for RMOUG SQL>UPDATE Magazine, and joined in on a couple of ODTUG ODI expert webcasts and OTN ArchBeat podcasts. Before I knew it, I had taken advantage of many knowledge-sharing opportunities and built a decent list for my Oracle ACE nomination. I've found that networking and sharing your experiences are both wonderful ways to get noticed by other Oracle experts.

BI Forum 2014

So now, what is an Oracle ACE to me? I think it is someone knowledgeable about an Oracle product (or products) who enjoys sharing their knowledge and loves contributing to the greater Oracle community. It's that simple. One thing about the Rittman Mead organization: sharing what we know is part of the ethos of the company. If you're an avid reader of this blog, you're well aware that Mark Rittman and others love to share their technical knowledge. We also have an internal system that allows us to share our know-how amongst each other. That's one of the great benefits of working at Rittman Mead. If I don't know the answer to a question, I can easily reach out to the entire company and get a response from experts like Mark, James Coyle, Andy Rocha or others within minutes. I love that about this organization!

Well, around the March or April timeframe, it was mentioned in a company meeting that anyone with the goal of joining the ACE program should reach out to one of our current ACEs or ACE Directors and work on a plan to get there (again, a testament to how the RM organization works and thinks). I talked with Stewart Bryson, Oracle ACE Director. After reviewing my contribution to the Oracle community, he agreed to submit the nomination for me. Again, to my amazement, and very much my delight, my nomination was accepted and I became an Oracle ACE! What began as several small goals of improving my writing and speaking skills, sharing technical content, and having an article published led to the overall end goal (though not known to me at the time) of an Oracle ACE program invitation. I worked hard to earn the nomination, but I also know I was very fortunate to have been surrounded by, and mentored by, some great individuals (and I still am!).


Since joining the ACE program, I’ve gained some additional Twitter followers, a great new ribbon on my conference badges, and some sweet ACE gear – but really it feels like not too much else has changed. Well, I do have an additional group of experts available to help me when I’m in a bind, because that’s what ACEs tend to do. I can now send a question via a tweet or email to someone in the ACE program and I’ll typically get an expert answer. It’s a community of like-minded individuals that love their specific Oracle technology – and love to share knowledge and help others. If this sounds like you, it’s time to work towards that ACE nomination!

Here at Rittman Mead we have several ACEs: myself, Venkat Janakiraman, Edelweiss Kammermann, and our newest ACE, Robin Moffatt, as well as an ACE Director, Mark Rittman. And our ACEs don't just sit back and blog or speak at conferences (not that blogging and speaking isn't hard work – it is!). We also provide all of the Rittman Mead services offered as consultants: training, consulting services, managed services support – all of it! Which means that when you hire Rittman Mead, the 5 folks in the ACE program are included, along with the other 100+ BI experts throughout the world. If you need some help on your Business Intelligence, Data Integration, or Advanced Analytics project, drop us a line at info@rittmanmead.com and we'll be happy to have a chat.

Now that I’ve met the goal of Oracle ACE, what’s ahead for me? Well, I’m constantly working to keep up on the latest and greatest from the Oracle Data Integration team, while also consulting full-time. I plan to continue speaking and writing blog posts, but also work towards some additional published content in newsletters and magazines for the Oracle community. And I’m going to keep learning, because education is a never-ending journey. “They’ve done studies, you know. 60 percent of the time, it works every time.” — Brian Fantana 

By the way, if you think Rittman Mead is the type of company you’d like to work for, give us a shout at careers@rittmanmead.com.

Categories: BI & Warehousing