
BI & Warehousing

Preview of the Rittman Mead BI Forum in Atlanta

Rittman Mead Consulting - Fri, 2014-04-25 07:20

Mark has done a great job of previewing the upcoming content for both BI Forums, the one running locally for us in Atlanta, as well as the one in Brighton, UK. We have an exceptional Master Class this year with Lars George from Cloudera, including an introduction to the Cloudera Big Data stack with full details on building, loading and analyzing Hadoop clusters. The exact details of what’s covered, as well as the timetable for all speaker presentations, are listed here. Additionally, Mark posted on the two special presentations occurring at the two events: Maria Colgan on the In-Memory database option in Atlanta, and myself and Andrew Bond covering the latest iteration of Oracle’s Information Management Reference Architecture in Brighton. And finally, Mark also covered three Atlanta presentations on Advanced Visualizations and Mobility. Instead of rehashing all of that, I wanted to do a blog post diving a bit more into the Atlanta event, and some of the content not previously mentioned, especially the sessions from Oracle. We’ve always had incredible representation from Oracle at the BI Forum, and we very much appreciate that the different teams consider our event to be so important in the community.

I wanted to start off by discussing the venue a bit: the Renaissance Hotel in Midtown Atlanta. It’s a modern, upscale hotel that also has the amazing Rooftop 866 bar with incredible views of the city (those of you who have “socialized” with me over the years know I’ll be spending some time up there). I’m confident this will be our best venue to date.


Before diving into the sessions that Oracle will be presenting in Atlanta, it seems prudent to give those folks a “warm and fuzzy” feeling, show our appreciation, and make them feel safe and sound. So here’s an image that many of our readers will already recognize; for those who don’t, I’m sure you’ll know it by heart when the two events conclude:


Philippe Lions will be back again this year previewing the newest version of Sample App. For customers and partners who are like us at Rittman Mead, Sample App is a pivotal part of your OBIEE methodology. It allows us to demonstrate anything from simple OBIEE analyses and dashboards to some of the crazy mad-scientist stuff that Philippe’s team comes up with. If Oracle and Philippe didn’t design and build Sample App and keep it current, we would have to build it ourselves. From my understanding, this will be the first time Philippe has previewed this content outside of Oracle, so we are pleased and honored that he chose the BI Forum as the venue. It’s also worth noting that Philippe is a BI Forum veteran… he has never missed the Atlanta event since its inception four years ago.

We also have Jack Berkowitz, VP of Product Management for Business Analytics at Oracle, speaking on “Analytics Applications and the Cloud”. He’ll be discussing Oracle BI Applications (OBIA) in detail and the roadmap Oracle has for deploying those applications in the Cloud. I imagine that Jack will be giving the Wednesday night Keynote (as he did last year with Philippe), which is always a crowd-pleaser. Jack also spoke on the new Mobile Application Designer last year, so I imagine he will also be able to update us on that product even though his focus at Oracle has shifted. Also from Oracle we have Matt Bedin (another BI Forum veteran) talking about Oracle BI and Cloud, but with a focus on Oracle’s roadmap with regular Oracle Analytics in the Cloud, which equates to having a Cloud-optimized OBIEE running in Oracle’s Public Cloud. As this product is not yet generally available, attendees will get the scoop on where this product is going… and we might even get some hints on when to expect it.

We are excited to have Chris Lynskey, Senior Director, Product Management and Strategy at Oracle, making his first appearance at the BI Forum. He’ll be speaking on “Endeca Information Discovery for Self-Service and Big Data”, so we’ll see Endeca’s positioning for structured and unstructured reporting on an ad-hoc basis. We’ll have several presentations that delve into Endeca, but it will be good to hear from Chris on this topic, as he was with Endeca prior to the acquisition by Oracle, and has been deeply involved with the 3.1 release. Rounding out Oracle’s participation is BI Forum newcomer Susan Cheung, Vice President of Product Management for Oracle TimesTen, who will be speaking on “TimesTen In-Memory Database for Analytics – Best Practices and Use Cases”. With both Susan and Maria Colgan at the Forum, attendees will have a chance to see Oracle’s complete In-Memory strategy and roadmap in one sitting.

The final session I’d like to discuss is an entry from yours truly on “ExtremeBI: Agile, Real-Time BI with Oracle Business Intelligence, Oracle Data Integrator and Oracle GoldenGate”. I know… it’s an incredibly long title… but I had to get in all the buzzwords. I also rely heavily on the Information Management Reference Architecture that Andrew Bond and I are presenting at the UK BI Forum, so my Atlanta session will be based around this newest release. I love this content, and I think my excitement level shows every time I present it. I describe an Agile methodology that utilizes Oracle’s BI stack to the fullest, integrating OBIEE, ODI and perhaps the most beneficial element: Oracle GoldenGate. If your organization is investigating ways to deliver content rapidly while also making the end user central to the development process, then this session is for you.


There are still slots available at both venues, so feel free to contact me directly if you have questions about either event.

Categories: BI & Warehousing

Simple Data Manipulation and Reporting using Hive, Impala and CDH5

Rittman Mead Consulting - Thu, 2014-04-24 13:54

Although I’m pretty clued-up on OBIEE, ODI, Oracle Database and so on, I’m relatively new to the worlds of Hadoop and Big Data, so most evenings and weekends I play around with Hadoop clusters on my home VMWare ESXi rig and try to get some experience that might then come in useful on customer projects. A few months ago I went through an example of loading-up flight delays data into Cloudera CDH4 and then analysing it using Hive and Impala, but realistically it’s unlikely the data you’ll analyse in Hadoop will come in such convenient, tabular form. Something that’s more realistic is analysing log files from web servers or other high-volume, semi-structured sources, so I asked Robin to download the most recent set of Apache log files from our website, and I thought I’d have a go at analysing them using Pig and Hive, and maybe visualise the output using OBIEE (if possible, later on).

As I said, I’m not an expert in Hadoop and the Cloudera platform, so I thought it’d be interesting to describe the journey I went through, and also give some observations from myself on when to use Hive and when to use Pig; when products like Cloudera Impala could be useful, and also the general state-of-play with the Cloudera Hadoop platform. So the files I started off with were Apache weblog files, with 10 in total and sizes ranging from 350MB to around 2MB.


Looking inside one of the log files, they’re in the standard Apache log file format (or “combined log format”), where the visitor’s IP address is recorded, the date of access, some other information and the page (or resource) they requested:
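To give a feel for how those fields break out, here’s a small Python sketch that parses a combined-format entry into its component fields. The sample line and the exact regex are my own illustration (not our actual logs or the Hive regex used later), but the field structure is the standard one:

```python
import re

# Hypothetical sample line in Apache "combined" log format
sample = ('192.168.0.1 - - [10/Apr/2014:09:02:05 +0000] '
          '"GET /index.html HTTP/1.1" 200 4523 '
          '"http://www.example.com/" "Mozilla/5.0"')

# One capture group per field: host, identd, user, [time], "request",
# status, size, then optionally "referer" and "user agent"
combined = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d+|-) (\d+|-)'
    r'(?: "([^"]*)" "([^"]*)")?'
)

m = combined.match(sample)
host, identity, user, time, request, status, size, referer, agent = m.groups()
print(host, time, request, status, size)
```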


What I’m looking to do is count the number of visitors per day, see which was the most popular page, find what time of day we’re busiest, and so on. I’ve got a Cloudera Hadoop CDH5.0 6-node cluster running on a VMWare ESXi server at home, so the first thing to do is log into Hue, the web-based developer admin tool that comes with CDH5, and upload the files to a directory on HDFS (Hadoop Distributed File System), the Unix-like clustered file system that underpins most of Hadoop.


You can, of course, SFTP the files to one of the Hadoop nodes and use the “hadoop fs” command-line tool to copy the files into HDFS, but for relatively small files like these it’s easier to use the web interface to upload them from your workstation. Once I’ve done that, I can then view the log files in the HDFS directory, just as if they were sitting on a regular Unix filesystem.


At this point though, the files are still “unstructured” – just a single log entry per line – and I’ll therefore need to do something before I can count things like number of hits per day, what pages were requested and so on. At this beginner level, there are two main options you can use – Hive, a SQL interface over HDFS that lets you select from, and do set-based transformations with, files of data; or Pig, a more procedural language that lets you manipulate file contents as a series of step-by-step tasks. For someone like myself with a relational data warehousing background, Hive is probably easier to work with, but it comes with some quite significant limitations compared to a database like Oracle – we’ll see more on this later.

Whilst Hive tables are, at the simplest level, mapped onto comma or otherwise-delimited files, another neat feature in Hive is that you can use what’s called a “SerDe”, or “Serializer-Deserializer”, to map more complex file structures into regular table columns. In the Hive DDL script below, I use this SerDe feature to have a regular expression parse the log file into columns, with the data source being an entire directory of files, not just a single one:

CREATE EXTERNAL TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
LOCATION '/user/root/logs';

Things to note in the above DDL are:

  • EXTERNAL table means that the datafile used to populate the Hive table sits somewhere outside Hive’s usual /user/hive/warehouse directory, in this case in the /user/root/logs HDFS directory.
  • ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ tells Hive to use the Regular Expressions Serializer-Deserializer to interpret the source file contents, and 
  • WITH SERDEPROPERTIES … gives the SerDe the regular expression to use, in this case to decode the Apache log format.

Probably the easiest way to run the Hive DDL command to create the table is to use the Hive query editor in Hue, but there’s a couple of things you’ll need to do before this particular command will work:

1. You’ll need to get hold of the JAR file in the Hadoop install that provides this SerDe (hive-contrib-0.12.0-cdh5.0.0.jar) and then copy it to somewhere on your HDFS file system, for example /user/root. In my CDH5 installation, this file was at /opt/cloudera/parcels/CDH/lib/hive/lib/, but it’ll probably be at /usr/lib/hive/lib if you installed CDH5 using the traditional packages (rather than parcels) route. Also, if you’re using a version of CDH prior to 5, the filename will change accordingly. This JAR file then needs to be accessible to Hive, and whilst there are various more-permanent ways you can do this, the easiest is to point to the JAR file in an entry in the query editor File Resources section as shown below.

2. Whilst you’re there, un-check the “Enable Parameterization” checkbox, otherwise the query editor will interpret the SerDe output string as parameter references.


Once the command has completed, you can click over to the Hive Metastore table browser, and see the columns in the new table. 


Behind the scenes, Hive maps its table structure onto all the files in the /user/root/logs HDFS directory, and when I run a SELECT statement against it, for example to do a simple row count, MapReduce mappers, shufflers and sorters are spun-up to return the count of rows to me.


But in its current form, this table still isn’t all that useful – I’ve just got raw IP addresses for page requesters, and the request date is in a format that’s not easy to work with. So let’s do some further manipulation, creating another table that splits out the request date into year, month, day and time, using Hive’s CREATE TABLE AS SELECT command to transform and load in one command:

CREATE TABLE apachelog_date_split_parquet
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
AS
SELECT host,
       substr(time,9,4)  year,
       substr(time,5,3)  month,
       substr(time,2,2)  day,
       substr(time,14,2) hours,
       substr(time,17,2) mins,
       substr(time,20,2) secs
FROM   apachelog;

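Incidentally, Hive’s substr() is 1-based, and it’s easy to muddle those offsets; here’s a throwaway Python check against a made-up timestamp in the same format the regex captures (brackets included). Note that position 17 picks out the minutes and position 20 the seconds:

```python
# Hive's substr() is 1-based; Python slicing is 0-based, so
# substr(t, p, n) == t[p-1:p-1+n]. A hypothetical captured time value:
t = "[10/Apr/2014:09:02:05 +0000]"

def substr(s, pos, length):
    """Mimic Hive's 1-based substr()."""
    return s[pos - 1:pos - 1 + length]

print(substr(t, 9, 4),   # year
      substr(t, 5, 3),   # month
      substr(t, 2, 2),   # day
      substr(t, 14, 2),  # hours
      substr(t, 17, 2),  # mins
      substr(t, 20, 2))  # secs
```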
Note the ParquetHive SerDe I’m using in this table’s row format definition – Parquet is a compressed, column-store file format developed by Cloudera originally for Impala (more on that in a moment), and from CDH4.6 it’s also available for Hive and Pig. By using Parquet, we potentially gain speed and space savings compared to regular files, so let’s use that feature now and see where it takes us. After creating the new Hive table, I can then run a quick query to count web server hits per month:


So – getting more useful, but it’d be even nicer if I could map the IP addresses to actual countries, so I can see how many hits came from the UK, how many from the US, and so on. To do this, I’d need to use a lookup service or table to map my IP addresses to countries or cities, and one commonly-used such service is the free GeoIP database provided by MaxMind, where you turn your IP address into an integer via a formula, and then do a BETWEEN to locate that IP within ranges defined within the database. How best to do this though?
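The dotted-quad-to-integer conversion is simple enough to sketch in a few lines of Python – this is my own illustration of the MaxMind-style calculation, not their code:

```python
def ip_to_int(ip):
    """Convert a dotted-quad IPv4 address to an integer:
    16777216*o1 + 65536*o2 + 256*o3 + o4."""
    o1, o2, o3, o4 = (int(o) for o in ip.split("."))
    return o1 * 16777216 + o2 * 65536 + o3 * 256 + o4

print(ip_to_int("1.2.3.4"))  # 16909060
```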

There’s several ways that you can enhance and manipulate data in your Hadoop system like this. One way, and something I plan to look at on this blog later in this series, is to use Pig, potentially with a call-out to Perl or Python to do the lookup on a row-by-row (or tuple-by-tuple) basis – this blog article on the Cloudera site goes through a nice example. Another way, and again something I plan to cover in this series on the blog, is to use something called “Hadoop Streaming” – the ability within MapReduce to “subcontract” the map and reduce parts of the operation to external programs or scripts, in this case a Python script that again queries the MaxMind database to do the IP-to-country lookup.
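To give a flavour of the streaming approach ahead of those later posts: a streaming mapper is just a script that reads records from stdin and writes tab-separated key/value pairs to stdout. Here’s a minimal Python sketch with the GeoIP lookup stubbed out by a hypothetical in-memory dict (a real mapper would query the MaxMind database instead):

```python
import io

def lookup_country(ip):
    # Stub: stand-in for a real MaxMind GeoIP lookup
    fake_db = {"8.8.8.8": "US", "81.2.69.142": "GB"}
    return fake_db.get(ip, "unknown")

def map_line(line):
    """Emit 'country<TAB>1' for the IP in the first field of a log line."""
    ip = line.split(" ", 1)[0]
    return "%s\t1" % lookup_country(ip)

def run_mapper(stdin, stdout):
    # In a real job this would be run_mapper(sys.stdin, sys.stdout);
    # Hadoop Streaming pipes each input split through the script
    for line in stdin:
        stdout.write(map_line(line.rstrip("\n")) + "\n")

# Simulate two hypothetical log lines
fake_input = io.StringIO(
    '8.8.8.8 - - [10/Apr/2014:09:02:05 +0000] "GET / HTTP/1.1" 200 100\n'
    '1.2.3.4 - - [10/Apr/2014:09:03:11 +0000] "GET /a HTTP/1.1" 200 50\n')
out = io.StringIO()
run_mapper(fake_input, out)
print(out.getvalue())
```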

But surely it’d be easiest to just calculate the IP address integer and just join my existing Hive table to this GeoIP lookup table, and do it that way? Let’s start by trying to do this, first by modifying my final table design to include the IP address integer calculation defined on the MaxMind website: 

CREATE TABLE apachelog_date_ip_split_parquet
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
AS
SELECT host,
       (cast(split(host,'\\.')[0] as bigint) * 16777216)
     + (cast(split(host,'\\.')[1] as bigint) * 65536)
     + (cast(split(host,'\\.')[2] as bigint) * 256)
     + (cast(split(host,'\\.')[3] as bigint)) ip_add_int,
       substr(time,9,4)  year,
       substr(time,5,3)  month,
       substr(time,2,2)  day,
       substr(time,14,2) hours,
       substr(time,17,2) mins,
       substr(time,20,2) secs
FROM   apachelog;

Now I can query this from the Hive query editor, and I can see the IP address integer calculations that I can then use to match to the GeoIP IP address ranges.


I then upload the IP Address to Countries CSV file from the MaxMind site to HDFS, and define a Hive table over it like this:

create external table geo_lookup (
  ip_start      string,
  ip_end        string,
  ip_int_start  int,
  ip_int_end    int,
  country_code  string,
  country_name  string)
row format delimited
fields terminated by ','
LOCATION '/user/root/lookups/geo_ip';

Then I try some variations on the BETWEEN clause, in a SELECT with a join:

select a.host, l.country_name
from apachelog_date_ip_split_parquet a join geo_lookup l 
on (a.ip_add_int > l.ip_int_start) and (a.ip_add_int < l.ip_int_end)
group by a.host, l.country_name;

select a.host, l.country_name
from apachelog_date_ip_split_parquet a join geo_lookup l 
on a.ip_add_int between l.ip_int_start and l.ip_int_end;

.. which all fail, because Hive only supports equi-joins. One option is to use a Hive UDF (user-defined function) such as this one here to implement a GeoIP lookup, but something that’s probably a bit more promising is to switch over to Impala, which can handle non-equality joins through its cross join support (Hive can in fact also do cross joins, but they’re not very efficient). Impala also has the benefit of being much faster for BI-type queries than Hive, and it’s also designed to work with Parquet, so let’s switch over to the Impala query editor, run the “invalidate metadata” command to re-sync its view of Hive’s table metastore, and then try the join in there:
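Per row, what that range join is doing is effectively a search into sorted IP ranges – the same lookup a GeoIP UDF or streaming script would perform. A conceptual Python sketch using binary search, with a tiny, hypothetical slice of ranges standing in for the MaxMind table:

```python
import bisect

# (range_start, range_end, country) tuples, sorted by range_start --
# hypothetical sample rows standing in for the GeoIP lookup table
geo_ranges = [
    (16777216, 16777471, "AU"),
    (16777472, 16778239, "CN"),
    (16909056, 16909311, "US"),
]
starts = [r[0] for r in geo_ranges]

def country_for(ip_int):
    """Find the range with start <= ip_int <= end, if any."""
    i = bisect.bisect_right(starts, ip_int) - 1
    if i >= 0 and geo_ranges[i][1] >= ip_int:
        return geo_ranges[i][2]
    return None

print(country_for(16909060))  # an IP integer inside the third range
```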


Not bad. Of course this is all fairly simple stuff, and we’re still largely working with relational-style set-based transformations. In the next two posts in the series though, I want to get a bit deeper into Hadoop-style transformations – first by using a feature called “Hadoop Streaming” to process data on its way into Hadoop, in parallel, by calling out to Python and Perl scripts; and then taking a look at Pig, the more “procedural” alternative to Hive – with the objective being to enhance this current dataset to bring in details of the pages being requested, filter out the non-page requests, and do some work with author, tag and clickstream analysis.

Categories: BI & Warehousing

Previewing Three Oracle Data Visualization Sessions at the Atlanta US BI Forum 2014

Rittman Mead Consulting - Tue, 2014-04-22 04:30

Many of the sessions at the UK and US Rittman Mead BI Forum 2014 events in May focus on the back-end of BI and data warehousing, with for example Chris Jenkins’ session on TimesTen giving us some tips and tricks from TimesTen product development, and Wayne Van Sluys’s session on Essbase looking at what’s involved in Essbase database optimisation (full agendas for the two events can be found here). But two areas within BI that have got a lot of attention over the past couple of years are (a) data visualisation and (b) mobile, so I’m particularly pleased that our Atlanta event has three of the most innovative practitioners in this area – Kevin McGinley from Accenture (left in pictures below), Christian Screen from Art of BI (centre), and Patrick Rafferty from Branchbird (right) – talking about what they’ve been doing in these areas.


If you were at the BI Forum a couple of years ago you’ll of course know Kevin McGinley, who won the “best speaker” award the previous year and has most recently gone on to organise the BI track at ODTUG KScope and write for OTN and his own blog. Kevin also hosts, along with our own Stewart Bryson, a video podcast series on iTunes called “Real-Time BI with Kevin & Stewart”, and I’m excited that he’s joining us again at this year’s BI Forum in Atlanta to talk about adding 3rd party visualisations to OBIEE. Over to Kevin…

“I can’t tell you how many times I’ve told someone that I can’t precisely meet a certain charting requirement because of a lack of configurability or variety in the OBIEE charting engine.  Combine that with an increase in the variety and types of data people are interested in visualizing within OBIEE and you have a clear need.  Fortunately, OBIEE is a web-based tool and can leverage other visualization engines, if you just know how to work with the engine and embed it into OBIEE.

In my session, I’ll walk through a variety of reasons you might want to do this and the various approaches for doing it.  Then, I’ll take two specific engines and show you the process for building a visualization with them right in an OBIEE Analysis.  In both examples, you’ll come away with a capability you’ve never been able to do directly in OBIEE before.”


Another speaker, blogger, writer and developer very-well known to the OBIEE community is Art of BI Software’s Christian Screen, co-author of the Packt book “Oracle Business Intelligence Enterprise Edition 11g: A Hands-On Tutorial” and developer of the OBIEE collaboration add-in, BITeamwork. Last year Christian spoke to us about developing plug-ins for OBIEE, but this year he’s returned to a topic he’s very passionate about – mobile BI, and in particular, Oracle’s Mobile App Designer. According to Christian:

“Last year Oracle marked its mobile business intelligence territory by updating its Oracle BI iOS application with a new look and feel. Unbeknownst to many, they also released the cutting-edge Oracle BI Mobile Application Designer (MAD). These are both components available as part of the Oracle BI Foundation Suite. But it is where they are taking the mobile analytics platform that is most interesting at the moment as we look at the mobile analytics consumption chain. MAD is still in its 1.x release and there is a lot of promise with this tool to satisfy the analytical cravings growing in the bellies of many enterprise organizations. There is also quite a bit of discussion around building new content just for mobile consumption compared to viewing existing content through the mobile applications native to major mobile devices.

The “Oracle BI Got MAD and You Should be Happy” session will discuss these topics and I’ll be sharing the stage with Jayant Sharma from Oracle BI Product Development where we’ll also be showing some cutting edge material and demos for Oracle BI MAD.  Because MAD provides a lot of flexibility for development customizations, compared to the Oracle BI iOS/Android applications, our session will explore business use cases around pre-built MAD applications, HTML5, mobile security, and development of plug-ins using the MAD SDK.  One of the drivers for this session is to show how many of the Oracle Analytics components integrate with MAD and how an Oracle BI developer can quickly leverage the capabilities of MAD to show the tool’s value within their current Oracle BI implementation.

We will also discuss the common concern of mobile security by touching on the BitzerMobile acquisition and using the central mobile configuration settings for Oracle BI Mobile. The crowd will hopefully walk away with a better understanding of Oracle BI mobility with MAD and a desire to go build something.”


As well as OBIEE and Oracle Mobile App Designer, Oracle also have another product, Oracle Endeca Information Discovery, that combines a data aggregation and search engine with dashboard visuals and data discovery. One of the most innovative partner companies in the Endeca space are Branchbird, and we’re very pleased to have Branchbird’s Patrick Rafferty join us to talk about “More Than Mashups – Advanced Visualizations and Data Discovery”. Over to Patrick …

“In this session, we’ll explore how Oracle Endeca customers are moving beyond simple dashboards and charts and creating exciting visualizations on top of their data using Oracle Endeca Studio. We’ll discuss how the latest trends in data visualization, especially geospatial and temporal visualization, can be brought into the enterprise and how they drive competitive advantage.

This session will show in-production real-life examples of how extending Oracle Endeca Studio’s visualization capabilities to integrate technology like D3 can create compelling discovery-driven visualizations that increase revenue, cut cost and enhance the ability to answer unknown questions through data discovery.”


The full agenda for the Atlanta and Brighton BI Forum agendas can be found on this blog post, and full details of both events, including registration links, links to book accommodation and details of the Lars George Cloudera Hadoop masterclass, can be found on the Rittman Mead BI Forum 2014 home page.

Categories: BI & Warehousing

Preview of Maria Colgan, and Andrew Bond/Stewart Bryson Sessions at RM BI Forum 2014

Rittman Mead Consulting - Wed, 2014-04-16 02:11

We’ve got a great selection of presentations at the two upcoming Rittman Mead BI Forum 2014 events in Brighton and Atlanta, including sessions on Endeca, TimesTen, OBIEE (of course), ODI, GoldenGate, Essbase and Big Data (full timetable for both events here). Two of the sessions I’m particularly looking forward to though are ones by Maria Colgan, product manager for the new In-Memory Option for Oracle Database, and another by Andrew Bond and Stewart Bryson, on an update to Oracle’s reference architecture for Data Warehousing and Information Management.

The In-Memory Option for Oracle Database was of course the big news item from last year’s Oracle Openworld, promising to bring in-memory analytics and column-storage to the Oracle Database. Maria is of course well known to the Oracle BI and Data Warehousing community through her work with the Oracle Database Cost-Based Optimizer, so we’re particularly glad to have her at the Atlanta BI Forum 2014 to talk about what’s coming with this new feature. I asked Maria to jot down a few words for the blog on what she’ll be covering, so over to Maria:

“At Oracle Open World last year, Oracle announced the upcoming availability of the Oracle Database In-Memory option, a solution for accelerating database-driven business decision-making to real-time. Unlike specialized In-Memory Database approaches that are restricted to particular workloads or applications, Oracle Database 12c leverages a new in-memory column store format to speed up analytic workloads. Given this announcement, and the performance improvements promised by this new functionality, is it still necessary to create a separate access and performance layer in your data warehouse environment, or to run your Oracle data warehouse on an Exadata environment?

This session explains in detail how Oracle Database In-Memory works and will demonstrate just what performance improvements you can expect. We will also discuss how it integrates into the existing Oracle Data Warehousing Architecture and with an Exadata environment.”

The other session I’m particularly looking forward to is one being delivered jointly by Andrew Bond, who heads up Enterprise Architecture at Oracle and was responsible, along with Doug Cackett, for the various data warehousing, information management and big data reference architectures we’ve covered on the blog over the past few years, including the first update to include “big data” a year or so ago.


Back towards the start of this year, Stewart, myself and Jon Mead met up with Andrew and his team to work together on an update to this reference architecture, and Stewart carried on with the collaboration afterwards, bringing in some of our ideas around agile development, big data and data warehouse design into the final architecture. Stewart and Andrew will be previewing the updated reference architecture at the Brighton BI Forum event, and in the meantime, here’s a preview from Andrew:

“I’m very excited to be attending the event and unveiling Oracle’s latest iteration of the Information Management reference architecture. In this version we have focused on a pragmatic approach to “Analytics 3.0″ and in particular looked at bringing an agile methodology to break the IT / business barrier. We’ve also examined exploitation of in-memory technologies and the Hadoop ecosystem and guiding the plethora of new technology choices.

We’ve worked very closely with a number of key customers and partners on this version – most notably Rittman Mead and I’m delighted that Stewart and I will be able to co-present the architecture and receive immediate feedback from delegates.”

Full details of the event, running in Brighton on May 7-9th 2014 and Atlanta, May 15th-17th 2014, can be found on the Rittman Mead BI Forum 2014 homepage, and the agendas for the two days are on this blog post from earlier in the week.

Categories: BI & Warehousing

Final Timetable and Agenda for the Brighton and Atlanta BI Forums, May 2014

Rittman Mead Consulting - Mon, 2014-04-14 07:00

It’s just a few weeks now until the Rittman Mead BI Forum 2014 events in Brighton and Atlanta, and there’s still a few spaces left at both events if you’d still like to come – check out the main BI Forum 2014 event page, and the booking links for Brighton (May 7th – 9th 2014) and Atlanta (May 14th – 16th 2014).

We’re also able now to publish the timetable and running order for the two events – session order can still change between now and the events, but this is what we’re planning to run, first of all in Brighton, with the photos below from last year’s BI Forum.


Brighton BI Forum 2014, Hotel Seattle, Brighton

Wednesday 7th May 2014 – Optional 1-Day Masterclass, and Opening Drinks, Keynote and Dinner

  • 9.00 – 10.00 – Registration
  • 10.00 – 11.00 : Lars George Hadoop Masterclass Part 1
  • 11.00 – 11.15 : Morning Coffee
  • 11.15 – 12.15 : Lars George Hadoop Masterclass Part 2
  • 12.15 – 13.15 : Lunch
  • 13.15 – 14.15 : Lars George Hadoop Masterclass Part 3
  • 14.15 – 14.30 : Afternoon Tea/Coffee/Beers
  • 14.30 – 15.30 : Lars George Hadoop Masterclass Part 4
  • 17.00 – 19.00 : Registration and Drinks Reception
  • 19.00 – Late :  Oracle Keynote and Dinner at Hotel
Thursday 8th May 2014
  • 08.45 – 09.00 : Opening Remarks Mark Rittman, Rittman Mead
  • 09.00 – 10.00 : Emiel van Bockel : Extreme Intelligence, made possible by …
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 11.30 : Chris Jenkins : TimesTen for Exalytics: Best Practices and Optimisation
  • 11.30 – 12.30 : Robin Moffatt : No Silver Bullets : OBIEE Performance in the Real World
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Adam Bloom : Building a BI Cloud
  • 14.30 – 14.45 : TED: Paul Oprea : “Extreme Data Warehousing”
  • 14.45 – 15.00 : TED : Michael Rainey :  “A Picture Can Replace A Thousand Words”
  • 15.00 – 15.30 : Afternoon Tea/Coffee/Beers
  • 15.30 – 15.45 : Reiner Zimmerman : About the Oracle DW Global Leaders Program
  • 15.45 – 16.45 : Andrew Bond & Stewart Bryson : Enterprise Big Data Architecture
  • 19.00 – Late: Depart for Gala Dinner, St Georges Church, Brighton

Friday 9th May 2014

  • 9.00 – 10.00 : Truls Bergensen – Drawing in a New Rock on the Map – How Will Endeca Fit into Your Oracle BI Topography
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 11.30 : Nicholas Hurt & Michael Rainey : Real-time Data Warehouse Upgrade – Success Stories
  • 11.30 – 12.30 : Matt Bedin & Adam Bloom : Analytics and the Cloud
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Gianni Ceresa : Essbase within/without OBIEE – not just an aggregation engine
  • 14.30 – 14.45 : TED : Marco Klaassens : “Speed up RPD Development”
  • 14.45 – 15:00 : TED : Christian Berg : “Neo’s Voyage in OBIEE:”
  • 15.00 – 15.30 : Afternoon Tea/Coffee/Beers
  • 15.30 – 16.30 : Alistair Burgess : “Tuning TimesTen with Aggregate Persistence”
  • 16.30 – 16.45 : Closing Remarks (Mark Rittman)
Then directly after Brighton we’ve got the US Atlanta event, running the week after, Wednesday – Friday, with last year’s photos below:

Atlanta BI Forum 2014, Renaissance Mid-Town Hotel, Atlanta

Wednesday 14th May 2014 – Optional 1-Day Masterclass, and Opening Drinks, Keynote and Dinner

  • 9.00-10.00 – Registration
  • 10.00 – 11.00 : Lars George : Hadoop Masterclass Part 1
  • 11.00 – 11.15 : Morning Coffee
  • 11.15 – 12.15 : Lars George : Hadoop Masterclass Part 2
  • 12.15 – 13.15 : Lunch
  • 13.15 – 14.15 : Lars George : Hadoop Masterclass Part 3
  • 14.15 – 14.30 : Afternoon Tea/Coffee/Beers
  • 14.30 – 15.30 : Lars George : Hadoop Masterclass Part 4
  • 16.00 – 18.00 : Registration and Drinks Reception
  • 18.00 – 19.00 : Oracle Keynote & Dinner

Thursday 15th May 2014

  • 08.45 – 09.00 : Opening Remarks (Mark Rittman, Rittman Mead)
  • 09.00 – 10.00 : Kevin McGinley : Adding 3rd Party Visualization to OBIEE
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 11.30 : Richard Tomlinson : Endeca Information Discovery for Self-Service and Big Data
  • 11.30 – 12.30 : Omri Traub : Endeca and Big Data: A Vision for the Future
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Dan Vlamis : Capitalizing on Analytics in the Oracle Database in BI Applications
  • 14.30 – 15.30 : Susan Cheung : TimesTen In-Memory Database for Analytics – Best Practices and Use Cases
  • 15.30 – 15.45 : Afternoon Tea/Coffee/Beers
  • 15.45 – 16.45 : Christian Screen : Oracle BI Got MAD and You Should Be Happy
  • 18.00 – 19.00 : Special Guest Keynote : Maria Colgan : An introduction to the new Oracle Database In-Memory option
  • 19.00 – leave for dinner

Friday 16th May 2014

  • 09.00 – 10.00 : Patrick Rafferty : More Than Mashups – Advanced Visualizations and Data Discovery
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 12.30 : Matt Bedin : Analytic Applications and the Cloud
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Philippe Lions : What’s new on 2014 HY1 OBIEE SampleApp
  • 14.30 – 15.30 : Stewart Bryson : ExtremeBI: Agile, Real-Time BI with Oracle Business Intelligence, Oracle Data Integrator and Oracle GoldenGate
  • 15.30 – 16.00 : Afternoon Tea/Coffee/Beers
  • 16.00 – 17.00 : Wayne Van Sluys : Everything You Know about Oracle Essbase Tuning is Wrong or Outdated!
  • 17.00 – 17.15 : Closing Remarks (Mark Rittman)
Full details of the two events, including more on the Hadoop Masterclass with Cloudera’s Lars George, can be found on the BI Forum 2014 home page.

Categories: BI & Warehousing

The Riley Family, Part III

Chet Justice - Thu, 2014-04-10 20:44

That's Mike and Lisa, hanging out at the hospital. Mike's in his awesome cookie monster pajamas and robe...must be nice, right? Oh wait, it's not. You probably remember why he's there, Stage 3 cancer. The joys.

In October, we helped to send the entire family to Game 5 of the World Series (Cards lost, thanks Red Sox for ruining their night).

In November I started a GoFundMe campaign, to date, with your help, we've raised $10,999. We've paid over 9 thousand dollars to the Riley family (another check to be cut shortly).

In December, Mike had surgery. Details can be found here. Shorter: things went fairly well, then they didn't. Mike spent 22 days in the hospital and lost 40 lbs. He missed Christmas and New Years at home with his family. But, as I've learned over the last 6 months, the Riley family really knows how to take things in stride.

About 6 weeks ago Mike started round 2 of chemo, he's halfway through that one now. He complains (daily, ugh) about numbness, dizziness, feeling cold (he lives in St. Louis, are you sure it's not the weather?), and priapism (that's a lie...I hope).

Mike being Mike though, barely a complaint (I'll let you figure out where I'm telling a lie).

Four weeks ago, a chilly (65) Saturday night, Mike and Lisa call. "Hey, I've got some news for you."

"Sweet," I think to myself. Gotta be good news.

"Lisa was just diagnosed with breast cancer."


ARE YOU KIDDING ME? (Given Mike's gallows humor, it's possible).

"Nope. Stage 1. Surgery on April 2nd."


(Surgery was last week. It went well. No news on that front yet.)

Talking to the two of them that evening you would have no idea they BOTH have cancer. Actually, one of my favorite stories of the year...the hashtag for the Riley Family campaign was #fmcuta. Fuck Mike's Cancer (up the ass). I thought that was hilarious, but I didn't think the Rileys would appreciate it. They did. They loved it. I still remember Lisa's laugh when I first suggested it. They'd just dropped the latest bad news and Lisa is like, "Oh, wait until you hear this. I have a hashtag for you."

"What is it?" (I'm thinking something very...conservative. Not sure why, I should know better by now).


I think about that for about .06 seconds. Holy shit! Did you just say tna? Like "tits and ass?"

(sounds of Lisa howling in the background).

Awesome. See what I mean? Handling it in stride.

"We're going to need a bigger boat." All I can think about now is, "what can we do now?"

First, I raised the campaign goal to 50k. This might be ambitious, that's OK, cancer treatments are expensive enough for one person, and 10K (the original amount) was on the low side. So...50K.

Second, Scott Spendolini created a very cool APEX app, ostensibly called the Riley Support Group (website? gah). It's a calendar/scheduling app that allows friends and family to coordinate things like meals, young human (children) care and other things that most of us probably take for granted. Pretty cool stuff. For instance, Tim Gorman provides pizza on Monday nights (Dinner from pizza hut...1 - large hand-tossed cheese lovers, 1 - large thin-crispy pepperoni, 1 - 4xpepperoni rolls, 1 - cheesesticks).

Third. There is no third.

So many of you have donated your hard earned cash to the Riley family, they are incredibly humbled by, and grateful for, everyone's generosity. They aren't out of the woods yet. Donate more. Please. If you can't donate, see if there's something you can help out with (hit me up for details, Tim lives in CO, he's not really close). If you can't do either of those things, send them your prayers or your good thoughts. Any and all help will be greatly appreciated.
Categories: BI & Warehousing

Details on Two New Upcoming Courses in Brighton – Including ODI12c

Rittman Mead Consulting - Fri, 2014-04-04 05:09

Just a quick note to mention two courses being run from our Brighton office soon that might interest readers of the blog.

Our OBIEE 11g Bootcamp course, newly updated for OBIEE, is a start-to-finish introduction to Oracle Business Intelligence Enterprise Edition 11g aimed at developers and admins new to the platform. Starting with an overview of the platform, then taking you through RPD development, creating reports and dashboards and through to security, configuring for high-availability and systems management, this is our most popular course and includes a free copy of my book, “Oracle Business Intelligence Developers Guide”.

This 5-day course covers the following topics:

  • OBIEE 11g Overview & Product Architecture
  • Installation, Configuration & Upgrades
  • Creating Repositories from Relational Sources
  • Advanced Repository Modelling from Relational Sources
  • Systems Management using Oracle Enterprise Manager
  • Creating Analyses and Dashboards
  • Actionable Intelligence
  • KPIs, Scorecards & Strategy Management
  • Creating Published Reports (BI Publisher)
  • OBIEE 11g Security
  • High-Availability, Scaleout & Clustering

Details on the full course agenda are available here, and the next run of the course is in Brighton on April 28th – May 2nd, 2014 – register using that link.

We’re also very pleased to announce the availability of our new Data Integrator 12c 3-day course, aimed at developers new to ODI as well as those upgrading their knowledge from the ODI11g release. Written entirely by our ODI development team and also used by us to train our own staff, this is a great opportunity to learn ODI12c based on Rittman Mead’s own, practical field experience.

The topics we’ll be covering in this course are:

  • Getting Started with ODI 12c
  • ODI 12c Topology
  • ODI 12c Projects
  • Models & Datastores
  • Data Quality in a Model
  • Introduction to ODI Mappings
  • ODI Procedures, Variables, Sequences, & User Functions
  • Advanced ODI Mappings
  • ODI Packages
  • Scenarios in ODI
  • The ODI Debugger
  • ODI Load Plans

We’re running this course for the first time, in Brighton on May 12th – 14th 2014, and you can register using this link.

Full details of all our training courses, including public scheduled courses and private, standard or customised courses, can be found on our Training web page or for more information, contact the Rittman Mead Training Team.

Categories: BI & Warehousing

BI Forum 2014 preview – No Silver Bullets : OBIEE Performance in the Real World

Rittman Mead Consulting - Thu, 2014-04-03 03:35

I’m honoured to have been accepted to speak at this year’s Rittman Mead BI Forum, the sixth year of this expert-level conference that draws some of the best Oracle BI/DW minds together from around the world. It’s running May 8th-9th in Brighton, and May 15-16th in Atlanta, with an optional masterclass from Cloudera’s Lars George the day before the conference itself at each venue.

My first visit to the BI Forum was in 2009 where I presented Performance Testing OBIEE, and now five years later (five years!) I’m back, like a stuck record, talking about the same thing – performance. That I’m still talking about it means that there’s still an audience for it, and this time I’m looking beyond just testing performance, but how it’s approached by people working with OBIEE. For an industry built around 1s and 0s, computers doing just what you tell them to and nothing else, there is a surprising amount of suspect folklore and “best practices” used when it comes to “fixing” performance problems.

OBIEE performance good luck charm

Getting good performance with OBIEE is just a matter of being methodical, and understanding where to look for information is half the battle. By understanding where the time goes, improvements can be targeted where they will be most effective. Heavily influenced by Cary Millsap and his Method-R approach to performance, I will look at how to apply this practically to OBIEE. Most of the information needed to build up a full picture is readily available from OBIEE’s log files.

I’ll also dig a bit deeper into OBIEE, exploring how to determine how the system is behaving “under the covers”. The primary technique for this is OBIEE’s DMS metrics, which I have written about recently in relation to the new Rittman Mead open-source tool, obi-metrics-agent, and which I am using day-to-day to rapidly examine and resolve performance problems that clients see.

I’m excited to be presenting again on this topic, and I hope to see you in Brighton next month. The conference always sells out, so don’t delay – register today!

Categories: BI & Warehousing

Data Integration Tips: ODI 12c – Varchar2 (CHAR or BYTE)

Rittman Mead Consulting - Tue, 2014-04-01 11:28

Continuing with our Data Integration Tips series, here’s one that applies to both Oracle Data Integrator 11g and 12c. This “tip” was actually discovered as a part of a larger issue involving GoldenGate, Oracle Datapump, and ODI. Maybe a future post will dive deeper into those challenges, but here I’m going to focus just on the ODI bit.

The Scenario

During our setup of GoldenGate and ODI, it was discovered that the source and target databases were set to use different character sets.

Source:  WE8ISO8859P1

Target (DW):  AL32UTF8

During my research, I found that the source is a single-byte character set and the target is multi-byte. This means that a special character, such as “Ǣ“, for example, may take up more than one byte when stored in a column with a VARCHAR2 datatype (as described in the Oracle documentation – “Choosing a Character Set“). When attempting to load the text “Ǣ” into a column of datatype VARCHAR2(1), the load would fail with an error similar to the one below.

ORA-12899: value too large for column "COL_NAME" (actual: 2, maximum: 1)

The difference in character sets is clearly the issue, but how do we handle this when performing a load between the two databases? Reading through the Oracle doc referenced above, we can see that it all depends on the target database column length semantics. Specifically, for the attributes of VARCHAR2 datatype, we need to use character semantics in the target, “VARCHAR2(1 CHAR)”, rather than byte semantics, “VARCHAR2(1 BYTE)”. The former can handle multi-byte character sets simply by storing the characters as they are inserted. The latter will store each byte necessary for the character value individually. Looking back at the example, the character “Ǣ” inserted into a column using byte semantics (which is the default, in this case, when BYTE or CHAR is not specified) would require 2 bytes, thus causing the error.
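The byte-versus-character distinction is easy to demonstrate outside the database. Here is a minimal sketch in Python, with UTF-8 standing in for AL32UTF8 (of which it is effectively the encoding):

```python
# "Ǣ" (U+01E2) is one character, but needs two bytes when encoded as UTF-8.
text = "\u01E2"  # Ǣ, LATIN CAPITAL LETTER AE WITH MACRON

char_length = len(text)                  # what VARCHAR2(n CHAR) counts
byte_length = len(text.encode("utf-8"))  # what VARCHAR2(n BYTE) counts

print(char_length)  # 1 -> fits in VARCHAR2(1 CHAR)
print(byte_length)  # 2 -> overflows VARCHAR2(1 BYTE), hence ORA-12899
```

The single character exceeds the one-byte limit, which is exactly the ORA-12899 failure shown above.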

Here’s the Tip…

The overall solution is to modify any VARCHAR2 columns that may have special characters inserted to use character semantics in the target database. Quite often we cannot determine which columns may or may not contain certain data, requiring the modification of all columns to use character semantics. Using the database system tables, the alter table script to make the necessary changes to existing columns can be generated and executed. But what about new columns generated by ODI? Here we’ll need to use the power of the Oracle Data Integrator metadata to create a new datatype.
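The generated ALTER statements all follow one pattern, so producing them from the column metadata is mechanical (in Oracle, USER_TAB_COLUMNS exposes the declared length, and CHAR_USED = 'B' flags byte semantics). As a sketch of just the generation step — the table and column names here are made up for illustration:

```python
# Hypothetical metadata rows (table, column, declared length). In practice these
# would be fetched from USER_TAB_COLUMNS filtered on
# DATA_TYPE = 'VARCHAR2' AND CHAR_USED = 'B'.
byte_semantic_cols = [
    ("CUSTOMERS", "FIRST_NAME", 30),
    ("ORDERS", "SHIP_CITY", 50),
]

# Rewrite each column to character semantics, preserving its declared length.
ddl = [
    f"ALTER TABLE {tab} MODIFY {col} VARCHAR2({length} CHAR);"
    for tab, col, length in byte_semantic_cols
]

for stmt in ddl:
    print(stmt)
```

Running this prints one ALTER TABLE ... MODIFY ... VARCHAR2(n CHAR) statement per column, ready to be reviewed and executed against the target.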

In the ODI Topology, under the Physical Architecture accordion, the technologies that can be used as a data source or target are listed. Each technology, in turn, has a set of datatypes defined that may be used as Datastore Attributes when the technology is chosen in a Model.

Oracle Technology

Further down in the list, you will find the VARCHAR2 datatype. Double-click the name to open the object. In the SQL Generation Code section we will find the syntax used when DDL is generated for a column of type VARCHAR2.

Oracle technology - VARCHAR2 datatype

As you can see, the default is to omit the type of semantics used in the datatype syntax, which most likely means BYTE semantics are used, as this is typically the default in an Oracle database. This syntax can be modified to always produce character semantics by adding the CHAR keyword after the length substitution value.
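Assuming the template uses a length substitution token (shown here as %L — check your own repository, as the exact token is an assumption on my part), the edit is a single keyword:

```
VARCHAR2(%L)       -- before: semantics left to the database default (usually BYTE)
VARCHAR2(%L CHAR)  -- after: generated DDL always uses character semantics
```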


Before making the change to the “out of the box” VARCHAR2 datatype, you may want to think about how this datatype will be used on Oracle targets and sources. Any DDL generated by ODI will use this syntax when VARCHAR2 is selected for an attribute datatype. In some cases this might be just fine, as the ODI tool is only used for a single target data warehouse. But quite often ODI is used in many different capacities, such as data migrations, data warehousing, etc. To handle both forms of semantics, the best approach is to duplicate the VARCHAR2 datatype and create a new version that uses character semantics.

VARCHAR2 datatype edited

Now we can assign the datatype VARCHAR2 (CHAR) to any of our Datastore columns. I recommend the use of a Groovy script if changing Attributes in multiple Datastores.

Change Datatype - VARCHAR2 CHAR

Now when Generate DDL is executed on the Model, the Create Table step will have the appropriate semantics for the VARCHAR2 datatype.


As you can see, the power of Oracle Data Integrator and the ability to modify and customize its metadata provided me with the solution in this particular situation. Look for more Data Integration Tips from Rittman Mead – coming soon!

Categories: BI & Warehousing

Data Warehouse for Big Data: Scale-Up vs. Scale-Out

Dylan Wan - Thu, 2014-01-02 15:33

Found a very good paper:

This paper discusses whether using Hadoop as the analytics infrastructure is the right approach.

It is hard to argue with the industry trend. However, Hadoop is not new any more. It is time for people to calm down and rethink the real benefits.

Categories: BI & Warehousing