
Rittman Mead Consulting

Delivering Oracle Business Intelligence

Simple Data Manipulation and Reporting using Hive, Impala and CDH5

Thu, 2014-04-24 13:54

Although I’m pretty clued-up on OBIEE, ODI, Oracle Database and so on, I’m relatively new to the worlds of Hadoop and Big Data, so most evenings and weekends I play around with Hadoop clusters on my home VMware ESXi rig and try to get some experience that might then come in useful on customer projects. A few months ago I went through an example of loading up flight delays data into Cloudera CDH4 and then analysing it using Hive and Impala, but realistically it’s unlikely the data you’ll analyse in Hadoop will come in such convenient, tabular form. Something more realistic is analysing log files from web servers or other high-volume, semi-structured sources, so I asked Robin to download the most recent set of Apache log files from our website, and I thought I’d have a go at analysing them using Pig and Hive, and maybe visualise the output using OBIEE (if possible, later on).

As I said, I’m not an expert in Hadoop and the Cloudera platform, so I thought it’d be interesting to describe the journey I went through, and also give some observations of my own on when to use Hive and when to use Pig, when products like Cloudera Impala could be useful, and the general state of play with the Cloudera Hadoop platform. The files I started off with were Apache webserver log files, ten in total, ranging in size from around 2MB up to 350MB.


Looking inside one of the log files, they’re in the standard Apache log file format (or “combined log format”), where the visitor’s IP address is recorded, along with the date of access, some other information and the page (or resource) they requested.


What I’m looking to do is count the number of visitors per day, work out which was the most popular page, see what time of day we’re busiest, and so on. I’ve got a Cloudera Hadoop CDH5.0 6-node cluster running on a VMware ESXi server at home, so the first thing to do is log into Hue, the web-based developer/admin tool that comes with CDH5, and upload the files to a directory on HDFS (Hadoop Distributed File System), the Unix-like clustered file system that underpins most of Hadoop.


You can, of course, SFTP the files to one of the Hadoop nodes and use the “hadoop fs” command-line tool to copy the files into HDFS, but for relatively small files like these it’s easier to use the web interface to upload them from your workstation. Once I’ve done that, I can then view the log files in the HDFS directory, just as if they were sitting on a regular Unix filesystem.


At this point though, the files are still “unstructured” – just a single log entry per line – and I’ll therefore need to do something before I can count things like the number of hits per day, which pages were requested and so on. At this beginner’s level, there are two main options you can use – Hive, a SQL interface over HDFS that lets you select from, and do set-based transformations with, files of data; or Pig, a more procedural language that lets you manipulate file contents as a series of step-by-step tasks. For someone like myself with a relational data warehousing background, Hive is probably easier to work with, but it comes with some quite significant limitations compared to a database like Oracle – we’ll see more on this later.

Whilst Hive tables are, at the simplest level, mapped onto comma or otherwise-delimited files, another neat feature in Hive is that you can use what’s called a “SerDe”, or “Serializer-Deserializer”, to map more complex file structures into regular table columns. In the Hive DDL script below, I use this SerDe feature to have a regular expression parse the log file into columns, with the data source being an entire directory of files, not just a single one:

CREATE EXTERNAL TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE
LOCATION '/user/root/logs';

Things to note in the above DDL are:

  • EXTERNAL table means that the datafile used to populate the Hive table sits somewhere outside Hive’s usual /user/hive/warehouse directory, in this case in the /user/root/logs HDFS directory.
  • ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ tells Hive to use the Regular Expressions Serializer-Deserializer to interpret the source file contents, and 
  • WITH SERDEPROPERTIES … gives the SerDe the regular expression to use, in this case to decode the Apache log format.

Probably the easiest way to run the Hive DDL command to create the table is to use the Hive query editor in Hue, but there’s a couple of things you’ll need to do before this particular command will work:

1. You’ll need to get hold of the JAR file in the Hadoop install that provides this SerDe (hive-contrib-0.12.0-cdh5.0.0.jar) and then copy it to somewhere on your HDFS file system, for example /user/root. In my CDH5 installation, this file was at /opt/cloudera/parcels/CDH/lib/hive/lib/, but it’ll probably be at /usr/lib/hive/lib if you installed CDH5 using the traditional packages (rather than parcels) route. If you’re using a version of CDH prior to 5, the version number in the filename will differ accordingly. This JAR file then needs to be accessible to Hive, and whilst there are various more permanent ways you can do this, the easiest is to point to the JAR file in an entry in the query editor’s File Resources section.

2. Whilst you’re there, un-check the “Enable Parameterization” checkbox, otherwise the query editor will interpret the SerDe output string as parameter references.


Once the command has completed, you can click over to the Hive Metastore table browser, and see the columns in the new table. 


Behind the scenes, Hive maps its table structure onto all the files in the /user/root/logs HDFS directory, and when I run a SELECT statement against it, for example to do a simple row count, MapReduce mappers, shufflers and sorters are spun-up to return the count of rows to me.
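For example, a simple row count along the lines of the statement below (a minimal sketch, using the apachelog table created above) is enough to spin up a full MapReduce job behind the scenes:

-- count the total number of log entries across all of the files in /user/root/logs
SELECT COUNT(*) FROM apachelog;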


But in its current form, this table still isn’t all that useful – I’ve just got raw IP addresses for page requesters, and the request date is in a format that’s not easy to work with. So let’s do some further manipulation, creating another table that splits out the request date into year, month, day and time, using Hive’s CREATE TABLE AS SELECT command to transform and then load in one step:

CREATE TABLE apachelog_date_split_parquet
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS 
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
AS
SELECT host,
       identity,
       user,
       substr(time,9,4)  year,
       substr(time,5,3)  month,
       substr(time,2,2)  day,
       substr(time,14,2) hours,
       substr(time,17,2) mins,
       substr(time,20,2) secs,
       request,
       status,
       size,
       referer,
       agent
FROM   apachelog
;

Note the ParquetHive SerDe I’m using in this table’s row format definition – Parquet is a compressed, column-store file format developed by Cloudera originally for Impala (more on that in a moment), that from CDH4.6 onwards is also available for Hive and Pig. By using Parquet we potentially get speed and space-saving benefits compared to regular delimited files, so let’s use that feature now and see where it takes us. After creating the new Hive table, I can then run a quick query to count web server hits per month.
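The query I used was along these lines (a sketch rather than the exact statement, using the column names from the CREATE TABLE AS SELECT above):

-- hits per month, straight off the Parquet-backed table
SELECT year, month, count(*) AS hits
FROM   apachelog_date_split_parquet
GROUP BY year, month;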


So – getting more useful, but it’d be even nicer if I could map the IP addresses to actual countries, so I can see how many hits came from the UK, how many from the US, and so on. To do this, I’d need to use a lookup service or table to map my IP addresses to countries or cities, and one such commonly-used service is the free GeoIP database provided by MaxMind, where you turn each IP address into an integer via a formula, and then do a BETWEEN to locate that IP within the ranges defined in the database. How best to do this though?

There are several ways that you can enhance and manipulate data in your Hadoop system like this. One way, and something I plan to look at on this blog later in this series, is to use Pig, potentially with a call-out to Perl or Python to do the lookup on a row-by-row (or tuple-by-tuple) basis – this blog article on the Cloudera site goes through a nice example. Another way, and again something I plan to cover in this series, is to use something called “Hadoop Streaming” – the ability within MapReduce to “subcontract” the map and reduce parts of the operation to external programs or scripts, in this case a Python script that again queries the MaxMind database to do the IP-to-country lookup.

But surely it’d be easiest to just calculate the IP address integer and join my existing Hive table to this GeoIP lookup table, and do it that way? Let’s start by trying exactly that, first by modifying my final table design to include the IP address integer calculation defined on the MaxMind website:

CREATE TABLE apachelog_date_ip_split_parquet
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS 
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
AS
SELECT host,
       -- convert the dotted-quad IP address into an integer: (o1 * 2^24) + (o2 * 2^16) + (o3 * 2^8) + o4
      (cast(split(host,'\\.')[0] as bigint) * 16777216)
     + (cast(split(host,'\\.')[1] as bigint) * 65536)
     + (cast(split(host,'\\.')[2] as bigint) * 256)
     + (cast(split(host,'\\.')[3] as bigint)) ip_add_int,
       identity,
       user,
       substr(time,9,4)  year,
       substr(time,5,3)  month,
       substr(time,2,2)  day,
       substr(time,14,2) hours,
       substr(time,17,2) mins,
       substr(time,20,2) secs,
       request,
       status,
       size,
       referer,
       agent
FROM   apachelog
;

Now I can query this from the Hive query editor, and I can see the IP address integer calculations that I can then use to match to the GeoIP IP address ranges.
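Something along these lines (a minimal example query, not the exact one I ran) shows the original IP address alongside its integer equivalent:

SELECT host, ip_add_int
FROM   apachelog_date_ip_split_parquet
LIMIT  10;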


I then upload the IP Address to Countries CSV file from the MaxMind site to HDFS, and define a Hive table over it like this:

create external table geo_lookup (
  ip_start      string,
  ip_end        string,
  ip_int_start  bigint,
  ip_int_end    bigint,
  country_code  string,
  country_name  string
  )
row format DELIMITED 
FIELDS TERMINATED BY '|' 
LOCATION '/user/root/lookups/geo_ip';

Then I try some variations on the BETWEEN clause, in a SELECT with a join:

select a.host, l.country_name
from apachelog_date_ip_split_parquet a join geo_lookup l
on (a.ip_add_int > l.ip_int_start) and (a.ip_add_int < l.ip_int_end)
group by a.host, l.country_name;

select a.host, l.country_name
from apachelog_date_ip_split_parquet a join geo_lookup l 
on a.ip_add_int between l.ip_int_start and l.ip_int_end;

.. which all fail, because Hive only supports equi-joins. One option is to use a Hive UDF (user-defined function) such as this one here to implement a GeoIP lookup, but something that’s probably a bit more promising is to switch over to Impala, which can handle non-equality joins through its CROSS JOIN feature (Hive can in fact also do cross joins, but they’re not very efficient). Impala also has the benefit of being much faster than Hive for BI-type queries, and it’s also designed to work with Parquet, so let’s switch over to the Impala query editor, run the “invalidate metadata” command to re-sync its view of the tables with the Hive metastore, and then try the join in there.
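The Impala version ended up looking something like this (a sketch rather than the exact statement I ran, with the range test moved into the WHERE clause of a CROSS JOIN):

-- pick up the new Hive tables, then count hits per country
INVALIDATE METADATA;

SELECT l.country_name, count(*) AS hits
FROM   apachelog_date_ip_split_parquet a CROSS JOIN geo_lookup l
WHERE  a.ip_add_int BETWEEN l.ip_int_start AND l.ip_int_end
GROUP BY l.country_name;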


Not bad. Of course this is all fairly simple stuff, and we’re still largely working with relational-style set-based transformations. In the next two posts in the series though I want to get a bit deeper into Hadoop-style transformations – first by using a feature called “Hadoop Streaming” to process data on its way into Hadoop, in parallel, by calling out to Python and Perl scripts; and then by taking a look at Pig, the more “procedural” alternative to Hive – with the objective being to enhance this current dataset to bring in details of the pages being requested, filter out the non-page requests, and do some work with author, tag and clickstream analysis.

Categories: BI & Warehousing

Previewing Three Oracle Data Visualization Sessions at the Atlanta US BI Forum 2014

Tue, 2014-04-22 04:30

Many of the sessions at the UK and US Rittman Mead BI Forum 2014 events in May focus on the back-end of BI and data warehousing, with, for example, Chris Jenkins’ session on TimesTen giving us some tips and tricks from TimesTen product development, and Wayne Van Sluys’s session on Essbase looking at what’s involved in Essbase database optimisation (full agendas for the two events can be found here). But two areas within BI that have had a lot of attention over the past couple of years are (a) data visualisation and (b) mobile, so I’m particularly pleased that our Atlanta event has three of the most innovative practitioners in these areas – Kevin McGinley from Accenture, Christian Screen from Art of BI, and Patrick Rafferty from Branchbird – talking about what they’ve been doing.


If you were at the BI Forum a couple of years ago you’ll of course know Kevin McGinley, who won the “best speaker” award the previous year and has most recently gone on to organise the BI track at ODTUG KScope and write for OTN and his own blog, Oranalytics.blogspot.com. Kevin also hosts, along with our own Stewart Bryson, a video podcast series on iTunes called “Real-Time BI with Kevin & Stewart”, and I’m excited that he’s joining us again at this year’s BI Forum in Atlanta to talk about adding 3rd party visualisations to OBIEE. Over to Kevin…

“I can’t tell you how many times I’ve told someone that I can’t precisely meet a certain charting requirement because of a lack of configurability or variety in the OBIEE charting engine.  Combine that with an increase in the variety and types of data people are interested in visualizing within OBIEE and you have a clear need.  Fortunately, OBIEE is a web-based tool and can leverage other visualization engines, if you just know how to work with the engine and embed it into OBIEE.

In my session, I’ll walk through a variety of reasons you might want to do this and the various approaches for doing it.  Then, I’ll take two specific engines and show you the process for building a visualization with them right in an OBIEE Analysis.  In both examples, you’ll come away with a capability you’ve never been able to do directly in OBIEE before.”


Another speaker, blogger, writer and developer very well known to the OBIEE community is Art of BI Software’s Christian Screen, co-author of the Packt book “Oracle Business Intelligence Enterprise Edition 11g: A Hands-On Tutorial” and developer of the OBIEE collaboration add-in, BITeamwork. Last year Christian spoke to us about developing plug-ins for OBIEE, but this year he’s returned to a topic he’s very passionate about – mobile BI, and in particular, Oracle’s Mobile App Designer. According to Christian:

“Last year Oracle marked its mobile business intelligence territory by updating its Oracle BI iOS application with a new look and feel. Unbeknownst to many, they also released the cutting-edge Oracle BI Mobile Application Designer (MAD). These are both components available as part of the Oracle BI Foundation Suite. But it is where they are taking the mobile analytics platform that is most interesting at the moment as we look at the mobile analytics consumption chain. MAD is still in its 1.x release and there is a lot of promise with this tool to satisfy the analytical cravings growing in the bellies of many enterprise organizations. There is also quite a bit of discussion around building new content just for mobile consumption compared to viewing existing content through the mobile applications native to major mobile devices.

The “Oracle BI Got MAD and You Should be Happy” session will discuss these topics and I’ll be sharing the stage with Jayant Sharma from Oracle BI Product Development where we’ll also be showing some cutting edge material and demos for Oracle BI MAD.  Because MAD provides a lot of flexibility for development customizations, compared to the Oracle BI iOS/Android applications, our session will explore business use cases around pre-built MAD applications, HTML5, mobile security, and development of plug-ins using the MAD SDK.  One of the drivers for this session is to show how many of the Oracle Analytics components integrate with MAD and how an Oracle BI developer can quickly leverage the capabilities of MAD to show the tool’s value within their current Oracle BI implementation.

We will also discuss the common concern of mobile security by touching on the BitzerMobile acquisition and using the central mobile configuration settings for Oracle BI Mobile. The crowd will hopefully walk away with a better understanding of Oracle BI mobility with MAD and a desire to go build something.”


As well as OBIEE and Oracle Mobile App Designer, Oracle also have another product, Oracle Endeca Information Discovery, that combines a data aggregation and search engine with dashboard visuals and data discovery. One of the most innovative partner companies in the Endeca space is Branchbird, and we’re very pleased to have Branchbird’s Patrick Rafferty join us to talk about “More Than Mashups – Advanced Visualizations and Data Discovery”. Over to Patrick…

“In this session, we’ll explore how Oracle Endeca customers are moving beyond simple dashboards and charts and creating exciting visualizations on top of their data using Oracle Endeca Studio. We’ll discuss how the latest trends in data visualization, especially geospatial and temporal visualization, can be brought into the enterprise and how they drive competitive advantage.

This session will show in-production real-life examples of how extending Oracle Endeca Studio’s visualization capabilities to integrate technology like D3 can create compelling discovery-driven visualizations that increase revenue, cut cost and enhance the ability to answer unknown questions through data discovery.”


The full agendas for the Atlanta and Brighton BI Forum events can be found on this blog post, and full details of both events, including registration links, links to book accommodation and details of the Lars George Cloudera Hadoop masterclass, can be found on the Rittman Mead BI Forum 2014 home page.

Categories: BI & Warehousing

Preview of Maria Colgan, and Andrew Bond/Stewart Bryson Sessions at RM BI Forum 2014

Wed, 2014-04-16 02:11

We’ve got a great selection of presentations at the two upcoming Rittman Mead BI Forum 2014 events in Brighton and Atlanta, including sessions on Endeca, TimesTen, OBIEE (of course), ODI, GoldenGate, Essbase and Big Data (full timetable for both events here). Two of the sessions I’m particularly looking forward to, though, are one by Maria Colgan, product manager for the new In-Memory Option for Oracle Database, and another by Andrew Bond and Stewart Bryson, on an update to Oracle’s reference architecture for Data Warehousing and Information Management.

The In-Memory Option for Oracle Database was of course the big news item from last year’s Oracle OpenWorld, promising to bring in-memory analytics and column-storage to the Oracle Database. Maria is of course well known to the Oracle BI and Data Warehousing community through her work with the Oracle Database Cost-Based Optimizer, so we’re particularly glad to have her at the Atlanta BI Forum 2014 to talk about what’s coming with this new feature. I asked Maria to jot down a few words for the blog on what she’ll be covering, so over to Maria:


“At Oracle Open World last year, Oracle announced the upcoming availability of the Oracle Database In-Memory option, a solution for accelerating database-driven business decision-making to real-time. Unlike specialized In-Memory Database approaches that are restricted to particular workloads or applications, Oracle Database 12c leverages a new in-memory column store format to speed up analytic workloads. Given this announcement and the performance improvements promised by this new functionality, is it still necessary to create a separate access and performance layer in your data warehouse environment, or to run your Oracle data warehouse on an Exadata environment?

This session explains in detail how Oracle Database In-Memory works and will demonstrate just how much of a performance improvement you can expect. We will also discuss how it integrates into the existing Oracle Data Warehousing Architecture and with an Exadata environment.”

The other session I’m particularly looking forward to is one being delivered jointly by Andrew Bond, who heads up Enterprise Architecture at Oracle and was responsible, along with Doug Cackett, for the various data warehousing, information management and big data reference architectures we’ve covered on the blog over the past few years, including the first update to include “big data” a year or so ago.


Back towards the start of this year, Stewart, myself and Jon Mead met up with Andrew and his team to work together on an update to this reference architecture, and Stewart carried on with the collaboration afterwards, bringing in some of our ideas around agile development, big data and data warehouse design into the final architecture. Stewart and Andrew will be previewing the updated reference architecture at the Brighton BI Forum event, and in the meantime, here’s a preview from Andrew:

“I’m very excited to be attending the event and unveiling Oracle’s latest iteration of the Information Management reference architecture. In this version we have focused on a pragmatic approach to “Analytics 3.0” and in particular looked at bringing an agile methodology to break the IT/business barrier. We’ve also examined the exploitation of in-memory technologies and the Hadoop ecosystem, and guiding customers through the plethora of new technology choices.

We’ve worked very closely with a number of key customers and partners on this version – most notably Rittman Mead and I’m delighted that Stewart and I will be able to co-present the architecture and receive immediate feedback from delegates.”

Full details of the event, running in Brighton on May 7-9th 2014 and Atlanta, May 15th-17th 2014, can be found on the Rittman Mead BI Forum 2014 homepage, and the agendas for the two days are on this blog post from earlier in the week.

Categories: BI & Warehousing

Final Timetable and Agenda for the Brighton and Atlanta BI Forums, May 2014

Mon, 2014-04-14 07:00

It’s just a few weeks now until the Rittman Mead BI Forum 2014 events in Brighton and Atlanta, and there are still a few spaces left at both events if you’d like to come – check out the main BI Forum 2014 event page, and the booking links for Brighton (May 7th – 9th 2014) and Atlanta (May 14th – 16th 2014).

We’re also now able to publish the timetable and running order for the two events – session order can still change between now and the events, but this is what we’re planning to run, first of all in Brighton.

Brighton

Brighton BI Forum 2014, Hotel Seattle, Brighton

Wednesday 7th May 2014 – Optional 1-Day Masterclass, and Opening Drinks, Keynote and Dinner

  • 9.00 – 10.00 : Registration
  • 10.00 – 11.00 : Lars George Hadoop Masterclass Part 1
  • 11.00 – 11.15 : Morning Coffee
  • 11.15 – 12.15 : Lars George Hadoop Masterclass Part 2
  • 12.15 – 13.15 : Lunch
  • 13.15 – 14.15 : Lars George Hadoop Masterclass Part 3
  • 14.15 – 14.30 : Afternoon Tea/Coffee/Beers
  • 14.30 – 15.30 : Lars George Hadoop Masterclass Part 4
  • 17.00 – 19.00 : Registration and Drinks Reception
  • 19.00 – Late :  Oracle Keynote and Dinner at Hotel

Thursday 8th May 2014

  • 08.45 – 09.00 : Opening Remarks Mark Rittman, Rittman Mead
  • 09.00 – 10.00 : Emiel van Bockel : Extreme Intelligence, made possible by …
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 11.30 : Chris Jenkins : TimesTen for Exalytics: Best Practices and Optimisation
  • 11.30 – 12.30 : Robin Moffatt : No Silver Bullets : OBIEE Performance in the Real World
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Adam Bloom : Building a BI Cloud
  • 14.30 – 14.45 : TED : Paul Oprea : “Extreme Data Warehousing”
  • 14.45 – 15.00 : TED : Michael Rainey : “A Picture Can Replace A Thousand Words”
  • 15.00 – 15.30 : Afternoon Tea/Coffee/Beers
  • 15.30 – 15.45 : Reiner Zimmerman : About the Oracle DW Global Leaders Program
  • 15.45 – 16.45 : Andrew Bond & Stewart Bryson : Enterprise Big Data Architecture
  • 19.00 – Late: Depart for Gala Dinner, St Georges Church, Brighton

Friday 9th May 2014

  • 9.00 – 10.00 : Truls Bergensen : Drawing in a New Rock on the Map – How Will Endeca Fit Into Your Oracle BI Topography
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 11.30 : Nicholas Hurt & Michael Rainey : Real-time Data Warehouse Upgrade – Success Stories
  • 11.30 – 12.30 : Matt Bedin & Adam Bloom : Analytics and the Cloud
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Gianni Ceresa : Essbase within/without OBIEE – not just an aggregation engine
  • 14.30 – 14.45 : TED : Marco Klaassens : “Speed up RPD Development”
  • 14.45 – 15.00 : TED : Christian Berg : “Neo’s Voyage in OBIEE”
  • 15.00 – 15.30 : Afternoon Tea/Coffee/Beers
  • 15.30 – 16.30 : Alistair Burgess : “Tuning TimesTen with Aggregate Persistence”
  • 16.30 – 16.45 : Closing Remarks (Mark Rittman)

Then directly after Brighton we’ve got the US Atlanta event, running the week after, Wednesday – Friday.

Atlanta BI Forum 2014, Renaissance Mid-Town Hotel, Atlanta

Wednesday 14th May 2014 – Optional 1-Day Masterclass, and Opening Drinks, Keynote and Dinner

  • 9.00 – 10.00 : Registration
  • 10.00 – 11.00 : Lars George Hadoop Masterclass Part 1
  • 11.00 – 11.15 : Morning Coffee
  • 11.15 – 12.15 : Lars George Hadoop Masterclass Part 2
  • 12.15 – 13.15 : Lunch
  • 13.15 – 14.15 : Lars George Hadoop Masterclass Part 3
  • 14.15 – 14.30 : Afternoon Tea/Coffee/Beers
  • 14.30 – 15.30 : Lars George Hadoop Masterclass Part 4
  • 16.00 – 18.00 : Registration and Drinks Reception
  • 18.00 – 19.00 : Oracle Keynote & Dinner

Thursday 15th May 2014

  • 08.45 – 09.00 : Opening Remarks Mark Rittman, Rittman Mead
  • 09.00 – 10.00 : Kevin McGinley : Adding 3rd Party Visualization to OBIEE
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 11.30 : Richard Tomlinson : Endeca Information Discovery for Self-Service and Big Data
  • 11.30 – 12.30 : Omri Traub : Endeca and Big Data: A Vision for the Future
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Dan Vlamis : Capitalizing on Analytics in the Oracle Database in BI Applications
  • 14.30 – 15.30 : Susan Cheung : TimesTen In-Memory Database for Analytics – Best Practices and Use Cases
  • 15.30 – 15.45 : Afternoon Tea/Coffee/Beers
  • 15.45 – 16.45 : Christian Screen : Oracle BI Got MAD and You Should Be Happy
  • 18.00 – 19.00 : Special Guest Keynote : Maria Colgan : An introduction to the new Oracle Database In-Memory option
  • 19.00 – leave for dinner

Friday 16th May 2014

  • 09.00 – 10.00 : Patrick Rafferty : More Than Mashups – Advanced Visualizations and Data Discovery
  • 10.00 – 10.30 : Morning Coffee
  • 10.30 – 12.30 : Matt Bedin : Analytic Applications and the Cloud
  • 12.30 – 13.30 : Lunch
  • 13.30 – 14.30 : Philippe Lions : What’s new on 2014 HY1 OBIEE SampleApp
  • 14.30 – 15.30 : Stewart Bryson : ExtremeBI: Agile, Real-Time BI with Oracle Business Intelligence, Oracle Data Integrator and Oracle GoldenGate
  • 15.30 – 16.00 : Afternoon Tea/Coffee/Beers
  • 16.00 – 17.00 : Wayne Van Sluys : Everything You Know about Oracle Essbase Tuning is Wrong or Outdated!
  • 17.00 – 17.15 : Closing Remarks (Mark Rittman)
Full details of the two events, including more on the Hadoop Masterclass with Cloudera’s Lars George, can be found on the BI Forum 2014 home page.

Categories: BI & Warehousing