BI & Warehousing

Previewing Four Sessions at the Atlanta Rittman Mead BI Forum 2015

Rittman Mead Consulting - Thu, 2015-04-09 08:00

In a post earlier this week I previewed three sessions at the upcoming Brighton Rittman Mead BI Forum 2015; in this post I’m going to look at four particularly interesting sessions at the Atlanta Rittman Mead BI Forum 2015, which runs the week after Brighton, on May 13th-15th 2015 at the Renaissance Atlanta Midtown Hotel, Atlanta GA. As well as an optional one-day masterclass on big data development by myself and Jordan Meyer on the 13th, the main event has keynotes and product update sessions from Oracle’s BI product management team, a data visualisation challenge, and a guest talk by John Foreman, author of the book “Data Smart” and Chief Data Scientist at Mailchimp. In terms of the main sessions, there are four that I’m particularly interested in, starting with one by a speaker new to the BI Forum, Qualogy’s Hasso Schaap, who’ll be talking to us about their use of Oracle’s new BI Cloud Service in his session “Developing strategic analytics applications on OBICS PaaS”.

“In this session I’ll tell how we use the Oracle BI Cloud Service in our development plans for a strategic analytics application. Focussing on Strategic HR Planning there’s so much you can do with your data that we decided to put it in a packaged app. I will discuss the important parts of the development process and show how we fixed the issues we came up with. Developing in the BI Cloud is different and expectations are also different. 
As an example there’s the part of prediction. How do we predict based on data in the BI Cloud and what are other possibilities. With prediction we were able to tell our customers a different story. A story that was different than before using old-school tools and techniques. In this session I will uncover some of the most appreciated functionality and will happily elaborate on the story behind ‘The present, the future, development and scenario planning’.”

My second featured session is by someone very well-known to previous BI Forum attendees, and to the wider Oracle BI+DW community: Stewart Bryson. Stewart of course used to head up Rittman Mead in the US and then went on to become our first Chief Innovation Officer, before leaving to start his own company, Red Pill Analytics, with Kevin McGinley, another old friend of Rittman Mead and the BI Forum. We’re very pleased to have both Stewart and Kevin delivering sessions at the Atlanta BI Forum, and for Stewart’s session he’s talking about something very close to his heart – “Supercharging BI Delivery with Continuous Integration”:

“One of the things I’ve never understood about the lifecycle features in most BI tools is why the designers feel the need to roll their own source control and DevOps features. Instead of focusing on deeper integration with tools and processes that exist in the other 90% of development paradigms, BI vendors instead start with a clean palette and create something completely siloed and desperately alone. 
In this presentation, we’ll take a look at how some of these other development paradigms approach DevOps — paying perhaps the closest attention to the world of Java development and other JVM languages. We’ll see how approaches such as continuous integration and continuous delivery play a part in rapid, iterative delivery, and how we can apply some of those approaches to the world of OBIEE development.”

My third session is by another speaker new to the BI Forum, but someone who’s well-known in the BI and data warehousing world and who I met in person for the first time at last year’s Oracle Openworld: Sumit Sarkar. Sumit works for Progress Software, makers of the DataDirect ODBC drivers that power OBIEE’s connection to Hadoop, for example, as well as connectors to MongoDB, Salesforce, Oracle RightNow and Eloqua, and as he’ll explain in his session “Make sense of NoSQL data using OBIEE”, NoSQL databases are increasingly something OBIEE teams need to report against:

“NoSQL databases have stormed the top 10 db-engines rankings with MongoDB at #4 and Cassandra at #8.  It’s inevitable that these NoSQL databases, storing unstructured data without a standard query language, will have BI requirements for unarmed OBIEE teams.  Not even a complete Oracle stack can save you with the release of Oracle NoSQL. This will be the first session of its kind to tackle standards-based NoSQL connectivity.  
So join me at BI Forum ’15  to take control of NoSQL data with your RPD and expand big data skills and thought leadership within your organization.  Learn how organizations are using SQL access to NoSQL databases for integration across existing business intelligence platforms. We’ll talk about common challenges and gotchas that shops are facing when exposing unstructured NoSQL data to OBIEE.  It can get out of hand pretty quickly otherwise …”

My final selection is from CERN, the European Organization for Nuclear Research and home, of course, of the Large Hadron Collider (and who announced on April 1st the first unequivocal evidence for The Force, almost upstaging our announcement of Oracle E-Business Suite being ported to Hadoop and MongoDB). There are several sessions at both the Brighton and Atlanta BI Forums on Oracle’s new Big Data Discovery tool, and in this session CERN’s Manuel Martin Marquez will be talking about their work in this area, in his session “Governed Information Discovery: Data-driven decisions for more efficient operations at CERN”.

“The European Centre for Nuclear Research, CERN, is running the world’s largest and most powerful particle accelerator complex in order to shed light on how the Universe works and what its main building blocks are.  CERN’s particle accelerator and detector infrastructure is highly heterogeneous and complex. A number of critical subsystems, which represent cutting-edge technology in several engineering fields, need to be considered: cryogenics, power converters, magnet protection, etc. The historical monitoring and control data derived from these systems has been persisted mainly using Oracle database technologies, but also other sorts of data formats such as JSON, XML and plain text files. All of these must be integrated and combined in order to provide a full picture and better understanding of the overall status of the accelerator complex.
Therefore, a key challenge is to facilitate easy access to, flexible interaction with, and dynamic visualization of heterogeneous data from different sources and domains.  In our session, we will share our experience with a potential solution for finding insights within our data, Oracle Endeca Data Discovery. In addition, we will feature practical examples relating to future possibilities for improving the control and monitoring of CERN’s accelerator complex, optimization results for accelerator operations and a demo of the implemented solution”

Full agenda details on the Atlanta Rittman Mead BI Forum 2015 can be found on the event homepage, along with details of the optional one-day masterclass on Delivering the Oracle Information Management and Big Data Reference Architecture, and our first-ever Data Visualisation Bake-Off, using the DonorsChoose.org dataset. Registration is now open and the event takes place between May 13th and 15th 2015, at the Renaissance Atlanta Midtown Hotel, Atlanta GA. 

 
Categories: BI & Warehousing

OBIEE and the Oracle Database 12c In-Memory Option – Article and New Services from Rittman Mead

Rittman Mead Consulting - Wed, 2015-04-08 07:00

My latest Business Intelligence column for Oracle Magazine is on the In-Memory Option for Oracle Database 12c, and using it to speed up dashboards and reports in OBIEE11g. In the article I go through the basics of the In-Memory Option, explaining how it adds in-memory columnar processing to the standard Oracle Database Enterprise Edition, and then I take the Airline Flight Delays dashboard in the OBIEE11g SampleApp v406 and enable it for in-memory processing; queries that go against the base detail-level tables in the Oracle Database run roughly twice as fast, whilst queries going against aggregate tables return data instantaneously, all without any need to alter the underlying database schema or migrate to a new database engine.
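The article goes through the steps in detail, but to give a flavour of what’s involved: once the INMEMORY_SIZE initialisation parameter has been set (and the instance restarted), marking a table for the in-memory column store is a single ALTER TABLE, and you can then watch it populate through V$IM_SEGMENTS. The schema and table names below are illustrative rather than the exact SampleApp ones:

SQL> ALTER SYSTEM SET inmemory_size = 4G SCOPE=SPFILE;
SQL> REM restart the instance, then mark the fact table for in-memory population
SQL> ALTER TABLE bi_airlines.flight_delays_fact INMEMORY PRIORITY HIGH;
SQL> SELECT segment_name, populate_status, bytes, inmemory_size FROM v$im_segments;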

To my mind there are two main groups of customers who could benefit from moving to Oracle Database 12c and the In-Memory Option: customers who are currently using earlier versions of Oracle Database with regular disk-stored row-based storage (or indeed customers using other databases, for example Teradata or Microsoft SQL Server), and customers who’ve implemented Oracle Exalytics with TimesTen as the in-memory database cache, and who would now like to take advantage of the additional features and lower cost-of-ownership of in-memory processing directly in the Oracle Database.

If you already have licenses for Oracle Database Enterprise Edition you’ll only need to add the additional In-Memory Option license to enable these new features, whereas if you’re using TimesTen on Exalytics there are special terms for customers who wish to trade in those licenses for Oracle Database Enterprise Edition and In-Memory Option licenses – and once you’ve moved over to Oracle Database 12c and the In-Memory Option, you’ll benefit from:

  • Access to full Oracle SQL including advanced analytics functions, aggregation and transformation capabilities
  • Moving to Oracle’s strategic database technology for in-memory analytics and Exalytics in-memory aggregate caching
  • Compatibility with existing Oracle Database functionality, making it easy to move reporting databases into Exalytics and enable them for in-memory analytics
  • Columnar processing, an alternative to traditional row-based storage that’s better suited to BI-style filtering against attribute values
  • Full compatibility with all reporting and ETL tools that support access to Oracle Database data sources
  • Additional optimisations around aggregation, table joining and other BI-style queries
  • Faster dashboards, more interactive reporting and less maintenance compared to maintaining TimesTen

To get you started with either of these options, Rittman Mead have created two packages for customers looking to adopt Oracle Database 12c In-Memory Option; one for customers on traditional data warehouse databases looking to use In-memory for the first time, and another for customers using Exalytics who want to migrate from Oracle TimesTen. Full details of these two packages are now up on our website at our Supercharge OBIEE with the Oracle 12c In-Memory Option web page, or you can contact us at enquiries@rittmanmead.com to talk through your particular requirements in more detail.

Categories: BI & Warehousing

Take Part in the BI Survey 15, and Have Your Voice Heard!

Rittman Mead Consulting - Wed, 2015-04-08 04:00

Long-term readers of this blog will know that we’ve supported the BI Survey, an independent survey of BI tools customers and implementors, for many years. Rittman Mead have no (financial or other) interest in the BI Survey or its organisers, but we like the way it gathers detailed data on which tools work best and when, and it’s been a useful set of data for companies such as Oracle when they prioritise their investment in tools such as OBIEE, Essbase and the BI Applications.

Here’s the invite text and link to the survey:

“We would like to invite you to participate in The BI Survey 15, the world’s largest annual survey of business intelligence (BI) users.
BARC’s annual survey gathers input from thousands of organizations to analyze their buying decisions, implementation cycles and the benefits they achieve from using BI software.
As a participant, you will:

  • Receive a summary of the results from the survey when it is published
  • Be entered into a draw to win one of ten $50 Amazon vouchers
  • Ensure that your experiences are included in the final analyses

Click here to take part
Business and technical users, as well as vendors and consultants, are all welcome to participate.
You will be able to answer questions on your usage of a BI product from any vendor and your experience with your service provider.
The BI Survey 15 is strictly vendor-independent: It is not sponsored by any vendor and the results are analyzed and published independently. 
Your answers will be used anonymously and your personal details will not be passed on to software vendors or other third parties.
The BI Survey 15 should take about 20 minutes to complete. For further information, please contact Adrian Wyszogrodzki at BARC (awyszogrodzki@barc.de). 

Thank you in advance for taking part.”

 

Categories: BI & Warehousing

Previewing Three Sessions at the Brighton Rittman Mead BI Forum 2015

Rittman Mead Consulting - Tue, 2015-04-07 03:00

As well as a one-day masterclass by myself and Jordan Meyer, a data visualisation challenge, keynotes and product update sessions from Oracle and our guest speaker from the Oracle Data Warehouse Global Leaders Program, the Brighton Rittman Mead BI Forum 2015 has of course a fantastic set of speakers and sessions on a wide range of topics around Oracle BI, data warehousing and big data. In this blog post I’m going to highlight three sessions at the Brighton BI Forum, and later in the week I’ll be doing the same with three sessions from the Atlanta event – so let’s start with a speaker who’s new to the BI Forum but very well-known to the UK OBIEE community – Steve Devine.

Steve is one of the most experienced OBIEE practitioners in Europe, recently with Edenbrook / Hitachi Consulting and Claremont, and now working with Altius in the UK. In his session at the Brighton BI Forum 2015, Steve’s going to talk to us about what’s probably the hottest topic around OBIEE at the moment in his session “The Art and Science of Creating Effective Data Visualisations”. Over to Steve:

“These days, news publications and the internet are packed with eye-catching data visualisations and infographics – the New York Times, the Guardian or Information Is Beautiful to name but a few. Yet the scientists and statisticians tell us that everything could be a bar chart, and that nothing should ever be a pie chart! How do we make sense of these seemingly disparate, contrasting views?
My presentation provides an introduction on how graphic design principles complement the more science orientated aspects of data viz design. It will focus on a simple-to-apply design framework that brings all of these principles together, enabling you to create visualisations that have the right balance of aesthetics and function. By example, I’ll apply this framework to traditional BI scenarios such as operational and exploratory dashboards, as well as new areas that BI tools are just beginning to support such as commentary and storytelling. I’ll also look at how well Oracle’s BI tools address today’s data visualisation needs, and how they compare to the competition.”

On the topic of data visualisation, I’m also very pleased to have Daniel Adams from Rittman Mead’s US office coming over to the Brighton BI Forum to talk about effective dashboard design. Daniel’s been working with Rittman Mead clients in the US and Europe for the past year helping them apply data visualisation and dashboard design best practices to their dashboards and reports, and he’ll be sharing some of his methods and approaches in his session “User Experience First: Guided information and attractive dashboard design”:

“Most front-end OBI developers can give users exactly what they ask for, but will that lead to insightful dashboards that improve data culture and elevate the user experience? One of the biggest mistakes I see as a designer is dashboards that are a cluttered collection of tables and graphs. Poorly designed dashboards can prevent users from adopting a BI implementation, diminishing the ROI. 
In this session, attendees will learn to design dashboards that inform, instruct, and lead to smart discussion and decisions.  This includes learning to visualize data to convey meaning, implementing attractive visual design, and creating a layout that leads users through a target rich environment. We will walk through a series of “before” and “after” dashboards that demonstrate the difference between meeting a requirement, and using proven UX and UI design concepts to make OBIEE dashboards insightful and enjoyable to use.”

Finally, someone I’m very pleased to have over to the Brighton BI Forum for the first time is Gerd Aiglstorfer. I first met Gerd at an Oracle event in Germany several years ago, and since then I’ve noticed several of his blogs and the launch of his Oracle University Expert Sessions on OBIEE development, administration and RPD modelling. Gerd is one of Europe’s premier experts in OBIEE and Oracle BI, and for his inaugural BI Forum presentation he’ll be deep-diving into one of the most complex topics around repository modeling in his session “Driving OBIEE Join Semantics on Multi Star Queries as User”:

“Multi star queries are a very useful and powerful piece of OBIEE functionality, but when I examine reports developed by business users or report developers I often find some misunderstandings about how it works and how the queries are built by OBIEE. As the execution strategy OBIEE 11.1.1.7 uses to generate SQL for multi star queries has also changed, I had the idea of introducing the topic at the BI Forum; it’s quite an interesting way to get into the technical details of the OBIEE SQL generator engine.
I’ll introduce how users can drive join semantics on common fields in multi star queries. You will get a full picture of the functionality for a better understanding of how report creation affects SQL generation. I recognized some inconsistencies during my tests of the new OBIEE 11.1.1.7 logic in January 2014. I will demonstrate the issues and would like to discuss if you would say: “It’s a defect within the SQL generator engine” – as I do.”

Full agenda details on the Brighton Rittman Mead BI Forum 2015 can be found on the event homepage, along with details of the optional one-day masterclass on Delivering the Oracle Information Management and Big Data Reference Architecture, and our first-ever Data Visualisation Bake-Off, using the DonorsChoose.org dataset.

Categories: BI & Warehousing

Realtime BI Show with Kevin and Stewart – BI Forum 2015 Special!

Rittman Mead Consulting - Mon, 2015-04-06 08:00

Jordan Meyer and I were very pleased to be invited onto the Realtime BI Show podcast last week, run by Kevin McGinley and Stewart Bryson, to talk about the upcoming Rittman Mead BI Forum running in Brighton and Atlanta in May 2015. Stewart and Kevin are of course speaking at the Atlanta BI Forum event on May 13th-15th 2015 at the Renaissance Atlanta Midtown Hotel, Atlanta, and in the podcast we talk about the one-day masterclass that Jordan and I are running, some of the sessions at the event, and the rise of big data and data discovery within the Oracle BI+DW industry.

Full details on the two BI Forum 2015 events can be found on the event homepage, along with details of the optional one-day masterclass on Delivering the Oracle Information Management and Big Data Reference Architecture, the guest speakers and the inaugural Data Visualization Challenge. Registration is now open and can be done online using the two links below.

  • Rittman Mead BI Forum 2015, Brighton –  May 6th – 8th 2015 

We’ve also set up a special discount code for listeners to the Realtime BI Show, with 10%-off both registration and the masterclass fee for both the Brighton and Atlanta events – use code RTBI10 on the Eventbrite registration forms to qualify.

Categories: BI & Warehousing

Market Segmentation and Data Mining

Dylan's BI Notes - Fri, 2015-04-03 18:30
1. Market Segmentation in the academic world  Market Segmentation is part of the marketing process. It is described in Philip Kotler’s book as part of the step of defining the market strategy. The idea is to segment the consumer market by some variables and to divide the market into different segments. Selecting the segments for your products is the result of the […]
Categories: BI & Warehousing

How to integrate Oracle Data Mining and Oracle BI

Dylan's BI Notes - Fri, 2015-04-03 01:25
Here are the various ways that we can use Data Mining inside BI. We can build Advanced Analytics applications. The scoring function can be called within the opaque view or with the EVALUATE function. The opaque view method may provide better flexibility since multiple columns can be exposed. Here is an old Oracle white paper […]
Categories: BI & Warehousing

Data Mining Scoring Development Process

Dylan Wan - Thu, 2015-04-02 23:39

I think that the process of building a data mining scoring engine is similar to developing an application.


We have the requirement analysis, functional design, technical design, coding, testing, deployment, etc. phases.



Categories: BI & Warehousing

Data Mining Scoring Engine Development Process

Dylan's BI Notes - Wed, 2015-04-01 20:29
Here is how I view data mining: The target is to build a scoring engine. It accepts an input and produces the output. The development process can be separated into Requirements, Design, Coding, and Deployment, similar to typical software development phases.
Categories: BI & Warehousing

Analysing ODI performance with Flame Graphs

Rittman Mead Consulting - Wed, 2015-04-01 17:01

Flame Graphs are a visualisation that I learnt about through the excellent Linux systems performance work of Brendan Gregg, and saw Luca Canali talk about recently at UKOUG Tech 14. They’re a brilliant way of summarising extremely dense information so that the components accounting for the most time can be identified at a glance. I was recently doing some analysis for a client on their ODI batch runtime and I thought it would be a good idea to try them out. Load Plans can have complex hierarchies to them, and working out which main sections account for what time can be tricky, as can following a load plan step through to a session and on to a session step and its constituent parts.

A flame graph is made up of the “stack trace” on the y-axis, and the amount of time spent in each element on the x-axis. This is different from most other standard visualisations, where the x-axis represents the passage of time; instead the flame graph summarises the data at multiple levels of the stack trace hierarchy. The “stack trace” in this case with ODI is Load plan -> load plan step (load plan step […]) -> session -> session step -> task. It’s as easy to see the overall run time as it is a load plan step part way down, or a constituent task of a session step. And what’s more, flame graphs look nice! This may seem a flimsy reason for using them on their own, but it’s a bonus over trawling through dull tables of data alone.

Looking at the flame graph above (taken from a demo BI Apps implementation) it’s nice and easy to see that the Warehouse Load Phase accounts for c.75% of the time, within which the two areas accounting for most time are AP and AR balances. This is from literally a single glance at one graphic. Flame Graphs are built as SVGs which enables them to be interactive (here’s an example). Clicking on any of the stack trace boxes drills into that area, so for the tasks taking less time (and so displaying less text) this is useful to see the specifics. Here’s the GL balance load in detail, showing how long the row inserts take in proportion to the index build:

 

Creating the flame graph is simple. You just need a stack trace that is semi-colon separated, followed by a space-delimited counter value at the end. A bit of recursive SQL magic with the SNP_ tables (helpfully documented by Oracle here) gives us this kind of output file with one line for every task executed and its duration:

;Start_Load_Plan;Global_Variable_Refresh;Source_Extract_Phase;1_General;2_General_PRE-SDE;3_PRE-SDE_Day;Finalize_Day;Finalize_W_DAY_D;CREATE_INDEXES;Create_Indexes_:_W_DAY_D_2/2;EXEC_TABLE_MAINT_PROC;TABLE_MAINT_PROC;Create Indexes 3
[...]

which you then run through the Flame Graph tool:

cat /tmp/odi.out | ~/git/FlameGraph/flamegraph.pl --title "EBSVISION FIN HR_21_20141021_223159 / 2014-10-24 15:41:42" > /tmp/odi-flame-graph.svg

Simply load the resulting SVG into a web browser such as Chrome, and you’re done. Here’s an example that you can download and try out.
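If you want to build the input file yourself, the exact recursive SQL will depend on your ODI repository release, so treat the following as a sketch only: the table and column names are placeholders for the real SNP_ tables and columns (see the Oracle documentation linked above for those), but the general shape is a hierarchical query that walks the load plan step tree, builds the semicolon-separated path with SYS_CONNECT_BY_PATH and appends the duration as the counter value:

SELECT SYS_CONNECT_BY_PATH(step_name, ';') || ' ' || duration_secs AS flame_graph_line
FROM   lp_step_hierarchy   -- placeholder for a view joining the SNP_ load plan, session and task log tables
START WITH parent_step_id IS NULL
CONNECT BY PRIOR step_id = parent_step_id;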

Categories: BI & Warehousing

Why Use Data Mining with Data Warehouse?

Dylan's BI Notes - Wed, 2015-04-01 13:42
1. Use the data warehouse data as the training set Data Mining requires training data to train the learning algorithm.  The data warehousing processes provide the following services: Consolidate the data from different sources Aggregate the data: for example, we have the order return transactions but the training data can be # of returns […]
Categories: BI & Warehousing

Announcing Oracle E-Business Suite for Hadoop and MongoDB

Rittman Mead Consulting - Tue, 2015-03-31 23:50

Rittman Mead are very pleased today to announce our special edition of Oracle E-Business Suite R12 running on Apache Hadoop and MongoDB, for customers looking for the ultimate in scalability, flexible data storage and lower cost-of-ownership. Powered by Hadoop technologies such as Apache Hive, HDFS and MapReduce, optional reference data storage in MongoDB and reporting provided by Apache Pig, we think this represents the ultimate platform for large deployments of Oracle’s premier ERP suite.

In this special edition of Oracle E-Business Suite R12, we’ve replaced the Oracle Database storage engine with Hadoop, MapReduce and Apache Hive, with MapReduce providing the data processing engine and Apache Hive providing a SQL layer integrated with Oracle Forms. We’ve replaced Oracle Workflow with Apache Oozie, with MongoDB as the optional web-scale NoSQL database for document and reference data storage, freeing you from the size limitations of relational databases, the hassles of referential integrity and the restrictions of defined schemas. Developer access is provided through Apache Hue, or you can write your own Java MapReduce and/or JavaScript MongoDB API programs to extend E-Business Suite’s functionality. Best of all, there’s no need for expensive DBAs as developers handle all data-modeling themselves (with MongoDB’s collections automatically adapting to new data schemas), and HDFS’s three-node replication removes the need for complicated backup & recovery procedures.

We’ve also brought Oracle Reports into the 21st century by replacing it with Apache Pig, a high-level abstraction language for Hadoop that automatically compiles your “Pig Latin” programs into MapReduce code, and allows you to bring in data from Facebook and Twitter to combine with your main EBS dataset stored in Hive and MongoDB.

On the longer-term roadmap, features and enhancements we’re planning include:

  • Loosening the current INSERT-only restriction to allow UPDATES, DELETEs and full ACID semantics once HIVE-5317 is implemented. 
  • Adding MongoDB’s new write-reliability and durability so that data is always saved when EBS writes it to the underlying MongoDB collection
  • Reducing the current 5-30 minute response times to less than a minute by moving to Tez or Apache Spark
  • Providing integration with Oracle Discoverer 9iAS to delight end-users, and provide ad-hoc reporting truly at the speed-of-thought

For more details on our special Oracle E-Business Suite for Hadoop edition, contact us at enquiries@rittmanmead.com – but please note we’re only accepting new customers for today, April 1st 2015. 

Categories: BI & Warehousing

Business Intelligence Cloud Service – Data Modeler

Dylan's BI Notes - Mon, 2015-03-30 19:27
These videos show how data is loaded into BI Cloud Service and modeled as dimensions and facts. We do not need to use the BI Admin tool to create the model; for BICS, we can create the model using the browser.
Categories: BI & Warehousing

Oracle GoldenGate, MySQL and Flume

Rittman Mead Consulting - Mon, 2015-03-30 13:05

Back in September Mark blogged about Oracle GoldenGate (OGG) and HDFS. In this short follow-up post I’m going to look at configuring the OGG Big Data Adapter for Flume, to trickle-feed blog posts and comments from our site to HDFS. If you haven’t done so already, I strongly recommend you read through Mark’s previous post, as it explains in detail how the OGG BD Adapter works.  Just like Hive and HDFS, Flume isn’t a fully-supported target so we will use Oracle GoldenGate for Java Adapter user exits to achieve what we want.

What we need to do now is

  1. Configure our MySQL database to be fit for duty for GoldenGate.
  2. Install and configure Oracle GoldenGate for MySQL on our DB server
  3. Create a new OGG Extract and Trail files for the database tables we want to feed to Flume
  4. Configure a Flume Agent on our Cloudera cluster to ‘sink’ to HDFS
  5. Create and configure the OGG Java adapter for Flume
  6. Create External Tables in Hive to expose the HDFS files to SQL access

OGG and Flume

Setting up the MySQL Database Source Capture

The MySQL database I will use for this example contains blog posts, comments etc. from our website. We now want to use Oracle GoldenGate to capture new blog posts and our readers’ comments and feed this information into the Hadoop cluster we have running in the Rittman Mead Labs, along with other feeds, such as Twitter and activity logs.

The database has to be configured to use binary logging, and we also need to ensure that the socket file can be found at /tmp/mysql.socket. You can find the details for this in the documentation. We also need to make sure that the tables we want to extract from are using the InnoDB engine and not the default MyISAM one. The engine can easily be changed by issuing

alter table wp_mysql.wp_posts engine=InnoDB;
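For reference, the binary logging side of things boils down to a few lines in my.cnf; the values below are a sketch rather than a definitive list (check the OGG for MySQL documentation for your versions), but note that OGG capture needs row-based binary logging:

-bash-4.1$ cat /etc/my.cnf
[mysqld]
# enable binary logging in row format for OGG capture
log-bin        = mysql-bin
binlog_format  = ROW
server-id      = 1
# socket file in the location OGG expects
socket         = /tmp/mysql.socket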

Assuming we already have installed OGG for MySQL on /opt/oracle/OGG/ we can now go ahead and configure the Manager process and the Extract for our tables. The tables we are interested in are

wp_mysql.wp_posts
wp_mysql.wp_comments
wp_mysql.wp_users
wp_mysql.wp_terms
wp_mysql.wp_term_taxonomy

First configure the manager

-bash-4.1$ cat dirprm/mgr.prm 
PORT 7809
PURGEOLDEXTRACTS /opt/oracle/OGG/dirdat/*, USECHECKPOINTS

Now configure the Extract to capture changes made to the tables we are interested in

-bash-4.1$ cat dirprm/mysql.prm 
EXTRACT mysql
SOURCEDB wp_mysql, USERID root, PASSWORD password
discardfile /opt/oracle/OGG/dirrpt/FLUME.dsc, purge
EXTTRAIL /opt/oracle/OGG/dirdat/et
GETUPDATEBEFORES
TRANLOGOPTIONS ALTLOGDEST /var/lib/mysql/localhost-bin.index
TABLE wp_mysql.wp_comments;
TABLE wp_mysql.wp_posts;
TABLE wp_mysql.wp_users;
TABLE wp_mysql.wp_terms;
TABLE wp_mysql.wp_term_taxonomy;

We should now be able to create the extract and start the process, as with a normal extract.

ggsci>add extract mysql, tranlog, begin now
ggsci>add exttrail ./dirdat/et, extract mysql
ggsci>start extract mysql
ggsci>info mysql
ggsci>view report mysql

We will also have to generate metadata to describe the table structures in the MySQL database. This file will be used by the Flume adapter to map columns and data types to the Avro format.

-bash-4.1$ cat dirprm/defgen.prm 
-- To generate trail source-definitions for GG v11.2 Adapters, use GG 11.2 defgen,
-- or use GG 12.1.x defgen with "format 11.2" definition format.
-- If using GG 12.1.x as a source for GG 11.2 adapters, also generate format 11.2 trails.

-- UserId logger, Password password
SOURCEDB wp_mysql, USERID root, PASSWORD password

DefsFile dirdef/wp.def

TABLE wp_mysql.wp_comments;
TABLE wp_mysql.wp_posts;
TABLE wp_mysql.wp_users;
TABLE wp_mysql.wp_terms;
TABLE wp_mysql.wp_term_taxonomy;
-bash-4.1$ ./defgen PARAMFILE dirprm/defgen.prm 

***********************************************************************
        Oracle GoldenGate Table Definition Generator for MySQL
      Version 12.1.2.1.0 OGGCORE_12.1.2.1.0_PLATFORMS_140920.0203
...

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************
SOURCEDB wp_mysql, USERID root, PASSWORD ******
DefsFile dirdef/wp.def
TABLE wp_mysql.wp_comments;
Retrieving definition for wp_mysql.wp_comments.
TABLE wp_mysql.wp_posts;
Retrieving definition for wp_mysql.wp_posts.
TABLE wp_mysql.wp_users;
Retrieving definition for wp_mysql.wp_users.
TABLE wp_mysql.wp_terms;
Retrieving definition for wp_mysql.wp_terms.
TABLE wp_mysql.wp_term_taxonomy;
Retrieving definition for wp_mysql.wp_term_taxonomy.


Definitions generated for 5 tables in dirdef/wp.def.

Setting up the OGG Java Adapter for Flume

The OGG Java Adapter for Flume will use the EXTTRAIL created earlier as a source, pack the data up and feed it to the cluster Flume Agent, using Avro and RPC. The Flume Adapter thus needs to know:

  • Where is the OGG EXTTRAIL to read from
  • How to treat the incoming data and operations (e.g. Insert, Update, Delete)
  • Where to send the Avro messages to

First we create a parameter file for the Flume Adapter

-bash-4.1$ cat dirprm/flume.prm
EXTRACT flume
SETENV ( GGS_USEREXIT_CONF = "dirprm/flume.props")
CUSEREXIT libggjava_ue.so CUSEREXIT PASSTHRU INCLUDEUPDATEBEFORES
GETUPDATEBEFORES
NOCOMPRESSUPDATES
SOURCEDEFS ./dirdef/wp.def
DISCARDFILE ./dirrpt/flume.dsc, purge

TABLE wp_mysql.wp_comments;
TABLE wp_mysql.wp_posts;
TABLE wp_mysql.wp_users;
TABLE wp_mysql.wp_terms;
TABLE wp_mysql.wp_term_taxonomy;

There are two things to note here

  • The OGG Java Adapter User Exit is configured in a file called flume.props
  • The source tables’ structures are defined in wp.def

The flume.props file is a ‘standard’ User Exit config file

-bash-4.1$ cat dirprm/flume.props 
gg.handlerlist=ggflume

gg.handler.ggflume.type=com.goldengate.delivery.handler.flume.FlumeHandler
gg.handler.ggflume.host=bd5node1.rittmandev.com
gg.handler.ggflume.port=4545

gg.handler.ggflume.rpcType=avro
gg.handler.ggflume.delimiter=;
gg.handler.ggflume.mode=tx
gg.handler.ggflume.includeOpType=true
# Indicates if the operation timestamp should be included as part of output in the delimited separated values
# true - Operation timestamp will be included in the output
# false - Operation timestamp will not be included in the output
# Default :- true
gg.handler.ggflume.includeOpTimestamp=true

# Optional properties to use the transaction grouping functionality
#gg.handler.ggflume.maxGroupSize=1000
#gg.handler.ggflume.minGroupSize=1000

### native library config ###
goldengate.userexit.nochkpt=TRUE
goldengate.userexit.timestamp=utc
goldengate.log.logname=cuserexit
goldengate.log.level=INFO
goldengate.log.tofile=true
goldengate.userexit.writers=javawriter

gg.report.time=30sec
gg.classpath=AdapterExamples/big-data/flume/target/flume-lib/*

javawriter.stats.full=TRUE
javawriter.stats.display=TRUE
javawriter.bootoptions=-Xmx32m -Xms32m -Djava.class.path=ggjava/ggjava.jar -Dlog4j.configuration=log4j.properties

Some points of interest here are

  • The Flume agent we will send our data to is running on port 4545 on host bd5node1.rittmandev.com
  • We want each record to be prefixed with I(nsert), U(pdate) or D(elete)
  • We want each record to be postfixed with a timestamp of the transaction date
  • The Java class com.goldengate.delivery.handler.flume.FlumeHandler will do the actual work. (The curious reader can view the code in /opt/oracle/OGG/AdapterExamples/big-data/flume/src/main/java/com/goldengate/delivery/handler/flume/FlumeHandler.java)

Before starting up the OGG Flume extract, let’s first make sure that the Flume agent on bd5node1 is configured to receive our Avro messages (Source) and knows what to do with the data (Sink):

a1.channels = c1
a1.sources = r1
a1.sinks = k2
a1.channels.c1.type = memory
a1.sources.r1.channels = c1 
a1.sources.r1.type = avro 
a1.sources.r1.bind = bda5node1
a1.sources.r1.port = 4545
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.path = /user/flume/gg/%{SCHEMA_NAME}/%{TABLE_NAME} 
a1.sinks.k2.hdfs.filePrefix = %{TABLE_NAME}_ 
a1.sinks.k2.hdfs.writeFormat=Writable 
a1.sinks.k2.hdfs.rollInterval=0
a1.sinks.k2.hdfs.rollSize=1048576
a1.sinks.k2.hdfs.rollCount=0
a1.sinks.k2.hdfs.batchSize=100 
a1.sinks.k2.hdfs.fileType=DataStream

Here we note that

  • The agent’s source (inbound data stream) is to run on port 4545 and to use avro
  • The agent’s sink will write to HDFS and store the files  in /user/flume/gg/%{SCHEMA_NAME}/%{TABLE_NAME}
  • The HDFS files will be rolled over every 1Mb (1048576 bytes)
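On our cluster the agent configuration is deployed and the agent restarted through Cloudera Manager, but if you want to test a standalone agent by hand it can also be launched directly with flume-ng; the config file path below is just an illustration:

flume-ng agent --name a1 --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/gg-agent.conf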

We are now ready to head back to the webserver that runs the MySQL database and start the Flume extract, which will feed all committed MySQL transactions against our selected tables to the Flume Agent on the cluster, which in turn will write the data to HDFS:

-bash-4.1$ export LD_LIBRARY_PATH=/usr/lib/jvm/jdk1.7.0_55/jre/lib/amd64/server
-bash-4.1$ export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55/
-bash-4.1$ ./ggsci
ggsci>add extract flume, exttrailsource ./dirdat/et 
ggsci>start flume
ggsci>info flume
EXTRACT    FLUME     Last Started 2015-03-29 17:51   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:06 ago)
Process ID           24331
Log Read Checkpoint  File /opt/oracle/OGG/dirdat/et000008
                     2015-03-29 17:51:45.000000  RBA 7742

If I now submit this blogpost I should see the results showing up in our Hadoop cluster in the Rittman Mead Labs.

[oracle@bda5node1 ~]$ hadoop fs -ls /user/flume/gg/wp_mysql/wp_posts
-rw-r--r--   3 flume  flume   3030 2015-03-30 16:40 /user/flume/gg/wp_mysql/wp_posts/wp_posts_.1427729981456

We can quickly create an external table in Hive to view the results with SQL:

hive> CREATE EXTERNAL TABLE wp_posts(
     op string, 
 ID                     int,
 post_author            int,
 post_date              String,
 post_date_gmt          String,
 post_content           String,
 post_title             String,
 post_excerpt           String,
 post_status            String,
 comment_status         String,
 ping_status            String,
 post_password          String,
 post_name              String,
 to_ping                String,
 pinged                 String,
 post_modified          String,
 post_modified_gmt      String,
 post_content_filtered  String,
 post_parent            int,
 guid                   String,
 menu_order             int,
 post_type              String,
 post_mime_type         String,
 comment_count          int,
     op_timestamp timestamp
  )
 COMMENT 'External table ontop of GG Flume sink, landed in hdfs'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'
 STORED AS TEXTFILE
 LOCATION '/user/flume/gg/wp_mysql/wp_posts/';

hive> select post_title from gg_flume.wp_posts where op='I' and id=22112;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1427647277272_0017, Tracking URL = http://bda5node1.rittmandev.com:8088/proxy/application_1427647277272_0017/
Kill Command = /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/bin/hadoop job  -kill job_1427647277272_0017
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2015-03-30 16:51:17,715 Stage-1 map = 0%,  reduce = 0%
2015-03-30 16:51:32,363 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.88 sec
2015-03-30 16:51:33,422 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.38 sec
MapReduce Total cumulative CPU time: 3 seconds 380 msec
Ended Job = job_1427647277272_0017
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 3.38 sec   HDFS Read: 3207 HDFS Write: 35 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 380 msec
OK
Oracle GoldenGate, MySQL and Flume
Time taken: 55.613 seconds, Fetched: 1 row(s)

Please leave a comment and you’ll be contributing to an OGG Flume!

Categories: BI & Warehousing

Dimensional Modeling

Dylan's BI Notes - Mon, 2015-03-30 11:06
Moved the content into a page – Dimensional Modeling
Categories: BI & Warehousing

Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools

Rittman Mead Consulting - Fri, 2015-03-27 08:44

There comes a point in any sufficiently complex or difficult problem diagnosis where the log files in OBIEE alone are not sufficient for building up a complete picture of what’s going on. Even with the debug/trace data that Presentation Services and the other components can be configured to write, you’re sometimes just left having to guess what is going on inside the black box of each of the OBIEE system components.

Here we’re going to look at a couple of examples of lifting the lid just a little bit further on what OBIEE is up to, using standard Linux diagnostic tools. These are not something to reach for in the first instance; they’re more of a last resort. Almost always the problem is simpler than you think, and leaping for a network trace or stack trace is going to be missing the wood for the trees.

Diagnostics in action

At a client recently they had a problem with a custom skin deployment on a clustered (scaled-out) OBIEE deployment. Amongst other things the skin was setting the default palette for charts (viewui/chart/dvt-graph-skin.xml), and they were seeing only 50% of chart executions pick up the custom palette – the other 50% used the default. If either node was shut down entirely, things were fine, but otherwise it was a 50:50 chance which colours you would get. Most odd….

When you configure a custom skin in OBIEE you should be setting CustomerResourcePhysicalPath in instanceconfig.xml, along with CustomerResourceVirtualPath. Both these are necessary so that Presentation Services knows:

  1. Logical – How to generate URLs for content requested by the user’s browser (eg logos, CSS files, etc).
  2. Physical – How to physically reference files on the file system that are read by OBIEE itself (eg XML files, language files)

The way the client had configured their custom skin was that it was on storage local to each node, and in a node-specific path, something like this:

  • /data/instance1/s_custom/
  • /data/instance2/s_custom/

Writing out the details in hindsight always makes a problem’s root cause a lot more obvious, but at the time this was a tricky problem. Let’s start with the basics. Java Host is responsible for rendering charts, and for some reason, it was not reading the custom colour scheme file from the custom skin correctly. Presentation Services uses all the available Java Hosts in a cluster to request charts, presumably on some kind of round-robin basis. An analysis request on NODE01 has a 50:50 chance of getting its chart rendered on Java Host on NODE01 or Java Host on NODE02:


Turning all the log files up to 11 didn’t yield anything useful. For some reason half the time Java Host would just “ignore” the custom skin. Shutting down each node proved that in isolation the custom skin configuration on each node was definitely correct, because then the colours started working just fine. It was only when multiple Java Hosts across the nodes were active that there was a problem.

How Java Host picks up the custom skin is entirely undocumented, and I ended up figuring out that it must get the path to the skin as part of the chart request from Presentation Services. Since Presentation Services on NODE01 has been configured with a CustomerResourcePhysicalPath of /data/instance1/s_custom/, Java Host on NODE02 would fail to find this path (since on NODE02 the skin is located at /data/instance2/s_custom/) and so fall back on the default. This was my hypothesis, which I then proved by making the path for each skin available on each node (a symlink or a standard path, eg /data/shared/s_custom, would also have worked, or even better, a shared mount point), and from there everything worked just fine.
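One way to make each path resolve on both nodes is simply a symlink; for example, on NODE02 (with the mirror image on NODE01) it would look something like this:

mkdir -p /data/instance1
ln -s /data/instance2/s_custom /data/instance1/s_custom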

But a hypothesis and a successful resolution alone weren’t entirely enough. Sure, the client was happy, but there was that little itch: that unknown “black box” system that appeared to behave how I had deduced, but could we know for sure?

tcpdump – network analysis

All of the OBIEE components communicate with each other and the outside world over TCP. When Presentation Services wants a chart rendered it does so by sending a request to Java Host – over TCP. Using the tcpdump tool we can see that in action, and inspect what gets sent:

$ sudo tcpdump -i venet0 -i lo -nnA 'port 9810'

The -A flag captures the ASCII representation of the packet; use -X if you want ASCII and hex. Port 9810 is the Java Host listen port.

The output looks like this:


You’ll note that in this case it’s intra-node communication, i.e. src and dest IP addresses are the same. The port for Java Host (9810) is clear, and we can verify that the src port (38566) is Presentation Services with the -p (process) flag of netstat:

$ sudo netstat -pn |grep 38566
tcp        0      0 192.168.10.152:38566        192.168.10.152:9810         ESTABLISHED 5893/sawserver

So now if you look in a bit more detail at the footer of the request from Presentation Services that tcpdump captured you’ll see loud and clear (relatively) the custom skin path with the graph customisation file:


Proof that Presentation Services is indeed telling Java Host where to go and look for the custom attributes (including colours)! (NB this is on a test environment, so the paths vary from the /data/instance... example above.)

strace – system call analysis

So tcpdump gives us the smoking gun, but can we find the corpse as well? Sure we can! strace is a tool for tracing system calls, and a fantastically powerful one, but here’s a very simple example:

$ strace -o /tmp/obijh1_strace.log -f -p $(pgrep -f obijh1)

-o means to write it to file, -f follows child processes as well, and -p passes the process id that strace should attach to. Having set the trace running I run my chart, and then go and pick through my trace file.

We know it’s the dvt-graph-skin.xml file that Java Host should be reading to pick up the custom colours, so let’s search for that:
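What you’re looking for is a failed file-access system call against the path that Presentation Services supplied; it will look something like this (the exact path and syscall will vary):

$ grep dvt-graph-skin /tmp/obijh1_strace.log
12345 open("/data/instance1/s_custom/viewui/chart/dvt-graph-skin.xml", O_RDONLY) = -1 ENOENT (No such file or directory)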


Well there we go – Java Host went to look for the skin in the path it was given by Presentation Services, and couldn’t find it. From there it’ll fall back on the product defaults.

Right Tool, Right Job

As I said at the top of this article, these diagnostic tools are not the kind of things you’d be using day to day. Understanding their output is not always easy and it’s probably easy to do more harm than good with false assumptions about what a trace is telling you. But, in the right situations, they are great for really finding out what is going on under the covers of OBIEE.

If you want to find out more about this kind of thing, this page is a great starting point.

Categories: BI & Warehousing

New Oracle Big Data Quick-Start Packages from Rittman Mead

Rittman Mead Consulting - Wed, 2015-03-25 05:00

Many organisations using Oracle’s business intelligence and data warehousing tools are now looking to extend their capabilities using “big data” technologies. Customers running their data warehouses on Oracle Databases are now looking to use Hadoop to extend their storage capacity whilst offloading initial data loading and ETL to this complementary platform; other customers are using Hadoop and Oracle’s Big Data Appliance to add new capabilities around unstructured and sensor data analysis, all at considerably lower-cost than traditional database storage.

NewImage

In addition, as data and analytics technologies and capabilities have evolved, there has never been a better opportunity to reach further into your data to exploit more value. Big Data platforms, Data Science methods and data discovery technologies make it possible to unlock the power of your data and put it in the hands of your executives and team members – but what is it worth to you? What’s the value to your organisation of exploring deeper into the data you have, and how do you show a return?

Many organisations have begun to explore Big Data technologies to understand where they can exploit value and extend their existing analytics platforms, but what’s the business case? The good news is that, using current platforms, and following architectures like the Oracle Information Management and Big Data Reference Architecture written in conjunction with Rittman Mead, the foundation is in place to unlock a range of growth opportunities. Finding new value in existing data, predictive analytics, data discovery, reducing the cost of data storage and ETL offloading are all starter business cases proven to return value quickly.

To help you start on the Oracle big data journey, Rittman Mead have put together two quick-start packages focused on the most popular Oracle customer use-cases.

If this sounds like something you or your organization might be interested in, take a look at our new Quick Start Oracle Big Data and Big Data Discovery packages from Rittman Mead home page, or drop me an email at mark.rittman@rittmanmead.com and I’ll let you know how we can help.

Categories: BI & Warehousing

Source Dependent Extract and Source Independent Load

Dylan's BI Notes - Tue, 2015-03-24 17:40
A typical data warehousing ETL process involves Extract, Transform, and Load. The concepts of Source Dependent Extract (SDE) and Source Independent Load (SIL) are a unique part of the BI Apps ETL, since BI Apps has a universal data warehouse. Since the staging schemas are designed according to the universal data warehouse design, the logic of loading data […]
Categories: BI & Warehousing

RM BI Forum 2015 : Justification Letters for Employers

Rittman Mead Consulting - Tue, 2015-03-24 03:48

(Thanks to Christian Berg @Nephentur for the suggestion, and acknowledgements to ODTUG KScope for the original idea – our favourite conference after the BI Forum)

The Rittman Mead BI Forum 2015 promises to be our best BI Forum yet, with fantastic speakers at each event, keynotes and guest speakers from Oracle and John Foreman, author of the bestselling book “Data Smart”, a data visualisation challenge and an optional one-day masterclass on delivering Oracle’s new Information Management and Big Data reference architecture by Rittman Mead’s Mark Rittman and Jordan Meyer. Uniquely amongst Oracle BI events we keep the numbers attending very limited and run just a single stream at each event, so everyone takes part in the same sessions and gets to meet all the attendees and speakers over the three days.

Sometimes though, management within organizations require special justification for team members to attend events like these, and to help you put your case together and get across the unique education and networking benefits of the Rittman Mead BI Forum, we’ve prepared justification letters for you to complete with your details, one each for the Brighton and Atlanta events. Click on the links below to download sample justification letters for the Brighton BI Forum running on May 6th-8th 2015, and the Atlanta one running the week after on May 13th-15th 2015:

Full details on the BI Forum 2015 agenda and how to register can be found on the Rittman Mead BI Forum 2015 home page, with registration open until the weekend before each event – hurry though as attendee numbers are strictly limited.

Categories: BI & Warehousing

OBIEE nqcmd Tidbits

Rittman Mead Consulting - Mon, 2015-03-23 21:42

nqcmd is the ODBC command line tool that always has shipped, and hopefully always will ship, with OBIEE. It enables you to manually fire queries directly at the BI Server, rather than through the usual route of Presentation Services generating Logical SQL and sending it to the BI Server. This can be useful in several cases:

  1. Automated cache purging, by sending one of the SAPurge[…] ODBC commands to the BI Server, usually done as part of a script (there’s a quick example below)
  2. Automated execution of Logical SQL, often done to support testing scenarios
  3. Load Testing the BI Server (via a magic undocumented switch, SA_NQCMD_ADVANCED)
  4. Manual interrogation of the BI Server – if you want to poke and prod nqsserver without launching a web browser, nqcmd is your friend :)
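As a quick illustration of the first of those, a cache purge is just a one-line ODBC procedure call fed to nqcmd from a script file; the DSN and credentials below are the SampleApp ones used later in this post, so adjust to taste:

echo "Call SAPurgeAllCache();" > /tmp/purge.lsql
nqcmd -d AnalyticsWeb -u weblogic -p Admin123 -s /tmp/purge.lsql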

In using nqcmd there’re a couple of things I want to demonstrate here that I find useful but haven’t seen discussed [in detail] elsewhere.

Query Log via nqcmd

All BI Server queries run with a LOGLEVEL>=1 will write some log details to nqquery.log. The usual route to view this is either on the server itself directly, by transferring it off with a tool such as WinSCP, or through the Administration page of OBIEE. Another option that is available is from nqcmd itself. You need to do two things:

  1. Set the environment variable SA_NQCMD_ADVANCED to Yes
  2. Include the command line arguments -ShowQueryLog -H when you invoke nqcmd. I don’t know what -H does – it’s just specified as being required for this to work.

Here’s a simple example in action:

[oracle@demo ~]$ export SA_NQCMD_ADVANCED=Yes
[oracle@demo ~]$ nqcmd -d AnalyticsWeb -u prodney -p Admin123 -ShowQueryLog -H

-------------------------------------------------------------------------------
          Oracle BI ODBC Client
          Copyright (c) 1997-2013 Oracle Corporation, All rights reserved
-------------------------------------------------------------------------------



Connection open with info:
[0][State: 01000] [DataDirect][ODBC lib] Application's WCHAR type must be UTF16, because odbc driver's unicode type is UTF16

        [T]able info
        [C]olumn info
        [D]ata type info
        [F]oreign keys info
        [P]rimary key info
        [K]ey statistics info
        [S]pecial columns info
        [Q]uery statement
Select Option: Q

Give SQL Statement: SET VARIABLE LOGLEVEL=1:SELECT "A - Sample Sales"."Base Facts"."1- Revenue" s_1 FROM "A - Sample Sales"
SET VARIABLE LOGLEVEL=1:SELECT "A - Sample Sales"."Base Facts"."1- Revenue" s_1 FROM "A - Sample Sales"
-----------------------
s_1
-----------------------
70000000.00
-----------------------
Row count: 1
-----------------------
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-0] [] [ecid: 0054Sw944KmFw000jzwkno0003ac0000rl,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] ###
########################################### [[
-------------------- SQL Request, logical request hash:
d2294415
SET VARIABLE LOGLEVEL=1:SELECT "A - Sample Sales"."Base Facts"."1- Revenue" s_1 FROM "A - Sample Sales"

]]
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-34] [] [ecid: 0054Sw94mRzFw000jzwkno0003ac0000ro,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] -------------------- Query Status: Successful Completion [[

]]
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-28] [] [ecid: 0054Sw94mRzFw000jzwkno0003ac0000ro,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] -------------------- Physical query response time 0 (seconds), id <<333971>> [[

]]

]]
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-29] [] [ecid: 0054Sw94mRzFw000jzwkno0003ac0000ro,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] -------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 0, DB-connect time 0 (seconds) [[

]]
[2015-03-21T16:36:31.000+00:00] [OracleBIServerComponent] [TRACE:1] [USER-33] [] [ecid: 0054Sw94mRzFw000jzwkno0003ac0000ro,0] [tid: 56660700] [requestid: 201f0002] [sessionid: 201f0000] [username: prodney] -------------------- Logical Query Summary Stats: Elapsed time 0, Response time 0, Compilation time 0 (seconds) [[

]]

Neat! But so what? Well, I see two uses straight away:

  1. In some situations you may not have access to the filesystem of the server on which the BI Server is running. For example, as a consultant I’ve been to clients where I’m given the Administration Tool client installation only. If I want to debug an RPD that I’m developing I’ll usually want to poke around in nqquery.log to see quite what physical SQL is being generated – and now I can.
  2. There was a discussion on the EMG mailing list recently about generating Physical SQL without executing it on the database. I’m going to discuss this in the next section of this article, and to do the analysis for this rapidly I’m using the inline query log.
Generating Physical SQL for OBIEE without Executing it – SKIP_PHYSICAL_QUERY_EXEC

OBIEE generates the Physical SQL that it runs against the database dynamically, at runtime. It takes the Logical request (“Logical SQL”), runs it through the RPD and generates one or more “Physical SQL” statements to be executed on the database as required to pull back the necessary data. A question arose recently on the EMG mailing list as to whether it is possible to get the Physical SQL – without executing it. You can imagine the benefits of this (namely, regression testing) since executing the database query each time is typically going to be expensive in machine resources and time consuming.

In SampleApp v406 there is a /home/oracle/scripts/PhysicalSQLGenerator, which does two things. First off it generates the Logical SQL for a given analysis, presumably using the generateReportSQL web service. It then takes that and runs it through nqcmd, scraping the nqquery.log for the resulting Physical SQL. In all of this no database queries get run. Very cool. But what’s the “secret sauce” at play here – can we distill it down in order to use it ourselves?

First, let’s look at how the SampleApp script does it. It sets some additional request variables in the Logical SQL:

[oracle@demo PhysicalSQLGenerator]$ cat lsql-out-dir/q1.lsql
SET VARIABLE SKIP_PHYSICAL_QUERY_EXEC=1, LOGLEVEL=2, DISABLE_CACHE_HIT=1, DISABLE_CACHE_SEED=1, QUERY_SRC_CD='SampleApp-PSQLGEN', SAW_SRC_PATH='/users/prodney/folder/request variable example':SELECT
   0 s_0,
   "A - Sample Sales"."Base Facts"."1- Revenue" s_1
FROM "A - Sample Sales"
ORDER BY 1
FETCH FIRST 5000001 ROWS ONLY
;

And if we extract the relevant part out of the bash script we can see that it also uses a couple of extra command line arguments (-q -NoFetch) when invoking nqcmd:

nqcmd -q -NoFetch -d AnalyticsWeb -u weblogic -p Admin123 -s lsql-out-dir/q1.lsql

When it’s run we check nqquery.log and lo-and-behold we get this: (edited for brevity)

------------------- Sending query to database named 01 - Sample App Data (ORCL) (id: <<69923>>), connection pool named Sample Relational Connection, logical request hash dd4fb54f, physical request hash 8d6f36
3d: [[
WITH
SAWITH0 AS (select sum(T42442.Revenue) as c1
from
     BISAMPLE.SAMP_REVENUE_FA2 T42442 /* F21 Rev. (Aggregate 2) */ )
select D1.c1 as c1, D1.c2 as c2 from ( select distinct 0 as c1,
     D1.c1 as c2
from
     SAWITH0 D1 ) D1 where rownum <= 5000001

]]

Query Status: Successful Completion [[

Rows 0, bytes 24 retrieved from database query id: <<69923>> Simulation Gateway 

Physical query response time 0 (seconds), id <<69923>> Simulation Gateway

Whilst the log says it is “Sending query to database” it does no such thing, and the “Simulation Gateway” is the giveaway clue. Proof that it doesn’t connect to the database? I shut the database down, and it still worked just fine. Crude, yes, but effective.

I’ll intersperse here the little trick that I mentioned in the first part of this article: -ShowQueryLog. It’s tedious switching back and forth between nqcmd and the nqquery.log when doing this kind of testing, so let’s do it all as one:

export SA_NQCMD_ADVANCED=Yes
nqcmd -H -ShowQueryLog -q -NoFetch -d AnalyticsWeb -u weblogic -p Admin123 -s lsql-out-dir/q1.lsql

Unfortunately it looks like -ShowQueryLog is mutually exclusive to -q and -NoFetch since it doesn’t return anything, even though the nqquery.log did get additional entries. But that’s fine, since by removing these two flags in order to get -ShowQueryLog to work we’re whittling down what is actually needed to generate the physical SQL on its own without database execution. Here’s the nqcmd, showing the query log inline and showing still the “Simulation Gateway” indicative of no physical query execution:

[oracle@demo PhysicalSQLGenerator]$ export SA_NQCMD_ADVANCED=Yes
[oracle@demo PhysicalSQLGenerator]$ nqcmd -H -ShowQueryLog -d AnalyticsWeb -u weblogic -p Admin123 -s lsql-out-dir/q1.lsql

-------------------------------------------------------------------------------
          Oracle BI ODBC Client
          Copyright (c) 1997-2013 Oracle Corporation, All rights reserved
-------------------------------------------------------------------------------

[...]

------------------------------------
s_0          s_1
------------------------------------
------------------------------------
Row count: 0
------------------------------------
[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-0] [] [ecid: 0054Ut7AJ33Fw000jzwkno0005UZ00005Q,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] ############################################## [[
-------------------- SQL Request, logical request hash:
dd4fb54f
SET VARIABLE SKIP_PHYSICAL_QUERY_EXEC=1, LOGLEVEL=2, DISABLE_CACHE_HIT=1, DISABLE_CACHE_SEED=1, QUERY_SRC_CD='SampleApp-PSQLGEN', SAW_SRC_PATH='/users/prodney/folder/request variable example':SELECT
   0 s_0,
   "A - Sample Sales"."Base Facts"."1- Revenue" s_1
FROM "A - Sample Sales"
ORDER BY 1
FETCH FIRST 5000001 ROWS ONLY



[...]

[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-18] [] [ecid: 0054Ut7AK5DFw000jzwkno0005UZ00005S,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] -------------------- Sending query to database named 01 - Sample App Data (ORCL) (id: <<70983>>), connection pool named Sample Relational Connection, logical request hash dd4fb54f, physical request hash 8d6f363d: [[
WITH
SAWITH0 AS (select sum(T42442.Revenue) as c1
from
     BISAMPLE.SAMP_REVENUE_FA2 T42442 /* F21 Rev. (Aggregate 2) */ )
select D1.c1 as c1, D1.c2 as c2 from ( select distinct 0 as c1,
     D1.c1 as c2
from
     SAWITH0 D1 ) D1 where rownum <= 5000001

]]
[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-34] [] [ecid: 0054Ut7AYi0Fw000jzwkno0005UZ00005T,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] -------------------- Query Status: Successful Completion [[

]]
[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-26] [] [ecid: 0054Ut7AYi0Fw000jzwkno0005UZ00005T,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] -------------------- Rows 0, bytes 24 retrieved from database query id: <<70983>> Simulation Gateway [[

]]
[2015-03-23T05:52:57.000+00:00] [OracleBIServerComponent] [TRACE:2] [USER-28] [] [ecid: 0054Ut7AYi0Fw000jzwkno0005UZ00005T,0] [tid: 8f194700] [requestid: 8a1e0002] [sessionid: 8a1e0000] [username: weblogic] -------------------- Physical query response time 0 (seconds), id <<70983>> Simulation Gateway [[

[...]

It’s clear that the “-q -NoFetch” parameters used in nqcmd don’t have an effect on whether the physical query is executed (they’re to do with whether nqcmd as an ODBC client pulls back and displays the data you ask for). It’s actually just a single request variable that does the job, and it goes under the rather obvious name of SKIP_PHYSICAL_QUERY_EXEC. When set to 1 it generates all the necessary physical SQL but doesn’t execute it, and the presence of “Simulation Gateway” in the log signals this.
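Distilled right down then, and reusing the DSN, credentials and logical SQL from the earlier examples, generating the physical SQL without ever touching the database is as simple as this (the file path is arbitrary):

[oracle@demo ~]$ cat /tmp/skip_exec.lsql
SET VARIABLE SKIP_PHYSICAL_QUERY_EXEC=1, LOGLEVEL=2:SELECT "A - Sample Sales"."Base Facts"."1- Revenue" s_1 FROM "A - Sample Sales";

[oracle@demo ~]$ export SA_NQCMD_ADVANCED=Yes
[oracle@demo ~]$ nqcmd -H -ShowQueryLog -d AnalyticsWeb -u weblogic -p Admin123 -s /tmp/skip_exec.lsql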

Categories: BI & Warehousing