
BI & Warehousing

How Engaged Are Your OBIEE Users?

Rittman Mead Consulting - Thu, 2015-06-25 10:58

Following on from Jon’s blog post “User Engagement: Why does it matter?”, I would like to take this one step further by talking about measurement. At Rittman Mead we believe that if you can’t measure it, you can’t improve it. So how do you measure user engagement?

Metrics

User engagement for OBIEE works much like it does for most web-based products or services:

  • Both have users who access the product or service and then take actions.
  • Users of both use it repeatedly if they get value from those actions.

A lot of thought has gone into measuring the customer experience and engagement for web based products and services. Borrowing some of these concepts will help us understand how to measure user engagement for BI solutions.

We look at three metrics:

  • Frequency of use
  • Recency of use
  • Reach of the system
Usage Tracking Data

OBIEE offers visibility of what its users are doing through its Usage Tracking feature, which we can use to drive our metrics.

Figure 1


As we can see from Figure 1, the usage tracking data can support our three metrics.

Frequency of use
  • Number of times a user or group of users visits in a specific period (Day / Month / Year)
  • Number of times a dashboard / report is accessed in a specific period.
  • How are these measures changing over time?
Recency of use
  • How recently was a report / dashboard used by relevant user groups?
  • What is the average number of days between uses of each report / dashboard by each relevant user group? (A query sketch against the usage tracking data follows these lists.)
  • Number of dashboards / reports used or not used in a specific period (Day / Month / Year)
  • Number of users that have used or not used OBIEE in a specific period (Day / Month / Year)
  • How are these changing over time?
Reach of the system
  • Overall number of users that have used or not used OBIEE. This can be further broken down by user groups.
  • How is it changing over time?
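
As a concrete illustration of the recency and reach measures above, a query along the following lines can be run against the usage tracking data. This is a rough sketch only: it assumes the standard S_NQ_ACCT usage tracking table on an Oracle database, with USER_NAME, SAW_DASHBOARD and START_TS columns, so adapt the names to your own configuration.

-- Days since each dashboard was last used, and how many distinct users have ever used it
SELECT saw_dashboard                          AS dashboard,
       MAX(start_ts)                          AS last_used,
       TRUNC(SYSDATE) - TRUNC(MAX(start_ts))  AS days_since_last_use,
       COUNT(DISTINCT user_name)              AS distinct_users
FROM   s_nq_acct
WHERE  saw_dashboard IS NOT NULL
GROUP  BY saw_dashboard
ORDER  BY days_since_last_use DESC;
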
User engagement KPI perspective

We have compared BI solutions to web-based products and services earlier in this post. Let’s look at some popular KPIs that many web-based products use to measure engagement and how they can be used to measure OBIEE engagement.

  • Stickiness: Generally defined as the amount of time spent at a site over a given period.
  • Daily Active Users (DAU): Number of unique users active in a day.
  • Monthly Active Users (MAU): Number of unique users active in a month.

DAU and MAU are also used as a ratio (DAU / MAU) to give an approximation of utility.
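
Both KPIs can be calculated from the same usage tracking data. The sketch below, again assuming the standard S_NQ_ACCT table on an Oracle database, counts distinct daily and monthly active users; dividing a month's average DAU by its MAU then gives the DAU/MAU ratio.

-- Daily active users
SELECT TRUNC(start_ts)            AS activity_date,
       COUNT(DISTINCT user_name)  AS dau
FROM   s_nq_acct
GROUP  BY TRUNC(start_ts)
ORDER  BY activity_date;

-- Monthly active users
SELECT TRUNC(start_ts, 'MM')      AS activity_month,
       COUNT(DISTINCT user_name)  AS mau
FROM   s_nq_acct
GROUP  BY TRUNC(start_ts, 'MM')
ORDER  BY activity_month;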

The R&D division of Rittman Mead has developed the Rittman Mead User Engagement Toolkit, a set of tools and reports to capture and visualise user engagement metrics. The example charts given below have been developed using the R programming language.

Figure 2 – DAU over time with a trailing 30-day average (Red line)


Figure 3 – Forecast DAU/MAU for 30 days after the data was generated


What Can You Do With These Insights?

Recall that Jon's blog post points out the following drivers of user engagement:

  • User interface and user experience
  • Quality, relevance, and confidence in data
  • Performance
  • Ability to use the system
  • Accessibility – is the system available in the right way, at the right time?

There are several actions you can take to influence the drivers as a result of monitoring the aforementioned metrics.

  • Identify users or groups that are not using the system as much as they used to. Understand their concerns and address the user engagement drivers that are causing this.
  • Verify usage of any significant enhancement to the BI solution over time.
  • Analyse one of the key drivers, performance, from usage data.
  • Determine peak usage to project future hardware needs.
Conclusion

User engagement determines how much value users get from their OBIEE systems. Measuring it on an ongoing basis is important, and it can be monitored with a few standard metrics and KPIs.

Future blog posts in this series will address some of the key drivers behind user engagement in addition to providing an overview of the Rittman Mead User Engagement Toolkit.

If you are interested in hearing more about User Engagement please sign up to our mailing list below.


Categories: BI & Warehousing

Rittman Mead at ODTUG KScope’15, Hollywood Florida

Rittman Mead Consulting - Fri, 2015-06-19 05:50


ODTUG KScope’15 is running in Hollywood, Florida next week and Rittman Mead are running a number of sessions during the week on OBIEE, Essbase, ODI and Big Data. I’ve personally been attending ODTUG KScope (or “Kaleidoscope”, as it used to be known) for many years now and it’s the best developer-centric conference we go to, coupled with amazing venues and a great community atmosphere.

Sessions we’re running over the week include:

  • Gianni Ceresa : 2-in-1: RPD Magic and Hyperion Planning “Adapter”
  • Jerome : Manage Your Oracle Data Integrator Development Lifecycle
  • Michael Rainey : Practical Tips for Oracle Business Intelligence Applications 11g Implementations
  • Michael Rainey : GoldenGate and Oracle Data Integrator: A Perfect Match
  • Mark Rittman : Bringing Oracle Big Data SQL to OBIEE and ODI
  • Mark Rittman : End-to-End Hadoop Development Using OBIEE, ODI, and Oracle Big Data
  • Mark Rittman : Thursday Deep Dive – Business Intelligence: Bringing Oracle Tools to Big Data
  • Andy Rocha & Pete Tamisin : OBIEE Can Help You Achieve Your GOOOOOOOOOALS!

We’ll also be taking part in various “Lunch and Learn” sessions, community and ACE/ACE Director events, and you can also talk to us about our new OBIEE “User Engagement” initiative and how you can get involved as an early adopter. Details and agenda for KScope’15 can be found on the event website, and if you’re coming we’ll look forward to seeing you in sunny Hollywood, Florida!

Categories: BI & Warehousing

Replicating Hive Data Into Oracle BI Cloud Service for Visual Analyzer using BICS Data Sync

Rittman Mead Consulting - Thu, 2015-06-18 15:26

In yesterday's post on using Oracle Big Data Discovery with Oracle Visual Analyzer in Oracle BI Cloud Service, I mentioned mid-way through the article that I had to copy the Hadoop data into BI Cloud Service so that Visual Analyzer could use it. At present Oracle Visual Analyzer is only available as part of Oracle BI Cloud Service (BICS), so at some point the data prepared by Big Data Discovery has to be moved into BICS for Visual Analyzer (VA) to access it. In the future, once Visual Analyzer is available on-premise as part of Oracle Business Intelligence 12c, we'll be able to connect the on-premise RPD directly to Hadoop via the Cloudera Impala ODBC driver; but for now, to get early access to VA features, we have to copy the data up to BICS and report on it from there. So how does this work?

With this second release of BICS there are actually a number of ways to get on-premise data up into BICS’s accompanying database service:

  • As before, you can export data as CSV or an Excel Spreadsheet, and upload it manually into BICS using the Data Load feature (the approach I took in this recent Oracle Magazine article)
  • You can use SQL*Developer to SFTP “carts” of Oracle database data up into BICS, where it’ll then be unpacked and used to create Oracle tables in the accompanying database
  • You can now also connect BICS to the full Oracle Database-as-a-Service, a full database rather than a single schema that also provides a SQL*Net connection that ETL tools can connect to, for example ODI or Informatica
  • And there’s now a new utility called “Data Sync” that we’ll use in this example, to replicate tables or files up into BICS’s database store with options for incremental refresh, drop-and-reload and so forth

In our case the situation is a bit more complicated in that our data sits in a Hadoop cluster, as Hive tables that we're accessing through the Cloudera Impala MPP engine. OBIEE 11.1.1.9 can actually connect directly to Impala, and if we were just using Answers and Dashboards we wouldn't have any more work to do; but as we're using VA through BICS, and BICS can't access on-premise data sources, we need some way of copying the data up into BICS so VA can access it. Again, there are many ways you can get data out of Hive on Hadoop and into databases and files, but the approach I took is this:

  1. First, export each of the Hive tables accessed through the on-premise RPD into CSV files, in my case using the Hue web-based user interface in CDH5
  2. Then use Data Sync to upload the contents of those CSV files to BICS's database store, selecting the correct Oracle datatypes for each of the columns
  3. Do any modelling on those tables to add the sequences or keys that I'm going to need when working with BICS's more simplistic RPD modeller
  4. Then create a replica (or as close to a replica as possible) RPD model in BICS to support the work I'm going to want to do with VA

Again, there are also other ways to do this – another option is to just lift-and-shift the current 11.1.1.9 RPD up into BICS, replicate the Hive/CSV data into Oracle Database-as-a-Service and then repoint the uploaded RPD to this service – but I haven't got a DBaaS instance to hand and I think it's more useful to replicate using BICS and recreate the RPD manually, as that's what most customers using BICS will end up doing. So the first step is to export the Hive data out into CSV files using Hue, by first running a SELECT * FROM … for each table, then using the menu option to export the query results to a CSV file on my workstation.
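
If you'd rather script this step than click through Hue, a HiveQL statement along these lines achieves the same thing; the table name and directory below are hypothetical.

-- Sketch only: write a Hive table out as comma-delimited files, ready to be
-- transferred to the workstation and uploaded with Data Sync
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/rm_posts_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM rm_posts;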


Then it’s a case of setting up BICS Data Sync to first connect to my BICS cloud instance, and then selecting one-by-one the CSV files that I’ll be uploading into BICS via this tool.


Of course anyone who's been around Oracle BI for a while will recognise Data Sync as being built on the DAC, the ETL orchestration tool that came with the 7.9.x releases of BI Apps and worked in conjunction with Informatica PowerCenter to load data into the BI Apps data warehouse. The DAC is actually a pretty flexible tool (disclaimer – I know the development PMs at Redwood Shores and think they're a pretty cool bunch of people) and more recently it gained the ability to replicate BI Apps DW data into TimesTen for use with Exalytics, so its pluggable architecture and active development team made it a useful platform to deliver something in-between BICS's APEX data uploader and fully-fledged ODI loading into Oracle DBaaS. The downside of using something built on the DAC is that the DAC had some UI "quirks", but equally the upside is that if you know the DAC, you can pretty much pick up Data Sync and guess how it works.

As part of uploading each CSV file, I also get to sample the file contents and confirm the datatype choices that Data Sync has suggested; these can of course be amended, and if I’m bringing in data from Oracle, for example, I wouldn’t need to go through such an involved process. 


Then it's a case of uploading the data. In my case one of the tables uploaded OK the first time, but an issue I hit was that Hive tables with numeric columns containing NULLs had those NULLs exported as the text "NULL", which then caused the load to fail when trying to insert them into numeric columns. Again, a bit of knowledge of how the DAC worked came in useful as I went through the log files and then restarted parts of the load – in the end I replaced the word NULL with an empty string and the loads then succeeded.
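
Another way around the problem is to deal with it at export time rather than editing the CSV files afterwards; a HiveQL sketch, with hypothetical table and column names, might look like this.

-- Sketch only: cast numeric columns to strings and blank out the NULLs,
-- so the exported CSV loads cleanly into numeric columns in BICS
SELECT post_id,
       COALESCE(CAST(page_views  AS STRING), '') AS page_views,
       COALESCE(CAST(tweet_count AS STRING), '') AS tweet_count
FROM   rm_logs_summary;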


Now that the data is uploaded to BICS, you can check out the new tables and their contents either from within BICS's Data Modeller function, or from within the APEX console that comes with BICS's database component.


One thing I did know I’d have to give some thought to was how to do the types of RPD modelling I’d done in the on-premise RPD, within the more constrained environment of the BICS data modeller. Looking back at the on-premise RPD I’ve made a lot of use of aliases to create fact and dimension versions of key log tables (posts, tweets) and multiple versions of the date dimensions, whereas in BICS you don’t get aliases but you can create database views. What was more worrying was that I’d used columns from the main webserver log table to populate both the main logical fact table and another dimension whilst still keeping a single source table as their physical source, but in BICS I’d have to create these two sources as views and then join them on a common key, which would be tricky as the log table in Hive didn’t have an obvious primary key. In the end I “cheated” a bit and created a derived copy of the incoming log file table with a sequence number added to it, so that I could then join both the derived fact table and dimension table on this synthetic unique key column.
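
The derived copy can be created with a simple CTAS statement in Hive, using a window function to generate the synthetic key. This is a sketch only, with hypothetical table and column names, and assumes the logs have a request timestamp to order by.

-- Sketch only: add a synthetic unique key to the incoming log table so that
-- the derived fact and dimension views can join on it in the BICS model
CREATE TABLE rm_access_logs_keyed AS
SELECT ROW_NUMBER() OVER (ORDER BY request_ts) AS log_sk,
       t.*
FROM   rm_access_logs t;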


Now it’s a case of modelling out the various incoming tables uploaded via Data Sync into the facts and dimensions that the BICS data model will use; again something to be aware of is that each of these tables will need to join to its relevant dimensions or facts, so you need to leave the joining keys in the fact table rather than remove them as you’d do when creating logical fact tables in on-premise OBIEE.


Tables that only perform one role, for example the IMP_RM_POSTS_VA table that contains details of all blog posts and web pages on our site, can be brought into the model as they are without creating views. When I add the time dimension table in for a second time, this time to create a time dimension role table for the Tweets fact table, I have to create a view over the table that performs a similar role to alias tables in on-premise OBIEE, and I'm then directed to create a fact or dimension object in the model from that view.


Once this is all done, I end up with a basic BICS data model that’s starting to look like the one I had with the on-premise OBIEE install.


Then finally, once I'd amended all the column names and brought in all of the additional columns and lookup tables to provide, for example, lists of Twitter user handles, I could view the model in BICS's Visual Analyzer and start producing data visualisation projects from it.


So – it's admittedly a bit convoluted in the first iteration, but once you've set up the BICS data model and the Data Sync upload process, you can use Data Sync to refresh the replicated Hive table data in the BICS database instance and keep the two systems in sync. As I said, OBIEE12c will come with Visual Analyzer as part of the on-premise install, but until then this is the way we link VA to Big Data Discovery on Hadoop to enable Visual Analyzer access to BDD's enriched datasets.

Categories: BI & Warehousing

Introducing Rittman Mead’s New User Engagement Service

Rittman Mead Consulting - Thu, 2015-06-18 05:15


Although you've been hearing a lot on the blog recently about extending Oracle BI&DW with Hadoop and big data technologies, another initiative Rittman Mead have been working on over the past couple of months is user engagement. All too often developers create BI systems that meet all of an organisation's technical goals, but then fail to get adopted by users for reasons we can't understand; all we know is that our brilliantly-architected system just doesn't seem to engage users, and that's just as fatal to a project as the ETL not working or the underlying technology being out-of-date.

Surveys have shown that adoption rates for new BI systems are often as low as 25%: we don't focus enough on the user experience (UX) or user interface, queries run too slowly, the data or reports just aren't relevant, or the overall experience just isn't up to the standard that internet users now expect. If you're a BI manager or a CIO for your organisation, it's essential to know whether the BI systems you provide are actually being used. Quite often a BI system provided by the IT department is thought to be well-used by the end-users, but when you check the usage stats you find that engagement has really fallen off since the initial rollout of the system. The worst thing is, you probably don't really know why; all you know is that users aren't happy and they're now looking to implement self-service tools that will break your data governance model.

To help you improve user engagement for your BI system and increase the return on your investment in Oracle technology, Rittman Mead are developing a User Engagement Service to address these issues, along with a User Engagement Toolkit developed by the likes of Robin Moffatt, Jordan Meyer, Tom Underhill and other OBIEE and UX experts within Rittman Mead. Initial details of the service are on our Rittman Mead User Engagement Service homepage and the Rittman Mead User Engagement Service Datasheet. Look out for more information on these services over the coming days and weeks, and if you're interested in getting involved in our early-adopter program you can sign up using the form below and we'll get back to you shortly.


Categories: BI & Warehousing

Combining Oracle Big Data Discovery and Oracle Visual Analyzer on BICS

Rittman Mead Consulting - Wed, 2015-06-17 13:11

So now that Oracle Visual Analyzer is out as part of Oracle BI Cloud Service, and Visual Analyzer (VA) is due to ship on-premise as part of OBIEE12c sometime in the next twelve months, several of our customers have asked us if they need both VA and Oracle Big Data Discovery if they’re looking to analyse Hadoop data as part of a BI project. It’s an interesting question so I thought it’d be useful to go through my thoughts on how the two tools work together, when to use one, and when to use the other.

Taking our standard "big data" dataset of website log activity, Twitter mentions and page details from our WordPress blogging software, before Visual Analyzer came along the two usual ways we'd want to analyze these datasets were a traditional BI metrics-analysis scenario, and a data discovery/visualization scenario where we're more interested in the content of the data than in precise metrics. My half of the recent BI Forum 2015 Masterclass goes through these two scenarios in detail (presentation slides in PDF format here), and it's Big Data Discovery that provides the more "Tableau"-type experience, with fast point-and-click access to both datasets joined together on their common website page URL details.


Now we have Visual Analyzer though, things get interesting; in my article on Visual Analyzer within BICS I showed a number of data visualisations that look pretty similar to what you’d get with Big Data Discovery, and when we have VA available on-site as part of OBIEE12c we’ll be able to connect it directly to Hadoop via Cloudera Impala, potentially analyzing the whole dataset rather than the (representative) sample that Big Data Discovery loads into its Endeca Server-based engine.


So if the customer is looking to analyze data held in Hadoop and Visual Analyzer is available, where's the value in Big Data Discovery (BDD)? To my mind there are three areas where BDD goes beyond what VA can do, or helps you perform tasks that you'll need to do before you can work with your data in VA:

  • The initial data discovery, preparation and cleansing that you’d otherwise have to do using HiveQL or an ETL tool such as ODI12c
  • Providing you with a high-level overview and landscape of your data, when that’s more important to you at the time than precise counts and drill-down analysis
  • Understanding how data joins together, and how best to use your datasets in terms of metrics, facts, dimensions and so forth

Taking the data preparation and cleansing part first, I've covered in several blogs over the past couple of years how tools such as ODI can be used to create formal, industrialized data pipelines to ingest, prepare and then summarise data coming into your Hadoop system, and how you can drop down to languages such as HiveQL, Pig and Spark to code these data transformations yourself. In the case of my webserver log, Twitter and page details datasets this work would include standardising URL formats across the three sources, geocoding the IP addresses in the access logs to derive the country and city for site visitors, turning dates and times in different formats into ones that work as Hive timestamps, and so forth. Doing all of this using ODI and/or HiveQL can be a pretty technical task, which is where BDD comes in useful even if VA and an OBIEE RPD are the final destination for the data.
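
To give a flavour of that HiveQL, here's a minimal sketch of the timestamp and URL standardisation steps. The table and column names are hypothetical, and the IP geocoding would typically be done with a UDF or a join to an IP-range lookup table rather than in plain HiveQL.

-- Sketch only: convert Apache-style log dates into Hive timestamps and tidy up
-- the URL format so the logs join cleanly to the posts and tweets datasets
SELECT from_unixtime(unix_timestamp(request_date, 'dd/MMM/yyyy:HH:mm:ss')) AS request_ts,
       regexp_replace(lower(request_url), '/+$', '')                       AS page_url,
       host_ip
FROM   rm_access_logs;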


Datasets that you transform and enrich in Big Data Discovery can be saved back to Hive as new Hive tables, or exported out as files for you to load into Oracle using SQL*Developer, or upload into BICS to use in Visual Analyzer. Where BDD then becomes useful is giving you a quick, easy to use overview of your dataset before you get into the serious business of defining facts, dimensions and aliases against these three Hive tables. The screenshots below show a couple of typical Big Data Discovery Studio data visualisation pages against the webserver logs dataset, and you can see how easy it is to create simple charts, tag clouds and maps against the data you’re working with – the aim being to give you an overview of the data you’re working with, help you understand its contents and “shape”, before moving further down the curation process and applying formal structures to the data.


Where things get harder within Big Data Discovery is when more and more formatting, complex joining and "arranging" of the data is required; for example, BDD gives you a lot of flexibility in how you join datasets, but this flexibility can be confusing for end-users if they're then presented with every possible variation of a three-table join rather than having the data presented to them as simple facts and dimensions. But this is really what we'd expect – if you go back to the logical data architecture I went through in a blog post a while ago about the updated Oracle Information Management Reference Architecture, the trade-off in using schema-on-read data reservoirs is that this data, although quick and cheap to store, requires a lot more work each time you access it to get "value" from it.


OBIEE, in contrast, makes you define your data structures in full before you present data to end-users, dividing the data in the three datasets into measures (for the fact tables) and attributes (for dimensions), making it possible to add more dimension lookups (for a date dimension and, in this case, for Twitter users), and separating the overall set of data into more focused subject areas. Working with the dataset on the on-premise version of OBIEE first, the RPD that I created to present this data in a more formal, dimensional and hierarchical way to users looked like this:


I can leave this RPD connected directly to the underlying Hive and Impala tables if I just want to use Answers and Dashboards, but for the time being I need to export the underlying Hive tables into CSV files or into an Oracle Database before uploading into Visual Analyzer; come OBIEE12c this should all be seamless. What users are then presented with when they go into Visual Analyzer is something like this:


Notice how the various attributes of interest are grouped into fact and dimension table folders, and there’s a simple means to add calculations, change the visualisation type and swap chart settings around. Note also that the count on the screen is the actual count of records in the full dataset, not the sample that BDD takes in order to provide an overview of values and distribution in the full dataset. Whilst it’s relatively easy to create a line chart, for example, to show tweets per user within BDD, using Visual Analyzer it’s just a case of double-clicking on the relevant measures and attributes on one side of the page, selecting and arranging the visualisation and applying any filters using dialog boxes and value-selectors – all much more familiar and obvious to BI users.


Enrichment I've done to the data in Big Data Discovery should in most cases come through to Visual Analyzer; for example, I used Big Data Discovery's text enrichment features to determine the sentiment of blog post titles, tweets and other commentary data, and I could use the latitude and longitude values derived during the visitor IP address geocoding to plot site visitors on a map. Using the sentiment value derived from the post title, tweet contents and other textual data, I can create a chart of our most popular posts mentioned on Twitter and colour the bars to show how positive, or negative, the comments about each post were.


The only thing that Visual Analyzer can't yet do that would be useful is to include more than one subject area in a project. To analyze the number of tweets and the number of page views for posts in a scatter chart, for example, I currently have to create a separate subject area that includes both sets of facts and dimensions, though I understand VA on BICS will gain the ability to include multiple subject areas in a forthcoming release.


So in summary, I'd say that Big Data Discovery, and Visual Analyzer as part of BI Cloud Service, are complementary tools rather than one being able to replace the other in a big data context. I find that Big Data Discovery is a great tool to initially understand, catalog and view at a high level the data sources going into VA, and then to do some user-driven cleaning up, enhancing and enriching of the data before committing it to the formal dimensional model that Visual Analyzer requires.

In its BICS guise there’s the additional step of having to export the Hadoop data out of your Big Data Appliance or other Hadoop cluster and upload it in the form of files using BICS’s data load or the new Data Sync utility, but when VA comes as part of OBIEE12c in the next twelve months you’ll be able to directly connect to the Hadoop cluster using Impala ODBC and analyse the data directly, in-place.

I’ll be covering more on BICS over the next few weeks, including how I got data from Hadoop into BICS using the new Data Sync utility.

Categories: BI & Warehousing

Can Better Visual Design Impact User Engagement?

Rittman Mead Consulting - Tue, 2015-06-16 04:44
Background

For every dashboard succinctly displaying key business metrics, there’s another that is a set of unconnected graphs which don’t provide any insight to its viewers.

In order for your users to get value from your business intelligence and analytics systems they need to be engaging, they need to tell a story.

As part of its User Engagement initiative Rittman Mead has created a User Engagement Service. A key part of this is a Visual Redesign process. Through this process, we review an organisation’s existing dashboards and reports and transform them into something meaningful and engaging.

This service focuses on the user interface and user experience; here we will use our expertise in data visualisation to deliver high value OBIEE dashboards.

The process starts by prioritising your dashboards and then, taking one at a time, rebuilds them. There are 3 key concepts that lie behind this process.

Create a guided structure of information

The layout of information on a dashboard should tell the user a story. This makes it much easier for data to be consumed because users can identify related data and instantly see what is relevant to them. If a user can consume the data they need easily, they’re more likely to come back for more. ‘The founder of modern management’, Peter Drucker said “If you can’t measure it, you can’t manage it.” When users are in touch with their data, they will be more engaged with the business.


To achieve this, first we must consider the audience. We need to know who will be consuming the dashboards and how they will be used. Then we can begin to create a design that will satisfy the users’ needs. Secondly, we need to think about what we exclude as much as what we include on the dashboard. There is limited space, so it needs to be used effectively. Only information that adds value should be displayed on a dashboard and everything should be there for a reason. Finally, we need to consider what questions the users want to answer and what decisions will be based on this. This will enable us to guide the user to the information that will help them take action.

Choose the right visuals

One common design mistake is overcrowding of dashboards. Dashboards often develop over time to become a jumbled array of graphs and tables with no consideration of the visual design.

The choice of graphs determines how readable the information being displayed on a dashboard is. We constantly ask ourselves "What is the best graph for the data?" Understanding how different types of graphs answer different questions allows us to make the best visual choices. This is a vital tool for communicating messages to the users and providing them with the ability to identify patterns and relationships in the data more efficiently.

Thoughtful use of colour

The use of colour is an effective way to draw attention to something, connect related objects and evoke users’ emotions. Thoughtful use of colour can have a big impact on user engagement. To be sure we choose the best colours throughout dashboard design, the key question we need to ask ourselves is, “how will these colours make the user feel?”

Like the charts themselves, every different colour used on a dashboard should be there for a reason. Intentional use of colour could determine how a user will feel whilst consuming the information being displayed to them. Bright, unnatural colours will alarm users and attract their attention. Cool colours will give a restful, calming feel to the user and are most effective for displaying sustained trends. Through taking into consideration the most effective way to use colour, we can work towards creating an attractive visual design, which is engaging and enjoyable to use.

Applying these three concepts through Rittman Mead's visual redesign process has proven to result in engaging OBIEE dashboards. Users are equipped to make the most of their data, allowing them to make informed business decisions.

Rittman Mead's Visual Redesign process is a key part of Rittman Mead's User Engagement Service; for more information see http://www.rittmanmead.com/user-engagement-service/.

If you are interested in hearing more about User Engagement please sign up to our mailing list below.



Categories: BI & Warehousing

Next Generation Outline Extractor 2.0.4.887 Released

Tim Tow - Sun, 2015-06-14 08:36
We recently released an updated version of the Next Generation Outline Extractor. This new version, 2.0.4.887, addresses three issues:


  • Fixed an issue where the username and password passed via the command line were improperly logged
  • Fixed an issue reading MaxL XML data sources when the alias or UDA contained XML-encoded characters such as the ampersand (&) character.
  • Updated labels on the Input Source tab of the user interface to clarify their purpose.
Here is a screenshot showing the updated labeling.


Due to the architecture of the Oracle Essbase APIs, it is generally much faster to use the MaxL Outline XML extracts when processing an Essbase Outline extract.  The Next Generation Outline Extractor still uses the Essbase Java API during this extract, but it is able to minimize the number of calls.  The second option shown above, Extract and Process MaxL Outline XML, will automatically extract the Outline XML from the cube during the processing.  The third option shown, Use Previously Extracted MaxL Outline XML, uses (obviously) an Outline XML file that has already been extracted.
Thank you to everyone who reported issues or made suggestions as you help make this utility better!
Categories: BI & Warehousing

Practical Tips for Oracle BI Applications 11g Implementations

Rittman Mead Consulting - Mon, 2015-06-08 06:35

As with any product or technology, the more you use it the more you learn about the “right” way to do things. Some of my experiences implementing Oracle Business Intelligence Applications 11g have led me to compile a few tips that will improve the overall process for installation and configuration and make the application more maintainable in the future. You can find me at KScope15 in Hollywood, FL beginning June 21st, presenting this exact topic. In this post I want to give you a quick preview of a couple of the topics in my presentation.

Data Extract Type – Choose Wisely

Choosing how the data is extracted from the source and loaded to the data warehouse target is an important part of the overall ETL performance in Oracle BI Applications 11g. In BI Apps, there are three extract modes to choose from:

  • JDBC mode
    This default mode will use the generic Loading Knowledge Modules (LKM) in Oracle Data Integrator to extract the data from the source and stream it through the ODI Agent, then down to the target. The records are streamed through the agent to translate datatypes between heterogeneous data sources. That makes the JDBC mode useful only when the source database is non-Oracle (since the target for BI Apps will always be an Oracle database).
  • Database link mode
    If your source is Oracle, then the database link mode is the best option. This mode uses the database link functionality built-in to the Oracle database, allowing the source data to be extracted across this link. This eliminates the need for an additional translation of the data as occurs in the JDBC mode.
  • SDS mode
    This should really be called “GoldenGate mode”, but I’m sure Oracle wants to keep their options open. In this mode, Oracle GoldenGate is used to replicate source transactions to the target data warehouse in what is called a Source Dependent Data Store (SDS) schema. This SDS schema mimics the source schema(s), allowing the SDE process to extract from the DW local SDS schema rather than across the network to the actual source.
    If the use of GoldenGate is an option, it's hands-down better than JDBC mode should you be extracting data from a non-Oracle source. (Have a look at my OTN ArchBeat 2-minute Tech Tip as I attempt to beat the clock while discussing when to use GoldenGate with BI Apps!) A simplified sketch of how the extract differs between database link mode and SDS mode follows this list.
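
To make the difference concrete: in database link mode the extract reads the source table across the network, whereas in SDS mode the same extract reads the local, GoldenGate-maintained copy. The sketch below is purely illustrative – the real mappings are generated by the ODI Knowledge Modules, and the table, schema and database link names are hypothetical.

-- Database link mode: the SDE mapping pulls source rows across a db link
SELECT line_id, header_id, ordered_quantity
FROM   oe_order_lines_all@ebs_source_dblink;

-- SDS mode: the same extract reads the local SDS replica instead,
-- avoiding the round trip to the source system
SELECT line_id, header_id, ordered_quantity
FROM   sds_ebs.oe_order_lines_all;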

OBIA architecture

Let's go into a bit more detail about using GoldenGate with BI Applications. Because the SDS is set up to look exactly like the source schema, the Oracle Data Integrator pre-built Interfaces can change which source they use from within the Loading Knowledge Module (LKM) by evaluating a variable (IS_SDS_DEPLOYED) at various points throughout the LKM. Using this approach, the GoldenGate integration can be easily enabled at any point, even after initial configuration. The Oracle BI Applications team did a great job of utilizing the features of ODI that allow the logical layer to be abstracted from the physical layer and data source connection. For further information about how to implement Oracle GoldenGate with Oracle BI Applications 11g, check out the OTN technical article I wrote, which describes the steps for implementation in detail.

Disaster Recovery

If the data being reported on in BI Applications is critical to your business, you probably want a disaster recovery process. This will involve an entirely separate installation on a full server stack located somewhere well away from the production servers. Now, there are many different approaches to DR with each of the products involved in BI Applications – OBIEE, ODI, databases, etc. – but I think this approach is simpler than many others.

BI Apps DR Architecture

The installation of BI Apps would occur at each site (primary and standby) as a standalone installation. It's critical that you have a well-defined, hopefully scripted and automated, process for installation and configuration, since everything will need to be exactly the same between the two sites. Looking at the architecture diagram above, you can see the data warehouse, ODI repository, and BIACM repository schemas will be replicated from primary to standby via Oracle Data Guard. The OBIEE metadata repositories are not replicated because much of the configuration information is stored in files rather than in the database schema.

With the installation and configuration identical, any local, internal URLs will be set up to use the local site URL (e.g. http://biapps.rittmanmead-primary.com). The external URLs, such as the top-level site (e.g. http://biapps.rittmanmead.com/biacm) or database JDBC connection URLs, will all use canonical names (CNAMEs). A CNAME is really just an alias used in the DNS, allowing an easy switch when redirecting from one site to another. For example, the CNAME biapps.rittmanmead-primary.com will have an alias of biapps.rittmanmead.com. This alias will switch to point to biapps.rittmanmead-standby.com during the failover / switchover process.

We can now run through a few simple steps to perform the failover or switchover to the standby server.

  • Update Global CNAMEs
  • Switch primary database via DataGuard
  • Update the Web Catalog and Application Role assignments
  • Start NodeManager, OHS, WebLogic AdminServer
  • Update Embedded LDAP User GUID in ODI (if necessary)
  • Start BI and ODI Managed Servers
  • Update and Deploy the RPD
  • Start the BI Services

Looks pretty straightforward, right? With the appropriate attention to detail up front during the installation and configuration, it becomes simple to maintain and perform the DR switchover and failover. I’ll go into more detail on these topics and others, such as installation and configuration, LDAP integration, and high availability, during my presentation at KScope15 later this month. I hope to see you there!

 

Categories: BI & Warehousing

New In Oracle BI Cloud Service – Oracle Visual Analyzer, and Data Mashups in VA

Rittman Mead Consulting - Fri, 2015-06-05 13:27

Oracle released an update to Oracle BI Cloud Service a few weeks ago that included Oracle Visual Analyzer, along with some other improvements including support for full Oracle Database-as-a-Service as the database backend, the ability to upload RPDs and run them in the cloud, and support for a new utility called Data Sync. In this post though I want to take a quick look at Visual Analyzer, and in particular look at the data-mashup feature it provides.

Visual Analyzer is of course one of the tentpole features in OBIEE12c that we’ve all been looking forward to, as is 12c’s ability to allow users to upload spreadsheets of data and join them to existing subject areas in Answers. I’m covering Visual Analyzer in an upcoming edition of Oracle Magazine so I won’t go into too much detail on the product at a high-level here, but in summary Visual Analyzer provides a single-pane-of-glass, Tableau-type environment for analysing and visualising datasets stored in Oracle Cloud Database and modelled in BICS’s cut-down web-based data-modeller. In the Oracle Magazine article I take the Donors Choose dataset that we featured at the recent Rittman Mead BI Forum 2015, and create a range of visualizations as I explore the dataset and pick the type of project I’d most like to donate to.


Visual Analyzer differs from Answers in that all of the available data items are listed down one side of the page, there's no flicking backwards and forwards between the Criteria tab and the Results tab, filters are set by just right-clicking on the column you wish to filter by, and the visualisation builds up in front of your eyes as you add more columns, move things around and arrange the data to get the most appropriate view of it.

From an IT manager's perspective, where Visual Analyzer improves on desktop analysis tools such as Tableau and Spotfire is that the data you work with is the same governed dataset that Answers and Dashboards users work with, and the same security rules and auditing apply to you as to other Presentation Services and Catalog users. At the same time, those "self-service" users who just want to play around with and explore the data, rather than create reports and dashboards for mass consumption, can now work with the type of tool they've up to now had to look elsewhere for.


One of the other headline features for OBIEE12c announced at last year’s Oracle Openworld is “Model Extensibility and Data Mashup”. Announced as part of Paul Rodwick’s “Business Analytics and Strategy Roadmap” session and described in the slide below, this feature extends the capabilities of the BI Server to now handle data the user uploads from the Answers (and now Visual Analyzer) report creation page, joining that data as either “fact extensions” or “measure extensions” to an existing Presentation Services subject area. 


I won’t go into the technical details of how this works at this point but in terms of how it looks to the end-user, let’s consider a situation where I’ve got a spreadsheet of additional state-level data that I’d like to use in this Visual Analyzer (VA) project, to in this case colour the states in the map based on the income level of the people living there. The spreadsheet of data that I’ve got looks like this:


Note the cunningly-named columns in the first row – they don’t have to match the column names in your VA data model, but if they do as you’ll see in a moment it speeds the matching process up. To add this spreadsheet of data to my VA project I therefore switch the menu panel on the left to the Data Sources option, right-click and then choose Add Data Source…


Then using the Add Data Source dialog, upload the XLSX file from your workstation. In my instance, because I named the columns in the top row of the spreadsheet to match the column names already in the BICS data model, it’s matched the SCHOOL_STATE column in the spreadsheet to the corresponding column in the SCHOOLS table and worked out that I’m adding measures, joined on that SCHOOL_STATE column.


If my spreadsheet contained other text fields matched to the existing model via a dimension attribute, the upload wizard would assume I'm adding dimension attributes; if it detects them incorrectly I can match the columns myself, and specify whether the new file contains measures or attributes. BICS then confirms the join between the two datasets and I can start selecting from the new measures to add to my project.


My final step then is to add the HOUSEHOLD_INCOME measure to my visualisation, so that each state is now shaded by the household income level, allowing me to see which states might benefit most from my school project donation.


One thing to bear in mind when using mashups though is that what you're effectively doing is adding a new fact table that joins to the existing one on one or more dimension levels. In my case, my HOUSEHOLD_INCOME and POPULATION measures only join to the DONATIONS dataset on the SCHOOL dimension, and then only at the STATE level, so if I try and reference a column from another dimension – to add, for example, a filter on the FUNDING STATUS column within the PROJECTS dimension – the project will error as that dimension isn't conformed across both facts.


My understanding is that Oracle will fix this in a future release by setting all the non-conformed dimensions to “Total” as you can do with the on-site version of OBIEE yourself, but for now this restricts mashups to datasets that use fully-conformed dimensions, and with filters that only use those conformed dimensions from the join-level up.

So that's VA on BICS in a nutshell, with this article drilling down further into the very interesting new data mashup feature. Look out for more on this new release of BICS soon as I cover the new Data Sync utility, RPD uploads and connecting BICS to the full Oracle Database-as-a-Service.

 

Categories: BI & Warehousing

WebCenter Content - A new Delivery Channel in 11.1.1.9.0 !!

Tim Dexter - Wed, 2015-06-03 02:13

Hi Everyone

One of the new features introduced in 11.1.1.9.0 is WebCenter Content (WCC) Server as a delivery channel. Prior to this release, we could manage delivery to the WCC (formerly UCM) server using WebDAV, as explained by Tim in his blog "BI Publisher and WebDAV... done!". However, there were a few restrictions:

  • There was no way to include standard or custom metadata. Therefore, there was no description for the documents submitted, and searching for these documents in WCC was not convenient.
  • WebDAV uses folders to store the documents, and access to these folders has to be pre-configured by the WCC administrator. Users cannot select a security group or account at the time of scheduling.

Moreover, with WCC as a delivery channel you have the ability to use idc(s), http(s) and JAX-WS protocols. Refer to the documentation on RIDC protocols for more details.

We now have videos on the BI Publisher YouTube channel to demonstrate how the BI Publisher integration with WCC works. The videos are split into two parts: Part 1 (Title: BIPublisherWCC Part1) covers an overview and explains some of the WCC concepts, while Part 2 (Title: BIPublisherWCC Part2) walks you through all the steps necessary to make the integration work.

You can find additional details in the documentation guide. Navigate to the Books link and check the Administrator's Guide for setting up delivery destinations and the Data Modeling Guide for custom metadata and bursting-related information.

I am sure you will find this new feature very easy to configure and very useful for maintaining documents in WCC. Have a nice day !! 

Categories: BI & Warehousing

User Engagement – Why Does it Matter?

Rittman Mead Consulting - Mon, 2015-06-01 09:25

The value of any system can be measured by how often you interact with it. The perfect example is Slack, which is now seen as the most successful collaboration app ever and is valued at $1.2bn. The key metric for this is DAU/MAU (the ratio of daily users over monthly users), which scales from 0 (no visits) to 1 (every user visits daily) based on how active your users are.

How does this relate to your BI or analytics system? We believe that the same concept applies. The more you interact with your BI/analytics system, the more valuable it is to your organisation.

Return on investment (ROI) of BI/analytics systems has always been hard to measure. There is a widespread belief that the battle is either won or lost during the development stage of the lifecycle, so agile project approaches, stakeholder involvement, and effective change management programs are employed to align the system more closely with users' true requirements. However, even that may not drive users to use the system.

What can you do if your system is already live?

Focus on user engagement. The BI Scorecard describes typical BI/analytics systems as having only 22% user adoption. Gartner, in its The Consumerization of BI Drives Greater Adoption paper, puts the typical user adoption rate at 30%. This leaves a lot of room for improvement.

We believe that there are 5 drivers for user engagement:

  • User interface and user experience
  • Quality, relevance, and confidence in data
  • Performance
  • Ability to use the system
  • Accessibility – is the system available in the right way, at the right time?

Addressing these can drive low cost and low risk improvements to the user engagement of your system and, in turn, increase its value.

Over the next few months, we are focusing on user engagement and will be launching a set of articles and initiatives to assist organisations. We will also launch a User Engagement Service that baselines your current user engagement levels, makes a series of recommendations for improvements based on the drivers above, implements them and then measures their impact on the system. The result is a demonstrable increase in the ROI of your BI/analytics system.

Over the next few months we will be putting a lot of focus into user engagement, plus developing a set of tools to extend OBIEE to use technologies like D3. If you are interested in hearing more, please sign up to our mailing list below.



Categories: BI & Warehousing

Business Rules in Enterprise Apps and Data Mining Applications

Dylan's BI Notes - Tue, 2015-05-26 20:55
A transaction system has several places that capture decisions and several places that capture business rules. These are the potential places where a data mining scoring engine can be deployed. Here are the typical "decisions" captured in enterprise apps that I can think of: Marketing – Marketing Strategy and Planning: the 4 Ps are the decisions […]
Categories: BI & Warehousing

OBIEE 11.1.1.9: New Features for Front-End Developers and Business Users

Rittman Mead Consulting - Tue, 2015-05-26 11:13

OBIEE 11.1.1.9 was released this May, and Robin and Mark wrote great posts about the new features for system administrators and the new support for HiveServer2 and Cloudera Impala respectively. In this post, we will look at some useful new features for front-end development. Some of these features, like treemaps and new styles, were included in the OBIEE 11.1.1.7.10 version, but I'm including them here anyway because if you are moving from an earlier release of 11.1.1.7 you probably haven't seen them yet.

Search & Sort options inside the Subject Area

There is a new option to search inside the subject area when we create an analysis. If we press the magnifying glass icon in the Subject Areas pane, a search box appears and OBIEE will return all the objects that contain the text that we entered. We can also use the % wildcard.


In addition, there is an option to sort the objects inside the subject area, ordering the folders and the presentation columns inside them. The default is to show the objects in the order in which they are saved in the repository.


Save Calculated Items in the Web Catalog

A very useful feature for business users is the ability to create calculated items in an analysis and save them in the catalog in order to reuse them in other analyses for the same subject area.

In the Results tab there is a new icon for creating a new measure. The calculation is created in the same way the column formulas are created.


After the new measure is created you can go to the Criteria tab and, in the column pop-up menu, select the Save Column As option. In this manner you save the new measure as a new object in the catalog to be reused in other analyses. The ability to create a new calculated measure for tables and pivot tables was included in the 11.1.1.7.10 version, but the option to save the column in the catalog is a new feature of OBIEE 11.1.1.9.


You can also change the properties of the new column in the Criteria tab but be aware that conditional formatting and conditional action links for the column are not saved to the catalog. Regular action links are saved for the column as well as the format properties.

The saved column can be edited from the catalog pane and catalog page. All the changes that you make in the catalog are applied to every analysis that used this column. If you modify it inside a particular analysis, these changes are valid only for the analysis that you are working on.


To be able to save new measures as columns you should have the Save Column privilege.


Global Variables

In this version we find a new type of variable: the global variable. Global variables can be defined in the context of an analysis and can be used in other analyses. They are useful for intermediate, reusable calculations.

To create a new global variable, select the Edit Formula option for the required column. When you press the Variable button in the Formula Editor you will see the new Global option. The Insert Global Variable dialog appears and you can select an existing global variable to be used in the formula, or you can create a new one. To create a new one you need to enter a name, type and value. If you want to use an expression like in the example (Min(Revenue)) as value, the data type should be Text.


To reference a global variable you need to use the fully qualified name including the context: @{global.variables.variablename}.
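
For example, a column formula that references the global variable created above might look like the following; the variable name gv_min_revenue and the "Base Facts"."Revenue" column are purely illustrative.

-- Sketch only: revenue expressed as the difference from the minimum revenue
-- held in a global variable
"Base Facts"."Revenue" - @{global.variables.gv_min_revenue}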


Also to manage global variables you should have the Manage Global Variables privilege.

New Visualisation: Treemap

There is a new visualisation called Treemap, available since OBIEE 11.1.1.7.10. A treemap groups the dimensional data that you selected in the analysis into tiles. By default, the tile size is based on the first measure of the analysis and the tile colour is based on the second measure. If you have only one measure in your analysis, it is used for both size and colour.


You can edit it like any other view, changing the group-by information, the measures that affect the size and colour of the tiles, and the range of colours that you want to use. You can also choose between showing the information in coloured bins or using a continuous colour fill. If you select the first style, you can also choose how many different coloured bins you want to use.


New Preview Styles

When you preview an analysis from the Results tab, instead of immediately showing how the analysis will look, OBIEE now offers different styles in which to view it. This feature was also included in the 11.1.1.7.10 version. If you want to create a custom style, Oracle recommends using the new Skyros style as a starting point.


Horizontal Layout for Radio Button and Check Box Dashboard Prompts

Radio button and check box dashboard prompts can now be shown horizontally. When you create a new dashboard prompt using check boxes or radio buttons, you can choose between a horizontal or vertical layout under Options.



Enhancements in Export & Print Options

In this version, you will find more options in the Print and Export Options dialog. You can select whether to include charts, images and formatting, or specify column properties such as the column width and whether to wrap the text in columns.


In summary, these features are simple but quite useful for business users and front-end developers, and give more flexibility to create better data visualisations.

 

Categories: BI & Warehousing

Oracle BI Publisher 11.1.1.9.0 is available !!!

Tim Dexter - Mon, 2015-05-25 13:12

Hi Everyone,

I am happy to announce that Oracle BI Publisher 11.1.1.9.0 has been released, although I admit that this is almost week-old news now and I am sure some of you may already know this from the BI Publisher homepage on OTN or from other sources. My sincere apologies for the delay here.

Thank you Tim for helping me get back into this blog membership and for being patient while I put this word out. My activity on the blog has been so sparse in the past that I am as good as a new member here. When I tried to log in last week, I was greeted with this message -

"Sorry, you do not have the privileges necessary to access the page you requested. This system is available to Oracle Employees only. Oracle Employees who would like to request a blog account should click here."

So I had to start all over and get fresh access created. Thanks to Tim and Phil from IT support for helping me with this. I will now make sure to use this space more often and share more features, tips and tricks.

Oracle BI Publisher 11.1.1.9.0 went GA on May 19th and you can get the download, documentation, certification matrix and release notes here at the BI Publisher home page on OTN. Here is a quick snapshot of the new features in this release. The download is also available at the Oracle Software Delivery Cloud site. The documentation page has also been given a fresh new structure: in the left navigation you will notice "Tasks" and "Books" as two menu items. The Tasks menu provides a quick reference to role-based activities under sub-menu items such as "View & Publish" and "Design Reports", while the Books menu takes you to the complete set of books, where you can select the Administrator's Guide, Developer's Guide, Data Modelling Guide, Report Designer's Guide and User's Guide for BI Publisher. If you are looking for a feature and do not find information under "Tasks", check under "Books" or use the search option.

Stay tuned for more updates on new features. I wish you a good time exploring them!

Categories: BI & Warehousing

Deploying Oracle BI Mobile HD within Good, Oracle OMSS and Other Secure MDM Containers

Rittman Mead Consulting - Sun, 2015-05-24 10:53

A few months ago ODTUG announced their ODTUG Mobile Day in Utrecht, Netherlands and asked if I'd like to put forward an abstract for a presentation. Something that's been on my mind for a while is the issues some of our customers have been reporting when trying to run Oracle BI Mobile HD with the Good Mobile Device Management toolkit, so I thought this might be a good opportunity to look into the topic in more detail and try to come up with some recommendations on how to get this working. As the ODTUG Mobile Day was for all Oracle developers, not just BI ones, I thought it'd be a good opportunity to look into iOS and Android development in general as it relates to Oracle, and in particular what's involved in deploying into these "container" environments that many large enterprise customers insist on when deploying mobile applications. The presentation is available on Slideshare if you want to download it, but I thought I'd expand on some of the concepts and tips in a separate blog post as it's all actually quite interesting.

As a bit of background, OBIEE itself can of course display its standard dashboards through mobile web browsers with graphics and charts switching to iOS and Android-friendly formats rather than trying to render in Flash. But standard dashboards and analyses look too small and are hard to work with in mobile browsers, so using this approach to mobile-enable your dashboards isn’t generally recommended. Instead of course, as most readers will already be aware, Oracle released a number of mobile clients for OBIEE over the years, with Oracle BI Mobile and Oracle BI Mobile App Designer available in the latest 11.1.1.9 and suiting most end-user and customer needs for mobile access to their business analytics data.


Oracle BI Mobile App Designer apps can be deployed using SSL and SSO, whilst BI Mobile HD takes advantage of iOS and Android platform security features such as password storage in encrypted on-device keystores and remote wipe of devices. But for some organizations that need higher levels of device control and data security, this platform-level security isn't enough as:

  • Unknown and unsecured devices accessing their networks (for example, via VPN) is considered an unacceptable security risk
  • For Android devices in-particular, there is known malware out on the internet that can compromise “rooted” devices
  • It’s possible to cut-and-paste sensitive information from BI Mobile applications into other applications
  • Users don’t always set secure passwords, and lost or stolen devices can potentially expose BI data stored on the device to unauthorised users

To address these issues, a number of Mobile Device Management (MDM) vendors provide enterprise-level solutions, typically by having the customer specially sign applications they wish to deploy to indicate they've been authorised for use, and by deploying those applications within managed containers that keep them separate from the general mobile apps on the users' devices. As well as specially signed and deployed applications such as BI Mobile HD, these MDM suites also typically provide secure and containerised web browsers and email clients, both of which need to work with Oracle BI and Oracle BI Mobile App Designer. Vendors in this space include Good Technology and MobileIron, and Oracle have had their own MDM solution since they acquired Bitzer Mobile back in 2013. So how do these solutions work, why do some customers have trouble getting them working, and what's the "preferred" approach if a customer asks you to just get it working for them?

To start with the simplest approach, the tested and recommended way to run Oracle BI Mobile HD, and Oracle BI Mobile App Designer through an MDM-supplied web browser, is to use Oracle Mobile Security Suite (OMSS), based on the technology Oracle acquired from Bitzer Mobile. OMSS is a complete platform for deploying mobile apps in a secure, managed container environment, and takes customer-signed iOS and Android applications and deploys them onto enterprise users’ mobile devices using a centralized console and management service.


To support deployment of Oracle BI Mobile into OMSS and other vendor MDM solutions, Oracle have made an unpacked and unsigned version of Oracle BI Mobile available for download on OTN as "Oracle Business Intelligence Mobile Security Toolkit"; taking the iOS version as an example, you can then compile this in Apple Xcode and sign it for deployment in your organisation's OMSS installation. For some organisations this can be a bit of a challenge as they don't generally use Macs and don't do iOS development, but for Rittman Mead testing this out was pretty straightforward as we all use Macs for our consulting work and some of us play around with Xcode and iOS development in our spare time. What can also be a challenge is setting up an iOS Developer Account so that you can sign the BI Mobile HD application with your organization's own certificate, but we set up such an account a couple of years ago and were able to get it all set up with just a couple of hours' work. What you will need to get this running (apart from the OMSS part that I'll cover in a moment) are the following bits of hardware and software, in this case for the iOS version of BI Mobile HD:

  • An Apple Mac that can run recent versions of Xcode
  • An Apple Developer account that can develop and sign iOS applications, in your organization’s name and not an individual developer’s – note that you generally need to be registered with Dun and Bradstreet’s business verification service to set this up, which can take a few weeks if your entry is out-of-date or not matching your current company details
  • Oracle Business Intelligence Mobile Security Toolkit
  • Apple Xcode


Then it’s a case of setting up a new project in Xcode, selecting Single View Application and Universal as the device type, entering your project and organization’s information, and then merging the Oracle Mobile Security Toolkit’s project files with the empty project you just created and setting any other project properties.


At this point you should be able to run the application within the Xcode project and test it using Xcode’s built-in iPhone and iPad simulators, so that you can check all the features still work as expected.


Now comes the point where you sign the app you've just created so that you can then deploy it into OMSS for distribution to your users. This point is important to security-conscious customers as it's now the customer, rather than Oracle, that has signed the app, so you can be much more certain that no malware or other backdoors have been introduced into the code before compiling and distribution. Signing of the app takes place within Xcode, with a similar process being used for the Android version of BI Mobile HD within the Android Studio IDE.


The resulting compiled .app and .apk files are then uploaded into OMSS's catalog for distribution to users, with provisioning taking place via emails sent to corporate users that provide download links to these containerised, managed and secured versions of Oracle's BI Mobile HD app.


So all of this looks pretty straightforward, albeit maybe a bit more complicated for organisations that don't use Macs and don't generally develop iOS applications – but Oracle partners such as ourselves can help with this if you'd like to offload this part of the process to a specialist team. Where things do sometimes get a bit more complicated is when other MDM vendor technologies are used, particularly the Good Dynamics MDM solution, which works in a slightly different way to Oracle Mobile Security Suite.

Unlike OMSS's approach, where you compile and sign BI Mobile HD within Apple's own Xcode IDE, Good requires you to build and export the unsigned Oracle Mobile Security Toolkit project in Xcode as an .ipa file, and then copy it along with your iOS Developer Program certificate and the certificate password into Good's own Good Control Management Console. There your application is combined with Good's security libraries, signed with your certificate and deployed as a "wrapped application" to be distributed to users using a similar method to the one OMSS takes; however, all mobile application access then goes through a Good proxy server, typically placed outside the main company network, which provides secure communications between these managed applications running outside of the company firewall and that company's secure servers – in this case, OBIEE11g.


There's nothing inherently wrong with this compared to how OMSS does it, and organisations often pick Good Dynamics over other MDM solutions because of the extra functionality Good's approach provides through the insertion of their security SDK into your mobile application; but it's when organisations take advantage of these features to provide custom integration with their own security platform that problems can sometimes occur.

For example, a common issue we hear about when deploying Oracle BI Mobile HD using Good is when the customer tries to integrate their SSO solution into the user authentication process. Good's security SDK makes it possible to intercept user login events and route the request to the customer's SSO server, but it's essential that control is passed back to the BI Server as if this re-routing hadn't taken place, returning the authentication details the BI Server expects; if the custom login process doesn't quite do this then authentication fails. Another issue we heard about recently involved newer versions of iOS (iOS 7) deprecating synchronous API calls that BI Mobile HD was still making; in this case Oracle supplied a patch and all calls are now made asynchronously, but until then deployment in the Good environment mysteriously failed.

What makes these issues doubly-tricky to identify and resolve is the restrictions most security-conscious enterprise customers place around disclosing details of their network setup, to the point where they often aren’t allowed to tell partners or Oracle Support any of the details they’ll need to work out how traffic passes around the network and over to OBIEE and the Good MDM environment. What troubleshooting often comes down to then is good old-fashioned packet-sniffing and investigation by someone familiar with OBIEE (in particular, the BI Server’s authentication process), Good’s security SDK and the customer’s network setup.


So given all of this, what is our recommendation for customers looking to implement mobile OBIEE11g clients within an MDM container solution? My advice would be, where the customer doesn't currently have an MDM solution and wants the easiest way to deploy Oracle BI Mobile within a secure container, go for the Oracle Mobile Security Suite option – it's what Oracle support and test for, and as an Oracle solution it's obviously going to be easier to go down this route than to try and troubleshoot another vendor's solution if things go wrong. But if an organization is insisting on deploying Oracle BI Mobile in a secure container it's unlikely this is the first time they've had to do it, so there's most probably already an MDM solution in place and it's likely to be from Good.

In this case, first of all remember that it should work and the two products are compatible; what you need to be aware of, though, is the correct way of linking BI Mobile HD in this environment to your corporate SSO and security platform, and to work together with your network security team to quickly identify where the issue is coming from if things don't work first time. Engaging with an experienced OBIEE partner such as Rittman Mead can of course help, and Oracle's own product development and support teams have most probably seen most of the issues that can come up and can help if things get tricky. The team here at Rittman Mead have several customers successfully using Good and other vendors' MDM solutions along with Oracle BI Mobile, and of course we can help with the app signing and deployment process if your organization doesn't usually work with Macs or have experience with Xcode and the Oracle Mobile Security Toolkit.

Finally, thanks to Chris Redgrave from the Rittman Mead team, and Oracle’s Matt Milella and Jacques Vigeant who helped me with the background research for this article and the ODTUG BI Mobile Day presentation. As I mentioned earlier the presentation from the ODTUG event is available on Slideshare, and there’s also walkthroughs for deploying BI Mobile HD within Oracle OMSS on iOS and Android on the OTN website.

Categories: BI & Warehousing

New Oracle Magazine article on Oracle BI Cloud Service

Rittman Mead Consulting - Sun, 2015-05-24 07:42

The May/June 2015 edition of Oracle Magazine is now out, and my Business Intelligence article in this edition is on Oracle BI Cloud Service (BICS). In "Upload, Model, Analyze and Report" I focus on the "departmental power-user" use-case where someone with a small team wants to share data and dashboards with others in the department, is familiar with OBIEE but wants to get something up-and-running quickly without having to include IT or the formal development process. By just uploading a spreadsheet of data and quickly modeling it into a star-schema using BICS's simple-to-use web-based tools, you can create reports that can be shared with others in your team using OBIEE's familiar dashboard interface.

Rittman Mead offer a number of services around Oracle BI Cloud Service and now have our own BICS "pod" for demonstration and evaluation purposes. Now that Oracle have released Visual Analyser early for BICS we're expecting a lot of interest and demand for services, support and training around Oracle's cloud version of OBIEE, so if you're interested in moving part of your OBIEE estate into the cloud, or you're a departmental BI lead looking to run OBIEE within your department without the need to get IT involved, drop us a line at enquiries@rittmanmead.com and we'll be pleased to help.

Categories: BI & Warehousing

Connecting OBIEE 11.1.1.9 to Hive, HBase and Impala Tables for a DW-Offloading Project

Rittman Mead Consulting - Fri, 2015-05-22 07:28

In two previous posts this week I talked about a client request to offload part of their data warehouse to Hadoop, taking data from a source application and loading it into Hive tables on Hadoop for subsequent reporting by OBIEE11g. In the first post I talked about hosting the offloaded data warehouse elements on Cloudera Hadoop CDH5.3, and how I used Apache Hive and Apache HBase to support insert/update/delete activity to the fact and dimension tables, and how we'd copy the Hive-on-HBase fact table data into optimised Impala tables stored in Parquet files to make sure reports and dashboards ran fast.


In the second post I got into the detail of how we’d keep the Hive-on-HBase tables up-to-date with new and changed data from the source system, using HiveQL bulk-inserts to load up the initial table data and a Python script to handle subsequent inserts, updates and deletes by working directly with the HBase Client and the HBase Thrift Server. Where this leaves us at the end then is with a set of fact and dimension tables stored as optimised Impala tables and updatable Hive-on-HBase tables, and our final step is to connect OBIEE11g to it and see how it works for reporting.


As I mentioned in another post a week or so ago, the new 11.1.1.9 release of OBIEE11g supports Cloudera Impala connections from Linux servers to Hadoop, with the Linux Impala drivers being shipped by Oracle as part of the Linux download and the Windows ones used for the Admin Tool workstation downloadable directly from Cloudera. Once you’ve got all the drivers and OBIEE software setup, it’s then just a case of setting up the ODBC connections on the Windows and Linux environments, and you should then be in a position to connect it all up.


On the Impala side, I first need to create a copy of the Hive-on-HBase table I've been using to load the fact data into from the source system, after running the invalidate metadata command to refresh Impala's view of Hive's metastore.

[oracle@bigdatalite~]$impala-shell
 
[bigdatalite.localdomain:21000]>invalidate metadata;
 
[bigdatalite.localdomain:21000]>create table impala_flight_delays
                                >stored as parquet
                                >as select * from hbase_flight_delays;

Next I import the Hive-on-HBase and the Impala table through the Impala ODBC connection – even though only one of the tables (the main fact table snapshot copy) was created using Impala, I still get the Impala speed benefit for the other three tables created in Hive (against the HBase source, no less). Once the table metadata is imported into the RPD physical layer, I can then create a business model and subject area as I would do normally, so my final RPD looks like this:

[Image: the completed RPD with the Impala and Hive-on-HBase tables]

Now it’s just a case of saving the repository online and creating some reports. If you’re using an older version of Impala you may need to disable the setting where a LIMIT clause is needed for every GROUP BY (see the docs for more details, but recent (CDH5+) versions will work fine without this). Something you’ll also need to do back in Impala is compute statistics for each of the tables, like this:

[bigdatalite.localdomain:21000] > compute stats default.impala_flight_delays;
Query: compute stats default.impala_flight_delays
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 8 column(s). |
+-----------------------------------------+
Fetched 1 row(s) in 2.73s
[bigdatalite.localdomain:21000] > show table stats impala_flight_delays;
Query: show table stats impala_flight_delays
+---------+--------+---------+--------------+---------+-------------------+
| #Rows   | #Files | Size    | Bytes Cached | Format  | Incremental stats |
+---------+--------+---------+--------------+---------+-------------------+
| 2514141 | 1      | 10.60MB | NOT CACHED   | PARQUET | false             |
+---------+--------+---------+--------------+---------+-------------------+
Fetched 1 row(s) in 0.01s

Apart from being generic “good practice” and giving the Impala query optimizer better information to form a query plan with, you might hit the error below in OBIEE if you don’t do this.

[Image: OBIEE error returned when Impala table statistics have not been computed]

If you do hit this error, go back to the Impala Shell or Hue and compute statistics, and it should go away next time. Then, finally, you can go and create some analyses and dashboards and you should find the queries run fine against the various tables in Hadoop, and moreover the response time is excellent if you use Impala as the main query engine.


I did a fair bit of testing of OBIEE 11.1.1.9 running against Cloudera Impala, and my findings were that all of the main analysis features worked (prompts, hierarchies, totals and subtotals etc) and the response time was comparable with a well-tuned data warehouse, maybe even Exalytics-level speed. If you take a look at the nqquery.log file for the Impala SQL queries OBIEE is sending to Impala, you can see they get fairly complex (which is good, as I didn't hit any errors when running the dashboards) and you can also see where the BI Server takes a simpler approach to creating subtotals, nested queries etc compared to the GROUP BY … GROUPING SETS that you get when using a full Oracle database.

select D1.c1 as c1,
     D1.c2 as c2,
     D1.c3 as c3,
     D1.c4 as c4,
     D1.c5 as c5,
     D1.c6 as c6,
     D1.c7 as c7,
     D1.c8 as c8,
     D1.c9 as c9,
     D1.c10 as c10,
     D1.c11 as c11,
     D1.c12 as c12
from 
     (select 0 as c1,
               D1.c3 as c2,
               substring(cast(NULL as  STRING ), 1, 1 ) as c3,
               substring(cast(NULL as  STRING ), 1, 1 ) as c4,
               substring(cast(NULL as  STRING ), 1, 1 ) as c5,
               'All USA' as c6,
               substring(cast(NULL as  STRING ), 1, 1 ) as c7,
               1 as c8,
               substring(cast(NULL as  STRING ), 1, 1 ) as c9,
               substring(cast(NULL as  STRING ), 1, 1 ) as c10,
               D1.c2 as c11,
               D1.c1 as c12
          from 
               (select sum(T44037.late) as c1,
                         sum(T44037.flights) as c2,
                         T43925.carrier_desc as c3
                    from 
                         
                              hbase_carriers T43925 inner join 
                              impala_flight_delays T44037 On (T43925.key = T44037.carrier)
                    where  ( T43925.carrier_desc = 'American Airlines Inc.' or T43925.carrier_desc = 'Delta Air Lines Inc.' or T43925.carrier_desc = 'Southwest Airlines Co.' or T43925.carrier_desc = 'Spirit Air Lines' or T43925.carrier_desc = 'Virgin America' ) 
                    group by T43925.carrier_desc
               ) D1
          union all
          select 1 as c1,
               D1.c3 as c2,
               substring(cast(NULL as  STRING ), 1, 1 ) as c3,
               substring(cast(NULL as  STRING ), 1, 1 ) as c4,
               D1.c4 as c5,
               'All USA' as c6,
               substring(cast(NULL as  STRING ), 1, 1 ) as c7,
               1 as c8,
               substring(cast(NULL as  STRING ), 1, 1 ) as c9,
               D1.c4 as c10,
               D1.c2 as c11,
               D1.c1 as c12
          from 
               (select sum(T44037.late) as c1,
                         sum(T44037.flights) as c2,
                         T43925.carrier_desc as c3,
                         T43928.dest_state as c4
                    from 
                         
                              
                                   hbase_carriers T43925 inner join 
                                   impala_flight_delays T44037 On (T43925.key = T44037.carrier) inner join 
                              hbase_geog_dest T43928 On (T43928.key = T44037.dest)
                    where  ( T43925.carrier_desc = 'American Airlines Inc.' or T43925.carrier_desc = 'Delta Air Lines Inc.' or T43925.carrier_desc = 'Southwest Airlines Co.' or T43925.carrier_desc = 'Spirit Air Lines' or T43925.carrier_desc = 'Virgin America' ) 
                    group by T43925.carrier_desc, T43928.dest_state
               ) D1
          union all
          select 2 as c1,
               D1.c3 as c2,
               substring(cast(NULL as  STRING ), 1, 1 ) as c3,
               D1.c4 as c4,
               D1.c5 as c5,
               'All USA' as c6,
               substring(cast(NULL as  STRING ), 1, 1 ) as c7,
               1 as c8,
               D1.c4 as c9,
               D1.c5 as c10,
               D1.c2 as c11,
               D1.c1 as c12
          from 
               (select sum(T44037.late) as c1,
                         sum(T44037.flights) as c2,
                         T43925.carrier_desc as c3,
                         T43928.dest_city as c4,
                         T43928.dest_state as c5
                    from 
                         
                              
                                   hbase_carriers T43925 inner join 
                                   impala_flight_delays T44037 On (T43925.key = T44037.carrier) inner join 
                              hbase_geog_dest T43928 On (T43928.key = T44037.dest and T43928.dest_state = 'Georgia')
                    where  ( T43925.carrier_desc = 'American Airlines Inc.' or T43925.carrier_desc = 'Delta Air Lines Inc.' or T43925.carrier_desc = 'Southwest Airlines Co.' or T43925.carrier_desc = 'Spirit Air Lines' or T43925.carrier_desc = 'Virgin America' ) 
                    group by T43925.carrier_desc, T43928.dest_city, T43928.dest_state
               ) D1
          union all
          select 3 as c1,
               D1.c3 as c2,
               D1.c4 as c3,
               D1.c5 as c4,
               D1.c6 as c5,
               'All USA' as c6,
               D1.c4 as c7,
               1 as c8,
               D1.c5 as c9,
               D1.c6 as c10,
               D1.c2 as c11,
               D1.c1 as c12
          from 
               (select sum(T44037.late) as c1,
                         sum(T44037.flights) as c2,
                         T43925.carrier_desc as c3,
                         T43928.dest_airport_name as c4,
                         T43928.dest_city as c5,
                         T43928.dest_state as c6
                    from 
                         
                              
                                   hbase_carriers T43925 inner join 
                                   impala_flight_delays T44037 On (T43925.key = T44037.carrier) inner join 
                              hbase_geog_dest T43928 On (T43928.key = T44037.dest and T43928.dest_city = 'Atlanta, GA')
                    where  ( T43925.carrier_desc = 'American Airlines Inc.' or T43925.carrier_desc = 'Delta Air Lines Inc.' or T43925.carrier_desc = 'Southwest Airlines Co.' or T43925.carrier_desc = 'Spirit Air Lines' or T43925.carrier_desc = 'Virgin America' ) 
                    group by T43925.carrier_desc, T43928.dest_airport_name, T43928.dest_city, T43928.dest_state
               ) D1
     ) D1
order by c1, c6, c8, c5, c10, c4, c9, c3, c7, c2 limit 65001

Not bad though for a data warehouse offloaded entirely to Hadoop, and it’s good to see such a system handling full updates and deletes to data as well as insert appends, and it’s also good to see OBIEE working against an Impala datasource and with such good response times. If any of this interests you as a potential customer, feel free to drop me an email at mark.rittman@rittmanmead.com, or check-out our Big Data Quickstart page on the website.

Categories: BI & Warehousing

Opportunities for Cloud based BI

Dylan's BI Notes - Thu, 2015-05-21 18:33
Three opportunities for the future of cloud-based BI: 1. Provide the tools for consolidating data.  Investing in data matching and merging technologies is too costly for on-premise BI implementations; providing these services within the cloud BI will lower the cost. 2. Provide the external data. Bringing the external data into the individual on-premise env […]
Categories: BI & Warehousing

Loading, Updating and Deleting From HBase Tables using HiveQL and Python

Rittman Mead Consulting - Thu, 2015-05-21 14:32

Earlier in the week I blogged about a customer looking to offload part of the data warehouse platform to Hadoop, extracting data from a source system and then incrementally loading data into HBase and Hive before analysing it using OBIEE11g. One of the potential complications for this project was that the fact and dimension tables weren’t append-only; Hive and HDFS are generally considered write-once, read-many systems where data is inserted or appended into a file or table but generally then can’t be updated or overwritten without deleting the whole file and writing it again with the updated dataset.

To get around this problem we loaded our incoming data into HBase tables, a NoSQL key/value-store database that also runs on Hadoop and HDFS but permits update and delete operations on rows as well as selects and inserts. Later on we took the main fact table stored in Hive-on-HBase and copied its contents into Impala to considerably improve the response time of queries against this table and the still-Hive-on-HBase dimension tables. But going back to the insert-update-delete operations on the HBase tables, how exactly does this work and what's the most efficient way to do it?

Taking a step back for a moment, HBase is a NoSQL, key/value-type database where each row has a key (for example, "SFO" for San Francisco airport) and then a number of columns, grouped into column families. In the Flight Delays dataset that we used in the previous blog post, an HBase table of origin airports might have a few thousand entries, with each entry, or row, keyed on a particular airport code like this:

[Image: conceptual view of an HBase table of origin airports, with each row keyed on an airport code]

(Note that at the start, these key values won’t be there – they’re more for illustrative purposes)

At the time of HBase table definition, you specify one or more "column families". These are group headers for columns you might add later, and in the case of my origin airport table I might just use the column family name "origin", so that the HBase table DDL looks like this:

create 'geog_origin','origin'

and the conceptual view of the table would look like this:

[Image: conceptual view of the geog_origin table with just the key and the origin column family defined]

Now what's neat about NoSQL-style databases like this (and Endeca Server is the same) is that you can define individual columns just by using them. For example, I could create columns for the airport name, airport city, airport state and airport code just by using their names in a data load, prefixing those column names with the name of a previously-defined column family. Using the HBase Shell, for example, I could issue the following PUT commands to insert the first row of data into this HBase table, like this:

put 'geog_origin','SFO','origin:airport_name','San Francisco, CA: San Francisco'
put 'geog_origin','SFO','origin:city','San Francisco, CA'
put 'geog_origin','SFO','origin:state','California'
put 'geog_origin','SFO','origin:id','14771'

Now my HBase table conceptually looks like this:

[Image: conceptual view of the geog_origin table with the SFO row populated]

If I then want to use another column under the “origin” column family for LAX, I can just do so by using it in the next set of PUT commands, like this:

put 'geog_origin','LAX','origin:airport_name','Los Angeles, CA: Los Angeles'
put 'geog_origin','LAX','origin:city','Los Angeles, CA'
put 'geog_origin','LAX','origin:state','California'
put 'geog_origin','LAX','origin:region','West Coast'
put 'geog_origin','LAX','origin:id','12892'


Each column within a column family has its values individually set, retrieved and deleted using PUT, GET and DELETE commands; as long as you prefix the column name with one of the previously-defined column-family names and provide the key value for the row you're interested in, HBase database tables are very flexible. They were designed for simple product catalog-type applications running on hundreds of sharded server nodes for companies the likes of Amazon, Google and Facebook (see the HBase "Powered By" page for more examples of organizations using HBase).
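For example, reading a row back or removing cells and rows can be done from the HBase Shell with commands like these (illustrative only, using the table and row keys we just loaded):

get 'geog_origin','SFO'                       # retrieve all cells for the SFO row
delete 'geog_origin','LAX','origin:region'    # remove a single cell from the LAX row
deleteall 'geog_origin','LAX'                 # remove the whole LAX row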

But what HBase very much isn't is a relational database like Oracle, Microsoft SQL Server or even Apache Hive, databases that we're much more likely to store data warehouse-type data in. In the previous post I showed how Hive table structures can in fact be put over HBase tables, mapping HBase columns to Hive columns, and then HiveQL INSERT INTO TABLE … SELECT commands can be used to bulk-load these HBase tables with initial sets of data. So back to the original question – what's the best way to then incrementally load and refresh these HBase tables, and can I still use HiveQL for this?

In my original post, I defined Hive tables over my HBase ones using the Hive-on-HBase (yum install hive-hbase) package and associated Hive storage handler; for example, the Hive table that provided SQL access over the flight_delays HBase table was defined like this:

ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler.jar;
ADD JAR /usr/lib/hive/lib/guava-11.0.2.jar;
ADD JAR /usr/lib/hive/lib/hbase-client.jar;
ADD JAR /usr/lib/hive/lib/hbase-common.jar;
ADD JAR /usr/lib/hive/lib/hbase-hadoop-compat.jar;
ADD JAR /usr/lib/hive/lib/hbase-hadoop2-compat.jar;
ADD JAR /usr/lib/hive/lib/hbase-protocol.jar;
ADD JAR /usr/lib/hive/lib/hbase-server.jar;
ADD JAR /usr/lib/hive/lib/htrace-core.jar;

 
CREATE EXTERNAL TABLE hbase_flight_delays
 (key string,
  year string,
  carrier string,
  orig string,
  dest string,
  flights string,
  late   string,
  cancelled string,
  distance string
) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES 
("hbase.columns.mapping" = ":key,dims:year,dims:carrier,dims:orig,dims:dest,measures:flights,measures:late,measures:cancelled,measures:distance")
TBLPROPERTIES ("hbase.table.name" = "test1_flight_delays");

The underlying HBase table is defined with a key and two column families, one for dimension columns and one for fact (measure) ones – the key is a sequence number that I added to the source dataset to give each row a unique identifier.

create 'test1_flight_delays','dims','measures'

To initially populate the table, I've created another Hive table with the initial set of source data in it, and I just insert its values into the Hive-on-HBase table, like this:

insert into table hbase_flight_delays              
select * from flight_delays_initial_load;      
         
Total jobs = 1
...
Total MapReduce CPU Time Spent: 11 seconds 870 msec
OK
Time taken: 40.301 seconds

This initial load of 200,000 rows took 40 seconds in this instance; not bad, and certainly acceptable for this particular project. Imagine now that for every day after this we typically add another 500 or so flight records; in regular Hive this would be straightforward and we'd use the LOAD DATA or INSERT INTO TABLE … SELECT commands to add new file data to the Hive table's underlying HDFS directories. And we can do this with the Hive-on-HBase table too, with the INSERT INTO TABLE command adding the incoming data as new rows/cells in the HBase table. Checking the row count and min/max ID values in the Hive-on-HBase table at the start, like this:

select count(*), min(cast(key as bigint)) as min_key, max(cast(key as bigint)) as max_key
from hbase_flight_delays;
 
Total jobs = 1
...
Total MapReduce CPU Time Spent: 14 seconds 660 msec
OK
200000  1  200000
Time taken: 53.076 seconds, Fetched: 1 row(s)

I can see that there’s 200,000 rows in the HBase table, starting at key value 1 and ending at key value 200,000. The table containing new data has key values going from 200,001 to 200,500, so let’s insert that new data into the Hive-on-HBase table:

insert into table hbase_flight_delays                                              
select * from flight_delays_daily_update_500_rows;     
                                 
Total jobs = 1
...
Total MapReduce CPU Time Spent: 3 seconds 870 msec
OK
Time taken: 26.368 seconds

Not bad – 26 seconds for the 500 rows, not quite as fast as the initial load but acceptable. Let’s also check that the data went in OK:

select count(*), min(cast(key as bigint)) as min_key, max(cast(key as bigint)) as max_key
from hbase_flight_delays;  
                                                      
Total jobs = 1
...
Total MapReduce CPU Time Spent: 13 seconds 580 msec
OK
200500   1   200500
Time taken: 44.552 seconds, Fetched: 1 row(s)

As I’d hoped, the number of rows has increased by 500 and the maximum key value is now 200,500. But how do we apply updates to the data in the table? I’ve got another source table that this time contains 1,000 randomly-selected rows from the initial data load dataset, where I’ve set the LATE column value to ‘999’:

hive> select * from flight_delays_daily_changes_1000_rows                                      
    > limit 5;
OK
21307  2008 WN BDL  BWI  1  999  1  283
136461  2008  OO  ORD  TYS  0  999  1  475
107768  2008  WN  BWI  ORF  0  999  1  159
102393  2008  OO  SLC  ACV  0  999  1  635
110639  2008  WN  BOI  OAK  0  999  1  511
Time taken: 0.139 seconds, Fetched: 5 row(s)

In fact the way you apply these updates is just to INSERT INTO TABLE … SELECT again, and the incoming values create new versions of existing rows/cells where needed. Some versions of HBase automatically keep a number of versions of each cell value (typically 3 versions), however the version of HBase that comes with CDH5.2 and higher only keeps one version by default (you can increase this number per table, or system-wide, using the steps in the CDH5.2 release notes). Let's try this out now, first using the HBase shell to see the values and timestamps currently held for one particular key value I know should be updated by the next dataset:

hbase(main):029:0> get 'test1_flight_delays', '102393'
COLUMN                                     CELL                                                                                                                       
 dims:carrier                              timestamp=1432236609421, value=OO                                                                                          
 dims:dest                                 timestamp=1432236609421, value=ACV                                                                                         
 dims:orig                                 timestamp=1432236609421, value=SLC                                                                                         
 dims:year                                 timestamp=1432236609421, value=2008                                                                                        
 measures:cancelled                        timestamp=1432236609421, value=1                                                                                           
 measures:distance                         timestamp=1432236609421, value=635                                                                                         
 measures:flights                          timestamp=1432236609421, value=0                                                                                           
 measures:late                             timestamp=1432236609421, value=0                                                                                           
8 row(s) in 0.0330 seconds

I’ll now use Hive to apply the updates, like this:

insert into table hbase_flight_delays                                              
select * from flight_delays_daily_changes_1000_rows;
 
Total jobs = 1
...
Total MapReduce CPU Time Spent: 4 seconds 340 msec
OK
Time taken: 24.805 seconds
 
select count(*), min(cast(key as bigint)) as min_key, max(cast(key as bigint)) as max_key
from hbase_flight_delays;                                                          
Total jobs = 1
...
Total MapReduce CPU Time Spent: 13 seconds 430 msec
OK
200500 1 200500
Time taken: 47.379 seconds, Fetched: 1 row(s)

Notice how this third INSERT didn't create any new rows: the max key ID in the follow-up query hasn't increased since the previous insert of new data. Querying one of the rows that I know was changed by this new table of data updates, I can see that the LATE column value has been changed:

select * from hbase_flight_delays where key = '102393';
Total jobs = 1
...
Total MapReduce CPU Time Spent: 3 seconds 600 msec
OK
102393  2008  OO  SLC  ACV  0  999  1  635

Let's go into the HBase shell now and take a look at the column cells for that same key ID:

hbase(main):030:0> get 'test1_flight_delays', '102393'
COLUMN                                     CELL                                                                                                                       
 dims:carrier                              timestamp=1432236723680, value=OO                                                                                          
 dims:dest                                 timestamp=1432236723680, value=ACV                                                                                         
 dims:orig                                 timestamp=1432236723680, value=SLC                                                                                         
 dims:year                                 timestamp=1432236723680, value=2008                                                                                        
 measures:cancelled                        timestamp=1432236723680, value=1                                                                                           
 measures:distance                         timestamp=1432236723680, value=635                                                                                         
 measures:flights                          timestamp=1432236723680, value=0                                                                                           
 measures:late                             timestamp=1432236723680, value=999                                                                                         
8 row(s) in 0.0800 seconds

Notice how the timestamp for each of the cells has now updated? If I had more than the default 1 version of each cell enabled, I could query the previous versions to see the old values and timestamps. So this works pretty well, and all I need to do is use HiveQL and INSERT INTO TABLE … SELECT to initially populate, append to and even update values in the table. But what if I want to update HBase more "programmatically", maybe as part of a process that reads directly from a source application (for example, Salesforce or a web service) and then writes directly into HBase without the intermediate step of landing the incoming data in a file? For this we can use the HBase Client API, for which there are libraries for many languages, with the most popular being the Java API. If Java is too much though and you'd rather interact with HBase using a language such as Python, as this Cloudera blog post explains you can use either a REST API interface to HBase or one that uses the Thrift interface.

In my case, my preferred way of programmatically working with HBase is to use Python and a developer library called Happybase, where I can also bring in other libraries such as ones to work with Hive and even ones to work with OBIEE and Fusion Middleware, and do my work at a much higher level of abstraction. To show how this might work, I'm going to use Python, the HBase Client API and Happybase to programmatically read from my update Hive tables (in real life I'd probably connect directly to a web service if going down this more complicated route) and write a routine to read rows from the Hive table and load them into HBase.

Again I’m using the Oracle Big Data Lite 4.1 VM which has Python 2.7.6 already installed, and to get ready to install the Happybase library I first need to install pip, the “preferred installer program” for Python. As per the pip installation instructions, first download pip and then install it from the command-line:

sudo python get-pip.py

Then use pip to install Happybase:

sudo pip install happybase

Whilst you're there you might as well install "pyhs2", another Python package that in this case lets us easily connect to Hive tables via the HiveServer2 interface found on CDH5+ and the Big Data Lite 4.1 VM.

sudo pip install pyhs2

Now I can put together a Python program such as the one below, that in this case creates a connection to a Hive table, selects all rows from it into a cursor and then PUTs these rows into the HBase table, via a batch process that sends data to HBase via the Thrift interface every 10,000 rows:

import pyhs2
import happybase
 
connection = happybase.Connection('bigdatalite')
flight_delays_hbase_table = connection.table('test1_flight_delays')
b = flight_delays_hbase_table.batch(batch_size=10000)
 
with pyhs2.connect(host='bigdatalite',
               port=10000,
               authMechanism="PLAIN",
               user='oracle',
               password='welcome1',
               database='default') as conn:
    with conn.cursor() as cur:
 
        #Execute query
        cur.execute("select * from flight_delays_initial_load")
 
        #Fetch table results
        for i in cur.fetch():
            b.put(str(i[0]),{'dims:year': i[1],
                             'dims:carrier': i[2],
                             'dims:orig': i[3],
                             'dims:dest': i[4],
                             'measures:flights': i[5],
                             'measures:late': i[6],
                             'measures:cancelled': i[7],
                             'measures:distance': i[8]})
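# flush any mutations left in the final, partial batch to HBase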
b.send()

which I can then run from the command-line like this:

[oracle@bigdatalite ~]$ python ./load_update_flight_delays.py

As I said, using this approach I could just as easily connect to a web service or read in data via Flume or Kafka, and I can delete rows as well as insert/update them and add any other logic (a quick sketch of a row delete is included below). From my testing it's not all that much faster than going via HiveQL and INSERT INTO TABLE … SELECT scripts (most probably because I'm still going into HBase indirectly, via the Thrift interface) but it does offer the possibility of direct inserts into HBase (and therefore Hive) from the source application without the intermediate step of writing files to disk.
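For example, here's a minimal Happybase sketch of a row delete – the row key is hard-coded for illustration (it's the row we updated earlier), and in practice the keys to remove would come from the source application:

import happybase
 
connection = happybase.Connection('bigdatalite')
flight_delays_hbase_table = connection.table('test1_flight_delays')
 
# remove just the measures:late cell from one row...
flight_delays_hbase_table.delete('102393', columns=['measures:late'])
 
# ...or remove the whole row from the HBase table
flight_delays_hbase_table.delete('102393')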

So to finish this short series, tomorrow I’ll look at how well these Hive-on-HBase tables, and the Impala table I created in the previous example, work when queried from OBIEE11g. Back tomorrow.

Categories: BI & Warehousing

BIP scheduleReport with Parameters

Tim Dexter - Wed, 2015-05-20 14:37

I have just spent an hour or so working on getting a sample scheduleReport web service working with parameter values. There are a lot of examples out there but none I have found have the parameters being set. Our doc is a little light on details on how to set them up :) In lieu of that, here's this!

        // Set the parameter values for the report. In this example we have
        // 'dept' and 'emp' parameters. We could easily query the params dynamically
 
        //Handle 'dept' parameter
        ParamNameValue deptParamNameVal = new ParamNameValue();
        deptParamNameVal.setName("dept");
        // Create the string array to hold the parameter value(s)
        ArrayOfXsdString deptVal = new ArrayOfXsdString();
        // For individual values or multiples, add values to the 
        // string array e.g. 10,20,30
        deptVal.getItem().add("10");
        deptVal.getItem().add("20");
        deptVal.getItem().add("30");
 
        // Asterisk used for a null value ie 'All'
        //deptVal.getItem().add("*");

        // add the array to the parameter object
        deptParamNameVal.setValues(deptVal);
 
        //Handle 'emp' parameter
        ParamNameValue empParamNameVal = new ParamNameValue();
        empParamNameVal.setName("emp");
        ArrayOfXsdString empVal = new ArrayOfXsdString();
        // For individual values or multiples, add values to the string array 
        // empVal.getItem().add("Jennifer Whalen");
        // empVal.getItem().add("Michael Hartstein");

        // Asterisk used for a null value ie 'All'
        empVal.getItem().add("*");
        empParamNameVal.setValues(empVal);
 

        // add parameter values to parameter array        
        ArrayOfParamNameValue paramArr = new ArrayOfParamNameValue();
        paramArr.getItem().add(deptParamNameVal);
        paramArr.getItem().add(empParamNameVal);
 
        //Now add array to values obj
        ParamNameValues pVals = new ParamNameValues();
        pVals.setListOfParamNameValues(paramArr);

 The pVals object can then be added to the report request object.

        req.setParameterNameValues(pVals);

Hopefully, you can extrapolate to your code. JDev application available here, unzip and open the application.
Just the schedule report class is available here.

Categories: BI & Warehousing