BI & Warehousing

Work-Life Balance at Rittman Mead

Rittman Mead Consulting - Wed, 2016-11-30 20:00



Rittman Mead has always had a long-standing commitment to giving back, not only to the technology industry, but to local and global communities as well. Recently, Rittman Mead employees have been encouraged to take up to 40 paid hours to participate in community service opportunities. This year, I chose to use my 40 hours to serve at an orphanage in the Dong Nai province of Vietnam. The Bien Hoa Center for Supporting and Vocational Training is an orphanage that currently serves 53 children, ranging in age from infants to 16-year-olds.


Additionally, the Bien Hoa Center was home to my recently adopted son for over 7 years, so this was a place near and dear to my heart. The orphanage is run by a very attentive staff, who do a great job caring for the kids despite having very limited resources. Many of the children are learning English and other useful skills that will serve them well once they leave the orphanage, either by aging out of the program or through adoption. While we were there, my family and I were able to play with the older children and comfort the babies. It was a pleasure to see them display such beautiful, wide smiles despite their difficult situations. We were also able to deliver a gift donated by many of my generous Rittman Mead colleagues, which included over 60 lbs of art supplies, candy and toys. Despite our consultants being spread across many states, once I posted an opportunity to contribute, gifts just started arriving at our Atlanta office, where I hang my laptop bag.


It is truly a pleasure working with such compassionate people and having a management team that values more than just profits.  Caring about the causes that are important to employees is a big part of the Rittman Mead culture. This attitude, coupled with numerous family friendly work events, makes employees feel like more than simply a cog in the wheels of a profit machine. At Rittman Mead, employees are supported in their pursuit of a healthy work-life balance and that is one of the big reasons I am proud to work here.


Categories: BI & Warehousing

Another Thanksgiving Email

Tim Tow - Sat, 2016-11-26 16:06
Here is another email we got Wednesday from a Dodeca Excel Add-In for Essbase customer. This customer has a lot of VBA macros running automation with Essbase and asked for some assistance. We told them it was as easy as replacing the Essbase function declarations file with our Dodeca Add-In function declarations file, and then setting the variables that contain the location of the Dodeca-Essbase server. In other words, replace this file:

With this file:





Easy, right? Here is the email:

I hadn't had a chance to test this massive file out with the latest version of the add-in yet, but was more than pleasantly surprised when I replaced the add-in code for the Dodeca wrapper and the only thing I had to do was change the connection to our new server and the file was live! It's truly drop and go! Thanks so much!!!!



Categories: BI & Warehousing

Use REST API to get the data

Dylan's BI Notes - Sat, 2016-11-26 10:37
We now see more and more data available via REST APIs. Here are some of the data sources I have worked with earlier: ServiceNow Table API and REST API for Oracle Service Cloud. Here is a checklist for enabling a data source for BI: 1. How does the service perform the authentication? In the case of […]
Categories: BI & Warehousing

Rittman Mead at UKOUG Tech 16

Rittman Mead Consulting - Thu, 2016-11-24 11:52

This year as always Rittman Mead is coming to UKOUG Tech 16 with a strong presence and a great line up of sessions covering OBIEE, ODI, Kafka, advanced visualisation and more. And yes, there will be Cloud!

Rittman Mead at UKOUG Tech 16

Here are the details of the Rittman Mead sessions:

There is no better way to finish a conference than with two success stories from our recent engagements! If the OBIEE 12c upgrade depicted in Francesco's session is something you are also looking to achieve, we would be pleased to tell you more about it and to see how we can help you.

And of course, we are also happy to answer any questions if you see us in sessions or around the conference. You can find some of us during the Oracle Big Data meetup (Monday evening) or the ODTUG Data and Analytics Switzerland meetup (Tuesday evening).

So see you in two weeks in Birmingham!

Categories: BI & Warehousing

Happy Thanksgiving (Featuring an Email That I Am Thankful For)

Tim Tow - Wed, 2016-11-23 11:12

I get tons of email every day and often have hundreds of emails that I have to track on a daily basis, which explains, in part, my absence from blogging recently. Most of the emails I track are related to Dodeca sales and support. After all, with Dodeca, the buck stops here, right? I also get some spam in the mix. Sometimes, though, there is an email that really makes my day. Here is one of those I received this morning:

From: (masked)
Sent: Wednesday, November 23, 2016 8:48 AM
To: Applied OLAP Support
Subject: Happy Thanksgiving!

Although we are a smaller Essbase shop at 105 users, our users are strong advocates for Essbase and use it extensively. So, it is great timing to say how thankful I am to you and your teams for developing the Dodeca Essbase Add-In. At the end of the day, it’s the user experience that drives the support for Essbase and your product will excite our user base! [And, it requires little or no support from my team!]

Happy Thanksgiving to all!


Wow!  This email really made my day!  To our new customer that sent this email (and who graciously granted me permission to post her words), Happy Thanksgiving to you as well.  We hope this is the first of many Thanksgivings that we work together!


Categories: BI & Warehousing

How to Use Versioning in ODI 12c

Rittman Mead Consulting - Wed, 2016-11-23 09:00

How many times have you been working on a project and something goes wrong, or the power shuts off, or you go on vacation and someone messes with your code, or your work is somehow lost? Well, now you have a way to safeguard your project work.

Versioning in ODI is available at various hierarchy levels within the instances and is stored in the master repository. This means that if you have multiple work repositories connected to the same master, you can see all the versions when connected to either work repository.

A version is a backup copy of an object that is saved as a checkpoint in ODI. ODI allows you to version Projects, Folders, Packages, Scenarios, Load Plans, Mappings, Procedures, Knowledge Modules, Models, Model Folders and Solutions. You will need to decide which objects to create and manage versions for, but this tutorial will review the process using Packages.

Fast Review: A package is made up of steps organized into a diagram that is executed. The steps include mappings, variables, procedures, ODI tools (such as OdiXMLConcat, OdiZip, etc), models, sub-models and datastores.

  1. Connect and move to the Designer Navigator and expand the Projects folder.

    How to Use Versioning in ODI 12c

  2. Select and expand Packages. In our example I will open up Target Data Load.

    How to Use Versioning in ODI 12c

  3. As you can see below, a well-trained ODI developer has set up the load plans to run in a specific execution order and, if any fail, to send an email.

    How to Use Versioning in ODI 12c

    In addition to a fail notification, a new manager also wants to know when the package executes successfully. We will create an original version of the package and then add a new email notification. Once we have finished our changes, we will create a new version.
  4. Right-click on the Target Data Load package and select Version > Create Version

    How to Use Versioning in ODI 12c

  5. Type 'Target Data Load v1' as the name and 'Existing Target Data Load package (original)' in the description. You can name the original (or the next version, depending on where you are in your versions) whatever you would like. Best practice is to keep the name simple and include a version number. Make sure to put more descriptive details about the purpose of the version in the description box below the version name.

    How to Use Versioning in ODI 12c

    Now that we have our original version safely created, we can make our changes to the package.

  6. If it is not already expanded, expand the Internet accordion of the ODI Toolbox and locate OdiSendMail. Highlight OdiSendMail and place it on the canvas to the right.

    How to Use Versioning in ODI 12c

  7. Using the toolbar, connect your last mapping to OdiSendMail 2 using the green 'ok' arrow. Remember to click on the mapping and drag your cursor to OdiSendMail 2.

    How to Use Versioning in ODI 12c

  8. Make sure to click on the white cursor in the menu (above the Toolbox, to the left of the green arrows), then click OdiSendMail 2 so the properties window populates with the mail server information. Press Save. Contact your company email or network admin to get the required SMTP information (a sketch of the typical values follows these steps).

    Pro-tip: For our demonstrations and tutorials we use mailtrap.io, a working dummy SMTP testing server.

  9. Once completed your screen should look similar to the image below.

    How to Use Versioning in ODI 12c

  10. Before we version the changes, we should test them. Execute the new package by right-clicking on the newly modified package - in our tutorial, Target Data Load. Click OK and accept the defaults in the Run window.

    WARNING: Make sure you are in the correct environment for testing and development. DO NOT run any package that will change or affect any real data.

    How to Use Versioning in ODI 12c

  11. Now check your execution under the Operator tab to make sure it was successfully executed.

    How to Use Versioning in ODI 12c

  12. Now we will create a new version of the package, which will become the latest version. We will repeat the earlier steps, so return to the Designer Navigator > Projects window and click on the Target Data Load package. Right-click the Target Data Load package and go to Version > Create Version. Note that when the window appears you can see the original version in the bottom box.

    Leave the default name but change the version to v2. Then, for the description, enter 'Added successful execution notification for Target Data Load mappings via OdiSendMail' and click OK.

    How to Use Versioning in ODI 12c

    You have now created two versions of your package: the original version and the modified version with the success email. Versioning is a key feature of ODI that allows multiple developers to interact and work together to achieve the best results on data integration projects.
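As mentioned in step 8, here is a rough sketch of the kind of values that end up in the OdiSendMail step's properties, shown in the tool's command form. The parameter names come from the ODI Tools reference as I recall them and all values are placeholders, so verify them against the documentation for your ODI version:

OdiSendMail -MAILHOST=smtp.example.com -FROM=odi@example.com -TO=bi-team@example.com
    "-SUBJECT=Target Data Load completed successfully"
The Target Data Load package finished without errors.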

Special Note:

In this post, we reviewed how to use the internal versioning in ODI. Rittman Mead always recommends using an external configuration management system (e.g. Git) rather than the internal versioning for ODI releases earlier than 12.2.1.2. In the next ODI patchset, there will be integration with Git for better workflow control. Subversion integration is already available as of ODI 12.1.3.
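As a simple illustration of the external approach, objects exported from ODI Studio as XML files can be tracked with plain Git commands; the file name below is just a placeholder:

# after exporting the package from ODI Studio as an XML file
git add TARGET_DATA_LOAD_PKG.xml
git commit -m "Target Data Load package: add success notification email"
git push origin master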

If you are interested in seeing how to use Git in conjunction with older versions of ODI (prior to 12.2.1.2.6), stay tuned and check out an upcoming video here using Rittman Mead Principal Consultant Pete Tamsin's method for 'Using Git in an ODI Procedure'.

Huge thanks to Pete Tamsin and Michael Rainey for their help editing this post. No man, or woman, is an island!


Categories: BI & Warehousing

Catalog Validation: Why, What, When, Where and How?

Rittman Mead Consulting - Tue, 2016-11-22 02:04

One of the features everybody "loved" about OBIEE 11g was the Global Unique Identifier (GUID), used to recognize users and groups based on an identifier that could be different from the username. The original aim of GUIDs was to be able to distinguish different users sharing the same username but coming from multiple Authentication Providers.

GUID management can be tricky, especially if the GUIDs are not in sync between different environments, and mismatches can cause a wide range of errors, from the inability to log in to missing parts of the catalog.

[2016-10-20T09:19:04.000+02:00] [OBIPS] [ERROR:1] [] [saw.security.validate.indexes] [ecid: 0058cGJgGOkBh4sawh3j6G0001QC00000B,0] [tid: 2002437888] XXXX's guid 0A8AC9E0811D11E4AF4FE155B36CBFFD in catalog doesn't match guid 49BB3BB0629311E5BFFE71BB91F31C2B in backend! Aborting! Please UpdateGuids!

After checking the Presentation Services log (sawlog.log), the solution for most of those errors was simply to regenerate the GUIDs. GUID regeneration, however, isn't something easily done on a production system since it requires some downtime (a restart of both the Oracle BI Server and Presentation Services).
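For reference, the standard 11g regeneration procedure boils down to temporarily setting two flags and restarting the system. The parameter names below are quoted from memory of that documented procedure, so double-check them against the documentation for your exact version before using them:

# NQSConfig.INI (BI Server) - set temporarily, then revert after the restart
FMW_UPDATE_ROLE_AND_USER_REF_GUIDS = YES;

# instanceconfig.xml (Presentation Services) - add under <ServerInstance><Catalog>,
# then remove once the regeneration has completed
<UpdateAccountGUIDs>UpdateAndExit</UpdateAccountGUIDs>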

Why Would you Run Catalog Validation?

You may ask yourself:

Why is he talking about GUIDs when they have been removed in OBIEE 12c?

And you would be perfectly correct. GUID misalignment is no longer a problem in OBIEE 12c, but historically it was only one of the issues causing catalog corruption that would then require a catalog validation.

Even without GUIDs, catalog corruption is still something that can happen in OBIEE 12c: objects (e.g. analyses, dashboards, agents) owned by deleted users, broken links, and corrupted files on the server are only some of the issues that can be present in any OBIEE implementation, no matter which version is installed.
Most of the time corrupted catalogs generate errors which are difficult to diagnose, and manual fixing is not always possible and never easy to do.

The Catalog Validation process, available since OBIEE 11g, is very powerful since it provides a detailed analysis - and, if configured, an automated fix - of all catalog corruptions.

What is Catalog Validation?

As per Oracle's documentation, the Catalog Validation (CV) procedure does the following checks:

  • Ensures that each object in the catalog is larger than zero bytes: any object with zero bytes size is probably due to corruption and should be removed.
  • Ensures that each item in the catalog has a valid corresponding .atr file: the .atr file contains the properties (permissions, ownership, creation date, modification date etc.) of any object in the catalog. An object without a related .atr file is not visible in OBIEE's front-end.
  • Ensures that each link in the catalog is valid: links to deleted or renamed dashboards and analyses will cause an error when clicked.
  • Ensures that the files in the account cache are valid: this step checks that all the accounts are valid and the cache entries (storing user related information) are up to date.
  • Ensures that all XML objects in the catalog pass schema validation: every object (dashboard, analysis, prompt etc.) in the catalog is stored as XML file. This step checks that the XML is valid.
  • Attempts to repair object names that were damaged by ftp programs: moving catalog objects using ftp programs could corrupt the object name.

When Should You Run Catalog Validation?

I've seen Catalog Validation used only when problems were raised; however, it is good practice to validate the catalog every time a major change impacting it is made, or on a schedule in environments where end users can directly create content.

The following is a list of cases when running a Catalog Validation could be useful:

  • Before an upgrade: running CV before an upgrade and ensuring consistency helps avoid problems related to possible corruptions
  • After an upgrade: running CV after an upgrade ensures that content and security migration worked
  • After a major change: when a major change happens in the catalog, CV ensures no missing links or ownership problems are present
  • After a deployment: executing CV after a production deployment checks the content migration and verifies the security
  • On a schedule: execute CV on instances where end users can create content, and to verify accounts

Please note that a catalog can contain corruptions even if no front-end enhancements have been made; the following are just some examples:

  • Developer account deletions: all objects owned by that account will be flagged as corrupted
  • Security changes: changing/deleting security roles impact catalog privileges
  • File System corruption: data can be badly written in file system
  • Content deletions: deleting content makes referring objects corrupted

Sometimes the OBIEE environment continues working as expected even if some of the above corruptions are present. Nevertheless, over the long term they may cause errors, especially if upgrades or changes to the security are planned.

Where Do You Run Catalog Validation?

Catalog Validation can be run on any OBIEE instance available; however, the following use cases are particularly interesting:

  • Validating development catalog: once consistency of development catalog is ensured it can then be migrated forward to production
  • Validating production (or smoke test) catalog: validating production catalog to ensure that code promotions happened consistently, that user homes are valid and that no objects (user created or promoted) are broken.

A point to note is that if CV is run against a production catalog in a different environment (e.g. development) with a different security store, many accounts and their related content could be flagged as non-existent and deleted. As a general rule, CV should be run on environments sharing the same security as the one the catalog is sourced from, allowing a genuine check of the security settings.

Performing a Catalog Validation in a production environment is not always possible due to the process restarts required, so a smoke test environment sharing the same security settings would be the perfect target for the test. When running Catalog Validation on a live catalog, or when taking a catalog backup, ensure that "Maintenance Mode" is activated: setting this flag to ON (it can be found under the Administration page in OBIEE's front-end) ensures that no changes can be performed in the catalog during the check or upgrade.

Maintenance Mode

How Do You Run Catalog Validation?

In order to run Catalog Validation you need to:

  • Stop Presentation Service[s] (obips): stopping the component can be achieved either in Enterprise Manager or via the command line. The command line syntax changed between OBIEE 11g and 12c; you can find the two statements in the code below
# 11g Syntax
$INSTANCE_HOME/bin/opmnctl stopproc ias-component=obips1
# 12c Syntax
$INSTANCE_HOME/bitools/bin/stop.sh -i OBIPS
  • Create a backup of the catalog: when performing a catalog backup, 7-Zip should be the chosen tool; WinZip has known problems with catalog files (see Oracle's documentation, chapter "17.10 Archiving and Unarchiving Using Catalog Manager"). A minimal example follows
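For example, using the 7-Zip command line tool (the catalog path is a placeholder for your own catalog location):
# archive the whole catalog directory, preserving its structure
7za a -r catalog_backup_$(date +%Y%m%d).7z /app/oracle/biee/catalog/mycatalog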
  • Create a backup of instanceconfig.xml file (under $INSTANCE_HOME/config/fmwconfig/biconfig/OBIPS)
  • Change instanceconfig.xml file in order to include the validation tags explained in the following section
  • Start Presentation Service[s]: like the stop operation, this can be performed either via EM or the command line. Below is the code for 11g and 12c
# 11g Syntax
$INSTANCE_HOME/bin/opmnctl startproc ias-component=obips1
# 12c Syntax
$INSTANCE_HOME/bitools/bin/start.sh -i OBIPS
  • Repeat the steps above until the catalog is fully validated: as explained in the section below, several different assessments and automated fixes can be performed. The sawlog.log files will contain entries when corrupted objects are present in the catalog. A catalog is fully validated when no corrupted objects are found during CV.
  • Stop Presentation Service[s]
  • Restore original instanceconfig.xml file
  • Start Presentation Service[s]

Catalog Validation configuration

The following tags must be inserted under <ServerInstance><Catalog> tag.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>  
    <!-- Oracle Business Intelligence Presentation Services Configuration File -->
    <WebConfig xmlns="oracle.bi.presentation.services/config/v1.1">
    <ServerInstance>
        [...]
        <Catalog>
              <Validate>OnStartupAndExit</Validate>
              <ValidateAccounts>Clean</ValidateAccounts>
              <ValidateHomes>Report</ValidateHomes>
              <ValidateItems>Clean</ValidateItems>
              <ValidateLinks>Clean</ValidateLinks>
        </Catalog>
        [...]
    </ServerInstance>
</WebConfig>  

The tags do the following. See below for an explanation of the values that can be specified:

  • Validate: Main configuration tag. Possible values are
    • None: No Catalog Validation is performed; however, all the privileges and each object's ACLs are cleaned of non-existing accounts
    • OnStartupAndExit: Presentation Services starts, performs the validation based on the following tags, and then stops. This process can be reiterated multiple times with different options for each element.
  • ValidateAccounts: Verifies the consistency of users, roles and groups.
  • ValidateHomes: Verifies all users' homes; executed only if ValidateAccounts is set to Report or Clean.
  • ValidateItems: Verifies that catalog items are consistent - size greater than zero and valid XML.
  • ValidateLinks: Verifies the consistency of all links in the catalog (e.g. all analyses contained in a dashboard).

The accepted values for all settings except Validate are the following:

  • None: no validation will be performed
  • Report: a log is written for every inconsistent item in sawlog.log file under $INSTANCE_HOME/servers/obips1/logs
  • Clean: does the same as Report, plus removes the inconsistent objects from the catalog.

As you can understand, the "Clean" option isn't advisable for all tags - you don't want a dashboard to be deleted only because its owner doesn't exist anymore - but it is the desired choice when you need to remove all the old or corrupted user homes. The "Report" option, on the other hand, provides a way of logging all the corrupted items so they can be fixed manually.
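When using Report, a quick way to review what was flagged is to search the Presentation Services log for validation-related messages, for example (path as per the default 11g layout mentioned above):
grep -iE "validat|corrupt" $INSTANCE_HOME/servers/obips1/logs/sawlog*.log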

Catalog Validation is an extremely useful tool, allowing an automated check (and fix) of all the corrupted items in the catalog. Using Catalog Validation together with Baseline Validation Tool provides a way of ensuring the correctness of migrations and developments:

  • Running Catalog Validation before the migration to ensure all objects to promote are consistent
  • Running Catalog Validation after the migration to ensure the consistency of all promoted objects and security
  • Running Baseline Validation Tool between source and target environment to ensure the expected outputs are matching.

Summarizing, Catalog Validation and the Baseline Validation Tool can be considered complementary: the first checks the catalog objects and security consistency, the second analyses and compares the expected results. Running both alongside any code promotion process should be considered good practice.

Categories: BI & Warehousing

Common OBIEE Implementation Mistakes That Impact Performance

Rittman Mead Consulting - Thu, 2016-11-17 04:19

I've recently performed several assessments to help our clients diagnose and fix their OBIEE performance problems. They often share similar design or implementation issues, and I thought it would be useful to present them here as a handy reference:

Mistake 1: No Software Monitoring

The key to happy users is to know there's a problem before they do and, even better, to fix it before they realise. The availability of comprehensive performance metrics is vital to achieving this goal, and Usage Tracking data is the natural starting point. When enabled in OBIEE, Usage Tracking collects statistics for each query and is helpful for understanding usage patterns across dashboards and users, and for analysing performance trends and diagnosing specific issues. Whilst Usage Tracking is generally enabled and the data it generates reviewed regularly, other software metrics are often not gathered at all or are not available to the BI team, giving only a partial view of the picture.
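As a sketch of the kind of question Usage Tracking can answer, the query below ranks dashboards by average response time using the standard S_NQ_ACCT table (column names are as I recall them from the standard Usage Tracking schema, so adjust them to your own deployment):

-- slowest dashboards over the last 7 days, by average response time
SELECT saw_dashboard,
       COUNT(*)                      AS query_count,
       ROUND(AVG(total_time_sec), 1) AS avg_response_sec,
       MAX(total_time_sec)           AS worst_response_sec
FROM   s_nq_acct
WHERE  start_ts >= SYSDATE - 7
GROUP  BY saw_dashboard
ORDER  BY avg_response_sec DESC;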

In addition to gathering and monitoring Usage Tracking data, we would strongly recommend:

  • Proper OS monitoring to capture the full breadth of OS metrics on both application and database servers, including CPU %, load average, memory, disk I/O, disk busy %, and network I/O. Store historic data for trend analysis with appropriate aggregations (so as not to blunt peaks), and near-realtime granular data for interactive performance diagnostics.

  • OBIEE Dynamic Monitoring Service (DMS) metrics collection. These metrics provide information regarding Oracle Fusion Middleware components' performance, state, and on-going behaviour.

    OBIEE Performance Analytics Dashboards

All of these can be done and visualised with Rittman Mead OBIEE Performance Analytics Dashboards. Have a look at these videos to see exactly how the dashboards can help, and get in touch with us to request a demo!
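Even before a full monitoring stack is in place, a basic capture of the OS metrics mentioned above can be started with the sysstat tools; a minimal sketch (the log path is a placeholder):
# capture CPU, memory, disk and network statistics every 60 seconds
nohup sar -u -r -d -n DEV 60 >> /var/log/obiee_os_metrics_$(hostname).log 2>&1 &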

Mistake 2: Randomly Changing Configuration Settings

It's a common belief, or hope depending on the available hardware, among online gamers that by changing configuration settings you can fix any software performance issue. Unfortunately OBIEE is not a game and this is not true! Whilst OBIEE does not provide any "Boost Performance" switch, the defaults are generally a good starting point, and unnecessary fiddling without good reason should be avoided:

  • You can introduce additional problems by mistake.

  • It makes it hard to track what's changed from a default config.

  • If it fixes the problem, it could be coincidence: without proof there is no actual understanding of the underlying problem.

We at Rittman Mead are firm believers in a practical and logical approach to performance analytics and optimisation. Eschewing the drunk man anti-method of ‘tuning’ configuration settings at random, we advocate making a clear diagnosis and baseline of performance problems before changing anything. Once a clear understanding of the situation is established, steps are taken in a controlled manner to implement and validate one change at a time.

Mistake 3: Data Transformation at Query Time

You can usually find a lot of data transformation being done in the RPD itself. This is one of the most useful capabilities of OBIEE, since re-modelling data in-flight at query time is usually a lot quicker to implement than changing the underlying data model and/or ETL processes.

Wrangling data this way in a small number of instances is common, but widespread use is indicative of a data model and/or ETL design that needs modifying to take the report requirements into account. It is difficult to maintain, and it results in less efficient, as well as unnecessarily complex, SQL.

  • Conditional transformation - The business is always changing, and it may require grouping data in a way that was not expected when the data model was designed. For example, you can create a logical column using a CASE WHEN statement to define a new country grouping which is not present in your data model.

    Conditional transformation example

  • Data cleansing - Sometimes you may need to integrate social network data into your corporate reports, but you have limited or no control over its content. For example, you can use the TRIM function to deal with blanks that are expected in a column containing users' reviews of your products.

    Data cleansing example

  • Data conversion - The classic use case for data conversion is when you have to manipulate dates to provide the required format. For example, you can use CAST and Calendar/Date functions to create the format YYYY-MM, required to define the Month level's key of your Date hierarchy, from a date column.

    Data conversion example

  • Opaque views - An opaque view is a physical layer table that consists of a SELECT statement. The typical use case for opaque views is when you have to turn a transactional data model into a star schema. However, opaque views mask the true data structure from OBIEE, preventing it from generating optimal data access queries, and they should be used only when there are no other available solutions.

We recommend doing these kinds of transformations once, at ETL time, rather than every time a particular table or column is used in a query; a sketch of what that might look like follows.
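As an illustration of pushing these rules into the ETL layer (the table and column names below are made up purely for this example):

-- materialise the derived attributes once, during the load, instead of at every query
CREATE TABLE dw_customer_d AS
SELECT cust_id,
       TRIM(cust_review_text) AS cust_review_text,                 -- data cleansing
       CASE WHEN country_iso IN ('FR', 'DE', 'IT') THEN 'EMEA Core'
            ELSE 'EMEA Other' END AS country_grouping,             -- conditional transformation
       TO_CHAR(signup_date, 'YYYY-MM') AS signup_month_key         -- data conversion
FROM   stg_customer;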

Mistake 4: Underuse of Double Column Feature

Double Column is probably one of the most undervalued and underused features of OBIEE. It provides a mechanism for associating two logical columns: one column provides the display and description values, such as the description of an item; the second column provides a descriptor ID or code column.

Use should be made of the Double Column feature to enable OBIEE to generate more efficient physical SQL. The principle behind this is that instead of queries running with string-based predicates, which may not be indexed in the database or could be less efficient, the user sees the string version of an attribute whilst OBIEE uses the corresponding numerical ID column in the query it generates against the database, generally resulting in more optimal execution plans.

As an example, to do this we start with assigning the P0 Product Number column as the descriptor ID column in the Business Model and Mapping layer for P1 Product column:

Double column feature setup example

Let's now create a simple report in Answers containing T05 Per Name Year and 1- Revenue columns, and apply a filter on P1 Product as well. If you are selecting is equal to / is in, is not equal to / is not in or is between operators, then filtering by code values checkbox will be available:

Filter by code values example

Let's check it, and then see, in terms of SQL, how the query filter is generated:

select sum(T216.Revenue) as c1,  
     T795.PER_NAME_YEAR as c2
from  
     BISAMPLE.SAMP_PRODUCTS_D T451 /* D10 Product (Dynamic Table) */ ,
     BISAMPLE.SAMP_TIME_QTR_D T795 /* D03 Time Quarter Grain */ ,
     BISAMPLE.SAMP_REVENUE_FA2 T216 /* F21 Rev. (Aggregate 2) */ 
where  ( T216.Prod_Key = T451.Prod_Key  
         and T216.Prod_Key = 7
         and T216.Bill_Qtr_Key = T795.QTR_KEY
         and T451.Prod_Key = 7 ) 
group by T795.PER_NAME_YEAR  

Double Column feature can also be used to define language independent filters as my former colleague Venkat already noted in another post.

Mistake 5: Suboptimal Report Design

Certain suboptimal report designs that I've seen recently include overcrowded analyses, abuse of table / pivot prompts, and OBIEE used as a data extraction tool.

Overcrowded analyses / Overuse of the Excluded area in Views

An overcrowded analysis is an analysis with an awful lot of columns included in the Criteria tab and several different views with many excluded columns. These views can then be displayed all together or spread over different dashboard pages.

This pattern is quite common with clients that migrated from Hyperion Interactive Reporting, and the main motivation for it - having only one analysis to maintain - may sound good. However, since the Oracle BI Server retrieves results for all columns specified in the Criteria tab, we recommend creating several analyses with a single view each rather than adopting this pattern. In this way the BI Server generates more efficient queries and avoids aggregating data on the fly.

As an example, you can use Oracle Sample Application v607 to create and run an analysis with a Line-Bar view as follows. Note the huge number of columns in the Excluded area.

Overcrowded analysis example

This results in a simple chart with 3 bars and a line connecting 3 points:

Line-Bar view example

Let's open obis1-query.log log file and have a look at the physical SQL generated by the BI Server:

select sum(T418.Units) as c1,  
     sum(T418.Revenue) as c2,
     T762.Name as c3,
     T666.Office_Dsc as c4,
     T666.Company as c5,
     T418.Order_Status as c6,
     T451.Prod_Dsc as c7,
     T451.Brand as c8,
     T437.Employee_Key as c9,
     T42.Calendar_Date as c10,
     T42.Per_Name_Year as c11,
     T666.Office_Key as c12,
     T762.Cust_Key as c13,
     T451.Prod_Key as c14,
     T666.Company_Key as c15
from  
     BISAMPLE.SAMP_PRODUCTS_D T451 /* D10 Product (Dynamic Table) */ ,
     BISAMPLE.SAMP_TIME_DAY_D T42 /* D01 Time Day Grain */ ,
     BISAMPLE.SAMP_CUSTOMERS_D T762 /* D60 Customers */ ,
     BISAMPLE.SAMP_OFFICES_D T666 /* D30 Offices */ ,
     BISAMPLE.SAMP_EMPL_D_VH T437 /* D50 Sales Rep (Parent Child Hierarchy) */ ,
     BISAMPLE.SAMP_EMPL_PARENT_CHILD_MAP T490 /* D51 Closure Table Sales Rep Parent Child */ ,
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ 
where  ( T42.Calendar_Date = T418.Bill_Day_Dt and T418.Cust_Key = T762.Cust_Key and T418.Prod_Key = T451.Prod_Key and T418.Empl_Key = T490.Member_Key and T418.Office_Key = T666.Office_Key and T437.Employee_Key = T490.Ancestor_Key )  
group by T42.Calendar_Date, T42.Per_Name_Year, T418.Order_Status, T437.Employee_Key, T451.Prod_Key, T451.Prod_Dsc, T451.Brand, T666.Company_Key, T666.Office_Dsc, T666.Company, T666.Office_Key, T762.Name, T762.Cust_Key)  

The physical SQL code doesn't look as simple as the resulting view! But the worst is yet to come: it will take about 7 seconds to complete and 256k rows (1.5 GB) will be retrieved from the database!

Let's remove all the excluded columns from the analysis and see what happens to the physical SQL:

select sum(T216.Units) as c1,  
     sum(T216.Revenue) as c2,
     T795.PER_NAME_YEAR as c3
from  
     BISAMPLE.SAMP_TIME_QTR_D T795 /* D03 Time Quarter Grain */ ,
     BISAMPLE.SAMP_REVENUE_FA2 T216 /* F21 Rev. (Aggregate 2) */ 
where  ( T216.Bill_Qtr_Key = T795.QTR_KEY )  
group by T795.PER_NAME_YEAR  

Definitely far better than it looked before. Moreover it will take about 0.2 seconds - 97% less - to complete and only 3 rows (3 KB) - 99% less - will be retrieved from the database.

Abuse of Table / Pivot prompts

Table / Pivot prompts provide an interactive result set that enables users to select the data that they want to view. However, Table prompts don't apply any WHERE condition to the query issued by the BI Server: it retrieves the full result set first, and then selects the data to display on the fly. For this reason we recommend using Dashboard prompts rather than Table / Pivot prompts.

As an example, you can create and run an analysis with a Pivot Table view as follows:

Pivot prompts example

This results in a pivot table with 8 rows and 25 columns:

Pivot Table view example

Again, let's open obis1-query.log log file and have a look at the physical SQL generated by the BI Server:

select sum(T418.Units) as c1,  
     sum(T418.Revenue) as c2,
     T666.Company as c3,
     T451.Prod_Dsc as c4,
     T451.Brand as c5,
     T653.Per_Name_Month as c6,
     T653.Per_Name_Year as c7,
     T666.Company_Key as c8,
     T451.Prod_Key as c9
from  
     BISAMPLE.SAMP_PRODUCTS_D T451 /* D10 Product (Dynamic Table) */ ,
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ,
     BISAMPLE.SAMP_OFFICES_D T666 /* D30 Offices */ ,
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ 
where  ( T418.Prod_Key = T451.Prod_Key and T418.Bill_Mth_Key = T653.Mth_Key and T418.Office_Key = T666.Office_Key )  
group by T451.Prod_Key, T451.Prod_Dsc, T451.Brand, T653.Per_Name_Year, T653.Per_Name_Month, T666.Company_Key, T666.Company  

As expected no WHERE condition based on selected values in pivot prompts was applied to the query. It will take about 0.3 seconds to complete and 2k rows (10 MB) will be retrieved from the database.

Now let's replace pivot prompts with dashboard prompts and see what happens to the physical SQL:

select sum(T418.Units) as c1,  
     sum(T418.Revenue) as c2,
     T451.Prod_Dsc as c3,
     T653.Per_Name_Month as c4,
     T451.Prod_Key as c5
from  
     BISAMPLE.SAMP_PRODUCTS_D T451 /* D10 Product (Dynamic Table) */ ,
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ,
     BISAMPLE.SAMP_OFFICES_D T666 /* D30 Offices */ ,
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ 
where  ( T418.Prod_Key = T451.Prod_Key and T418.Bill_Mth_Key = T653.Mth_Key and T418.Office_Key = T666.Office_Key and T451.Brand = 'BizTech' and T653.Per_Name_Year = '2013' and T666.Company = 'Genmind Corp' )  
group by T451.Prod_Key, T451.Prod_Dsc, T653.Per_Name_Month  

It looks better than before, huh? The optimised code will take about 0.1 seconds - 67% less - to complete and 96 rows (0.2 MB) - 95% less - will be retrieved from the database.

OBIEE as a data extraction tool

A large number of clients use OBIEE mostly as a feed for Excel-marts. Using OBIEE simply as a tool to extract lots of data into Excel or other destinations can cause both performance and functional problems. OBIEE is designed to be used primarily as a reporting and analytics tool, with users interacting with the data within OBIEE and, as my colleague Francesco Tisiot has already noted in another post, almost any transformation doable in Excel can be achieved in OBIEE, probably faster and with zero impact on local workstations.

This behaviour can highlight a lack of end-user training and/or communication between end users and BI developers. End users may not know OBIEE at all, but they are used to working with Excel: at the end of the day people just want their job done as painlessly as possible. We recommend talking to your end users about what they're doing with the data in their destination system, and whether the same can be achieved within OBIEE. If users really do just need the data in Excel, then you should look at how best to provide it (Oracle Doc ID 1558070.1).

Conclusion

I've shown here some of the pitfalls commonly encountered in OBIEE implementations. They can impact performance for end users, as well as making life more difficult for those building and maintaining the reports.

If you'd like Rittman Mead to take a look at your OBIEE implementation and make sure you're getting the best out of it, please do get in touch!

Categories: BI & Warehousing

Oracle Data Integrator 12c: Getting Started - Developer's Quickstart

Rittman Mead Consulting - Mon, 2016-11-14 13:18

I’ve decided that it’s time for a refresher on Oracle Data Integrator 12c. This week in the “Oracle Data Integrator 12c: Getting Started” series: getting a quick start on mapping development. Several objects must be created before a single bit of ETL can even be created, and for those who are new to the product, as many readers of this series will be, that can be frustrating. The objects that must be in place are as follows:

  • Data Server - This object is the connection to your data source. Created under one of the many technologies available in ODI, this is where the JDBC URL, username, password, and other properties are created and stored.
  • Physical Schema - Underneath the Data Server you'll find the Physical Schema. This object, when connecting to a relational database, represents the database schema where the tables you wish to access in ODI reside.
  • Logical Schema - Here's where it can sometimes get a bit tricky for folks new to Oracle Data Integrator. One of the great features in ODI is how it abstracts the physical connection and schema from the logical objects. The Logical Schema is mapped to the Physical Schema by an object called a Context. This allows development of mappings and other objects to occur against the Logical Schema, shielding the physical side from the developers. When promoting code to the next environment, nothing needs to change in the developed objects for the connection.
  • Model - Once you have the Topology set up (Data Server, Physical Schema, Logical Schema), you can then create your Model. This is where the logical Datastores are grouped for a given schema. There are many other functions of the Model object, such as journalizing (CDC) setup, but we'll save those features for another day.
  • Datastore - The Datastore is a logical representation of a table, file, XML element, or other physical object. Stored in the form of a table, the Datastore has columns and constraints. This is the object that will be used as a source or target in your ODI Mappings.

Now you can create your mapping. Whew!

Over the years, Oracle has worked to make the process of getting started a lot easier. Back in ODI 11g, the Oracle Data Integrator QuickStart was a 10-step checklist, where each step led to another section in the documentation. A nice gesture by Oracle, but by no means "quick". There was also a great tool, the ODI Accelerator Launchpad, built in Groovy by David Allan of the Oracle DI team. Now we were getting closer to something "quick", but this was simply a script that you had to run, not an integrated part of the ODI Studio platform. Finally, with the release of ODI 12.1.3, the Quickstart was introduced. The New Model and Topology Objects wizard allows you to create everything you need in order to reverse engineer tables into ODI Datastore objects and begin creating your first mappings.

ODI 12c New Model and Topology Objects wizard

Going through the wizard is much simpler than manually setting up the Topology objects and Model for folks just getting started with Oracle Data Integrator. The blog post from Oracle linked above can walk you through the process and I’ve added a demonstration video below that does the same. As a bonus in my demo, I’ve added a tip to help you get your initial load mappings created in an instant. Have a look:

There you have it, a quick and easy way to get started with Oracle Data Integrator 12c and create your first source-to-target Mapping. If you have further questions and would like a more detailed answer, you can always join one of the Rittman Mead ODI bootcamps to learn more from one of our data integration experts. Up next in the Getting Started series, we'll look at enhancing the ODI metadata by adding constraints and other options.

Categories: BI & Warehousing

Oracle Data Integrator 12c: Getting Started - Installation and Configuration

Rittman Mead Consulting - Tue, 2016-11-08 15:34

I’ve decided that it’s time for a refresher on Oracle Data Integrator 12c. This week, I’m continuing the “Oracle Data Integrator 12c: Getting Started” series with the next topic: Installation and Configuration. I don’t plan to dig into all of the nitty gritty details on each specific scenario you might run into. Rittman Mead have an excellent training program for ODI 12c where you can ask the specifics. There is also this 100+ page document from Oracle that can provide further details: Oracle® Fusion Middleware - Installing and Configuring Oracle Data Integrator 12c (12.2.1) E55859-03. In this post, we’ll hit on the basics and hopefully guide you in the right direction.

First, let's take a look at the installation process. Installing ODI 12c is quite simple compared to past versions of the product. This is mainly due to the configuration of agents being moved out of the initial installation and into a completely separate process. The big choice we have now is which installation type to choose: Standard or Enterprise.

Standard:
Choose this installation if you plan to use a Standalone Agent to execute your mappings and data integration processes. This will install ODI Studio and the Standalone Agent, along with all of the necessary drivers and libraries to make everything work. Choose this option when installing ODI Studio on your client machine.

Enterprise:
This option will install the JEE Agent, Standalone Colocated Agent, as well as ODI Studio. The installation footprint is slightly larger than the Standard install, coming in at 3390 MB vs 3017 MB for standard. That’s a part of why I choose Standard when just installing Studio. Along with the JEE Agent and ODI Studio, you’ll also have the ODI Console and ODI plugin for Fusion Middleware Control (Enterprise Manager). One thing to note, you’ll need to first install the Oracle Fusion Middleware Infrastructure (download) package as it is a prerequisite to the ODI 12c Enterprise version.

To get started, download the bits and bytes from Oracle OTN and unzip the file. Next, using the appropriate Java version, run the ODI installer.

[oracle@bigdatalite bin]$ ./java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
[oracle@bigdatalite bin]$ cd ../..
[oracle@bigdatalite odi122120]$ ./jdk1.8.0_111/bin/java -jar fmw_12.2.1.2.0_odi.jar 
Launcher log file is /tmp/OraInstall2016-10-21_12-20-12PM/launcher2016-10-21_12-20-12PM.log.
Extracting the installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . .

From there, simply go through the install wizard. I won't go into too much detail because there's not much to it. First, decide whether you will install the Standard or Enterprise version. Next, determine where you want to install your ODI home. For a JEE or Colocated Agent installation, ODI must be installed into the Oracle Home containing Fusion Middleware Infrastructure. For a Standalone Agent, a brand new Oracle Home must be used. Now that we've installed the software, it's time to set up the repository database schema.

Master and Work Repository Setup

The Repository Configuration Utility (RCU) is delivered as a part of the ODI install. It is used to create Fusion Middleware repositories, such as the ODI Master/Work Repositories, and any additional database schemas necessary.

Start the RCU by browsing to ODI_HOME/oracle_common/bin and running ./rcu. The steps are pretty straightforward, so I’ll leave this as an exercise for everyone to try at home. Once we have our repositories, we can begin configuring Agents and Studio.

ODI Studio Configuration

Setting up ODI Studio is fairly straightforward, so I’ll keep it brief. Once installed, browse to ODI_HOME/odi/studio and fire up ./odi.sh (or odi.exe in Windows). Then, create a new connection to your ODI Master and Work repository. Enter the ODI username/password to connect with, master repository schema username/password, JDBC connection information, and Work repository. Test the connection to ensure it’s all working and you’re set.

Standalone Agent Configuration

Browse to your ODI_HOME/oracle_common/common/bin and run the WebLogic Server Configuration wizard.

./config.sh

Create ODI Agent

First, create a new domain for the agent. This is similar to a WebLogic domain if you’ve gone through the setup process for any application that relies on WLS. It’s recommended to create the domain outside of the ODI Oracle Home. That way you can avoid issues when upgrading/reinstalling and can follow the same domain directory structure if installing multiple agents, since each must be installed in its own domain.

Next, choose the Standalone Agent Template, installed as part of the ODI download. Then, select the JDK to be used by the agent.

Configure ODI Agent component

The repository is already set up, so now if we connect to the service table (STB) schema, we can look up the ODI repository connection information without having to manually enter anything.

Get RCU configuration

The system component will be the ODI Agent name. Then, set the host server listen address, port, and Supervisor username/password.

ODI Agent configuration

Finally, set up the Node Manager for the ODI Agent. The Node Manager doesn't have to be used when starting the agent, but it is a requirement to set it up during configuration. The username/password should be kept in a safe place, as it will be necessary for starting components within the Node Manager.

Finally, click Create and you’re off and running.

Starting the Standalone Agent

Before you can start the agent, be sure to configure it in the master repository via ODI studio. Check out the video below for a short tutorial.

Now you can start the agent with the Node Manager or without it. If you want to manage the agent via Fusion Middleware Control, use the Node Manager. But if you plan to simply use the command line to start / stop the agent, you can do so using the classic startup method.

Start Agent with Node Manager

From within the DOMAIN_HOME/bin directory, run the following. You’ll want to use nohup to start the process in the background on Linux. For Windows, create a service to run nodemanager.cmd.

[oracle@bigdatalite bin]$ pwd
/u01/odi_agent/user_projects/domains/base_domain/bin
[oracle@bigdatalite bin]$ nohup ./startNodeManager.sh > nm.out&
[oracle@bigdatalite bin]$ nohup: ignoring input and redirecting stderr to stdout

Next, start the agent component from within the same directory. You’ll be prompted for the Node Manager password setup during configuration.

[oracle@bigdatalite bin]$ ./startComponent.sh OracleDIAgent1
Starting system Component OracleDIAgent1 ...

Initializing WebLogic Scripting Tool (WLST) ...
...
Reading domain from /u01/odi_agent/user_projects/domains/base_domain

Please enter Node Manager password:
Connecting to Node Manager ...
Start Agent without Node Manager

For times when you don’t need the Node Manager, you can still start the ODI Agent using the classic method. Just remember to always use all-caps when referring to the agent parameters, such as NAME.

./agent.sh -NAME=OracleDIAgent1

Either approach for starting the agent will get you up and running.

Colocated Agent Configuration

The Colocated Agent is essentially a Standalone agent that is managed and maintained via WebLogic Server. Rather than repeat the steps, I thought I would point you to a nice Oracle By Example on Creating a Colocated Agent. As mentioned earlier, be sure to create your domain outside of the ODI home.

Ping colocated agent

For reference, when testing an agent, simply place hostname:port/agent-web-application-context in a browser. A Ping response means the agent is up and running.
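For example, for a standalone agent left on its default web application context, something like the following should return the ping page (host, port and context are placeholders; adjust them to your own configuration):
curl http://odihost.example.com:20910/oraclediagent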

JEE Agent Configuration

Finally, we'll discuss the JEE Agent configuration. As mentioned in my previous post, the main reasons for using the JEE Agent are high availability and connection pooling. You may also want to use the ODI Console or need to manage and monitor ODI agents from FMW Control. Whatever the reason, here's how we configure the agent.

The configuration process is very similar to that of the Colocated Agent, but with several extra steps involved. I'll highlight those steps here. To get started, just as with the other agents, browse to the ODI_HOME/oracle_common/common/ directory and run ./config.sh.

Work through the usual first few installer screens. Then, we select the template to configure. Choose ODI Agent and any additional items you’d like to add to the domain, such as ODI Console or Enterprise Manager plugin.

JEE Agent template

Next, add the credential mappings for both the ODI user, allowing JEE agent access to the Master / Work Repository, and the WebLogic domain user.

JEE Agent credential mapping

Select the options you plan to configure for the domain. This will vary based on what currently exists in WebLogic Server, but in this example we’ll be setting up a new AdminServer, Node Manager, and the Managed Server to host the ODI JEE Agent.

JEE Agent domain configuration

Next, configure the AdminServer, Node Manager, and Managed Server. For each, remember to use the machine IP address and not the default “Local All Machines”. For the node manager, enter a username and password as we did in previous configurations.

JEE Agent server configuration

The final screens in the configuration wizard allow you to create a cluster and assign the cluster to the server. You can assign a dynamic cluster as needed for scalability, though this falls outside the context of this article. Assign the cluster to a server, create the machine (remember to assign the IP address), and finally assign the servers (AdminServer and Managed Server) to the machine.

JEE Agent topology

Click Create and you're done! Now start WebLogic and the Node Manager, then log in to the console to start the ODI Managed Server and agent.

There you are, ODI Agents installed and configured. As always, there are many external factors that may change how your configuration and setup works in your environment, so please think of this as a simple guide and not the exact approach for each unique situation. If you do have further questions and would like a more detailed answer, you can always join one of the Rittman Mead ODI bootcamps to learn more from one of our experts on the product. Up next in the Getting Started series, we’ll look at creating your first ODI mapping.

Categories: BI & Warehousing

Oracle Data Visualization Desktop: Star Schemas and Multiple Fact Sources

Rittman Mead Consulting - Tue, 2016-11-08 04:00

Once added to a project, the columns I specified with my custom SQL query now show up as one source.

Now that I have a custom data source with only the data I want to explore, I can easily create an analysis without having to sift through multiple data sources.

* A note about Data Visualization Desktop and caching: When using the above method of writing SQL to create a virtual table, it is loaded into the cache. You should only use this method for very compact models. If the virtual table contains too much data, you can still add it as a data source but it may be too big to cache, causing your columns not to load when creating analyses.

Although having one fact source is common in relational models, using multiple fact sources is sometimes unavoidable in dimensional modeling.

In my sample data, I have another schema called GCBC_SURVEYS which contains two fact tables containing satisfaction scores for both customers and staff and one dimension table containing the organization that conducted the surveys.

For this example, I'm going to try adding each table as a data source manually first and attempt to join the two fact tables to my dimension table. When using this method, pay close attention to changing any key and ID columns from Measure to Attribute so they aren't aggregated: Data Visualization Desktop sees a numeric datatype and assumes it's a measure.

Once I've added all of the GCBC_SURVEYS tables as data sources, I'm going to load them into a project and create my joins using the source diagram. When I joined each fact table to the dimension table on SURVORG_ID, notice how DVD automatically created a join between my two fact tables.

This is not desirable because, due to the presence of a circular join, we run the risk of double counting. When I try to break the join between the two fact tables, DVD asks which data source I want to break conformance from.

When I select one of the fact sources, it will not only break the join between the two fact sources but also the join between the fact and the dimension table.

As of this writing, I have not found a way to only break joins between fact tables if they are using the same key to connect to the dimension table.

The only workaround I've found is to write a SQL statement that pulls in the columns and creates the joins in one virtual table. This way I can specify the joins without DVD creating one between the fact sources, as sketched below.
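As an illustration, the virtual table can be defined with a single statement that joins each fact to the dimension explicitly (the GCBC_SURVEYS table and column names below are partly guessed for this sketch, so treat them as placeholders):

SELECT d.survorg_id,
       d.survey_org_name,
       c.customer_satisfaction_score,
       s.staff_satisfaction_score
FROM   gcbc_surveys.survey_org_d      d
JOIN   gcbc_surveys.customer_survey_f c ON c.survorg_id = d.survorg_id
JOIN   gcbc_surveys.staff_survey_f    s ON s.survorg_id = d.survorg_id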

Once I created my virtual table, I could use it to create a report and use both fact sources.

Although it can take some time to set up all the data sources you want to use for your model, Data Visualization Desktop packs some powerful features when it comes to dimensional modeling. The ability to use more than one fact source when needed adds another area of flexibility to DVD. Whether you are pulling in each table and then creating your model or writing a SQL to create one virtual table, DVD has the flexibility to be able to accommodate a variety of different scenarios.

Categories: BI & Warehousing

Creating Security Profiles in ODI 12c

Rittman Mead Consulting - Fri, 2016-11-04 05:00

As a newcomer to ODI I enjoy hearing from the more seasoned veterans about common situations they encounter on projects. One of these recurring situations (especially if the company has a very small dev team) is the lack of security. I will not discuss how Oracle improved security by using public/private key pairs for Cloud services, external hackers or any of the buzz words the media likes to toss about. But, I will share with you an easy way to create profiles in ODI to setup a more secure work environment.

Generally speaking, security is neglected because admins, operators or users are not aware of how to set it up or they find it too limiting and tedious to deal with. Other times you might see the exact opposite, where someone has it so locked down you have to request project permissions on the hour just to get work done (Pro-tip: never let control freaks setup or manage security! Just kidding. Maybe.)

Prior to starting any security profile setups, make sure to sit down and really put some thought into the types of profiles you want to create. Think about the different types of work being done in ODI (developer, operator, etc) and what level of permission someone may require. Review the built-in generic profiles here. Keep in mind that you will need to setup security in each environment (Dev, Test, QA, Prod and any others you might use) that you want to connect to. No security setup 'automatically' transfers over to other environments, and not all users require access to each environment.

In this tutorial we will take into consideration the following users:

  • DI Projects Developer - Level I
  • Senior BI Project Manager
  • Consultant

We will setup the Security Profile (access) for each user and connect it to the appropriate User for the DEV environment.

NOTE: This tutorial is specific to ODI internal password storage and authentication, not external authentication.

The first step is to decide what type of security profile (access) each user will need.

  • DI Projects Developer - Level I: Entry level DI developer. Should be able to develop in select projects only. Should also have 'view-all' access across the environment
  • Senior BI Project Manager: Full access to all related tasks in Designer, Operator and Topology. Might also have ability to edit or create new users.
  • Consultant: Brought in to assist in developing mappings and to load new data from a recently acquired company
Ok, now we can begin the setup.

    1. In a work environment you will login using an ADMIN or Security Profile that has the credentials to create security profiles. In our example you are assumed to be logged in as Admin, Training or Supervisor and have the correct access to set the profiles up.
    2. Navigate to the Security tab in ODI.
    3. Creating Security Profiles in ODI 12c
      If you do not see the Security tab, go to the very top menu and click Window > ODI Security Navigator (seen below)
      Creating Security Profiles in ODI 12c

      Now we will create the User logins that will be linked to each profile.

      Most of you will already have User logins; in that case, just double-click the specific 'User Profile' while logged in under the ADMIN or full access account to edit permissions for the User.

    4. Expand the 'Users' accordion and click 'New User' to open up the properties window
      Creating Security Profiles in ODI 12c
    5. Create a login for Jane Maine, our Level I ETL Developer. Assign her the profiles that will allow strictly regulated access in the Designer Navigator to projects and models but allow her view access for everything. (Review the profile descriptions)

      We see that in order to limit her development access in DEV to specific projects and models in the Designer Navigator, we must use the non-generic profiles NG_DESIGNER and NG_VERSION_ADMIN. We also must include CONNECT so she has the option to connect to DEV.

      Fast Review: An object is a representation of a design-time or run-time artifact handled through Oracle Data Integrator. Examples of objects include agents, projects, models, data stores, scenarios, mappings, and even repositories. An instance is a particular occurrence of an object. For example, the Datawarehouse project is an instance of the Project object. A method is an action that can be performed on an object, such as edit or delete.

      Generic profiles allow access to all methods of all instances of an object.

      Non-generic profiles are not authorized for all methods on the instances; an admin must grant rights on the methods for each instance.

    6. Input Jane's information and assign her a password (abc123) by clicking 'Enter Password'. Make sure that Jane's password will expire in 6 months, forcing her to change it for security purposes. Click 'OK'
    7. Best Practice: Always go into your account and change any temporary password. See the video on how to do that, here.

      Creating Security Profiles in ODI 12c
      Your screen should now look like this (description is optional - I always add them in):
      Creating Security Profiles in ODI 12c

    8. On the same properties window click the 'Authorization' tab on the top left side. We are granting very limited access because her projects, as an entry level developer, are limited. Click 'Projects' and allow her access to all methods within 'Projects'. Select all methods (use the checkmark in the top left of each object) from the following, clicking SAVE after the methods for each object group have been selected:
      • Column
      • Condition
      • Diagram
      • Folder
      • Interface
      • Load Plan
      • Mapping
      • Package
      • Procedure
      • Procedure Command
      • Scenario
      • Scenario Variable

      Your Security Navigator should look similar to this:
      Creating Security Profiles in ODI 12c

    9. Now we create the User and Profile for a recently hired Senior BI Manager named Will Doe. Following the same steps, create the User by expanding (or locating) the Users accordion and clicking New User. Make sure to set the password to expire in 6 months.

    10. Creating Security Profiles in ODI 12c
      Creating Security Profiles in ODI 12c

    11. Unlike the entry level employee, Will Doe needs full access as a Senior Manager, but he does not need Supervisor access. Check each generic profile (do not check any that start with NG) and click save. Your screen should look similar to the image below.
      Creating Security Profiles in ODI 12c
      Pro Tip: If you aren't sure your security settings are correct, after your new user/profile is saved, expand the 'Objects' and/or 'Instances' (orange boxes on the screenshots above) under the Users name and see what is available.
    12. Now we need to create the Consultant's general User and profile. The Consultant password does not need to expire, since we will let the account expire after a month.

    13. Create a new User under the Users accordion. Use the name: 'Consultant', Password: abc123, Notes: Temp consultant for ETL DEV work only.
    14. In this situation, the consultant will need nearly full access, but not total access to everything. Check all of the generic profiles EXCEPT the version admin profile. Select NG_VERSION_ADMIN instead to allow selective version access. Your screen should look similar to below.
      Creating Security Profiles in ODI 12c

    15. Click on the 'Authorizations' tab on the top left and scroll down in the objects list and select 'Version' and check only Compare, Restore and View. Click Save. Your image should look similar to below.

    16. Creating Security Profiles in ODI 12c

      Now we test our user settings. Disconnect from ODI and log in using each User you created. Look at the limitations for each user.

    17. This screenshot shows how Jane Maine can only access Projects and Load Plans, but not any of the models. What differences do you see for your profiles?

    18. Creating Security Profiles in ODI 12c

      There are so many options for creating secure Users and Profiles within ODI that allow the appropriate amount of access with maximum security - and in fact, it's fairly easy. You can block out high level access such as the entire Operator Navigator or Designer Navigator, all the way down to granular level security where you can block out very specific Methods and Objects associated with it.

      A word to the wise: It is strongly suggested that you only use a generic SUPERVISOR or ADMIN account that has full ODI access for creating users, profiles, changing passwords, etc. Create your own personal user/profile to perform daily work. The reason for this is to know who specifically is doing what. If the user is assigned ADMIN (or something generic) then there is no way to tell who used the login.

      Other suggested settings to try out: You can create Users and Profiles for admin purposes including 'Designer Navigator only' access, 'Topology Navigator only' access, 'Operator Navigator only' access, and variations where you can only access the Designer Navigator and Topology Navigator, but not the Operator tab.

Categories: BI & Warehousing

Combining Google Analytics and JSON data through Apache Drill in Oracle Data Visualization Desktop

Rittman Mead Consulting - Fri, 2016-11-04 03:26

I've been talking a lot about Oracle's Data Visualization Desktop (DVD) recently, explaining DVD 12.2.2.0's new features and the details of the Data Flow component via a fantasy football use case.

Yesterday a new requirement was raised within Rittman Mead: we wanted to analyse our blog stats and specifically understand the number of page views per author for blog posts published in 2016. The two sources of our data were:

  • Google Analytics, for the page view statistics
  • the Ghost blogging platform's JSON export, for post and author details

My colleague Robin Moffatt already explained in his post how to connect DVD to Google Analytics and how to query JSON files with Apache Drill. Both sources are compatible with DVD, and in this blog post I'll explain my journey in the analysis with the tool to combine both sources.

Ghost JSON data preparation

Following Robin's blog I created two Apache Drill views on top of Ghost Data:

  • vw_fact_posts: containing the post information
create or replace view vw_fact_posts as  
 select
 po.post.id id,
 po.post.uuid uuid,
 po.post.title title,
 po.post.slug slug,
 po.post.markdown markdown,
 po.post.published_by published_by,
 cast(po.post.published_at as date) published_at,
 po.post.updated_at updated_at,
 po.post.created_by created_by, 
 cast(po.post.created_at as date) created_at,
 po.post.author_id author_id,
 po.post.meta_description meta_description,
 po.post.visibility visibility,
 po.post.`language` lan,
 po.post.status status from 
 (select flatten(ghost.db.data.posts) post from dfs.tmp.ghost) po;
  • vw_dim_author: containing author data.
create or replace view vw_dim_author as
 select author.u.id id,
 author.u.name name,
 author.u.slug slug,
 author.u.password pwd,
 author.u.email email,
 author.u.image image,
 author.u.status status,
 author.u.`language` lan,
 author.u.visibility visibility,
 author.u.last_login last_login,
 author.u.created_at created_at,
 author.u.updated_at updated_at,
 author.u.updated_by updated_by 
from (select flatten(ghost.db.data.`users`) u from dfs.tmp.ghost) author;  

The views are not strictly required for the purpose of the analysis, since Drill SQL can be injected directly into DVD; however, creating them has two advantages:

  • the interface between DVD and Drill is cleaner: no complex SQL has to be entered and debugged in DVD
  • the views can be reused for other projects outside DVD if needed
DVD Data Source Settings

Robin's post provided all the details needed to connect to Google Analytics, so no need to add anything there. The Apache Drill data source setup is pretty easy - we just need to specify the hostname and port where Drill is running, along with the connection username and password.

Drill setup

Once the Drill connection is working I can start importing the views. I do it by selecting the myDrill connection, choosing the dfs.tmp database, selecting the view I want to import and clicking on Add All or selecting the columns.

Drill import

When clicking on OK an "Unsupported SQL statements" error may be raised. This is due to the use of double quotation marks (") instead of the backticks (`) that Drill requires for identifiers. I fixed the error by clicking on Enter SQL and changing the Drill SQL as in the image below.
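
As a minimal before-and-after sketch of the kind of change needed (the actual generated statement will be longer; the column list here is just taken from the view definition for illustration):

-- rejected by Drill: identifiers wrapped in double quotes
SELECT "title", "author_id" FROM dfs.tmp.vw_fact_posts;
-- accepted: Drill expects backticks around identifiers
SELECT `title`, `author_id` FROM dfs.tmp.vw_fact_posts;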

Drill Error

Having imported the two sources I can review the Measure/Attribute definitions as well as the aggregation methods. This is a crucial point since the associations are made automatically and could be wrong. In my case the author_id column was automatically declared as a Measure, which prevented me from using it in joins. I can fix the definitions by right clicking on the source, selecting Inspect and amending the Measure/Attribute definition.

Drill Measure

Wrangling Google Analytics Data

DVD's Google Analytics connector exposes a set of pre-aggregated views of the tracking data. The Page Tracking view contains a summary of page views, entrances, exit rates and other KPIs at page level - exactly the information I was looking for.

Page Tracking

I then started analysing Page Tracking data within DVD, and found some discrepancies within the data.

  • we recently moved our blog from Wordpress to Ghost: all pages accessed since the rittmanmead.com website moved to Ghost have the /blog/ prefix, which was not the case when we were still on Wordpress.
  • the page name could contain a query string appendix like /?.....
  • the Ghost slug contains the pure page name, without a folder prefix like /blog/2014/ and without leading and trailing /

The following is an example of data retrieved from Google Analytics, all for one actual blog post.

GA Row Data

The wrangling of the Google Analytics data was necessary, and DVD's Data Flow component suited my needs perfectly. I started by importing the Page Tracking data source, then added two columns:

  • RemovingBlogFromPageName: works around the Ghost/Wordpress problem described above with a case-when, and removes the initial / using a substring

  • PageNameClean: removes the extra appendix like /?..... with a substring function (a sketch of both formulas follows after the screenshot)

Code
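
Expressed as plain SQL, the two formulas look roughly like the sketch below. The page_name and page_views column names, the /blog/ pattern handling and the exact SUBSTR/INSTR syntax are assumptions based on the description above, not DVD's generated code; the nested select mirrors the two consecutive Add Columns steps.

-- Illustrative only: names and functions are assumed, not taken from DVD
SELECT t.*,
       CASE WHEN INSTR(RemovingBlogFromPageName, '/?') > 0
            THEN SUBSTR(RemovingBlogFromPageName, 1, INSTR(RemovingBlogFromPageName, '/?') - 1)
            ELSE RemovingBlogFromPageName
       END AS PageNameClean                                              -- strip any /?... appendix
FROM (
  SELECT page_name,
         page_views,
         CASE WHEN page_name LIKE '/blog/%' THEN SUBSTR(page_name, 7)    -- Ghost-era pages: drop '/blog/'
              ELSE SUBSTR(page_name, 2)                                  -- Wordpress-era pages: drop the leading '/'
         END AS RemovingBlogFromPageName
  FROM page_tracking
) t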

I then added the Aggregate step to define the Measures/Attributes and aggregation types. Finally I saved the result locally.

Global GA Flow

The two columns were added in two consecutive Add Columns steps since the RemovingBlogFromPageName column is a dependency of the PageNameClean formula. In a single Add Columns step several columns can be added, as long as they reference columns already existing in the dataset.

Creating the Project

In my previous post I linked several data sources with Data Flow. That is always possible, but in this case I tried a different approach: the link between Google Analytics and Ghost data (through Drill) was defined directly in the DVD project itself.

The first step was to include the Drill data sources: I added vw_fact_posts initially and then vw_dim_author by right clicking and selecting Add Data Source.

Drill data sources

Both sources are now visible in my project and the join between them is auto-generated based on column name matching. In my case this was wrong, so I reviewed and changed it by right clicking, selecting Source Diagram and correcting the joining conditions.

Drill Joining Conditions

Note that I set the Parameter "This connection will" to "Extend a dimension" since the author data is adding attributes to my post data.

The next step was adding the refined Google Analytics dataset that I stored locally with Data Flow. I can do that by right clicking, selecting Add Data Source and then GAAnalyticsDataCleansed. The new source was not immediately visible in my project; this was due to the lack of matching column names needed for the automatic join to work. I added the joining condition in the Source Diagram section.

GA join

Note that in this case the "This connection will" parameter is set to "Add Facts" since Google Analytics data contains the number of page views and other metrics.
After amending the joining condition I'm finally able to include any column from my data sources in my project. Here we’ve met the requirement to see post metrics by author, all in DVD and through a very intuitive and accessible interface.

Categories: BI & Warehousing

Whatever next? The battle to keep up with changes in technology

Rittman Mead Consulting - Wed, 2016-11-02 10:00

I’ve been managing Rittman Mead’s Training services in the UK & Europe for over a year now, working closely with my colleagues in the US to ensure we continue to coherently deliver expert instructor led educational courses to the very highest standards.

One thing I’ve noticed is that while a great deal of value is gained by attending instructor led training, particularly if you’re upgrading or installing a new product to your data platform, you can find that two or three months down the line a new version is released which comes with new features and functions that weren’t previously taught.

Take OBIEE 12c for example, it first hit the shelves back in October 2015 and we are now on the 3rd release. In the last year, we’ve seen additions such as Data Visualization Desktop (see my friend Francesco’s recent blog post) and the ability to incorporate even more data sources to your OBIEE system (see my friend Matt’s new features guide!)

At Rittman Mead, we’ve been working on a service which we believe will complement our customers' initial investment in instructor led training sessions. While we always wish to retain an open line of communication with everyone we have trained in the past, we realise that offering contemporary content that’s more attuned to end-users' everyday roles is beneficial and necessary.

Our on-demand training service will offer top-up courses as well as our current catalog of training. It's designed to fit around users’ busy schedules with self paced learning to help people get the best out of their investment in new technology.

For more information including sneak peeks & early bird offers for our on-demand training services, please register your details here.

Categories: BI & Warehousing

Using SQL to Query JSON Files with Apache Drill

Rittman Mead Consulting - Tue, 2016-11-01 09:49

I wrote recently about what Apache Drill is, and how to use it with OBIEE. In this post I wanted to demonstrate its great power in action for a requirement that came up recently. We wanted to analyse our blog traffic, broken down by blog author. Whilst we have Google Analytics to provide the traffic, it doesn't include the blog author. This is held within the blog platform, which is Ghost. The common field between the two datasets is the post "slug". From Ghost we could get a dump of the data in JSON format. We needed to find a quick way to analyse and extract from this JSON a list of post slugs and associated author.

One option would be to load the JSON into a RDBMS and process it from within there, running SQL queries to extract the data required. For a long-term large-scale solution, maybe this would be appropriate. But all we wanted to do here was query a single file, initially just as a one-off. Enter Apache Drill. Drill can run on a single laptop (or massively clustered, if you need it). It provides a SQL engine on top of various data sources, including text data on local or distributed file systems (such as HDFS).

You can use Drill to dive straight into the json:

0: jdbc:drill:zk=local> use dfs;
+-------+----------------------------------+
|  ok   |             summary              |
+-------+----------------------------------+
| true  | Default schema changed to [dfs]  |
+-------+----------------------------------+
1 row selected (0.076 seconds)
0: jdbc:drill:zk=local> select * from `/Users/rmoff/Downloads/rittman-mead.ghost.2016-11-01.json` limit 1;
+----+
| db |
+----+
| [{"meta":{"exported_on":1478002781679,"version":"009"},"data":{"permissions":[{"id":1,"uuid":"3b24011e-4ad5-42ed-8087-28688af7d362","name":"Export database","object_type":"db","action_type":"exportContent","created_at":"2016-05-23T11:24:47.000Z","created_by":1,"updated_at":"2016-05-23T11:24:47.000Z","updated_by":1},{"id":2,"uuid":"55b92b4a-9db5-4c7f-8fba-8065c1b4b7d8","name":"Import database","object_type":"db","action_type":"importContent","created_at":"2016-05-23T11:24:47.000Z","created_by":1,"updated_at":"2016-05-23T11:24:47.000Z","updated_by":1},{"id":3,"uuid":"df98f338-5d8c-4683-8ac7-fa94dd43d2f1","name":"Delete all content","object_type":"db","action_type":"deleteAllContent","created_at":"2016-05-23T11:24:47.000Z","created_by":1,"updated_at":"2016-05-23T11:24:47.000Z","updated_by":1},{"id":4,"uuid":"a3b8c5c7-7d78-442f-860b-1cea139e1dfc","name":"Send mail","object_type":"mail","action_

But from this we can see the JSON object is a single column db of array type. Let's take a brief detour into one of my favourite commandline tools - jq. This lets you format, filter, and extract values from JSON. Here we can use it to get an idea of how the data's structured. We can do this in Drill, but jq gives us a headstart:

We can see that under the db array are two elements; meta and data. Let's take meta as a simple example to expose through Drill, and then build from there into the user data that we're actually after.

Since the root data element (db) is an array, we need to FLATTEN it:

0: jdbc:drill:zk=local> select flatten(db) from `/Users/rmoff/Downloads/rittman-mead.ghost.2016-11-01.json` limit 1;
+--------+
| EXPR$0 |
+--------+
| {"meta":{"exported_on":1478002781679,"version":"009"},"data":{"permissions":[{"id":1,"uuid":"3b24011e-4ad5-42ed-8087-28688af7d362","name":"Export database","object_type":"db","action_type":"exportContent","created_at":"2016-05-23T11:24:47.000Z","created_by":1,"updated_at":"2016-05-23T11:24:47.000Z","updated_by":1},{"id":2,"uuid":"55b92b4a-9db5-4c7f-8fba-8065c1b4b7d8","name":"Import database","object_type":"db","action_type":"importContent","created_at":"2016-05-23T11:24:47.000Z","created_by":1,"updated_at":"2016-05-23T11:24:47.000Z","u

Now let's query the meta element itself:

0: jdbc:drill:zk=local> with db as (select flatten(db) from `/Users/rmoff/Downloads/rittman-mead.ghost.2016-11-01.json`) select db.meta from db limit 1;
Nov 01, 2016 2:18:31 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Column 'meta' not found in table 'db'
Nov 01, 2016 2:18:31 PM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 108 to line 1, column 111: Column 'meta' not found in table 'db'
Error: VALIDATION ERROR: From line 1, column 108 to line 1, column 111: Column 'meta' not found in table 'db'

SQL Query null

[Error Id: 9cb4aa98-d522-42bb-bd69-43bc3101b40e on 192.168.10.72:31010] (state=,code=0)

This didn't work, because if you look closely at the above FLATTEN, the resulting column is called EXPR$0, so we need to alias it in order to be able to reference it:

0: jdbc:drill:zk=local> select flatten(db) as db from `/Users/rmoff/Downloads/rittman-mead.ghost.2016-11-01.json`;
+----+
| db |
+----+
| {"meta":{"exported_on":1478002781679,"version":"009"},"data":{"permissions":[{"id":1,"uuid":"3b24011e-4ad5-42ed-8087-28688af7d362","name":"Export database","object_type":"db","action_type":"exportConten

Having done this, I'll put the FLATTEN query as a subquery using the WITH syntax, and from that SELECT just the meta elements:

0: jdbc:drill:zk=local> with ghost as (select flatten(db) as db from `/Users/rmoff/Downloads/rittman-mead.ghost.2016-11-01.json`) select ghost.db.meta from ghost limit 1;
+------------------------------------------------+
|                     EXPR$0                     |
+------------------------------------------------+
| {"exported_on":1478002781679,"version":"009"}  |
+------------------------------------------------+
1 row selected (0.317 seconds)

Note that the column is EXPR$0 because we've not defined a name for it. Let's fix that:

0: jdbc:drill:zk=local> with ghost as (select flatten(db) as db from `/Users/rmoff/Downloads/rittman-mead.ghost.2016-11-01.json`) select ghost.db.meta as meta from ghost limit 1;
+------------------------------------------------+
|                      meta                      |
+------------------------------------------------+
| {"exported_on":1478002781679,"version":"009"}  |
+------------------------------------------------+
1 row selected (0.323 seconds)
0: jdbc:drill:zk=local>

Why does that matter? Because it means that we can continue to select elements from within it.

We could continue to nest the queries, but it gets messy to read, and complex to debug any issues. Let's take this meta element as a base one from which we want to query, and define it as a VIEW:

0: jdbc:drill:zk=local> create or replace view dfs.tmp.ghost_meta as with ghost as (select flatten(db) as db from `/Users/rmoff/Downloads/rittman-mead.ghost.2016-11-01.json`) select ghost.db.meta as meta from ghost;
+-------+-------------------------------------------------------------+
|  ok   |                           summary                           |
+-------+-------------------------------------------------------------+
| true  | View 'ghost_meta' created successfully in 'dfs.tmp' schema  |
+-------+-------------------------------------------------------------+
1 row selected (0.123 seconds)

Now we can select from the view:

0: jdbc:drill:zk=local> select m.meta.exported_on as exported_on, m.meta.version as version from dfs.tmp.ghost_meta m;
+----------------+----------+
|  exported_on   | version  |
+----------------+----------+
| 1478002781679  | 009      |
+----------------+----------+
1 row selected (0.337 seconds)

Remember that when you're selecting nested elements you must alias the object that you're selecting from. If you don't, then Drill assumes that the first element in the column name (for example, meta.exported_on) is the table name (meta), and you'll get an error:

Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 11: Table 'meta' not found

So having understood how to isolate and query the meta element in the JSON, let's progress onto what we're actually after - the name of the author of each post, and associated 'slug'.

Using jq again we can see the structure of the JSON file, with the code taken from here:

> jq 'path(..)|[.[]|tostring]|join("/")' rittman-mead.ghost.2016-11-01.json |grep --color=never post|more
"db/0/data/posts"
"db/0/data/posts/0"
"db/0/data/posts/0/id"
"db/0/data/posts/0/uuid"
"db/0/data/posts/0/title"
[...]

So the posts data is under the data.posts element, and from manually poking around we can see that the user data is under the data.users element.

Back to Drill, we'll create views based on the same pattern as we used for meta above; flattening the array and naming the column:

use dfs.tmp;
-- base view exposing the flattened db array, which the posts and users views below build on
create or replace view ghost as select flatten(db) as db from `/Users/rmoff/Downloads/rittman-mead.ghost.2016-11-01.json`;
create or replace view ghost_posts as select flatten(ghost.db.data.posts) post from ghost;
create or replace view ghost_users as select flatten(ghost.db.data.`users`) `user` from ghost;

The ghost view is the one created above, in the dfs.tmp schema. With these two views created, we can select values from each:

0: jdbc:drill:zk=local> select u.`user`.id,u.`user`.name from ghost_users u where u.`user`.name = 'Robin Moffatt';
+---------+----------------+
| EXPR$0  |     EXPR$1     |
+---------+----------------+
| 15      | Robin Moffatt  |
+---------+----------------+
1 row selected (0.37 seconds)

0: jdbc:drill:zk=local> select p.post.title,p.post.slug,p.post.author_id from ghost_posts p where p.post.title like '%Drill';
+----------------------------------+----------------------------------+---------+
|              EXPR$0              |              EXPR$1              | EXPR$2  |
+----------------------------------+----------------------------------+---------+
| An Introduction to Apache Drill  | an-introduction-to-apache-drill  | 15      |
+----------------------------------+----------------------------------+---------+
1 row selected (0.385 seconds)

and join them:

0: jdbc:drill:zk=local> select p.post.slug as post_slug,u.`user`.name as author from ghost_posts p inner join ghost_users u on p.post.author_id = u.`user`.id where u.`user`.name like 'Robin%' and p.post.status='published' order by p.post.created_at desc limit 5;
+------------------------------------------------------------------------------------+----------------+
|                                     post_slug                                      |     author     |
+------------------------------------------------------------------------------------+----------------+
| connecting-oracle-data-visualization-desktop-to-google-analytics-and-google-drive  | Robin Moffatt  |
| obiee-and-odi-security-updates-october-2016                                        | Robin Moffatt  |
| otn-appreciation-day-obiees-bi-server                                              | Robin Moffatt  |
| poug                                                                               | Robin Moffatt  |
| all-you-ever-wanted-to-know-about-obiee-performance-but-were-too-afraid-to-ask     | Robin Moffatt  |
+------------------------------------------------------------------------------------+----------------+
5 rows selected (1.06 seconds)

This is pretty cool. From a 32MB single-row JSON file:

to being able to query it with standard SQL like this:

all with a single tool that can run on a laptop or desktop, and supports ODBC and JDBC for use with your favourite BI tools. For data exploration and understanding new datasets, Apache Drill really does rock!

Categories: BI & Warehousing

Connecting Oracle Data Visualization Desktop to Google Analytics and Google Drive

Rittman Mead Consulting - Tue, 2016-11-01 05:42

To use Data Visualisation Desktop (DVD) with data from Google Analytics or Google Drive, you need to set up the necessary credentials on Google so that DVD can connect to it. You can see a YouTube video of this process here.

Before starting, you need a piece of information from Oracle DVD that will be provided to Google during the setup. From DVD, create a new connection of type Google Analytics, and make a note of the provided redirect URL:

Once you have this URL, you can go and set up the necessary configuration in Google. To do this, go to https://console.developers.google.com/ and sign in with the same Google credentials that have access to Google Analytics.

Then go to https://console.developers.google.com/iam-admin/projects and click on Create Project

Having created the project, we now need to make available the necessary APIs to it, after which we will create the credentials. Go to https://console.developers.google.com/apis/ and click on Analytics API

On the next screen, click Enable, which adds this API to the project.

If you want, at this point you can return to the API list and also add the Google Drive API by selecting and then Enabling it.

Now we will create the credentials required. Click on Credentials, and then on OAuth consent screen. Fill out the Product name field.

Click on Save, and then on the next page click on Create credentials and from the dropdown list OAuth client ID

Set the Application type to Web Application, give it a name, and then copy the URL given in the DVD New Connection window into the Authorised redirect URIs field.

Click Create, and then make a note of the provided client ID and client secret. Watch out for any spaces before or after the values (h/t @Nephentur). Keep these credentials safe as you would any password.

Go back to DVD and paste these credentials into the Create Connection screen, and click Authorise. When prompted, sign in to your Google Account.

Click on Save, and your connection is now created successfully!

With a connection to Google Analytics created, you can now analyse the data available from within it. You'll need to set the measure columns appropriately, as by default they're all taken by DVD to be dimensions.

Categories: BI & Warehousing

Performing a 12c Upgrade with a New Install

Rittman Mead Consulting - Tue, 2016-10-25 04:00

Software updates often include new features, and while useful, these new features are often the only driving factor in upgrading software. There's no harm in wanting to play around with the shiny new toy, but many software updates also include much more significant changes, such as resolving bugs or security vulnerabilities.

In fact, bug fixes and security patches are usually released on a more frequent schedule than new feature sets. These changes are necessary to maintain a healthy environment. For this reason, Rittman Mead usually suggests environments are always as up to date as possible with the current releases available.

OBIEE 12.2.1.1 was released this past summer, and it seems to have resolved many issues that plagued early 12C adopters. Recently, OBIEE 12.2.1.2 was also released, resolving even more issues with the early 12C versions. With all of the improvements and fixes available in these versions, an upgrade plan should be a priority to anyone currently on one of the earlier releases of 12c (especially 12.2.1.0).

Okay, so how do I upgrade?

Spencer McGhin has already posted a fantastic blog going over how to perform an in-place upgrade for the 12.2.1.1 release. Even though it was for the previous release, the process is very similar. For those interested in reading a step by step guide, or looking to see what would go into the process, I would suggest reading his post here.

However, with OBIEE 12C's new BAR files, we could take another approach to performing an upgrade. Instead of the traditional "in-place" upgrades, we could perform an upgrade using a different process. We could simply perform a brand new install of this OBIEE version and migrate the existing content using a variety of tools Oracle provides us. Robin Moffatt covered this kind of installation for OBIEE 11.1.1.7 here, before the new BAR files existed, and now the BAR files will make this process much more straightforward.

If you choose to "upgrade" your environment by performing a fresh install, the upgrade process will consist of exporting the required files from OBIEE, removing the old version of OBIEE (if you are using the same machine), installing the new version of OBIEE, and then deploying the previously exported content. This process resembles that of a migration, and can be thought of that way, but migrating between 12C environments is much simpler than migrating to a 12C environment from an older release.

So an upgrade process could instead look like a brand new installation of the new OBIEE version, and then the execution of a handful of commands provided by Oracle to return the environment to its previous state.

But what would we gain from following this process, rather than a traditional in-place upgrade?

It's worth noting that either approach requires careful planning and testing. Performing a brand new install does not remove the necessity of planning an upgrade process, gathering requirements, identifying all content that must be migrated, testing the installation, testing the migration, and user acceptance and validation testing. The proper process should never be ignored, regardless of the implementation method.

Is there any advantage to a fresh install?

For starters, you won't need to pollute your system with old or deprecated scripts/directories. In Spencer's aforementioned blog, he found that after his upgrade process he had to maintain a second middleware home directory. If you upgrade your environment throughout the years, you may end up with hundreds of unused/deprecated scripts and files. Who enjoys the thought that their environment is full of old and useless junk? A fresh install would cull most of these superfluous and defunct files on a regular basis.

Additionally, there is the occasional bug that seems to reappear in upgraded environments. These bugs usually appear to be from environments that were patched, and then upgraded to a new version, which causes the previously fixed bug to reappear. While these bugs are fixed in future patches, fresh installs are usually free from these kind of issues.

Finally, I would argue a fresh installation can occasionally be simpler than performing the upgrade process. By saving response files used in an installation, the same installation can be performed again extremely easily. You could perform an install in as little as three lines, if not fewer:
/home/oracle/files/bi_platform-12.2.1.2.0_linux64.bin -silent -responseFile /home/oracle/files/obiee.rsp

/home/oracle/Oracle/Middleware/Oracle_Home/oracle_common/bin/rcu -silent -createRepository -databaseType ORACLE -connectString localhost:1521/ORCL -dbUser sys -dbRole sysdba -schemaPrefix DEV -component BIPLATFORM -component MDS -component WLS -component STB -component OPSS -component IAU -component IAU_APPEND -component IAU_VIEWER -f < /home/oracle/files/db_passwords.txt

/home/oracle/Oracle/Middleware/Oracle_Home/bi/bin/config.sh -silent -responseFile /home/oracle/files/configure_obiee.rsp

If this is the case, you can just save the response files set up during the first installation, and reuse them to install each new OBIEE version. Of course the required response file structure could change between versions, but I doubt any changes would be significant.

How do I migrate everything over?

So you've chosen to do a fresh install, you've saved the response files for future use, and you have a brand new OBIEE 12.2.1.2 environment up and running. Now, how do we get this environment back to a state where it can be used?

Before performing the upgrade or uninstall, we need to gather a few things from the current environment. The big things we need to make sure we get are the catalog, the RPD, and the security model. We may need additional content (like a custom style/skin, deployments on the Weblogic Server, configurations, etc.) but I will ignore those for brevity. To move some of these, I expect you would be required to use WLST.

Catalog, RPD, and Security Model

Lucky for us, the Catalog, RPD, and Security Model are all included in the BAR export we can create using the exportServiceInstance() function in the WLST. You can then import these to a 12C environment using the importServiceInstance() function. Easy enough, right?

Users

If your users are maintained in the embedded Weblogic LDAP, you must export them and then re-import them. This process can be done manually or through the WLST using the Current Management Object.

If users are maintained through an external Active Directory source, then the configurations will be pulled in with the Security Model in the BAR file.

Testing the migration

The final step is, of course, to make sure everything works! And what better way than to use Oracle's new Baseline Validation Tool? This tool is included in OBIEE 12C, and is perfect for testing migrations between environments.

For those unfamiliar, the basic process is this:

  • Configure and run the Baseline Validation Tool against your content.
  • Perform the upgrade (be sure to preserve the previously gathered test results)!
  • Run the Baseline Validation Tool again to gather the new output, and display the compared results.

The output should be an HTML file that, when opened in a browser, will let you know what has changed since the last time it was run. If everything was migrated properly, then there should be no major discrepancies.

Final Thoughts

Is it better to do an in-place upgrade, or a fresh install and migrate current content? The answer, as always, depends on the business. One method adds complexity but allows for more customization possibilities, while the other is likely faster and a more standard approach. Use whichever works for your specific requirements.

It's an interesting idea to install a new version of OBIEE every so often, rather than perform an upgrade, but maybe for some organizations it will simplify the process and alleviate common upgrade issues. If you or your organization are often stuck on older versions of OBIEE because you are uncomfortable or unfamiliar with the typical upgrade process, maybe you can provision an additional environment and attempt this alternative method.

As previously stated, it is imperative for environments to be as up to date as possible, and this method is simply another, albeit unconventional, avenue to make that happen.

Categories: BI & Warehousing

Oracle OpenWorld 2016 - Data Integration Recap

Rittman Mead Consulting - Fri, 2016-10-21 16:33

I know it's been about a month since Oracle OpenWorld 2016 concluded, but I wanted to give a brief recap on a few things that I thought were interesting in the data integration space. During the week prior to OpenWorld, I had the privilege to attend the Oracle ACE Director Briefing. Over 2 days, ACE Directors were provided an early preview of what's to come down the Oracle product pipeline. The importance of the event is easy to note as Thomas Kurian himself spends an hour plus providing the initial product overview and answering questions. The caveat: the entire session is under NDA (as you might imagine). But the good thing is that several of the early preview products were announced the following week at Oracle OpenWorld. Here's what I saw that might impact the Oracle Data Integration product area most.

Data Flow ML

Take an ETL tool, add the cloud, and mix in the latest Big Data technologies and methodologies and you have Data Flow ML. This cloud-based tool is built for stream or batch processing, or both, following the Lambda architecture. Data is ingested into Kafka topics, transformed using Spark Streaming, and loaded into a final target, which may be created automatically by DFML. Along the way, Spark ML is used to profile the data and make suggestions for how to enrich the data with internal or external sources. The technology is still in its early stages but keep an eye out on the Rittman Mead blog for more information over the next few months.

Data Integration in the Cloud

Oracle Data Integrator Cloud Service is coming soon and with it, new licensing options. ODI can be deployed in the cloud on Java Cloud Service or Big Data Cloud Service, or it can be deployed on-premises for more of a hybrid environment. From a licensing perspective, ODICS can be a monthly subscription or you can BYOL (bring your own license) and run ODI from any compute resource. This flexibility allows you to pushdown the transformation execution to the location of the data, rather than moving the data to the transformation engine - a standard for Oracle Data Integrator.

Oracle Data Integrator 12.2.1.2

Coming soon, the next patchset release for Oracle Data Integrator 12c. Features discussed at Oracle OpenWorld were:

  • Git Integration and Smart Merge:
    This release will introduce a new integration for lifecycle management, Git, adding to the current integration option of Subversion. Not only that, but ODI will finally provide "smart" merge functionality to allow an automated merge of a branch into the trunk.
  • Kafka and Spark Streaming:
    With ODI for Big Data, streaming integration is coming. Use of Apache Kafka as a source or target and Spark Streaming integration for transformations will allow for more real-time processing of data. The introduction of Cassandra as a data source is another enhancement for the Big Data release.
  • RESTful Connectivity:
    Another long awaited feature is REST web service integration. A new technology, similar to the SOAP web service integration, will be available and able to connect to any RESTful service. Along with that, BICS and Storage Cloud Service integration will be introduced.

There are definitely many other interesting looking products and product updates coming (or already here), including GoldenGate Service Architecture, updates to the GoldenGate Cloud Service, Big Data Cloud Service, Big Data Compute and several others. It’s an interesting time as the Oracle shift to the cloud continues - and data integration won’t be left behind.

Categories: BI & Warehousing

Data Visualization Desktop 12.2.2.0: Data Flow Component

Rittman Mead Consulting - Thu, 2016-10-20 07:54

My previous post contained a brief description of Data Visualization Desktop (DVD) 12.2.2.0's new features, in terms of sources, visualisations and components. In this post we're going to simulate a typical analyst use case and understand how DVD can support the process.

Data Visualisation Desktop is a tool aimed at departmental analysis, with data coming from different sources and results that need to be delivered quickly. Given the ad-hoc nature of it, traditional long term IT-driven Business Intelligence processes often won’t suffice. In this example we'll have a deep look at DVD's Data Flow component and how it can be used to create an ETL flow in order to analyse data coming from a multitude of sources. Data Flow is new functionality introduced in DVD 12.2.2.0.

Preamble: being Italian I can't avoid talking about football; the example provided in this post will analyse some Serie A data together with some Fantasy Football information in order to understand which players I should choose for my team.

Data Sources

In order to analyse Serie A players I based my research on the following data points:

  • Players cost: Excel file containing Team, Role and Fantasy Football Cost for each Serie A player. This file can change match by match since the cost of a single player can vary, reflecting his performances.
  • Players statistics: CSV files containing player statistics like goals scored, yellow and red cards, assists and the fantasy football mark for every match of the current and past season.

For the purpose of the example I'm assuming the Players cost file is an XLSX received manually by the analyst (think of budget data) and the Players statistics data is stored in a Hive table.

Creating Data Sources in DVD

Data Visualization Desktop has a native connector to Hive, so we just need to click on "Data Sources", then Create -> Connection and select "Apache Hive". The setup is pretty simple: we need to specify the host, port, username and password of the Hive Server.

Hive Connection

The next step is creating a new Data Source and selecting the newly created "TestHive" connection as the source. The list of Hive databases is then visible and, after selecting FantasyFootball, the list of its tables.

Hive Data Source

After clicking on the ff_statistics table we can select and import the columns. There is also an option to check or directly enter the SQL if needed. After clicking OK (and checking that no errors arise) we are ready to use the Hive table.

Hive Columns

The "Players Cost" Excel file, received manually by the analyst, can be uploaded directly using the Data Source -> Create -> Data Source -> File option.

Upload a File

DVD automatically detects the column types and provides a preview of the content

Excel file content

Once the data source is saved we are ready to start manipulating the data.

Data Flow

Our initial goal is to exclude any data quality issues from the statistics table. These could be down to invalid CSV rows, as well as players not existing in the "Players Cost" file (because they were sold to teams outside Serie A or ended their career). To do so we can use the Data Flow option included in DVD and accessible from the Data Source page.

Path to data flow

The first step is to select ff_statistics from the list of sources, right click and select "Add Step". From the list of options presented we can select Filter and remove all the invalid data by simply including only rows where the "Code" is not empty (null).

Data Flow Step 1

The Data Flow chart now includes the Filter component. The following step is to bring the "Players cost" file into the flow by selecting the Add Data option. Then it's time to join the two sources; we can do that by selecting both of them and choosing the Join option.

Join two Dataflows

We can specify the columns which will be used in the joining condition and the join type (inner or outer) by selecting the desired option in the Keep Rows section (between Matching rows or All rows). For the purpose of our analysis we'll keep only the matching rows of the two datasets (inner join) since we are interested in all players listed in Players Cost and having a valid set of statistics in Players Statistics.

Now we can enrich the data set further, by adding derived metrics and attributes:

  • Count of Matches: the number of valid matches (those having a not null grade) played so far by each player. This will be used later to filter out all players with fewer than 10 valid games, since those are less likely to play most of the games.
  • Role Translation: roles are specified in Italian; a simple CASE WHEN can translate them into English.

The enrichment can be achieved by creating an additional Add Columns step and filling in the formulas properly.

New Columns Formula

After filtering out all players with fewer than 10 valid marks, an Aggregate step can be added to set the aggregation level and methods. The Aggregate step should be included in every Data Flow since it's the only place where Attribute/Measure and aggregation definitions can be made. A Data Flow without the Aggregate step will provide a default column definition that may result in an unusable output data source.
Finally we can store the end resultset locally in order to proceed with the analysis.
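
For readers who think in SQL, the whole flow is roughly equivalent to the statement below. Only the ff_statistics table and the Code key are taken from the example; the players_cost table name, the role codes and the remaining column names are assumptions, and in DVD the flow is of course built graphically rather than written by hand.

-- Illustrative SQL equivalent of the Data Flow; most names are assumed
SELECT c.player_name,
       c.team,
       c.cost,
       CASE c.role WHEN 'P' THEN 'Goalkeeper'       -- Role Translation (codes assumed)
                   WHEN 'D' THEN 'Defender'
                   WHEN 'C' THEN 'Midfielder'
                   WHEN 'A' THEN 'Forward'
       END                    AS role_english,
       COUNT(s.grade)         AS count_of_matches,  -- Count of Matches: not null grades only
       AVG(s.grade)           AS avg_grade,
       SUM(s.goals_scored)    AS goals_scored
FROM ff_statistics s
JOIN players_cost c ON s.code = c.code              -- inner join: keep matching rows only
WHERE s.code IS NOT NULL                            -- the Filter step on invalid rows
GROUP BY c.player_name, c.team, c.cost, c.role      -- the Aggregate step
HAVING COUNT(s.grade) >= 10                         -- keep players with at least 10 valid marks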

Global Flow

We can now execute the data flow and FantasyFootball is automatically added to the list of DVD's Data Sources. The Data Flow can also be stored in DVD in order to be re-executed when necessary.
Keep in mind that Data Flow works locally on the workstation where DVD is installed, so data extraction and manipulation will generate a load on the system based on the data volume and complexity of the steps.

Project

Before creating a project we can review the resulting FantasyFootball dataset settings and change the Attribute/Measure definition of the columns as well as the type of aggregation.

Change Columns Attributes

As written before it's better to define Attributes/Measures with an Aggregate step in the Data Flow since any setting changed directly in the dataset will be overwritten when the Data Flow is re-executed.

With the data preparation work completed, now it is time to start creating a project using the FantasyFootball dataset. As written in my previous post, a number of new visualisations are available with DVD 12.2.2.0; some are used in the example below, like the Chord, Parallel Coordinates and Sankey diagrams.

Global Flow

Unfortunately I'll not share the details of my findings since those could be used against me in the competition but Hey....that Higuain looks like a good player!

In this post we saw a typical analyst use case, with data coming from multiple sources needing to be joined together and cleansed. All operations previously done manually via Excel can now be automated, saved and re-executed with DVD's Data Flow.

Categories: BI & Warehousing

Data Visualization Desktop 12.2.2.0

Rittman Mead Consulting - Wed, 2016-10-19 06:17

Yesterday Data Visualization Desktop (DVD) version 12.2.2.0 was released. Since its first release, DVD has aimed to extend Oracle's Data Visualization portfolio by adding a desktop tool that brings data visualization capabilities directly to end users without the intervention of the IT department, in line with Gartner's bi-modal IT.

The new version adds several capabilities to the existing product, like new visualization types, data sources and a wrangling option. This post will share the details of the new release's additional features.

Installation

After downloading DVD, the installation is pretty simple: just double click on the Oracle_Data_Visualization_Desktop_V2_12_2_2_0_0.msi file, choose the installation folder and click on "Install".

Installation end

Once the installation is finished, be aware of the message in the last screen: it says that the correct version of R and a set of packages need to be installed in order to use DVD's Advanced Analytics features. These can be installed via the "Install Advanced Analytics" file placed in Start Menu -> Programs -> Oracle.

This setup allows you to choose the R installation location, installs R and then downloads the relevant packages from cran.us.r-project.org.

R Setup

New Visualisations

The first set of improvements in the new release is about the out-of-the-box visualisations; a new set of graphs is now available:

  • List: Shows a list of the dimension's values together with a gradient colouring based on the measure selected

List View

  • Parallel Coordinates: Shows multiple dimensions on the same chart, enhancing the ability to quickly get insight into possible connections between them

Parallel View

  • Timeline: an effective way of showing time-related facts; each fact is shown along a timeline with one or more distinguishing attributes. The example shows the quantity shipped by day and city.

Timeline View

  • Network Diagrams: Chord, Circular, Network and Sankey diagrams are used to show inter-relationships between elements

Network Views

Other visual enhancements include a multi-canvas layout that can be exported with a single click and a hierarchical or gradient colouring for the charts.

Data Sources

A lot of new data sources have been added to DVD, some of them still in beta phase. A number of new databases are now supported, such as Netezza, Amazon Aurora and PostgreSQL.

An interesting enhancement is the connection to Dropbox and Google Drive, allowing DVD to source files stored in the cloud. Finally, DVD's exposure to the Big Data world has been enhanced by adding connectivity to tools such as Apache Drill, Presto and Cassandra.

DVD Data Sources

Excel Editing

Excel sheets used as a data source can now be edited and the DVD project refreshed without the need to manually reload the spreadsheet.

Data Flows

There is a new component in DVD called Data Flow, allowing the end user to perform some basic transformations of the data, like joining two datasets (even if they come from different sources), filtering, aggregating, adding columns based on custom formulas and storing the result on the local file system.

DVD Data Flows Options

In the example below two files coming from Hive (but the source can also be different) are joined and a subset of columns is selected and stored locally.

DVD Data Flows Options

Data Flows can be stored in DVD and re-executed upon request. The list of Data Flows is available under Data Sources -> Data Flows. In the next blog post I'll show a typical analyst use case in which Data Flow can help automate a series of data loading, cleansing and enriching steps.

Data Insights

Data Insights provides a way of quickly understanding the available dataset: by default it shows a series of graphs, one for every attribute, with the cardinality of each attribute's values. A drop-down menu allows the same graphs to be shown based on any measure defined in the dataset.

DVD Data Insights

BI Ask

The new DVD version also contains BI Ask, providing the ability to create queries in natural language which are automatically interpreted and presented in suggested visualisations.

BI Ask

As you can read in this post, the new version of Data Visualization Desktop adds a series of really interesting features enabling not only data visualisation but also data exploration and wrangling. In the next blog post we'll see a typical DVD use case and how the new Data Flow option can be used to couple data coming from various sources.

Categories: BI & Warehousing
