Rittman Mead Consulting

Subscribe to Rittman Mead Consulting feed Rittman Mead Consulting
Rittman Mead consults, trains, and innovates within the world of Oracle Business Intelligence, data integration, and analytics.
Updated: 1 month 3 days ago

Unify: Could it be any easier?

Mon, 2017-06-19 09:00

Rittman Mead’s Unify is the easiest and most efficient method to pull your OBIEE reporting data directly into your local Tableau environment. No longer will you have to worry about database connection credentials, Excel exports, or any other roundabout way to get your data where you need it to be.

Unify leverages OBIEE’s existing metadata layer to provide quick access to your curated data through a standard Tableau Web Data Connector. After a short installation and configuration process, you can be building Tableau workbooks from your OBIEE data in minutes.

This blog post will demonstrate how intuitive and easy it is to use the Unify application. We will only cover using Unify and it’s features, as once the data gets into Tableau it can be used the same as any other Tableau Data Source. The environment shown already has Unify installed and configured, so we can jump right in and start using the tool immediately.

To start pulling data from OBIEE using Unify, we need to create a new Web Data Connector Data Source in Tableau. This data source will prompt us for a URL to access Unify. In this instance, Unify is installed as a desktop application, so the URL is http://localhost:8080/unify.

Once we put in the URL, we’re shown an authentication screen. This screen will allow us to authenticate against OBIEE using the same credentials. In this case, I will authenticate as the weblogic user.

Once authenticated, we are welcomed by a window where we can construct an OBIEE query visually. On the left hand side of the application, I can select the Subject Area I wish to query, and users are shown a list of tables and columns in the selected Subject Area. There are additional options along the top of the window, and I can see all saved queries on the right hand side of the window.

The center of the window is where we can see the current query, as well as a preview of the query results. Since I have not started building a query yet, this area is blank.

Unify allows us to either build a new query from scratch, or select an existing OBIEE report. First, let’s build our own query. The lefthand side of the screen displays the Subject Areas and Columns which I have access to in OBIEE. With a Subject Area selected, I can drag columns, or double click them, to add them to the current query. In the screenshot above, I have added three columns to my current query, “P1 Product”, “P2 Product Type”, and “1 - Revenue”.

If we wanted to, we could also create new columns by defining a Column Name and Column Formula. We even have the ability to modify existing column formulas for our query. We can do this by clicking the gear icon for a specific column, or by double-clicking the grey bar at the top of the query window.

It’s also possible to add filters to our data set. By clicking the Filter icon at the top of the window, we can view the current filters for the query. We can then add filters the same way we would add columns, by double clicking or dragging the specific column. In the example shown, I have a query on the column “D2 Department” where the column value equals “Local Plants Dept.”.

Filters can be configured using any of the familiar methods, such as checking if a value exists in a list of values, numerical comparisons, or even using repository or session variables.

Now that we have our columns selected and our filters defined, we can execute this query and see a preview of the result set. By clicking the “Table” icon in the top header of the window, we can preview the result.

Once we are comfortable with the results of the query, we can export the results to Tableau. It is important to understand that the preview data is trimmed down to 500 rows by default, so don’t worry if you think something is missing! This value, and the export row limit, can be configured, but for now we can export the results using the green “Unify” button at the top right hand corner of the window.

When this button is clicked, the Unify window will close and the query will execute. You will then be taken to a new Tableau Workbook with the results of the query as a Data Source. We can now use this query as a data source in Tableau, just as we would with any other data source.

But what if we have existing reports we want to use? Do we have to rebuild the report from scratch in the web data connector? Of course not! With Unify, you can select existing reports and pull them directly into Tableau.

Instead of adding columns from the lefthand pane, we can instead select the “Open” icon, which will let us select an existing report. We can then export this report to Tableau, just as before.

Now let’s try to do something a little more complicated. OBIEE doesn’t have the capability to execute queries across Subject Areas without common tables in the business model, however Tableau can perform joins between two data sources (so long as we select the correct join conditions). We can use Unify to pull two queries from OBIEE from different Subject Areas, and perform a data mashup with the two Subject Areas in Tableau.

Here I’ve created a query with “Product Number” and “Revenue”, both from the Subject Area “A - Sample Sales”. I’ve saved this query as “Sales”. I can then click the “New” icon in the header to create a new query.

This second query is using the “C - Sample Costs” Subject Area, and is saved as “Costs”. This query contains the columns “Product Number”, “Variable Costs”, and “Fixed Costs”.

When I click the Unify button, both of these queries will be pulled into Tableau as two separate data sources. Since both of the queries contain the “Product Number” column, I can join these data sources on the “Product Number” column. In fact, Tableau is smart enough to do this for us:

We now have two data sets, each from a different OBIEE subject area, joined and available for visualization in Tableau. Wow, that was easy!

What about refreshing the data? Good question! The exported data sources are published as data extracts, so all you need to do to refresh the data is select the data source and hit the refresh button. If you are not authenticated with OBIEE, or your session has expired, you will simply be prompted to re-authenticate.

Using Tableau to consume OBIEE data has never been easier. Rittman Mead’s Unify allows users to connect to OBIEE as a data source within a Tableau environment in an intuitive and efficient method. If only everything was this easy!

Interested in getting OBIEE data into Tableau? Contact us to see how we can help, or head over to https://unify.ritt.md to get a free Unify trial version.

Categories: BI & Warehousing

Unify - An Insight Into the Product

Thu, 2017-06-15 06:00
Unify - An Insight Into the Product

Monday, 12 Jun saw the official release of Unify, Rittman Mead's very own connector between Tableau and OBIEE. It provides a simple but powerful integration between the two applications that allows you to execute queries through OBIEE and manipulate and render the datasets using Tableau.

Unify - An Insight Into the Product

Why We Made It

One of the first questions of course would be why we would want to do this in the first place. The excellent thing about OBI is that it acts as an abstraction layer on top of a database, allowing analysts to write efficient and secure reports without going into the detail of writing queries. As with any abstraction, it is a trade of simplicity for capability. Products like Tableau and Data Visualiser seek to reverse this trade, putting the power back in the hands of the report builder. However, without quoting Spiderman, care should be taken when doing this.

The result can be that users write inefficient queries, or worse still, incorrect ones. We know there will be some out there that use self service tools as purely a visualisation engine, simply dropping pre-made datasets into it. If you are looking to produce sustainable, scalable and accessible reporting systems, you need to tackle the problem both at the data acquisition stage as well as the communication stage at the end.

If you are already meeting both requirements, perhaps by using OBI with Data Visualiser (formerly Visual Analyser) or by other means then that's perfectly good. However, We know from experience that there are many of you out there that have already invested heavily into both OBI and Tableau as separate solutions. Rather than have them linger in a state of conflict, we'd rather we nurse them into a state of symbiosis.

The idea behind Unify is that it bridges this gap, allowing you to use your OBIEE system as an efficient data acquisition platform and Tableau as an intuitive playground for users who want to do a a bit more with their data. Unify works by using the Tableau Web Data Connector as a data source and then our customised software to act as an interface for creating OBIEE queries and them exporting them into Tableau.

How It Works

Unify uses Tableau's latest Web Data Connector data source to allow us to dynamically query OBIEE and extract data into Tableau. Once a dataset is extracted into Tableau, it can be used with Tableau as normal, taking advantages of all of the powerful features of Tableau. This native integration means you can add in OBIEE data sources just as you would add in any others - Excel files, SQL results etc. Then you can join the data sources using Tableau itself, even if the data sources don't join up together in the background.

First you open up Tableau and add a Web Data Connector source:

Unify - An Insight Into the Product

Then give the link to the Unify application, e.g. http://localhost:8080/unify. This will open up Unify and prompt you to login with your OBIEE credentials. This is important as Unify operates through the OBIEE server layer in order to maintain all security permissions that you've already defined.

Unify - An Insight Into the Product

Now that the application is open, you can make OBIEE queries using the interface provided. This is a bit like Answers and allows you to query from any of your available subject areas and presentation columns. The interface also allows you to use filtering, column formulae and OBIEE variables much in the same way as Answers does.

Alternatively, you can open up an existing report that you've made in OBIEE and then edit it at your leisure. Unify will display a preview of the dataset so you can tweak it until you are happy that is what you want to bring into Tableau.

Unify - An Insight Into the Product

Once you're happy with your dataset, click the Unify button in the top right and it will export the data into Tableau. From this point, it behaves exactly as Tableau does with any other data set. This means you can join your OBIEE dataset to external sources, or bring in queries from multiple subject areas from OBIEE and join them in Tableau. Then of course, take advantage of Tableau's powerful and interactive visualisation engine.

Unify - An Insight Into the Product

Unify Server

Unify comes in desktop and server flavours. The main difference between the two is that the server version allows you to upload Tableau workbooks with OBIEE data to Tableau Server and refresh them. With the desktop version, you will only be able to upload static workbooks that you've created, however with the server version of Unify, you can tell Tableau Server to refresh data from OBIEE in accordance with a schedule. This lets you produce production quality dashboards for your users, sourcing data from OBIEE as a well as any other source you choose.

Unify Your Data

In a nutshell, Unify allows you to combine the best aspects of two very powerful BI tools and will prevent the need for building all of your reporting artefacts from scratch if you already have a good, working system.

I hope you've found this brief introduction to Unify informative and if you have OBIEE and would like to try it with Tableau, I encourage you to register for a free desktop trial. If you have any questions, please don't hesitate to get in touch.

Categories: BI & Warehousing

Unify: See Your Data From Every Perspective

Mon, 2017-06-12 09:09
 See Your Data From Every Perspective

 See Your Data From Every Perspective

Ad hoc access to accurate and secured data has always been the goal of business intelligence platforms. Yet, most fall short of balancing the needs of business users with the concerns of IT.

Rittman Mead has worked with hundreds of organizations representing all points on the spectrum between agility and governance. Today we're excited to announce our new product, Unify, which allows Tableau users to directly connect to OBIEE, providing the best of both worlds.

Governed Data Discovery

Business users get Tableau's intuitive data discovery features and the agility they need to easily blend their departmental data without waiting on IT to incorporate it into a warehouse. IT gets peace of mind, knowing their mission-critical data is protected by OBIEE's semantic layer and row-level security.

Unify Essentials

Unify runs as a desktop app, making it easy for departmental Tableau users to connect to a central OBIEE server. Unify also has a server option that runs alongside OBIEE, for organizations with a large Tableau user base or those using Tableau Server.

Desktop installation and configuration is simple. Once installed, users can query OBIEE from within Tableau with just a few clicks. Have a look at these short videos demonstrating setup and use of Unify.

Available Today

Download your free 7-day trial of Unify Desktop here.

No Tableau Desktop license? No problem. Unify is compatible with Tableau Public.

Categories: BI & Warehousing

OAC: Essbase and DVCS

Wed, 2017-06-07 09:00

Finally managed to get around to having a proper look at Essbase within Oracle Analytics Cloud Service (OAC) after a busy couple of months. This post focusses mainly on initial impressions on the ‘out of the box’ the Essbase side of this - which we will explore in more detail in future posts, as well as more detail on the use of Essbase with DVCS.

Using Essbase with DVCS

One of the features we are keen to explore more in this context is the integration of Essbase and the Data Visualisation Cloud Service (DVCS). One point that we found that we do not think is being expressed clearly anywhere else we have seen is how to configure this: In setting up our OAC instance, we were having difficulty coming up with a combination of configuration selections that enables Essbase and DV to work at the same time.

Oracle documentation (such as the price list) suggest that both should be available within Standard Edition OAC:

But Doc ID 2265410.1 on MoS suggests, by needing to add a security rule to the Essbase OAC, that two OAC instances are required. We could not find any reference to this requirement in Oracle documentation or blogs on the subject, but it transpires after checking with Oracle that this is indeed the case – Essbase and DV need to be on separate OAC instances.


Looking purely at Essbase, my initial reaction is very positive…whilst the interface is different (I am sure tears will be shed for EAS & Studio in the foreseeable future…although given the way some stalwarts are still clinging on the last surviving copies of the Excel Add In, maybe not too imminently), once the surface of the new interface is scratched more...ahem…’seasoned’ developers will take comfort from being able to do a lot of the same things as they currently can. I am also confident it will fulfil one of the stated objectives in making it easier for non-experts to be able quickly and easily deploy cubes for analysis purposes.

Whilst the manual application and cube maintenance tools through the OAC front-end seem resilient and work effectively, I think some aspects will be difficult to use as the primary maintenance method in a production system - the ‘breadcrumb’ method afforded to dimension maintenance in particular will start to get fiddly to use with a dimension of any sort of volume. The application and cube Import (from a formatted Excel spreadsheet) facility is great - to my mind, a bit like a supercharged and easier-to-use Outline Load Utility in Hyperion Planning - and the ability to refresh the spreadsheet from a deployed cube is a good feature that shouldn’t have been taken for granted. I know Excel is regarded as the Devil’s work in some BI quarters…I personally don’t feel that way until it is being used as a database (or as some form of primary data storage)…but in this context, it is quick & easy to use, on most people’s desktops straightway, and is intuitive.

Still in the Excel corner, on the Smartview side, the addition of the Cube Designer extension (requiring Smartview to be able to consider & change the more generic aspects (not members) of the ‘cube maintenance’ spreadsheets is a nice touch that makes this more straightforward and removes the need to pay strict attention to the spreadsheet layout. The ‘treeview’ style hierarchy viewer also helps make sense of the parent-child members that need to be detailed on the individual dimension tabs.

One issue that has flitted across my mind at this early stage is that of rules files. Whilst the Import facility creates these for you (as with creating a cube from Essbase Studio) which is welcome, and rules files created in an on-prem system can be uploaded (again, welcome), the on-board rules file editor is text based:

I’m not too sure how many people have created or edited rules files like this before (although I’d hazard a guess), but whilst the presence of any means to create, amend, or even tweak a file is good, it remains to be seen how usable this approach is. The alternative is to resubmit from the maintenance spreadsheet thus getting it created / amended for you or to maintain in on-prem system…but seeing as this platform is an alternative to (rather than an augmentation of) on prem for a lot of people, I’m not sure how practical this is.

Whilst the existing tools look really promising, I can’t help but think there will be occasions going forwards where it might be advantageous to be able to create a rules file to run an uploaded file outside of them: time will tell.

The Command Line Tool (downloadable from OAC-Essbase / Utilities) is a little limited at the moment, but goes some way towards filling the potential gap left by the absence of client-side EssMsh and can only grow with further releases: from the Oracle OAC documentation...

In conclusion, first impressions are very favourable. There are changes (eg Security), new features (eg Sandboxing), and I am sure there will be gaps for those considering moving from existing on-prem applications - for example, as I have seen someone else reference, there does not seem to be any reference to partitions in the front end or the import spreadsheet layout - so whilst there is a lot with which we will quite quickly feel familiar, there are also going to be new areas and new practices for us to get into step with: as above, we will look to explore some of these in future posts.

Categories: BI & Warehousing

Overview of the new Cloudera Data Science Workbench

Fri, 2017-06-02 09:07

Recently Cloudera released a new product called Cloudera Data Science Workbench(CDSW)

Being a Cloudera Partner, we at Rittman Mead are always excited when something new comes along.

The CDSW is positioned as a collaborative platform for data scientists/engineers and analysts, enabling larger teams to work in a self-service manner through a web browser. This browser application is effectively an IDE for R, Python and Scala - all your favorite toys!

The CDSW is deployed onto edge nodes of your CDH cluster, providing easy access to your HDFS data and the Spark2 and Impala engines. This means that team members can immediately start working on their projects, accessing full datasets and share analysis and results. A CDSW Project can include reusable code and snippets, libraries etc helping your teams to collaborate. Oh, and these projects can be linked with Github repos to help keep version history.

The workbench is used to fire up user session with R, Python or Scala inside a dedicated Docker engines. These engines can be customised, or extended, like any other Docker images to include all your favourite R packages and Python libraries. Using HDFS, Hive, Spark2 or Impala the workload can then be distributed over to the CDH cluster, by use of your preferred methods, without having to configure anything. This engine (virtual machine, really) runs for as long as the analysis. Any logs or output files need to be saved in the project folder, which is mounted inside the engine and saved on the CDSW master node. The master node is a gateway node to the CDH cluster and can scale out to many worker nodes to distribute the Docker engines

(C) Cloudera.com

And under the hood we also have Kubernetes to schedule user workload across the worker nodes and provide CPU and memory isolation

So far I find the IDE to be a bit too simple and lacking features compared to e.g. RStudio Server. But the ease of use and the fact that everything is automatically configured makes the CDSW an absolute must for any Cloudera customer with data science teams. Also, I'm convinced that future releases will add loads of cool functionality

I spent about two days building a new cluster on AWS and install the Cloudera Data Science Workbench, just an indication of how easy it is to get up and running. Btw, it also runs in the cloud (Iaas) ;)

Want to know more or see a live demo? Contact us at info@rittmanmead.com

Categories: BI & Warehousing

First Steps with Oracle Analytics Cloud

Thu, 2017-06-01 07:43

Not long ago Oracle added a new offer to their Cloud - an OBIEE in a Cloud with full access. Francesco Tisiot made an overview of it and now it's time to go a bit deeper and see how you can poke it with a sharp stick by yourself. In this blog, I'll show how to get your own OAC instance as fast and easy as possible.

Before you start

The very first step is to register a cloud account. Oracle gives a trial which allows testing of all features. I won't show it here as it is more or less a standard registration process. I just want highlight a few things:

  • You will need to verify your phone number by receiving an SMS. It seems that this mechanism may be a bit overloaded and I had to make more than one attempts. I press the Request code button but nothing happens. I wait and press it again, and again. And eventually, I got the code. I can't say for sure and possible it was just my bad luck but if you face the same problem just keep pushing (but not too much, requesting a code every second won't help you).
  • Even for trial you'll be asked for a credit card details. I haven't found a good diagnostics on how much was already spent and the documentation is not really helpful here.

OAC instances are not self-containing and require some additional services. The absolute minimum configuration is the following:

  • Oracle Cloud Storage (OCS) - is used for backups, log files, etc.
  • Oracle Cloud Database Instance (DBC) - is used for RCU schemas.
  • Oracle Analytics Cloud Instance (OAC) - is our ultimate target.

From the Cloud services point of view, architecture is the following. This picture doesn't show virtual disks mounted to instances. These disks consume Cloud Storage quota but they aren't created separately as services.


We need at least one Oracle Database Cloud instance to store RCU schemas. This database may or may not have a separate Cloud Storage area for backups. Every OAC instance requires Cloud storage area for logs. Multiple OAC instances may share one Cloud storage area but I can't find any advantage of this approach over a separate area for every instance.

Create Resources

We create these resource in the order they are listed earlier. Start with Storage, then DB and the last one is OAC. Actually, we don't have to create Cloud Storage containers separately as they may be created automatically. But I show it here to make things more clear without too much "it works by itself" magic.

Create Cloud Storage

The easiest part of all is the Oracle Cloud Storage container. We don't need to specify its size or lots of parameters. All parameters are just a name, storage class (Standard/Archive) and encryption.


I spent some time here trying to figure out how to reference this storage later. There is a hint saying that "Use the format: <storage service>-<identity domain>/<container>. For example: mystorage1-myid999/mybackupcontainer." And if identity domain and container are pretty obvious, storage service puzzled me for some time. The answer is "storage service=Storage". You can see this in the top of the page.


It seems that Storage is a fixed keyword, rurittmanm is the domain name created during the registration process and demo is the actual container name. So in this sample when I need to reference my demo OCS I should write Storage-rurittmanm/demo.

Create Cloud DB

Now when we are somewhat experienced in Oracle Cloud we may move to a more complicated task and create a Cloud DB Instance. It is harder than Cloud Storage container but not too much. If you ever created an on-premise database service using DBCA, cloud DB should be a piece of cake to you.

At the first step, we set the name of the instance and select the most general options. These options are:

  • Service Level. Specifies how this instance will be managed. Options are:

    • Oracle Database Cloud Service: Oracle Database software pre-installed on Oracle Cloud Virtual Machine. Database instances are created for you using configuration options provided in this wizard. Additional cloud tooling is available for backup, recovery and patching.
    • Oracle Database Cloud Service - Virtual Image: Oracle Database software pre-installed on an Oracle Cloud Virtual Machine. Database instances are created by you manually or using DBCA. No additional cloud tooling is available.
  • Metering Frequency - defines how this instance will be paid: by months or by hours.

  • Software Release - if the Service Level is Oracle Database Cloud Service, we may choose 11.2, 12.1 and 12.2, for Virtual Image only 11.2 and 12.1 are available. Note that even cloud does no magic and with DB 12.2 you may expect the same problems as on-premise.

  • Software Edition - Values are:

    • Standard Edition
    • Enterprise Edition
    • Enterprise Edition - High Performance
    • Enterprise Edition - Extreme Performance
  • Database Type - defines High Availability and Disaster Recovery options:

    • Single Instance
    • Database Clustering with RAC
    • Single Instance with Data Guard Standby
    • Database Clustering with RAC and Data Gard Standby

Database Clustering with RAC and Database Clustering with RAC and Data Gard Standby types are available only for Enterprise Edition - Extreme Performance edition.


The second step is also quite intuitive. It has a lot of options but they should be pretty simple and well-known for anyone working with Oracle Database.


The first block of parameters is about basic database configuration. Parameters like DB name (sid) or Administration Password are obvious.

Usable DataFile Storage (GB) is less obvious. Actually, in the beginning, it puzzled me completely. In this sample, I ask for 25 Gb of space. But this doesn't mean that my instance will take 25 Gb of my disk quota. In fact, this particular instance took 150 Gb of disk space. Here we specify only a guaranteed user disk space, but an instance needs some space for OS, and DB software, and temp, and swap, and so on.


A trial account is limited with 500 Gb quota and that means that we can create only 3 Oracle DB Cloud instances at max. Every instance will use around 125 Gb of let's say "technical" disk space we can't reduce. From the practical point of view, it means that it may be preferable to have one "big" instance (in terms of the disk space) rather than multiple "small".

  • Compute shape specifies how powerful our VM should be. Options are the following:
    • OC3 - 1.0 OCPU, 7.5 GB RAM
    • OC4 - 2.0 OCPU, 15.0 GB RAM
    • OC5 - 4.0 OCPU, 30.0 GB RAM
    • OC6 - 8.0 OCPU, 60.0 GB RAM
    • OC7 - 16.0 OCPU, 120.0 GB RAM
    • OC1m - 1.0 OCPU, 15.0 GB RAM
    • OC2m - 2.0 OCPU, 30.0 GB RAM
    • OC3m - 4.0 OCPU, 60.0 GB RAM
    • OC4m - 8.0 OCPU, 120.0 GB RAM
    • OC5m - 16.0 OCPU, 240.0 GB RAM

We may increase or decrease this value later.

  • SSH Public Key - Oracle gives us an ability to connect directly to the instance and authentication is made by user+private key pair. Here we specify a public key which will be added to the instance. Obviously, we should have a private key for this public one. Possible options are either we provide a key we generated by ourselves or let Oracle create keys for us. The most non-obvious thing here is what is the username for the SSH. You can't change it and it isn't shown anywhere in the interface (at least I haven't found it). But you can find it in the documentation and it is opc.

The second block of parameters is about backup and restore. The meaning of these options is obvious, but exact values aren't (at least in the beginning).


  • Cloud Storage Container - that's the Cloud Storage container I described earlier. Value for this field will be something like Storage-rurittmanm/demo. In fact, I may do not create this Container in advance. It's possible to specify any inexistent container here (but still in the form of Storage-<domain>/<name>) and tick Create Cloud Storage Container check-box. This will create a new container for us.

  • Username and Password are credentials of a user who can access this container.

The last block is Advanced settings and I believe it's quite simple and obvious. Most of the time we don't need to change anything in this block.


When we fill all parameters and press the Next button we get a Summary screen and the actual process starts. It takes about 25-30 minutes to finish.

When I just started my experiments I was constantly getting a message saying that no sites available and my request may not be completed.

It is possible that it was again the same "luck" as with the phone number verification but the problem solved by itself a few hours later.

Create OAC Instance

At last, we have all we need for our very first OAC instance. The process of an OAC instance setup is almost the same as for an Oracle DB Cloud Instance. We start the process, define some parameters and wait for the result.

At the first step, we give a name to our instance, provide an SSH public key, and select an edition of our instance. We have two options here Enterprise Edition or Standard Edition and later we will select more additional options. Standard edition will allow us to specify either Data Visualisation or Essbase instances and Enterprise Edition adds to this list a classical Business Intelligence feature. The rest of the parameters here are exactly the same as for Database Instance.


At the second step, we have four blocks of parameters.


  • Service Administrator - the most obvious one. Here we specify an administrator user. This user will be a system administrator.

  • Database - select a database for RCU schemas. That's why we needed a database.

  • Options - specify which options our instance will have.

    • Self-Service Data Visualisation, Preparation and Smart Discovery - this option means Oracle Data Visualisation and it is available for both Standard and Enterprise Editions.
    • Enterprise Data Models - this option gives us classical BI and available only for Enterprise Edition. Also, this option may be combined with the first one giving us both classical BI and modern Data discovery on one instance.
    • Collaborative Data Collection, Scenarios and What-if Analysis - this one stands for Essbase and available for Standard and Enterprise Editions. It can't be combined with other options.
  • Size is the same thing that is called Compute Shape for the Database. Options are exactly the same.
  • Usable Storage Size on Disk GB also has the same meaning as for the DB. The minimum size we may specify here is 25 Gb what gives us total 170 Gb of used disk space.

Here is a picture showing all possible combinations of services:


And here virtual disks configuration. data disk is the one we specify.

The last block - Cloud Storage Configuration was the hardest one. Especially the first field - Cloud Storage Base URL. The documentation says "Use the format: https://example.storage.oraclecloud.com/v1" and nothing more. When you know the answer it may be easy, but when I saw it for the first time it was hard. Should I place here any unique URL just like an identifier? Should it end with v1? And what is the value for the second instance? V2? Maybe I should place here the URL of my current datacenter (https://dbcs.emea.oraclecloud.com). The answer is https://<domain>.storage.oraclecloud.com/v1 in my case it is https://rurittmanm.storage.oraclecloud.com/v1. It stays the same for all instances.

All other parameters are the same as they were for DBCS instance. We either specify an existing Cloud Storage container or create it here.


The rest of the process is obvious. We get a Summary and then wait. It takes about 40 minutes to create a new instance.

Note: diagnostics here is a bit poor and when it says that the instance start process is completed it may not be true. Sometimes it makes sense to wait some time before starting to panic.

Now we may access our instance as a usual. The only difference is that the port is 80 not 9502 (or 443 for SSL). For Data Visualisation the link is http(s)://<ip address>/va, for BIEE - http(s)://<ip address>/analytics and for Essbase http(s)://<ip address>/essbase. Enterprise Manager and Weblogic Server Console are availabale at port 7001 which is blocked by default.

What is bad that https uses a self-signed certificate. Depending on browser settings it may give an error or even prevent access to https.

Options here either use HTTP rather than HTTPS or add this certificate to your local computer. But these aren't the options for a production server. Luckily Oracle provides a way to use own SSL certificates.

Typical Management Tasks SSH to Instances

During the setup process, we provide Oracle with a public key which is used to get an SSH access to instances. Cloud does nothing special to this. In the case of Windows, we may use Putty. Just add the private key to Pageant and connect to the instance using user opc.



Opening Ports

By default only the absolute minimum of the ports is open and we can't connect to the OAC instance using BI Admin tool or to the DB with SQLDeveloper. In order to do this, we should create an access rule which allows access to this particular ports.

In order to get to the Access Rules interface, we must use instance menu and select the Access Rules option.


This will open the Access Rules list. What I don't like about it is that it opens the full list of all rules but we can create only a rule for this particular instance.


New rule creation form is simple and should cause no issues. But be careful here and not open too much for a wild Internet.


Add More Users

The user who registered a Cloud Account becomes its administrator and can invite more users and manage privileges.


Here we can add and modify users.


When we add a user we specify a name, email and login. Also here we set roles for the user. The user will get an email with these details, and link to register.

Obviously, the user won't be asked about a credit card. He just starts working and that's all.


My first steps with Oracle Analytics Cloud were not very easy, but I think it was worth it. Now I can create a new OBIEE instance just in a few minutes and one hour later it will be up and running. And I think that's pretty fast compared to a normal process of creating a new server in a typical organisation. We don't need to think about OS installation, or licenses, or whatever else. Just try it.

Categories: BI & Warehousing

OBIEE upgrades and Windows vulnerabilities

Mon, 2017-05-15 06:00
OBIEE upgrades and Windows vulnerabilities

OBIEE upgrades and Windows vulnerabilities

These two topics may seem unrelated; however, the ransomware attacks over the last few days provide us with a reminder of what people can do with known vulnerabilities in an operating system.

Organisations consider upgrades a necessary evil; they cost money, take up time and often have little tangible benefit or return on investment (ROI). In the case of upgrades between major version of software, for example, moving from OBIEE 10g to 12c there are significant architecture, security, functional and user interface changes that may justify the upgrade alone, but they are unlikely to significantly change the way an organisation operates and may introduce new components and management processes which produce an additional overhead.

There is another reason to perform upgrades: to keep your operating systems compliant with corporate security standards. OBIEE, and most other enterprise software products, come with certification matrices that detail the supported operating system for each product. The older the version of OBIEE, the older the supported operating systems are, and this is where the problem starts.

If we take an example of an organisation running OBIEE 10g, the most recent certified version of Windows it can run is Windows 2008 R2, which will fall outside of your company's security policy. You will be less likely to be patching the operating system on the server as it will either have fallen off the radar or Microsoft may have stopped releasing patches for that version of the operating system.

The result leaves a system that has access to critical enterprise data vulnerable to known attacks.

The only answer is to upgrade, but how do we justify ROI and obtain budget? I think we need to recognise that there is a cost of ownership associated with maintaining systems, the benefit of which is the mitigation of the risk of an instance like the ransomware attacks. It is highly unlikely that anyone could have predicted those attacks, so you could never have used it as a reason to justify an upgrade. However, these things do happen, and a significant amount of cyber attacks probably go on undetected. The best protection you have is to make sure your systems are up to date.

Categories: BI & Warehousing

A focus on Higher Education, HEDW 2017

Wed, 2017-05-03 09:04

First, before I get into a great week of Higher Education Data Warehousing and analytics discussions, I want to thank the HEDW board and their membership. They embraced us with open arms in our first year of conference sponsorship. Our longtime friend and HEDW board member, Phyllis Wykoff, from Miami University of Ohio even spent some time with us behind the booth!

HEDW was in the lovely desert scape of Tucson, AZ at the University of Arizona. Sunday was a fantastic day of training, followed by three days of outstanding presentations from member institutions and sponsors. Rittman Mead wanted to show how important the higher education community is to us, so along with me, we had our CEO-Jon Mead, our CTO-Jordan Meyer, and our US Managing Director-Charles Elliott. If our AirBnB had ears, it would have heard several solutions to the problems of the world as well as discussions of the fleeting athleticism of days gone past. But alas, that will have to wait.

While at the conference, we had a multitude of great conversations with member institutions and there were a few themes that stuck out to us with regard to common issues and questions from our higher education friends. I will talk a little bit about each one below with some context on how Rittman Mead is the right fit to be partners in addressing some big questions out there.

Legacy Investment vs BI tool Diversification (or both)

One theme that was evident from hour one was the influx of Tableau in the higher education community. Rittman Mead is known for being the leader in the Oracle Business Intelligence thought and consulting space and we very much love the OBIEE community. With that said, we have, like all BI practitioners, seen the rapid rise of Tableau within departments and lately as an enterprise solution. It would be silly for the OBIEE community to close their eyes and pretend that it isn’t happening. There are great capabilities coming out of Oracle with Data Visualization but the fact is, people have been buying Tableau for a few years and Tableau footprints exist within organizations. This is a challenge that isn't going away.

Analytics Modernization Approaches

We had a ton of conversations about how to include newer technologies in institutions’ business intelligence and data warehousing footprints. There is clearly a desire to see how big data technologies like Hadoop, data science topics like the R statistical modeling language, and messaging services like Kafka could positively impact higher education organizations. Understanding how you may eliminate batch loads, predict student success, know if potential financial aid is not being used, know more about your students with analysis of student transactions with machine learning, and store more data with distributed architectures like Hadoop are all situations that are readily solvable. Rittman Mead can help you prioritize what will make the biggest value impact with a Modernization Assessment. We work with organizations to make good plans for implementation of modern technology at the right place and at the right time. If you want more info, please let us know.

Sometimes we need a little help from our friends

Members of HEDW need a different view or another set of eyes sometimes and the feedback we heard is that consulting services like ours can seem out of reach with budgets tighter than ever. That is why we recently announced the Rittman Mead Expert Service Desk. Each month, there are hours available to spend however you would like with Rittman Mead’s experts. Do you have a mini project that never seems to get done? Do you need help with a value proposition for a project or upgrade? Did production just go down and you can’t seem to figure it out? With Expert Service desk, you have the full Rittman Mead support model at your fingertips. Let us know if you might want a little help from your friends at Rittman Mead.

To wrap up

Things are a changing and sometimes it is tough to keep up with all of the moving parts. Rittman Mead is proud to be a champion of sharing new approaches and technologies to our communities. Spending time this week with our higher education friends is proof more that our time spent sharing is well worth it. There are great possibilities out there and we look forward to sharing them throughout the year and at HEDW 2018 in Oregon!

Categories: BI & Warehousing

The Case for ETL in the Cloud - CAPEX vs OPEX

Thu, 2017-04-27 11:12

Recently Oracle announced a new cloud service for Oracle Data Integrator. Because I was helping our sales team by doing some estimates and statements of work, I was already thinking of costs, ROI, use cases, and the questions behind making a decision to move to the cloud. I want to explore what is the business case for using or switching to ODICS?

Oracle Data Integration Cloud Services

First, let me briefly talk about what is Oracle Data Integration Cloud Services? ODICS is ODI version available on Oracle’s Java Cloud Service known as JCS. Several posts cover the implementation, migration, and technical aspects of using ODI in the cloud. Instead of covering the ‘how’, I want to talk about the ‘when’ and ‘why’.

Use Cases

What use cases are there for ODICS?
1. You have or soon plan to have your data warehouse in Oracle’s Cloud. In this situation, you can now have your ODI J2EE agent in the same cloud network, removing network hops and improving performance.
2. If you currently have an ODI license on-premises, you are allowed to install that license on Oracle’s JCS at the JCS prices. See here for more information about installing on JCS. These use cases are described in a webinar posted in the PM Webcast Archive.

When and Why?

So when would it make sense to move towards using ODICS? These are the scenarios I imagine being the most likely:
1. A new customer or project. If a business doesn’t already have ODI, this allows them to decide between an all on-premises solution or a complete solution in Oracle’s cloud. With monthly and metered costs, the standard large start-up costs for hardware and licenses are avoided, making this solution available for more small to medium businesses.
2. An existing business with ODI already and considering moving their DW to the cloud. In this scenario, a possible solution would be to move the current license of ODI to JCS and begin using that to move data, all while tracking JCS costs. When the time comes to review licensing obligations for ODI, compare the calculation for a license to the calculation of expected usage for ODICS and see which one makes the most sense (cents?). For a more detailed explanation of this point, let’s talk CAPEX and OPEX!


CAPEX and OPEX are short for Capital Expense and Operational Expense, respectively. In a finance and budgeting perspective, these two show up very differently on financial reports. This often has tax considerations for businesses. Traditionally in the past, a data warehouse project was a very large initial capital expenditure, with hardware, licenses, and project costs. This would land it very solidly as CAPEX. Over the last several years, sponsorship for these projects has shifted from CIOs and IT Directors to CFOs and Business Directors. With this shift, several businesses would rather budget and see these expenses monthly as an operating expense as opposed to every few years having large capital expenses, putting these projects into OPEX instead.


Having monthly and metered service costs in the cloud that are fixed or predictable are appealing. As a bonus, this style of service is highly flexible and can scale up (or down) as demand changes. If you are or will soon be in the process of planning for your future business analytics needs, we provide expert services, assessments, accelerators, and executive consultations for assisting with these kinds of decisions. When it is time to talk about actual numbers, your Oracle Sales Representative will have the best prices. Please get in touch for more information.

Categories: BI & Warehousing

SQL-on-Hadoop: Impala vs Drill

Wed, 2017-04-19 10:01
 Impala vs Drill

I recently wrote a blog post about Oracle's Analytic Views and how those can be used in order to provide a simple SQL interface to end users with data stored in a relational database. In today's post I'm expanding a little bit on my horizons by looking at how to effectively query data in Hadoop using SQL. The SQL-on-Hadoop interface is key for many organizations - it allows querying the Big Data world using existing tools (like OBIEE,Tableau, DVD) and skills (SQL).

Analytic Views, together with Oracle's Big Data SQL provide what we are looking for and have the benefit of unifying the data dictionary and the SQL dialect in use. It should be noted that Oracle Big Data SQL is licensed separately on top of the database and it's available for Exadata machines only.

Nowadays there is a multitude of open-source projects covering the SQL-on-Hadoop problem. In this post I'll look in detail at two of the most relevant: Cloudera Impala and Apache Drill. We'll see details of each technology, define the similarities, and spot the differences. Finally we'll show that Drill is most suited for exploration with tools like Oracle Data Visualization or Tableau while Impala fits in the explanation area with tools like OBIEE.

As we'll see later, both the tools are inspired by Dremel, a paper published by Google in 2010 that defines a scalable, interactive ad-hoc query system for the analysis of read-only nested data that is the base of Google's BigQuery. Dremel defines two aspects of big data analytics:

  • A columnar storage format representation for nested data
  • A query engine

The first point inspired Apache Parquet, the columnar storage format available in Hadoop. The second point provides the basis for both Impala and Drill.

Cloudera Impala

We started blogging about Impala a while ago, as soon as it was officially supported by OBIEE, testing it for reporting on top of big data Hadoop platforms. However, we never went into the details of the tool, which is the purpose of the current post.

Impala is an open source project inspired by Google's Dremel and one of the massively parallel processing (MPP) SQL engines running natively on Hadoop. And as per Cloudera definition is a tool that:

provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats.

Two important bits to notice:

  • High performance and low latency SQL queries: Impala was created to overcome the slowness of Hive, which relied on MapReduce jobs to execute the queries. Impala uses its own set of daemons running on each of the datanodes saving time by:
    • Avoiding the MapReduce job startup latency
    • Compiling the query code for optimal performance
    • Streaming intermediate results in-memory while MapReduces always writing to disk
    • Starting the aggregation as soon as the first fragment starts returning results
    • Caching metadata definitions
    • Gathering tables and columns statistics
  • Data stored in popular Apache Hadoop file formats: Impala uses the Hive metastore database. Databases and tables are shared between both components. The list of supported file formats include Parquet, Avro, simple Text and SequenceFile amongst others. Choosing the right file format and the compression codec can have enormous impact on performance. Impala also supports, since CDH 5.8 / Impala 2.6, Amazon S3 filesystem for both writing and reading operations.

One of the performance improvements is related to "Streaming intermediate results": Impala works in memory as much as possible, writing on disk only if the data size is too big to fit in memory; as we'll see later this is called optimistic and pipelined query execution. This has immediate benefits compared to standard MapReduce jobs, which for reliability reasons always writes intermediate results to disk.
As per this Cloudera blog, the usage of Impala in combination with Parquet data format is able to achieve the performance benefits explained in the Dremel paper.

Impala Query Process

Impala runs a daemon, called impalad on each Datanode (a node storing data in the Hadoop cluster). The query can be submitted to any daemon in the cluster which will act as coordinator node for the query. Impala daemons are always connected to the statestore, which is a process keeping a central inventory of all available daemons and related health and pushes back the information to all daemons. A third component called catalog service checks for metadata changes driven by Impala SQL in order to invalidate related cache entries. Metadata are cached in Impala for performance reasons: accessing metadata from the cache is much faster than checking against the Hive metastore. The catalog service process is in charge of keeping Impala's metadata cache in sync with the Hive metastore.

Once the query is received, the coordinator verifies if the query is valid against the Hive metastore, then information about data location is retrieved from the Namenode (the node in charge of storing the list of blocks and related location in the datanodes), it fragments the query and distribute the fragments to other impalad daemons to execute the query. All the daemons read the needed data blocks, process the query, and stream partial result to the coordinator (avoiding the write to disk), which collects all the results and delivers it back to the requester. The result is returned as soon as it's available: certain SQL operations like aggregations or order by require all the input to be available before Impala can return the end result, while others, like a select of pre-existing columns without a order by can be returned with only partial results.

 Impala vs Drill

Apache Drill

Defining Apache Drill as SQL-on-Hadoop is limiting: also inspired by Google's Dremel is a distributed datasource agnostic query engine. The datasource agnostic part is very relevant: Drill is not closely coupled with Hadoop, in fact it can query a variety of sources like MongoDB, Azure Blob Storage, or Google Cloud Storage amongst others.

One of the most important features is that data can be queried schema-free: there is no need of defining the data structure or schema upfront - users can simply point the query to a file directory, MongoDB collection or Amazon S3 bucket and Drill will take care of the rest. For more details, check our overview of the tool. One of Apache Drill's objectives is cutting down the data modeling and transformation effort providing a zero-day analysis as explained in this MapR video.
 Impala vs Drill

Drill is designed for high performance on large datasets, with the following core components:

  • Distributed engine: Drill processes, called Drillbits, can be installed in many nodes and are the execution engine of the query. Nodes can be added/reduced manually to adjust the performances. Queries can be sent to any Drillbit in the cluster that will act as Foreman for the query.
  • Columnar execution: Drill is optimized for columnar storage (e.g. Parquet) and execution using the hierarchical and columnar in-memory data model.
  • Vectorization: Drill take advantage of the modern CPU's design - operating on record batches rather than iterating on single values.
  • Runtime compilation: Compiled code is faster than interpreted code and is generated ad-hoc for each query.
  • Optimistic and pipelined query execution: Drill assumes that none of the processes will fail and thus does all the pipeline operation in memory rather than writing to disk - writing on disk only when memory isn't sufficient.
Drill Query Process

Like Impala's impalad, Drill's main component is the Drillbit: a process running on each active Drill node that is capable of coordinating, planning, executing and distributing queries. Installing Drillbit on all of Hadoop's data nodes is not compulsory, however if done gives Drill the ability to achieve the data locality: execute the queries where the data resides without the need of moving it via network.

When a query is submitted against Drill, a client/application is sending a SQL statement to a Drillbit in the cluster (any Drillbit can be chosen), which will act as Foreman (coordinator in Impala terminology) that will parse the SQL and convert it into a logical plan composed by operators. The next step is the cost-based optimizer which, based on optimizations like rule/cost based, data locality and storage engine options, rearranges operations to generate the optimal physical plan. The Foreman then divides the physical plan in phases, called fragments, which are organised in a tree and executed in parallel against the data sources. The results are then sent back to the client/application. The following image taken from drill.apache.org explains the full process:

 Impala vs Drill

Similarities and Differences

As we saw above, Drill and Impala have a similar structure - both take advantage of always on daemons (faster compared to the start of a MapReduce job) and assume an optimistic query execution passing results in cache. The code compilation and the distributed engine are also common to both, which are optimized for columnar storage types like Parquet.

There are, however, several differences. Impala works only on top of the Hive metastore while Drill supports a larger variety of data sources and can link them together on the fly in the same query. For example, implicit schema-defined files like JSON and XML, which are not supported natively by Impala, can be read immediately by Drill.
Drill usually doesn't require a metadata definition done upfront, while for Impala, a view or external table has to be declared before querying. Following this point there is no concept of a central and persistent metastore, and there is no metadata repository to manage just for Drill. In OBIEE's world, both Impala and Drill are supported data sources. The same applies to Data Visualization Desktop.
 Impala vs Drill

The aim of this article isn't a performance-wise comparison since those depends on a huge amount of factors including data types, file format, configurations, and query types. A comparison dated back in 2015 can be found here. Please be aware that there are newer versions of the tools since this comparison, which bring a lot of changes and improvements for both projects in terms of performance.


Impala and Drill share a similar structure - both inspired by Google's Dremel - relying on always active daemons deployed on cluster nodes to provide the best query performances on top of Big Data data structures. So which one to choose and when?
As described, the capability of Apache Drill to query a raw data-source without requiring an upfront metadata definition makes the tool perfect for insights discovery on top of raw data. The capacity of joining data coming from one or more storage plugins in a unique query makes the mash-up of disparate data sources easy and immediate. Data science and prototyping before the design of a reporting schema are perfect use cases of Drill. However, as part of the discovery phase, a metadata definition layer is usually added on top of the data sources. This makes Impala a good candidate for reporting queries.
Summarizing, if all the data points are already modeled in the Hive metastore, then Impala is your perfect choice. If instead, you need a mashup with external sources, or need work directly with raw data formats (e.g. JSON), then Drill's auto-exploration and openness capabilities are what you're looking for.
Even though both tools are fully compatible with Oracle BIEE and Data Visualization (DV), due to Drill's data exploration nature, it could be considered more in line with DV use cases, while Impala is more suitable for standard reporting like OBIEE. The decision on tooling highly depends on the specific use case - source data types, file formats and configurations have deep impact on the agility of the business analytics process and query performance.

If you want to know more about Apache Drill, Impala and the use cases we have experienced, don't hesitate to contact us!

Categories: BI & Warehousing

OBIEE Component Status Notifications

Wed, 2017-04-05 09:00

At Rittman Mead, we often hear requests for features or solutions generally not provided by Oracle. These requests range from commentary functionality to custom javascript visualizations. There are many ways to implement these functionalities, but organizations often lack the in-house resources to engineer an acceptable solution.

Rittman Mead has the capability to engineer any solution desired, and in many cases, has already developed a solution. Many of our accelerators currently offered, such as Chit Chat or User Engagement, grew out of numerous requests for these features.

One of the more common requests we hear at Rittman Mead is for BI Administrators to receive notifications for the status of their OBIEE components. They want to be notified of the status of the system components throughout the day in a convenient manner, so any failures are recognized quickly.

This particular feature can easily be implemented with Rittman Mead's Performance Analytics service. However, Rittman Mead would like to publicly provide this capability independent of our accelerator. We have developed a small Python script to provide this functionality, and we would like to give this script to the community.

The provided script is available free of charge, and available under the MIT license. It has been tested on both OBIEE 11G and 12C environments, as well as on Windows and Linux operating systems. The rest of this blog will detail, at a high level, how the script works, and how to configure it correctly.

The script is available through our public Github repository here.

Script Output

First, let's clarify how we will gather the status of the components in the first place. Thankfully, OBIEE includes some scripts to display this information on both Linux and Windows. In 12C, the script is status.sh/status.cmd, and in 11G the primary command is opmnctl status.

When I execute this script on an OBIEE 12C OEL environment, I receive the following response:

The output includes some extra information we don't require, but we can ignore it for now. With some programming knowledge, we can trim what we don't need, organize it into a nice table, and then send the output to nearly anywhere desired. For portability and stability, I will use Python to organize the message contents and I will also use email as the channel of communication.

Sending the Output Through Email

If we are only concerned with notifying administrators of the current status, one of the better avenues to send this data is through email. An email destination will allow users to be able to receive the status of the components almost instantaneously, and be able to take the appropriate action as soon as possible.

Additionally, Python's standard set of modules includes functions to assist in sending SMTP messages, making the script even more portable and maintainable. The simplest method to generate the email is just by sending the complete output as the body of the message. An example of this output is below:

While this works, it's not exactly attractive. With some Python and HTML/CSS skills, we can style the email to look much nicer:

Now we have something nice we can send BI Administrators to show the status of the components.

Configuration and Use

To effectively utilize this script, you will have to change some of the configuration parameters, located at the top of the script. The parameters I am using are shown below (with sensitive information hidden, of course):

The sender and username fields should both be the user you are logging in as on the SMTP server to send the email. If you want the email address shown on a message to be different than the user configured on the SMTP server, then these can be set separately. The password field should be the password for the user being configured on the SMTP server.

The recipient field should be the address of the user who will be receiving the emails. For simple management, this should be a single user, who should then be configured to forward all incoming status emails to the responsible parties. This will allow easier maintenance, since modifying the list of users can take place in the email configuration, rather than the script configuration. In this example, I am just sending the emails to my Rittman Mead email address.

The SMTP settings should also be updated to reflect the SMTP server being utilized. If you are using Gmail as the SMTP server, then the configuration shown should work without modifications.

Finally, the python script requires the absolute path to the status command to execute to produce the output (the opmnctl or status commands). Environment variables may not be taken into consideration by this script, so it's best to not use a variable in this path.

NOTE: If the \ character is in the path, then you MUST use \\ instead. This is especially true on Windows environments. If this change is required but omitted, the script will not function properly.

Additionally, if you don't care about the HTML output (or if it doesn't render nicely in your email client), then it can be disabled by setting the value of render_html to False. If, for some reason, the nice HTML fails to render, then the email will just send the plain text output, instead of failing to deliver an email at all.

Once configured, try executing the script:
python email_component_status.py

If everything worked correctly, then you should have received an email with the status of the components. If you do not receive an email, then you should check both the configuration settings, and the internet connection of the machine (firewalls included). The script will also generate output that should assist you in troubleshooting the cause of the problem.

Additional Notifications or Destinations

The solution provided, while useful, is not perfect. What if you want to send this information to a destination other than an email address, such as a ticketing system like Atlassian JIRA? Or what if you want notifications based on other situations, such as slow running reports, or high CPU usage?

There may be many situations in which you would want one, or several, employees to receive different notifications based on events or circumstances that occur in your OBIEE environment. The script in this blog post only provides one of these notifications, but implementing many more will quickly become burdensome.

As part of Rittman Mead's Performance Analytics offering, we include custom software and code to fulfill this requirement. In addition to providing dashboards to explore the performance of your OBIEE systems, Performance Analytics can be configured to distribute alerts, based on any quantifiable condition, to any number of external systems.

The full Performance Analytics suite can not only alert users of down system components, but of any number of conditions that may occur in your BI environment.

If you have questions about this script, Performance Analytics, or anything else Contact Us here.

To find out more about Performance Analytics, contact us, visit the product page here, or read some of the fantastic blogs from Robin Moffatt.

Categories: BI & Warehousing

Oracle Analytics Cloud: Product Overview

Tue, 2017-04-04 10:00
 Product Overview

We at Rittman Mead are always helping our customer solving their problems, many times we heard them

  • being unsure about the sizing of their server
  • being worried about the upfront cost of the licensing
  • having recurring nightmares about patching
  • willing to try the cloud but couldn't find the right option to replace their on-premises system

This is their lucky day: Oracle officially launched Oracle Analytics Cloud (OAC), a new PaaS (Platform as a Service) providing a complete and elastic Business Intelligence platform in the cloud, customizable and managed by you but all on the Oracle Cloud!

 Product Overview

If you haven't been on a remote island you may have noticed that in recent years Oracle's main focus has been around the Cloud. Several products have been launched covering a vast spectrum of functionalities: Data Management, Application Development, Business Analytics and Security are only some of the areas covered by the Software/Platform/Infrastructure as a Service offering.

 Product Overview

In the Business Analytics area, we at Rittman Mead started thinking long time ago on how to host Oracle's BI on-premises (OBIEE) in the Cloud and worked closely with Oracle since the beta phase of their first PaaS product: BI Cloud Service (BICS). Effectively we put our hands on all the cloud products in the BA family like Big Data Discovery (both on premises and cloud), Data Visualization Cloud Service, Big Data Preparation Service.

Business Intelligence Cloud Products

Until few weeks ago Oracle's main Business Analytics cloud products were BI Cloud Service (BICS) and Data Visualization Cloud Service (DVCS). As mentioned in our blog both tools aimed initially at departmental use-cases: the simplicity of the data model interface and the lack of admin configuration options stopped them from being a compelling story for hosting a full enterprise Business Intelligence solution.

 Product Overview

New features like BICS Data Sync, Remote Data Connector and RPD lift and shift addressed almost all the limitations but the lack of detailed admin/maintenance capabilities represent a stopper for moving complex environments in the cloud. Still BICS and DVCS are perfect for their aim: business users analysing sets of data without needing to wait the IT to provision a server or to care about upfront licensing costs.

Oracle Analytics Cloud

Oracle Analytics Cloud extends the watermark in every direction by providing a product that is:

  • Complete functionality: most of the tools, procedures and options provided on-premises are now available in OAC.
  • Combining all the offering of BICS, DV, BIEE and Essbase: OAC includes the features of Oracle's top BI products.
  • Licensing Tailored: the many options available (discussed in a later post) can be chosen depending on analytical needs, timeframe of service, required performances
  • Easily Scalable: do you want to expand your BI solution to the double of the users without loosing performances? Just buy some more horsepower!
  • Fully Accessible: SSH connection available to the server makes it easy to change settings as needed, REST API and Clients are provided for all lifecycle operations
  • Customizable: settings, images, networking, VPN all settings are available
  • Scriptable: settings like scaling, instance creation and deletion, start and stop can be easily scripted via the REST-APIs
  • Fully Customer Managed: Oracle provides the automation to backup and patch but the customer decides when to run them.
What's The Difference?

So what's the difference between Oracle Analytics Cloud and the "old" DVCS and BICS? How is OACS going to change Oracle's BI offer in the cloud?

The great deal of using OACS is control: BICS/DVC limiting factors around admin options and development are solved providing a tool capable of hosting a full enterprise BI solution. Even if the platform is managed by Oracle SSH access is provided meaning that instance configurations can be changed. No more upfront server sizing decisions, now the size of the instance is decided during creation time and can be changed later in the process if the demand changes.

The REST-APIs will enable the scripting of the full lifecycle of the instance, providing a way to automate the BI enterprise workflow even in complex environments where concurrent development is needed. Patching and Backups are not a problem anymore with the automated processes provided.

Direct RPD online editing is available with the Admin tool. The old BICS Data Modeler is still there for simple models, but Admin Tool can be used in case of complex RPDs.

 Product Overview

The front-end is like the BICS and OBIEE 12c one, some new visualization have been added to Visual Analyzer in line with the new additions to Data Visualization Desktop: Parallel Coordinates, Chord, Network, Sankey diagrams are now available.

 Product Overview

A new console is now available in Visual Analyzer allowing settings like Mail or Deliveries that before were only accessible via Weblogic Console, Enterprise Manager or config files.

 Product Overview

Finally Essbase is now available in the cloud too with a new web interface!
Summarizing, if you wanted to go Cloud, but were worried about missing options, now Oracle Analytics Cloud provides all you need to host a full Enterprise BI solution.

In the next few days I'll be analysing various aspects of Oracle Analytics Cloud Suite, so keep in touch!

If you need assistance in checking if Oracle Analytics Cloud suits your needs or in planning your migration to the cloud don't hesitate to contact us

Categories: BI & Warehousing

Metadata Modeling in the Database with Analytic Views

Mon, 2017-04-03 09:00
Metadata Modeling in the Database with Analytic Views

12.2, the latest Oracle database release provides a whole set of new features enhancing various aspects of the product including JSON support, Auto-List Partitioning and APEX news among others. One of the biggest news in the Data Warehousing / Analytics area was the introduction of the Analytic Views, that as per Oracle's definition are

Metadata objects that enable the user to quickly and easily create complex hierarchical and dimensional queries on data in database tables and views


If you are on rush, here is an abstract of what you'll find in this looooong blog post:

Metadata modeling can now be done directly in the database using Analytic Views, providing to end users a way of querying database objects without needing a knowledge of joining conditions, aggregation functions or order by clauses.
This post will guide you through the creation of an analytic view that replicates a part of a OBIEE's Sampleapp business model. The latest part of the post is dedicated to understanding the usage of analytic views and the benefits for end users especially in cases when self-service BI tools are used.

Metadata Modeling in the Database with Analytic Views

If you are still around and interested in the topic please take a drink and sit comfortably, it will be a good journey.

Metadata Modeling

What are then the Analytics Views in detail? How are they going to improve end user's ability in querying data?

To answer above question I would take a step back. Many readers of this blog are familiar with OBIEE and its core: the Repository. The repository contains the metadata model from the physical sources till the presentation areas and includes the definition of:

  • Joins between tables
  • Hierarchies for dimensions
  • Aggregation rules
  • Security settings
  • Data Filters
  • Data Sources

This allows end users to just pick columns from a Subject Area and display them in the appropriate way without needing to worry about writing SQL or knowing how the data is stored. Moreover definitions are held centrally providing the famous unique source of truth across the entire enterprise.

Metadata Modeling in the Database with Analytic Views

The wave of self-service BI tools like Tableau or Oracle's Data Visualization Desktop provided products capable of querying almost any kind of data sources in a visual and intuitive way directly in the end user hands. An easy and direct access to data is a good thing for end user but, as stated above, requires knowledge of the data model, joins and aggregation methods.
The self-service tools can slightly simplify the process by providing some hints based on column names, types or values but the cruel reality is that the end-user has to build the necessary knowledge of the data source before providing correct results. This is why we've seen several times self-service BI tools being "attached" to OBIEE: get corporate official data from the unique source of truth and mash them up with information coming from external sources like personal Excel files or output of Big Data processes.

Analytics Views

Analytic Views (AV) take OBIEE's metadata modeling concept and move it at database level providing a way of organizing data in a dimensional model so it can be queried with simpler SQL statements.
The Analytical Views are standard views with the following extra options:

  • Enable the definition of facts, dimensions and hierarchies that are included in system-generated columns
  • Automatically aggregate the data based on pre-defined calculations
  • Include presentation metadata

Analytics views are created with a CREATE ANALYTIC VIEW statement, some privileges need to be granted to the creating user, you can find the full list in Oracle's documentation.

Every analytical view is composed by the following metadata objects:

  • Attribute dimensions: organising table/view columns into attributes and levels.
  • Hierarchies: defining hierarchical relationships on top of an attribute dimension object.
  • Analytic view objects: defining fact data referencing both fact tables and hierarchies.

With all the above high level concepts in mind it's now time to try how Analytical Views could be used in a reporting environment.

Database Provisioning

For the purpose blog post I used Oracle's database Docker image, provided by Gerald Venzl, the quickest way of spinning up a local instance. You just need to:

  • Install Docker
  • Download database installer from Oracle's website
  • Place the installer in the proper location mentioned in the documentation
  • Build Oracle Database Enterprise Edition Docker image by executing
./buildDockerImage.sh -v -e
  • Running the image by executing
docker run --name db12c -p 1521:1521 -p 5500:5500 -e ORACLE_SID=orcl -e ORACLE_PDB=pdborcl -e ORACLE_CHARACTERSET=AL32UTF8 oracle/database:  

The detailed parameters definition can be found in the GitHub repository. You can then connect via sqlplus to your local instance by executing the standard

sqlplus sys/pwd@//localhost:1521/pdborcl as sysdba  

The password is generated automatically during the first run of the image and can be found in the logs, look for the following string


Once the database is created it's time to set the goal: I'll try to recreate a piece of the Oracle's Sampleapp RPD model in the database using Analytic Views.

Model description

In this blog post I'll look in the 01 - Sample App business model and specifically I'll try to replicate the logic behind Time, Product and the F0 Sales Base Measures using Analytic Views.

Metadata Modeling in the Database with Analytic Views

Dim Product

The Sampleapp's D1 - Products (Level Based Hierarchy) is based on two logical table sources: SAMP_PRODUCTS_D providing product name, description, LOB and Brand and the SAMP_PROD_IMG_D containing product images. For the purpose of this test we'll keep our focus on SAMP_PRODUCTS_D only.
The physical mapping of Logical columns is shown in the image below.

Metadata Modeling in the Database with Analytic Views

Attribute Dimension

The first piece we're going to build is the attribute dimension, where we'll be defining attributes and levels. The mappings in above image can "easily" be translated into an attributes with the following SQL.

 (PROD_KEY as P0_Product_Number
    CLASSIFICATION caption VALUE 'P0 Product Number',
  PROD_DSC as P1_Product
    CLASSIFICATION caption VALUE 'P1 Product',
  TYPE as P2_Product_Type
    CLASSIFICATION caption VALUE 'P2 Product Type',
  TYPE_KEY as P2k_Product_Type
    CLASSIFICATION caption VALUE 'P2k Product Type',
  LOB as P3_LOB
  LOB_KEY as P3k_LOB
  BRAND as P4_Brand
    CLASSIFICATION caption VALUE 'P4 Brand',
  BRAND_KEY as P4k_Brand
    CLASSIFICATION caption VALUE 'P4k Brand',
  ATTRIBUTE_1 as P5_Attribute_1
    CLASSIFICATION caption VALUE 'P5 Attribute 1',
  ATTRIBUTE_2 as P6_Attribute_2
    CLASSIFICATION caption VALUE 'P6 Attribute 2',
  SEQUENCE as P7_Product_Sequence
    CLASSIFICATION caption VALUE 'P7 Product Sequence',
  TOTAL_VALUE as P99_Total_Value
    CLASSIFICATION caption VALUE 'P99 Total Value')

Few pieces to note:

  • CREATE OR REPLACE ATTRIBUTE DIMENSION: we are currently defining a dimension, the attributes and levels.
  • USING SAMP_PRODUCTS_D: defines the datasource, in our case the table SAMP_PRODUCTS_D. Only one datasource is allowed per dimension.
  • PROD_KEY as P0_Product_Number: using the standard notification as we can easily recaption columns names
  • CLASSIFICATION CAPTION ... several options can be added for each attribute like caption or description

The dimension definition is not complete with only attribute declaration, we also need to define the levels. Those can be taken from OBIEE's hierarchy

Metadata Modeling in the Database with Analytic Views

For each level we can define:

  • The level name, caption and description
  • The Key
  • the Member Name and Caption
  • the Order by Clause

Translating above OBIEE's hierarchy levels into Oracle SQL

  CLASSIFICATION description VALUE 'Brand'
  KEY P4k_Brand
  ORDER BY P4_Brand
LEVEL Product_LOB  
  CLASSIFICATION description VALUE 'Lob'
LEVEL Product_Type  
  CLASSIFICATION description VALUE 'Type'
  KEY P2k_Product_Type
  MEMBER NAME P2_Product_Type
  MEMBER CAPTION P2_Product_Type
  ORDER BY P2_Product_Type
LEVEL Product_Details  
  CLASSIFICATION description VALUE 'Detail'
  KEY P0_Product_Number
  MEMBER NAME P1_Product
  ORDER BY P1_Product

There is an additional DETERMINES line in above sql for each level apart from Brand, this is how we can specify the relationship between level keys. If we take the Product_LOB example, the DETERMINES(P4k_Brand) defines that any LOB in our table automatically determines a Brand (in OBIEE terms that LOB is a child of Brand).


Next step is defining a hierarchy on top of the attribute dimension D1_PRODUCTS defined above. We can create it just by specifying:

  • the attribute dimension to use
  • the list of levels and the relation between them

which in our case becomes

  CLASSIFICATION caption VALUE 'Products Hierarchy'
  (Product_Details CHILD OF
   Product_Type CHILD OF
   Product_LOB CHILD OF

When looking into the hierarchy Product_hier we can see that it's creating an OLAP-style dimension with a row for each member at each level of the hierarchy and extra fields like DEPT, IS_LEAF and HIER_ORDER

Metadata Modeling in the Database with Analytic Views

The columns contained in Product_hier are:

  • One for each Attribute defined in attribute dimension D1_PRODUCTS like P0_PRODUCT_NUMBER or P2K_PRODUCT_TYPE
  • The member name, caption and description and unique name
  • The level name in the hierarchy and related depth
  • The relative order of the member in the hierarchy
  • A field IS_LEAF flagging hierarchy endpoints
  • References to the parent level
Member Unique Names

A particularity to notice is that the MEMBER_UNIQUE_NAME of Cell Phones is [PRODUCT_TYPE].&[101] which is the concatenation of the LEVEL and the P2K_PRODUCT_TYPE value.
One could expect the member unique name being represented as the concatenation of all the preceding hierarchy members, Brand and LOB, and the member key itself in a string like [PRODUCT_TYPE].&[10001]&[1001]&[101].

This is the default behaviour, however in our case is not happening since we set the DETERMINES(P3k_LOB,P4k_Brand) in the attribute dimension definition. We Specified that Brand ([10001]) and LOB ([1001]) can automatically be inferred by the Product Type so there is no need to store those values in the member key. We can find the same setting in OBIEE's Product Type logical level

Metadata Modeling in the Database with Analytic Views

Dim Date

The basic D0 Dim Date can be built starting from the table SAMP_TIME_DAY_D following the same process as above. Like in OBIEE, some additional settings are required when creating a time dimension:

  • DIMENSION TYPE TIME: the time dimension type need to be specified
  • LEVEL TYPE <LEVEL_NAME>: each level in the time hierarchy needs to belong to a precise level type chosen from:
    • YEARS
    • MONTHS
    • WEEKS
    • DAYS
    • HOURS
Attribute Dimension

Metadata Modeling in the Database with Analytic Views

Taking into consideration the additional settings, the Dim Date column mappings in above image can be translated in the following attribute dimension SQL definition.


You may have noticed a different mapping of keys, member names and order by attributes. Let's take the CAL_MONTH as example. It's defined by two columns

  • BEG_OF_MTH_WID: used for joins and ordering
  • PER_NAME_MONTH: used as "display label"

PER_NAME_MONTH in the YYYY / MM format could be also used for ordering, but most of the times end user requests months in the MM / YYYY format. Being able to set a ordering column different from the member name allows us to properly manage the hierarchy.

Metadata Modeling in the Database with Analytic Views


Time hierarchy follows the same rules as the product one, no additional settings are required.

Fact Sales

The last step in the journey is the definition of the analytic view of the fact table that as per Oracle's documentation

An analytic view specifies the source of its fact data and defines measures that describe calculations or other analytic operations to perform on the data. An analytic view also specifies the attribute dimensions and hierarchies that define the rows of the analytic view.

The analytic view definition contains the following specifications:

  • The data source: the table or view that will be used for the calculation
  • The columns: which columns from the source objects to use in the calculations
  • The attribute dimensions and hierarchies: defining both the list of attributes and the levels of the analysis
  • The measures: a set of aggregations based on the predefined columns from the data source.

Within analytical views definition a materialized view can be defined in order to store aggregated values. This is a similar to OBIEE's Logical Table Source setting for aggregates.

Analytic View Definition

For the purpose of the post I'll use SAMP_REVENUE_F which is one of the sources of F0 Sales Base Measures in Sampleapp. The following image shows the logical column mapping.

Metadata Modeling in the Database with Analytic Views

The above mappings can be translated in the following SQL


Some important parts need to be highlighted:

  • USING SAMP_REVENUE_F: defines the analytic view source, in our case the table SAMP_REVENUE_F
  • DIMENSION BY: this section provides the list of dimensions and related hierarchies to take into account
  • KEY BILL_DAY_DT REFERENCES TOO_CALENDAR_DATE: defines the join between the fact table and attribute dimension
  • HIERARCHIES (TIME_HIER DEFAULT): multiple hierarchies can be defined on top of an attribute dimension and used in an analytical view, however like in OBIEE only one will be used by default
  • F1_REVENUE FACT REVENUE AGGREGATE BY SUM: defines the measure with alias, source column and aggregation method
  • F2_BILLED_QTY FACT UNITS: if aggregation method is not defined it replies on default SUM
  • F21_REVENUE_AGO: new metrics can be calculated based on previously defined columns replicating OBIEE functions like time-series. The formula (LAG(F1_REVENUE) OVER (HIERARCHY TIME_HIER OFFSET 1)) calculates the equivalent of the OBIEE's AGO function for each level of the hierarchy.
  • DEFAULT MEASURE F1_REVENUE: defines the default measure of the analytic view
Using Analytic Views

After the analytic view definition, it's time to analyse what benefits end users have when using them. We are going to take a simple example: a query to return the Revenue and Billed Qty per Month and Brand.

Using only the original tables we would have the following SQL


The above SQL requires the knowledge of:

  • Aggregation methods
  • Joins
  • Group by
  • Ordering

Even if this is an oversimplification of the analytic view usage you can already spot that some knowledge of the base data structure and SQL language is needed.

Using the analytic views defined above, the query can be written as


As you can see, there is a simplification of the SQL statement: no more aggregation, joining conditions and group by predicates are needed. All the end-user has to know is the analytical view name, and the related hierarchies that can be used.

The additional benefit is that if we want to change the level of granularity of the above query we just need to change the WHERE condition. E.g. to have the rollup per Year and LOB we just have to substitute




without touching granularity, group by and order by statements.

Using Analytic Views in DVD

At the beginning of my blog post I wrote that Analytic Views could be useful when used in conjunction with self-service BI tools. Let's have a look at how the end user journey is simplified in the case of Oracle's Data Visualization Desktop.

Without AV the end-user had two options to source the data:

  • Write the complex SQL statement with joining condition, group and order by clause in the SQL editor to retrieve data at the correct level with the related dimension
  • Import the fact table and dimensions as separate datasources and join them together in DVD's project.

Both options require a SQL and joining conditions knowledge in order to being able to present correct data. Using Analytic Views the process is simplified. We just need to create a new source pointing to the database where the analytic views are sitting.
Next step is retrieve the necessary columns from the analytic view. Unfortunately analytic views are not visible from DVD object explorer (only standard table and views are shown)

Metadata Modeling in the Database with Analytic Views

We can however specify with a simple SQL statement all the informations we need like Time and Member Slice, the related levels and the order in hierarchy.


You may have noted that I'm not specifying any WHERE clause for level filtering: as end user I want to be able to retrieve all the necessary levels by just changing a filter in my DVD project. After including the above SQL in the datasource definition and amending the measure/attribute definition I can start playing with the analytic view data.

Metadata Modeling in the Database with Analytic Views

I can simply include the dimension's MEMBER_NAME in the graphs together with the measures and add the LEVEL_NAME in the filters. In this way I can change the graph granularity by simply selecting the appropriate LEVEL in the filter selector for all the dimensions available.

Metadata Modeling in the Database with Analytic Views

One particular to notice however is that all the data coming from various columns like date, month and year are "condensed" into a single VARCHAR column. In case of different datatypes (like date in the time dimension) this will prevent a correct usage of some DVD's capabilities like time series or trending functions. However if a particular type of graph is needed for a specific level, either an ad-hoc query or a casting operation can be used.


In this blog post we analysed the Analytic Views, a new component in Oracle Database 12.2 and how those can be used to "move" the metadata modeling at DB level to provide an easier query syntax to end-users.

Usually metadata modeling is done in reporting tools like OBIEE that offers additional set of features on top of the one included in analytic views. However centralized reporting tools like OBIEE are not present everywhere and, with the wave of self-service BI tools, analytic views represent a perfect method of enabling users not familiar with SQL to simply query their enterprise data.

If you are interested in understanding more about analytic views or metadata modeling, don't hesitate to contact us!
If you want to improve the SQL skills of your company workforce, check out our recently launched SQL for beginners training!

Categories: BI & Warehousing

Real World OBIEE: Demystification of Variables Pt. 3

Tue, 2017-03-28 08:00
 Demystification of Variables  Pt. 3

In part two of this blog series, I went over using Repository, System and Presentation Variables to make reports dynamic for any series of time. In part three, I am going to talk about making reports dynamic for periods of time using built in functions within Answers itself.

Real World

While it's a lot more efficient to create Repository Variables to use in filters and prompts for time dimensions, sometimes it is simply not possible. Perhaps you are a front end developer for OBIEE and have no access to the RPD or the database . Perhaps you have no communication with the person in your organization who handles all of the RPD development and therefore can not submit any change requests.

Don't worry. We've got you covered.

There are several functions and tricks you can use within Answers itself to make reports dynamic and eliminate having to hardcode dates.

The Scenario

I am going to use the same scenario I used for part two of this series for the example one. To recap, here are the requirements:

I have been asked to create a report that is going to reside on a products dashboard. It needs to have the same product grouping as the report I used in part one of this series, needs to contain 'Gross Rev $', 'Net Rev $' and '# of Orders' and have a prompt that can select between the first and current day of the month and every day in-between. The person who requested the report wants the prompt to change dynamically with each month and does not want users to be able to select future dates.

In part two, I used a custom SQL statment which used Repository Variables I created to populate all the date values from the first of every month to the current date for Variable Prompts. There is a gap in the data loads for # of Orders in which data does not update until the 2nd or 3rd of each new month. The person who requested the report wanted a summary of the previous months '# of Orders' to be shown until the data is updated for the current month. I used a Repository Variable that returned the value of the previous month with the current year and used a CASE statement with along with Filter Expressions to switch between the Filter Expression using the Repository Variable (Prev_Month) if the date was <=2 or if # of Orders is null and the Filter Expression which contained the Start Date and End_Date Presentation Variable placeholders which were defined in my Variable Prompts.

Example One

In this example, I have to figure out a way to make the report dynamic with only the functions available within Answers. There are two parts to this example. First I need to use a function that will return the previous month's value for the Calendar Year Month column to use with the '# of Orders' column. This will replace the Repository Variable Prev_Month I used in part two of this series. Second I need to write a new SQL statment for the Start Date and End_Date prompts I created in part two and also define a new SQL statment for the default values in those prompts.

Part 1

I am going to start by creating a new statement to return the previous month's value for the Calendar Year Month column. I can use the TIMESTAMPADD function in conjunction with the extraction syntax 'YEAR' and 'MONTH' to return the desired results. Let's take a look at the entire statment and then I will break it down.


1. TIMESTAMPADD - This is what defines that I am going to use addition to return a date by adding or one date to another.

2. SQL_TSI_YEAR and SQL_TSI_MONTH - The first argument in the function. It defines what interval of time the function will work with.

3. -1. This the interval of time that is compared to the third argument.

4. CURRENT_DATE - The third argument in the function. This is what the second argument is compared against.

5. YEAR and MONTH - This is the extraction syntax that will return only the year and the month respectively.

Also notice that I have used VARCHAR(6) for the CAST argument. If I use VARCHAR, I can specify the exact number of characters I want returned.

Now I need to copy my column formula and paste it into the column formula that I created for '# of Orders' in part two of this series.

I am going to replace the Prev_Month Repository Variable with my statement, which will look like this.

CASE WHEN DAY(CURRENT_DATE)<=2 or="" "sales="" -="" fact="" sales"."measures"."#="" of="" orders"="" is="" null="" then="" filter("sales="" using="" ("sales="" sales"."periods"."calendar="" year="" month"="CAST(YEAR(timestampadd(SQL_TSI_MONTH," -1,current_date))*100+month(timestampadd(sql_tsi_month,="" -1,current_date))="" as="" varchar(6))))="" else="" ("periods"."day="" date"="" between="" @{pv_start_dt}{date="" '2015-10-01'}="" and="" @{pv_end_dt}{date="" '2015-10-15'}))="" end<="" code="">

 Demystification of Variables  Pt. 3

If I run the report, my results return as expected.

 Demystification of Variables  Pt. 3

Part 2

Now I need to write a new SQL statement for my Start Date and End Date prompts. In order to do this, I am going to need to use two functions: TIMESTAMPS and CURRENT_DATE. First, lets take a look at the TIMESTAMP function.

I am going to use the TIMESTAMP function to filter the Day Date column for the first day of the month. To demonstrate, I am going to create a new analysis and use the TIMESTAMP function in a column formula. My column formula looks like the following:


This formula can be broken down into four parts:

1. TIMESTAMPADD - This is what defines that I am going to use addition to return a date by adding or one date to another.

2. SQL_TSI_DAY - The first argument in the function. It defines what interval of time the function will work with (in this case days)

3. -DAYOFMONTH(CURRENT_DATE)+1 - This the interval of time that is compared to the third argument. In this case I am taking the negative value of the day of the month, adding 1 and then adding it with current date which always returns 1 or the first day of the month.

4. CURRENT_DATE - The third argument in the function. This is what the second argument is compared against.

This is only scratching the surface of what you can do with the TIMESTAMP function. If you would like more information, check out the blog on TIMESTAMPS written by Brian Hall.

I am going to add an additional column to the Criteria and use the CURRENT_DATE function in a column formula.

 Demystification of Variables  Pt. 3

 Demystification of Variables  Pt. 3

Now I am going to click on Results to show the results of the TIMESTAMP function and the CURRRENT_DATE function.

 Demystification of Variables  Pt. 3

From the results you can see that I have both the first day of the month and the current date. Now I need to convert this into a filter for the Day Date column so that I can get the logical SQL query for my Start Date and End Date prompts.

 Demystification of Variables  Pt. 3

In the New Filter window, I need to change the operator to is between and click on Add More Options to add a SQL Expression.

 Demystification of Variables  Pt. 3

In the SQL Expression box, I need to put the TIMESTAMP function for current date from the previous example. In addition I need to add another SQL Expression for the CURRENT_DATE function.

 Demystification of Variables  Pt. 3

 Demystification of Variables  Pt. 3

When I return to my Criteria, I can see the filter I created in the Filter window.

 Demystification of Variables  Pt. 3

I can click on Results to run the report. The results for the Day Date column return as expected.

 Demystification of Variables  Pt. 3

Now I can click on the Advanced tab and copy the logical SQL statement to use for my Start Date and End Date prompts.

 Demystification of Variables  Pt. 3

Now I am going to paste the following into my Start Date Variable Prompt

Choice List Values > SQL Results

SELECT "Sales - Fact Sales"."Periods"."Day Date" 
ORDER BY "Periods"."Day Date"

Default Selection > SQL Results

FROM "Sales - Fact Sales"

 Demystification of Variables  Pt. 3

For the default selection, I am using a SQL statment that is selecting the first day of the month using the same TIMESTAMP function used in the above query from my subject area "Sales - Fact Sales".

Now I need to change the SQL query for both the Choice List Values and Default Selection for my End Date Variable Prompt.

I am going to use the same SQL query for the Choice List Values in my End Date prompt as I did in my Start Date prompt. I am going to change the default selection to the following:

FROM "Sales - Fact Sales"

 Demystification of Variables  Pt. 3

If I go to the Display window, I can view the results of my changes.

 Demystification of Variables  Pt. 3

 Demystification of Variables  Pt. 3

Notice that the results are exactly the same as the results in part two of this series.

I can save the dashboard prompt and go to my dashboard and test the prompt.

 Demystification of Variables  Pt. 3

 Demystification of Variables  Pt. 3

 Demystification of Variables  Pt. 3

In Conclusion

In part one of this series, we looked at using Bins, CASE statements to create custom grouping for values and switch between those groups and values using Presentation Variables.

In part two of this series, we looked at creating Repository Variables to make reports dynamic using those Repository Variables in Variable Prompts and passing them into column formulas using Presentation Variables.

In the third and final part of this series, we looked at making reports dynamic by using built in functions within Answers such as TIMESTAMPS and CURRENT_DATE.

My hope is that you can take these examples and apply them in your own OBIEE development. If you would like to know more about front end or RPD development, please check out the variety of training courses we offer at Rittman Mead. Until next time.

Categories: BI & Warehousing

Real World OBIEE: Demystification of Variables Pt. 2

Tue, 2017-03-21 09:00

In part one of this blog series, I went over using bins and presentation variables to dynamically create groups and switch between them in a report and on a dashboard. In part two, I am going to talk about making reports dynamic for periods of time using repository, system and presentation variables. Before I dive into an example, there are a couple of things I would like to cover first.


The sysdate function returns the current datetime set by the system where the database resides. Sysdate is a really useful function for creating repository variables for use with date dimensions. If I go into SQL Developer, I can write a query to return the current sysdate:

select sysdate from dual;


The current_date functions returns the current datetime set by the system where the bi server resides. This datetime may differ from sysdate depending on the geographical location of the database vs. the system that OBIEE resides on. I can write a query using sql developer to return the datetime using the current_date function:

select current_date from dual;

Since my database and OBIEE instance are on the same system, sysdate and current_date are the same.


When using sysdate or current_date to create repository variables for dates (which I am going to show in an upcoming example), you have to keep something in mind. While the date may match, the time may not. To show an example of this, I am going to join one of my date columns with sysdate.

select sysdate, dim_date_key from dual, 
where sysdate = dim_date_key;

If I run this query, I don't get an error but I get no results.

Why? To answer this, I need to write a query to inspect my date column.

select dim_date_key from gcbc_pef.dim_date;

As you can see by the results of my query, the DIM_DATE_KEY column does have the same format as sysdate but all the times are set to 00:00:00 (or midnight). To further demonstrate the difference between my date column and sysdate, I am going to write a new query and use the TRUNC (or TRUNCATE) function.

select sysdate, dim_date_key from dual, 
where trunc(sysdate) = dim_date_key;

As you can see, the query runs successfully but notice how sysdate and DIM_DATE_KEY still have different times. How is the join possible? Because I used the truncate function in the where clause in my query for sysdate. Without going into too much detail, using truncate on a date function without any formatting (which I will cover later) will set (or truncate) the datetime to the start (or midnight) of the current day. For example, if I run another query that just selects the truncated sysdate from dual, I get this result.

select trunc(sysdate) from dual;

Now, lets dive into an example.

Note: For all of the examples in this blog series I am using OBIEE

The Scenario

In this example, I have been asked to create a report that is going to reside on a products dashboard. It needs to have the same product grouping as the report I used part one of this series, needs to contain Gross Rev $, Net Rev $ and # of Orders and have a prompt that can select between the first and current day of the month and every day in-between. The person who requested the report wants the prompt to change dynamically with each month and does not want users to be able to select future dates.

There are two foreseeable challenges with this report. The first, and probably the most obvious, is how to make the date prompt for the current month and have it change dynamically with each month. The second is how to pass the dates into the report.

There is one more challenge that I will have to tackle. There is a gap in the data loads for # of Orders. Data does not update until the 2nd or 3rd of each new month. This wouldn't be a big deal except the person who requested the report wants a summary of the previous months # of Orders to be shown until the data is updated for the current month.

Fortunately, by using Repository, System and Presentation Variables, I can accomplish all of the requirements of this report.

The Example

For this example, I am going to start by creating Repository Variables to use with my date column in order to make the dates dynamic. There are other ways to make dates dynamic using functions within Answers but they are a little bit trickier to use and are less common. I am going to go over some of those functions in part three of this blog series.

Repository Variables are created using the Admin Tool. By launching the Admin Tool and opening my RPD in online mode (can also be created offline), I can go to Manage > Variables to start creating my first Repository Variable.

From the Variable Manager window, I can create a Repository Variable by selecting Action > New > Repository > Variable.

I am going to start by creating the Repository Variable for the current date. Since this variable will be dynamic, I need to make sure I select the option 'Dynamic' and I am going to give it the name USCurDate.

Now I need to create a new init block. I can do this by clicking New...

Once in the Repository Variable Initialization Block screen, I need to give the init block a name, set the schedule for when variable or variables will be refreshed then click Edit Data Source to define the connection pool the init block will use as well as the initialization string (query) the init block will use to populate the Repository Variable.

In the data source window, I am going to set my connection pool to one I have created just for my init blocks and then type in the following into the initialization string window:

select TRUNC(sysdate) from dual;

If I click Test, the query will execute and will return a result.

Notice how the result is the same as the query I ran using SQL Developer earlier.

Now I need to create a Repository Variable for the first day of every month. I am going to use the same method as before and name it USMoBeginDate. The query I am going to use is slightly different from the previous query. I still need to use the TRUNC function but I also need to apply formatting so that it truncates to the start of the month. I am going to enter the following into the initialization string window:

select TRUNC(sysdate, 'MM') from dual;

Some other useful queries I can use are:

First Day of the Current Year

select TRUNC(sysdate, 'YY') from dual;

Last Day of the Previous Year

select TRUNC(sysdate, 'YY') -1 from dual;

Previous Year Date

select TRUNC(ADD_MONTHS(sysdate, -12)) from dual;

Now I need to create a Repository Variable for the previous month to use with my # of Orders measure column. Upon inspection, I discover that the column I need to use is called Calendar Year Month and is a VARCHAR or character type. If I go into Answers and pull in the Calendar Year Month column, I can see the format is 'YYYYMM'

To create the Repository Variable, I am going to use the same method as with the current date and first day of the current month Repository Variables and issue a new query. Because the Calendar Year Month column is a VARCHAR, I need to use the to_char function to change sysdate from a date type to a character type, use some formatting syntax and use some basic arithmetic. The query is as follows:

select to_char(to_number(to_char(sysdate, 'YYYY')) * 100 + to_number(to_char(sysdate, 'MM') -1)) from dual;

To break down each part of this query, lets start with the year. In order to use the 'YYYY' format I must first cast sysdate to a character (to_char(sysdate, 'YYYY')). Then I need to cast that result back to and int so that I can multiply by 100. This will give me the result 201500.00. The reason for this is when I add the month number to my yearx100, there will always be a leading 0 for month numbers 1-9. To get the previous month number, I have to first cast sysdate to a character and use the formatting 'MM'. I then have to cast it back to an int and subtract 1 to get the previous month number (to_number(to_char(sysdate, 'MM') -1) then cast the entire statment back to a character type so that it matches the type for the Calendar Year Month column. When I run the query, I get this result.

Now that I have my three repository variables (USCurDate, USMoBeginDate and Prev_Month) I can start to create the report.

Im going to fast forward a little bit to the part of the report creation process where I will use my Repository Variables I created using the Admin Tool. Since I am using virtually the same report as part one of this blog series, please refer back for how to create custom groups using bins and presentation variables and custom value prompts.

Because of the delay in the data load for the # of Orders at the beginning of the month, I can not use a global report filter. Instead, I am going to have to use something called a Filter Expression within each measure column formula.

About Filter Expressions

Unlike global report filters, column formula level filter expressions are used when you need to specify a particular constraint within the column formula itself. Because the filter is at the column formula level, it is independent of any subsequent column filters.

Note: When using a column formula filter for a measure, you can not add a global filter of the same data subject on top of it. For example, if using a column level filter for a particular Year and Month, I can not add a global filter for a particular year. The two filters contradict each other and the result will be null.

To add a filter in the column formula, go to Edit formula, make sure the column syntax is highlighted and click Filter.

From here the Insert Filter window will pop up and I can select the attribute column to filter the measure by. Here, I want to use the column Day Date to filter Gross Rev $ by the day.

I can add a column by double clicking it in the the Subject Areas pane. When a column is added, I will be prompted with a New Filter window and from here, everything is exactly the same process as adding a global report filter.

Here I need to define the operator as is between since we are dealing with date ranges. I could call my Repository Variables for current_date and first day of the month here but, because the request is for a prompt to select between date ranges, I am going to have to call Presentation Variables and use the prompt to populate the actual values.

Note: If you are unsure about the functionality of Presentation Variables, see part one of this blog series

To add Presentation Variables to the filter expression, click Add More Options and select Presentation Variable from the dropdown.

When a Presentation Variable is added to the filter, two new text boxes appear. The Variable Expr box is where you define the variable to be used and the (default) box is used to add a default value. The default value is optional but, when defining a Presentation Variable within a filter, you have to specify a default value in order to get any results. The reason for this is because, when the report is run, the query issued will use the Presentation Variable placeholder that is defined unless a default value is specified. In other words, the default value will always be used unless the Presentation Variable is populated with a value or a list of values.

Because I want the users to be able to specify a date range, I need to define two Presentation Variables: one for the start date and one for the end date. I can add another place for a Presentation Variable by simply clicking Add More Options again and selecting Presentation Variable.

Now I need to add both my start and end date Presentation Variables in the Variable Expr boxes. I’m going to call my start date presentation variable pv_start_dt and my end date presentation variable pv_end_dt. I am also going to specify a default date range from the beginning of the current month (10/01/2015) to yesterday's date (10/15/2015).

If I click OK, I will be taken back to the Insert Filter screen where I can see the filter expression previously defined.

Clicking OK again will return me to Edit Column Formula which shows the column formula with the filter expression defined in the previous steps.

Now I have to do the exact same thing for the Net Rev $ column. Since the filter expression is identical, I can simply copy and paste the column formula for Gross Rev $ and replace the column name in the expression.

Now I need to take care of the # of Orders column. This column is tricky because of the gap between the 1st and the 2nd or 3rd of every month. I could use a filter expression that defaults to the previous month by using the previous month repository variable I created in a previous step, but this alone wouldn’t switch over when the data became available.

So how can we fulfill the requirement of the report if we don’t know the exact date in which the data will be available? This can be accomplished by using a CASE statement as shown previously in part one of this series. We can break the Case statement down into two parts or two conditions:

1. When the day for the current month is less than or equal to 2 OR if # of Orders is null, then filter # of Orders by Calendar Year Month using the value of the Prev_Month Repository Variable.

2. When condition one is not true, then filter # of Orders by Day Date between the values of the pv_start_date and the pv_end_date Presentation Variables

Putting both conditions together and using the correct syntax for Column Formula results in the following formula:

Note that I am using CURRENT_DATE in my column formula. In this case, I am extracting the day number from the current date by using the extract day function (DAY(CURRENT_DATE)). I am going to talk about this in further detail when I talk about using built in functions in Answers to make reports dynamic in part 3 of this series.

Now I need to create my dashboard prompt. I am going to start by clicking on New > Dashboard Prompt.

I need to create two prompts: One for the start date and one for the end date. Because I am using presentation variables as placeholders for the date between values, I have to use a Variable Prompt instead of a Column Prompt. Variable Prompts allow us to define a presentation variable and then define a list of values for the users to select from.

To create a Variable Prompt for Start Date, I can click on the new prompt icon and select Variable Prompt.

There a few things I need to do in order to make this prompt function for the report. First, I have to define the same presentation variable name (pv_start_dt) that I used in the filter expressions for the Gross Rev $, Net Rev $ and # of Orders columns.

Because this is not a column prompt, I have to manually specify the values I want the user to be able to select from. Rather than typing in each value, I can use the SQL Results option from the Choice List Values dropdown and use a SQL statement to select the exact values that I want.

This may seem daunting at first but there is a very straightforward way to accomplish this. Rather than manually writing out a SQL query, we can make use of the Advanced Tab within a new report.

I’m going to start by clicking New > Analysis and selecting the column that I want values for: Day Date.

I need to add a filter to Day Date so that it returns only the values I want to user to select from.

Now I need to select the operator to be is between and add two Repository Variables that I have set up: one for the first date of the current month and one for the current date of the current month.

If I go to results, I can see the data returned with the filter I have specified.

As you can see, the Day Date column only contains the values from the first of the month to the current date (October, 16th 2015 in this example)

Now for the good stuff. I can navigate to the Advanced Tab and copy the SQL statement used to generate these values and paste them into the SQL Results text box in my prompt.

You will notice that within the SQL Statement generated by OBI,
there are numbers and s_# between the SELECT and Day Date column, after the Day Date column and there is also an order by clause that uses a number “2”. Without going into too much detail, this what OBI uses to make the query more efficient when retrieving results from the database. In order to allow the values to populate the prompt, these have to be removed in OBIEE 12c and the “ORDER BY” clause has to be rewritten in order to make it work.


   0 s_0,
   "Sales - Fact Sales"."Periods"."Day Date" s_1
FROM "Sales - Fact Sales"
("Periods"."Day Date" BETWEEN VALUEOF("USMoBeginDate") AND  VALUEOF("USCurDate"))

Changed to this

   "Sales - Fact Sales"."Periods"."Day Date"
FROM "Sales - Fact Sales"
("Periods"."Day Date" BETWEEN  VALUEOF("USMoBeginDate") AND  VALUEOF("USCurDate"))
ORDER BY "Periods"."Day Date" ASC

This can be a bit confusing if you are not very familiar with SQL but just remember:

When populating a prompt using an SQL statement in OBIEE 12c, take out any number and anything that begins with “s” between the SELECT and first column and anything that begins with “s” after any subsequent columns and make sure the “ORDER BY” clause contains the actual column name of the column you want to order by.

Note: If you do not require any values to be in order, you can omit the “ORDER BY” clause all together.

If I expand Options in the Edit Prompt window, I can add a default selection or a default value that the prompt will start with. I can use the USMoBeginDate here as well so that the prompt always starts with the first date of every month as the start date.

Note: You will notice that under Options in the Edit Prompt window there is a Variable Data Type option with a dropdown selector. This can be used if the data type needs to be specified to something other than the default which is ‘text’ or character type. If you are getting an error when running the report that says “Selected value does not match datatype. Expected [this value] but got [this value]” you need to change the Variable Data Type to the datatype of the column you are prompting on. In this example, we are prompting a date datatype so therefore it needs to be set to date.

If I click OK, I can check the values in the display window by clicking the dropdown for the Start Date prompt I just created.

The blue checkmark indicates the value that is selected which, because the first date of every month was set by using the USMoBeginDate Repository Variable as the default value, defaults to the first date of the current month (October, 1st 2015) in this example.

Now I need to create another Variable Prompt for the End Date. The SQL statement used for Start Date can be reused for the values as we want the exact same values to be available for selection. I am going to specify the presentation variable to be named pvenddt, and the default value to be the USCurDate Repository Variable so that the End Date prompt always defaults to the current date.

Now all that’s left to do is put the prompt and report on the Dashboard. Here is the result.

So that concludes part 2 of Demystification of Variables. Please feel free to ask questions or leave me a comment! In part 3, I am going to talk about using built in front end functions and presentation variables to make reports dynamic for any series of time. Until next time.

Categories: BI & Warehousing

A Performance Analytics Application Case Study: Challenges and Successes

Mon, 2017-03-20 10:00

The Performance Analytics application is a collection of open source technologies that aids users in: identifying performance bottlenecks, identifying causes for slow report execution, discovering areas for performance optimization, and gathering meaningful insights into the health of an OBIEE environment.

This post focuses on lessons learned after a successful Performance Analytics application installation, where within one day of being operational it enabled us to identify and isolate a long-standing memory issue. Here's how.

OBIEE Performance Analytics Dashboards


Rittman Mead recently undertook an engagement with the remit to:

  • Carry out a health check on the current state of an OBIEE platform, architecture, and development process
  • Install the Rittman Mead Performance Analytics application, enabling rapid and accurate diagnostics of OBIEE issues

The client was on OBIEE 11g, having previously upgraded from 10g. OBIEE Production environment was a three-node cluster running the latest version of the 11g release. It served around 150 users daily, of a registered user base of around 1000.

The client had a long-standing issue with memory alerts on the master node of OBIEE cluster, but no evident architectural, hardware capacity, or software configuration issues were found after completing the health check.

Challenges and successes Gather all relevant data

Performance Analytics gathers data from a variety of sources in order to provide a full stack view of the OBIEE environment.

  • Active Session History (ASH) - The Active Session History data is read from the v$ACTIVE_SESSION_HISTORY system database view. Access to this data allows Performance Analytics users to have an understanding of the performance and state of the database at a given point it time as it provides information such as the SQL operation being performed, which application is executing the query, whether the query is actively being performed or is waiting for service, what state of execution the query is in, and many other useful statistics.

  • Operating System Metrics - Unix-based Operating Systems contain several commands used to gather information about the performance or status of the server such as vmstat, iostat, netstat, top and df. Performance Analytics utilizes the output of these commands to display the current status of the OS at a given point in time.

  • Usage Tracking - The Oracle BI Server supports the collection of usage tracking data. When usage tracking is enabled, the Oracle BI Server collects usage tracking data for each query, and it writes statistics to a usage tracking log file or inserts them directly into a database table. Access to this data allows Performance Analytics users to have an understanding of the performance of the BI Server and specific reports in the OBIEE environment at any given point in time.

  • OBIEE metrics - OBIEE has the capability to expose internal performance data through the Dynamic Monitoring Service (DMS). The data exposed contains information such as Connection Pool statistics, JVM statistics, the number of active threads, and much more. Access to this data allows Performance Analytics to record the current status of many of the OBIEE components and intricacies found within the tool.

Performance Analytics was deployed using Docker in a couple of days, requiring absolutely no software installation on any machine other than the monitoring server. All configuration settings are held in one file, and it was sufficient to add connection details of each server to it in order to gather all aforementioned data.

Accurately diagnose issues

By combining operating system metrics (CPU, memory, etc.) with internal OBIEE metrics and those from the database, Performance Analytics gives a "single pane of glass" view on the behaviour of the whole stack. This enables correlations in behaviour to be easily identified, and issues drilled into using the analysis capabilities of the tool.

Within a day of being installed, Performance Analytics enabled the client to accurately diagnose a long-standing issue with memory alerts on OBIEE master node. The cause was traced to the export to Excel of a large dataset by a single user.

Workload Planning

Performance Analytics allows to capture system status and workload over time, so you can see how the system is responding to peak loads in real-time.

With Performance Analytics the client is now able to estimate maximum workload the current architecture can support before starting to see issues and whether it is going to cope with the next years workload.


Performance Analytics just paid for itself.

Performance Analytics collects all relevant data and makes it accessible from one central location, allowing users to investigate performance inquiries promptly and simply. Instead of forcing users to dig through database records or a linux server manually, they can access all of the same data through a set of dashboards engineered to facilitate discovery from the collected data.

If you’d like to find out more about the Performance Analytics service offered by Rittman Mead, please get in touch.

Categories: BI & Warehousing

Property Graph in Oracle 12.2

Fri, 2017-03-10 11:00

The latest release of Oracle (12.2) includes support for Property Graph, previously available only as part of the Big Data Spatial and Graph tool. Unlike the latter, in which data is held in a NoSQL store (Oracle NoSQL, or Apache HBase), it is now possible to use the Oracle Database itself for holding graph definitions and analysing them.

Here we'll see this in action, using the same dataset as I've previously used - the "Panama Papers".

My starting point is the Oracle Developer Day VM, which at under 8GB is a tenth of the size of the beast that is the BigDataLite VM. BDL is great for exploring the vast Big Data ecosystem, both within and external to the Oracle world. However the Developer Day VM serves our needs perfectly here, having been recently updated for the 12.2 release of Oracle. You can also use DB 12.2 in Oracle Cloud, as well as the Docker image.

Prepare Database for Property Graph

The steps below are based on Zhe Wu's blog "Graph Database Says Hello from the Cloud (Part III)", modified slightly for the differing SIDs etc on Developer Day VM.

First, set the Oracle environment by running from a bash prompt

. oraenv

When prompted for SID enter orcl12c:

[oracle@vbgeneric ~]$ . oraenv
ORACLE_SID = [oracle] ? orcl12c  
ORACLE_BASE environment variable is not being set since this  
information is not available for the current user ID oracle.  
You can set ORACLE_BASE manually if it is required.  
Resetting ORACLE_BASE to its previous value or ORACLE_HOME  
The Oracle base has been set to /u01/app/oracle/product/12.2/db_1  
[oracle@vbgeneric ~]$

Now launch SQL*Plus:

sqlplus sys/oracle@localhost:1521/orcl12c as sysdba  

and from the SQL*Plus prompt create a tablespace in which the Property Graph data will be stored:

alter session set container=orcl;

create bigfile tablespace pgts  
datafile '?/dbs/pgts.dat' size 512M reuse autoextend on next 512M maxsize 10G  
segment space management auto;  

Now you need to do a bit of work to update the database to hold larger string sizes, following the following steps.

In SQL*Plus:

ALTER SYSTEM SET max_string_size=extended SCOPE=SPFILE;  
shutdown immediate;  
startup upgrade;  

Then from the bash shell:

cd $ORACLE_HOME/rdbms/admin  
mkdir /u01/utl32k_cdb_pdbs_output  
mkdir /u01/utlrp_cdb_pdbs_output  
$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS -d $ORACLE_HOME/rdbms/admin -l '/u01/utl32k_cdb_pdbs_output' -b utl32k_cdb_pdbs_output utl32k.sql

When prompted, enter SYS password (oracle)

After a short time you should get output:

catcon.pl: completed successfully  

Now back into SQL*Plus:

sqlplus sys/oracle@localhost:1521/orcl12c as sysdba  

and restart the database instances:

shutdown immediate;  

Run a second script from the bash shell:

$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl -u SYS -d $ORACLE_HOME/rdbms/admin -l '/u01/utlrp_cdb_pdbs_output' -b utlrp_cdb_pdbs_output utlrp.sql

Again, enter SYS password (oracle) when prompted. This step then takes a while (c.15 minutes) to run, so be patient. Eventually it should finish and you'll see:

catcon.pl: completed successfully  

Now to validate that the change has worked. Fire up SQL*Plus:

sqlplus sys/oracle@localhost:1521/orcl12c as sysdba  

And check the value for max_string, which should be EXTENDED:

alter session set container=orcl;  
SQL> show parameters max_string;

NAME                                 TYPE        VALUE  
------------------------------------ ----------- ------------------------------
max_string_size                      string      EXTENDED  
Load Property Graph data from Oracle Flat File format

Now we can get going with our Property Graph. We're going to use Gremlin, a groovy-based interpretter, for interacting with PG. As of Oracle 12.2, it ships with the product itself. Launch it from bash:

cd $ORACLE_HOME/md/property_graph/dal/groovy  
sh gremlin-opg-rdbms.sh

Mar 08, 2017 8:52:22 AM java.util.prefs.FileSystemPreferences$1 run  
INFO: Created user preferences directory.  

First off, let's create the Property Graph object in Oracle itself. Under the covers, this will set up the necessary database objects that will store the data.

cfg = GraphConfigBuilder.\  
opg = OraclePropertyGraph.getInstance(cfg);  

You can also do this with the PL/SQL command exec opg_apis.create_pg('panama', 4, 8, 'PGTS');. Either way, the effect is the same; a set of tables created in the owner's schema:

SQL> select table_name from user_tables;  

Now let's load the data. I'm using the Oracle Flat File format here, having converted it from the original CSV format using R. For more details of why and how, see my article here.

From the Gremlin prompt, run:

// opg.clearRepository();     // start from scratch
opgdl.loadData(opg, vfile, efile, 1, 10000, true, null);  

This will take a few minutes. Once it's completed you'll get null response, but can verify the data has successfully loaded using the opg.Count* functions:

opg-oracledb> opgdl.loadData(opg, vfile, efile, 1, 10000, true, null);  
opg-oracledb> opg.countEdges()  
opg-oracledb> opg.countVertices()  

We can inspect the data in Oracle itself too. Here I'm using SQLcl, which is available by default on the Developer Day VM. Using the ...VT$ table we can query the number of distinct properties the nodes (verticies) in the graph:

SQL> select distinct k from panamaVT$;  
Entity incorporation.date  
Entity company.type  
Entity note  
Officer icij.id  
Entity status  
Source ID  
Country Codes  
Entity struck.off.date  
Entity address  
Entity jurisdiction  
Entity jurisdiction.description  
Entity dorm.date

17 rows selected.  

Inspect the edges:

[oracle@vbgeneric ~]$ sql scott/oracle@localhost:1521/orcl

SQL> select p.* from PANAMAGE$ p where rownum<5;

       EID       SVID       DVID EL               K       T V      VN VT     SL VTS  VTE  FE
---------- ---------- ---------- ---------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
         6          6     205862 officer_of
        11         11     228601 officer_of
        30         36     216748 officer_of
        34         39     216487 officer_of


You can also natively execute some of the Property Graph algorithms from PL/SQL itself. Here is how to run the PageRank algorithm, which can be used to identify the most significant nodes in a graph, assigning them each a score (the "page rank" value):

set serveroutput on  
    wt_pr  varchar2(2000); -- name of the table to hold PR value of the current iteration
    wt_npr varchar2(2000); -- name of the table to hold PR value for the next iteration
    wt3    varchar2(2000);
    wt4    varchar2(2000);
    wt5    varchar2(2000);
    n_vertices number;
    wt_pr := 'panamaPR';
    opg_apis.pr_prep('panamaGE$', wt_pr, wt_npr, wt3, wt4, null);
    dbms_output.put_line('Working table names  ' || wt_pr
       || ', wt_npr ' || wt_npr || ', wt3 ' || wt3 || ', wt4 '|| wt4);
    opg_apis.pr('panamaGE$', 0.85, 10, 0.01, 4, wt_pr, wt_npr, wt3, wt4, 'SYSAUX', null, n_vertices)

When run this creates a new table with the PageRank score for each vertex in the graph, which can then be queried as any other table:

SQL> select * from panamaPR  
  2  order by PR desc
  3* fetch first 5 rows only;
      NODE         PR          C
---------- ---------- ----------
    236724 8851.73652          0
    288469 904.227685          0
    264051 667.422717          0
    285729 562.561604          0
    237076 499.739316          0

On its own, this is not so much use; but joined to the vertices table, we can now find out, within our graph, the top ranked vertices:

SQL> select pr.pr, v.k,v.V from panamaPR pr inner join PANAMAVT$ V on pr.NODE = v.vid where v.K = 'Name' order by PR desc fetch first 5 rows only;  
        PR K          V
---------- ---------- ---------------
8851.73652 Name       Portcullis TrustNet Chambers P.O. Box 3444 Road Town- Tortola British Virgin Isl  
904.227685 Name       Unitrust Corporate Services Ltd. John Humphries House- Room 304 4-10 Stockwell Stre  
667.422717 Name       Company Kit Limited Unit A- 6/F Shun On Comm Bldg. 112-114 Des Voeux Road C.- Hong  
562.561604 Name       Sealight Incorporations Limited Room 1201- Connaught Commercial Building 185 Wanc  
499.739316 Name       David Chong & Co. Office B1- 7/F. Loyong Court 212-220 Lockhart Road Wanchai Hong K


Since our vertices in this graph have properties, including "Type", we can also analyse it by that - the following shows the top ranked vertices that are Officers:

SQL> select V.vid, pr.pr from panamaPR pr inner join PANAMAVT$ V on pr.NODE = v.vid where v.K = 'Type' and v.V = 'Officer' order by PR desc fetch first 5 rows only;  
       VID         PR
---------- ----------
  12171184 1.99938104
  12030645 1.56722346
  12169701 1.55754873
  12143648 1.46977361
  12220783 1.39846834

which we can then put in a subquery to show the details for these nodes:

with OfficerPR as  
        (select V.vid, pr.pr
          from panamaPR pr
               inner join PANAMAVT$ V
               on pr.NODE = v.vid
         where v.K = 'Type' and v.V = 'Officer'
      order by PR desc
      fetch first 5 rows only)
select pr2.pr,v2.k,v2.v  
from OfficerPR pr2  
     inner join panamaVT$ v2
     on pr2.vid = v2.vid
where v2.k in ('Name','Countries');  
        PR K          V
---------- ---------- -----------------------
1.99938104 Countries  Guernsey  
1.99938104 Name       Cannon Asset Management Limited re G006  
1.56722346 Countries  Gibraltar  
1.55754873 Countries  Guernsey  
1.55754873 Name       Cannon Asset Management Limited re J006  
1.46977361 Countries  Portugal  
1.39846834 Countries  Cyprus  

10 rows selected.  

But here we get into the limitations of SQL - already this is starting to look like a bit of a complex query to maintain. This is where PGQL comes in, as it enables to express the above request much more eloquently. The key thing with PGQL is that it understands the concept of a 'node', which removes the need for the convoluted sub-select that I had to do above to first identify the top-ranked nodes that had a given property (Type = Officer), and then for those identified nodes show information about them (Name and Countries). The above SQL could be expressed in PGQL simply as:

SELECT n.pr, n.name, n.countries  
WHERE (n WITH Type =~ 'Officer')  
ORDER BY n.pr limit 5  

At the moment Property Graph in the Oracle DB doesn't support PGQL - but I'd expect to see it in the future.

Jupyter Notebooks

As well as working with the Property Graph in SQL and Gremlin, we can use the Python API. This is shipped with Oracle 12.2. I'd strongly recommend using it through a Notebook, and this provides an excellent environment in which to prototype code and explore the results. Here I'll use Jupyter, but Apache Zeppelin is also very good.

First let's install Anaconda Python, which includes Jupyter Notebooks:

wget https://repo.continuum.io/archive/Anaconda2-4.3.0-Linux-x86_64.sh  
bash Anaconda2-4.3.0-Linux-x86_64.sh  

In the install options I use the default path (/home/oracle) as the location, and keep the default (no)

Launch Jupyter, telling it to listen on any NIC (not just localhost). If you installed anaconda in a different path from the default you'll need to amend the /home/oracle/ bit of the path.

/home/oracle/anaconda2/bin/jupyter notebook --ip

If you ran the above command from the terminal window within the VM, you'll get Firefox pop up with the following:

If you're using the VM headless you'll now want to fire up your own web browser and go to http://<ip>:8888 use the token given in the startup log of Jupyter to login.

Either way, you should now have a functioning Jupyter notebook environment.

Now let's install the Property Graph support into the Python & Jupyter environment. First, make sure you've got the right Python set, by confirming with which it's the anaconda version you installed, and when you run python you see Anaconda in the version details:

[oracle@vbgeneric ~]$ export PATH=/home/oracle/anaconda2/bin:$PATH
[oracle@vbgeneric ~]$ which python
[oracle@vbgeneric ~]$ python -V
Python 2.7.13 :: Anaconda 4.3.0 (64-bit)  
[oracle@vbgeneric ~]$

Then run the following

cd $ORACLE_HOME/md/property_graph/pyopg  
touch README  
python ./setup.py install  

without the README being created, the install fails with IOError: [Errno 2] No such file or directory: './README'

You need to be connected to the internet for this as it downloads dependencies as needed. After a few screenfuls of warnings that appear OK to ignore, the installation should be succesful:

creating /u01/userhome/oracle/anaconda2/lib/python2.7/site-packages/JPype1-0.6.2-py2.7-linux-x86_64.egg  
Extracting JPype1-0.6.2-py2.7-linux-x86_64.egg to /u01/userhome/oracle/anaconda2/lib/python2.7/site-packages  
Adding JPype1 0.6.2 to easy-install.pth file

Installed /u01/userhome/oracle/anaconda2/lib/python2.7/site-packages/JPype1-0.6.2-py2.7-linux-x86_64.egg  
Finished processing dependencies for pyopg==1.0  

Now you can use the Python interface to property graph (pyopg) from within Jupyter, as seen below. I've put the notebook on gist.github.com meaning that you can download it from there and run it yourself in Jupyter.

Categories: BI & Warehousing

Accelerating Your ODI Implementation, Rittman Mead Style

Thu, 2017-03-09 10:01

Over the years, at Rittman Mead, we've built up quite a collection of tooling for ODI. We have accelerators, scripts, reports, templates and even entire frameworks at our disposal for when the right use case arises. Many of these tools exploit the power of the excellent ODI SDK to automate tasks that would otherwise be a chore to perform manually. Tasks like, topology creation, model automation, code migration and variable creation.

In this blog post, I'm going to give you a demo of our latest addition, a tool that allows you to programmatically create ODI mappings. ( and a few other tricks )

So you may be thinking isn't that already possible using the ODI SDK ? and you'd be right, it most definitely is. There are many examples out there that show you how it's done, but they all have one thing in common, they create a fairly simple mapping, with, relatively speaking, quite a lot of code and are only useful for creating the said mapping.

And herein lies the obvious question, Why would you create a mapping using the ODI SDK, when it's quicker to use ODI Studio ?

And the obvious answer is...you wouldn't, unless, you were trying to automate the creation of multiple mappings using metadata.

This is a reasonable approach using the raw ODI SDK, the typical use case being the automation of your source to stage mappings. These mappings tend to be simple 1 to 1 mappings, the low hanging fruit of automation if you like. The problem arises though, when you want to automate the creation of a variety of more complex mappings, you run the risk of spending more time writing the automation code, than you would actually save due to the automation itself. The point of diminishing return can creep up pretty quickly.

The principle, however, is sound. Automate as much as possible by leveraging metadata and free up your ODI Developers to tackle the more complex stuff.

All Aboard the Rittman Mead Metadata Train !

What would be really nice is something more succinct, more elegant, something that allows us to create any mapping, with minimal code and fuss.

Something that will allow us to further accelerate...

  • Migrating to ODI from other ETL products
  • Greenfield ODI Projects
  • Day to Day ODI Development work

..all powered by juicy metadata.

These were the design goals for our latest tool. To meet these goals, we created a mini-mapping-language on top of the ODI SDK. This mini-mapping-language abstracts away the SDK's complexities, while, at the same time, retaining its inherent power. We call this mini mapping language OdiDsl ( Oracle Data Integrator Domain Specific Language ) catchy heh?!


OdiDsl is written in Groovy and looks something like this...

 * OdiDsl to create a SCD2 dimension load mapping.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('DEPT_JOBS', ['JOBS'], [join_condition: "EMPLOYEES.JOB_ID = JOBS.JOB_ID" ])
            .connect("D_EMPLOYEE", [
                                        [ attr: "employee_id", key_indicator: true ],
                                        [ attr: "eff_from_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "eff_to_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "current_flag", expression: 1, execute_on_hint: "TARGET"],
                                        [ attr: "surr_key", expression: ":RM_PROJECT.D_EMPLOYEE_SEQ_NEXTVAL", execute_on_hint: "TARGET"],

The above code will create the following, fully functional, mapping in ODI 12c (sorry 11g).

It should be fairly easy to eyeball the code and reconcile it with the above mapping image. We can see that we are specifying our datastores, selecting the EMPLOYEES datastore, adding a filter, a couple of joins and then connecting to our target. OdiDsl has been designed in such a way that it mimics the flow based style of ODI 12c's mappings by chaining components onto one another.

Creating a Mapping Using OdiDsl

Let's walk through the above code, starting with just the datastores, adding the rest as we go along...


We start by creating the mapping with mapping.create( <project>, <folder>, <mapping name>). We then chain the .datastores(), .commit() and .validate() methods onto it using the "dot" notation. The .datastores() method is the only method you can chain directly onto mapping.create() as it's a requirement to add some datastores before you start building up the mapping. The .commit() method persists the mapping in the repository and the .validate() method calls ODI's validation routine on the mapping to check if all is ok.

 * OdiDsl to create a mapping with 4 datastores.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],

When we execute this code it returns the following to the console. You can see that the mapping has been dropped/created and that ODI has some validation warnings for us.

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  WARNING: Mapping component EMPLOYEES has no input or output connections.
  WARNING: Mapping component DEPARTMENTS has no input or output connections.
  WARNING: Mapping component JOBS has no input or output connections.
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.

And here is the mapping in ODI - well, it's a start at least...

Starting the Flow with a Filter

Before we can start building up the rest of the mapping we need to select a starting datastore to chain off, you've got to start somewhere right? For that, we call .select("EMPLOYEES"), which is a bit like clicking and selecting the component in ODI Studio. The .filter() method is then chained onto it, passing in the filter name and some configuration, in this case, the actual filter condition.

 * OdiDsl to create a mapping with 4 datastores and a filter.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ]) 

We now have an error in the validation results. This is expected as our filter doesn't connect to anything downstream yet.

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  WARNING: Mapping component DEPARTMENTS has no input or output connections.
  WARNING: Mapping component JOBS has no input or output connections.
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.
  ERROR: Mapping component NAME_FILTER must have a connection for output connector point OUTPUT1.

And here's the mapping, as you can see the filter is connected to the EMPLOYEES datastore output connector.

Adding a Join

Next we'll create the join between the filter and the DEPARTMENTS table. To do this we can just chain a .join() onto the .filter() method and pass in some arguments to specify the join name, what it joins to and the join condition itself.

 * OdiDsl to create a mapping with 4 datastores a filter and a join.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])

Only 2 validation warnings and no errors this time...

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  WARNING: Mapping component JOBS has no input or output connections.
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.

We now have a join named EMP_DEPT joining DEPARTMENTS and the filter, NAME_FILTER, together.

Adding another Join

We'll now do the same for the final join.

 * OdiDsl to create a mapping with 4 datastores, a filter and 2 joins.


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('DEPT_JOBS', ['JOBS'], [join_condition: "EMPLOYEES.JOB_ID = JOBS.JOB_ID" ])      

looking better all the time...

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created

  Validation Results
  WARNING: Mapping component D_EMPLOYEE has no input or output connections.

And we now have a join named DEPT_JOBS joining JOBS and the join, EMP_DEPT, to each other.

Connecting to the target

The final step is to connect the DEPT_JOBS join to our target datastore, D_EMPLOYEE. For this we can use the .connect() method. This method is used to map upstream attributes to a datastore. When you perform this action in ODI Studio, you'll be prompted with the attribute matching dialog, with options to auto-map the attributes.

OdiDsl will, by default, auto-map all attributes that are not explicitly mapped in the .connect() method. In our completed code example below we are explicitly mapping several attributes to support SCD2 functionality, auto-map takes care of the rest.

 * OdiDsl to create a SCD2 dimension load mapping


                [name: "HR.EMPLOYEES"],
                [name: "HR.DEPARTMENTS"],
                [name: "HR.JOBS"],
                [name: "PERF.D_EMPLOYEE", integration_type: "SCD"],
            .filter('NAME_FILTER', [filter_condition: "EMPLOYEES.FIRST_NAME LIKE 'D%'" ])
            .join('DEPT_JOBS', ['JOBS'], [join_condition: "EMPLOYEES.JOB_ID = JOBS.JOB_ID" ])
            .connect("D_EMPLOYEE", [
                                        [ attr: "employee_id", key_indicator: true ],
                                        [ attr: "eff_from_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "eff_to_date", expression: "sysdate", execute_on_hint: "TARGET"],
                                        [ attr: "current_flag", expression: 1, execute_on_hint: "TARGET"],
                                        [ attr: "surr_key", expression: ":RM_PROJECT.D_EMPLOYEE_SEQ_NEXTVAL", execute_on_hint: "TARGET"],

Nice, all validated this time.

Connecting to the repository...

mapping EMPLOYEE_DIM_LOAD dropped  
mapping EMPLOYEE_DIM_LOAD created  
Validation Successful  

What about Updates ?

Yes. We can also update an existing mapping using mapping.update(<project>, <folder>, <mapping name>). This is useful when you need to make changes to multiple mappings or when you can't drop and recreate a mapping due to a dependency. The approach is the same, we start by selecting a component with .select() and then chain a method onto it, in this case, .config().

mapping.update('MYPROJECT', 'DEMO', "EMPLOYEE_DIM_LOAD")  
            .config([join_type: "LEFT_OUTER"])

Which Properties Can I Change for each Component ?

Probably more than you'll ever need to, OdiDsl mirrors the properties that are available in ODI Studio via the SDK.

Can We Generate OdiDsl Code From an Existing Mapping ?

Yes, we can do that too, with .reverse(). This will allow you to mirror the process.

Let's take this, hand built, fictional and completely CRAZY_MAPPING as an example. (fictional and crazy in the sense that it does nothing useful, however, the flow and configuration are completely valid).

If we execute .reverse() on this mapping by calling...

mapping.reverse('MY_PROJECT', 'DEMO_FOLDER', 'CRAZY_MAPPING')  

...OdiDsl will return the following output to the console. What you are seeing here is the OdiDsl required to recreate the crazy mapping above.

Connecting to the repository...

mapping.create('MY_PROJECT', 'DEMO_FOLDER', 'CRAZY_MAPPING')  
         ['name':'STAGING.TABLE1', 'alias':'TABLE1'],
         ['name':'STAGING.TABLE9', 'alias':'TABLE9'],
         ['name':'STAGING.TABLE3', 'alias':'TABLE3'],
         ['name':'STAGING.TABLE4', 'alias':'TABLE4'],
         ['name':'STAGING.TABLE6', 'alias':'TABLE6'],
         ['name':'STAGING.TABLE5', 'alias':'TABLE5'],
         ['name':'STAGING.TABLE7', 'alias':'TABLE7'],
         ['name':'STAGING.TABLE2', 'alias':'TABLE2'],
         ['name':'STAGING.TABLE8', 'alias':'TABLE8'],
         ['name':'STAGING.TABLE11', 'alias':'TABLE11'],
         ['name':'STAGING.TABLE12', 'alias':'TABLE12'],
         ['name':'STAGING.TABLE13', 'alias':'TABLE13'],
         ['name':'STAGING.TABLE15', 'alias':'TABLE15'],
         ['name':'STAGING.TABLE14', 'alias':'TABLE14'],
         ['name':'STAGING.TABLE16', 'alias':'TABLE16'],
         ['name':'STAGING.TABLE17', 'alias':'TABLE17'],
         ['name':'STAGING.TABLE42', 'alias':'TABLE42'],
        .join('JOIN2', ['TABLE7'], [join_condition: "TABLE5.ID = TABLE7.ID" ])
        .join('JOIN3', ['TABLE6'], [join_condition: "TABLE6.ID = TABLE7.ID" ])
        .connect('TABLE14', [
                [ attr: "ID", expression: "TABLE5.ID" ],
                [ attr: "COL1", expression: "TABLE7.COL1" ],
                [ attr: "COL2", expression: "TABLE6.COL2" ],
                [ attr: "COL3", expression: "TABLE7.COL3" ],
                [ attr: "COL4", expression: "TABLE7.COL4" ],
        .expr('EXPRESSION1', [attrs: [
                [ attr: "ID", expression: "TABLE6.ID * 42", datatype: "NUMERIC", size: "38", scale: "0"]]])
        .connect('TABLE15', [
                [ attr: "ID", expression: "EXPRESSION1.ID" ],
                [ attr: "COL1", expression: "", active_indicator: false ],
                [ attr: "COL2", expression: "TABLE6.COL2" ],
                [ attr: "COL3", expression: "TABLE7.COL3" ],
                [ attr: "COL4", expression: "", active_indicator: false ],
        .join('JOIN', ['TABLE14'], [join_condition: "TABLE14.ID = TABLE15.ID" ])
        .filter('FILTER2', [filter_condition: "TABLE15.COL3 != 'FOOBAR'" ])
        .connect('TABLE16', [
                [ attr: "ID", expression: "TABLE15.ID" ],
                [ attr: "COL1", expression: "TABLE15.COL1" ],
                [ attr: "COL2", expression: "TABLE14.COL2" ],
                [ attr: "COL3", expression: "TABLE14.COL3" ],
                [ attr: "COL4", expression: "TABLE14.COL4" ],
        .connect('TABLE17', [
                [ attr: "ID", expression: "TABLE15.ID" ],
                [ attr: "COL1", expression: "TABLE15.COL1" ],
                [ attr: "COL2", expression: "TABLE14.COL2" ],
                [ attr: "COL3", expression: "TABLE14.COL3" ],
                [ attr: "COL4", expression: "TABLE14.COL4" ],
        .sort('SORT1', [sorter_condition: "TABLE5.ID, TABLE5.COL2, TABLE5.COL4" ])
        .connect('TABLE13', [
                [ attr: "ID", expression: "TABLE5.ID" ],
                [ attr: "COL1", expression: "TABLE5.COL1" ],
                [ attr: "COL2", expression: "TABLE5.COL2" ],
                [ attr: "COL3", expression: "TABLE5.COL3" ],
                [ attr: "COL4", expression: "TABLE5.COL4" ],
        .filter('FILTER1', [filter_condition: "TABLE3.ID != 42" ])
        .filter('FILTER', [filter_condition: "TABLE4.COL1 = 42" ])
        .lookup('LOOKUP1', 'FILTER1', [join_condition: "TABLE4.ID = TABLE3.ID AND TABLE3.COL1 = TABLE4.COL1"])
        .join('JOIN5', ['TABLE13'], [join_condition: "TABLE13.ID = TABLE3.ID" ])
        .distinct('DISTINCT_', [attrs: [
                [ attr: "COL3_1", expression: "TABLE4.COL3", datatype: "VARCHAR", size: "30"],
                [ attr: "COL4_1", expression: "TABLE4.COL4", datatype: "VARCHAR", size: "30"]]])
        .join('JOIN4', ['EXPRESSION1'], [join_condition: "TABLE5.ID = TABLE6.COL1" ])
        .sort('SORT', [sorter_condition: "EXPRESSION1.ID" ])
        .connect('TABLE8', [
                [ attr: "ID", expression: "EXPRESSION1.ID" ],
                [ attr: "COL1", expression: "", active_indicator: false ],
                [ attr: "COL2", expression: "", active_indicator: false ],
                [ attr: "COL3", expression: "TABLE7.COL3" ],
                [ attr: "COL4", expression: "", active_indicator: false ],
        .connect('TABLE12', [
                [ attr: "ID", expression: "TABLE8.ID" ],
                [ attr: "COL1", expression: "TABLE8.COL1" ],
                [ attr: "COL2", expression: "TABLE8.COL2" ],
                [ attr: "COL3", expression: "TABLE8.COL3" ],
                [ attr: "COL4", expression: "TABLE8.COL4" ],
        .expr('EXPRESSION', [attrs: [
                [ attr: "ID", expression: "TABLE9.ID *42", datatype: "NUMERIC", size: "38", scale: "0"],
                [ attr: "COL4", expression: "TABLE9.COL4 || 'FOOBAR'", datatype: "VARCHAR", size: "30"]]])
        .connect('TABLE1', [
                [ attr: "ID", expression: "EXPRESSION.ID" ],
                [ attr: "COL1", expression: "", active_indicator: false ],
                [ attr: "COL2", expression: "", active_indicator: false ],
                [ attr: "COL3", expression: "", active_indicator: false ],
                [ attr: "COL4", expression: "TABLE9.COL4" ],
        .join('JOIN1', ['TABLE2'], [join_condition: "TABLE1.ID = TABLE2.ID" ])
        .aggregate('AGGREGATE', [attrs: [
                [ attr: "ID", expression: "TABLE1.ID", datatype: "NUMERIC", size: "38", scale: "0", group_by: "YES"],
                [ attr: "COL4_1", expression: "MAX(TABLE2.COL4)", datatype: "VARCHAR", size: "30", group_by: "AUTO"]]])
        .lookup('LOOKUP', 'DISTINCT_', [join_condition: "AGGREGATE.ID = DISTINCT_.COL3_1"])
        .aggregate('AGGREGATE1', [attrs: [
                [ attr: "ID", expression: "AGGREGATE.ID", datatype: "NUMERIC", size: "38", scale: "0", group_by: "YES"],
                [ attr: "COL4_1_1", expression: "SUM(AGGREGATE.COL4_1)", datatype: "VARCHAR", size: "30", group_by: "AUTO"]]])
        .filter('FILTER3', [filter_condition: "AGGREGATE1.COL4_1_1 > 42" ])
        .connect('TABLE42', [
                [ attr: "ID", expression: "AGGREGATE1.ID" ],
        .join('JOIN6', ['TABLE8'], [join_condition: "AGGREGATE1.ID = TABLE8.ID" ])
        .connect('TABLE11', [
                [ attr: "ID", expression: "TABLE8.ID" ],
                [ attr: "COL1", expression: "" ],
                [ attr: "COL2", expression: "" ],
                [ attr: "COL3", expression: "TABLE8.COL3" ],
                [ attr: "COL4", expression: "TABLE8.COL4" ],

When we execute this OdiDsl code we get, you guessed it, exactly the same crazy mapping with the flow and component properties all intact.

Being able to flip between ODI studio and OdiDsl has some really nice benefits for those who like the workflow. You can start prototyping a mapping in ODI Studio, convert it to code, hack around for a bit and then reflect it all back into ODI. It's also very useful for generating a "code template" from an existing mapping. The generated code template can be modified to accept variables instead of hard coded properties, all you need then is some metadata.

Did Somebody Say Metadata ?

Metadata is the key to bulk automation. You can find metadata in all kinds of places. If you are migrating to ODI from another product then there will be a whole mass of metadata living in your current product's repository or via some kind of export routine which typically produces XML files. If you are starting a fresh ODI implementation, then there will be metadata living in your source and target systems, in data dictionaries, in excel sheets, in mapping specifications documents, all over the place really. This is the kind of metadata that can be used to feed OdiDsl.

A Quick Example of One possible Approach to Using OdiDsl With Metadata

First we build a mapping in Odi Studio, this will act as our template mapping.

We then generate the equivalent OdiDsl code using mapping.reverse('MY_PROJECT', 'DEMO_FOLDER', 'FEED_ME_METADATA'). Which gives us this code.

         ['name':'STAGING.TABLE1', 'alias':'LOOKUP_TABLE'],
         ['name':'STAGING.TABLE2', 'alias':'TABLE2'],
         ['name':'STAGING.TABLE3', 'alias':'TABLE3'],
         ['name':'STAGING.TABLE4', 'alias':'TARGET_TABLE'],
        .lookup('LOOKUP', 'LOOKUP_TABLE', [join_condition: "TABLE2.ID = LOOKUP_TABLE.ID"])
        .join('JOIN', ['TABLE3'], [join_condition: "TABLE2.ID = TABLE3.ID" ])
        .filter('FILTER', [filter_condition: "TABLE3.COL1 = 'FILTER'" ])
        .expr('EXPRESSION', [attrs: [
                [ attr: "CONSTANT", expression: "42", datatype: "VARCHAR", size: "30"]]])
        .connect('TARGET_TABLE', [
                [ attr: "ID", expression: "LOOKUP_TABLE.ID" ],
                [ attr: "COL1", expression: "LOOKUP_TABLE.COL1 || EXPRESSION.CONSTANT" ],
                [ attr: "COL2", expression: "TABLE2.COL2" ],
                [ attr: "COL3", expression: "TABLE3.COL3" ],
                [ attr: "COL4", expression: "LOOKUP_TABLE.COL4" ],

We now need to decorate this code with some variables, these variables will act as place holders for our metadata. The metadata we are going to use is from a database table, I'm keeping it simple for the purpose of this demonstration but the approach is the same. Our metadata table has 10 rows and from these 10 rows we are going to create 10 mappings, replacing certain properties with the values from the columns.

Remember that OdiDsl is expressed in Groovy. That means, as well as OdiDsl code, we also have access to the entire Groovy language. In the following code we are using a mixture of Groovy and OdiDsl. We are connecting to a database, grabbing our metadata and then looping over mapping.create(), once for each row in our metadata table. The columns in the metadata table are represented as the variables row.datastore, row.constant_expr and row.filter_cond. The code comments indicate where we are substituting these variables in place of our previously hard coded property values.

import groovy.sql.Sql

// Connect to the database and retrieve rows from the METADATA table.
def sqlConn = Sql.newInstance("jdbc:oracle:thin:@hostname:1521/pdborcl", "username", "password", "oracle.jdbc.pool.OracleDataSource")  
def rows = sqlConn.rows("SELECT * FROM METADATA")  

// For each row in our METADATA table
rows.eachWithIndex() {  row, index ->

    mapping.create('MY_PROJECT', 'DEMO_FOLDER', "FEED_ME_METADATA_${index+1}") // Interpolate row number to make the mapping name unique
                    ['name': 'STAGING.TABLE1', 'alias': 'LOOKUP_TABLE'],
                    ['name': "STAGING.${row.datastore}" ], // substitute in a different datastore
                    ['name': 'STAGING.TABLE3', 'alias': 'TABLE3'],
                    ['name': 'STAGING.TABLE4', 'alias': 'TARGET_TABLE'],
                .lookup('LOOKUP', 'LOOKUP_TABLE', [join_condition: "${row.datastore}.ID = LOOKUP_TABLE.ID"]) // substitute in a different datastore
                .join('JOIN', ['TABLE3'], [join_condition: "${row.datastore}.ID = TABLE3.ID"]) // substitute in a different datastore
                .filter('FILTER', [filter_condition: "TABLE3.COL1 = '${row.filter_cond}'"]) // substitute in a different filter condition
                .expr('EXPRESSION', [attrs: [
                    [attr: "CONSTANT", expression: row.constant_expr, datatype: "VARCHAR", size: "30"]]]) // substitute in a different constant for the expression
                .connect('TARGET_TABLE', [
                    [attr: "ID", expression: "LOOKUP_TABLE.ID"],
                    [attr: "COL1", expression: "LOOKUP_TABLE.COL1 || EXPRESSION.CONSTANT"],
                    [attr: "COL2", expression: "${row.datastore}.COL2"], // substitute in a different datastore
                    [attr: "COL3", expression: "TABLE3.COL3"],
                    [attr: "COL4", expression: "LOOKUP_TABLE.COL4"],


Here is the output...

Connecting to the repository...

mapping FEED_ME_METADATA_1 created  
Validation Successful  
mapping FEED_ME_METADATA_2 created  
Validation Successful  
mapping FEED_ME_METADATA_3 created  
Validation Successful  
mapping FEED_ME_METADATA_4 created  
Validation Successful  
mapping FEED_ME_METADATA_5 created  
Validation Successful  
mapping FEED_ME_METADATA_6 created  
Validation Successful  
mapping FEED_ME_METADATA_7 created  
Validation Successful  
mapping FEED_ME_METADATA_8 created  
Validation Successful  
mapping FEED_ME_METADATA_9 created  
Validation Successful  
mapping FEED_ME_METADATA_10 created  
Validation Successful  

And here are our 10 mappings, each with it's own configuration.

If we take a look at the FEED_ME_METADATA_5 mapping, we can see the metadata has been reflected into the mapping.

And that's about it. We've basically just built a mini accelerator using OdiDsl and we hardly had to write any code. The OdiDsl code was generated for us using .reverse(). All we really had to code, was the connection to the database, a loop and bit of variable substitution!


With the Rittman Mead ODI Tool kit, accelerating your ODI implementation has never be easier. If you are thinking about migrating to ODI from another product or embarking on a new ODI Project, Rittman Mead can help. For more information please get in touch.

Categories: BI & Warehousing

Create Your Own DVD Plugin in 22 minutes

Wed, 2017-03-08 06:45

Oracle DVD played well for the task of elections data analysis. But I faced some limitations and had to curtail my use of one of my favourite charts - Scatter chart. While DVD’s implementation of it looks good, has smooth animations and really useful, it is not suitable for visualisations of huge datasets. Election dataset has about 100K election precinct data rows and if I try to plot all of them at one chart I will get something close to this. It doesn't really important what is the data, just compare how charts look like.

Picture with Turnout%, United Russia% by PEC

This picture can give us about zero insights. Actually, I'm not even sure if shows my 100K points. What I want to get is something like this:
plugin with Turnout%, United Russia% by PEC (source: https://habrahabr.ru/post/313372/)

What can I see at this chart and can't see at the vanilla one? First of all, it gives a much better understanding of points distribution other the plot. I can see a relatively dense core around coordinates [30;40] and then not so dense tail growing by both dimensions and again a small and very dense core in the top right corner.
Secondly, it not only shows dense and sparse areas but shows that there are exist a few points which have an unexplainably high concentration. Around [64;62], for example.

Luckily DVD allows me (and anyone else too) create custom plugins. That's what I'm going to show here. The plugin I'm going to do is something I'd call Minimum Viable Product. It's very, very simple and needs a lot of work to make it good and really useful.

The code I'm going to show is the simplest I can invent. It doesn't handle errors, exceptions and so on. It should be more modular. It possibly could use some optimisation. But that was done for a purpose. And the purpose is called 'simplicity'. If you want a more comprehensive sample, there is a link to the Oracle's guide in the Summary. I want to show you that writing a DVD plugin is not a rocket science. It takes less than half an hour to build your own DVD plugin.

And the clock starts now!

[0:00-5:40] Setup Environment and DVD SDK Setup

Download and install DVD. I presume you already have it, because from my experience people who doesn't have it very rarely feel a need for its plugins. Hence I didn't include download and installation times into 22 minutes.

Create a folder to store your plugins and some Gradle-related configs and scripts. It may be any directory you like, just select your favourite place.

Next step is to define some environment variables.

  • The first one is DVDDESKTOP_SDK_HOME and it should point to thу DVD folder. In most of the cases, it is C:\Program files\Oracle Data Visualisation Desktop.
  • The second variable is PLUGIN_DEV_DIR it points to the folder you created 40 seconds ago. These two variables seem not to be absolutely necessary, but they make your life a bit easier.
  • And one more change to the environment before going further. Add %DVDESKTOP_SDK_HOME%\tools\bin to the PATH variable (do you feel how your life became easier with DVDDESKTOP_SDK_HOME already defined so you can start using it right now?).

And the last step is to initialise our development environment. Go to the plugins directory and execute the initialization command. Open cmd window and execute:


This will not only initialise everything your need but create a sample plugin to explore and research.

Let's take a look at what we got. In the folder we have created before (%PLUGIN_DEV_DIR%) this command generated the following structure.

a screenshot of %PLUGIN_DEV_DIR%

All names here are more than self-explanatory. You won't need to do anything with .gradle and gradle, and src stores all the code we will develop. Please notice that src already has sample plugin sampleviz. You may use it as a referrence for your own work.

Create First Plugin

Every plugin consists of a few parts. We don't need to create them from scratch every time. Oracle gives us a simple way of creating a new plugin. If you didn't close cmd window from previous steps execute the following:

bicreateplugin viz -id com.rittmanmead.demo -subType dataviz  

Obviously, you should use your own company name for the id. The last word (demo) plus word Plugin is what user will see as a visualisation name. Obviously we can change it but not today.

This will create a fully operational plugin. Where can we find it? You shouldn't be too surprised if I say "In src folder".

a screenshot of %PLUGIN_DEV_DIR%

And we may test it right now. Yes, its functionality is very limited, but you didn't expect it to do something useful, right?

Start SDK

New we should start Oracle DVD in SDK mode. The main difference from the normal mode is that DVD will start in the default browser. That's not a typo. DVD will start as a usual site in a browser. It's a good idea to select as the default a browser with good developer console and tools. I prefer Chrome for this task but that's not the only choice. Choose the browser you'd normally use for site development. When you finish with setting up of the default browser do:

.\gradlew run

If you don't have an ultrafast computer I'd recommend you to make some tea at this point. Because the very first start will take a long time. Actually, it will consume most of the first stage time.

[5:40-6:15] Test the Plugin

We didn't start changing the first plugin yet, so I expect everything to be OK at this point. But anyway it's a good idea to test everything. I think we can afford 30-40 seconds for this.

And again I assume you know how to use DVD and don't need a step-by-step guide on creating a simple visualisation. Just create a new project, select any data source, go to Visualisations, find your plugin and add it. You may even add some data to the plugin and it will show how many rows your data has. I think that's a good place to start from.

a picture of plugin

[6:15-22:00] Add Functions Change parameters

If you didn't skip testing phase and played with this new toy for a while you may have noticed two things we need to change right now. First, for a chosen type of visualisation, I need that my plugin can accept two measures. You possibly noticed that I was unable to add Number of cancelled ballots as a second measure. By default a plugin accepts not more than one (zero or one). Luckily it can be changed very easily.

We can find all parameter about measures, rows, columns etc inside of extensions folder. In this case it is %PLUGIN_DEV_DIR%\customviz\com-rittmanemad-demo\extensions. Here we can find two folders. The first one is oracle.bi.tech.pluginVisualizationDataHandlerand it has only one file com.rittmanemad.demo.visualizationDataHandler.JSON. This file allows us to define which types of input data our plugin has (measures, columns, rows, colour etc.), what are their types (measures/dimensions), which is the default one and so on.

a picture of json and plugin side-by side Here we should change maxCount to 2. This will allow us to add more than one measure to our visualisation.

The second thing I want to change is the number of data points. Elections dataset has data about 100K precinct commissions. And DVD's default is 10K. We can change this value in the second configuration file from extensions folder. Its name is oracle.bi.tech.Visualization. There is again only one JSON file com.rittmanmead.demo.json. We need to change rows to 200000.

Why 200000 if I need only 100000 points? Well, it's about how data will be passed to our code. Every measure is a separate row, so I need 100K points with 2 measures each and that gives me 200000 rows. It looks like right now that's the only way to have more than one measure in a plugin (at least it's my best knowledge for DVD

a picture of json and plugin side-by side

Note. I could change here some things like plugin name, input data types shown to user and so on but my aim is simplicity.

Now we should restart DVD SDK in order to use new parameters.

two measures and ~200K rows

Write code

For my custom scatter chart I'm going to use d3js JavaScript library. It will allow me to concentrate more on logic and less on HTML manipulation. To add it to my plugin I should add a couple of strings in the beginning of my code.


                 messages) {

After (I added two lines starting with d3):

                 messages) {

That's all. Now I can use d3js magic. And that's cool.

OK, it's time to make our plugin do something more useful than a simple counting of rows. All plugin code is in demo.js file and the main procedure is render. It is called every time DVD needs to update the visualisation. The initial version of this function is really small. Without comments there are only four rows. It retrieves data, counts rows, then finds a container to write and writes a message.

Demo.prototype.render = function(oTransientRenderingContext) {  
      // Note: all events will be received after initialize and start complete.  We may get other events
      // such as 'resize' before the render, i.e. this might not be the first event.

      // Retrieve the data object for this visualization
  var oDataLayout = oTransientRenderingContext.get(dataviz.DataContextProperty.DATA_LAYOUT);

      // Determine the number of records available for rendering on ROW
      // Because we specified that Category should be placed on ROW in the data model handler,
      // this returns the number of rows for the data in Category.
  var nRows = oDataLayout.getEdgeExtent(datamodelshapes.Physical.ROW);

      // Retrieve the root container for our visualization.  This is provided by the framework.  It may not be deleted
      // but may be used to render.
  var elContainer = this.getContainerElem();
  $(elContainer).html(messages.TEXT_MESSAGE.format("Demo Plugin", "" + nRows));

First of all, I need to know the actual size of the plotting area. I simply added that after var elContainer = this.getContainerElem();:

  var elContainer = this.getContainerElem();

  //container dimensions
  var nWidth = $(elContainer).width();
  var nHeight = $(elContainer).height();

The template plugin has code for pulling the data into the script (oDataLayout variable). But for my scatter plot, I need to put this data into an array and find the minimum and the maximum value for both arrays. This part of the code is my least favourite one. It looks like currently we can't make tho separate measures in a custom viz (or at least I can't find the solution), therefore instead of two nice separate measures, I have them as different rows. Like:

......... PEC #1Turnout,%34,8% PEC #1United Russia,%44,3% PEC #2Turnout,%62,1% PEC #2United Russia,%54,2% PEC #3Turnout,%25,9% PEC #3United Russia,%33,2% .........

I really hope that any solution for this will be found. So far I just put even and odd rows into X ad Y coordinates. At the same time, I'm determine minimum and maximum values for both axes.

  //temporary measure
  var tmp_measure;

  //current measure
  var cur_measure;

  var vMinimax=[Number.MAX_VALUE, Number.MIN_VALUE];
  var hMinimax=[Number.MAX_VALUE, Number.MIN_VALUE];

  for(var i=0;i<nRows;i++){

      cur_measure=Number(oDataLayout.getValue(datamodelshapes.Physical.DATA, i, 0, false));
      vMinimax[0]=Math.min(vMinimax[0], cur_measure);
      vMinimax[1]=Math.max(vMinimax[1], cur_measure);
      tmp_measure=Number(oDataLayout.getValue(datamodelshapes.Physical.DATA, i, 0, false));
      hMinimax[0]=Math.min(hMinimax[0], tmp_measure);
      hMinimax[1]=Math.max(hMinimax[1], tmp_measure);

The next part of the code is about handling multiple calls. I should handle things like changing measures or dimensions. I simply delete the old chart and create a new empty one.

  var oSVG;

  //Delete old chart if exists

  //Create new SVG with id=DemoViz

  //Set plot area size
       .attr("width", nWidth)
       .attr("height", nHeight)

Now I have an array with data. I have a prepared SVG plot for drawing. And I have no reason not to combine all this into a chart.

  //Compute scaling factors
  var hStep=(nWidth-40)/(hMinimax[1]-hMinimax[0]);
  var vStep=(nHeight-40)/(vMinimax[1]-vMinimax[0]);

      .attr("x", function(d) {return 20+d[1]*hStep;})
      .attr("y", function(d) {return nHeight-(20+d[0]*vStep);})
      .attr('fill-opacity', 0.3);

What I should mention here is opacity. If its value is too high (almost no opacity) the chart looks like this:


If I make dots more opaque, the chart looks (and works) better. This is how it looks like with fill-opacity=0.3.
0.99 Look at the black point in the middle. It's there because a lot of commissions has exactly the same values for both measures. That's how we could find Saratov region anomaly.

[22:00-] Summary and TODOs

This shows how we may enhance Oracle DVD with our own visualisation. Yes, it's really, really, really simple. And it needs a lot of work in order to make it more useful. First of all, we should make it more reliable. This realisation doesn't test input parameters for correctness. And we also need to add interactivity. The chart should be zoomable and selectable. It should work as a filter. It should react to resize. It needs visible axes. And so on. But we can create it in less than a half of hour and I think as a first step it works well. I hope you don't afraid of making your own plugins now.

If you need more complex sample, take a look at the Oracle's tutorial. It covers some of these TODOs.

Categories: BI & Warehousing

A Case for Essbase and Oracle Data Visualization

Mon, 2017-03-06 12:03

So it’s the end of the month, or maybe even the end of the quarter. And you’ve found yourself faced, yet again, with the task of pulling whatever data will be needed to produce the usual standardized budgetary and / or finance reports. You’re dreading this as it will probably eat up most of your day just getting the data you need, combing through it to find the metrics you need, and plugging them into your monumentally complicated custom spreadsheet, only to find the numbers are, well, off.

Enter Data Visualization (DV), and its lightweight brother application, Data Visualization Desktop (DVD). My bud at RM, Matt Walding, already did a pretty great post on some of the cursory features of DVD, covering a lot of the important how-tos and what’s whats. So check that out if you need a bit of a walkthrough. Both of these great tools tout that you can go from zero to analysis pretty darn quick, and from the extensive testing and prodding we’ve done with both DV and DVD, this claim is accurate. Now how does this help us, however, in the previous scenario? Well, IT processes being what they are in a lot of mid to large size companies, getting the data we need, to do the crunching we need to do, can be quite the monumental task, let alone the correct data. So when we get it, we are going to want a solution that can take us from zero to report, pretty darn quick.

If you’re the one stuck with doing the crunching, and then providing the subsequent results, your solution or workflow probably resembles one of the following:

Scenario 1

Emailed a spreadsheet with a ton of rows. Download the csv/xlsx and then crunch the rows into something that you can force into a super spreadsheet that has a ton of moving pieces just waiting to throw an error.

Scenario 2

You have access to Essbase, which stays pretty fresh, especially as reporting time draws nigh. You connect to Smartview and extract what you need for your report. See scenario 1.

Scenario 3

You have OBIEE that you depend on for data dumps, and then just export whatever you need. See scenario 1.

While there are no doubt variations on these themes, the bigger picture here is that between the time you receive your dataset and the final report, there are likely a few iterations of said final report. Maybe you’re having to make corrections to your Excel templates, perhaps the numbers on your sheet just aren’t jiving. Whatever the case may be, this part of the process is often the one that can be the most demanding of your time, not to mention the most headache inducing. So what’s the point of my schpeal? Well, wouldn’t it be nice to expedite this part of the whole thing? Let’s take a look at how we can do just that with both Data Visualization in OBIEE and Data Visualization Desktop.

Data Visualization

With DV, we can simply access any of our existing OBIEE subject areas to quickly create a basic pivot table. Right away you can see the profound time savings garnered by using DV. What's more, you don't need to feel forced into managing OBIEE on premises, as DV is also part of BICS (Business Intelligence Cloud Service) and DVCS (Data Visualization Cloud Service).

Even if you have Smartview, and can do more or less the same thing, what if you wanted to delegate some of the tedium involved in manually crunching all those rows? You could simply hand off an export to another analyst, and have them plug it right into their own instance of Data Visualization Desktop, which, might I add, comes with your purchase of the DV license. This also, however, leads down the slippery slope into siloing off your department. This approach is essentially doing that, however kept under the quarantine that is DVD, as this blog is touting, and keeping the data with which you are working consistent, you shouldn't be able to do too much damage. The point I'm trying to make is that everything about using DV and DVD as your sort of report crafting and proofing mediums, is super-fluid and smooth. The process from source system to report and over and over again, is super-seamless. Even if I didn't have direct access to the data source I needed, and had to rely on emailed data dumps or other, I can simply upload that sheet right into DV, assign some data types, and get to work. I can even add dynamic filters to the analysis by simply dragging and dropping a column to the filters area. If you're feeling adventurous, you could also display these tables on a dashboard, that perhaps your department looks at to proof them and share in the pleasurable experience that is concocting period-end reporting.

Data Visualization Desktop

Right now, DVD is only out for Windows (with a version for the Mac on the roadmap), which is mostly ok, as most every medium to large size company I have worked with employs Windows as their go-to OS. An analyst can install the program on their desktop machine and be ready to plug away in under 10 minutes. We can take the example spreadsheet above that we dumped out of VA and create our own version of the report right in DVD. One better, we can also blend it with any other source DVD can connect to. This feature, especially, can save lots of time when trying to get your numbers just right for sign off.

Looks Just Like DV

Acts Just Like DV

And hold the phone! There's even a native Essbase connector!

Actually One Better Than DV!

Speaking of connectors, check out the rest of the list, as well as the custom data flow functionality, which allows you to construct and save in-app data transformations to be invoked again, and again, and again...


Flexibility is the name of the game with DV and DVD. So sure, while it isn't the ideal tool for creating precisely formatted financial statements for SEC submission, it sure beats massaging and munging all that data in Excel. And once we're happy with our numbers, either by sharing our reports on a dashboard, or export, we can go ahead and plug our numbers right into whatever tool we are going to use to produce the final product. We've covered a couple of really good concepts, so let's just do a bit of a wrap up to make everything a bit more cohesive.

  • We can pull data into Data Visualization from OBIEE, from a spreadsheet, or even mash up the two to handle any inconsistencies that may exist at month's/quarter's end. Note that we can also use Smartview to pull directly to Excel. Note that you will be unable to use the OLAP capabilities of Essbase / Essbase in OBIEE, as the data will presented in DV and DVD as flattened hierarchies.

  • We can share reports in Data Visualization with other analysts or approvers via a dashboard (note that this is not a supported feature, but something that requires only a 'bit' of a hack) or PDF, or hand off our exported data set for further work. This can be to another person using DV or DVD.

  • Data Visualization Desktop can connect directly to our Essbase source or utilize a data dump from DV / OBIEE to do further work and analysis on the data set. We can also connect DVD to OBIEE, as posted here, in order to extract an analysis from the web catalog. This will at least save us the step of dumping / emailing the report.

  • All of this app-to-app compatability encourages a sustainable, functional, and fluid reporting environment (warning: data silos!), especially for those who are unfamiliar with the ins and outs of Essbase and / or working with OBIEE.

Categories: BI & Warehousing