BI & Warehousing

Overview of the new Cloudera Data Science Workbench

Rittman Mead Consulting - Fri, 2017-06-02 09:07

Recently, Cloudera released a new product called the Cloudera Data Science Workbench (CDSW).

Being a Cloudera Partner, we at Rittman Mead are always excited when something new comes along.

The CDSW is positioned as a collaborative platform for data scientists/engineers and analysts, enabling larger teams to work in a self-service manner through a web browser. This browser application is effectively an IDE for R, Python and Scala - all your favorite toys!

The CDSW is deployed onto edge nodes of your CDH cluster, providing easy access to your HDFS data and to the Spark2 and Impala engines. This means that team members can immediately start working on their projects, accessing full datasets and sharing analysis and results. A CDSW project can include reusable code, snippets, libraries and so on, helping your teams to collaborate. Oh, and these projects can be linked with GitHub repos to help keep version history.

The workbench is used to fire up user sessions with R, Python or Scala inside dedicated Docker engines. These engines can be customised, or extended, like any other Docker image to include all your favourite R packages and Python libraries. Using HDFS, Hive, Spark2 or Impala, the workload can then be distributed over the CDH cluster using your preferred methods, without having to configure anything. The engine runs only for as long as the analysis does. Any logs or output files need to be saved in the project folder, which is mounted inside the engine and stored on the CDSW master node. The master node is a gateway node to the CDH cluster, and you can scale out to many worker nodes to distribute the Docker engines.
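
To make that concrete, here is a minimal sketch of what a session in the workbench might look like, assuming Spark2 is enabled on the cluster; the HDFS path and column names below are made up for illustration.

# Minimal PySpark sketch of a CDSW session (hypothetical dataset path and columns).
from pyspark.sql import SparkSession

# Inside the workbench the session picks up the cluster configuration for us;
# there is nothing to configure by hand.
spark = (SparkSession.builder
         .appName("cdsw-example")
         .getOrCreate())

# Read a full dataset straight from HDFS and let the cluster do the heavy lifting.
orders = spark.read.parquet("/data/sales/orders")
summary = (orders.groupBy("region")
                 .count()
                 .orderBy("count", ascending=False))
summary.show(10)

spark.stop()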

(C) Cloudera.com

Under the hood, Kubernetes schedules the user workload across the worker nodes and provides CPU and memory isolation.

So far I find the IDE to be a bit too simple and lacking features compared to, for example, RStudio Server. But the ease of use, and the fact that everything is automatically configured, make the CDSW an absolute must for any Cloudera customer with data science teams. Also, I'm convinced that future releases will add loads of cool functionality.

It took me about two days to build a new cluster on AWS and install the Cloudera Data Science Workbench, which is an indication of how easy it is to get up and running. By the way, it also runs in the cloud (IaaS) ;)

Want to know more or see a live demo? Contact us at info@rittmanmead.com

Categories: BI & Warehousing

First Steps with Oracle Analytics Cloud

Rittman Mead Consulting - Thu, 2017-06-01 07:43
Preface

Not long ago Oracle added a new offering to their Cloud: OBIEE in the Cloud with full access. Francesco Tisiot made an overview of it and now it's time to go a bit deeper and see how you can poke it with a sharp stick by yourself. In this blog, I'll show how to get your own OAC instance as quickly and easily as possible.

Before you start

The very first step is to register a cloud account. Oracle gives a trial which allows testing of all features. I won't show it here as it is a more or less standard registration process. I just want to highlight a few things:

  • You will need to verify your phone number by receiving an SMS. It seems that this mechanism may be a bit overloaded and I had to make more than one attempt. I pressed the Request code button and nothing happened. I waited and pressed it again, and again, and eventually I got the code. I can't say for sure, and possibly it was just my bad luck, but if you face the same problem just keep pushing (though not too much; requesting a code every second won't help you).
  • Even for the trial you'll be asked for credit card details. I haven't found good diagnostics on how much has already been spent, and the documentation is not really helpful here.
Architecture

OAC instances are not self-contained and require some additional services. The absolute minimum configuration is the following:

  • Oracle Cloud Storage (OCS) - is used for backups, log files, etc.
  • Oracle Cloud Database Instance (DBC) - is used for RCU schemas.
  • Oracle Analytics Cloud Instance (OAC) - is our ultimate target.

From the Cloud services point of view, architecture is the following. This picture doesn't show virtual disks mounted to instances. These disks consume Cloud Storage quota but they aren't created separately as services.

Architecture

We need at least one Oracle Database Cloud instance to store the RCU schemas. This database may or may not have a separate Cloud Storage area for backups. Every OAC instance requires a Cloud Storage area for logs. Multiple OAC instances may share one Cloud Storage area, but I can't find any advantage of this approach over a separate area for every instance.

Create Resources

We create these resources in the order they are listed above: start with Storage, then the DB, and the last one is OAC. Actually, we don't have to create Cloud Storage containers separately as they may be created automatically, but I show it here to make things clearer without too much "it works by itself" magic.

Create Cloud Storage

The easiest part of all is the Oracle Cloud Storage container. We don't need to specify its size or lots of parameters. All parameters are just a name, storage class (Standard/Archive) and encryption.

20-create_ocs.gif

I spent some time here trying to figure out how to reference this storage later. There is a hint saying that "Use the format: <storage service>-<identity domain>/<container>. For example: mystorage1-myid999/mybackupcontainer." And if identity domain and container are pretty obvious, storage service puzzled me for some time. The answer is "storage service=Storage". You can see this in the top of the page.

30-OCS_naming.png

It seems that Storage is a fixed keyword, rurittmanm is the domain name created during the registration process and demo is the actual container name. So in this sample when I need to reference my demo OCS I should write Storage-rurittmanm/demo.

Create Cloud DB

Now that we are somewhat experienced in Oracle Cloud we may move to a more complicated task and create a Cloud DB instance. It is harder than a Cloud Storage container, but not by much. If you have ever created an on-premise database using DBCA, a cloud DB should be a piece of cake for you.

At the first step, we set the name of the instance and select the most general options. These options are:

  • Service Level. Specifies how this instance will be managed. Options are:

    • Oracle Database Cloud Service: Oracle Database software pre-installed on Oracle Cloud Virtual Machine. Database instances are created for you using configuration options provided in this wizard. Additional cloud tooling is available for backup, recovery and patching.
    • Oracle Database Cloud Service - Virtual Image: Oracle Database software pre-installed on an Oracle Cloud Virtual Machine. Database instances are created by you manually or using DBCA. No additional cloud tooling is available.
  • Metering Frequency - defines how this instance will be billed: by month or by hour.

  • Software Release - if the Service Level is Oracle Database Cloud Service, we may choose 11.2, 12.1 or 12.2; for Virtual Image only 11.2 and 12.1 are available. Note that even the cloud does no magic, and with DB 12.2 you may expect the same problems as on-premise.

  • Software Edition - Values are:

    • Standard Edition
    • Enterprise Edition
    • Enterprise Edition - High Performance
    • Enterprise Edition - Extreme Performance
  • Database Type - defines High Availability and Disaster Recovery options:

    • Single Instance
    • Database Clustering with RAC
    • Single Instance with Data Guard Standby
    • Database Clustering with RAC and Data Guard Standby

Database Clustering with RAC and Database Clustering with RAC and Data Guard Standby types are available only for the Enterprise Edition - Extreme Performance edition.

40-create_obdc-1.gif

The second step is also quite intuitive. It has a lot of options but they should be pretty simple and well-known for anyone working with Oracle Database.

60-create-odbc-dc.png

The first block of parameters is about basic database configuration. Parameters like DB name (sid) or Administration Password are obvious.

Usable DataFile Storage (GB) is less obvious. Actually, in the beginning it puzzled me completely. In this sample, I ask for 25 GB of space. But this doesn't mean that my instance will take 25 GB of my disk quota; in fact, this particular instance took 150 GB of disk space. Here we specify only the guaranteed user data space, but an instance also needs space for the OS, the DB software, temp, swap, and so on.

65-db-disk.png

A trial account is limited to a 500 GB quota, which means that we can create only 3 Oracle DB Cloud instances at most: every instance uses around 125 GB of, let's say, "technical" disk space that we can't reduce, so at roughly 150 GB per instance a fourth one would push us over the quota. From a practical point of view, it means that it may be preferable to have one "big" instance (in terms of disk space) rather than multiple "small" ones.

  • Compute shape specifies how powerful our VM should be. Options are the following:
    • OC3 - 1.0 OCPU, 7.5 GB RAM
    • OC4 - 2.0 OCPU, 15.0 GB RAM
    • OC5 - 4.0 OCPU, 30.0 GB RAM
    • OC6 - 8.0 OCPU, 60.0 GB RAM
    • OC7 - 16.0 OCPU, 120.0 GB RAM
    • OC1m - 1.0 OCPU, 15.0 GB RAM
    • OC2m - 2.0 OCPU, 30.0 GB RAM
    • OC3m - 4.0 OCPU, 60.0 GB RAM
    • OC4m - 8.0 OCPU, 120.0 GB RAM
    • OC5m - 16.0 OCPU, 240.0 GB RAM

We may increase or decrease this value later.

  • SSH Public Key - Oracle gives us the ability to connect directly to the instance, and authentication is made by a user + private key pair. Here we specify a public key which will be added to the instance; obviously, we should have the private key for this public one. The options are either to provide a key we generated ourselves or to let Oracle create the keys for us. The least obvious thing here is the username for SSH. You can't change it and it isn't shown anywhere in the interface (at least I haven't found it), but you can find it in the documentation: it is opc.

The second block of parameters is about backup and restore. The meaning of these options is obvious, but exact values aren't (at least in the beginning).

70-create-odbc-brc.png

  • Cloud Storage Container - that's the Cloud Storage container I described earlier. The value for this field will be something like Storage-rurittmanm/demo. In fact, I don't have to create this container in advance: it's possible to specify a non-existent container here (still in the form Storage-<domain>/<name>) and tick the Create Cloud Storage Container check-box, which will create a new container for us.

  • Username and Password are credentials of a user who can access this container.

The last block is Advanced settings and I believe it's quite simple and obvious. Most of the time we don't need to change anything in this block.

80-create-odbc-ac.png

When we fill all parameters and press the Next button we get a Summary screen and the actual process starts. It takes about 25-30 minutes to finish.

When I first started my experiments I was constantly getting a message saying that no sites were available and my request could not be completed.

It is possible that it was again the same "luck" as with the phone number verification, but the problem resolved itself a few hours later.

Create OAC Instance

At last, we have all we need for our very first OAC instance. The process of an OAC instance setup is almost the same as for an Oracle DB Cloud Instance. We start the process, define some parameters and wait for the result.

At the first step, we give a name to our instance, provide an SSH public key, and select an edition for our instance. We have two options here, Enterprise Edition or Standard Edition, and later we will select additional options. Standard Edition allows us to specify either Data Visualisation or Essbase instances, while Enterprise Edition adds classical Business Intelligence to this list. The rest of the parameters here are exactly the same as for a Database instance.

90-oacs-1st-step.png

At the second step, we have four blocks of parameters.

100-oacs-2nd-step.png

  • Service Administrator - the most obvious one. Here we specify an administrator user. This user will be a system administrator.

  • Database - select a database for RCU schemas. That's why we needed a database.

  • Options - specify which options our instance will have.

    • Self-Service Data Visualisation, Preparation and Smart Discovery - this option means Oracle Data Visualisation and it is available for both Standard and Enterprise Editions.
    • Enterprise Data Models - this option gives us classical BI and is available only for Enterprise Edition. This option may also be combined with the first one, giving us both classical BI and modern data discovery on one instance.
    • Collaborative Data Collection, Scenarios and What-if Analysis - this one stands for Essbase and is available for both Standard and Enterprise Editions. It can't be combined with the other options.
  • Size is the same thing that is called Compute Shape for the Database. Options are exactly the same.
  • Usable Storage Size on Disk GB also has the same meaning as for the DB. The minimum size we may specify here is 25 GB, which gives us a total of 170 GB of used disk space.

Here is a picture showing all possible combinations of services:

110-oacs-editions.png

And here is the virtual disk configuration; the data disk is the one we specify.
130-oacs-storage.png

The last block, Cloud Storage Configuration, was the hardest one, especially the first field, Cloud Storage Base URL. The documentation says "Use the format: https://example.storage.oraclecloud.com/v1" and nothing more. When you know the answer it may be easy, but when I saw it for the first time it was hard. Should I place here any unique URL, just like an identifier? Should it end with v1? And what is the value for the second instance, v2? Maybe I should place here the URL of my current datacenter (https://dbcs.emea.oraclecloud.com). The answer is https://<domain>.storage.oraclecloud.com/v1; in my case it is https://rurittmanm.storage.oraclecloud.com/v1. It stays the same for all instances.
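
To make the naming a bit more concrete, here is a tiny sketch that just builds the two strings the wizards keep asking for, using the identity domain and container from my earlier example; nothing here calls the cloud, it only assembles the values.

# Build the Cloud Storage references used throughout this post (no API calls).
identity_domain = "rurittmanm"   # created during registration
container = "demo"               # the container created earlier

# Value for the "Cloud Storage Container" fields
container_ref = "Storage-{0}/{1}".format(identity_domain, container)

# Value for the "Cloud Storage Base URL" field - the same for every instance
base_url = "https://{0}.storage.oraclecloud.com/v1".format(identity_domain)

print(container_ref)   # Storage-rurittmanm/demo
print(base_url)        # https://rurittmanm.storage.oraclecloud.com/v1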

All other parameters are the same as they were for the DBCS instance. We either specify an existing Cloud Storage container or create one here.

120-oacs-cloud-storage.png

The rest of the process is obvious. We get a Summary and then wait. It takes about 40 minutes to create a new instance.

Note: diagnostics here are a bit poor, and when it says that the instance start process has completed it may not be true. Sometimes it makes sense to wait a while before starting to panic.

Now we may access our instance as usual. The only difference is that the port is 80, not 9502 (or 443 for SSL). For Data Visualisation the link is http(s)://<ip address>/va, for BIEE it is http(s)://<ip address>/analytics and for Essbase http(s)://<ip address>/essbase. Enterprise Manager and the WebLogic Server Console are available on port 7001, which is blocked by default.
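
If you like to script such checks, here is a rough Python sketch that simply pings those three URLs over plain HTTP; the IP address is a placeholder, and any HTTP response (even a redirect to the login page) is enough to show the service is listening.

# Quick reachability check of the OAC front-end URLs (placeholder IP address).
import requests

host = "129.0.0.1"  # replace with the public IP of your OAC instance
paths = {"Data Visualisation": "/va",
         "BIEE": "/analytics",
         "Essbase": "/essbase"}

for name, path in paths.items():
    url = "http://{0}{1}".format(host, path)
    try:
        r = requests.get(url, timeout=10, allow_redirects=False)
        print("{0}: {1} -> HTTP {2}".format(name, url, r.status_code))
    except requests.RequestException as e:
        print("{0}: {1} -> {2}".format(name, url, e))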

The bad news is that HTTPS uses a self-signed certificate. Depending on browser settings it may give an error or even prevent access over HTTPS.

The options here are either to use HTTP rather than HTTPS or to add this certificate to your local computer, but these aren't options for a production server. Luckily, Oracle provides a way to use your own SSL certificates.

Typical Management Tasks

SSH to Instances

During the setup process, we provide Oracle with a public key which is used to get SSH access to the instances. The cloud does nothing special here: in the case of Windows, we may use PuTTY. Just add the private key to Pageant and connect to the instance using the user opc.
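
If you prefer scripting the connection to clicking through PuTTY, here is a rough sketch using the paramiko package; the host name and key path are placeholders, and the only non-negotiable part is the opc username.

# Scripted SSH to a cloud instance (placeholder host and key path).
import paramiko

host = "129.0.0.1"                  # public IP of the DBCS or OAC instance
key_file = "/path/to/private_key"   # the key matching the public key supplied at provisioning

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# The username is always opc - it cannot be changed in the wizard.
client.connect(hostname=host, username="opc", key_filename=key_file)

stdin, stdout, stderr = client.exec_command("hostname; uptime")
print(stdout.read().decode())
client.close()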

140-pageant.png

150-putty.gi

Opening Ports

By default only the absolute minimum of ports is open, so we can't connect to the OAC instance using the BI Administration Tool or to the DB with SQL Developer. In order to do this, we should create an access rule which allows access to these particular ports.

In order to get to the Access Rules interface, we must use instance menu and select the Access Rules option.

150-access-menu.png

This will open the Access Rules list. What I don't like about it is that it opens the full list of all rules but we can create only a rule for this particular instance.

160-access-rules-list.png

The new rule creation form is simple and should cause no issues, but be careful here not to open too much to the wild Internet.

170-new-rule.png
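
Once a rule is enabled, a quick way to confirm it worked is to test the port from your workstation. Here is a small sketch; the IP is a placeholder and the ports (1521 for the database listener, 9502 for classic OBIEE) are just examples, so adjust them to whatever rule you opened.

# Check whether the ports opened by an access rule are reachable (placeholder host).
import socket

host = "129.0.0.1"
for port in (1521, 9502):
    try:
        with socket.create_connection((host, port), timeout=5):
            print("port {0} is open".format(port))
    except OSError:
        print("port {0} is still blocked or unreachable".format(port))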

Add More Users

The user who registered a Cloud Account becomes its administrator and can invite more users and manage privileges.

180-access-users.png

Here we can add and modify users.

190-users.png

When we add a user we specify a name, email and login. We also set roles for the user here. The user will get an email with these details and a link to register.

Obviously, the user won't be asked for a credit card. They just start working and that's all.

Summary

My first steps with Oracle Analytics Cloud were not very easy, but I think it was worth it. Now I can create a new OBIEE instance just in a few minutes and one hour later it will be up and running. And I think that's pretty fast compared to a normal process of creating a new server in a typical organisation. We don't need to think about OS installation, or licenses, or whatever else. Just try it.

Categories: BI & Warehousing

ORA-12154 in Data Guard environment

Amardeep Sidhu - Wed, 2017-05-31 10:54

Hit this silly issue in one of the Data Guard environments today. The primary is a 2-node RAC running 11.2.0.4 and the standby is also a 2-node RAC. Archive logs from node2 aren't shipping and the error being reported is

ORA-12154: TNS:could not resolve the connect identifier specified

We tried the usual things like going to $TNS_ADMIN, checking the entry in tnsnames.ora and also trying to connect using sqlplus sys@target as sysdba. Everything seemed to be good, but logs were not shipping and the same problem was being reported repeatedly. As everything on node1 was working fine, it looked even more weird.

From the error it is clear that the issue is with the tnsnames entry. We finally found the issue after some 30 minutes. It was an Oracle EBS environment, so TNS_ADMIN was set to the standard $ORACLE_HOME/network/admin/*hostname* path (on both nodes). On node1 there was no tnsnames.ora file in $ORACLE_HOME/network/admin, so it was connecting to the standby using the Apps tnsnames.ora, which had the correct entry for the standby. On node2 there was a file called tnsnames.ora in $ORACLE_HOME/network/admin, but it did not have any entry for the standby. It was trying to connect using that file (the default TNS path) and failing with ORA-12154. Once we removed that file, it started using the Apps tnsnames.ora and logs started shipping.
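
For anyone hitting a similar issue, here is a rough sketch of the manual check we did, expressed in Python: look in both candidate locations for a tnsnames.ora and report whether the standby alias appears in each. The alias name is a placeholder, so use whatever your log_archive_dest_n setting points to.

# Report which tnsnames.ora files exist and whether they contain the standby alias.
import os

alias = "STBY"   # placeholder - the connect identifier used for the standby
candidates = []
if os.environ.get("TNS_ADMIN"):
    candidates.append(os.path.join(os.environ["TNS_ADMIN"], "tnsnames.ora"))
if os.environ.get("ORACLE_HOME"):
    candidates.append(os.path.join(os.environ["ORACLE_HOME"], "network", "admin", "tnsnames.ora"))

for path in candidates:
    if not os.path.exists(path):
        print("{0}: no such file".format(path))
        continue
    with open(path) as f:
        found = alias.upper() in f.read().upper()
    print("{0}: alias {1} {2}".format(path, alias, "found" if found else "NOT found"))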

Categories: BI & Warehousing

ACE Alumni

Tim Tow - Tue, 2017-05-23 23:08
Today, I asked Oracle to move me from Oracle ACE Director status to Oracle ACE Alumni status.  There are a number of reasons why I decided to change status.  When I started answering questions on internet forums years ago, I did it to share what I had learned in order to help others.  The same goes for this blog which I originally started so that I could give better and more complete answers to questions on the forums.

After the Hyperion acquisition by Oracle, I was contacted by Oracle who asked if I would be interested in becoming an "Oracle ACE". It was an honor. But over time, things have changed. As more people found out about the ACE program, more people wanted to become an ACE. If you have ever monitored the OTN Essbase and Smart View forums, they have become cluttered with copy and paste posts from people obviously trying to increase their points. As the ACE program grew, it also became harder for the OTN team to manage, and it now requires formal activity reporting - a time report, if you will - to track contributions to the community. As I am already extremely pressed for time, I decided that tracking my contributions to the community in exchange for a free pass to Open World just didn't make sense.

All of that being said, just because I have moved to Oracle ACE Alumni status doesn't mean that I will stop contributing to the community.  My company will continue to provide free downloads and support for the Next Generation (Essbase) Outline Extractor and the Outline Viewer along with free downloads of Drillbridge Community Edition.  And maybe, just maybe, I will finally have time to write some new blog posts (maybe even some posts on some new Dodeca features inspired by our work with Oracle Analytics Cloud / Essbase Cloud!)

Categories: BI & Warehousing

Users of Analytics Applications

Dylan's BI Notes - Sun, 2017-05-21 15:08
Business Users who consume the data and the reports. They see the information pushed to them. They can see alerts on their phones. They see emails. They bookmark the page in their browser and periodically look at it. They are executives, managers, busy users who have other duties. They don't […]
Categories: BI & Warehousing

Delivery to Oracle Document Cloud Services (ODCS) Like A Boss

Tim Dexter - Wed, 2017-05-17 11:53

We have moved to a new blogging platform. This was a post from Pradeep that missed the cut over ...

In release 12.2.1.1, BI Publisher added a new feature - Delivery to Oracle Document Cloud Services (ODCS). Around the same time, BI Publisher was also certified against JCS 12.2.1.x and therefore, today if you have hosted your BI Publisher instance on JCS then we recommend Oracle Document Cloud Services as the delivery channel. Several reasons for this:

  1. Easy to configure and manage ODCS in BI Publisher on Oracle Public Cloud. No port or firewall issues.

  2. ODCS offers a scalable, robust and secure document storage solution on cloud.

  3. ODCS offers document versioning and document metadata support similar to any content management server

  4. Supports all business document file formats relevant for BI Publisher

When to use ODCS?

ODCS can be used for all the different scenarios where a document needs to be securely stored on a server and retained for any duration. The scenarios may include:

  • Bursting documents to multiple customers at the same time.

    • Invoices to customers

    • HR Payroll reports to its employees

    • Financial Statements

  • Storing large or extremely large reports for offline printing

    • End of the Month/Year Statements for Financial Institutions

    • Consolidated department reports

    • Batch reports for Operational data

  • Regulatory Data Archival

    • Generating PDF/A-1b or PDF/A-2 format documents

How to Configure ODCS in BI Publisher?

Configuration of ODCS in BI Publisher requires the  URI, username and password. Here the username is expected to have access to the folder where the files are to be delivered.


 

How to Schedule and Deliver to ODCS?

Delivery to ODCS can be managed through both - a Normal Scheduled Job and a Bursting Job.

A Normal Scheduled Job allows the end user to select a folder from a list of values as shown below


In case of Bursting Job, the ODCS delivery information is to be provided in the bursting query as shown below:

Accessing Document in ODCS

Once the documents are delivered to ODCS, they can be accessed by a user based on their access to the folder, very similar to FTP or WebDAV access.

That's all for now. Stay tuned for more updates !

 

Categories: BI & Warehousing

OBIEE upgrades and Windows vulnerabilities

Rittman Mead Consulting - Mon, 2017-05-15 06:00

These two topics may seem unrelated; however, the ransomware attacks over the last few days provide us with a reminder of what people can do with known vulnerabilities in an operating system.

Organisations consider upgrades a necessary evil; they cost money, take up time and often have little tangible benefit or return on investment (ROI). In the case of upgrades between major versions of software, for example moving from OBIEE 10g to 12c, there are significant architecture, security, functional and user interface changes that may justify the upgrade alone, but they are unlikely to significantly change the way an organisation operates, and they may introduce new components and management processes which produce an additional overhead.

There is another reason to perform upgrades: to keep your operating systems compliant with corporate security standards. OBIEE, and most other enterprise software products, come with certification matrices that detail the supported operating system for each product. The older the version of OBIEE, the older the supported operating systems are, and this is where the problem starts.

If we take an example of an organisation running OBIEE 10g, the most recent certified version of Windows it can run is Windows 2008 R2, which will fall outside of your company's security policy. You will be less likely to be patching the operating system on the server as it will either have fallen off the radar or Microsoft may have stopped releasing patches for that version of the operating system.

The result leaves a system that has access to critical enterprise data vulnerable to known attacks.

The only answer is to upgrade, but how do we justify the ROI and obtain budget? I think we need to recognise that there is a cost of ownership associated with maintaining systems, the benefit of which is mitigating the risk of an incident like the ransomware attacks. It is highly unlikely that anyone could have predicted those attacks, so you could never have used them as a reason to justify an upgrade. However, these things do happen, and a significant number of cyber attacks probably go on undetected. The best protection you have is to make sure your systems are up to date.

Categories: BI & Warehousing

BIP and Mapviewer Mash Up V

Tim Dexter - Mon, 2017-05-08 11:38

The last part on maps, I promise ... it's been a fun ride for me at least :0) If you need to catch up on previous episodes:

In this post we're looking at map quality. On the left a JPG map, to the right an SVG output.

If we ignore the fact that they have different levels of features or layers, imagine getting the maps into a PDF and then printing them. It's pretty clear that the SVG version of the map is going to render better on paper than the JPG.

Getting the SVG output from MapViewer is pretty straightforward; getting BIP to render it requires a little bit of effort. I have mentioned the XML request that we construct and then run through a variable substitution in our servlet. All we need to do is add another option for the requested output. MapViewer supports several flavors of SVG:

  • If you specify SVG_STREAM, the stream of the image in SVG Basic (SVGB) format is returned directly;
  • If you specify SVG_URL, a URL to an SVG Basic image stored on the MapViewer host system is returned.
  • If you specify SVGZ_STREAM, the stream of the image in SVG Compressed (SVGZ) format is returned directly;
  • If you specify SVGZ_URL, a URL to an SVG Compressed image stored on the MapViewer host system is returned. SVG Compressed format can effectively reduce the size of the SVG map by 40 to 70 percent compared with SVG Basic format, thus providing better performance.
  • If you specify SVGTINY_STREAM, the stream of the image in SVG Tiny (SVGT) format is returned directly;
  • If you specify SVGTINY_URL, a URL to an SVG Tiny image stored on the MapViewer host system is returned. (The SVG Tiny format is designed for devices with limited display capabilities, such as cell phones.)

Don't panic, I've looked at them all for you and we need to use SVGTINY_STREAM. This sends back a complete XML file representation of the map in SVG format. We have a couple of issues:

  1. We need to strip the XML declaration from the top of the file: <?xml version="1.0" encoding="utf-8"?> If we don't, BIP will choke on the SVG. Being lazy, I just used a string function to strip the line out in my servlet.

  2. We need to stream the SVG back as text. So we need to set the CONTENT_TYPE for the servlet as 'text/javascript'
  3. We need to handle the SVG when it comes back to the template. We do not use the




Categories: BI & Warehousing

A focus on Higher Education, HEDW 2017

Rittman Mead Consulting - Wed, 2017-05-03 09:04

First, before I get into a great week of Higher Education Data Warehousing and analytics discussions, I want to thank the HEDW board and their membership. They embraced us with open arms in our first year of conference sponsorship. Our longtime friend and HEDW board member, Phyllis Wykoff, from Miami University of Ohio even spent some time with us behind the booth!

HEDW was in the lovely desert scape of Tucson, AZ at the University of Arizona. Sunday was a fantastic day of training, followed by three days of outstanding presentations from member institutions and sponsors. Rittman Mead wanted to show how important the higher education community is to us, so along with me, we had our CEO-Jon Mead, our CTO-Jordan Meyer, and our US Managing Director-Charles Elliott. If our AirBnB had ears, it would have heard several solutions to the problems of the world as well as discussions of the fleeting athleticism of days gone past. But alas, that will have to wait.

While at the conference, we had a multitude of great conversations with member institutions and there were a few themes that stuck out to us with regard to common issues and questions from our higher education friends. I will talk a little bit about each one below with some context on how Rittman Mead is the right fit to be partners in addressing some big questions out there.

Legacy Investment vs BI tool Diversification (or both)

One theme that was evident from hour one was the influx of Tableau in the higher education community. Rittman Mead is known for being the leader in the Oracle Business Intelligence thought and consulting space and we very much love the OBIEE community. With that said, we have, like all BI practitioners, seen the rapid rise of Tableau within departments and lately as an enterprise solution. It would be silly for the OBIEE community to close their eyes and pretend that it isn’t happening. There are great capabilities coming out of Oracle with Data Visualization but the fact is, people have been buying Tableau for a few years and Tableau footprints exist within organizations. This is a challenge that isn't going away.

Analytics Modernization Approaches

We had a ton of conversations about how to include newer technologies in institutions’ business intelligence and data warehousing footprints. There is clearly a desire to see how big data technologies like Hadoop, data science topics like the R statistical modeling language, and messaging services like Kafka could positively impact higher education organizations. Eliminating batch loads, predicting student success, knowing whether potential financial aid is going unused, learning more about your students by applying machine learning to student transactions, and storing more data with distributed architectures like Hadoop are all readily solvable problems. Rittman Mead can help you prioritize what will make the biggest value impact with a Modernization Assessment. We work with organizations to make good plans for implementing modern technology at the right place and at the right time. If you want more info, please let us know.

Sometimes we need a little help from our friends

Members of HEDW need a different view or another set of eyes sometimes and the feedback we heard is that consulting services like ours can seem out of reach with budgets tighter than ever. That is why we recently announced the Rittman Mead Expert Service Desk. Each month, there are hours available to spend however you would like with Rittman Mead’s experts. Do you have a mini project that never seems to get done? Do you need help with a value proposition for a project or upgrade? Did production just go down and you can’t seem to figure it out? With Expert Service desk, you have the full Rittman Mead support model at your fingertips. Let us know if you might want a little help from your friends at Rittman Mead.

To wrap up

Things are changing, and sometimes it is tough to keep up with all of the moving parts. Rittman Mead is proud to be a champion of sharing new approaches and technologies with our communities. Spending time this week with our higher education friends is further proof that our time spent sharing is well worth it. There are great possibilities out there and we look forward to sharing them throughout the year and at HEDW 2018 in Oregon!

Categories: BI & Warehousing

Top 5 Quotes from Oracle’s 2017 Modern Finance Experience

Look Smarter Than You Are - Mon, 2017-05-01 12:40
Three days of Oracle’s Modern Finance Experience set my personal new record for “Most Consecutive Days Wearing a Suit.” Surrounded by finance professionals (mostly CFOs, VPs of FP&A, and people who make money from Finance execs), I came prepared to learn nothing… yet found myself quoting the content for days to come.

The event featured top notch speakers on cutting edge concepts: the opening keynote with Mark Hurd, a panel on the changing world of finance with Matt Bradley & Rondy Ng, Hari Sankar on Hybrid in the world of Oracle EPM, and even one of my competitors (more on that in a second).

For those of you who couldn’t be there (or didn’t want to pay a lot of money to dress up for three days), I thought I’d share my top five quotes as best as I could transcribe them.

“IT currently spends 80% of its budget on maintenance. Boards are demanding increased security, compliance, and regulatory investment. All these new investments come from the innovation budget, not maintenance.”
-          Mark Hurd, Oracle, Co-Chief Executive Officer

Mark Hurd was pulling double duty: he gave the opening keynote at Oracle HCM World (held at a nearby hotel) and then bolted over to Oracle Modern Finance Experience to deliver our keynote. He primarily talked Oracle strategy for the next few years which – to badly paraphrase The Graduate – can be summed up in one word: Cloud.

He gave a compelling argument for why the Cloud is right for Oracle and businesses (though server vendors and hosting providers should be terrified). Now let me be clear: much of this conference was focused around the Cloud, so many of these quotes will be too, but what I liked about Mark’s presentation was it gave clear, concise, and practically irrefutable arguments of the benefits of the Cloud.

The reason I liked the quote above is it answers the concerns from all those IT departments: what happens to my job if I don’t spend 80% of our resources on maintaining existing systems? You’ll get to spend your time on actually improving systems. Increased innovation, greater security, better compliance … the things you’ve been wanting to get to but never have time or budget to address.

“The focus is not on adding lots of new features to on-premises applications. Our priority is less on adding to the functional richness and more on simplifying the process of doing an upgrade.”

-          Hari Sankar, Oracle, GVP of Product Management

I went to a session on the hybrid world of Oracle EPM. I knew Hari would be introducing a customer who had both on-premises Hyperion applications and Cloud applications. What I didn’t know is that he would be addressing the future of Oracle EPM on-premises. As most of you know, the current version for the on-premises Oracle EPM products is 11.1.2.4.x. What many of you do not know is that Oracle has taken future major versions (11.1.2.5 and 12c) of those products off the roadmap.

Hari spoke surprisingly directly to the audience about why Oracle is not abandoning EPM on-prem, but why they will not be pushing the Cloud versions and all their cool new functionality back down to the historical user base. To sum up his eight+ minute monologue, the user base is not requesting new functionality. They want simplicity and an easy path to transition to the Cloud eventually, and that’s why Oracle will be focusing on PSUs (Patch Set Updates) for the EPM products and not on “functional richness.”

Or to put it another way: Hyperion Planning and other Hyperion product users who want impressive new features? Go to the Cloud because they’re probably never coming to on-premises. To quote Hari once more, “create a 1-3 year roadmap for moving to a Cloud environment” or find your applications increasingly obsolete.

 “Hackers are in your network: they’re just waiting to pull the trigger.”

-          Rondy Ng, Oracle, SVP of Applications Development

There was an entertaining Oracle panel led by Jeff Jacoby (Master Principal Sales Consultant and a really nice guy no matter what his family says) that included Rondy Ng (he’s over ERP development), Matt Bradley (he’s over EPM development), and Michael Gobbo (also a lofty Master Principal Sales Consultant). While I expected to be entertained (and Gobbo’s integrated ERP/HCM/EPM demo was one for the ages), I didn’t expect them to tackle the key question on everyone’s mind: what about security in the Cloud?

Mark Hurd did address this in his keynote and he gave a fun fact: if someone finds a security flaw in Oracle's software on a Tuesday, Oracle will patch it by Wednesday, and it will take an average of 18 months until that security patch gets installed in the majority of their client base. Rondy addressed it even more directly: if you think hackers haven't infiltrated your network, you're sticking your head in the sand.

Without going into all of Rondy’s points, his basic argument was that Oracle is better at running a data center than any of their customers out there. He pointed out that Oracle now has 90 data centers around the world and that security overrides everything else they do. He also said, “security is in our DNA” which is almost the exact opposite of “Danger is my middle name,” but while Rondy’s line won’t be getting him any dates, it should make the customer base feel a lot safer about letting Oracle host their Cloud applications.

 “Cloud is when not if.”

-          David Axson, Accenture, Managing Director

I have to admit, I have developed a man crush on one of my competitors. I wrote down more quotes from him than from every other speaker at the event put together. His take on the future of Finance and Planning so closely paralleled my thoughts that I almost felt like he had read the State of Business Analytics white paper we wrote. For instance, in that white paper, we wrote about Analysis Inversion: that the responsibility for analyzing the report should be in the hands of the provider of the report, not the receiver of the report. David Axson put it this way: “The reporting and analysis is only as good as the business decisions made from it. In finance, your job starts when you deliver the report and analysis. Most people think that's when it ends.”

The reason I picked the quote above is that it really sums up the whole theme of the conference: the Cloud is not doing battle with on-premises. The Cloud already fought that battle, won it with a single sucker punch while on-prem was thinking it had it made, and is currently dancing on the still unconscious body of on-prem, who right now is having a bad nightmare involving losing its Blackberry while walking from Blockbuster to RadioShack.

David is right: the Cloud is coming to every company and the only question is when you’ll start that journey.

“Change and Certainty are the new normal. Combat with agility.”

-          Rod Johnson, Oracle, SVP North America ERP, EPM, SCM Enterprise Business

So, what can we do about all these changes coming to Finance? And for that matter, all the changes coming to every facet of every industry in every country on Earth? Rod Johnson (which he assures me is not his "stage" name) said it best: don't fight the change but rather embrace it and make sure you can change faster than everyone else.

"Change comes to those who wait, but it’s the ones bringing the change who are in control."

-          Edward Roske, interRel, CEO


To read more about some of those disruptive changes coming to the world of Finance, download the white paper I mentioned above.
Categories: BI & Warehousing

Deliver Reports to Document Cloud Services!

Tim Dexter - Fri, 2017-04-28 16:32

Greetings !

In release 12.2.1.1, BI Publisher added a new feature - Delivery to Oracle Document Cloud Services (ODCS). Around the same time, BI Publisher was also certified against JCS 12.2.1.x and therefore, today if you have hosted your BI Publisher instance on JCS then we recommend Oracle Document Cloud Services as the delivery channel. Several reasons for this:

  1. Easy to configure and manage ODCS in BI Publisher on Oracle Public Cloud. No port or firewall issues.
  2. ODCS offers a scalable, robust and secure document storage solution on cloud.
  3. ODCS offers document versioning and document metadata support similar to any content management server
  4. Supports all business document file formats relevant for BI Publisher

When to use ODCS?

ODCS can be used for all the different scenarios where a document needs to be securely stored on a server and retained for any duration. The scenarios may include:

  • Bursting documents to multiple customers at the same time.
    • Invoices to customers
    • HR Payroll reports to its employees
    • Financial Statements
  • Storing large or extremely large reports for offline printing
    • End of the Month/Year Statements for Financial Institutions
    • Consolidated department reports
    • Batch reports for Operational data
  • Regulatory Data Archival
    • Generating PDF/A-1b or PDF/A-2 format documents

How to Configure ODCS in BI Publisher?

Configuration of ODCS in BI Publisher requires the  URI, username and password. Here the username is expected to have access to the folder where the files are to be delivered.



How to Schedule and Deliver to ODCS?

Delivery to ODCS can be managed through both - a Normal Scheduled Job and a Bursting Job.

A Normal Scheduled Job allows the end user to select a folder from a list of values as shown below



In case of Bursting Job, the ODCS delivery information is to be provided in the bursting query as shown below:


Accessing Document in ODCS

Once the documents are delivered to ODCS, they can be accessed by a user based on their access to the folder, very similar to FTP or WebDAV access.

That's all for now. Stay tuned for more updates !

Categories: BI & Warehousing

Failed to create voting files on disk group RECOC1

Amardeep Sidhu - Fri, 2017-04-28 04:01

Long story short, I faced this issue while running OneCommand for one Exadata system. The root.sh step (Initialize Cluster Software) was failing with the following error on the screen:

Checking file root_dm01dbadm02.in.oracle.com_2017-04-27_18-13-27.log on node dm01dbadm02.somedomain.com
Error: Error running root scripts, please investigate…
Collecting diagnostics…
Errors occurred. Send /u01/onecommand/linux-x64/WorkDir/Diag-170427_181710.zip to Oracle to receive assistance.

Doesn’t make much sense. So let us check the log file of this step

2017-04-27 18:17:10,463 [INFO][  OCMDThread][        ClusterUtils:413] Checking file root_dm01dbadm02.somedomain.com_2017-04-27_18-13-27.log on node inx321dbadm02.somedomain.com
2017-04-27 18:17:10,464 [INFO][  OCMDThread][        OcmdException:62] Error: Error running root scripts, please investigate…
2017-04-27 18:17:10,464 [FINE][  OCMDThread][        OcmdException:63] Throwing OcmdException… message:Error running root scripts, please investigate…

So we need to go to the root.sh log file now. That shows:

Failed to create voting files on disk group RECOC1.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.
Voting file add failed
2017/04/27 18:16:37 CLSRSC-261: Failed to add voting disks

Died at /u01/app/12.1.0.2/grid/crs/install/crsinstall.pm line 2068.
The command ‘/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/root
crs.pl ‘ execution failed

Makes some sense, but we still can't tell what happened while creating the voting files on RECOC1. Let us check the ASM alert log as well:

NOTE: Creating voting files in diskgroup RECOC1
Thu Apr 27 18:16:36 2017
NOTE: Voting File refresh pending for group 1/0x39368071 (RECOC1)
Thu Apr 27 18:16:36 2017
NOTE: Attempting voting file creation in diskgroup RECOC1
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM01
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM01
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM02
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM02
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM03
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM03
ERROR: Voting file allocation failed for group RECOC1
Thu Apr 27 18:16:36 2017
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_228588.trc:
ORA-15274: Not enough failgroups (5) to create voting files

So we can see the issue here. We can also look at the above trace file for more detail.

Now, why did this happen?

RECOC1 is a HIGH redundancy disk group, which means that if we want to place voting files there it should have at least 5 failure groups. In this configuration there are only 3 cells, and that doesn't meet the minimum failure group requirement (1 cell = 1 failgroup in Exadata).
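
If you want to verify the failure group count yourself, a quick query against the ASM instance does it. Below is a rough sketch using cx_Oracle (you could just as easily run the same SQL in SQL*Plus); the connection details are placeholders for a local +ASM instance.

# Count failure groups per disk group (placeholder connection details).
import cx_Oracle

conn = cx_Oracle.connect("sys", "password", "localhost/+ASM", mode=cx_Oracle.SYSDBA)
cur = conn.cursor()
cur.execute("""
    select g.name, count(distinct d.failgroup) as failgroups
    from   v$asm_diskgroup g join v$asm_disk d on d.group_number = g.group_number
    group  by g.name""")
for name, failgroups in cur:
    # HIGH redundancy needs at least 5 failure groups to hold the voting files
    print("{0}: {1} failure groups".format(name, failgroups))
conn.close()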

Now, how did it happen?

This one was an Exadata X3 half rack and we planned to deploy it (for testing purposes) as 2 quarter racks: the 1st cluster with db1, db2 + cell1, cell2, cell3 and the 2nd cluster with db3, db4 + cell4, cell5, cell6, cell7. All the disk groups were to be in High redundancy.

Before a certain 12.x Exadata software version it was not even possible to have all disk groups in High redundancy in a quarter rack, because to have the voting disks in a High redundancy disk group you need a minimum of 5 failure groups (as mentioned above), and in a quarter rack you have only 3 failgroups. With a certain 12.x Exadata software version a new feature, quorum disks, was introduced which made this configuration possible. Read this link for more details. Basically, we take a slice of disk from each DB node and add it to the disk group where we want to have the voting files. 3 cells + 2 disks from the DB nodes makes 5, so all is good.

While starting the deployment we noticed that db node1 had some hardware issues. As we needed the machine for testing, we decided to build the first cluster with 1 db node only, so the final configuration of the 1st cluster had 1 db node + 3 cells. We imported the XML back into OEDA, modified the cluster 1 configuration to 1 db node and generated the configuration files. That is where the problem started. The RECO disk group was still High redundancy, but as we had only 1 db node at this stage the configuration was not even a candidate for quorum disks; hence the above error. Changing DBFS_DG to Normal redundancy fixed the issue, because when DBFS_DG is Normal redundancy OneCommand will place the voting files there.

Ideally it shouldn’t happened as OEDA shouldn’t allow a configuration that is not doable. The case here is that as originally the configuration was having 2 db nodes + 3 cells so High redundancy for all disk groups was allowed in OEDA. While modifying the configuration when one db node was removed from the cluster, OEDA probably didn’t run the redundancy check on disk groups and it allowed the go past that screen. If you try to create a new configuration with 1 db node + 3 cells, it will not allow you to choose High redundancy for all disk groups. DBFS will remain in Normal redundancy. You can’t change that.

Categories: BI & Warehousing

The Case for ETL in the Cloud - CAPEX vs OPEX

Rittman Mead Consulting - Thu, 2017-04-27 11:12

Recently Oracle announced a new cloud service for Oracle Data Integrator. Because I was helping our sales team by doing some estimates and statements of work, I was already thinking about costs, ROI, use cases, and the questions behind making a decision to move to the cloud. I want to explore the business case for using or switching to ODICS.

Oracle Data Integration Cloud Services

First, let me briefly explain what Oracle Data Integration Cloud Services is. ODICS is ODI version 12.2.1.2 available on Oracle's Java Cloud Service, known as JCS. Several posts cover the implementation, migration, and technical aspects of using ODI in the cloud. Instead of covering the 'how', I want to talk about the 'when' and 'why'.

Use Cases

What use cases are there for ODICS?
1. You have or soon plan to have your data warehouse in Oracle’s Cloud. In this situation, you can now have your ODI J2EE agent in the same cloud network, removing network hops and improving performance.
2. If you currently have an ODI license on-premises, you are allowed to install that license on Oracle’s JCS at the JCS prices. See here for more information about installing on JCS. These use cases are described in a webinar posted in the PM Webcast Archive.

When and Why?

So when would it make sense to move towards using ODICS? These are the scenarios I imagine being the most likely:
1. A new customer or project. If a business doesn’t already have ODI, this allows them to decide between an all on-premises solution or a complete solution in Oracle’s cloud. With monthly and metered costs, the standard large start-up costs for hardware and licenses are avoided, making this solution available for more small to medium businesses.
2. An existing business with ODI already and considering moving their DW to the cloud. In this scenario, a possible solution would be to move the current license of ODI to JCS and begin using that to move data, all while tracking JCS costs. When the time comes to review licensing obligations for ODI, compare the calculation for a license to the calculation of expected usage for ODICS and see which one makes the most sense (cents?). For a more detailed explanation of this point, let’s talk CAPEX and OPEX!

CAPEX vs. OPEX

CAPEX and OPEX are short for Capital Expense and Operational Expense, respectively. From a finance and budgeting perspective, these two show up very differently on financial reports, which often has tax considerations for businesses. Traditionally, a data warehouse project was a very large initial capital expenditure, with hardware, licenses, and project costs, which would land it very solidly as CAPEX. Over the last several years, sponsorship for these projects has shifted from CIOs and IT Directors to CFOs and Business Directors. With this shift, many businesses would rather budget for these expenses monthly as an operating expense than incur large capital expenses every few years, putting these projects into OPEX instead.

Conclusion

Having monthly and metered service costs in the cloud that are fixed or predictable is appealing. As a bonus, this style of service is highly flexible and can scale up (or down) as demand changes. If you are or will soon be in the process of planning for your future business analytics needs, we provide expert services, assessments, accelerators, and executive consultations to assist with these kinds of decisions. When it is time to talk about actual numbers, your Oracle Sales Representative will have the best prices. Please get in touch for more information.

Categories: BI & Warehousing

Breaking News! Dodeca Spreadsheet Management System Certified on Oracle Analytics Cloud!

Tim Tow - Thu, 2017-04-20 23:37
Now that the Oracle Analytics Cloud, or "OAC", has been released, we had to get serious about our work with one of the Oracle Analytics Cloud components, the Essbase Cloud Service, or "EssCS" for short.  You would think that we should have been working hard on EssCS for quite some time, but we had been assured by Oracle product management that the Essbase Java API would be available in EssCS.  Of course, Dodeca was built using the Essbase Java API and thus we expected that support for EssCS would be very easy.

We got access to a production version of the EssCS last week and started our work.  As promised by product management, the Essbase Java API is available in EssCS and, believe it or not, we did not need to change a single line of source code in order to support the Essbase Cloud.  We did, however, have to update our build processes to use Java 8 instead of the decrepit Java 6 used in Essbase 11.x.

As far as configuration inside Dodeca itself, the only change we made was that we configured the APSUrl in the Essbase Connection object to point to the Essbase Cloud APS instance.  Note that the URL format has changed in the cloud.  The Java API was accessible in Essbase 9.3.1 through Essbase 11.1.2.4 using the format:

http://<server>:<port>/aps/JAPI

In the cloud, this has changed to:

http://<server>:<port>/essbase/japi

The Essbase Connection configuration looks pretty much the same as the configuration for an on premise connection configuration:








Of course, the Dodeca views look identical when run against an on premise or a cloud server:



In summary, it was trivial to test Dodeca using EssCS.  Every single Essbase functionality that we use in the product, from data grid operations to metadata operations and even report scripts, worked exactly the same as it does against an on premise Essbase cube.  Based on our testing, we are certifying the Dodeca Spreadsheet Management System to work on the Oracle Analytics Cloud.

We have a number of innovations we plan to introduce in the near future aimed at improving the Essbase Cloud experience, so stay tuned.  If you are planning to come to Kscope17 in San Antonio, plan to attend the Dodeca Symposium and you may just be the first to see some of these cool new things!


Categories: BI & Warehousing

SQL-on-Hadoop: Impala vs Drill

Rittman Mead Consulting - Wed, 2017-04-19 10:01
 Impala vs Drill

I recently wrote a blog post about Oracle's Analytic Views and how those can be used to provide a simple SQL interface to end users with data stored in a relational database. In today's post I'm expanding my horizons a little bit by looking at how to effectively query data in Hadoop using SQL. The SQL-on-Hadoop interface is key for many organizations - it allows querying the Big Data world using existing tools (like OBIEE, Tableau, DVD) and skills (SQL).

Analytic Views, together with Oracle's Big Data SQL, provide what we are looking for and have the benefit of unifying the data dictionary and the SQL dialect in use. It should be noted that Oracle Big Data SQL is licensed separately on top of the database and is available for Exadata machines only.

Nowadays there is a multitude of open-source projects covering the SQL-on-Hadoop problem. In this post I'll look in detail at two of the most relevant: Cloudera Impala and Apache Drill. We'll see details of each technology, define the similarities, and spot the differences. Finally we'll show that Drill is most suited for exploration with tools like Oracle Data Visualization or Tableau while Impala fits in the explanation area with tools like OBIEE.

As we'll see later, both tools are inspired by Dremel, a paper published by Google in 2010 that defines a scalable, interactive ad-hoc query system for the analysis of read-only nested data and is the basis of Google's BigQuery. Dremel defines two aspects of big data analytics:

  • A columnar storage format representation for nested data
  • A query engine

The first point inspired Apache Parquet, the columnar storage format available in Hadoop. The second point provides the basis for both Impala and Drill.

Cloudera Impala

We started blogging about Impala a while ago, as soon as it was officially supported by OBIEE, testing it for reporting on top of big data Hadoop platforms. However, we never went into the details of the tool, which is the purpose of the current post.

Impala is an open source project inspired by Google's Dremel and one of the massively parallel processing (MPP) SQL engines running natively on Hadoop. As per Cloudera's definition, it is a tool that:

provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats.

Two important bits to notice:

  • High performance and low latency SQL queries: Impala was created to overcome the slowness of Hive, which relied on MapReduce jobs to execute the queries. Impala uses its own set of daemons running on each of the datanodes saving time by:
    • Avoiding the MapReduce job startup latency
    • Compiling the query code for optimal performance
    • Streaming intermediate results in memory, whereas MapReduce always writes to disk
    • Starting the aggregation as soon as the first fragment starts returning results
    • Caching metadata definitions
    • Gathering tables and columns statistics
  • Data stored in popular Apache Hadoop file formats: Impala uses the Hive metastore database; databases and tables are shared between both components. The list of supported file formats includes Parquet, Avro, simple Text and SequenceFile amongst others. Choosing the right file format and compression codec can have an enormous impact on performance. Since CDH 5.8 / Impala 2.6, Impala also supports the Amazon S3 filesystem for both reading and writing.

One of the performance improvements is related to "streaming intermediate results": Impala works in memory as much as possible, writing to disk only if the data size is too big to fit in memory; as we'll see later this is called optimistic and pipelined query execution. This has immediate benefits compared to standard MapReduce jobs, which for reliability reasons always write intermediate results to disk.
As per this Cloudera blog, the usage of Impala in combination with the Parquet data format is able to achieve the performance benefits explained in the Dremel paper.
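
As a concrete (and purely illustrative) sketch of the "right file format plus statistics" point, this is how a text-backed table might be converted to Parquet and have its statistics gathered in Impala - the table names are hypothetical, not from any real environment:

-- Hypothetical example: convert a text-backed table to Parquet
CREATE TABLE sales_parquet
STORED AS PARQUET
AS SELECT * FROM sales_text;

-- COMPUTE STATS gathers the table and column statistics used by Impala's planner
COMPUTE STATS sales_parquet;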

Impala Query Process

Impala runs a daemon, called impalad, on each Datanode (a node storing data in the Hadoop cluster). The query can be submitted to any daemon in the cluster, which will act as the coordinator node for the query. Impala daemons are always connected to the statestore, a process that keeps a central inventory of all available daemons and their health, and pushes this information back to all daemons. A third component, called the catalog service, checks for metadata changes driven by Impala SQL in order to invalidate related cache entries. Metadata is cached in Impala for performance reasons: accessing metadata from the cache is much faster than checking against the Hive metastore. The catalog service process is in charge of keeping Impala's metadata cache in sync with the Hive metastore.

Once the query is received, the coordinator verifies that the query is valid against the Hive metastore, then retrieves information about data location from the Namenode (the node in charge of storing the list of blocks and their location on the datanodes), fragments the query and distributes the fragments to the other impalad daemons to execute. All the daemons read the needed data blocks, process the query, and stream partial results to the coordinator (avoiding the write to disk), which collects all the results and delivers them back to the requester. The result is returned as soon as it's available: certain SQL operations, like aggregations or order by, require all the input to be available before Impala can return the end result, while others, like a select of pre-existing columns without an order by, can start returning rows from partial results.
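
To see how Impala fragments a statement before distributing it, the plan can be inspected with a plain EXPLAIN; a minimal sketch against a hypothetical table:

-- Show the query plan: scan, aggregation and exchange fragments
EXPLAIN
SELECT prod_key,
       SUM(revenue) AS total_revenue
FROM   sales
GROUP  BY prod_key;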

 Impala vs Drill

Apache Drill

Defining Apache Drill as SQL-on-Hadoop is limiting: also inspired by Google's Dremel, it is a distributed, datasource-agnostic query engine. The datasource-agnostic part is very relevant: Drill is not closely coupled with Hadoop; in fact it can query a variety of sources like MongoDB, Azure Blob Storage, or Google Cloud Storage amongst others.

One of the most important features is that data can be queried schema-free: there is no need to define the data structure or schema upfront - users can simply point the query at a file directory, MongoDB collection or Amazon S3 bucket and Drill will take care of the rest. For more details, check our overview of the tool. One of Apache Drill's objectives is cutting down the data modeling and transformation effort, providing zero-day analysis as explained in this MapR video.
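
As a minimal sketch of what schema-free querying looks like (the file path and fields below are purely illustrative), a raw JSON file can be queried through Drill's dfs storage plugin without any table definition:

-- Hypothetical example: query a raw JSON file directly, no DDL required
SELECT t.name,
       t.country
FROM   dfs.`/data/raw/customers.json` t
WHERE  t.country = 'Italy';
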
 Impala vs Drill

Drill is designed for high performance on large datasets, with the following core components:

  • Distributed engine: Drill processes, called Drillbits, can be installed on many nodes and are the execution engine of the query. Nodes can be added or removed manually to adjust performance. Queries can be sent to any Drillbit in the cluster, which will act as the Foreman for the query.
  • Columnar execution: Drill is optimized for columnar storage (e.g. Parquet) and execution using the hierarchical and columnar in-memory data model.
  • Vectorization: Drill takes advantage of modern CPU design by operating on record batches rather than iterating over single values.
  • Runtime compilation: Compiled code is faster than interpreted code and is generated ad-hoc for each query.
  • Optimistic and pipelined query execution: Drill assumes that none of the processes will fail and thus does all the pipeline operation in memory rather than writing to disk - writing on disk only when memory isn't sufficient.

Drill Query Process

Like Impala's impalad, Drill's main component is the Drillbit: a process running on each active Drill node that is capable of coordinating, planning, executing and distributing queries. Installing a Drillbit on all of Hadoop's data nodes is not compulsory; however, doing so gives Drill the ability to achieve data locality: executing queries where the data resides, without moving it across the network.

When a query is submitted against Drill, a client/application sends a SQL statement to a Drillbit in the cluster (any Drillbit can be chosen), which acts as the Foreman (the coordinator, in Impala terminology): it parses the SQL and converts it into a logical plan composed of operators. The next step is the cost-based optimizer which, based on optimizations like rule/cost-based transformations, data locality and storage engine options, rearranges operations to generate the optimal physical plan. The Foreman then divides the physical plan into phases, called fragments, which are organised in a tree and executed in parallel against the data sources. The results are then sent back to the client/application. The following image taken from drill.apache.org explains the full process:

 Impala vs Drill

Similarities and Differences

As we saw above, Drill and Impala have a similar structure - both take advantage of always-on daemons (faster than starting a MapReduce job) and use an optimistic query execution model, passing intermediate results in memory. Runtime code compilation and a distributed engine are also common to both, and both are optimized for columnar storage formats like Parquet.

There are, however, several differences. Impala works only on top of the Hive metastore, while Drill supports a larger variety of data sources and can link them together on the fly in the same query. For example, implicit schema-defined files like JSON and XML, which are not supported natively by Impala, can be read immediately by Drill.
Drill usually doesn't require a metadata definition upfront, while for Impala a view or external table has to be declared before querying. It follows that for Drill there is no concept of a central and persistent metastore, and no metadata repository to manage. In OBIEE's world, both Impala and Drill are supported data sources. The same applies to Data Visualization Desktop.
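
To illustrate the "link them together on the fly" point, here is a hedged sketch of a single Drill query joining a Hive table with a raw JSON file - the storage plugin name, table and file path are assumptions for illustration only:

-- Hypothetical example: mash up a Hive table and a JSON file in one statement
SELECT s.order_id,
       s.revenue,
       p.product_name
FROM   hive.sales s
JOIN   dfs.`/data/raw/products.json` p
  ON   s.prod_key = p.prod_key;
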
 Impala vs Drill

The aim of this article isn't a performance comparison, since performance depends on a huge number of factors including data types, file formats, configurations, and query types. A comparison dating back to 2015 can be found here. Please be aware that newer versions of both tools have been released since that comparison, bringing a lot of changes and improvements to both projects in terms of performance.

Conclusion

Impala and Drill share a similar structure - both inspired by Google's Dremel - relying on always-active daemons deployed on cluster nodes to provide the best query performance on top of Big Data structures. So which one to choose, and when?
As described, the ability of Apache Drill to query a raw data source without requiring an upfront metadata definition makes the tool perfect for insight discovery on top of raw data. The ability to join data coming from one or more storage plugins in a single query makes mashing up disparate data sources easy and immediate. Data science and prototyping before the design of a reporting schema are perfect use cases for Drill. However, once the discovery phase is over, a metadata definition layer is usually added on top of the data sources, and at that point Impala becomes a good candidate for the recurring reporting queries.
Summarizing: if all the data points are already modeled in the Hive metastore, then Impala is your perfect choice. If instead you need a mashup with external sources, or need to work directly with raw data formats (e.g. JSON), then Drill's auto-exploration and openness capabilities are what you're looking for.
Even though both tools are fully compatible with OBIEE and Data Visualization (DV), due to Drill's data exploration nature it could be considered more in line with DV use cases, while Impala is more suitable for standard reporting with OBIEE. The decision on tooling depends highly on the specific use case - source data types, file formats and configurations have a deep impact on the agility of the business analytics process and on query performance.

If you want to know more about Apache Drill, Impala and the use cases we have experienced, don't hesitate to contact us!

Categories: BI & Warehousing

OneCommand Step 1 error

Amardeep Sidhu - Mon, 2017-04-10 11:50

Hit this silly issue while doing an Exadata deployment for a customer. Step 1 was giving the following error:

ERROR: 192.168.99.102 configured on dm01celadm01.example.com as dm01dbadm02 does not match expected value dm01dbadm02.example.com

I wasn't able to make sense of it for quite some time until a colleague pointed out that the reverse lookup entries should be done for the FQDN only. As is clear from the above message, the reverse lookup of the IP 192.168.99.102 returns dm01dbadm02 instead of dm01dbadm02.example.com. Fixing this in DNS resolved the issue.

Actually, the customer had created reverse lookup entries for both the short hostname and the FQDN. Since DNS can return the results in any order, the error message was a bit random: whenever the short hostname was returned first, Step 1 gave an error, but when the FQDN was returned first, there was no error in Step 1 for that IP.

Categories: BI & Warehousing

Data Lake and Data Warehouse

Dylan's BI Notes - Fri, 2017-04-07 11:23
This is an old topic but I have learned more and come up with more perspectives over time. Raw Data vs Clean Data Metadata What kind of services are required? Data as a Service Analytics as a Service Raw Data and Clean Data I think that assuming that you can use raw data directly is a dangerous thing. […]
Categories: BI & Warehousing

OBIEE Component Status Notifications

Rittman Mead Consulting - Wed, 2017-04-05 09:00

At Rittman Mead, we often hear requests for features or solutions generally not provided by Oracle. These requests range from commentary functionality to custom javascript visualizations. There are many ways to implement these functionalities, but organizations often lack the in-house resources to engineer an acceptable solution.

Rittman Mead has the capability to engineer any solution desired, and in many cases, has already developed a solution. Many of our accelerators currently offered, such as Chit Chat or User Engagement, grew out of numerous requests for these features.

One of the more common requests we hear at Rittman Mead is for BI Administrators to receive notifications for the status of their OBIEE components. They want to be notified of the status of the system components throughout the day in a convenient manner, so any failures are recognized quickly.

This particular feature can easily be implemented with Rittman Mead's Performance Analytics service. However, Rittman Mead would like to publicly provide this capability independent of our accelerator. We have developed a small Python script to provide this functionality, and we would like to give this script to the community.

The provided script is available free of charge under the MIT license. It has been tested on both OBIEE 11G and 12C environments, as well as on Windows and Linux operating systems. The rest of this blog will detail, at a high level, how the script works, and how to configure it correctly.

The script is available through our public Github repository here.

Script Output

First, let's clarify how we will gather the status of the components in the first place. Thankfully, OBIEE includes some scripts to display this information on both Linux and Windows. In 12C, the script is status.sh/status.cmd, and in 11G the primary command is opmnctl status.

When I execute this script on an OBIEE 12C OEL environment, I receive the following response:


The output includes some extra information we don't require, which we can ignore for now. With some programming knowledge, we can trim what we don't need, organize it into a nice table, and then send the output nearly anywhere we like. For portability and stability, I will use Python to organize the message contents, and email as the channel of communication.

Sending the Output Through Email

If we are only concerned with notifying administrators of the current status, one of the better avenues to send this data is through email. An email destination will allow users to be able to receive the status of the components almost instantaneously, and be able to take the appropriate action as soon as possible.

Additionally, Python's standard set of modules includes functions to assist in sending SMTP messages, making the script even more portable and maintainable. The simplest method to generate the email is just by sending the complete output as the body of the message. An example of this output is below:

While this works, it's not exactly attractive. With some Python and HTML/CSS skills, we can style the email to look much nicer:


Now we have something nice we can send BI Administrators to show the status of the components.

Configuration and Use

To effectively utilize this script, you will have to change some of the configuration parameters, located at the top of the script. The parameters I am using are shown below (with sensitive information hidden, of course):

The sender and username fields should both be the user you are logging in as on the SMTP server to send the email. If you want the email address shown on a message to be different than the user configured on the SMTP server, then these can be set separately. The password field should be the password for the user being configured on the SMTP server.

The recipient field should be the address of the user who will be receiving the emails. For simple management, this should be a single user, who should then be configured to forward all incoming status emails to the responsible parties. This will allow easier maintenance, since modifying the list of users can take place in the email configuration, rather than the script configuration. In this example, I am just sending the emails to my Rittman Mead email address.

The SMTP settings should also be updated to reflect the SMTP server being utilized. If you are using Gmail as the SMTP server, then the configuration shown should work without modifications.

Finally, the Python script requires the absolute path to the status command used to produce the output (the opmnctl or status command). Environment variables may not be taken into consideration by this script, so it's best not to use a variable in this path.

NOTE: If the \ character is in the path, then you MUST use \\ instead. This is especially true on Windows environments. If this change is required but omitted, the script will not function properly.

Additionally, if you don't care about the HTML output (or if it doesn't render nicely in your email client), then it can be disabled by setting the value of render_html to False. If, for some reason, the nice HTML fails to render, then the email will just send the plain text output, instead of failing to deliver an email at all.

Once configured, try executing the script:
python email_component_status.py

If everything worked correctly, then you should have received an email with the status of the components. If you do not receive an email, then you should check both the configuration settings, and the internet connection of the machine (firewalls included). The script will also generate output that should assist you in troubleshooting the cause of the problem.

Additional Notifications or Destinations

The solution provided, while useful, is not perfect. What if you want to send this information to a destination other than an email address, such as a ticketing system like Atlassian JIRA? Or what if you want notifications based on other situations, such as slow running reports, or high CPU usage?

There may be many situations in which you would want one, or several, employees to receive different notifications based on events or circumstances that occur in your OBIEE environment. The script in this blog post only provides one of these notifications, but implementing many more will quickly become burdensome.

As part of Rittman Mead's Performance Analytics offering, we include custom software and code to fulfill this requirement. In addition to providing dashboards to explore the performance of your OBIEE systems, Performance Analytics can be configured to distribute alerts, based on any quantifiable condition, to any number of external systems.

The full Performance Analytics suite can not only alert users of down system components, but of any number of conditions that may occur in your BI environment.

If you have questions about this script, Performance Analytics, or anything else, Contact Us here.

To find out more about Performance Analytics, contact us, visit the product page here, or read some of the fantastic blogs from Robin Moffatt.

Categories: BI & Warehousing

Oracle Analytics Cloud: Product Overview

Rittman Mead Consulting - Tue, 2017-04-04 10:00
 Product Overview

We at Rittman Mead are always helping our customers solve their problems, and many times we have heard them:

  • being unsure about the sizing of their server
  • being worried about the upfront cost of the licensing
  • having recurring nightmares about patching
  • wanting to try the cloud but unable to find the right option to replace their on-premises system

This is their lucky day: Oracle officially launched Oracle Analytics Cloud (OAC), a new PaaS (Platform as a Service) providing a complete and elastic Business Intelligence platform in the cloud, customizable and managed by you but all on the Oracle Cloud!

 Product Overview

If you haven't been living on a remote island, you may have noticed that in recent years Oracle's main focus has been around the Cloud. Several products have been launched covering a vast spectrum of functionalities: Data Management, Application Development, Business Analytics and Security are only some of the areas covered by the Software/Platform/Infrastructure as a Service offering.

 Product Overview

In the Business Analytics area, we at Rittman Mead started thinking a long time ago about how to host Oracle's on-premises BI (OBIEE) in the Cloud, and worked closely with Oracle since the beta phase of their first PaaS product: BI Cloud Service (BICS). We have effectively put our hands on all the cloud products in the BA family, like Big Data Discovery (both on premises and cloud), Data Visualization Cloud Service and Big Data Preparation Service.

Business Intelligence Cloud Products

Until a few weeks ago, Oracle's main Business Analytics cloud products were BI Cloud Service (BICS) and Data Visualization Cloud Service (DVCS). As mentioned in our blog, both tools initially aimed at departmental use cases: the simplicity of the data model interface and the lack of admin configuration options stopped them from being a compelling story for hosting a full enterprise Business Intelligence solution.

 Product Overview

New features like BICS Data Sync, Remote Data Connector and RPD lift and shift addressed almost all the limitations, but the lack of detailed admin/maintenance capabilities remained a stopper for moving complex environments to the cloud. Still, BICS and DVCS are perfect for their aim: business users analysing sets of data without needing to wait for IT to provision a server or to worry about upfront licensing costs.

Oracle Analytics Cloud

Oracle Analytics Cloud extends the watermark in every direction by providing a product that is:

  • Complete functionality: most of the tools, procedures and options provided on-premises are now available in OAC.
  • Combining all the offerings of BICS, DV, OBIEE and Essbase: OAC includes the features of Oracle's top BI products.
  • Tailored Licensing: the many options available (discussed in a later post) can be chosen depending on analytical needs, timeframe of service and required performance
  • Easily Scalable: do you want to double the number of users of your BI solution without losing performance? Just buy some more horsepower!
  • Fully Accessible: SSH access to the server makes it easy to change settings as needed, and REST APIs and clients are provided for all lifecycle operations
  • Customizable: settings, images, networking, VPN - all options are available
  • Scriptable: settings like scaling, instance creation and deletion, start and stop can be easily scripted via the REST-APIs
  • Fully Customer Managed: Oracle provides the automation to back up and patch, but the customer decides when to run it.

What's The Difference?

So what's the difference between Oracle Analytics Cloud and the "old" DVCS and BICS? How is OAC going to change Oracle's BI offering in the cloud?

The big deal with OAC is control: the BICS/DVCS limiting factors around admin options and development are solved, providing a tool capable of hosting a full enterprise BI solution. Even if the platform is managed by Oracle, SSH access is provided, meaning that instance configurations can be changed. No more upfront server sizing decisions: the size of the instance is decided at creation time and can be changed later if demand changes.

The REST-APIs will enable the scripting of the full lifecycle of the instance, providing a way to automate the BI enterprise workflow even in complex environments where concurrent development is needed. Patching and Backups are not a problem anymore with the automated processes provided.

Direct online RPD editing is available with the Admin Tool. The old BICS Data Modeler is still there for simple models, but the Admin Tool can be used for complex RPDs.

 Product Overview

The front-end is like the BICS and OBIEE 12c one; some new visualizations have been added to Visual Analyzer, in line with the new additions to Data Visualization Desktop: Parallel Coordinates, Chord, Network and Sankey diagrams are now available.

 Product Overview

A new console is now available in Visual Analyzer, allowing settings like Mail or Deliveries that were previously only accessible via the WebLogic Console, Enterprise Manager or config files.

 Product Overview

Finally, Essbase is now available in the cloud too, with a new web interface!
Summarizing: if you wanted to go Cloud but were worried about missing options, Oracle Analytics Cloud now provides all you need to host a full Enterprise BI solution.

In the next few days I'll be analysing various aspects of Oracle Analytics Cloud Suite, so keep in touch!

If you need assistance in checking whether Oracle Analytics Cloud suits your needs, or in planning your migration to the cloud, don't hesitate to contact us

Categories: BI & Warehousing

Metadata Modeling in the Database with Analytic Views

Rittman Mead Consulting - Mon, 2017-04-03 09:00
Metadata Modeling in the Database with Analytic Views

Oracle Database 12.2, the latest release, provides a whole set of new features enhancing various aspects of the product, including JSON support, Auto-List Partitioning and APEX news among others. One of the biggest pieces of news in the Data Warehousing / Analytics area was the introduction of Analytic Views, which as per Oracle's definition are

Metadata objects that enable the user to quickly and easily create complex hierarchical and dimensional queries on data in database tables and views

tl;dr

If you are in a rush, here is an abstract of what you'll find in this looooong blog post:

Metadata modeling can now be done directly in the database using Analytic Views, providing end users with a way of querying database objects without needing knowledge of join conditions, aggregation functions or order by clauses.
This post will guide you through the creation of an analytic view that replicates part of OBIEE's Sampleapp business model. The last part of the post is dedicated to understanding the usage of analytic views and the benefits for end users, especially in cases where self-service BI tools are used.

Metadata Modeling in the Database with Analytic Views

If you are still around and interested in the topic, grab a drink and sit comfortably: it will be a good journey.

Metadata Modeling

So what are Analytic Views in detail? How are they going to improve end users' ability to query data?

To answer the above question I'll take a step back. Many readers of this blog are familiar with OBIEE and its core: the Repository. The repository contains the metadata model from the physical sources up to the presentation areas and includes the definition of:

  • Joins between tables
  • Hierarchies for dimensions
  • Aggregation rules
  • Security settings
  • Data Filters
  • Data Sources

This allows end users to just pick columns from a Subject Area and display them in the appropriate way, without needing to worry about writing SQL or knowing how the data is stored. Moreover, definitions are held centrally, providing the famous single source of truth across the entire enterprise.

Metadata Modeling in the Database with Analytic Views

The wave of self-service BI tools like Tableau or Oracle's Data Visualization Desktop put products capable of querying almost any kind of data source in a visual and intuitive way directly into end users' hands. Easy and direct access to data is a good thing for the end user but, as stated above, requires knowledge of the data model, joins and aggregation methods.
Self-service tools can slightly simplify the process by providing some hints based on column names, types or values, but the cruel reality is that the end user has to build the necessary knowledge of the data source before producing correct results. This is why we've often seen self-service BI tools being "attached" to OBIEE: get official corporate data from the single source of truth and mash it up with information coming from external sources like personal Excel files or the output of Big Data processes.

Analytic Views

Analytic Views (AV) take OBIEE's metadata modeling concept and move it to the database level, providing a way of organizing data in a dimensional model so it can be queried with simpler SQL statements.
Analytic Views are standard views with the following extra options:

  • Enable the definition of facts, dimensions and hierarchies that are included in system-generated columns
  • Automatically aggregate the data based on pre-defined calculations
  • Include presentation metadata

Analytic Views are created with a CREATE ANALYTIC VIEW statement; some privileges need to be granted to the creating user - you can find the full list in Oracle's documentation.

Every analytic view is composed of the following metadata objects:

  • Attribute dimensions: organising table/view columns into attributes and levels.
  • Hierarchies: defining hierarchical relationships on top of an attribute dimension object.
  • Analytic view objects: defining fact data referencing both fact tables and hierarchies.

With all the above high-level concepts in mind, it's now time to see how Analytic Views can be used in a reporting environment.

Database Provisioning

For the purpose of this blog post I used Oracle's 12.2.0.1 database Docker image, provided by Gerald Venzl - the quickest way of spinning up a local instance. You just need to:

  • Install Docker
  • Download database installer from Oracle's website
  • Place the installer in the proper location mentioned in the documentation
  • Build the Oracle Database 12.2.0.1 Enterprise Edition Docker image by executing
./buildDockerImage.sh -v 12.2.0.1 -e
  • Run the image by executing
docker run --name db12c -p 1521:1521 -p 5500:5500 -e ORACLE_SID=orcl -e ORACLE_PDB=pdborcl -e ORACLE_CHARACTERSET=AL32UTF8 oracle/database:12.2.0.1-ee  

The detailed parameter definitions can be found in the GitHub repository. You can then connect via sqlplus to your local instance by executing the standard

sqlplus sys/pwd@//localhost:1521/pdborcl as sysdba  

The password is generated automatically during the first run of the image and can be found in the logs; look for the following string

ORACLE AUTO GENERATED PASSWORD FOR SYS, SYSTEM AND PDBADMIN: XXXXxxxxXXX  

Once the database is created, it's time to set the goal: I'll try to recreate a piece of Oracle's Sampleapp RPD model in the database using Analytic Views.

Model description

In this blog post I'll look at the 01 - Sample App business model and, specifically, try to replicate the logic behind Time, Product and the F0 Sales Base Measures using Analytic Views.

Metadata Modeling in the Database with Analytic Views

Dim Product

Sampleapp's D1 - Products (Level Based Hierarchy) dimension is based on two logical table sources: SAMP_PRODUCTS_D, providing product name, description, LOB and Brand, and SAMP_PROD_IMG_D, containing product images. For the purpose of this test we'll keep our focus on SAMP_PRODUCTS_D only.
The physical mapping of Logical columns is shown in the image below.

Metadata Modeling in the Database with Analytic Views

Attribute Dimension

The first piece we're going to build is the attribute dimension, where we'll be defining attributes and levels. The mappings in above image can "easily" be translated into an attributes with the following SQL.

CREATE OR REPLACE ATTRIBUTE DIMENSION D1_DIM_PRODUCT  
USING SAMP_PRODUCTS_D  
ATTRIBUTES  
 (PROD_KEY as P0_Product_Number
    CLASSIFICATION caption VALUE 'P0 Product Number',
  PROD_DSC as P1_Product
    CLASSIFICATION caption VALUE 'P1 Product',
  TYPE as P2_Product_Type
    CLASSIFICATION caption VALUE 'P2 Product Type',
  TYPE_KEY as P2k_Product_Type
    CLASSIFICATION caption VALUE 'P2k Product Type',
  LOB as P3_LOB
    CLASSIFICATION caption VALUE 'P3 LOB',
  LOB_KEY as P3k_LOB
    CLASSIFICATION caption VALUE 'P3k LOB',
  BRAND as P4_Brand
    CLASSIFICATION caption VALUE 'P4 Brand',
  BRAND_KEY as P4k_Brand
    CLASSIFICATION caption VALUE 'P4k Brand',
  ATTRIBUTE_1 as P5_Attribute_1
    CLASSIFICATION caption VALUE 'P5 Attribute 1',
  ATTRIBUTE_2 as P6_Attribute_2
    CLASSIFICATION caption VALUE 'P6 Attribute 2',
  SEQUENCE as P7_Product_Sequence
    CLASSIFICATION caption VALUE 'P7 Product Sequence',
  TOTAL_VALUE as P99_Total_Value
    CLASSIFICATION caption VALUE 'P99 Total Value')

Few pieces to note:

  • CREATE OR REPLACE ATTRIBUTE DIMENSION: we are currently defining a dimension, the attributes and levels.
  • USING SAMP_PRODUCTS_D: defines the datasource, in our case the table SAMP_PRODUCTS_D. Only one datasource is allowed per dimension.
  • PROD_KEY as P0_Product_Number: using the standard as notation we can easily re-caption column names
  • CLASSIFICATION CAPTION ... several options can be added for each attribute like caption or description

The dimension definition is not complete with only the attribute declaration; we also need to define the levels. Those can be taken from OBIEE's hierarchy

Metadata Modeling in the Database with Analytic Views

For each level we can define:

  • The level name, caption and description
  • The Key
  • the Member Name and Caption
  • the Order by Clause

Translating the above OBIEE hierarchy levels into Oracle SQL:

LEVEL BRAND  
  CLASSIFICATION caption VALUE 'BRAND'
  CLASSIFICATION description VALUE 'Brand'
  KEY P4k_Brand
  MEMBER NAME P4_Brand
  MEMBER CAPTION P4_Brand
  ORDER BY P4_Brand
LEVEL Product_LOB  
  CLASSIFICATION caption VALUE 'LOB'
  CLASSIFICATION description VALUE 'Lob'
  KEY P3k_LOB
  MEMBER NAME P3_LOB
  MEMBER CAPTION P3_LOB
  ORDER BY P3_LOB
  DETERMINES(P4k_Brand)
LEVEL Product_Type  
  CLASSIFICATION caption VALUE 'Type'
  CLASSIFICATION description VALUE 'Type'
  KEY P2k_Product_Type
  MEMBER NAME P2_Product_Type
  MEMBER CAPTION P2_Product_Type
  ORDER BY P2_Product_Type
  DETERMINES(P3k_LOB,P4k_Brand)
LEVEL Product_Details  
  CLASSIFICATION caption VALUE 'Detail'
  CLASSIFICATION description VALUE 'Detail'
  KEY P0_Product_Number
  MEMBER NAME P1_Product
  MEMBER CAPTION P1_Product
  ORDER BY P1_Product
  DETERMINES(P2k_Product_Type,P3k_LOB,P4k_Brand)
ALL MEMBER NAME 'ALL PRODUCTS';  

There is an additional DETERMINES line in the above SQL for each level apart from Brand; this is how we specify the relationship between level keys. If we take the Product_LOB example, the DETERMINES(P4k_Brand) clause defines that any LOB in our table automatically determines a Brand (in OBIEE terms, that LOB is a child of Brand).

Hierarchy

The next step is defining a hierarchy on top of the attribute dimension D1_DIM_PRODUCT defined above. We can create it just by specifying:

  • the attribute dimension to use
  • the list of levels and the relation between them

which in our case becomes

CREATE OR REPLACE HIERARCHY PRODUCT_HIER  
  CLASSIFICATION caption VALUE 'Products Hierarchy'
USING D1_DIM_PRODUCT  
  (Product_Details CHILD OF
   Product_Type CHILD OF
   Product_LOB CHILD OF
   BRAND);

When looking into the hierarchy PRODUCT_HIER we can see that it creates an OLAP-style dimension, with a row for each member at each level of the hierarchy plus extra fields like DEPTH, IS_LEAF and HIER_ORDER.

Metadata Modeling in the Database with Analytic Views

The columns contained in PRODUCT_HIER are:

  • One for each attribute defined in the attribute dimension D1_DIM_PRODUCT, like P0_PRODUCT_NUMBER or P2K_PRODUCT_TYPE
  • The member name, caption, description and unique name
  • The level name in the hierarchy and related depth
  • The relative order of the member in the hierarchy
  • A field IS_LEAF flagging hierarchy endpoints
  • References to the parent level
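
The hierarchy object can be inspected directly with a plain SELECT; a minimal sketch (column list trimmed for readability), assuming the PRODUCT_HIER created above:

-- One row per member at every level, ordered as the hierarchy presents them
SELECT member_unique_name,
       member_name,
       level_name,
       depth,
       is_leaf,
       hier_order
FROM   product_hier
ORDER  BY hier_order;
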
Member Unique Names

A peculiarity to notice is that the MEMBER_UNIQUE_NAME of Cell Phones is [PRODUCT_TYPE].&[101], which is the concatenation of the LEVEL and the P2K_PRODUCT_TYPE value.
One could expect the member unique name to be represented as the concatenation of all the preceding hierarchy members, Brand and LOB, plus the member key itself, in a string like [PRODUCT_TYPE].&[10001]&[1001]&[101].

That is the default behaviour; however, in our case it is not happening since we set DETERMINES(P3k_LOB,P4k_Brand) in the attribute dimension definition. We specified that Brand ([10001]) and LOB ([1001]) can automatically be inferred from the Product Type, so there is no need to store those values in the member key. We can find the same setting in OBIEE's Product Type logical level

Metadata Modeling in the Database with Analytic Views

Dim Date

The basic D0 Dim Date can be built starting from the table SAMP_TIME_DAY_D following the same process as above. Like in OBIEE, some additional settings are required when creating a time dimension:

  • DIMENSION TYPE TIME: the time dimension type need to be specified
  • LEVEL TYPE <LEVEL_NAME>: each level in the time hierarchy needs to belong to a precise level type chosen from:
    • YEARS
    • HALF_YEARS
    • QUARTERS
    • MONTHS
    • WEEKS
    • DAYS
    • HOURS
    • MINUTES
    • SECONDS

Attribute Dimension

Metadata Modeling in the Database with Analytic Views

Taking the additional settings into consideration, the Dim Date column mappings in the above image can be translated into the following attribute dimension SQL definition.

CREATE OR REPLACE ATTRIBUTE DIMENSION D0_DIM_DATE  
DIMENSION TYPE TIME  
USING SAMP_TIME_DAY_D  
ATTRIBUTES  
 (CALENDAR_DATE AS TOO_CALENDAR_DATE,
  PER_NAME_MONTH AS T02_PER_NAME_MONTH,
  PER_NAME_QTR AS T03_PER_NAME_QTR,
  PER_NAME_YEAR AS T04_PER_NAME_YEAR,
  DAY_KEY AS T06_ROW_WID,
  BEG_OF_MTH_WID AS T22_BEG_OF_MTH_WID,
  BEG_OF_QTR_WID AS T23_BEG_OF_QTR_WID
  )
    LEVEL CAL_DAY
      LEVEL TYPE DAYS
      KEY TOO_CALENDAR_DATE
      ORDER BY TOO_CALENDAR_DATE
      DETERMINES(T22_BEG_OF_MTH_WID, T23_BEG_OF_QTR_WID,T04_PER_NAME_YEAR)
    LEVEL CAL_MONTH
      LEVEL TYPE MONTHS
      KEY T22_BEG_OF_MTH_WID
      MEMBER NAME T02_PER_NAME_MONTH
      ORDER BY T22_BEG_OF_MTH_WID
      DETERMINES(T23_BEG_OF_QTR_WID,T04_PER_NAME_YEAR)
    LEVEL CAL_QUARTER
      LEVEL TYPE QUARTERS
      KEY T23_BEG_OF_QTR_WID
      MEMBER NAME T03_PER_NAME_QTR
      ORDER BY T23_BEG_OF_QTR_WID
      DETERMINES(T04_PER_NAME_YEAR)
    LEVEL CAL_YEAR
      LEVEL TYPE YEARS
      KEY T04_PER_NAME_YEAR
      MEMBER NAME T04_PER_NAME_YEAR
      ORDER BY T04_PER_NAME_YEAR
    ALL MEMBER NAME 'ALL TIMES';

You may have noticed a different mapping of keys, member names and order by attributes. Let's take CAL_MONTH as an example: it's defined by two columns

  • BEG_OF_MTH_WID: used for joins and ordering
  • PER_NAME_MONTH: used as "display label"

PER_NAME_MONTH, in the YYYY / MM format, could also be used for ordering, but most of the time end users request months in the MM / YYYY format. Being able to set an ordering column different from the member name allows us to properly manage the hierarchy (we'll verify the ordering with a quick query right after creating the hierarchy).

Metadata Modeling in the Database with Analytic Views

Hierarchy

The time hierarchy follows the same rules as the product one; no additional settings are required.

CREATE OR REPLACE HIERARCHY TIME_HIER  
USING D0_DIM_DATE  
  (CAL_DAY CHILD OF
   CAL_MONTH CHILD OF
   CAL_QUARTER CHILD OF
   CAL_YEAR);
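
As promised, a quick way to verify the month ordering is to query the hierarchy at month level; a minimal sketch using the objects just created:

-- Months come back in chronological order thanks to the ORDER BY on BEG_OF_MTH_WID
SELECT member_name
FROM   time_hier
WHERE  level_name = 'CAL_MONTH'
ORDER  BY hier_order;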

Fact Sales

The last step in the journey is the definition of the analytic view on top of the fact table, which as per Oracle's documentation

An analytic view specifies the source of its fact data and defines measures that describe calculations or other analytic operations to perform on the data. An analytic view also specifies the attribute dimensions and hierarchies that define the rows of the analytic view.

The analytic view definition contains the following specifications:

  • The data source: the table or view that will be used for the calculation
  • The columns: which columns from the source objects to use in the calculations
  • The attribute dimensions and hierarchies: defining both the list of attributes and the levels of the analysis
  • The measures: a set of aggregations based on the predefined columns from the data source.

Within the analytic view definition, a materialized view can be defined in order to store aggregated values. This is similar to OBIEE's Logical Table Source setting for aggregates.

Analytic View Definition

For the purpose of the post I'll use SAMP_REVENUE_F which is one of the sources of F0 Sales Base Measures in Sampleapp. The following image shows the logical column mapping.

Metadata Modeling in the Database with Analytic Views

The above mappings can be translated into the following SQL

CREATE OR REPLACE ANALYTIC VIEW F0_SALES_BASE_MEASURES  
USING SAMP_REVENUE_F  
DIMENSION BY  
  (D0_DIM_DATE
    KEY BILL_DAY_DT REFERENCES TOO_CALENDAR_DATE
    HIERARCHIES (
      TIME_HIER DEFAULT),
   D1_DIM_PRODUCT
    KEY PROD_KEY REFERENCES P0_Product_Number
    HIERARCHIES (
      PRODUCT_HIER DEFAULT)
   )
MEASURES  
 (F1_REVENUE FACT REVENUE AGGREGATE BY SUM,
  F10_VARIABLE_COST FACT COST_VARIABLE AGGREGATE BY SUM,
  F11_FIXED_COST FACT COST_FIXED AGGREGATE BY SUM,
  F2_BILLED_QTY FACT UNITS,
  F3_DISCOUNT_AMOUNT FACT DISCNT_VALUE AGGREGATE BY SUM,
  F4_AVG_REVENUE FACT REVENUE AGGREGATE BY AVG,
  F21_REVENUE_AGO AS (LAG(F1_REVENUE) OVER (HIERARCHY TIME_HIER OFFSET 1))
  )
DEFAULT MEASURE F1_REVENUE;  

Some important parts need to be highlighted:

  • USING SAMP_REVENUE_F: defines the analytic view source, in our case the table SAMP_REVENUE_F
  • DIMENSION BY: this section provides the list of dimensions and related hierarchies to take into account
  • KEY BILL_DAY_DT REFERENCES TOO_CALENDAR_DATE: defines the join between the fact table and attribute dimension
  • HIERARCHIES (TIME_HIER DEFAULT): multiple hierarchies can be defined on top of an attribute dimension and used in an analytic view; however, like in OBIEE, only one will be used by default
  • F1_REVENUE FACT REVENUE AGGREGATE BY SUM: defines the measure with its alias, source column and aggregation method
  • F2_BILLED_QTY FACT UNITS: if the aggregation method is not defined, it relies on the default SUM
  • F21_REVENUE_AGO: new metrics can be calculated based on previously defined columns, replicating OBIEE functions like time series. The formula (LAG(F1_REVENUE) OVER (HIERARCHY TIME_HIER OFFSET 1)) calculates the equivalent of OBIEE's AGO function for each level of the hierarchy (a usage sketch follows this list)
  • DEFAULT MEASURE F1_REVENUE: defines the default measure of the analytic view
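
As a quick sketch of how the time-series measure can be consumed (following the same query pattern used in the next section), here is month-level revenue with the prior month's value alongside it:

-- Month-level revenue per Brand with the previous month's value next to it
SELECT time_hier.member_name    AS cal_month,
       product_hier.member_name AS brand,
       f1_revenue,
       f21_revenue_ago          AS revenue_previous_month
FROM   f0_sales_base_measures
WHERE  time_hier.level_name    = 'CAL_MONTH'
AND    product_hier.level_name = 'BRAND'
ORDER  BY time_hier.hier_order,
       product_hier.hier_order;
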
Using Analytic Views

After the analytic view definition, it's time to analyse what benefits end users have when using them. We are going to take a simple example: a query to return the Revenue and Billed Qty per Month and Brand.

Using only the original tables we would have the following SQL

SELECT D.CAL_MONTH,  
  D.BEG_OF_MTH_WID,
  P.BRAND,
  SUM(F.REVENUE) AS F01_REVENUE,
  SUM(F.UNITS)   AS F02_BILLED_QTY
FROM SAMP_REVENUE_F F  
JOIN SAMP_PRODUCTS_D P  
ON (F.PROD_KEY = P.PROD_KEY)  
JOIN SAMP_TIME_DAY_D D  
ON (F.BILL_DAY_DT = D.CALENDAR_DATE)  
GROUP BY D.CAL_MONTH,  
  D.BEG_OF_MTH_WID,
  P.BRAND
ORDER BY D.BEG_OF_MTH_WID,  
  P.BRAND;

The above SQL requires the knowledge of:

  • Aggregation methods
  • Joins
  • Group by
  • Ordering

Even if this is an oversimplification of analytic view usage, you can already see that some knowledge of the underlying data structures and of the SQL language is needed.

Using the analytic views defined above, the query can be written as

SELECT TIME_HIER.MEMBER_NAME AS TIME_SLICE,  
  PRODUCT_HIER.MEMBER_NAME   AS PRODUCT_SLICE,
  F1_REVENUE,
  F2_BILLED_QTY
FROM F0_SALES_BASE_MEASURES  
WHERE TIME_HIER.LEVEL_NAME  IN ('CAL_MONTH')  
AND PRODUCT_HIER.LEVEL_NAME IN ('BRAND')  
ORDER BY TIME_HIER.HIER_ORDER,  
  PRODUCT_HIER.HIER_ORDER;

As you can see, there is a simplification of the SQL statement: no more aggregations, join conditions or group by predicates are needed. All the end user has to know is the analytic view name and the related hierarchies that can be used.

The additional benefit is that if we want to change the level of granularity of the above query we just need to change the WHERE condition. E.g. to have the rollup per Year and LOB we just have to substitute

WHERE TIME_HIER.LEVEL_NAME  IN ('CAL_MONTH')  
AND PRODUCT_HIER.LEVEL_NAME IN ('BRAND')  

with

WHERE TIME_HIER.LEVEL_NAME  IN ('CAL_YEAR')  
AND PRODUCT_HIER.LEVEL_NAME IN ('LOB')  

without having to rewrite aggregations, group by or order by clauses.

Using Analytic Views in DVD

At the beginning of my blog post I wrote that Analytic Views could be useful when used in conjunction with self-service BI tools. Let's have a look at how the end user journey is simplified in the case of Oracle's Data Visualization Desktop.

Without AV the end-user had two options to source the data:

  • Write the complex SQL statement with joining condition, group and order by clause in the SQL editor to retrieve data at the correct level with the related dimension
  • Import the fact table and dimensions as separate datasources and join them together in DVD's project.

Both options require knowledge of SQL and of the joining conditions in order to present correct data. Using Analytic Views, the process is simplified: we just need to create a new source pointing to the database where the analytic views sit.
The next step is to retrieve the necessary columns from the analytic view. Unfortunately, analytic views are not visible in DVD's object explorer (only standard tables and views are shown)

Metadata Modeling in the Database with Analytic Views

We can, however, specify with a simple SQL statement all the information we need, like the Time and Product member slices, the related levels and the order in the hierarchy.

SELECT TIME_HIER.MEMBER_NAME AS TIME_SLICE,  
  PRODUCT_HIER.MEMBER_NAME   AS PRODUCT_SLICE,
  TIME_HIER.LEVEL_NAME AS TIME_LEVEL,
  PRODUCT_HIER.LEVEL_NAME AS PRODUCT_LEVEL,
  TIME_HIER.HIER_ORDER AS TIME_HIER_ORDER,
  PRODUCT_HIER.HIER_ORDER AS PRODUCT_HIER_ORDER,
  F1_REVENUE,
  F2_BILLED_QTY
FROM F0_SALES_BASE_MEASURES  
ORDER BY TIME_HIER.HIER_ORDER,  
  PRODUCT_HIER.HIER_ORDER;

You may have noted that I'm not specifying any WHERE clause for level filtering: as the end user, I want to be able to retrieve all the necessary levels by just changing a filter in my DVD project. After including the above SQL in the datasource definition and amending the measure/attribute definitions, I can start playing with the analytic view data.

Metadata Modeling in the Database with Analytic Views

I can simply include the dimensions' MEMBER_NAME in the graphs together with the measures, and add the LEVEL_NAME columns to the filters. In this way I can change the graph granularity by simply selecting the appropriate LEVEL in the filter selector for each of the dimensions available.

Metadata Modeling in the Database with Analytic Views

One particular thing to notice, however, is that all the data coming from various columns like date, month and year is "condensed" into a single VARCHAR column. In the case of different datatypes (like dates in the time dimension) this prevents the correct usage of some of DVD's capabilities, like time series or trending functions. However, if a particular type of graph is needed for a specific level, either an ad-hoc query or a casting operation can be used.
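
As a hedged sketch of the casting workaround - the TO_DATE format mask below is an assumption and depends on how the day-level member names are actually rendered - a proper DATE column can be exposed for day-level rows only:

-- Hypothetical example: rebuild a real DATE only where the row is at day level
-- (adjust the 'DD-MON-RR' mask to match the member name format in your data)
SELECT time_hier.member_name,
       CASE
         WHEN time_hier.level_name = 'CAL_DAY'
         THEN TO_DATE(time_hier.member_name, 'DD-MON-RR')
       END AS calendar_date,
       f1_revenue
FROM   f0_sales_base_measures;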

Conclusion

In this blog post we analysed Analytic Views, a new feature in Oracle Database 12.2, and how they can be used to "move" metadata modeling to the database level in order to provide an easier query syntax to end users.

Usually metadata modeling is done in reporting tools like OBIEE, which offer an additional set of features on top of those included in analytic views. However, centralized reporting tools like OBIEE are not present everywhere and, with the wave of self-service BI tools, analytic views represent a perfect method of enabling users not familiar with SQL to simply query their enterprise data.

If you are interested in understanding more about analytic views or metadata modeling, don't hesitate to contact us!
If you want to improve the SQL skills of your company workforce, check out our recently launched SQL for beginners training!

Categories: BI & Warehousing
