BI & Warehousing

ETL Offload with Spark and Amazon EMR - Part 2 - Code development with Notebooks and Docker

Rittman Mead Consulting - Fri, 2016-12-16 03:00

In the previous article I gave the background to a project we did for a client, exploring the benefits of Spark-based ETL processing running on Amazon's Elastic Map Reduce (EMR) Hadoop platform. The proof of concept we ran was on a very simple requirement, taking inbound files from a third party, joining them to some reference data, and then making the result available for analysis. The primary focus was proving the end-to-end concept, with future iterations focussing on performance and design optimisations.

Here we'll see how I went about building up the ETL process.

Processing Overview

The processing needed to iterate over a set of files in S3, and for each one (a sketch of the full loop in PySpark follows this list):

  • Load the file from S3
  • Determine the region from the filename, and add it as a column to the data
  • Deduplicate it
  • Write the duplicates to a separate file
  • Load the sites reference data
  • Extract the domain from the URL string
  • Join the facts to the sites data on domain
  • Write the resulting file back to S3
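
Below is a minimal sketch of how these steps hang together in PySpark, using the same libraries as the rest of this article (boto, and Spark 1.6 with the Databricks CSV reader). It is not the actual project code: the bucket and prefix names echo the illustrative ones used later on, and the step that writes duplicates out to a separate file is omitted for brevity.

import os
import boto
import pyspark
from pyspark.sql.functions import lit, udf
from urlparse import urlsplit

# Assumes the AWS credential and --packages environment setup shown later in
# this article (hadoop-aws, spark-csv) has already been done
sc = pyspark.SparkContext('local[*]')
sqlContext = pyspark.SQLContext(sc)

bucket_name = 'foobar-bucket'
bucket = boto.connect_s3().get_bucket(bucket_name)

# The sites reference data only needs loading once, outside the loop
sites_df = sqlContext.read.load('s3n://%s/sites.csv' % bucket_name,
                                format='com.databricks.spark.csv',
                                header='true', inferSchema='true')

# UDF to derive the domain from a URL
udfgetDomain = udf(lambda u: urlsplit(u).netloc)

for key in bucket.list(prefix='source_files/'):
    # Load the inbound file from S3
    acme_df = sqlContext.read.load('s3n://%s/%s' % (bucket_name, key.name),
                                   format='com.databricks.spark.csv',
                                   header='true', inferSchema='true')

    # Determine the country from the filename (e.g. acme_GB_20160803_100000.csv)
    # and add it as a column
    country = os.path.split(key.name)[1].split('_')[1]
    acme_df = acme_df.withColumn('country', lit(country))

    # Deduplicate on url, and derive the domain to join on
    deduped_df = acme_df.dropDuplicates(['url'])
    deduped_df = deduped_df.withColumn('netloc', udfgetDomain(deduped_df.url))

    # Join to the sites reference data and write the result back to S3
    merged_df = deduped_df.join(sites_df, deduped_df.netloc == sites_df.SITE, 'left_outer')
    merged_df.write.save('s3n://%s/acme_enriched/%s' % (bucket_name, os.path.split(key.name)[1]),
                         format='com.databricks.spark.csv', mode='overwrite')

The rest of this article walks through each of these steps individually, showing the actual code and its output.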

Once the data is processed and landed back to S3, we can run analytics on it. See subsequent articles for discussion of Redshift vs in-place querying with tools such as Presto.

Ticking All The Cool-Kid Boxes - Spark AND Notebooks AND Docker!

Whilst others in Rittman Mead have done lots of work with Spark, I myself was new to it, and needed a sandpit in which I could flail around without causing any real trouble. Thus I set up a nice self-contained development environment on my local machine, using Docker to provision and host it, and Jupyter Notebooks as the interface.

Notebooks

In a world in which it seems that there are a dozen cool new tools released every day, Interactive Notebooks are for me one of the most significant of recent times for any developer. They originate in the world of data science, where taking the 'science' bit at its word, data processing and exploration is written in a self-documenting manner. It makes it possible to follow how and why code was written, what the output at each stage was -- and to run it yourself too. By breaking code into chunks it makes it much easier to develop as well, since you can rerun and perfect each piece before moving on.

Notebooks are portable, meaning that you can work with them in one system, and then upload them for others to learn from and even import to run on their own systems. I've shared a simplified version of the notebook that I developed for this project on gist here, and you can see an embedded version of it at the end of this article.

The two most common notebook platforms are Apache Zeppelin and Jupyter Notebooks (previously known as IPython Notebooks). Jupyter is the one I've used previously, and stuck with again here. To read more about notebooks and see them in action, see my previous blog posts here and here.

Docker

Plenty's been written about Docker. In a nutshell, it is a way to provision and host a set of self-contained software. It takes the idea of a virtual machine (such as VMWare, or VirtualBox), but without having to install an OS, and then the software, and then configure it all yourself. You simply take a "Dockerfile" that someone has prepared, and run it. You can create copies, or throw it away and start again, from a single command. I ran Docker on my Mac through Kitematic, and natively on my home server.

There are prebuilt Docker configuration files for lots of software (including Oracle and OBIEE!), and I found one that includes Spark, PySpark, and Jupyter - perfect!

To launch it, you simply enter:

docker run -d -p 18888:8888 jupyter/all-spark-notebook

This downloads the necessary Docker image and its dependencies - you don't need anything locally first, except Docker itself.

I ran it with an additional flag, -v, configuring it to use a folder on my local machine to store the work that I created. By default all files reside within the Docker container itself - and get deleted when you delete the container.

docker run -d -p 18888:8888 -v /Users/rmoff/all-spark-notebook:/home/jovyan/work jupyter/all-spark-notebook

You can also run the container with an additional flag, GRANT_SUDO, so that the guest user can run sudo commands within it. To do this include -e GRANT_SUDO=yes --user root:

docker run -d -p 18888:8888 -e GRANT_SUDO=yes --user root -v /Users/rmoff/all-spark-notebook:/home/jovyan/work jupyter/all-spark-notebook

With the Docker container running, you can access Jupyter notebooks on the port exposed in the command used to launch it (18888).

Getting Started with Jupyter

From Jupyter's main page you can see the files within the main folder (see above for how you can map this to a local folder on the machine hosting Docker). Using the New menu in the top-right you can create:

  • Folders and text files
  • A terminal
  • A notebook, running under one of several different 'Kernels' (host interpreters and environments)

The ability to run a terminal from Jupyter is very handy - particularly on Docker. Docker by its nature isn't really designed for interaction within the container itself - that's rather the point of Docker, in that it provisions and configures all the software for you. You can use Docker to run a bash shell directly, but it's more fiddly than just using the Jupyter Terminal.

I used a Python 2 notebook for my work; with this Docker image you also have the option of Python 3, Scala, and R.

Developing the Spark job

With my development environment up and running, I set to writing the Spark job. Because I'm already familiar with Python I took advantage of PySpark. Below I describe the steps in the processing and how I achieved them.

Environment Preparation

Define AWS parameters:

access_key='XXXXXXXXXXXXXXXXXX'  
secret='YYYYYYYYYYYYYYYYY'  
bucket_name='foobar-bucket'  

Set up the PySpark environment, including necessary JAR files for accessing S3 from Spark:

import os  
os.environ['AWS_ACCESS_KEY_ID'] = access_key  
os.environ['AWS_SECRET_ACCESS_KEY'] = secret  
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python2'  
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.hadoop:hadoop-aws:2.7.1,com.amazonaws:aws-java-sdk-pom:1.10.34,com.databricks:spark-csv_2.11:1.3.0 pyspark-shell'  

Create a spark context:

import pyspark  
sc = pyspark.SparkContext('local[*]')  
sqlContext = pyspark.SQLContext(sc)  

Import Python libraries

from pyspark.sql.functions import udf  
from pyspark.sql.functions import lit  
import boto  
from urlparse import urlsplit  

Note that to install python libraries not present on the Docker image (such as boto, which is used for accessing AWS functionality from within Python) you can run from a Jupyter Terminal:

/opt/conda/envs/python2/bin/pip install boto

On other platforms the path to pip will vary, but the install command is the same.

Loading Data from S3

The source data comes from an S3 "bucket", on which there can be multiple "keys". Buckets and keys roughly translate to "disk drive" and "file".

We use the boto library to interact with S3 to find a list of files ('keys') matching the pattern of source files that we want to process.

Connect to the bucket:

conn_s3 = boto.connect_s3()  
bucket = conn_s3.get_bucket(bucket_name)  

Iterate over the bucket contents

This bit would drive iterative processing over multiple input files; for now it just picks the last file on the list (acme_file getting set on each iteration and so remaining set after the loop).

contents=bucket.list(prefix='source_files/')  
for f in contents:  
    print f.name
    print f.size
    acme_file = f.name
print "\n\n--\nFile to process: %s" % acme_file  
Read the CSV from S3 into Spark dataframe

The Docker image I was using was running Spark 1.6, so I was using the Databricks CSV reader; in Spark 2 this is now available natively. The CSV file is loaded into a Spark data frame. Note that Spark is reading the CSV file directly from an S3 path.

full_uri = "s3n://{}/{}".format(bucket_name, acme_file)  
print full_uri  
s3n://foobar-bucket/source_files/acme_GB_20160803_100000.csv
acme_df = sqlContext.read.load(full_uri,  
                                  format='com.databricks.spark.csv',
                                  header='true',
                                  inferSchema='true')
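
For reference, if you're on Spark 2 the same load can be done with the built-in CSV reader, with no need for the Databricks package. A rough equivalent, assuming the SparkSession that Spark 2 creates for you as spark, would be:

# Spark 2.x equivalent using the native CSV reader
acme_df = spark.read.csv(full_uri, header=True, inferSchema=True)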

acme_df.printSchema()  
root
 |-- product: string (nullable = true)
 |-- product_desc: string (nullable = true)
 |-- product_type: string (nullable = true)
 |-- supplier: string (nullable = true)
 |-- date_launched: timestamp (nullable = true)
 |-- position: string (nullable = true)
 |-- url: string (nullable = true)
 |-- status: string (nullable = true)
 |-- reject_reason: string (nullable = true)

The above shows the schema of the dataframe; Spark has inferred this automagically from the column headers (for the column names), and then the data types within (note that it has correctly detected a timestamp in the date_launched column).

Add country column to data frame

The filename of the source data includes a country field as part of it. Here we use this regular expression to extract it:

filename=os.path.split(acme_file)[1]  
import re  
m=re.search('acme_([^_]+)_.+$', filename)  
if m is None:  
    country='NA'
else:  
    country=m.group(1)

print "Country determined from filename '%s' as : %s" % (filename,country)  
Country determined from filename 'acme_GB_20160803_100000.csv' as : GB

With the country stored in a variable, we add it as a column to the data frame:

Note that the withColumn function requires a Column value, which we create here using the PySpark lit function that was imported earlier on.

acme_df=acme_df.withColumn('country',lit(country))

acme_df.printSchema()  
root
 |-- product: string (nullable = true)
 |-- product_desc: string (nullable = true)
 |-- product_type: string (nullable = true)
 |-- supplier: string (nullable = true)
 |-- date_launched: timestamp (nullable = true)
 |-- position: string (nullable = true)
 |-- url: string (nullable = true)
 |-- status: string (nullable = true)
 |-- reject_reason: string (nullable = true)
 |-- country: string (nullable = false)

Note the new column added to the end of the schema.

Deduplication

Now that we've imported the file, we need to deduplicate it to remove entries with the same value for the url field. Here I create a second dataframe based on a deduplication of the first, using the PySpark native function dropDuplicates:

acme_deduped_df = acme_df.dropDuplicates(['url'])  

For informational purposes we can see how many records are in the two dataframes, and determine how many duplicates there were:

orig_count = acme_df.count()  
deduped_count = acme_deduped_df.count()  
print "Original count: %d\nDeduplicated count: %d\n\n" % (orig_count,deduped_count)  
print "Number of removed duplicate records: %d" % (orig_count - deduped_count)  
Original count: 97974
Deduplicated count: 96706


Number of removed duplicate records: 1268

Deriving Domain from URL

One of the sets of reference data is information about the site on which the product was viewed. To bring these sets of attributes into the main dataset we join on the domain itself. To perform this join we need to derive the domain from the URL. We can do this using the urlsplit function from Python's urlparse library, as seen in this example:

sample_url = 'https://www.rittmanmead.com/blog/2016/08/using-apache-drill-with-obiee-12c/'

print sample_url  
print urlsplit(sample_url).netloc  
https://www.rittmanmead.com/blog/2016/08/using-apache-drill-with-obiee-12c/
www.rittmanmead.com

We saw above that to add a column to the dataframe the withColumn function can be used. However, to add a column that's based on another (rather than a literal, which is what the country column added above was) we need to use the udf function. This generates the necessary Column field based on the urlsplit output for the associated url value.

First we define our own function, which simply applies urlsplit to the value passed to it:

def getDomain(value):  
    return urlsplit(value).netloc

and then a UDF based on it:

udfgetDomain = udf(getDomain)  

Finally, apply this to a third version of the dataframe:

acme_deduped_df_with_netloc = acme_deduped_df.withColumn("netloc", udfgetDomain(acme_deduped_df.url))  

Joining to Second Dataset

Having prepared the primary dataset, we'll now join it to the reference data. The source of this is currently an Oracle database. For simplicity we're working with a CSV dump of the data, but PySpark supports the option to connect to sources with JDBC so we could query it directly if required.
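
As an aside, reading the sites reference data straight from Oracle over JDBC would look roughly like the sketch below. The connection details here are purely illustrative, and the Oracle JDBC driver JAR would need to be made available to Spark (for example via the --jars submit argument or the driver classpath):

sites_df = sqlContext.read.format('jdbc').options(
               url='jdbc:oracle:thin:@//dbhost:1521/ORCL',
               dbtable='REF_SCHEMA.SITES',
               user='ref_user',
               password='ref_password',
               driver='oracle.jdbc.OracleDriver').load()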

First we import the sites reference data CSV:

sites_file = "s3n://{}/{}".format('foobar-bucket', 'sites.csv')  
sites_df = sqlContext.read.load(sites_file,  
                                  format='com.databricks.spark.csv',
                                  header='true',
                                  inferSchema='true')

Then some light data cleansing with the filter function to remove blank SITE entries, and blank SITE_RETAIL_TYPE entries:

sites_pruned_df = sites_df.filter("NOT (SITE ='' OR SITE_RETAIL_TYPE = '')")  

Now we can do the join itself. This joins the original dataset (acme_deduped_df_with_netloc) with the sites reference data (sites_pruned_df), using a left outer join.

merged_df = acme_deduped_df_with_netloc.join(sites_pruned_df,acme_deduped_df_with_netloc.netloc == sites_pruned_df.SITE, 'left_outer')  

Using the filter function and show we can inspect the dataset for matches and misses:

First 10 matched:

merged_df.filter(merged_df.ID.isNotNull()).select('date_launched','url','netloc','ID','SITE','SITE_RETAIL_TYPE').show(10)  

First 10 unmatched:

merged_df.filter(merged_df.ID.isNull()).select('date_launched','url','netloc','ID','SITE','SITE_RETAIL_TYPE').show(10)  

Write Back to S3

The finished dataset is written back to S3. As before, we're using the Databricks CSV writer here, but in Spark 2 we would do it natively:

acme_enriched_filename='acme_enriched/%s' % filename.replace('.csv','')  
full_uri = "s3n://{}/{}".format(bucket_name, acme_enriched_filename)  
print 'Writing enriched acme data to %s' % full_uri  
merged_df.write.save(path=full_uri,  
                     format='com.databricks.spark.csv',
                     header='false',
                     nullValue='null',
                     mode='overwrite')
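
As with the read, on Spark 2 the write could be done with the native CSV writer instead of the Databricks package; a rough equivalent of the above would be:

merged_df.write.csv(path=full_uri, header=False, nullValue='null', mode='overwrite')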

Summary

With the above code written, I could process input files in a couple of minutes per 30MB file. Bear in mind two important constraints to this performance:

  1. I was working with data residing up in the Amazon Cloud, with the associated network delay in transferring to and from it

  2. The processing was taking place on a single node Spark deployment (on my laptop, under virtualisation), rather than the multiple-node configuration typically seen.

The next steps, as we'll see in the next article, were to port this code up to Amazon Elastic Map Reduce (EMR). Stay tuned!

Footnote: Notebooks FTW!


Whilst I've taken the code and written it out above more in the form of a blog post, I could have actually just posted the Notebook itself, and it wouldn't have needed much more explanation. Here it is, along with some bonus bits on using S3 from python:

Categories: BI & Warehousing

ETL Offload with Spark and Amazon EMR - Part 1

Rittman Mead Consulting - Thu, 2016-12-15 03:00

We recently undertook a two-week Proof of Concept exercise for a client, evaluating whether their existing ETL processing could be done faster and more cheaply using Spark. They were also interested in whether something like Redshift would provide a suitable data warehouse platform for them. In this series of blog articles I will look at how we did this, and what we found.

Background

The client has an existing analytics architecture based primarily around Oracle database, Oracle Data Integrator (ODI), Oracle GoldenGate, and Oracle Business Intelligence Enterprise Edition (OBIEE), all running on Amazon EC2. The larger architecture in the organisation is all AWS based too.

Existing ETL processing for the system in question is done using ODI, loading data daily into a partitioned Oracle table, with OBIEE providing the reporting interface.

There were two aspects to the investigation that we did:

  • Primarily, what would an alternative platform for the ETL look like? With lots of coverage recently of the concept of "ETL offloading" and "Apache-based ETL", the client was keen to understand how they might take advantage of this

    Within this, key considerations were:

    • Cost
    • Scalability
    • Maintenance
    • Fit with existing and future architectures
  • The second aspect was to investigate whether the performance of the existing reporting could be improved. Despite having multiple years of data in Oracle, queries were too slow to be usable for anything beyond a period of a few days.

Oracle licences were a sensitive point for the client, who were keen to reduce - or at least, avoid increased - costs. ODI for Big Data requires an additional licence, and so was not in scope for the initial investigation.

Data and Processing

The client uses their data to report on the level of requests for different products, including questions such as:

  • How many requests were there per day?
  • How many requests per product type in a given period?
  • For a given product, how many requests were there, from which country?

Data volumes were approximately 50MB, arriving in batch files every hour. Reporting requirements were previous day and before only. Being able to see data intra-day would be a bonus but was not a requirement.

High Level Approach

Since the client already uses Amazon Web Services (AWS) for all its infrastructure, it made sense to remain in the AWS realm for the first round of investigation. We broke the overall requirement down into pieces, so as to understand (a) the most appropriate tool at each point and (b) the toolset with best overall fit. A very useful reference for an Amazon-based big data design is the presentation Big Data Architectural Patterns and Best Practices on AWS. Even if you're not running on AWS, the presentation has some useful pointers for things like where to be storing your data based on volumes, frequency of access, etc.

Data Ingest

The starting point for the data was Amazon's storage service - S3, in which the data files in CSV format are landed by an external process every hour.

Processing (Compute)

Currently the processing is done by loading the external data into a partitioned Oracle table, and resolving dimension joins and de-duplication at query time.

Taking away any assumptions, other than a focus on 'new' technologies (and a bias towards AWS where appropriate), we considered:

  • Switch out Oracle for Redshift, and resolve the joins and de-duplication there
    • Loading the data to Redshift would be easy, but would be switching one RDBMS-based solution for another. Part of the aim of the exercise was to review a broader solution landscape than this.
  • Use Hadoop-based processing, running on Elastic Map Reduce (EMR):

    • Hive QL to process the data on S3 (or HDFS on EMR)
      • Not investigated, because it provides none of the error handling etc that Spark would, and Spark has SparkSQL for any work that needs doing in SQL.
    • Pig
      • Still used, but 'old' technology, somewhat esoteric language, and superseded by Spark
    • Spark
      • Support for several languages including commonly-used ones such as Python
      • Gaining increasing levels of adoption in the industry
      • Opens up a rich eco-system of processing possibilities with related projects for machine learning and graph processing.

We opted to use Spark to process the files, joining them to the reference data, and carrying out de-duplication. For a great background and discussion on Spark and its current place in data architectures, have a listen to this podcast.

Storage

The output from Spark was written back to S3.

Analytics

With the processed data in S3, we evaluated two options here:

  • Load it to Redshift for query
  • Query in-place with a SQL-on-Hadoop engine such as Presto or Impala
    • With the data at rest on S3, Amazon's Athena is also of interest here, but was released after we carried out this particular investigation.

The presumption was that OBIEE would continue to provide the front-end to the analytics. Oracle's Data Visualization Desktop tool was also of interest.

In the next post we'll see the development environment that we used for prototyping. Stay tuned!

Categories: BI & Warehousing

Source Control and Automated Code Deployment Options for OBIEE

Rittman Mead Consulting - Wed, 2016-12-14 03:00

It's Monday morning. I've arrived at a customer site to help them - ironically enough - with automating their OBIEE code management. But, on arrival, I'm told that the OBIEE team can't meet with me because someone did a release the previous Friday and has now gone on holiday - the wrong code was released, and nobody knows which version. All hands on deck, panic stations!

This actually happened to me, and in recent months too. In this kind of situation hindsight gives us 20:20 vision, and of course there shouldn't be a single point of failure, of course code should be under version control, of course it should be automated to reduce the risk of problems during deployments. But in practice, these things often don't get done - and it's understandable why. In the very early days of a project, it will be a manual process because that's what is necessary as people get used to the tools and technology. As time goes by, project deadlines come up, and tasks like this are seen as "zero sum" - sure we can automate it, but we can also continue doing it manually and things will still get done, code will still get released. After a while, it's just accepted as how things are done. In effect, it is technical debt - and this is your reminder that debt has to be paid, sooner or later :)

I'll not pretend that managing OBIEE code in source control, and automating code deployments, is straightforward. But, it is necessary, so in this post I'll walk through why you should be doing it, and then importantly how.

Why Source Control?

Do we really need source control for OBIEE? After all, what's wrong with the tried-and-tested method of sticking it all in a folder like this?

sdlc01.png

What's wrong with this? What's right with this? Oh lack of source control, let me count the number of ways that I doth hate thee:

  1. No audit trail of who changed something
  2. No audit of what was changed, and when
  3. No enforceable naming standards for versions
  4. No secure way of identifying deployment candidates
  5. No distributed method for sharing code (don't tell me that a network share counts!)
  6. No way of reliably identifying the latest version of code

These range from the immediately practical through to the slightly more abstract but necessary in a mature deployment.

Of immediate impact is the simple ability to identify the latest version of code on which to make new changes. Download the copy from the live server? Really? No. If you're tracking your versions accurately and reliably then you simply pull the latest version of code from there, in the knowledge that it is the version that is live. No monkeying around trying to figure out if it really is (just because it's called "PROD-091216.rpd", how do you know that's actually what got released to Production? And was that on 12th December or 9th September? Who knows!).

Longer term, having a secure and auditable code line simply makes it easier and less risky to manage. It also gives you the ability to work with it in a much more flexible manner, such as genuine concurrent development by multiple developers against the RPD. You can read more about this in my presentation here.

Which Source Control?

I don't care. Not really. So long as you are using source control, I am happy.

For preference, I always advocate using git. It is a modern platform, with strong support from many desktop clients (SourceTree is my favourite, along with the commandline too, natch). Git is decentralised, meaning that you can commit and branch code locally on your own machine without having to be connected to a server. It supports a powerful fork and pull process too, which is part of the reason it has almost universal usage within the open source world. The most well known of git platforms is github, which in effect provides git as a Platform-as-a-service (PaaS), in a similar fashion to Bitbucket too. You can also run git on its own locally, or more pragmatically, with gitlab.

But if you're using Subversion (SVN), Perforce, or whatever - that's fine. The key thing is that you understand how to use it, and that it is supported within your organisation. For simple source control, pretty much all the common platforms work just fine. If you get onto more advanced use, such as feature-branches and concurrent development, you may find it worth ensuring that your chosen platform supports the workflow that you adopt. Even then, whilst I'd chose git for preference, at Rittman Mead we've helped clients develop very powerful concurrent development processes with Subversion providing the underlying source control.

What Goes into Source Control? Part 1

So you've drunk the Source Control koolaid, and accepted that really there is no excuse not to use it. So what do you put into it? The RPD? The OBIEE 12c BAR file? What if you're still on OBIEE 11g? The answer here depends partially on how you are planning to manage code deployment in your environment. For a fully automated solution, you may opt to store code in a more granular fashion than if you are simply migrating full BAR files each time. So, read on to understand about code deployment, and then we'll revisit this question again after that.

How Do You Deploy Code Changes in OBIEE?

The core code artefacts are the same between OBIEE 11g and OBIEE 12c, so I'll cover both in this article, pointing out as we go any differences.

The biggest difference with OBIEE 12c is the concept of the "Service Instance", in which the pieces for the "analytical application" are clearly defined and made portable. These components are:

  • Metadata model (RPD)
  • Presentation Catalog ("WebCat"), holding all analysis and dashboard definitions
  • Security - Application Roles and Policy grants, as well as OBIEE front-end privilege grants

Part of this is laying the foundations for what has been termed "Pluggable BI", in which 'applications' can be deployed with customisations layered on top of them. In the current (December 2016) version of OBIEE 12c we have just the Single Service Instance (ssi). Service Instances can be exported and imported to BI Archive files, known as BAR files.

The documentation for OBIEE environment migrations (known as "T2P" - Test to Production) in 12c is here. Hopefully I won't be thought too rude for saying that there is scope for expanding on it, clarifying a few points - and perhaps making more of the somewhat innocuous remark partway down the page:

PROD Service Instance metadata will be replaced with TEST metadata.

Hands up who reads the manual fully before using a product? Hands up who is going to get a shock when they destroy their Production presentation catalog after importing a service instance?...

Let's take walk through the three main code artefacts, and how to manage each one, starting with the RPD.

The RPD

The complication with deployments of the RPD is that the RPD differs between environments because of different connection pool details, and occasionally repository variable values too.

If you are not changing connection pool passwords between environments, you probably should be - it's a security risk not to have different passwords. And if you are changing anything else in your RPD (e.g. making actual model changes) between environments, you probably shouldn't be - it's bad software development practice to make code changes other than in your development environment. Perhaps you've valid reasons for doing it... perhaps not. But bear in mind that many test processes and validations are based on the premise that code will not change after being moved out of dev.

With OBIEE 12c, there are two options for managing deployment of the RPD:

  1. BAR file deploy and then connection pool update
  2. Offline RPD patch with connection pool updates, and then deploy
    • This approach is valid for OBIEE 11g too

RPD Deployment in OBIEE 12c - Option 1

This is based on the service instance / BAR concept. It is therefore only valid for OBIEE 12c.

  1. One-off setup : Using listconnectionpool to create a JSON connection pool configuration file per target environment. Store each of these files in source control.
  2. Once code is ready for promotion from Development, run exportServiceInstance to create a BAR file. Commit this BAR file to source control

    /app/oracle/biee/oracle_common/common/bin/wlst.sh <<EOF
    exportServiceInstance('/app/oracle/biee/user_projects/domains/bi/','ssi','/home/oracle','/home/oracle')
    EOF
    

  3. To deploy the updated code to the target environment:

    1. Checkout the BAR from source control
    2. Deploy it with importServiceInstance, ensuring that the importRPD flag is set.

      /app/oracle/biee/oracle_common/common/bin/wlst.sh <<EOF
      importServiceInstance('/app/oracle/biee/user_projects/domains/bi','ssi','/home/oracle/ssi.bar',true,false,false)
      EOF
      
    3. Run updateConnectionPool using the configuration file from source control for the target environment to set the connection pool credentials

      /app/oracle/biee/user_projects/domains/bi/bitools/bin/datamodel.sh updateconnectionpool -C ~/prod_cp.json -U weblogic -P Admin123 -SI ssi
      

      Note that your OBIEE system will not be able to connect to source databases to retrieve data until you update the connection pools.

    4. The BI Server should pick up the new RPD after a few minutes. You can force this by restarting the BI Server, or using "Reload Metadata" from OBIEE front end.

Whilst you can also create the BAR file with includeCredentials, you wouldn't use this for migration of code between environments - because you don't have the same connection pool database passwords in each environment. If you do have the same passwords then change them now - this is a big security risk.

The above BAR approach works fine, but be aware that if the deployed RPD is activated on the BI Server before you have updated the connection pools (step 3 above) then the BI Server will not be able to connect to the data sources and your end users will see an error. This approach is also based on storing the BAR file as a whole in source control, when for preference we'd store the RPD as a standalone binary if we want to be able to do concurrent development with it.

RPD Deployment in OBIEE 12c - Option 2 (also valid for OBIEE 11g)

This approach takes the RPD on its own, and takes advantage of OBIEE's patching capabilities to prepare RPDs for the target environment prior to deployment.

  1. One-off setup: create an XUDML patch file for each target environment.

    Do this by:

    1. Take your development RPD (e.g. "DEV.rpd"), and clone it (e.g. "PROD.rpd")
    2. Open the cloned RPD (e.g. "PROD.rpd") offline in the Administration Tool. Update it only for the target environment - nothing else. This should be all connection pool passwords, and could also include connection pool DSNs and/or users, depending on how your data sources are configured. Save the RPD.
    3. Using comparerpd, create an XUDML patch file for your target environment:

      /app/oracle/biee/user_projects/domains/bi/bitools/bin/comparerpd.sh \
      -P Admin123 \
      -W Admin123 \
      -G ~/DEV.rpd \
      -C ~/PROD.rpd \
      -D ~/prod_cp.xudml
      
    4. Repeat the above process for each target environment

  2. Once code is ready for promotion from Development:

    1. Extract the RPD

      • In OBIEE 12c use downloadrpd to obtain the RPD file

        /app/oracle/biee/user_projects/domains/bi/bitools/bin/datamodel.sh \
        downloadrpd \
        -O /home/oracle/obiee.rpd \
        -W Admin123 \
        -U weblogic \
        -P Admin123 \
        -SI ssi
        
      • In OBIEE 11g copy the file from the server filesystem

    2. Commit the RPD to source control

  3. To deploy the updated code to the target environment:

    1. Checkout the RPD from source control
    2. Prepare it for the target environment by applying the patch created above

      1. Check out the XUDML patch file for the appropriate environment from source control
      2. Apply the patch file using biserverxmlexec:

        /app/oracle/biee/user_projects/domains/bi/bitools/bin/biserverxmlexec.sh \
        -P Admin123 \
        -S Admin123 \
        -I prod_cp.xudml \
        -B obiee.rpd \
        -O /tmp/prod.rpd
        
    3. Deploy the patched RPD file

      • In OBIEE 12c use uploadrpd

        /app/oracle/biee/user_projects/domains/bi/bitools/bin/datamodel.sh \
        uploadrpd \
        -I /tmp/prod.rpd \
        -W Admin123 \
        -U weblogic \
        -P Admin123 \
        -SI ssi \
        -D
        

        The RPD is available straightaway. No BI Server restart is needed.

      • In OBIEE 11g use WLST's uploadRepository to programmatically do this, or manually from EM.

        After deploying the RPD in OBIEE 11g, you need to restart the BI Server.

This approach is the best (only) option for OBIEE 11g. For OBIEE 12c I also prefer it as it is 'lighter' than a full BAR, more solid in terms of connection pools (since they're set prior to deployment, not after), and it enables greater flexibility in terms of RPD changes during migration since any RPD change can be encompassed in the patch file.

Note that the OBIEE 12c product manual states that uploadrpd/downloadrpd are for:

"...repository diagnostic and development purposes such as testing, only ... all other repository development and maintenance situations, you should use BAR to utilize BAR's repository upgrade and patching capabilities and benefits.".

Maybe in the future the BAR capabilities will extend beyond what they currently do - but as of now, I've yet to see a definitive reason to use them and not uploadrpd/downloadrpd.

The Presentation Catalog ("WebCat")

The Presentation Catalog stores the definition of all analyses and dashboards in OBIEE, along with supporting objects including Filters, Conditions, and Agents. It differs significantly from the RPD when it comes to environment migrations. The RPD can be seen in more traditional software development lifecycle terms, since it is built and developed in Development, and when deployed to a subsequent environment it overwrites in its entirety what is currently there. However, the Presentation Catalog is not so simple.

Commonly, content in the Presentation Catalog is created by developers as part of 'pre-canned' reporting and dashboard packs, to be released along with the RPD to end-users. Where things get difficult is that the Presentation Catalog is also written to in Production. This can include:

  • User-developed content saved in one (or both) of:
    • My Folders
    • Shared, e.g. special subfolders per department for sharing common reports outside of "gold standard" ones
  • User's profile data, including timezone and language settings, saved dashboard customisations, preferred delivery devices, and more
  • System configuration data, such as default formatting for specific columns, bookmarks, etc

In your environment you may not permit some of these (for example, disabling access to My Folders is not uncommon). But almost certainly, you'll want your users to be able to persist their environment settings between sessions.

The impact of this is that the Presentation Catalog becomes complex to manage. We can't just overwrite the whole catalog when we come to deployment in Production, because if we do so all of the above listed content will get deleted. And that won't make us popular with users, at all.

So how do we bring any kind of mature software development practice to the Presentation Catalog, assuming that we have report development being done in non-Production environments?

We have two possible approaches:

  1. Deploy the full catalog into Production each time, but backup first existing content that we don't want to lose, and restore it after the deploy
    • Fiddly, but means that we don't have to worry about which bits of the catalog go in source control - all of it does. This has consequences for if we want to do branch-based development with source control, in that we can't. This is because the catalog will exist as a single binary (whether BAR or 7ZIP), so there'll be no merging with the source control tool possible.
    • Risky, if we forget to backup the user content first, or something goes wrong in the process
    • A 'heavier' operation involving the whole catalog and therefore almost certainly requiring the catalog to be in maintenance-mode (read only).
  2. Deploy the whole catalog once, and then do all subsequent deploys as deltas (i.e. only what has changed in the source environment)
    • Less risky, since not overwriting whole target environment catalog
    • More flexible, and more granular so easier to track in source control (and optionally do branch-based development).
    • Requires more complex automated deployment process.

Both methods can be used with OBIEE 11g and 12c.

Presentation Catalog Migration in OBIEE - Option 1

In this option, the entire Catalog is deployed, but content that we want to retain is backed up first, and then re-instated after the full catalog deploy.

First we take the entire catalog from the source environment and store it in source control. With OBIEE 12c this is done using the exportServiceInstance WLST command (see the example with the RPD above) to create a BAR file. With OBIEE 11g, you would create an archive of the catalog at its root using 7-zip/tar/gzip (but not winzip).

When ready to deploy to the target environment, we first backup the folders that we want to preserve. Which folders might we want to preserve?

  1. /users - this holds both objects that users have created and saved in My Folders, as well as user profile information (including timezone preferences, delivery profiles, dashboard customisations, and more)
  2. /system - this holds system internal settings, which include things such as authorisations for the OBIEE front end (/system/privs), as well as column formatting defaults (/system/metadata), global variables (/system/globalvariables), and bookmarks (/system/bookmarks).
    • See note below regarding the /system/privs folder
  3. /shared/<…>/<…> - if users are permitted to create content directly in the Shared area of the catalog you will want to preserve this. A valid use of this is for teams to share content developed internally, instead of (or prior to) it being released to the wider user community through a more formal process (the latter being often called 'gold standard' reports).

Regardless of whether we are using OBIEE 11g or 12c we create a backup of the folders identified by using the Archive functionality of OBIEE. This is NOT just creating a .zip file of the file system folders - which is completely unsupported and a bad idea for catalog management, except at the very root level. Instead, the Archive functionality creates a .catalog file which can be stored in source control, and unarchived back into OBIEE to restore content.

You can create OBIEE catalog archives in one of four ways, which are also valid for importing the content back into OBIEE too:

  1. Manually, through OBIEE front-end
  2. Manually, through Catalog Manager GUI
  3. Automatically, through Catalog Manager CLI (runcat.sh)

    • Archive:

      runcat.sh \
      -cmd archive  \
      -online http://demo.us.oracle.com:7780/analytics/saw.dll \
      -credentials /tmp/creds.txt \
      -folder "/shared/HR" \
      -outputFile /home/oracle/hr.catalog
      
    • Unarchive:

      runcat.sh \
      -cmd unarchive \
      -inputFile hr.catalog \
      -folder /shared \
      -online http://demo.us.oracle.com:7780/analytics/saw.dll  \
      -credentials /tmp/creds.txt \
      -overwrite all
      
  4. Automatically, using the WebCatalogService API (copyItem2 / pasteItem2).

Having taken a copy of the necessary folders, we then deploy the entire catalog (with the changes from development in it) taken from source control. Deployment is done in OBIEE 12c using importServiceInstance. In OBIEE 11g it's done by taking the server offline, and replacing the catalog with the 7zip filesystem archive of the entire catalog.

Finally, we restore the folders previously saved, using the Unarchive function to import the .catalog files.

Presentation Catalog Migration in OBIEE - Option 2

In this option we take a more granular approach to catalog migration. The entire catalog from development is only deployed once, and after that only .catalog files from development are put into source control and then deployed to the target environment.

As before, the entire catalog is initially taken from the development environment, and stored in source control. With OBIEE 12c this is done using the exportServiceInstance WLST command (see the example with the RPD above) to create a BAR file. With OBIEE 11g, you would create an archive of the catalog at its root using 7zip.

Note that this is only done once, as the initial 'baseline'.

The first time an environment is commissioned, the baseline is used to populate the catalog, using the same process as in option 1 above (in 12c, importServiceInstance/ in 11g unzip of full catalog filesystem copy).

After this, any work that is done in the catalog in the development environment is migrated through by using OBIEE's archive function against just the necessary /shared subfolder to a .catalog file, storing this in source control.

This is then imported to the target environment with the unarchive capability. See above in option 1 for details of using archive/unarchive - just remember that this is archiving with OBIEE, not using 7zip!

You will need to determine at what level you take this folder:

  • If you archive the whole of /shared each time you'll never be able to do branch-based development with the catalog in which you want to merge branches (because the .catalog file is binary).
  • If you instead work at, say, department level (/shared/HR, /shared/sales, etc) then the highest grain for concurrent catalog development would be the department. The lower down the tree you go the greater the scope for independent concurrent development, but the greater the complexity to manage. This is because you want to be automating the unarchival of these .catalog files to the target environment, so having to deal with multiple levels of folder hierarchy gets hard work.

It's a trade off between the number of developers, breadth of development scope, and how simple you want to make the release process.

The benefit of this approach is that content created in Production remains completely untouched. Users can continue to create their content, save their profile settings, and so on.

Presentation Catalog Migration - OBIEE Privilege Grants

Permissions set in the OBIEE front end are stored in the Presentation Catalog's /system/privs folder.

Therefore, how this folder is treated during migration dictates where you must apply your security grants (or conversely, where you set your security grants dictates how you should treat the folder in migrations). For me the "correct" approach would be to define the full set of privileges in the development environment and then migrate these through, along with pre-built objects in /shared, to Production. If you have a less formal approach to environments, or for whatever reason permissions are granted directly in Production, you will need to ensure that the /system/privs folder isn't overwritten during catalog deployments.

When you create a BAR file in OBIEE 12c, it does include /system/privs (and /system/metadata). Therefore, if you are happy for these to be overwritten from the source environment, you would not need to backup/restore these folders. If you set includeCatalogRuntimeInfo in the OBIEE 12c export to BAR, it will also include the complete /system folder as well as /users.

Agents

Regardless of how you move Catalog content between environments, if you have Agents you need to look after them too. When you move Agents between environments, they are not automatically registered with the BI Scheduler in the target environment. You either have to do this manually, or with the web service API: WebCatalogService.readObjects to get the XML for the agent, and then submit it to iBotService.writeIBot which will register it with the BI Scheduler.

Security
  • In terms of the Policy store (Application Roles and Policy grants), these are managed by the Security element of the BAR and migration through the environments is simple. You can deploy the policy store alone in OBIEE 12c using the importJazn flag of importServiceInstance. In OBIEE 11g it's not so simple - you have to use the migrateSecurityStore WLST command.
  • Data/Object security defined in the RPD gets migrated automatically through the RPD, by definition
  • See above for a discussion of OBIEE front-end privilege grants.

What Goes into Source Control? Part 2

So, suddenly this question looks a bit less simple than when originally posed at the beginning of this article. In essence, you need to store:

  1. RPD
    1. BAR + JSON configuration for each environment's connection pools -- 12c only, simpler, but less flexible and won't support concurrent development easily
    2. RPD (.rpd) + XUDML patch file for each environment's connection pools -- works in 11g too, supports concurrent development
  2. Presentation Catalog
    1. Entire catalog (BAR in 12c / 7zip in 11g) -- simpler, but impossible to manage branch-based concurrent development
    2. Catalog baseline (BAR in 12c / 7zip in 11g) plus delta .catalog files -- more complex, but more flexible, and supports concurrent development
  3. Security
    1. BAR file (OBIEE 12c)
    2. system-jazn-data.xml (OBIEE 11g)
  4. Any other files that are changed for your deployment.

    It's important that when you provision a new environment you can set it up the same as the others. It is also invaluable to have previous versions of these files so as to be able to rollback changes if needed, and to track what settings have changed over time.

    This could include:

    • Configuration files (nqsconfig.ini, instanceconfig.xml, tnsnames.ora, etc)
    • Custom skins & styles
    • writeback templates
    • etc

Summary

I never said it was simple ;-)

OBIEE is an extremely powerful product, and just as you have to take care to build your data models correctly, you also need to take care to understand why and how to manage your code correctly. What I've tried to do here is pull together the different options available, and lay them out with their respective pros and cons. Let me know in the comments below what you think and how you manage OBIEE code at your site.

One of the key messages that it's important to get across is this: there are varying degrees of complexity with which you can embrace source control. All are valid, and in fact an incremental adoption of them rather than big-bang can sometimes be a better idea:

  • At one end of the scale, you simply use source control to hold copies of all your code, and continue to deploy manually
  • Getting a bit smarter, automating code deployments from source control. Code development is still done serially though.
  • At the other end of the scale, you use source control with branch-based feature-driven concurrent development. Completed features are merged automatically with RPD conflicts managed by the OBIEE tooling from the command line. Testing and deployment are both automated.

If you'd like assistance with your OBIEE development and deployment practices, including fully automated source-control driven concurrent development management, please get in touch with us here at Rittman Mead. We would be delighted to use our extensive experience in this field to produce a flexible and customised process for your particular environment and requirements.

You can find the companion slide deck to this article, with further discussion on concurrent development, here.

Categories: BI & Warehousing

The Visual Plugin Pack for OBIEE

Rittman Mead Consulting - Tue, 2016-12-13 04:00

Last week we announced the Rittman Mead Open Source project, and released into open source:

  • the excellent Insights project, a javascript API/framework for building a new frontend for OBIEE, written by Minesh Patel
  • Enhanced usage tracking for OBIEE, to track click-by-click how your users interact with the application

Today it is the turn of the Visual Plugin Pack.....

What is the Visual Plugin Pack for OBIEE ?

Visual Plugin Pack (VPP) is a means by which users of OBIEE Answers can use custom JavaScript visualisations without having to write any JavaScript!

It is a framework that enables developers to build JavaScript visualisation plugins that report builders can then utilise and configure through the native OBIEE user interface.

I want to point out from the very start that, despite its name, the Visual Plugin Pack is not a pack of all-singing, all-dancing, super-duper visualisations for OBIEE.

Instead, VPP should be thought of as a framework that allows you to quickly develop and integrate all-singing, all-dancing, super-duper visualisations that will work natively within OBIEE.

Subtle difference, but an important one.

So what does it do ?

Essentially, VPP allows you to accelerate the development and deployment of custom, configurable and reusable OBIEE JavaScript visualisations.

Back in 2013 I wrote this post describing how to embed a D3 visualisation within OBIEE. The same method will still work today, but it's a cumbersome process and requires heavy use of the narrative form, which, let's be honest, is a painful experience when it comes to JavaScript development.

Some drawbacks with this method:

  • Code editing in the Narrative view is not productive.
  • Reusing visualisations in other analyses requires the copying and pasting of code.
  • Basic visualisation configuration changes, for example width, height, colours, font etc, require code changes.
  • Remapping Column bindings requires code changes.
  • JavaScript library dependencies and load order can be tricky to manage.

The Visual Plugin Pack attempts to address these issues by abstracting away the complexities of the Narrative form and allowing developers to focus on visualisation code, not OBIEE integration code.
If you choose to use VPP for your visualisations then you will never have to touch the narrative form; all visualisation development can take place outside of OBIEE in your favourite code editor, and be deployed to WebLogic when you are done.

VPP also allows you to define design-time controls that affect column bindings and visualisation behaviour. The example visualisation below has been written to accept 5 column bindings and 1 configuration component, which controls the visualisation size. You can create as many column bindings and configuration components as you need.

scatterplot matrix

How do I get started ?

You can download or fork the repository from here.

Installation and developer guides can be found on the wiki.

There are several visualisations that come bundled with VPP, some more polished than others, but they should serve as good examples that can be enhanced further.

Summary

If you've got some in-house JavaScript skills and are looking to develop and integrate custom visualisations into OBIEE, then VPP can help alleviate a lot of the frustrations associated with the traditional method. Once you're up and running you'll be able to develop faster, integrate quickly and share your visualisations with all OBIEE report writers.

If you'd like to discuss how Rittman Mead can help with deployment or assist with custom visualisation development feel free to contact us.

Categories: BI & Warehousing

Enhanced Usage Tracking for OBIEE - Now Available as Open Source!

Rittman Mead Consulting - Mon, 2016-12-12 04:00
Introduction

OBIEE provides Usage Tracking as part of the core product functionality. It writes directly to a database table every Logical Query that hits the BI Server, including details of who ran it, when, and information about how it executed including for how long, how many rows, and so on. This in itself is a veritable goldmine of information about your OBIEE system. All OBIEE deployments should have Usage Tracking enabled, for supporting performance analysis, capacity planning, catalog rationalisation, and more.

What Usage Tracking doesn't track is interactions between the end user and the Presentation Services component. Presentation Services sits between the end user and the BI Server, where the actual queries get executed. This means that until a user executes an analysis, there's no record of their actions in Usage Tracking. This audit data is available, but you have to manually enable and collect it, which can be tricky. This is where Enhanced Usage Tracking comes in. It enables the collection and parsing of every click a user makes in OBIEE. For an overview of the potential of this data, see the articles here and here.

Today we're pleased to announce the release into open-source of Enhanced Usage Tracking! You can find the github repository here: https://github.com/RittmanMead/obi-enhanced-usage-tracking.

Highlights of the data that Enhanced Usage Tracking provides include:

  • Which web browsers do people use? Who is accessing OBIEE with a mobile device?

  • Who deleted a catalog object? Who moved it?

  • What dashboards get exported to Excel most frequently, and by whom?

The above visualisations are from both Kibana and OBIEE. The data from Enhanced Usage Tracking can be loaded into Elasticsearch, and is also available in Oracle tables, so you can put OBIEE itself on top of it, or DV:

eut108.png

How to use Enhanced Usage Tracking

See the github repository for full detail on how to install and run the code.

TODO

What's left TODO? Here are a few ideas if you'd like to help build on this tool. I've linked each title to the relevant github issue.

TODO 01

The sawlog is a rich source of lots of data, but the Logstash script has to know how to parse it. It's all down to the grok statement, which identifies the fields to extract and defines their delimiters. Use grokdebug.herokuapp.com to help master your syntax. From there, the data can be emitted to CSV and loaded into Oracle.

Here's an example of something yet to build - when items are moved and deleted in the Catalog, it is all logged. What, who, and when. The Logstash grok currently scrapes this, but the data isn't included in the CSV output, nor loaded into Oracle.

eut105.png

Don't forget to submit a pull request for any changes to the code that would benefit others in the community!

You'll also find loading the data directly into Elasticsearch easier than redefining the Oracle table DDL and load script each time, since in Elasticsearch the 'schema' can evolve based simply on the data that Logstash sends to it.

TODO 02

Version 5 of the Elastic stack was released in late 2016, and it would be good to test this code with it and update the README section above to indicate if it works - or submit the required changes needed for it to do so.

TODO 03

There's lots of possibilities for this data. Auditing who did what, when, is useful (e.g. who deleted a report?). Taking it a step further, are there patterns in user behaviour? Certain patterns of clicks that could be identified to highlight users who are struggling to find the data that they want? For example, opening lots of presentation folders in the Answers editor before adding columns to the analysis? Can we extend that to identify users who are struggling to use the tool and are going to "churn" (stop using it) and thus contact them before they do so to help resolve any issues they have?

TODO 04

At the moment the scripts are manual to invoke and run. It would be neat to package this up into a service (or set of services) that could run automagically at server boot.

Until then, using GNU screen is a handy hack for setting scripts running and being able to disconnect from the server without terminating them. It's like using nohup ... &, except you can reconnect to the session itself as and when you want to.

TODO 05

Click events have defined 'Request' types, and these I have roughly grouped together into 'Request Groups' to help describe what the user was doing (e.g. Logon / Edit Report / Run Report). Not all requests have been assigned to request groups. It would be useful to identify all request types, and to further refine the groupings.

TODO 06

At the moment only clicks in Presentation Services are captured and analysed. I bet the same can be done for Data Visualization/Visual Analyzer too ...

Problems?

Please raise any issues on the github issue tracker. This is open source, so bear in mind that it's no-one's "job" to maintain the code - it's open to the community to use, benefit from, and maintain.

If you'd like specific help with an implementation, Rittman Mead would be delighted to assist - please do get in touch to discuss our rates.

Categories: BI & Warehousing

Insights - An Open-Source Visualisation Platform for OBIEE

Rittman Mead Consulting - Fri, 2016-12-09 04:00

On and off over the last year, I have spent some time developing a customisable framework for building visualisations and dashboards, using OBIEE as the back-end. The result is Insights, a JavaScript web application that offers a modern alternative to OBIEE Answers. As of today, we have officially open sourced the project, so you are free to download, install, hack and contribute as you please.

The primary motive for building this application was to meet some very bespoke reporting requirements for a client, which I mention in my previous blog describing the prototype. During this piece of work I wrote an object orientated interface for the OBIEE web services. The icing on the cake was tying it into Tom Underhill's Visual Plugin Pack.

You can see more information about Insights in a presentation that I did at the recent UKOUG conference here: Bridging the Gap: Enhancing OBIEE with a Custom Visualisation Platform

Since then a lot of the work has been put in to make it developer friendly, visually appealing and hopefully easier to use. I'll be the first to admit that it's far from perfect, but it should be a decent starting point.

Getting Started

In order to use Insights you will need OBIEE 11.1.1.9 or above. Additionally, the application has only been tested using IE11 or Chrome browsers and so compatibility with other browsers cannot be guaranteed.

First, download the application or fork the Git repository.

There is an installation guide in the project at docs/installation.html. Follow this guide to deploy the application on your OBIEE server.

Demo

This is a quick step-by-step demonstration creating a basic dashboard, showing off some of the features in the application (apologies if the GIFs take a while to load).

First you log in, using your usual OBIEE credentials. The homepage shows some pre-configured dashboards here, but we're going to click the pencil to create a new one.

Logging in

Next I've dragged in some columns from my subject area, Sample App, and run the query, displaying the default table plugin.

Add Columns

In this step, I've gone to the configuration tab, and changed the colour of my table.

Configuration

Now I change the plugin type using the drop-down menu at the top. Notice that my previous table visualisation gets stored on the right. Clicking the Store button manually adds my new pie chart there as well. Then we can flick between them easily.

Store Pie Chart

Filters can be added by clicking the icon next to the column on the subject area panel.

Filters

Adding in a sunburst chart, and playing with some of the colours here.

Sunburst

Now we have our visualisations, we can begin constructing our dashboard. You can freely move around and resize the visualisations as you choose. I recommend hiding the panels for this, as the full screen is much closer to what users will see when viewing the dashboard.

Dashboard

The next GIF shows the interaction framework, which can be used to implement UI features where the user interacts with one visualisation and another visualisation on the page reacts to it. In its most basic form, each plugin type can be filtered, with OBIEE running the query again. More complex reactions that are specific to a certain chart type can also be configured, as seen below with the sunburst chart.

Interactivity

Dashboard prompts can be added by clicking the filter icon next to one of the RPD columns. Any visualisations using this subject area will respond to the prompt. The prompt box can be freely placed on the canvas like any other object.

Dashboard Prompt

Finally, we can save the object to the web catalogue. This saves as a hidden analysis object in the OBIEE web catalogue and contains all of the information to recreate the dashboard when loading. All OBIEE security features are preserved, so users will only be able to access folders and reports they have permissions for.

Save to Web Catalogue

Finished dashboards can be viewed in the application once they have been saved. The dashboard viewer will show all dashboard objects in that folder as different pages, available from the left pane. Visualisations can be exported as PNG and PDF images, and their data can be exported to Excel and CSV.

Viewing Dashboards

So How Do I Learn More?

The slides that I did at UKOUG describing Insights give a comprehensive overview of the design behind the tool. You can find them here.

Summary

In a nutshell, those are the main features of the application. Feel free to try it out and have a read through the documentation (available through the application itself or offline as HTML files in the docs directory).

As an open source application there is no official support, however if you experience any bugs or have any requests for enhancements, please post them on the issue tracker.

We hope you enjoy using the app and if you would like to enlist our expertise to help you deploy and develop using this platform, feel free to contact us to discuss it further.

Categories: BI & Warehousing

The Rittman Mead Open Source Project

Rittman Mead Consulting - Thu, 2016-12-08 08:00

We have a strong innovation spirit at Rittman Mead, with all staff encouraged to use technology to its best advantage in order to do things with the software that haven't been done before. Some of these projects may be 'scratching the itch' of a repeated manual task that should be automated. Others use technology to extend the capabilities of the tools, or write new ones to fill gaps that have been identified.

At Rittman Mead we pride ourselves on sharing knowledge with the BI/DI community, both 'offline' at conferences and online through our blog. Today we are excited to extend this further, with the release into open source, over the next few days and weeks, of some key code projects:

  • insights - a JavaScript API/framework for building a new frontend for OBIEE, built on the OBIEE web service interface, as described here
  • vpp - "Visual Plugin Pack" - innovative visualisation capabilities to use natively within OBIEE
  • obi-enhanced-usage-tracking - the ability to track and audit user behaviour per-click, as described here

They will be available shortly on the Rittman Mead GitHub repository. They are released under the MIT licence.

These projects are in addition to existing code that we have shared with the community over the years, including the obi-metrics-agent tool and the popular OBIEE 11g Linux service script.

We're very excited about opening up these projects to the community, and would be delighted to see forks and pull-requests as people build and expand on them. It should go without saying, but these are contributed 'as is'; any bugs and problems you find we will happily receive a pull request for :-)

If you would like help implementing and extending these for your own project, we would be delighted to offer services in doing so - just get in touch to find out more.

Over the next few weeks keep an eye on the blog for more information about each project, and future ones.

Categories: BI & Warehousing

An Oracle DVD story of... DVDs

Rittman Mead Consulting - Thu, 2016-12-08 07:00

Have you ever wondered what the trend in movie releases has been for the past few decades? Comparing the number of Sci-Fi releases vs. Romantic Comedy releases? Me too, which is why I've taken my first look at Oracle Data Visualization Desktop (DVD) to spot trends between two movie genres - Sci-Fi and the Romantic Comedy.

For this post, I found an interesting dataset from IMDB.com on Kaggle.com, listing a smattering of movies since the early 1900s to sort and analyze. For this example, I will contrast the number of releases for the two movie genres, looking for any possible relationship between them.
If you haven't installed the application yet, take a quick look at Matthew Walding's post for a good introduction. Oracle's DVD installer is fairly quick and simple, and you'll be creating visualizations in no time.

So, once the DVD application is running, we can create our first project:

Or, alternatively...


 Next, we'll need a data source:

And, we'll import the CSV-formatted file I downloaded from Kaggle.com earlier:


 Select the "movie_metadata.csv" file to import:

And, change the Name to "IMDB Movies A" for clarification:

After the file has been imported, we see a problem:

Clicking "More Detail", the following screen displays the detail we can use to troubleshoot the query error:

For troubleshooting, I used the highlighted value and found the problem is with the "budget" column, which requires a datatype change from "Integer" to "Double":

The next task is to create a method of identifying a specific movie genre; however, as you can see, all genre labels for each movie are stored in a pipe-delimited value in the genres column:

So, for this demonstration, I've chosen to add a calculated column for each genre I want to analyze, locating the desired string within the pipe-delimited value under the assumption that the same value, "Sci-Fi" for instance, is recorded with the same characters in every occurrence of each pipe-delimited value.

For the first column (data element), I chose the functions LOCATE and SIGN to provide a simple logical indicator (0 and 1) that can be aggregated (summed) easily.

Click the "Validate" button to verify syntax:

The LOCATE function returns a positive integer where the expression "Sci-Fi" is located in a given string, the genres data element in this case. The SIGN function subsequently returns either a 0, 1, or -1, depending on the sign of the resulting integer from the LOCATE function. 1 (one) indicates yes, this movie release includes a Sci-Fi label for genre. 0 (zero) indicates a missing Sci-Fi label for genre, for example.
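
In essence the whole calculation is a one-liner along these lines (genres being the column name as imported from the dataset):

SIGN(LOCATE('Sci-Fi', genres))

Any row whose genres value contains 'Sci-Fi' returns 1; any other row returns 0.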

Here is the new column, appended to our existing dataset:


Next, I will create another column to identify the Romantic Comedy genre - genre_RomCom_Ind, as follows:

With the two new data columns, our dataset is expanded accordingly:


Now, let's create the visualizations:


Let's create a bar graph for each of our new Indicator columns, starting with the Sci-Fi genre:


We'll create a filter to include data only for title years between 1977 and 2015:


Afterward, our initial graph appears as follows, with a default aggregate summing all genre_SciFi_Ind values (0 or 1) for each title year:


Now, let's add a similar bar graph for all Romantic Comedy (genre_RomCom_Ind) releases. Notice, the same filter for title year will be applied to this new graph:


Next, I'll change the labels for each graph, providing proper context for the visualization:


We can also change the aggregate method used for the graph, when necessary:


For an added touch, let's add a trend line to the bar graph for even easier viewing:


And now, our graph appears as follows:


Applying similar modifications to our Sci-Fi Releases graph and displaying both graphs together on Canvas 1, we have the following:


Next, we'll add this Canvas to an Insight, select Narrate, and add our own description of any interesting comparisons we can identify:


Viewing the two graphs side by side, we notice one interesting outlier: in 2010, the number of Romantic Comedy releases outnumbered Sci-Fi releases by 21 movies. On this Insight, I can enter a description (narration) beneath the graphs, highlighting this departure from the plotted trend line, as shown below. Another interesting, and unexpected, trend is the decreasing number of Romantic Comedy releases after 2008. But do these two graphs display an obvious relationship, or correlation, between the two genres, either positive or negative? If we look at each trend line between 1990 and 2008, each movie genre shows an increasing number of releases, generally speaking, which leaves a somewhat inconclusive determination as to correlation, although the period after 2008 seems to indicate an inverse relationship.


At this point, the project can be saved (with a new title), exported, and/or printed:


In this post, I've demonstrated a basic example as an introduction to Oracle Data Visualization Desktop, with IMDB movie data, to visually quantify the number of movie releases in a given timeline, 1977 to 2015, creating custom calculations and dynamic visualizations for our particular measures.

One note I would add is that I did not attempt to eliminate any overlapping indicators, which were minimal, in the newly added columns, genre_SciFi_Ind and genre_RomCom_Ind. Although it is possible for a movie to be labeled as all three - Sci-Fi, Romance, and Comedy - this did not distort the overall trend. And these graphs now raise a question: can we know what influences the release of Sci-Fi movies and their increasing popularity? Are Romantic Comedy movies truly decreasing in popularity, or is it the movie studios' choice to decrease the number of RomCom releases because of the surge in Sci-Fi releases? We all understand the limitations of studios' investment capital, but must the RomCom genre suffer because of the Sci-Fi genre? If so, why? I realize this is a simplified view of the trend, but it does lend itself to more scrutiny among other genres as well.

Categories: BI & Warehousing

Introducing On Demand Training from Rittman Mead

Rittman Mead Consulting - Mon, 2016-12-05 08:00

Rittman Mead is happy to announce that its much anticipated On Demand Training (ODT) service is live, giving people the opportunity to receive expertly written & delivered self-paced training courses that can be accessed online anywhere, at anytime.

We have been delivering technical & end-user courses based on Oracle Analytics products all over the world for the past decade.

While our classroom sessions continue to offer an unrivalled experience for our trainees, we understand that in the modern era, flexibility is important.

ODT has been built based on the feedback of our clients and network so that you can:

  • Experience our training regardless of the distance to travel

  • Keep your members of staff on site at crucial periods in your company’s calendar

  • Give participants the ability to reinforce the lessons they’ve learnt afterwards

Learn

Use Rittman Mead's LMS as your virtual classroom to access all course materials, lesson demos and slides

Practice

Get hands on with your very own cloud based training environment

Engage

Submit questions to Rittman Mead's Principal Trainer network on subjects that might be specific to your everyday use of the product

Each course provides 30 days' access to the above, ensuring you have enough time to learn at your own pace and reinforce each lesson.

We’re feeling particularly seasonal down here in Brighton, so to celebrate the launch of our platform we’re offering a 40% discount on our first live course OBIEE 12c Front End Development & Data Visualization between now and January 31st.

Simply use the discount code RMODT12C on checkout to take advantage of this exclusive offer.

For more details and to start your On Demand learning experience with Rittman Mead please check out:

  • Our webpage where you can find out more information about ODT and register an account for our LMS
  • The Rittman Mead LMS where you can view our course catalog and purchase courses

You can also contact training@rittmanmead.com if you have any questions about the service.

Happy Learning!!!

Categories: BI & Warehousing

Work-Life Balance at Rittman Mead

Rittman Mead Consulting - Wed, 2016-11-30 20:00
Work-Life Balance at Rittman Mead



Rittman Mead has always had a long standing commitment to giving back, not only to the technology industry, but to local and global communities as well.  Recently, Rittman Mead employees have been encouraged to take up to 40 paid hours to participate in community service opportunities.  This year, I chose to use my 40 hours to serve at an orphanage in the Dong Nai province of Vietnam.  The Bien Hoa Center for Supporting and Vocational Training is an orphanage that currently serves 53 children, ranging in age from infants to 16 year olds.

Work-Life Balance at Rittman Mead

Additionally, the Bien Hoa Center was home to my recently adopted son for over 7 years. So this was a place near and dear to my heart. The orphanage is run by a very attentive staff, who do a great job caring for the kids, despite having very limited resources. Many of the children are learning English and other useful skills that will serve them well once they leave the orphanage, either through aging out of the program or through adoption. While we were there, my family and I were able to play with the older children and comfort the babies. It was a pleasure to see them display such beautiful, wide smiles despite their difficult situations.

Work-Life Balance at Rittman Mead

We were also able to deliver a gift donated by many of my generous Rittman Mead colleagues, which included over 60lbs of art supplies, candy and toys. Despite our consultants being separated across many states, once I posted an opportunity to contribute, gifts just started arriving at our Atlanta office, where I hang my laptop bag.

Work-Life Balance at Rittman Mead

It is truly a pleasure working with such compassionate people and having a management team that values more than just profits.  Caring about the causes that are important to employees is a big part of the Rittman Mead culture. This attitude, coupled with numerous family friendly work events, makes employees feel like more than simply a cog in the wheels of a profit machine. At Rittman Mead, employees are supported in their pursuit of a healthy work-life balance and that is one of the big reasons I am proud to work here.


Categories: BI & Warehousing

Another Thanksgiving Email

Tim Tow - Sat, 2016-11-26 16:06
Here is another email we got Wednesday from a Dodeca Excel Add-In for Essbase customer. This customer has a lot of VBA macros running some automation with Essbase and asked for some assistance. We told them it was as easy as replacing the Essbase function declarations file with our Dodeca Add-In function declarations file, and then setting the variables that contain the location of the Dodeca-Essbase server. In other words, replace this file:

With this file:





Easy, right? Here is the email:

I hadn't had a chance to test this massive file out with the latest version of the add-in yet, but was more than pleasantly surprised when I replaced the add-in code for the Dodeca wrapper and the only thing I had to do was change the connection to our new server and the file was live! It's truly drop and go! Thanks so much!!!!



Categories: BI & Warehousing

Use REST API to get the data

Dylan's BI Notes - Sat, 2016-11-26 10:37
We now see more and more data available via REST APIs. Here are some of the data sources I have worked with earlier: ServiceNow Table API, REST API for Oracle Service Cloud. Here is a checklist to enable a data source for BI: 1. How does the service perform the authentication? In the case of […]
Categories: BI & Warehousing

Rittman Mead at UKOUG Tech 16

Rittman Mead Consulting - Thu, 2016-11-24 11:52
Rittman Mead at UKOUG Tech 16

This year as always Rittman Mead is coming to UKOUG Tech 16 with a strong presence and a great line up of sessions covering OBIEE, ODI, Kafka, advanced visualisation and more. And yes, there will be Cloud!

Rittman Mead at UKOUG Tech 16

Here are the details of the Rittman Mead sessions:

There is no better way to finish a conference than with two success stories from our recent engagements! If the OBIEE 12c upgrade depicted in Francesco's session is something you are also looking to achieve, we would be pleased to tell you more about it and to see how we can help you.

And of course, we are also happy to answer any questions if you see us in sessions or around the conference. You can find some of us during the Oracle Big Data meetup (Monday evening) or the ODTUG Data and Analytics Switzerland meetup (Tuesday evening).

So see you in two weeks in Birmingham!

Categories: BI & Warehousing

Happy Thanksgiving (Featuring an Email That I Am Thankful For)

Tim Tow - Wed, 2016-11-23 11:12

I get tons of email every day and often have hundreds of emails that I have to track on a daily basis, which explains, in part, my recent absence from blogging. Most of the emails I track are sales and support relating to Dodeca. After all, with Dodeca, the buck stops here, right? I also get some spam in the mix. Sometimes, though, there is an email that really makes my day. Here is one of those I received this morning:

From: (masked)
Sent: Wednesday, November 23, 2016 8:48 AM
To: Applied OLAP Support
Subject: Happy Thanksgiving!

Although we are a smaller Essbase shop at 105 users, our users are strong advocates for Essbase and use it extensively. So, it is great timing to say how thankful I am to you and your teams for developing the Dodeca Essbase Add-In. At the end of the day, it’s the user experience that drives the support for Essbase and your product will excite our user base! [And, it requires little or no support from my team!]

Happy Thanksgiving to all!


Wow!  This email really made my day!  To our new customer that sent this email (and who graciously granted me permission to post her words), Happy Thanksgiving to you as well.  We hope this is the first of many Thanksgivings that we work together!


Categories: BI & Warehousing

How to Use Versioning in ODI 12c

Rittman Mead Consulting - Wed, 2016-11-23 09:00
How to Use Versioning in ODI 12c

How many times have you been working on a project and something goes wrong, or the power shuts off, or you go on vacation and someone has messed with your code or somehow your work is lost? Well, now you have an alternative to safeguard your project work.

Versioning in ODI is allowed at various hierarchy levels within the instances and is stored in the master repository. What this means is that if you have multiple work repositories connected to the same master, you can see all the versions when connected to either work repository.

A version is a backup copy of an object that is saved as a checkpoint in ODI. ODI allows you to version Projects, Folders, Packages, Scenarios, Load Plans, Mappings, Procedures, Knowledge Modules, Models, Model Folders and Solutions. You will need to decide which objects to create and manage versions for, but this tutorial will review the process using Packages.

Fast Review: A package is made up of steps organized into a diagram that is executed. The steps include mappings, variables, procedures, ODI tools (such as OdiXMLConcat, OdiZip, etc), models, sub-models and datastores.

  1. Connect and move to the Designer Navigator and expand the Projects folder.

    How to Use Versioning in ODI 12c

  2. Select and expand Packages. In our example I will open up Target Data Load.

    How to Use Versioning in ODI 12c

  3. As you see below, a well-trained ODI developer has set up the load plans to run in a specific execution order and, if any fail, to send an email.

    How to Use Versioning in ODI 12c

    In addition to a failure notification, a new manager also wants to know when the package executes successfully. We will create an original version of the package and then add a new email notification. Once we have finished our changes, we will create a new version.
  4. Right-click on the Target Data Load package and select Version > Create Version

    How to Use Versioning in ODI 12c

  5. Type 'Target Data Load v1' as the name and 'Existing Target Data Load package (original)' in the description. You can name the original (or the next version, depending on where you are in your versioning) whatever you would like. Best practice is to keep the name simple and include a version number. Make sure to put more descriptive details about the purpose of the version in the description box below the version name.

    How to Use Versioning in ODI 12c

    Now that we have our original version safely created, we can make our changes to the package.

  6. If it is not expanded, expand the Internet accordion of the ODI toolbox and locate OdiSendMail. Highlight OdiSendMail and place it on the canvas to the right

    How to Use Versioning in ODI 12c

  7. Using the toolbar connect your last mapping to OdiSendMail 2 using the green ok arrow. Remember to click on the mapping and drag your cursor to OdiSendMail 2.

    How to Use Versioning in ODI 12c

  8. Make sure to click on the white cursor in the menu (above the Toolbox, to the left of the green arrows), then click OdiSendMail 2 so the properties window populates with the mail server information. Press Save. Contact your company email or network admin to get the required SMTP information.

    Pro-tip: For our demonstrations and tutorials we use mailtrap.io, a working dummy SMTP server for testing.

  9. Once completed your screen should look similar to the image below.

    How to Use Versioning in ODI 12c

  10. Before we version the changes, we should test them. Execute the new package by right-clicking on the newly modified package - in our tutorial, Target Data Load. Click OK and accept the defaults in the Run window.

    WARNING: Make sure you are in the correct environment for testing and development. DO NOT run any package that will change or affect any real data.

    How to Use Versioning in ODI 12c

  11. Now check your execution under the Operator tab to make sure it was successfully executed.

    How to Use Versioning in ODI 12c

  12. Now we will create a new version of the package that will be the latest version. We will repeat earlier steps, so return to the Designer Navigator > Projects window and click on the Target Data Load package. Right-click Target Data Load package and go to Version > Create Version. Note when the window appears you see the original version in the bottom box.

    Keep the default name but change the version to v2. Then for a description put 'Added successful execution for Target Data Load mappings OdiSendMail' and click OK.

    How to Use Versioning in ODI 12c

    You have now created 2 versions of your package: the original version and the modified version with the success email. Versioning is a key feature in ODI that really allows multiple developers to interact and work together to achieve the best results for data integration projects.

Special Note:

In this post, we reviewed how to use internal versioning in ODI. Rittman Mead always recommends using an external configuration management system (e.g. GitHub) in ODI releases earlier than 12.2.1.2, rather than the internal versioning. In the next ODI patchset, there will be integration with Git for better workflow control. Subversion integration is currently available as of ODI 12.1.3.

If you are interested in seeing how to use Git in conjunction with older versions of ODI (prior to 12.2.1.2.6), stay tuned and check out an upcoming video here, in which Rittman Mead Principal Consultant Pete Tamsin demonstrates his method for 'Using Git in an ODI Procedure'.

Huge thanks to Pete Tamsin and Michael Rainey for their help editing this post. No man, or woman, is an island!

Categories: BI & Warehousing

Catalog Validation: Why, What, When, Where and How?

Rittman Mead Consulting - Tue, 2016-11-22 02:04

One of the features everybody "loved" about OBIEE 11g was Globally Unique Identifiers (GUIDs), used to recognize users and groups based on an identifier that could be different from the username. The original aim of GUIDs was to be able to distinguish different users sharing the same username coming from multiple Authentication Providers.

GUID management could be tricky, especially when GUIDs were not in sync between different environments, and could cause a wide range of errors, from the inability to log in to missing parts of the catalog.

[2016-10-20T09:19:04.000+02:00] [OBIPS] [ERROR:1] [] [saw.security.validate.indexes] [ecid: 0058cGJgGOkBh4sawh3j6G0001QC00000B,0] [tid: 2002437888] XXXX's guid 0A8AC9E0811D11E4AF4FE155B36CBFFD in catalog doesn't match guid 49BB3BB0629311E5BFFE71BB91F31C2B in backend! Aborting! Please UpdateGuids!

After checking the Presentation Services log (sawlog.log), the solution for most of those errors was simply to regenerate the GUIDs. The GUID regeneration method, however, isn't something easily done in a production system since it requires some downtime (a restart of both the Oracle BI Server and Presentation Services is required).
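
For reference, the 11g regeneration procedure goes roughly like this - a sketch only, so check the exact steps and file locations in the documentation for your version:

# 1. In NQSConfig.INI set:
#      FMW_UPDATE_ROLE_AND_USER_REF_GUIDS = YES;
# 2. In instanceconfig.xml add, under <ServerInstance><Catalog>:
#      <UpdateAccountGUIDs>UpdateAndExit</UpdateAccountGUIDs>
# 3. Restart the system components so the refresh runs (11g syntax):
$INSTANCE_HOME/bin/opmnctl stopall
$INSTANCE_HOME/bin/opmnctl startall
# 4. Revert both settings and restart once more when the GUIDs have been refreshed.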

Why Would you Run Catalog Validation?

You may ask yourself:

Why is he talking about GUIDs when they have been removed in OBIEE 12c?

And you would be perfectly correct. GUID misalignment is no longer a problem in OBIEE 12c, but historically it was only one of the issues causing catalog corruption and requiring a catalog validation afterwards.

Even without GUIDs, catalog corruption is still something that can happen in OBIEE 12c: objects (e.g. analyses, dashboards, agents) owned by deleted users, broken links, and corrupted files on the server are only some of the issues that could be present in any OBIEE implementation, no matter which version is installed.
Most of the time corrupted catalogs generate errors which are difficult to diagnose, and manual fixing is not always possible and never easy to do.

The Catalog Validation process, available since OBIEE 11g, is very powerful since it provides a detailed analysis - and an automated fix, if configured - of all the catalog corruptions.

What is Catalog Validation?

As per Oracle's documentation, the Catalog Validation (CV) procedure does the following checks:

  • Ensures that each object in the catalog is larger than zero bytes: any object with zero bytes size is probably due to corruption and should be removed.
  • Ensures that each item in the catalog has a valid corresponding .atr file: the .atr file contains the properties (permissions, ownership, creation date, modification date etc.) of any object in the catalog. An object without related .atr file is not visible in OBIEE's front-end.
  • Ensures that each link in the catalog is valid: links to deleted or renamed dashboards and analysis will cause an error when clicked.
  • Ensures that the files in the account cache are valid: this step checks that all the accounts are valid and the cache entries (storing user related information) are up to date.
  • Ensures that all XML objects in the catalog pass schema validation: every object (dashboard, analysis, prompt etc.) in the catalog is stored as XML file. This step checks that the XML is valid.
  • Attempts to repair object names that were damaged by ftp programs: moving catalog objects using ftp programs could corrupt the object name.
When Should You Run Catalog Validation?

I've seen Catalog Validation used only when problems were raised; however, it is good practice to validate the catalog every time a major change is made that impacts it, or on a schedule in environments where end users can directly create content.

The following is a list of cases when running a Catalog Validation could be useful:

  • Before an upgrade: running CV before an upgrade and ensuring the consistency helps avoiding problems related to possible corruptions
  • After an upgrade: running CV after an upgrade to ensure that content and security migration worked
  • After a major change: when a major change happens in the catalog, CV ensures no missing links or ownership problems are present
  • After a deployment: executing CV after a production deployment to check the content migration and verify the security.
  • On a schedule: execute CV regularly on instances where end users can create content, and to verify accounts.

Please note that a catalog can have corruption even if no front-end enhancements have been made; the following are just some examples:

  • Developer account deletions: all objects owned by that account will be flagged as corrupted
  • Security changes: changing/deleting security roles impact catalog privileges
  • File System corruption: data can be badly written to the file system
  • Content deletions: deleting content leaves referring objects corrupted

Sometimes the OBIEE environment continues working as expected even if some of the above corruptions are present. Nevertheless, over a long period those may cause errors, especially if upgrades or changes to the security are planned.

Where Do You Run Catalog Validation?

Catalog Validation can be run on every OBIEE instance available; however, the following use cases could be particularly interesting:

  • Validating development catalog: once consistency of development catalog is ensured it can then be migrated forward to production
  • Validating production (or smoke test) catalog: validating production catalog to ensure that code promotions happened consistently, that user homes are valid and that no objects (user created or promoted) are broken.

One particularity to note is that if you run CV with a production catalog in a different environment (e.g. development) with a different security store, then many accounts and their related content could be flagged as non-existent and deleted. As a general rule CV should be run in environments sharing the same security as where the catalog is sourced from, allowing a genuine check of the security settings.

Performing a Catalog Validation in a production environment is not always possible due to the process restarts required; a smoke test environment sharing the same security settings would be the perfect target for the test. When running Catalog Validation on a live catalog, or when taking a catalog backup, ensure that "Maintenance Mode" is activated: setting this flag ON (found under the Administration page in OBIEE's front-end) ensures that no changes can be made to the catalog during the check or backup.

Maintenance Mode

How Do You Run Catalog Validation?

In order to run Catalog Validation you need to:

  • Stop Presentation Service[s] (obips): Stopping the component can be achieved either in Enterprise Manager or via the command line. The command line syntax has changed between OBIEE 11g and 12c; you can find the two statements in the code below
# 11g Syntax
$INSTANCE_HOME/bin/opmnctl stopproc ias-component=obips1
# 12c Syntax
$INSTANCE_HOME/bitools/bin/stop.sh -i OBIPS
  • Create a backup of the catalog: when performing a catalog backup, 7-Zip should be the chosen tool. WinZip has known problems with catalog files (see Oracle's doc, chapter "17.10 Archiving and Unarchiving Using Catalog Manager"). An example backup command is shown after this list.
  • Create a backup of instanceconfig.xml file (under $INSTANCE_HOME/config/fmwconfig/biconfig/OBIPS)
  • Change instanceconfig.xml file in order to include the validation tags explained in the following section
  • Start Presentation Service[s]: like the stop operation, this can be performed either via EM or the command line. Below is the code for 11g and 12c
# 11g Syntax
$INSTANCE_HOME/bin/opmnctl startproc ias-component=obips1
# 12c Syntax
$INSTANCE_HOME/bitools/bin/start.sh -i OBIPS
  • Repeat the steps above until the catalog is fully validated: As explained in the section below, several different assessments and automated fixes can be performed. The sawlog.log files will contain entries when corrupted objects are present in the catalog. A catalog is fully validated when no corrupted objects are found during CV.
  • Stop Presentation Service[s]
  • Restore original instanceconfig.xml file
  • Start Presentation Service[s]
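
As an aside on the backup step above, the archive can be taken from the command line with 7-Zip along these lines - the catalog path shown is only an assumption, so substitute the location of your own catalog:

# archive the whole catalog directory (path is illustrative)
7za a /backup/catalog_$(date +%Y%m%d).7z $DOMAIN_HOME/bidata/service_instances/ssi/metadata/content/catalog
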
Catalog Validation configuration

The following tags must be inserted under <ServerInstance><Catalog> tag.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>  
    <!-- Oracle Business Intelligence Presentation Services Configuration File -->
    <WebConfig xmlns="oracle.bi.presentation.services/config/v1.1">
    <ServerInstance>
        [...]
        <Catalog>
              <Validate>OnStartupAndExit</Validate>
              <ValidateAccounts>Clean</ValidateAccounts>
              <ValidateHomes>Report</ValidateHomes>
              <ValidateItems>Clean</ValidateItems>
              <ValidateLinks>Clean</ValidateLinks>
        </Catalog>
        [...]
    </ServerInstance>
</WebConfig>  

The tags do the following. See below for an explanation of the values that can be specified:

  • Validate: Main configuration tag. Possible values are
    • None: No Catalog Validation is going to happen; however, all the privileges and each object's ACLs are cleaned of non-existing accounts
    • OnStartupAndExit: Presentation Services starts, performs the validation based on the following tags, and stops. This process can be reiterated multiple times with different options for each element.
  • ValidateAccounts: Verifies the consistency of users, roles and groups.
  • ValidateHomes: Verifies all users' homes; executed only if ValidateAccounts is set to Report or Clean
  • ValidateItems: Verifies that catalog items are consistent - size greater than zero and valid XML.
  • ValidateLinks: Verifies the consistency of all links in the catalog (e.g. all analysis contained in a dashboard).

The accepted values for all settings except Validate are the following:

  • None: no validation will be performed
  • Report: a log entry is written for every inconsistent item in the sawlog.log file under $INSTANCE_HOME/servers/obips1/logs
  • Clean: does the same as Report, plus removes the inconsistent object from the catalog.

As you can see, the "Clean" option isn't appropriate for all tags - you don't want a dashboard to be deleted only because its owner doesn't exist anymore - but it is the desired choice when you need to remove all the old or corrupted user homes. The "Report" option, on the other hand, provides a way of logging all the corrupted items so they can be fixed manually.

Catalog Validation is an extremely useful tool, allowing an automated check (and fix) of all the corrupted items in the catalog. Using Catalog Validation together with Baseline Validation Tool provides a way of ensuring the correctness of migrations and developments:

  • Running Catalog Validation before the migration to ensure all objects to promote are consistent
  • Running Catalog Validation after the migration to ensure the consistency of all promoted objects and security
  • Running Baseline Validation Tool between source and target environment to ensure the expected outputs are matching.

In summary, Catalog Validation and the Baseline Validation Tool can be considered complementary: the first checks the catalog objects and security consistency, the second analyses and compares the expected results. Running both alongside any code promotion process should be considered good practice.

Categories: BI & Warehousing

Common OBIEE Implementation Mistakes That Impact Performance

Rittman Mead Consulting - Thu, 2016-11-17 04:19

I've recently performed several assessments to help our clients diagnose and fix their OBIEE performance problems. They often share similar design or implementation issues, and I thought it would be useful to present them here as a handy reference:

Mistake 1: No Software Monitoring

The key to happy users is to know there’s a problem before they do, and even better, fix it before they realise. Availability of comprehensive performance metrics is vital to achieve this goal, and Usage Tracking data represents the natural starting point. When enabled in OBIEE, Usage Tracking collects statistics for each query and is helpful for understanding usage patterns across dashboards and users, and for analysing performance trends and diagnosing specific issues. Whilst Usage Tracking is generally always enabled and the data it generates reviewed regularly, other software metrics are often not gathered at all or not available to the BI team, giving only a partial view of the picture.
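
As a starting point, a simple query such as the following - a sketch assuming Usage Tracking writes to the default S_NQ_ACCT table - highlights the dashboards with the worst average response times over the last week:

-- top dashboards by average response time, last 7 days
SELECT saw_dashboard,
       COUNT(*)                      AS query_count,
       ROUND(AVG(total_time_sec), 1) AS avg_response_secs,
       MAX(total_time_sec)           AS worst_response_secs
FROM   s_nq_acct
WHERE  start_ts >= SYSDATE - 7
GROUP  BY saw_dashboard
ORDER  BY avg_response_secs DESC;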

In addition to gathering and monitoring Usage Tracking data, we would strongly recommend:

  • Proper OS monitoring to capture full breadth of OS metrics on both application and database servers including CPU %, Load Average, Memory, Disk I/O, Disk Busy %, Network I/O. Store historic data for trend analysis with appropriate aggregations (so as to not blunt peaks), and near-realtime granular data for interactive performance diagnostics.

  • OBIEE Dynamic Monitoring Service (DMS) metrics collection. These metrics provide information regarding Oracle Fusion Middleware components' performance, state, and on-going behaviour.

    OBIEE Performance Analytics Dashboards

All of these can be done and visualised with Rittman Mead OBIEE Performance Analytics Dashboards. Have a look at these videos to see exactly how the dashboards can help, and get in touch with us to request a demo!

Mistake 2: Randomly Changing Configuration Settings

It’s a common belief, or hope depending on available hardware, among online gamers that by changing configuration settings you can fix any software performance issue. Unfortunately OBIEE is not a game and this is not true! Whilst OBIEE does not provide any Boost Performance switch, the defaults are generally a good starting point, and unnecessary fiddling without good reason should be avoided:

  • You can introduce additional problems by mistake.

  • It makes it hard to track what's changed from a default config.

  • If it fixes the problem, it could be coincidence - without evidence, there's no actual understanding of the underlying problem.

We at Rittman Mead are firm believers in a practical and logical approach to performance analytics and optimisation. Eschewing the drunk man anti-method of ‘tuning’ configuration settings at random, we advocate making a clear diagnosis and baseline of performance problems before changing anything. Once a clear understanding of the situation is established, steps are taken in a controlled manner to implement and validate one change at a time.

Mistake 3: Data Transformation at Query Time

You can usually find a lot of data transformation being done in the RPD itself. This is one of the most useful features of OBIEE, since data re-modelling in-flight at query time is usually a lot quicker to implement than changing the underlying data model and / or ETL processes.

Wrangling data this way in a small number of instances is common, but widespread use is indicative of a data model and / or ETL design that needs modifying to take into account the report requirements; it's difficult to maintain, and it results in less efficient, unnecessarily complex SQL.

  • Conditional transformation - Business is always changing, and it may be necessary to group data in a way that was not expected when the data model was designed. For example, you can create a logical column using a CASE WHEN statement to define a new country grouping which is not present in your data model.

    Conditional transformation example

  • Data cleansing - Sometimes you may need to integrate social network data into your corporate reports, but you have limited or no control over its content. For example, you can use the TRIM function to deal with blanks that are expected in a column containing users' reviews of your products.

    Data cleansing example

  • Data conversion - The classic use case for data conversion is when you have to manipulate dates to provide the required format. For example, you can use CAST and Calendar/Date functions to create the format YYYY-MM, required to define the Month level's key of your Date hierarchy, from a date column.

    Data conversion example

  • Opaque views - An opaque view is a physical layer table that consists of a SELECT statement. The typical use case for opaque views is when you have to turn a transactional data model into a star schema. However, opaque views mask the true data structure from OBIEE, preventing it from generating optimal data access queries, and they should be used only when there are no other available solutions.

We recommend doing this kind of transformation once, at ETL time, rather than every time a particular table or column is used in a query.

Mistake 4: Underuse of Double Column Feature

Double Column is probably one of the most undervalued and unused features of OBIEE. It provides a mechanism for associating two logical columns: one column provides the display and description values such as the description of an item; the second column provides a descriptor ID or code column.

Use should be made of the Double Column feature to enable OBIEE to generate more efficient physical database SQL. The principle behind this is that instead of queries running with string-based predicates, which may not be indexed in the database or could be less efficient, the user sees the string version of an attribute whilst OBIEE uses the corresponding numerical ID column when it queries the database, thus generally creating more optimal execution plans.

As an example, to do this we start with assigning the P0 Product Number column as the descriptor ID column in the Business Model and Mapping layer for P1 Product column:

Double column feature setup example

Let's now create a simple report in Answers containing the T05 Per Name Year and 1- Revenue columns, and apply a filter on P1 Product as well. If you are selecting the is equal to / is in, is not equal to / is not in or is between operators, then the filtering by code values checkbox will be available:

Filter by code values example

Let's check it, and then see, in terms of SQL, how the query filter is generated:

select sum(T216.Revenue) as c1,  
     T795.PER_NAME_YEAR as c2
from  
     BISAMPLE.SAMP_PRODUCTS_D T451 /* D10 Product (Dynamic Table) */ ,
     BISAMPLE.SAMP_TIME_QTR_D T795 /* D03 Time Quarter Grain */ ,
     BISAMPLE.SAMP_REVENUE_FA2 T216 /* F21 Rev. (Aggregate 2) */ 
where  ( T216.Prod_Key = T451.Prod_Key  
         and T216.Prod_Key = 7
         and T216.Bill_Qtr_Key = T795.QTR_KEY
         and T451.Prod_Key = 7 ) 
group by T795.PER_NAME_YEAR  

The Double Column feature can also be used to define language-independent filters, as my former colleague Venkat already noted in another post.

Mistake 5: Suboptimal Report Design

Certain suboptimal report designs that I've seen recently include overcrowded analyses, abuse of table / pivot prompts, and OBIEE used as a data extraction tool.

Overcrowded analyses / Overuse of the Excluded area in Views

An overcrowded analysis is an analysis with an awful lot of columns included in the Criteria tab and several different views with many excluded columns. These views can then be displayed all together or spread out across different dashboard pages.

This pattern is quite common with clients that migrated from Hyperion Interactive Reporting, and the main motivation for it - there is only one analysis to maintain - may sound good. However, since the Oracle BI Server retrieves results for all columns specified in the Criteria tab, we recommend creating several analyses, each with a single view, rather than adopting this pattern. In this way the BI Server generates more efficient queries and avoids on-the-fly data aggregation.

As an example, you can use Oracle Sample Application v607 to create and run an analysis with a Line-Bar view as follows. Note the huge number of columns in the Excluded area.

Overcrowded analysis example

This will result in a simple chart with 3 bars and a line connecting 3 points:

Line-Bar view example

Let's open the obis1-query.log file and have a look at the physical SQL generated by the BI Server:

select sum(T418.Units) as c1,  
     sum(T418.Revenue) as c2,
     T762.Name as c3,
     T666.Office_Dsc as c4,
     T666.Company as c5,
     T418.Order_Status as c6,
     T451.Prod_Dsc as c7,
     T451.Brand as c8,
     T437.Employee_Key as c9,
     T42.Calendar_Date as c10,
     T42.Per_Name_Year as c11,
     T666.Office_Key as c12,
     T762.Cust_Key as c13,
     T451.Prod_Key as c14,
     T666.Company_Key as c15
from  
     BISAMPLE.SAMP_PRODUCTS_D T451 /* D10 Product (Dynamic Table) */ ,
     BISAMPLE.SAMP_TIME_DAY_D T42 /* D01 Time Day Grain */ ,
     BISAMPLE.SAMP_CUSTOMERS_D T762 /* D60 Customers */ ,
     BISAMPLE.SAMP_OFFICES_D T666 /* D30 Offices */ ,
     BISAMPLE.SAMP_EMPL_D_VH T437 /* D50 Sales Rep (Parent Child Hierarchy) */ ,
     BISAMPLE.SAMP_EMPL_PARENT_CHILD_MAP T490 /* D51 Closure Table Sales Rep Parent Child */ ,
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ 
where  ( T42.Calendar_Date = T418.Bill_Day_Dt and T418.Cust_Key = T762.Cust_Key and T418.Prod_Key = T451.Prod_Key and T418.Empl_Key = T490.Member_Key and T418.Office_Key = T666.Office_Key and T437.Employee_Key = T490.Ancestor_Key )  
group by T42.Calendar_Date, T42.Per_Name_Year, T418.Order_Status, T437.Employee_Key, T451.Prod_Key, T451.Prod_Dsc, T451.Brand, T666.Company_Key, T666.Office_Dsc, T666.Company, T666.Office_Key, T762.Name, T762.Cust_Key  

The physical SQL code doesn't look as simple as the resulting view! But the worst is yet to come: it will take about 7 seconds to complete and 256k rows (1.5 GB) will be retrieved from the database!

Let's remove all the excluded columns from the analysis and see what happens to the physical SQL:

select sum(T216.Units) as c1,  
     sum(T216.Revenue) as c2,
     T795.PER_NAME_YEAR as c3
from  
     BISAMPLE.SAMP_TIME_QTR_D T795 /* D03 Time Quarter Grain */ ,
     BISAMPLE.SAMP_REVENUE_FA2 T216 /* F21 Rev. (Aggregate 2) */ 
where  ( T216.Bill_Qtr_Key = T795.QTR_KEY )  
group by T795.PER_NAME_YEAR  

Definitely far better than it looked before. Moreover it will take about 0.2 seconds - 97% less - to complete and only 3 rows (3 KB) - 99% less - will be retrieved from the database.

Abuse of Table / Pivot prompts

Table / Pivot prompts provide an interactive result set that enables users to select the data that they want to view. However, Table prompts don't apply any WHERE condition to the query issued by the BI Server: it will retrieve the full result set first, and then select the data to display on the fly. For this reason we recommend using Dashboard prompts rather than Table / Pivot prompts.

As an example, you can create and run an analysis with a Pivot Table view as follows:

Pivot prompts example

This will result in a pivot table with 8 rows and 25 columns:

Pivot Table view example

Again, let's open the obis1-query.log file and have a look at the physical SQL generated by the BI Server:

select sum(T418.Units) as c1,  
     sum(T418.Revenue) as c2,
     T666.Company as c3,
     T451.Prod_Dsc as c4,
     T451.Brand as c5,
     T653.Per_Name_Month as c6,
     T653.Per_Name_Year as c7,
     T666.Company_Key as c8,
     T451.Prod_Key as c9
from  
     BISAMPLE.SAMP_PRODUCTS_D T451 /* D10 Product (Dynamic Table) */ ,
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ,
     BISAMPLE.SAMP_OFFICES_D T666 /* D30 Offices */ ,
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ 
where  ( T418.Prod_Key = T451.Prod_Key and T418.Bill_Mth_Key = T653.Mth_Key and T418.Office_Key = T666.Office_Key )  
group by T451.Prod_Key, T451.Prod_Dsc, T451.Brand, T653.Per_Name_Year, T653.Per_Name_Month, T666.Company_Key, T666.Company  

As expected no WHERE condition based on selected values in pivot prompts was applied to the query. It will take about 0.3 seconds to complete and 2k rows (10 MB) will be retrieved from the database.

Now let's replace pivot prompts with dashboard prompts and see what happens to the physical SQL:

select sum(T418.Units) as c1,  
     sum(T418.Revenue) as c2,
     T451.Prod_Dsc as c3,
     T653.Per_Name_Month as c4,
     T451.Prod_Key as c5
from  
     BISAMPLE.SAMP_PRODUCTS_D T451 /* D10 Product (Dynamic Table) */ ,
     BISAMPLE.SAMP_TIME_MTH_D T653 /* D02 Time Month Grain */ ,
     BISAMPLE.SAMP_OFFICES_D T666 /* D30 Offices */ ,
     BISAMPLE.SAMP_REVENUE_F T418 /* F10 Billed Rev */ 
where  ( T418.Prod_Key = T451.Prod_Key and T418.Bill_Mth_Key = T653.Mth_Key and T418.Office_Key = T666.Office_Key and T451.Brand = 'BizTech' and T653.Per_Name_Year = '2013' and T666.Company = 'Genmind Corp' )  
group by T451.Prod_Key, T451.Prod_Dsc, T653.Per_Name_Month  

It looks better than before, huh? The optimised code will take about 0.1 seconds - 67% less - to complete and 96 rows (0.2 MB) - 95% less - will be retrieved from the database.

OBIEE as a data extraction tool

A large number of clients are using OBIEE mostly as a feed for Excel-marts. Using OBIEE simply as a data extraction tool for lots of data into Excel or other destinations can cause both performance and functional problems. OBIEE is designed to be primarily used as a reporting and analytics tool, with users interacting with the data within OBIEE and, as my colleague Francesco Tisiot already noted in another post, almost any transformation doable in Excel can be achieved in OBIEE, probably faster and with zero impact on local workstations.

This behaviour could highlight a lack of end user training and / or communication between end users and BI developers. End users may not know OBIEE at all, but are used to working with Excel: at the end of the day people just want their job done as painlessly as possible. We recommend talking to your end users about what they're doing with the data in their destination system, and whether the same can be achieved within OBIEE. If users really do just need the data in Excel, then you should look at how best to do it (Oracle Doc ID 1558070.1).

Conclusion

I've shown here some of the pitfalls that are commonly made with OBIEE. They can impact the performance for end-users, as well as making life more difficult for those building and maintaining the reports.

If you'd like Rittman Mead to take a look at your OBIEE implementation and make sure you're getting the best out of it, please do get in touch!

Categories: BI & Warehousing

Oracle Data Integrator 12c: Getting Started - Developer's Quickstart

Rittman Mead Consulting - Mon, 2016-11-14 13:18

I’ve decided that it’s time for a refresher on Oracle Data Integrator 12c. This week in the “Oracle Data Integrator 12c: Getting Started” series: getting a quick start on mapping development. Several objects must be created before a single bit of ETL can even be developed, and for those who are new to the product, as many readers of this series will be, that can be frustrating. The objects that must be in place are as follows:

  • Data Server - This object is the connection to your data source. Created under one of the many technologies available in ODI, this is where the JDBC URL, username, password, and other properties are all created and stored.
  • Physical Schema - Underneath the Data Server you’ll find the Physical Schema. This object, when connecting to a relational database, represents the database schema where the tables you wish to access in ODI reside.
  • Logical Schema - Here’s where it can sometimes get a bit tricky for folks new to Oracle Data Integrator. One of the great features in ODI is how it abstracts the physical connection and schema from the logical objects. The Logical Schema is mapped to the Physical Schema by an object called a Context. This allows development of mappings and other objects to occur against the Logical Schema, shielding the physical side from the developers. When promoting code to the next environment, nothing needs to change in the developed objects for the connection.
  • Model - Once you have the Topology set up (Data Server, Physical Schema, Logical Schema), you can then create your Model. This is where the logical Datastores are grouped for a given schema. There are many other functions of the Model object, such as journalizing (CDC) setup, but we’ll save those features for another day.
  • Datastore - The Datastore is a logical representation of a table, file, XML element, or other physical object. Stored in the form of a table, the Datastore has columns and constraints. This is the object that will be used as a source or target in your ODI Mappings.

Now you can create your mapping. Whew!

Over the years, Oracle has worked to make the process of getting started a lot easier. Back in ODI 11g, the Oracle Data Integrator QuickStart was a 10 step checklist, where each step led to another section in the documentation. A nice gesture by Oracle, but by no means “quick”. There was also a great tool, the ODI Accelerator Launchpad, built in Groovy by David Allan of the Oracle DI team. Now we were getting closer to something “quick”. But this was simply a script that you had to run, not an integrated part of the ODI Studio platform. Finally, with the release of ODI 12.1.3, the Quickstart was introduced. The New Model and Topology Objects wizard allows you to create everything you need in order to reverse engineer tables into ODI Datastore objects and begin creating your first mappings.

ODI 12c New Model and Topology Objects wizard

Going through the wizard is much simpler than manually setting up the Topology objects and Model for folks just getting started with Oracle Data Integrator. The blog post from Oracle linked above can walk you through the process and I’ve added a demonstration video below that does the same. As a bonus in my demo, I’ve added a tip to help you get your initial load mappings created in an instant. Have a look:

There you have it, a quick and easy way to get started with Oracle Data Integrator 12c and create your first source-to-target Mapping. If you have further questions and would like a more detailed answer, you can always join one of the Rittman Mead ODI bootcamps to learn more from one of our data integration experts. Up next in the Getting Started series, we’ll look at enhancing the ODI metadata by adding constraints and other options.

Categories: BI & Warehousing

Oracle Data Integrator 12c: Getting Started - Installation and Configuration

Rittman Mead Consulting - Tue, 2016-11-08 15:34

I’ve decided that it’s time for a refresher on Oracle Data Integrator 12c. This week, I’m continuing the “Oracle Data Integrator 12c: Getting Started” series with the next topic: Installation and Configuration. I don’t plan to dig into all of the nitty gritty details on each specific scenario you might run into. Rittman Mead have an excellent training program for ODI 12c where you can ask the specifics. There is also this 100+ page document from Oracle that can provide further details: Oracle® Fusion Middleware - Installing and Configuring Oracle Data Integrator 12c (12.2.1) E55859-03. In this post, we’ll hit on the basics and hopefully guide you in the right direction.

First, let’s take a look at the installation process. Installing ODI 12c is quite simple compared to past versions of the product. This is mainly due to the configuration of agents being moved outside of the initial installation and into a completely different process. The big choice we have now is which implementation to choose: Standard or Enterprise.

Standard:
Choose this installation if you plan to use a Standalone Agent to execute your mappings and data integration processes. This will install ODI Studio and the Standalone Agent, along with all of the necessary drivers and libraries to make everything work. Choose this option when installing ODI Studio on your client machine.

Enterprise:
This option will install the JEE Agent, the Colocated Standalone Agent, as well as ODI Studio. The installation footprint is slightly larger than the Standard install, coming in at 3390 MB vs 3017 MB for Standard. That’s part of why I choose Standard when just installing Studio. Along with the JEE Agent and ODI Studio, you’ll also have the ODI Console and the ODI plugin for Fusion Middleware Control (Enterprise Manager). One thing to note: you’ll need to install the Oracle Fusion Middleware Infrastructure (download) package first, as it is a prerequisite for the ODI 12c Enterprise version.
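If you do go the Enterprise route, the infrastructure package is installed in the same way as ODI itself: run the installer jar with a suitable JDK. As a rough sketch (the jar name below assumes the 12.2.1.2.0 release and a JDK at /usr/java/jdk1.8.0_111; both will vary in your environment):

[oracle@bigdatalite Downloads]$ /usr/java/jdk1.8.0_111/bin/java -jar fmw_12.2.1.2.0_infrastructure.jar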

To get started, download the bits and bytes from Oracle OTN and unzip the file. Next, using the appropriate Java version, run the ODI installer.

[oracle@bigdatalite bin]$ ./java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
[oracle@bigdatalite bin]$ cd ../..
[oracle@bigdatalite odi122120]$ ./jdk1.8.0_111/bin/java -jar fmw_12.2.1.2.0_odi.jar 
Launcher log file is /tmp/OraInstall2016-10-21_12-20-12PM/launcher2016-10-21_12-20-12PM.log.
Extracting the installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . .

From there, simply go through the install wizard. I won’t go into too much detail because there’s not much to it. First, decide whether you will install the Standard or Enterprise version. Next, determine where you want to install your ODI home. For a JEE or Colocated Agent installation, ODI must be installed into the Oracle Home containing Fusion Middleware Infrastructure. For a Standalone Agent, a brand new Oracle Home must be used. Now that we’ve installed the software, it’s time to set up the repository database schema.

Master and Work Repository Setup

The Repository Configuration Utility (RCU) is delivered as a part of the ODI install. It is used to create Fusion Middleware repositories, such as the ODI Master/Work Repositories, and any additional database schemas necessary.

Start the RCU by browsing to ODI_HOME/oracle_common/bin and running ./rcu. The steps are pretty straightforward, so I’ll leave this as an exercise for everyone to try at home. Once we have our repositories, we can begin configuring Agents and Studio.
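For example, on my sandbox, with ODI installed to /u01/odi (substitute your own Oracle Home path):

[oracle@bigdatalite ~]$ cd /u01/odi/oracle_common/bin
[oracle@bigdatalite bin]$ ./rcu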

ODI Studio Configuration

Setting up ODI Studio is fairly straightforward, so I’ll keep it brief. Once installed, browse to ODI_HOME/odi/studio and fire up ./odi.sh (or odi.exe in Windows). Then, create a new connection to your ODI Master and Work repository. Enter the ODI username/password to connect with, master repository schema username/password, JDBC connection information, and Work repository. Test the connection to ensure it’s all working and you’re set.
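For example, again assuming the /u01/odi Oracle Home used above:

[oracle@bigdatalite ~]$ cd /u01/odi/odi/studio
[oracle@bigdatalite studio]$ ./odi.sh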

Standalone Agent Configuration

Browse to your ODI_HOME/oracle_common/common/bin directory and run the WebLogic Server Configuration Wizard.

./config.sh
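Spelled out in full, again with the assumed /u01/odi Oracle Home:

[oracle@bigdatalite ~]$ cd /u01/odi/oracle_common/common/bin
[oracle@bigdatalite bin]$ ./config.sh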

Create ODI Agent

First, create a new domain for the agent. This is similar to a WebLogic domain if you’ve gone through the setup process for any application that relies on WLS. It’s recommended to create the domain outside of the ODI Oracle Home. That way you can avoid issues when upgrading/reinstalling and can follow the same domain directory structure if installing multiple agents, since each must be installed in its own domain.

Next, choose the Standalone Agent Template, installed as part of the ODI download. Then, select the JDK to be used by the agent.

Configure ODI Agent component

The repository is already setup, so now if we connect to the service table (STB) schema, we can look up the ODI repository connection information without having to manually enter anything.

Get RCU configuration

The system component will be the ODI Agent name. Then, set the host server listen address, port, and Supervisor username/password.

ODI Agent configuration

Next, set up the Node Manager for the ODI Agent. The Node Manager doesn’t have to be used when starting the agent, but setting one up is a requirement during configuration. The username/password should be kept in a safe place, as it will be needed when starting components via the Node Manager.

Finally, click Create and you’re off and running.

Starting the Standalone Agent

Before you can start the agent, be sure to configure it in the master repository via ODI Studio. Check out the video below for a short tutorial.

Now, you can start up the agent with the Node Manager or without it. If you want to manage the agent via Fusion Middleware Control, use the Node Manager. But if you plan to simply start and stop the agent from the command line, you can use the classic startup method.

Start Agent with Node Manager

From within the DOMAIN_HOME/bin directory, run the following. You’ll want to use nohup to start the process in the background on Linux. For Windows, create a service to run nodemanager.cmd.

[oracle@bigdatalite bin]$ pwd
/u01/odi_agent/user_projects/domains/base_domain/bin
[oracle@bigdatalite bin]$ nohup ./startNodeManager.sh > nm.out&
[oracle@bigdatalite bin]$ nohup: ignoring input and redirecting stderr to stdout

Next, start the agent component from within the same directory. You’ll be prompted for the Node Manager password you set up during configuration.

[oracle@bigdatalite bin]$ ./startComponent.sh OracleDIAgent1
Starting system Component OracleDIAgent1 ...

Initializing WebLogic Scripting Tool (WLST) ...
...
Reading domain from /u01/odi_agent/user_projects/domains/base_domain

Please enter Node Manager password:
Connecting to Node Manager ...

Start Agent without Node Manager

For times when you don’t need the Node Manager, you can still start the ODI Agent using the classic method. Just remember to always use all-caps when referring to the agent parameters, such as NAME.

./agent.sh -NAME=OracleDIAgent1

Either approach for starting the agent will get you up and running.
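To shut everything down again, the same DOMAIN_HOME/bin directory holds the matching stop scripts. A minimal sketch, assuming the agent was started under the Node Manager as above and that your domain bin directory contains the standard 12c stop scripts:

[oracle@bigdatalite bin]$ ./stopComponent.sh OracleDIAgent1
[oracle@bigdatalite bin]$ ./stopNodeManager.sh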

Colocated Agent Configuration

The Colocated Agent is essentially a Standalone agent that is managed and maintained via WebLogic Server. Rather than repeat the steps, I thought I would point you to a nice Oracle By Example on Creating a Colocated Agent. As mentioned earlier, be sure to create your domain outside of the ODI home.

Ping colocated agent

For reference, when testing an agent, simply enter http://hostname:port/agent-web-application-context in a browser. The Ping response means the agent is up and running.
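You can run the same test from the command line. A quick sketch, assuming the standalone agent defaults of port 20910 and the oraclediagent web application context (adjust both to match your own agent):

[oracle@bigdatalite ~]$ curl http://bigdatalite:20910/oraclediagent

If the agent is up, you’ll get the ping page back rather than a connection error.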

JEE Agent Configuration

Finally, we’ll discuss the JEE Agent configuration. As mentioned in my previous post, the main reasons for using the JEE Agent are high availability and connection pooling. You may also want to use the ODI Console or need to manage and monitor ODI agents from FMW Control. Whatever the reason, here’s how we configure the agent.

The configuration process is very similar to that of the Colocated Agent, but with several extra steps involved. I’ll highlight those steps here. To get started, just as with the other agents, browse to the ODI_HOME/oracle_common/common/bin directory and run ./config.sh.

Work through the usual first few installer screens. Then, we select the template to configure. Choose ODI Agent and any additional items you’d like to add to the domain, such as ODI Console or Enterprise Manager plugin.

JEE Agent template

Next, add the credential mappings for both the ODI user, allowing JEE agent access to the Master / Work Repository, and the WebLogic domain user.

JEE Agent credential mapping

Select the options you plan to configure for the domain. This will vary based on what currently exists in WebLogic Server, but in this example we’ll be setting up a new AdminServer, Node Manager, and the Managed Server to host the ODI JEE Agent.

JEE Agent domain configuration

Next, configure the AdminServer, Node Manager, and Managed Server. For each, remember to use the machine IP address and not the default “Local All Machines”. For the node manager, enter a username and password as we did in previous configurations.

JEE Agent server configuration

The final screens in the configuration wizard allow you to create a cluster and assign the cluster to the server. You can assign a dynamic cluster as needed for scalability, though this falls outside the context of this article. Assign the cluster to a server, create the machine (remember to assign the IP address), and finally assign the servers (AdminServer and Managed Server) to the machine.

JEE Agent topology

Click Create and you're done! Now start WebLogic and the Node Manager, then log in to the console to start the ODI Managed Server and agent.
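As a rough sketch of that startup sequence, assuming your JEE agent domain was created at /u01/wls/user_projects/domains/odi_domain (your domain name and path will differ):

[oracle@bigdatalite ~]$ cd /u01/wls/user_projects/domains/odi_domain/bin
[oracle@bigdatalite bin]$ nohup ./startNodeManager.sh > nm.out &
[oracle@bigdatalite bin]$ nohup ./startWebLogic.sh > admin.out &

Once the AdminServer is running, log in to the WebLogic console (or FMW Control) and start the Managed Server hosting the agent.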

There you are, ODI Agents installed and configured. As always, there are many external factors that may change how your configuration and setup works in your environment, so please think of this as a simple guide and not the exact approach for each unique situation. If you do have further questions and would like a more detailed answer, you can always join one of the Rittman Mead ODI bootcamps to learn more from one of our experts on the product. Up next in the Getting Started series, we’ll look at creating your first ODI mapping.

Categories: BI & Warehousing

Oracle Data Visualization Desktop: Star Schemas and Multiple Fact Sources

Rittman Mead Consulting - Tue, 2016-11-08 04:00

Once added to a project, the columns I specified in my custom SQL query now show up as one source.

Now that I have a custom data source with only the data I want to explore, I can easily create an analysis without having to sift through multiple data sources.

* A note about Data Visualization Desktop and caching: when using the above method of writing SQL to create a virtual table, the result is loaded into the cache. You should only use this method for very compact models. If the virtual table contains too much data, you can still add it as a data source, but it may be too big to cache, causing your columns not to load when creating analyses.

Although having one fact source is common in relational models, using multiple fact sources is sometimes unavoidable in dimensional modeling.

In my sample data, I have another schema called GCBC_SURVEYS, which contains two fact tables holding satisfaction scores for customers and for staff, and one dimension table holding the organization that conducted the surveys.

For this example, I’m going to try to add each table as a data source manually first and attempt to join the two fact tables to my dimension table. When using this method, pay close attention and change any key and ID columns from Measure to Attribute so they aren’t aggregated; Data Visualization Desktop sees a numeric datatype and assumes it’s a measure.

Once I've added all of the GCBC_SURVEYS tables as data sources, I’m going to load them into a project and create my joins using the source diagram. When I joined each fact table to the dimension table on SURVORG_ID, notice how DVD automatically created a join between my two fact tables.

This is not desirable because, due to the presence of a circular join, we run the risk of double counting. When I try to break the join between the two fact tables, DVD asks which data source I want to break conformance from.

When I select one of the fact sources, it will not only break the join between the two fact sources but also the join between the fact and the dimension table.

As of this writing, I have not found a way to only break joins between fact tables if they are using the same key to connect to the dimension table.

The only workaround I’ve found is to write a SQL statement that pulls in the columns and creates the joins in one virtual table. This way I could specify the joins without DVD creating one between the fact sources.
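To give a flavour of the kind of statement I mean, here’s a sketch. The table and column names below are hypothetical (only the GCBC_SURVEYS schema and the SURVORG_ID key come from the model above), and each fact is pre-aggregated in a subquery so that joining both to the dimension doesn’t double count:

SELECT o.SURVORG_ID,
       o.SURVORG_NAME,
       c.CUST_SAT_SCORE,
       s.STAFF_SAT_SCORE
FROM   GCBC_SURVEYS.D_SURVEY_ORG o
LEFT JOIN (SELECT SURVORG_ID, AVG(SAT_SCORE) AS CUST_SAT_SCORE
           FROM   GCBC_SURVEYS.F_CUSTOMER_SAT
           GROUP BY SURVORG_ID) c ON c.SURVORG_ID = o.SURVORG_ID
LEFT JOIN (SELECT SURVORG_ID, AVG(SAT_SCORE) AS STAFF_SAT_SCORE
           FROM   GCBC_SURVEYS.F_STAFF_SAT
           GROUP BY SURVORG_ID) s ON s.SURVORG_ID = o.SURVORG_ID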

Once I created my virtual table, I could use it to create a report and use both fact sources.

Although it can take some time to set up all the data sources you want to use for your model, Data Visualization Desktop packs some powerful features when it comes to dimensional modeling. The ability to use more than one fact source when needed adds another layer of flexibility to DVD. Whether you are pulling in each table and then creating your model, or writing SQL to create one virtual table, DVD has the flexibility to accommodate a variety of different scenarios.

Categories: BI & Warehousing
