BI & Warehousing

Unify - bringing together the best of both worlds

Rittman Mead Consulting - Thu, 2017-07-06 09:00

Since I started teaching OBIEE in 2011, I had the pleasure of meeting many fascinating people who work with Business Intelligence.

In talking to my students, I would generally notice three different situations:

  1. Folks were heavy users of OBIEE, and just ready to take their skills to the next level.

  2. They were happily transitioning to OBIEE from a legacy reporting tool, that didn’t have the power that they needed.

  3. There were not-so-good times, like when people were being forced to transition to OBIEE. They felt that they were moving away from their comfort zone and diving into a world of complicated mappings that would first require them to become rocket scientists. They were resistant to change.

It was this more challenging crowd, that mostly sparked my interest for other analytics tools. I received questions like: “Why are we switching to another system? What are the benefits?”

alt

I wanted to have a good answer to these questions. Over the years, different projects have allowed me the opportunity to work with diverse reporting tools. My students’ questions were always in mind: Why? And what are the benefits? So, I always took the time to compare/contrast the differences between OBIEE and these other tools.

I noticed that many of them did a fantastic job at answering the questions needed, and so did OBIEE. It didn’t take me long to have the answer that I needed: the main difference in OBIEE is the RPD!

alt

The RPD is where so much Business Intelligence happens. There, developers spend mind boggling times connecting the data, deriving complex metrics and hierarchies, joining hundreds of tables, and making everything a beautiful drag and drop dream for report writers.

Yes, many other tools will allow us to do magic with metadata, but most of them require this magic to be redefined every time we need a new report, or the report has a different criteria. Yes, the RPD requires a lot of work upfront, but that work is good for years to come. We never lose any of our previous work, we just enhance our model. Overtime, the RPD becomes a giant pool of knowledge for a company and is impressively saved as a file.

alt

For tapping into the RPD metadata, traditionally we have used BI Publisher and OBIEE. They are both very powerful and generally complement each other well. Other tools have become very popular in the past few years. Tableau is an example that quickly won the appreciation of the BI community and has kept consistent leadership in Gartner’s BI Magic quadrant since 2013. With a very slick interface and super fast reporting capability, Tableau introduced less complex methods to create amazing dashboards - and fast! So, what is there not to like? There is really so much TO like!

Going back to the comparing and contrasting, the main thing that Tableau doesn’t offer is… the RPD. It lacks a repository with the ability to save the join definitions, calculations and the overall intelligence that can be used for all future reports.

At Rittman Mead, we’ve been using these tools and appreciate their substantial capabilities, but we really missed the RPD as a data source. We wanted to come up with a solution that would allow our clients to take advantage of the many hours they had likely already put into metadata modeling by creating a seamless transition from OBIEE’s metadata layer to Tableau.

alt

This past week, I was asked to test our new product, called Unify. Wow. Once again, I am so proud of my fellow coworkers. Unify has a simple interface and uses a Tableau web connector to create a direct line to your OBIEE repository for use in Tableau reports, stories and dashboards.

alt

In Unify, we select the subject areas from our RPD presentation layer and choose our tables and columns as needed. Below is a screenshot of Unify using the OBIEE 12c Sample App environment. If you are not familiar with OBIEE 12c, Oracle provides the Sample App - a standalone virtual image with everything that you need to test the product. You can download the SampleApp here: http://www.oracle.com/technetwork/middleware/bi-foundation/obiee-samples-167534.html

alt

We are immediately able to leverage all joins, calculated columns, hierarchies, RPD variables, session variables and that’s not all… our RPD security too! Yes, even row level security is respected when we press the “Unify” button and data is brought back into Tableau. So now, there is no reason to lose years of metadata work because one team prefers to visualize with Tableau instead of OBIEE.

Unify allows us to import only those data needed for the report, as we can utilize ‘in-tool’ filtering, keeping our query sets small, and our performance high.

In sum, Unify unites it all - have your cake and eat it too. No matter which tool you love the most, add them together and you will certainly love them both more.

alt

Categories: BI & Warehousing

Oracle Data Visualization Desktop v3

Rittman Mead Consulting - Wed, 2017-07-05 07:57
Oracle Data Visualization Desktop v3

The ODTUG Kscope17 conference last week in San Antonio was a great event with plenty of very interesting sessions and networking opportunities. Rittman Mead participated during the thursday deep dive BI session and delivered three sessions including a special "fishing" one.


. pic.twitter.com/jC04r4RNvx

— Andrew Fomin (@fomin_andrew) 28 giugno 2017

In the meantime Oracle released Data Visualization Desktop 12.2.3.0.0 which was presented in detail during Philippe Lions session and includes a set of new features and enhancements to already existing functionalities. Starting from new datasources, through new visualization options, in this post I'll go in detail on each of them.

Data Sources

The following new datasources have been introduced:

The latter two (still in beta) are very relevant since they enable querying any product directly exposing JDBC or ODBC connectors (like Presto) without needing to wait for the official support in the DVD list of sources.

Still in DVD v3 there is no support for JSON or XML files. In my older blog post I wrote how JSON (and XML) can be queried in DVD using Apache Drill, however this solution has Drill installation and knowledge as a prerequisite which is not always achievable in end users environment where self-service BI is happening. I believe future versions of DVD will address this problem by providing full support to both data sources.

Connection to OBIEE

One of the most requested new features is the new interface to connect to OBIEE: until DVD v2 only pre-built OBIEE analysis could be used as sources, with DVD v3 OBIEE Subject Areas are exposed making them accessible. The set of columns and filters can't be retrieved on the fly during the project creation but must be defined upfront during datasource definition. This feature avoids move back and forth from OBIEE to DVD to create an analysis in as datasource, and then use it in DVD.

Oracle Data Visualization Desktop v3

Another enhancement in the datasource definition is the possibility to change the column delimiter in txt sources, useful if the datasource has an unusual delimiters.

Oracle Data Visualization Desktop v3

Data Preparation

On the data-preparation side we have two main enhancements: the convert-to-date and the time grain level.
The convert-to-date feature enhances ability for columns to date conversion including the usage of custom parsing strings. Still this feature has some limits like not being able to parse dates like 04-January-2017 where the month name is complete. For this date format a two step approach, reducing the month-name and then converting, is still required.

Oracle Data Visualization Desktop v3

The second enhancement in the data preparation side is the time grain level and format, those options simplify the extraction of attributes (e.g. Month, Week, Year) from date fields which can now be done visually instead of writing logical SQL.

Oracle Data Visualization Desktop v3

The Dataflow component in DVD v3 has an improved UI with new column merge and aggregation functionalities which makes the flow creation easier. Its output can now be saved as Oracle database or Hive table eliminating the need of storing all the data locally.

Oracle Data Visualization Desktop v3

It's worth mentioning that Dataflow is oriented to self-service data management: any parsing or transformation happens on the machine where DVD is installed and its configuration options are limited. If more robust transformations are needed then proper ETL softwares should be used.

New Visualization Options

There are several enhancement on the visualization side, with the first one being the trendlines confidence levels which can be shown, with fixed intervals (90%, 95% or 99%)
Oracle Data Visualization Desktop v3

Top N and bottom N filtering has been added for each measure columns expanding the traditional "range" one.

Two new visualizations have also been included: waterfall and boxplot are now default visualizations. Boxplots were available as plugin in previous versions, however the five number summary had to be pre-calculated; in DVD v3 the summary is automatically calculated based on the definition of category (x-axis) and item (value within the category).

Oracle Data Visualization Desktop v3

Other new options in the data visualization area include: the usage of logarithmic scale for graphs, the type of interpolation line to use (straight, curved, stepped ...), and the possibility to duplicate and reorder canvases (useful when creating a BI story).

Oracle Data Visualization Desktop v3

Console

The latest set of enhancements regard the console: this is a new menu allowing end users to perform task like the upload of a plugin that before were done manually on the file system.

The new Oracle Analytics Store lists add-ins divided into categories:

  • PlugIn: New visualizations or enhancement to existing ones (e.g. auto-refresh, providing a similar behaviour to OBIEE's slider)
  • Samples: Sample projects showing detailed DVD capabilities
  • Advanced Analytics: custom R scripts providing non-default functionalities
  • Map Layers: JSON shape files that can be used to render custom maps data.

The process to include a new plugin into DVD v3 is really simple: after downloading it from the store, I just need open DVD's console and upload it. After a restart of the tool, the new plugin is available.

Oracle Data Visualization Desktop v3

The same applies for Map Layers, while custom R scripts still need to be stored under the advanced_analytics\script_repository subfolder under the main DVD installation folder.

As we saw in this blog post, the new Data Visualization Desktop release includes several enhancement bringing more agility in the data discovery with enhancements both in the connections to new sources (JDBC and ODBC) and standard reporting with OBIEE subject areas now accessible. The new visualizations, the Analytics Store and the plugin management console make the end user workflow extremely easy also when non-default features need to be incorporated. If you are interested in Data Visualization Desktop and want to understand how it can be proficiently used against any data source don't hesitate to contact us!

Categories: BI & Warehousing

Common Questions and Misconceptions in The Data Science Field

Rittman Mead Consulting - Tue, 2017-07-04 09:06

There are many types of scenarios in which data science could help your business. For example, customer retention, process automation, improving operational efficiency or user experience.

It is not however always initially clear which questions to concentrate on, or how to achieve your aims.

This post presents information about the type of questions you could address using your data and common forms of bias that may be encountered.

Types of Question
  • Descriptive: Describe the main features of the data, no implied meaning is inferred. This will almost always be the first kind of analysis performed on the data.

  • Exploratory: Exploring the data to find previously unknown relationships. Some of the found relationships may define future projects.

  • Inferential: Looking at trends in a small sample of a data set and extrapolating to the entire population. In this type of scenario you would end up with an estimation of the value and an associated error. Inference depends heavily on both the population and the sampling technique.

  • Predictive: Look at current and historical trends to make predictions about future events. Even if x predicts y, x does not cause y. Accurate predictions are hard to achieve and depend heavily on having the correct predictors in the data set. Arguably more data often leads to better results however, large data sets are not always required.

  • Causal: To get the real relationship between variables you need to use randomised control trials and measure average effects. i.e. if you change x by this much how does y change. Even though this can be carried out on observed data huge assumptions are required and large errors would be introduced into the results.

Biases in data collection or cleaning

It is very easy to introduce biases into your data or methods if you are not careful.
Here are some of the most frequent:

  • Selection/sampling bias: If the population selected does not represent the actual population, the results are skewed. This commonly occurs when data is selected subjectively rather than objectively or when non-random data has been selected.

  • Confirmation bias: Occurs when there is an intentional or unintentional desire to prove a hypothesis, assumption, or opinion.

  • Outliers: Extreme data values that are significantly out of the normal range of values can completely bias the results of an analysis. If the outliers are not removed in these cases the results of the analysis can be misleading. These outliers are often interesting cases and ought to be investigated separately.

  • Simpson's Paradox: A trend that is indicated in the data can reverse when the data is split into comprising groups.

  • Overfitting: Involves an overly complex model which overestimates the effect/relevance of the examples in the training data and/or starts fitting to the noise in the training data.

  • Underfitting: Occurs when the underlying trend in the data is not found. Could occur if you try to fit a linear model to non linear data or if there is not enough data available to train the model.

  • Confounding Variables: Two variables may be assumed related when in fact they are both related to an omitted confounding variable. This is why correlation does not imply causation.

  • Non-Normality: If a distribution is assumed to be normal when it is not the results may be biased and misleading.

  • Data Dredging: This process involves testing huge numbers of hypotheses about a single data set until the desired outcome is found.
Citations:

Comics from Dilbert Comics By Scott Adams.
Spurious Correlations from http://tylervigen.com/spurious-correlations.

Insights Lab

To learn more about the Rittman Mead Insights Lab please read my previous blog post about our methodology.

Or contact us at info@rittmanmead.com

Categories: BI & Warehousing

OAC: Essbase – Incremental Loads / Automation

Rittman Mead Consulting - Mon, 2017-07-03 08:56

I recently detailed data load possibilities with the tools provided with Essbase under OAC here. Whilst all very usable, my thoughts turned to systems that I have worked on and how the loads currently work, which led to how you might perform incremental and / or automated loads for OAC Essbase.

A few background points:

  • The OAC front end and EssCS command line tools contain a ‘clear’ option for data, but both are full data clears – there does not seem to be a partial or specifiable ‘clear’ available.
  • The OAC front end and EssCS command line tools contain a ‘file upload’ function for (amongst other things) data, rules, and MAXL (msh) script files. Whilst the front-end operation has the ability to overwrite existing files, the EssCS Upload facility (which would be used when trying to script a load) seemingly does not – if an attempt is made to upload a file that already exists, an error is shown.
  • The OAC ‘Job’ facility enables a data load to be conducted with a rules file; the EssCS Dataload function (which would be used when trying to script a load) seemingly does not.
  • MAXL still exists in OAC, so it is possible to operate at Essbase ‘command level’

Whilst the tools that are in place all work well and are fine for migration or other manual / adhoc activity, I am not sure what the intended practice might be around some ‘real world’ use cases: a couple of things that spring to mind are

  • Incremental loads
  • Scheduled loads
  • Large ASO loads (using buffers)
Incremental Loads

It is arguably possible to perform an incremental load in that

  • A rules file can be crafted in on-prem and uploaded to OAC (along with a partial datafile)
  • Loads appear to be conducted in overwrite mode, meaning changed and new records will be handled ok

It is possible that (eg) a ‘current month’ data file could be loaded and reloaded to form an incremental load of sorts. The problem here will come if data is deleted for a particular member combination in the source from one day to the next – with no partial clear (eg, of current month data) seemingly possible, there is no way of clearing redundant values (at least for an ASO cube…for a BSO load, the ‘Clear combinations’ functionality of the load rules file can be used…although that has not yet been tested on this version).

So in the case of an ASO cube, the only option using available tools would be to ensure that ‘contra’ records are added to the incremental load file. This is not ideal, as it is another process to follow in data preparation, and would also add unnecessary zeros to the cube. For these reasons, I would generally look to effect a partial clear of the ‘slice’ being loaded before proceeding with an incremental the load.

The only way I can see of achieving this under OAC would be to take advantage of the fact that MAXL is available and effect the clear using alter database clear data.

This means that the steps required might be

  • Upload prepared incremental data file (either manually via OAC or via EssCS UploadFiles after having first deleted the existing file)
  • Upload on-prem prepared rules file (either manually via OAC or via EssCS UploadFiles after having first deleted the existing file)
  • Access the OAC server (eg via Putty), start MAXL, and run a command to clear the required slice / merge slices (if necessary)
  • In OAC, create / run a job for the specified data file / rules file

I may have missed something, but I see no obvious way of being able to automate this process with the on-board facilities.

Automating the load process

Along with the points listed above, some other facts to be aware of:

  • It is possible to manually transfer files to OAC using FTP
  • It is possible to amend the cron scheduler for the oracle user in OAC

Even bearing in mind the above, I should caveat this section by saying getting ‘under the hood’ in this way is possibly not supported or recommended, and should only be undertaken at your own risk.

Having said that…

By taking advantage of the availability of FTP and cron, it should be possible to script a solution that can run unattended, for full and incremental loads. Furthermore, data clears (full or partial) can be included in the same process, as could parallel buffer loading for ASO or any other MAXL-controllable process (within the confines of this version of Essbase).

The OAC environment

A quick look around discloses that the /u01/latency directory is roughly the equivalent of the ../user_projects/epmsystem1/EssbaseServer/essbaseserver1 (or equivalent) directory in an on-prem release in that it contains the /app ‘parent’ directory which in turn contains a subdirectory structure for all application and cube artefacts. Examining this directory for ASOSamp.Basic shows that the uploaded dataload.* files are here, along with all other files listed by the Files screen of OAC:

Note that remote connection is via the opc user, but this can be changed to oracle once connected (by using sudo su – oracle).

As oracle, these files can be manually deleted…doing so means they will no longer be found by the EssCS Listfiles command or the Files screen within OAC (once refreshed). If deleted manually, new versions of the files can be re-uploaded via either of the methods detailed above (whilst an overwrite option exists in the OAC Files facility, there seems to be no such option with the EssCS Upload feature…trying to upload a file that already exists results in an error.

All files are owned by the oracle user, with no access rights at all for the opc user that effects a remote connection via FTP.

Automation: Objectives

The objective of this exercise was to come up with a method that, unattended, would:

  • Upload received files (data, rules) to OAC from a local source
  • Put them in the correct OAC directory in a usable format
  • Invoke a process that runs a pre-load process (eg a clear), a load, and (if necessary a post load process)
  • Clear up after itself
Automation: The Process

The first job is to handle the upload of files to OAC. This could be achieved via a psftp script that uploads the entire contents of a nominated local directory:

The EssCSUpload,bat script above (which can, of course be added to a local scheduler so that it runs unattended at appointed times) passes a pre-scripted file to psftp to connect and transfer the files. Note that the opc user is used for the connection, and the files are posted to a custom-created directory, CUSTOM_receive (under the existing /u01/latency). The transferred files are also given a global ‘rw’ attribute to assist with later processing

Now the files are in the OAC environment, control is taken up there.

A shell script (DealWithUploads) is added to the oracle home directory:

This copies all the files in the nominated receiving directory to the actual required location – in this case, the main ASOSamp/Basic directory. Note the use of ‘-p’ with the copy command to ensure that attributes (ie, the global ‘rw’) are retained. Once copied, the files are deleted from the receiving directory so that they are not processed again.

Once the files are copied into place, startMAXL is used to invoke a pre-prepared msh script:

as can be seen, this clears the cube and re-imports from the uploaded file using the uploaded rules file. The clear here is a full reset, but a partial clear (in the case of ASO) can be used here instead if required

As with the ‘local’ half of the method, the DealWithUploads.sh script file can be added to the scheduler on OAC: the existing cron entries are already held in the file /u01/app/oracle/tools/home/oracle/crontab.txt; it is a simple exercise to schedule a call to this new custom script.

A routine such as this would need a good degree of refinement and hardening – the file lists for the transfers should be self-building, passwords need to be encrypted, the MAXL script should only be called if required, the posting locations for files should be content/context sensitive, etc – but in terms of feasibility testing the requirements listed above, it was successful.

This approach places additional directories and files in an environment / structure that could be maintained at any time: it is therefore imperative that some form of code control / release mechanism is employed so that it can be replaced in the event of any unexpected / uncontrollable maintenance taking place on the OAC environment that could invalidate or remove it.

Even once hardened, I think there is a considerable weak spot in this approach in that the rules file seemingly has to be crafted in an on-prem environment and uploaded: as I detailed here, even freshly-uploaded, working rules files error when an attempt is made to verify them. For now, I’ll keep looking for an alternative.

Summary

Whilst a lot of the high-level functionality is in place around data loads, often with multiple methods, I think there are a couple of detailed functionality areas that may currently require workarounds – to my mind, the addition of the ability to select & run an msh format ‘preload’ script when running a dataload Job (eg for clears) would be useful, whilst a fully functional rules file editor strikes me as important. The fact that an FTP connection is available at all is a bonus, but because this is as a non-oracle user, it is not possible to put a file in the correct place directly - the EssCS Upload faciity does this of course, but the seeming absence of an overwrite option or an additional Delete option for EssCS) somewhat limits its usefulness at this point. But can you implement a non attended, scheduled load or incremental load routine ? Sure you can.

Categories: BI & Warehousing

OAC: Essbase – Loading Data

Rittman Mead Consulting - Fri, 2017-06-30 09:33

After my initial quick pass through Essbase under OAC here, this post looks at the data loading options available in more detail. I used the provided sample database ASOSamp.Basic, which first had to be created, as a working example.

Creating ASOSamp

Under the time-honoured on-prem install of Essbase, the sample applications were available as an install option – supplied data has to be loaded separately, but the applications / cubes themselves are installed as part of the process if the option is selected. This is not quite the same under OAC – some are provided in an easily installable format, but they are not immediately available out-of-the-box.

One of the main methods of cube creation in Essbase under OAC is via the Import of a specifically formatted Excel spreadsheet, and it is via the provision of downloadable pre-built ‘template’ spreadsheets that the sample applications are installed in this version.

After accessing the homepage of Essbase on OAC, download the provided cube creation template – this can be found under the ‘Templates’ button on the home page:

Note that in the case of the sample the ASOSamp.Basic database, the data is not in the main template file – it is held in a separate file. This is different to other examples, such as Sample.Basic, where the data provided is held in a dedicated tab in the main spreadsheet. Download both Aggregate Storage Sample and Aggregate Storage Sample Data:

Return to the home page, and click Import. Choose the spreadsheet downloaded as Aggregate Storage Sample (ASO_Sample.xlsx) and click Deploy and Close.

This will effect all of the detail in the spreadsheet – create the application, create the cube, add dimensions / attribute dimensions and members to the outline, etc:

Loading ASOSamp.Basic

Because the data file is separate from the spreadsheet, the next step is to uploaded this to OAC so that it is available for loading: back on the home page, select the newly-created ASOSamp.Basic (note: not ASOSamp.Sample as with on-prem), and click Files:

In the right-hand window, select the downloaded data file ASOSampleData.txt and click the Upload button:

This will upload the file:

Once the file upload is complete, return to the home page. With the newly-created ASOSamp.Basic still selected, click Jobs:

Choose Data Load as the Job Type, and highlight the required Data File:

Click Execute.

A new line will be added to the Job Monitor:

The current status of the job is shown – in this case, ‘in progress’ – and the screen can be refreshed.

Once complete, the Status field will show the completion state of the job, whilst the Job Details icon on the right-hand side provides more detail – in this case, confirming that 311,795 records were successfully loaded, and 0 rejected:

The success of the load is confirmed by a quick look in Smartview:

Note that a rules file was not selected as part of the job – this makes sense when we look at the data file…

...which is familiar-looking: just what we would expect from a EAS export (MAXL: export database), which can of course just be loaded in a similar no-rules-file way in on prem.

Incidentally, this is different to the on-prem approach to ASOSamp.Sample where a ‘flat’, tab-delimited data file is provided for the sample data, along with a rules file that is required for the load:

...although the end-results are the same:

This ‘standard’ load works in overwrite mode – any new values in the file will be added, but any that exist already will be overwritten: running the load again and refreshing the Smartview report results in the same numbers confirms this.

This can be verified further by running with a changed data file: taking a particular node of data for the Units measure…

One of the constituent data values can be changed in a copy of the data file – in this example, one record (it doesn’t matter which for this purpose) can be increased – in this case, ‘1’ has been increased to ‘103’:

The amended file needs to be saved and uploaded to OAC as outlined above, and the load process repeated, this time using the amended file. After a successful load, the aggregated value on the test Smartview report has increased by the same 102:

Loading flat files

So, how might we load the same sort of flat, tab delimited file of the like supplied as the on-prem ASOSamp.Sample data file ?

As above, files can be uploaded to OAC, so putting the dataload.txt data file from the on-prem release into OAC is straightforward. However, as you’d expect, attempting to run this as a load job without a rules file results in an error.

However, it is possible to run an OAC load with a rules file created in an on-prem version: firstly, upload the rules file (in this case, dataload.rul) in the same way as the data file. When setting up the load job, select the data file as normal, but under Scripts select the rules file required:

The job runs successfully, with the ‘Details’ overlay confirming the successful record count.

As with rules files generated by the Import facility, uploaded rules files can also be edited in text mode:

It would seem logical that changing the dataLoadOptions value at line 215 to a value other than OVERWRITE (eg ADD) might be a quick behavioural change for the load that would be easy to effect. However, making this change resulted in verification errors. Noting that the errors related to invalid dimension names, an attempt was made to verify the actual, unchanged rules file as uploaded…which also resulted in the same verification errors. So somewhat curiously, the uploaded on-prem rules file can be successfully used to load a corresponding data file, but (effectively) can’t be edited or amended.

Loading from Spreadsheet Template

The template spreadsheets used to build applications can also contain one or more data tabs. Unlike the OAC Jobs method or EssCS Dataload, the spreadsheet method gives you the option of a rules file AND the ability to Add (rather than overwrite) data:

Within OAC, this is actioned via the ‘Import’ function on the home page:

Note that we are retaining all data, and have the Load Data box checked. Checks confirm the values in the file are added to those already in the cube.

The data can also be uploaded via the Cube Designer in Excel under Cube Designer / Load Data:

Note that unlike running this method under OAC, the rules file (which was created by the initial import as the Data tab existed in the spreadsheet at that point) has to be selected manually.

Once complete, an offer is made to view the Job Status Viewer (which can also be accessed from Cube Designer / View Jobs):

With further detail for each job also being available:

Use facilities to upload files

Given the ability to upload and run both data and rules files, the next logical step would be to script this for automated running. OAC contains a downloadable utility, the Command Line Tool (aka CLI , EssCS) which is a number of interface tools that can be run locally against an OAC instance of Essbase:

Login / Logout
Calc
Dataload
Dimbuild
Clear
Version
Listfiles
Download
Upload
LcmExport
LcmImport

Running locally, a successful EssCS login effectively starts a session that then remains open for other EssCS commands until the session is closed with a logout command.

The login syntax suggests the inclusion of the port number in the URL, but I had no success with this…although it worked without the port reference:

As above, the connection is made and is verified by the successful running of another command (eg version), but the logout command produced an error. Despite this, the logout appeared successful – no other EssCS commands worked until a login was re-issued.

With EssCS installed and working, the Listfiles and Upload facilities become available. The function of these tools is pretty obvious from the name. Listfiles should be issued with at least arguments for the application and cube name:

The file type (csc, rul, txt, msh, xls, xlsx, xlsm, xml, zip, csv) can be included as an additional argument…

…although the file types is a fixed list – for example, you don’t seem to be able to use a wild card to pick up all spreadsheet files.

Whilst there is an Upload (and Download) facility, there does not seem to be the means to delete a remote file…which is a bit of an inconvenience, because using Upload to upload a file that already exists results in an error, and there is no overwrite option. The dataload.txt and dataload.rul files previously uploaded via the OAC front end were therefore manually deleted via OAC, and verified using Listfiles.

The files were then uploaded back to OAC using the Upload option of EssCS:

As you would expect, the files will then appear both in a Listfiles command and via OAC:

Note that the file list in OAC does not refresh with a browser page refresh or any ‘sort’ operation: use Refresh under Actions as above.

With the files now re-uploaded, the data can be loaded. EssCS also contains a DataLoad command, but unfortunately there appears to be no means to specify a rules file – meaning it would seem to be confined to overwrite, ‘export data’ style imports only:

A good point here is that the a DataLoad EssCS command makes an entry to the Jobs table, so success / record counts can be confirmed:

Summary

The post details three methods of loading data to Essbase under OAC:

  • Via the formatted template spreadsheet (on import or from Cube Designer)
  • Via the Command Line Interface
  • Via the Jobs facility of OAC

There are some minor differences between them, which may affect which you may wish to use for any particular scenario.

Arguably, given the availability of MAXL, there is a further custom method available as the actual data load can be effected that way too. This will be explored further in the next post that will start to consider how these tools might be used for real scenarios.

Categories: BI & Warehousing

Exploring the Rittman Mead Insights Lab

Rittman Mead Consulting - Tue, 2017-06-27 08:58
What is our Insights Lab?

The Insights Lab offers on-demand access to an experienced data science team, using a mature methodology to deliver one-off analyses and production-ready predictive models.

Our Data Science team includes physicists, mathematicians, industry veterans and data engineers ready to help you take analytics to the next level while providing expert guidance in the process.

Why use it?

Data is cheaper to collect and easier to store than ever before. But collecting the data is not synonymous with getting value from it. Businesses need to do more with the same budget and are starting to look into machine learning to achieve this.

These processes can take off some of the workload, freeing up people's time to work on more demanding tasks. However, many businesses don't know how to get started down this route, or even if they have the data necessary for a predictive model.

R

Our Data science team primarily work using the R programming language. R is an open source language which is supported by a large community.

The functionality of R is extended by many community written packages which implement a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, statistical tests, time-series analysis, classification, clustering as well as packages for data access, cleaning, tidying, analysing and building reports.

All of these packages can be found on the Comprehensive R Archive Network (CRAN), making it easy to get access to new techniques or functionalities without needing to develop them yourself (all the community written packages work together).

R is not only free and extendable, it works well with other technologies and makes it an ideal choice for businesses who want to start looking into advanced analytics. Python is an obvious alternative, and several of our data scientists prefer it. We're happy to use whatever our client's teams are most familiar with.

Experienced programmers will find R syntax easy enough to pick up and will soon be able to implement some form of machine learning. However, for a detailed introduction to R and a closer look at implementing some of the concepts mentioned below we do offer a training course in R.

Our Methodology

Define Define a Question

Analytics, for all intents and purposes, is a scientific discipline and as such requires a hypothesis to test. That means having a specific question to answer using the data.

Starting this process without a question can lead to biases in the produced result. This is called data dredging - testing huge numbers of hypotheses about a single data set until the desired outcome is found. Many other forms of bias can be introduced accidentally; the most commonly occurring will be outlined in a future blog post.

Once a question is defined, it is also important to understand which aspects of the question you are most interested in. Associated, is the level of uncertainty or error that can be tolerated if the result is to be applied in a business context.

Questions can be grouped into a number of types. Some examples will be outlined in a future blog post.

Define a dataset

The data you expect to be relevant to your question needs to be collated. Maybe supplementary data is needed, or can be added from different databases or web scraping.

This data set then needs to be cleaned and tidied. This involves merging and reshaping the data as well as possibly summarising some variables. For example, removing spaces and non-printing characters from text and converting data types.

The data may be in a raw format, there may be errors in the data collection, or corrupt or missing values that need to be managed. These records can either be removed completely or replaced with reasonable default values, determined by which makes the most sense in this specific situation. If records are removed you need to ensure that no selection biases are being introduced.

All the data should be relevant to the question at hand, anything that isn't can be removed. There may also be external drivers for altering the data, such as privacy issues that require data to be anonymised.

Natural language processing could be implemented for text fields. This takes bodies of text in human readable format such as emails, documents and web page content and processes it into a form that is easier to analyse.

Any changes to the dataset need to be recorded and justified.

Model Exploratory Analysis

Exploratory data analysis involves summarising the data, investigating the structure, detecting outliers / anomalies as well as identifying patterns and trends. It can be considered as an early part of the model production process or as a preparatory step immediately prior. Exploratory analysis is driven by the data scientist, enabling them to fully understand the data set and make educated decisions; for example the best statistical methods to employ when developing a model.

The relationships between different variables can be understood and correlations found. As the data is explored, different hypotheses could be found that may define future projects.

Visualisations are a fundamental aspect of exploring the relationships in large datasets, allowing the identification of structure in the underlying dataset.

This is also a good time to look at the distribution of your dataset with respect to what you want to predict. This often provides an indication of the types of models or sampling techniques that will work well and lead to accurate predictions.

Variables with very few instances (or those with small variance) may not be beneficial, and in some cases could even be detrimental, increasing computation time and noise. Worse still, if these instances represent an outlier, significant (and unwarranted) value may be placed on these leading to bias and skewed results.

Statistical Modelling/Prediction

The data set is split into two sub groups, "Training" and "Test". The training set is used only in developing or "training" a model, ensuring that the data it is tested on (the test set) is unseen. This means the model is tested in a more realistic context and will help to determine whether the model has overfitted to the training set. i.e. is fitting random noise in addition to any meaningful features.

Taking what was learned from the exploratory analysis phase, an initial model can be developed based on an appropriate application of statistical methods and modeling tools. There are many different types of model that can be applied to the data, the best tends to depend on the complexity of your data and the any relationships that were found in the exploratory analysis phase. During training, the models are evaluated in accordance with an appropriate metric, the improvement of which is the "goal" of the development process. The predictions produced from the trained models when run on the test set will determine the accuracy of the model (i.e. how closely its predictions align with the unseen real data).

A particular type of modelling method, "machine learning" can streamline and improve upon this somewhat laborious process by defining models in such a way that they are able to self optimise, "learning" from past iterations to develop a superior version. Broadly, there are two types, supervised and un-supervised. A supervised machine learning model is given some direction from the data scientist as to the types of methods that it should use and what it is expecting. Unsupervised machine learning on the other hand, as the name suggests, involves giving the model less information to start with and letting it decide for its self what to value, and how to approach the problem. This can help to remove bias and reduce the number of assumptions made but will be more computationally intensive, as the model has a broader scope to investigate. Usually supervised machine learning is employed in a case where the problem and data set are reasonably well understood, and unsupervised machine learning where this is not the case.

Complex predictive modelling algorithms perform feature importance and selection internally while constructing models. These models can also report on the variable importance determined during the model preparation process.

Peer Review

This is an important part of any scientific process, and effectively utilities our broad expertise in modelling at Rittman Mead. This enables us to be sure no biases were introduced that could lead to a misleading prediction and that the accuracy of the models is what could be expected if the model was run on new unseen data. Additional expert views can also lead to alternative potential avenues of investigation being identified as part of an expanded or subsequent study.

Deploy Report

For a scientific investigation to be credible the results must be reproducible. The reports we produce are written in R markdown and contain all the code required to reproduce the results presented. This also means it can be re-run with new data as long as it is of the same format. A clear and concise description of the investigation from start to finish will be provided to ensure that justification and context is given for all decisions and actions.

Delivery

If the result is of the required accuracy we will deploy a model API enabling customers to start utilising it immediately.
There is always a risk however that the data does not contain the required variables to create predictions with sufficient confidence for use. In these cases, and after the exploratory analysis phase there may be other questions that would be beneficial to investigate. This is also a useful result, enabling us to suggest additional data to collect that may allow a more accurate result should the process be repeated later.

Support

Following delivery we are able to provide a number of support services to ensure that maximum value is extracted from the model on an on-going basis. These include:
- Monitoring performance and accuracy against the observed, actual values over a period of time. Should there be discrepancies between these values arise, these can be used to identify the need for alterations to the model.
- Exploring specific exceptions to the model. There may be cases in which the model consistently performs poorly. Instances like these may not have existed in the training set and the model could be re-trained accordingly. If they were in the training set these could be weighted differently to ensure a better accuracy, or could be represented by a separate model.
- Updates to the model to reflect discrepancies identified through monitoring, changes of circumstance, or the availability of new data.
- Many problems are time dependent and so model performance is expected to degrade, requiring retraining on more up to date data.

Summary

In conclusion our Insights lab has a clearly defined and proven process for data science projects that can be adapted to fit a range of problems.

Contact us to learn how Insights Lab can help your organization get the most from its data, and schedule your consultation today.
Contact us at info@rittmanmead.com

Categories: BI & Warehousing

Using Tableau to Show Variance and Uncertainty

Rittman Mead Consulting - Mon, 2017-06-26 09:00

Recently, I watched an amazing keynote presentation from Amanda Cox at OpenVis. Toward the beginning of the presentation, Amanda explained that people tend to feel and interpret things differently. She went on to say that, “There’s this gap between what you say or what you think you’re saying, and what people hear.”

While I found her entire presentation extremely interesting, that statement in particular really made me think. When I view a visualization or report, am I truly understanding what the results are telling me? Personally, when I’m presented a chart or graph I tend to take what I’m seeing as absolute fact, but often there’s a bit of nuance there. When we have a fair amount of variance or uncertainty in our data, what are some effective ways to communicate that to our intended audience?

In this blog I'll demonstrate some examples of how to show uncertainty and variance in Tableau. All of the following visualizations are made using Tableau Public so while I won’t go into all the nitty-gritty detail here, follow this link to download the workbook and reverse engineer the visualizations yourself if you'd like.

First things first, I need some data to explore. If you've ever taken our training you might recall the Gourmet Coffee & Bakery Company (GCBC) data that we use for our courses. Since I’m more interested in demonstrating what we can do with the visualizations and less interested in the actual data itself, this sample dataset will be more than suitable for my needs. I'll begin by pulling the relevant data into Tableau using Unify.

If you haven't already heard about Unify, it allows Tableau to seamlessly connect to OBIEE so that you can take advantage of the subject areas created there. Now that I have some data, let’s look at our average order history by month. To keep things simple, I’ve filtered so that we’re only viewing data for Times Square.

Average Orders for 2015-2016

On this simple visualization we can already draw some insights. We can see that the data is cyclical with a peak early in the year around February and another in August. We can also visually see the minimum number of orders in a month appears to be about 360 orders while the maximum is just under 400 orders.

When someone asks to see “average orders by month”, this is generally what people expect to see and depending upon the intended audience a chart like this might be completely acceptable. However, when we display aggregated data we no longer have any visibility into the variance of the underlying data.

Daily Orders

If we display the orders at the day level instead of month we can still see the cyclical nature of the data but we also can see additional detail and you’ll notice there’s quite a bit more “noise” to the data. We had a particularly poor day in mid-May of 2014 with under 350 orders. We’ve also had a considerable number of good days during the summer months when we cleared 415 orders.

Moving Average

Depending upon your audience and the dataset, some of these charts might include too much information and be too busy. If the viewer can’t make sense of what you’re putting in front of them there’s no way they’ll be able to discern any meaningful insights from the underlying dataset. Visualizations must be easy to read. One way to provide information about the volatility of the data but with less detail would be to use confidence bands, similar to how one might view stock data. In this example I’ve calculated and displayed a moving average, as well as upper and lower confidence bands using the 3rd standard deviation. Confidence bands show how much uncertainty there is in your data. When the bands are close you can be more confident in your results and expectations.

Orders by Month Orders by Day

An additional option is the use of a scatterplot. The awesome thing about a scatterplots is that not only does it allow you to see the variance of your data, but if you play with the size of your shapes and tweak the transparency just right, you also get a sense of density of your dataset because you can visualize where those points lie in relation to each other.

Boxplot

The final example I have for you is to show the distribution of your data using a boxplot. If you’re not familiar with boxplots, the line in the middle of the box is the median. The bottom and top of the box, known as the bottom and top hinge, give you the 25th and 75th percentiles respectively and the whiskers outside out the box show the minimum and maximum values excluding any outliers. Outliers are shown as dots.

I want to take a brief moment to touch on a fairly controversial subject of whether or not to include a zero value in your axes. When you have a non-zero baseline it distorts your data and differences are exaggerated. This can be misleading and might lead your audience into drawing inaccurate conclusions.

For example, a quick Google search revealed this image on Accuweather showing the count of tornados in the U.S. for 2013-2016. At first glance it appears as though there were almost 3 times more tornados in 2015 than in 2013 and 2014, but that would be incorrect.

On the flipside, there are cases where slight fluctuations in the data are extremely important but are too small to be noticed when the axis extends to zero. Philip Bump did an excellent job demonstrating this in his "Why this National Review global temperature graph is so misleading" article in the The Washington Post.

Philip begins his article with this chart tweeted by the National Review which appears to prove that global temperatures haven’t changed in the last 100 years. As he goes on to explain, this chart is misleading because of the scale used. The y-axis stretches from -10 to 110 degrees making it impossible to see a 2 degree increase over the last 50 years or so.

The general rule of thumb is that you should always start from zero. In fact, when you create a visualization in Tableau, it includes a zero by default. Usually, I agree with this rule and the vast majority of the time I do include a zero, but I don’t believe there can be a hard and fast rule as there will always be an exception. Bar charts are used to communicate absolute values so the size of that bar needs to be proportional to the overall value. I agree that bar charts should extend to zero because if it doesn’t we distort what the data is telling us. With line charts and scatterplots we tend to look at the positioning of the data points relative to each other. Since we’re not as interested in the value of the data, I don’t feel the decision to include a zero or not is as cut and dry.

The issue boils down to what it is you’re trying to communicate with your chart. In this particular case, I’m trying to highlight the uncertainty so the chart needs to draw attention to the range of that uncertainty. For this reason, I have not extended the axes in the above examples to zero. You are free to disagree with me on this, but as long as you’re not intentionally misleading your audience I feel that in instances such as these this rule can be relaxed.

These are only a few examples of the many ways to show uncertainty and variance within your data. Displaying the volatility of the data and giving viewers a level of confidence in the results is immensely powerful. Remember that while we can come up with the most amazing visualizations, if the results are misleading or misinterpreted and users draw inaccurate conclusions, what’s the point?

Categories: BI & Warehousing

OBIEE 12c Catalog Validation: Command Line

Rittman Mead Consulting - Fri, 2017-06-23 08:49
 Command Line

I wrote a blog post a while ago describing the catalog validation: an automated process performing a consistency check of the catalog and reporting or deleting the inconsistent artifacts.
In the post I stated that catalog validation should be implemented regularly as part of the cleanup routines and provides precious additional information during the pre and post upgrade phases.

However some time later I noted Oracle's support Doc ID 2199938.1 stating that the startup procedure I detailed in the previous blog post is not supported in any OBI release since 12.2.1.1.0. You can imagine my reaction...

 Command Line

The question then became: How do we run the catalog validation since the known procedure is unsupported? The answer is in catalog manager and the related command line call runcat.sh which, in the server installations (like the SampleApp v607p), can be found under $DOMAIN_HOME/bitools/bin.

How Does it Work?

As for most of command line tools, when you don't have a clue on how it works, the best approach is to run with the -help option which provides the list of parameters to pass.

Catalog Manager understands commands in the following areas:

Development To Production  
createFolder        Creates folder in the catalog  
delete            Deletes the given path from the catalog  
maintenanceMode        Puts the catalog into or out of Maintenance Mode (aka ReadOnly)  
...

Multi-Tenancy  
provisionTenant        Provisions tenants into a web catalog  
...

Patch Management  
tag            Tags all XML documents in a catalog with a unique id and common version string  
diff            Compares two catalogs  
inject            Injects a single item to a diff file  
...

Subject Area Management  
clearQueryCache        Clears the query cache  

Unfortunately none of the options in the list seems to be relevant for catalog validation, but with a close look at the recently updated Doc ID 2199938.1 I could find the parameter to pass: validate.
The full command then looks like

./runcat.sh -cmd validate

In my previous blog I mentioned different types of validation. What type of validation is the default command going to implement? How can I change the behaviour? Again the -help option provides the list of instructions.

# Command : -cmd validate -help 

validate        Validates the catalog

Description  
Validates the catalog

For more information, please see the Oracle Business Intelligence Suite  
Enterprise Edition's Presentation Services Administration Guide.

Syntax  
runcat.cmd/runcat.sh -cmd validate  
    [ -items (None | Report | Clean) [ -links (None | Report | Clean) ] [-folder <path{:path}>] [-folderFromFile <path of inclusion list file>] ] 
    [ -accounts (None | Report | Clean) [ -homes (None | Report | Clean) ] ] 
    -offline <path of catalog> 

Basic Arguments  
None

Optional Arguments  
-items (None | Report | Clean)        Default is 'Report' 
-links (None | Report | Clean)        Default is 'Clean'. Also, '-items' cannot be 'None'. 
-accounts (None | Report | Clean)        Default is 'Clean' 
-homes (None | Report | Clean)        Default is 'Report'. Also, '-accounts' cannot be 'None'. 
-folder <path{:path}>            Which folders in the catalog to validate
-folderFromFile <path of inclusion list file>            File containing folders in the catalog to validate

Common Arguments  
-offline <path of catalog>

-folderFromFile <folder from file>        ----- Sample Folder From File ------
                        /shared/groups/misc
                        /shared/groups/_filters
                        ------------------------------------

Example  
runcat.cmd/runcat.sh -cmd validate -offline c:\oraclebi\data\web\catalog\paint  

Few bits to notice:

  • -offline: the catalog validation needs to happen offline. Either with services down or on a copy of the live catalog. Running catalog validation on a online catalog is dangerous especially with "Clean" options since could delete content in use.
  • -folder: the catalog validation can be run only for a subset of the catalog
  • None | Report | Clean: each validation can be skipped (None), logged (Report) or solved via removal of the inconsistent object (Clean)
  • Also, '-accounts' cannot be 'None'.: some validations are a prerequisite for others to happen
  • Default is 'Clean': some validations have a "Clean" as default value, meaning that will solve the issue by removing the inconsistent object, this may be inappropriate in some cases.

As written before, the initial catalog validation should be done with all options set on Report since this will give a log file of all inconsistencies without deleting pieces of the catalog that could still be valuable. In order to do so the command to execute is:

./runcat.sh -cmd validate -items Report -links Report -accounts Report -homes Report -offline <path_to_catalog> > cat_validation.log

runcat.sh output is displayed direcly in the console, I'm redirecting it to a file called cat_validation.log for further analysis.

If, after the initial run with all options to Report you want the catalog validation utility to "fix" the inconsistent objects, just change the desired options to Clean. Please make sure to take a backup of the catalog before since the automatic fix is done by removing the related objects. Moreover ensure that catalog validation is working on a offline catalog. The command itself can work on top on a online catalog but is never a good idea checking a catalog that could potentially be changed while the tool is running.

The output

Let's see few examples of how Catalog Validation spots inconsistent objects. For the purpose of this test I'll work with Oracle's Sampleapp.

Abandoned and inaccessible homes

Running the validation against the Sampleapp catalog provides some "interesting" results: some homes are declared "abandoned": this could be due to the related user not existing anymore in weblogic console, but that's not the case

E10    saw.security.validate.homes Abandoned home /users/weblogic  

Looking deeper in the logs we can see that the same user folders are flagged as

User facing object '/users/weblogic' has no user permissions and is inaccessible  

Logging in with the user weblogic doesn't allow me to check the "My Folders" in the catalog. When switching to "Admin View" and trying to open "My Folder" I get the following error

 Command Line

As written in the logs looks like the user folder has permission problems. How can we solve this? One option is to use again the runcat.sh command with the forgetAccounts option to remove the inconsistent homes. However this solution deletes all the content related to the user that was stored under the "My Folders".

In order to keep the content we need to overwrite the folder's permission with an administrator account. Unfortunately, when right-clicking on the folder, the "Permission" option is not available.

 Command Line

As a workaround I found that clicking on Properties and then on Set Ownership of this item and all subitems allows you to grant full access to the administrator which is then able to reset the proper user the relevant access privilege.

 Command Line

Once the workaround is implemented the users is able to check his "My Folder" content, however the the errors are still present in catalog validation. The solution is storing the relevant artifacts in another part of the catalog, run runcat.sh with forgetAccounts option and then reimport the objects if needed.

Inconsistent Objects

The main two reasons generating inconsistent objects are:

  • Invalid XML: The object (analysis or dashboard) XML code is not valid. This can be caused by errors during the write to disk or problems during migrations.
  • Broken Links: analysis contained in a dashboard or linked from other analysis have been renamed or deleted.

Let's see how catalog validation shows the errors.

Invalid XML

To test this case I created a simple analysis with two columns and then went to the Advanced tab and deliberately removed an > to make the XML invalid.

 Command Line

When trying to applying the change I got the following error which denied me the possibility to save.

 Command Line

Since I really wanted to ruin my analysis I went directly to the file system under $BI_HOME/bidata/service_instances/ssi/metadata/content/catalog/root/shared/$REQUEST_PATH and changed the XML directly there.

After than I run the catalog validation with only the flag items equal to Report and the rest set to None since I'm looking only at invalid XMLs.
The result as expected is:

Message: Unterminated start tag, 'saw:column', Entity publicId: /app/oracle/biee/user_projects/domains/bi/bidata/service_instances/ssi/metadata/content/catalog/root/shared/rm+demo/notworkinanalysis, Entity systemId: , Line number: 9, Column number: 13  

Which tells me that my analysis notworkinganalysis is invalid with an unterminated start tag, exactly the error I was expecting. Now I have two choices: either fixing the analysis XML manually or rerunning the catalog validation with option Clean which will delete the analysis since it's invalid. As said before there is no automated fix.

I wanted to do a further example on this, instead of removing the >, i removed a quotation mark " to make the analysis invalid

 Command Line

After clicking to Apply OBIEE already tells me that there is something wrong in the analysis. But since it allows me to save and since I feel masochist I saved the analysis.

 Command Line

But... when running the catalog validation as before I end up seeing 0 errors related to my notworkinganalysis.

 Command Line

The answer to Jackie Chan question is that I got 0 errors since in this second case the XML is still valid. Removing a " doesn't make the XML syntax invalid! In order to find and solve that error we would need to use Oracle's Baseline Validation Tool.

Broken Links

To test the broken links case I created the following scenario:

  • Analysis SourceAnalysis which has navigation action to TargetAnalysis

 Command Line

  • Dashboard TestDashboard which contains the TargetAnalysis object.

In order to break things I then deleted the TargetAnalysis.

 Command Line

Running catalog validation with the option links to Report. As expected I get a line

N1    saw.catalog.impl.scour.validateDeadLink Referenced path /shared/RM Demo/TargetAnalysis in file /shared/RM Demo/_portal/TestDashboard/page 1 is inaccessible.  

But I don't get anything on the SourceRequest object, for which navigation is failing.

 Command Line

But if instead of an action link I use TargetAnalysis to filter the results of SourceAnalysis

 Command Line

And then delete TargetAnalysis, I get the expected error:

N1    saw.catalog.impl.scour.validateDeadLink Referenced path /shared/RM Demo/TargetAnalysis in file /shared/RM Demo/SourceAnalysis is inaccessible

Summarizing the broken link validation reports if missing objects are included in the main definition of other objects (as filters or as parts of dashboards) but doesn't seem to report if the missing object is only linked via an action.

Conclusion

My experiments show that catalog validation finds some errors like invalid homes, XML files and broken links which otherwise users would hit at the run-time and that won't make them happy. But there are still some errors which it doesn't log like analysis with wrong column syntax, luckily for most of the cases other tools like the Baseline Validation can spot them easily so use all you have, use as frequently as possible and if you want more details about how it works and how it can be included in the automatic checks for code promotions don't hesitate to contact us!

Categories: BI & Warehousing

Getting Smarter in Renting with Tableau 10

Rittman Mead Consulting - Thu, 2017-06-22 03:26
Preface

Not a long time ago a friend of mine spent a significant amount of time trying to find a flat to rent. And according to what he said it wasn't an easy task. It took him a decent time and efforts to find something that is big enough (but not too big) not too far from a workplace, had required features and affordable at the same time. And as a specialist in data analysis, I prefer to think about this task as a data discovery one (yes, when you have a hammer everything looks like a nail). And I decided to see if a data analysis tool can help me understand the rental market better. I'm sure you've already read the name of this post so I can't pretend I'm keeping intrigue. This tool is Tableau 10.3.

The Data

The friend I was talking before was looking for a flat in Moscow, but I think that this market is completely unknown to the most of the readers. And also I'd have to spend a half of time translating everything into English so for this exercise I took Brighton and Hove data from http://rightmove.co.uk and got a nice JSON Lines file. JSON Lines files are basically the same JSON as we all know but every file has multiple JSONs delimited by a newline.

{json line #1}
{json line #2}
...
{json line #n}

That could be a real problem but luckily Tableau introduced JSON support in Tableau 10.1 and that means I don't have to transform my data to a set of flat tables. Thanks to Tableau developers we may simply open JSON Lines files without any transformations.

Typical property description looks like this:

10

It has a few major blocks:

  • Property name - 2 bedroom apartment to rent
  • Monthly price - £1,250
  • Description tab:
    • Letting information - this part is more or less standard and has only a small number of possible values. This part has Property name: Property value structure ('Date available':'Now').
    • Key features - this part is an unformalized set of features. Every property may have its own unique features. And it is not a key-value list like Letting information, but a simple list of features.
    • Full description - simply a block of unstructured text.
  • Nearest stations - shows three nearest train stations (there could be underground stations too if they had it in Brighton).
  • School checker - this shows 10 closest primary and 10 secondary schools. For this, I found a kind of API which brought me a detailed description of every school.

And finally, the JSON for one property has the following look. In reality, it is one line but just to make it more easy to read I formatted it to a human readable format. And also I deleted most of the schools' info as it is not as important as it is huge.


Property JSON

{  
   "furnish":"Unfurnished",
   "key_features":[  
      "LARGE BRIGHT SPACIOUS LOUNGE WITH PATIO DOORS",
      "FULLY FITTED KITCHEN",
      "TWO DOUBLE BEDROOMS WITH WARDROBES",
      "A FURTHER SINGLE BEDROOM/OFFICE/STUDY",
      "A GOOD SIZED SHOWER ROOM ",
      "SINGLE GARAGE AND ON STREET PARKING",
      "EASY ACCESS TO THE CITY CENTRE OF CHICHESTER AND COMMUTER ROUTES. ",
      "TO ARRANGE A VIEWING PLEASE CONTACT US ON 01243 839149"
   ],
   "property_price_week":"£254 pw",
   "nearest_stations":[  
      {  
         "station_name":"Fishbourne",
         "station_dist":"(0.4 mi)"
      },
      {  
         "station_name":"Chichester",
         "station_dist":"(1.2 mi)"
      },
      {  
         "station_name":"Bosham",
         "station_dist":"(1.7 mi)"
      }
   ],
   "letting_type":"Long term",
   "secondary_schools":{  
      "schools":[  
         {  
            "distance":"0.6 miles",
            "ukCountryCode":"ENG",
            "name":"Bishop Luffa School, Chichester",
           ...
         }]
    }
   "url":"http://www.rightmove.co.uk/property-to-rent/property-66941567.html",
   "date_available":"Now",
   "date_reduced":"",
   "agent":"On The Move, South",
   "full_description":"<p itemprop=\"description\">We are delighted to bring to market, this fabulous semi detached bungalow ... </p>",
   "primary_schools":{  
      "schools":[  
         {  
            "distance":"0.3 miles",
            "ukCountryCode":"ENG",
            "name":"Fishbourne CofE Primary School",
         }]
    }
   },
   "property_address":[ "Mill Close, Chichester, West Sussex, PO19"],
   "property_name":"3 bedroom bungalow to rent",
   "date_added":"08 June 2017 (18 hours ago)",
   "property_price_month":"£1,100 pcm",
   "let_agreed":null,
   "unknownown_values":"",
   "deposit":"£1384"
}

The full version is here: 6391 lines, I warned you. My dataset is relatively small and has 1114 of such records 117 MB in total.

Just a few things I'd like to highlight. Letting information has only a small number of fixed unique options. I managed to parse them to fields like furnish, letting_type, etc. Key Features list became just an array. We have thousands of various features here and I can't put them to separate fields. Nearest stations list became an array of name and value pairs. My first version of the scrapper put them to a key-value list. Like this:

"nearest_stations":[  
      "Fishbourne": "(0.4 mi)",
      "Chichester": "(1.2 mi)",
      "Bosham": "(1.7 mi)"
      ]

but this didn't work as intended. I got around one hundred of measures with names Fishbourne, Chichester, Bosham, etc. Not what I need. But that could work well if I had only a small number of important POIs (airports for example) and wanted to know distances to this points. So I changed it to this and it worked well:

"nearest_stations":[  
      {  
         "station_name":"Fishbourne",
         "station_dist":"(0.4 mi)"
      },
      {  
         "station_name":"Chichester",
         "station_dist":"(1.2 mi)"
      },
      {  
         "station_name":"Bosham",
         "station_dist":"(1.7 mi)"
      }
   ]
Connect to the Data

When I started this study my knowledge of the UK property rent market was close to this:

20

And it's possible or even likely that some of my conclusions may be obvious for anyone who is deep in the topic. In this blog, I show how a complete newbie (me) can use Tableau and become less ignorant.

So my very first task was to understand what kind of objects are available for rent, what are their prices and so on. That is the typical task for any new subject area.

As I said before Tableau 10 can work with JSON files natively but the question was if it could work with such a complex JSON as I had. I started a new project and opened my JSON file.

30

I expected that I will have to somehow simplify it. But in reality after a few seconds of waiting Tableau displayed a full structure of my JSON and all I had to do was selecting branches I need.

40

After a few more seconds I got a normal Tableau data source.

50

And this is how it looked like in analysis mode

55

First Look at the Data

OK, let's get started. The first question is obvious: "What types of property are available for rent?". Well, it seems that name ('2 bedroom apartment to rent') is what I need. I created a table report for this field.

60

Well, it gives me the first impression of what objects are offered and what my next step should be. First of all the names are ending with "to rent". This just makes strings longer without adding any value. The word "bedroom" also doesn't look important. Ideally, I'd like to parse these strings into fields one of which is # of bedrooms and the second one is Property type. The most obvious action is to try Split function.

80

Well, it partially worked. This function is smart enough and removed 'to rent' part. But except for this, it gave me nothing. On other datasets (other cities) it gave me much better results but it still wasn't able to read my mind and did what I wanted:

85

But I spent 15 seconds for this and lost nothing and if it worked I'd saved a lot of time. Anyway, I'm too old to believe in magic and this almost didn't hurt my feelings.

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

Yes, this string literally asks some regular expressions wizardry.

90

I can easily use REGEXP_EXTRACT_NTH and get what I want. Group 1 is the number of bedrooms and Group 3 is the property type. Groups 2 and 4 are just constant words.

100

Explanation for my regular expressionI can describe most of the names in the following way: "digit bedroom property type to rent" and the rest are "property type to rent. So digit and bedroom are optional and property type to rent are mandatory. The expression is easy and obvious: ([0-9]*)( bedroom )*(.*)( to rent)

Regular expressions are one of my favourite hammers and helped me a lot for this analysis. And after all manipulations, I got a much better view of the data (I skipped some obvious steps like create a crosstab or a count distinct measure to save space for anything more interesting).

110

And while this result looks pretty simple it gives me the first insight I can't get simply browsing the site. The most offered are 1 and 2 bedroom properties especially flats and apartments. And if a family needs a bigger something with 4 or 5 bedrooms, well I wish them good luck, not many offers to chose from. Also if we talk about living property only we should filter out things like GARAGE, PARKING or LAND.

120 130

I think both charts work pretty well. The first one presents a nice view of how flats and apartments outnumber all other types and the second one gives a much better understanding of how many of 2 bedroom properties offered compared to all others.

And while I'm not a big fan of fancy visualisations but if you need something less formal and more eye-catching try Bubbles chart. It's not something I'd recommend for an analysis but may work well for a presentation. Every bubble represents particular property type, colour shows a number of bedrooms and size shows the number of properties.

140

Going Deeper

The next obvious question is the price. How much do different properties cost? Is any particular one more expensive than average or less? What influences the price?

As a baseline, I'd like to know what is the average property price. And I obviously don't want just one figure for the city-wide price. It's meaningless. Let's start with a bar chart and see what is the range of prices.

145

Well, we have a lot of options. Flat share costs less than £700 or we may choose a barn for more than £3600. Again a very simple result but I can't get it directly from the site.

The next obvious question is how the number of bedrooms affects the price. Does the price skyrockets with every additional bedroom or maybe more bedrooms mean smaller rooms and price increases not too fast?

150

Well, this chart gives me the answer but it looks bad. Mostly because a lot of properties types don't have enough variance in room number. Studio flats have only one bedroom by definition and the only converted barn has 7 bedrooms. I'd like to remove types which don't have at least 3 options and see how the price changes. For this, I created a new computed field using fixed keyword. It counts the number of bedroom options by property type.

160

And then I use it in the filter 'Bedroom # variance' at least 3. Now I have a much more clean view. And I can see that typically more bedrooms mean significantly higher price with a few exceptions. But in fact, these are not actual exceptions just a problem of a small dataset. I can say that increase in # bedrooms certainly means a significant increase in price. And one more insight. Going above 7 bedrooms may actually double the price.

170

Averages are good but they hide important information of how prices are distributed. For example, six properties priced £1K and one £200 give average £885. And looking at average only may make you think that with £900 you may choose one of 7 options. It's very easy to build a chart to check this. Just create a new calculation called Bins and use in a chart.

180 190

With £100 bins I got the following chart. It shows how many properties have price falling to a particular price range. For example, the £1000 bin shows # of properties with prices £1000-£1100.

200

The distribution looks more or less as expected but the most interesting here is that £1000-£1100 interval seems to be very unpopular. Why? Let's add # of bedrooms to this chart.

210

£1000 is too expensive for 1 bedroom and studios but too cheap for two. Simple. What else can we do here before moving further? Converting this chart to a running total gives a cool view.

220

What can this chart tell us? For example, if we look at the orange line (2 bedrooms) we will find that with £1200 we may choose among 277 of 624 properties. With £1400 budget we have 486 of 624. Further £200 increase in budget won't significantly increase the number of possibilities and if the change from £1200 to £1400 almost doubled the number of possibilities, the next £200 give only 63 new options. I don't have a ready to use insight here, but I got a way to estimate a budget for a particular type of property. With budget £X I will be able to choose one of N properties.

Why It Costs What It Costs

OK, now I know a lot of statistics about prices. And my next question is about factors affecting the price. I'd like to understand does a particular property worth what it cost or not. Of course, I won't be able to determine exact price but even hints may be useful.

The first hypothesis I want to check is if a train station near raises the price or it isn't any important. I made a chart very similar to the previous one and it seems that Pareto principle works perfectly here. 80% or properties are closer than 20% of the maximum distance to a station.

230

But this chart doesn't say anything about the price it just gives me the understanding of how dense train stations are placed. I'd say that most of the properties have a station in 10-15 minutes of walking reach and therefore this should not significantly affect the price. My next chart is a scatter plot for price and distance. Every point is a property and its coordinates on the plot determined by its price and distance to the nearest station. Colour shows # of bedrooms.

240

I'd say that this chart shows no clear correlation between price and distance. And a more classical line chart shows that.

250

The maximum price slightly decreases with distance, minimum price on the contrary increases. Average price more or less constant. I think the hypothesis is busted. There is no clear correlation between the distance a tenant have to walk to a station and the price he has to pay. If you want to rent something and the landlord says that the price is high because of a train station near, tell him that there are stations all around and he should find something more interesting.

What about furnishings? Does it cheaper to get an unfurnished property or a landlord will be happy to meet someone who shares his taste?

260

Unfurnished property is definitely cheaper. And it's interesting that in some cases partly furnished even cheaper than completely unfurnished. But at least for furnished/unfurnished, we can see a clear correlation. When you see a furnished one for the price of unfurnished this may be a good pennyworth.

Another thing I'd like to check. Can we expect I lower price for a property not available immediately? Or is, on the contrary, the best price is offered for already unoccupied properties?

As always start with a general picture. What is the average time of availability by property types?

270

For most popular types it is about one month and if you have a house you typically publish it two or three months in advance. And what is about the price? One more chart that I like in Tableau. In the nutshell, it is a normal line chart showing an average price by days before property availability. But the thickness of lines shows the number of properties at the same time. So I can see not only the price but reliance too. A thick line means it was formed by many properties and a thin line may be formed by few properties and move up or down significantly then something changes. It would be very interesting to get a historical data and see how much time properties stay free or how long it takes before the price is reduced, but unfortunately, I don't have this data.

280

And looking at this chart I'd say that there is no statistically significant dependency for price and availability date. Renting a property available in the distant future won't save you money* (*=statistically).

And the last thing I'd like to investigate is the Key features. What do landlords put as the key features of their properties? How do they affect the price?

The list of popular Key features surprised me.

290

'Unfurnished' looks good to me, it is a really significant part of the deal. But 'Brighton'? For properties in Brighton? '1 Bedroom'. How many bedrooms can '1 bedroom flat to rent' have? Oh, there is a key feature saying '1 bedroom' now I know. But jokes aside. I had to make a lot of cleaning on this data before I could use it. There are six ways to write 'Modern kitchen'. Make everything upper case, then remove quotes, strip spaces and tabs, remove noisy features like 'stylish 1 bedroom apartment' and so on. After this, I got a slightly better list with approximately 3500 features instead of 4500. Note how all variants of writing 'GAS CENTRAL HEATING' now combined into one most popular feature. But there are still too many features. I'm sure that there should be not more than a hundred of them. Even at this screenshot you may see 'Unfurnished' and 'Unfurnished property' features.

300

When I need a visualisation for this amount of points, bar charts or tables won't play well. My weapon of choice is Scatter plot. Every point is a particular feature, axes are minimum and average prices of it, size is determined by the number of properties declaring to have this feature and the colour is the maximum price. So if a feature is located high on the plot it means that in average it will be expensive to have it. If this feature at the same time located close to the left side even cheap properties may have it. For example, if you want a swimming pool be ready to pay at least £3000 and £7000 in average. And the minimum price for tumble dryer is £3250 but average £3965. The cheapest property with a dryer is more expensive than with a pool, but in average pools are more expensive. That is how this chart works.

310

The problems of this chart are obvious. It is littered with unique features. Only one property has 4 acres (the point in top right corner). And actually not so many swimming pools are available for rent in Brighton. I filtered it by "# of properties > 25" and here is how prices for the most popular features are distributed.

320

Central location will cost you at least £100 and £1195 in average and for Great location be ready to pay at least £445 and £1013 in average. Great location seems to be less valuable than the central one.

And now I can see how a particular feature impacts prices. For example 'GAS HEATING'. I made a set with all variants of heating I could find ('GAS CENTRAL HEATING', 'GAS HEAT' and so on). Now I can analyse how this feature impacts properties. And here is how it impacts the price of flats. Blue circles are properties with gas heating and orange are without.

330

Very interesting in my opinion. The minimum price of properties with gas heating (blue circles) is higher than without. That is expected. But average price for properties without gas heating is higher.

And here are kitchen appliances. For 1 bedroom flats, they increase both minimum and average prices significantly. But for bigger flats minimum price with appliances is higher and average price is lower. Possible this option is important for relatively cheap properties, but its weight is not that big for the bigger ones.

340

Summary

350

Categories: BI & Warehousing

Rittman Mead at Kscope 2017

Rittman Mead Consulting - Wed, 2017-06-21 07:45
Rittman Mead at Kscope 2017

Rittman Mead will be well represented in San Antonio, Texas next week for Kscope 17 with some of our best from both sides of the Atlantic! Our very own Francesco Tisiot and Jordan Meyer will present various topics as well as participate in the conference events. Also, the newly named ODTUG BI Community Lead, Rittman Mead's Becky Wagner, will be on hand and leading a lot of activities throughout. See details below and we hope to see you in Texas.

Jordan

Oracle Big Data Spatial and Graph enables the analysis of data sets beyond that of standard relational analytics commonly used. Through graph technology relationships can be identified that may not otherwise have been. This has practical uses including in product recommendations, social network analysis, and fraud detection.

In this presentation we will see a practical demonstration of Oracle Big Data Spatial and Graph to load and analyze the "Panama Papers" data set. Graph algorithms will be utilized to identify key actors and organizations within the data, and patterns of relationships shown. This practical example of using the tool will give attendees a clear idea of the functionality of the tool and how it could be used within their own organization.

When: Jun 27, 2017, Tuesday Session 7 , 11:15 am - 12:15 pm
Room: Magnolia

Francesco

OBIEE 12c is the latest generation of Oracle's Enterprise analytics and reporting tool, bringing with it many powerful new features. Many users are still on earlier releases of OBIEE 11g or even 10g, and are looking to understand how they can move to OBIEE 12c to benefit from its new capabilities.

Liberty Global is a global telecommunications company, with a long history with OBIEE going back to 10g. They wanted to move to OBIEE 12c in order to use the new Advanced Analytics options, and used Rittman Mead to support them with the full scope of the upgrade.

In this presentation, we will see what a highly successful OBIEE 12c migration looks like. We will cover clear details of all the steps required, and discuss some of the problems encountered. Regression testing is a crucial step in any upgrade and we will show how we did this efficiently and accurately with the provided Baseline Validation Tool. This presentation will assist all attendees who are considering, or in the process of, an OBIEE 12c upgrade.

When: Jun 26, 2017, Monday Session 5 , 4:45 pm - 5:45 pm
Room: Wisteria/Sunflower

And

As a DBA or sysadmin responsible for OBIEE how do you really dig into the guts of OBIEE, look at intra-component communication between the system components and examine the apparently un-examinable? What do you do when you need to trace activity beyond what is in the log files? How do you work with log files in order to give precise but low-level information? What information can be gleaned, by hook or by crook, from OBIEE?

OBIEE provides a set of systems management and diagnostic tools, but these only take you so far. Join me in this presentation to dive deeper with OBIEE. We will take a look at a bag of tricks including undocumented configuration options, flame graphs, system call tracing, discovering undocumented REST APIs, and more! This is not just a geek-out - this is real-life examples of where client OBIEE projects have required that next level of diagnostic techniques and tools. Don your beanie hat and beard as we go deep!

When: Jun 28, 2017, Wednesday Session 12 , 9:45 am - 10:45 am
Room: Wisteria/Sunflower

Becky

Becky Wagner is the new ODTUG BI Community Lead. You will find her at:

Monday Community Lunch | 12:45 – 2:00 PM | Grand Oaks K-S

Monday evening BI Community Night | 8:00 - 10:00 PM | Grand Oaks H http://kscope17.com/events/community-nigh-events

She will be doing the 5K Fun Run http://kscope17.com/events/kscope17-5k on Tuesday morning

Women in Technology Lunch | 12:15– 1:45 PM | Cibolo Canyon 6 on Wednesday https://form.jotformpro.com/71134693041955

Navigating the Oracle Business Analytics Frontier Panel
9:00 AM - 11:00 AM, Cibolo Canyon 8/9/10
http://kscope17.com/content/thursday-deep-dive-sessions

Categories: BI & Warehousing

Unify: Could it be any easier?

Rittman Mead Consulting - Mon, 2017-06-19 09:00

Rittman Mead’s Unify is the easiest and most efficient method to pull your OBIEE reporting data directly into your local Tableau environment. No longer will you have to worry about database connection credentials, Excel exports, or any other roundabout way to get your data where you need it to be.

Unify leverages OBIEE’s existing metadata layer to provide quick access to your curated data through a standard Tableau Web Data Connector. After a short installation and configuration process, you can be building Tableau workbooks from your OBIEE data in minutes.

This blog post will demonstrate how intuitive and easy it is to use the Unify application. We will only cover using Unify and it’s features, as once the data gets into Tableau it can be used the same as any other Tableau Data Source. The environment shown already has Unify installed and configured, so we can jump right in and start using the tool immediately.

To start pulling data from OBIEE using Unify, we need to create a new Web Data Connector Data Source in Tableau. This data source will prompt us for a URL to access Unify. In this instance, Unify is installed as a desktop application, so the URL is http://localhost:8080/unify.

Once we put in the URL, we’re shown an authentication screen. This screen will allow us to authenticate against OBIEE using the same credentials. In this case, I will authenticate as the weblogic user.

Once authenticated, we are welcomed by a window where we can construct an OBIEE query visually. On the left hand side of the application, I can select the Subject Area I wish to query, and users are shown a list of tables and columns in the selected Subject Area. There are additional options along the top of the window, and I can see all saved queries on the right hand side of the window.

The center of the window is where we can see the current query, as well as a preview of the query results. Since I have not started building a query yet, this area is blank.

Unify allows us to either build a new query from scratch, or select an existing OBIEE report. First, let’s build our own query. The lefthand side of the screen displays the Subject Areas and Columns which I have access to in OBIEE. With a Subject Area selected, I can drag columns, or double click them, to add them to the current query. In the screenshot above, I have added three columns to my current query, “P1 Product”, “P2 Product Type”, and “1 - Revenue”.

If we wanted to, we could also create new columns by defining a Column Name and Column Formula. We even have the ability to modify existing column formulas for our query. We can do this by clicking the gear icon for a specific column, or by double-clicking the grey bar at the top of the query window.

It’s also possible to add filters to our data set. By clicking the Filter icon at the top of the window, we can view the current filters for the query. We can then add filters the same way we would add columns, by double clicking or dragging the specific column. In the example shown, I have a query on the column “D2 Department” where the column value equals “Local Plants Dept.”.

Filters can be configured using any of the familiar methods, such as checking if a value exists in a list of values, numerical comparisons, or even using repository or session variables.

Now that we have our columns selected and our filters defined, we can execute this query and see a preview of the result set. By clicking the “Table” icon in the top header of the window, we can preview the result.

Once we are comfortable with the results of the query, we can export the results to Tableau. It is important to understand that the preview data is trimmed down to 500 rows by default, so don’t worry if you think something is missing! This value, and the export row limit, can be configured, but for now we can export the results using the green “Unify” button at the top right hand corner of the window.

When this button is clicked, the Unify window will close and the query will execute. You will then be taken to a new Tableau Workbook with the results of the query as a Data Source. We can now use this query as a data source in Tableau, just as we would with any other data source.

But what if we have existing reports we want to use? Do we have to rebuild the report from scratch in the web data connector? Of course not! With Unify, you can select existing reports and pull them directly into Tableau.

Instead of adding columns from the lefthand pane, we can instead select the “Open” icon, which will let us select an existing report. We can then export this report to Tableau, just as before.

Now let’s try to do something a little more complicated. OBIEE doesn’t have the capability to execute queries across Subject Areas without common tables in the business model, however Tableau can perform joins between two data sources (so long as we select the correct join conditions). We can use Unify to pull two queries from OBIEE from different Subject Areas, and perform a data mashup with the two Subject Areas in Tableau.

Here I’ve created a query with “Product Number” and “Revenue”, both from the Subject Area “A - Sample Sales”. I’ve saved this query as “Sales”. I can then click the “New” icon in the header to create a new query.

This second query is using the “C - Sample Costs” Subject Area, and is saved as “Costs”. This query contains the columns “Product Number”, “Variable Costs”, and “Fixed Costs”.

When I click the Unify button, both of these queries will be pulled into Tableau as two separate data sources. Since both of the queries contain the “Product Number” column, I can join these data sources on the “Product Number” column. In fact, Tableau is smart enough to do this for us:

We now have two data sets, each from a different OBIEE subject area, joined and available for visualization in Tableau. Wow, that was easy!

What about refreshing the data? Good question! The exported data sources are published as data extracts, so all you need to do to refresh the data is select the data source and hit the refresh button. If you are not authenticated with OBIEE, or your session has expired, you will simply be prompted to re-authenticate.

Using Tableau to consume OBIEE data has never been easier. Rittman Mead’s Unify allows users to connect to OBIEE as a data source within a Tableau environment in an intuitive and efficient method. If only everything was this easy!

Interested in getting OBIEE data into Tableau? Contact us to see how we can help, or head over to https://unify.ritt.md to get a free Unify trial version.

Categories: BI & Warehousing

Unify - An Insight Into the Product

Rittman Mead Consulting - Thu, 2017-06-15 06:00
Unify - An Insight Into the Product

Monday, 12 Jun saw the official release of Unify, Rittman Mead's very own connector between Tableau and OBIEE. It provides a simple but powerful integration between the two applications that allows you to execute queries through OBIEE and manipulate and render the datasets using Tableau.

Unify - An Insight Into the Product

Why We Made It

One of the first questions of course would be why we would want to do this in the first place. The excellent thing about OBI is that it acts as an abstraction layer on top of a database, allowing analysts to write efficient and secure reports without going into the detail of writing queries. As with any abstraction, it is a trade of simplicity for capability. Products like Tableau and Data Visualiser seek to reverse this trade, putting the power back in the hands of the report builder. However, without quoting Spiderman, care should be taken when doing this.

The result can be that users write inefficient queries, or worse still, incorrect ones. We know there will be some out there that use self service tools as purely a visualisation engine, simply dropping pre-made datasets into it. If you are looking to produce sustainable, scalable and accessible reporting systems, you need to tackle the problem both at the data acquisition stage as well as the communication stage at the end.

If you are already meeting both requirements, perhaps by using OBI with Data Visualiser (formerly Visual Analyser) or by other means then that's perfectly good. However, We know from experience that there are many of you out there that have already invested heavily into both OBI and Tableau as separate solutions. Rather than have them linger in a state of conflict, we'd rather we nurse them into a state of symbiosis.

The idea behind Unify is that it bridges this gap, allowing you to use your OBIEE system as an efficient data acquisition platform and Tableau as an intuitive playground for users who want to do a a bit more with their data. Unify works by using the Tableau Web Data Connector as a data source and then our customised software to act as an interface for creating OBIEE queries and them exporting them into Tableau.

How It Works

Unify uses Tableau's latest Web Data Connector data source to allow us to dynamically query OBIEE and extract data into Tableau. Once a dataset is extracted into Tableau, it can be used with Tableau as normal, taking advantages of all of the powerful features of Tableau. This native integration means you can add in OBIEE data sources just as you would add in any others - Excel files, SQL results etc. Then you can join the data sources using Tableau itself, even if the data sources don't join up together in the background.

First you open up Tableau and add a Web Data Connector source:

Unify - An Insight Into the Product

Then give the link to the Unify application, e.g. http://localhost:8080/unify. This will open up Unify and prompt you to login with your OBIEE credentials. This is important as Unify operates through the OBIEE server layer in order to maintain all security permissions that you've already defined.

Unify - An Insight Into the Product

Now that the application is open, you can make OBIEE queries using the interface provided. This is a bit like Answers and allows you to query from any of your available subject areas and presentation columns. The interface also allows you to use filtering, column formulae and OBIEE variables much in the same way as Answers does.

Alternatively, you can open up an existing report that you've made in OBIEE and then edit it at your leisure. Unify will display a preview of the dataset so you can tweak it until you are happy that is what you want to bring into Tableau.

Unify - An Insight Into the Product

Once you're happy with your dataset, click the Unify button in the top right and it will export the data into Tableau. From this point, it behaves exactly as Tableau does with any other data set. This means you can join your OBIEE dataset to external sources, or bring in queries from multiple subject areas from OBIEE and join them in Tableau. Then of course, take advantage of Tableau's powerful and interactive visualisation engine.

Unify - An Insight Into the Product

Unify Server

Unify comes in desktop and server flavours. The main difference between the two is that the server version allows you to upload Tableau workbooks with OBIEE data to Tableau Server and refresh them. With the desktop version, you will only be able to upload static workbooks that you've created, however with the server version of Unify, you can tell Tableau Server to refresh data from OBIEE in accordance with a schedule. This lets you produce production quality dashboards for your users, sourcing data from OBIEE as a well as any other source you choose.

Unify Your Data

In a nutshell, Unify allows you to combine the best aspects of two very powerful BI tools and will prevent the need for building all of your reporting artefacts from scratch if you already have a good, working system.

I hope you've found this brief introduction to Unify informative and if you have OBIEE and would like to try it with Tableau, I encourage you to register for a free desktop trial. If you have any questions, please don't hesitate to get in touch.

Categories: BI & Warehousing

Unify: See Your Data From Every Perspective

Rittman Mead Consulting - Mon, 2017-06-12 09:09
 See Your Data From Every Perspective

 See Your Data From Every Perspective

Ad hoc access to accurate and secured data has always been the goal of business intelligence platforms. Yet, most fall short of balancing the needs of business users with the concerns of IT.

Rittman Mead has worked with hundreds of organizations representing all points on the spectrum between agility and governance. Today we're excited to announce our new product, Unify, which allows Tableau users to directly connect to OBIEE, providing the best of both worlds.

Governed Data Discovery

Business users get Tableau's intuitive data discovery features and the agility they need to easily blend their departmental data without waiting on IT to incorporate it into a warehouse. IT gets peace of mind, knowing their mission-critical data is protected by OBIEE's semantic layer and row-level security.

Unify Essentials

Unify runs as a desktop app, making it easy for departmental Tableau users to connect to a central OBIEE server. Unify also has a server option that runs alongside OBIEE, for organizations with a large Tableau user base or those using Tableau Server.

Desktop installation and configuration is simple. Once installed, users can query OBIEE from within Tableau with just a few clicks. Have a look at these short videos demonstrating setup and use of Unify.

Available Today

Download your free 7-day trial of Unify Desktop here.

No Tableau Desktop license? No problem. Unify is compatible with Tableau Public.

Categories: BI & Warehousing

OAC: Essbase and DVCS

Rittman Mead Consulting - Wed, 2017-06-07 09:00

Finally managed to get around to having a proper look at Essbase within Oracle Analytics Cloud Service (OAC) after a busy couple of months. This post focusses mainly on initial impressions on the ‘out of the box’ the Essbase side of this - which we will explore in more detail in future posts, as well as more detail on the use of Essbase with DVCS.

Using Essbase with DVCS

One of the features we are keen to explore more in this context is the integration of Essbase and the Data Visualisation Cloud Service (DVCS). One point that we found that we do not think is being expressed clearly anywhere else we have seen is how to configure this: In setting up our OAC instance, we were having difficulty coming up with a combination of configuration selections that enables Essbase and DV to work at the same time.

Oracle documentation (such as the price list) suggest that both should be available within Standard Edition OAC:

But Doc ID 2265410.1 on MoS suggests, by needing to add a security rule to the Essbase OAC, that two OAC instances are required. We could not find any reference to this requirement in Oracle documentation or blogs on the subject, but it transpires after checking with Oracle that this is indeed the case – Essbase and DV need to be on separate OAC instances.

Essbase

Looking purely at Essbase, my initial reaction is very positive…whilst the interface is different (I am sure tears will be shed for EAS & Studio in the foreseeable future…although given the way some stalwarts are still clinging on the last surviving copies of the Excel Add In, maybe not too imminently), once the surface of the new interface is scratched more...ahem…’seasoned’ developers will take comfort from being able to do a lot of the same things as they currently can. I am also confident it will fulfil one of the stated objectives in making it easier for non-experts to be able quickly and easily deploy cubes for analysis purposes.

Whilst the manual application and cube maintenance tools through the OAC front-end seem resilient and work effectively, I think some aspects will be difficult to use as the primary maintenance method in a production system - the ‘breadcrumb’ method afforded to dimension maintenance in particular will start to get fiddly to use with a dimension of any sort of volume. The application and cube Import (from a formatted Excel spreadsheet) facility is great - to my mind, a bit like a supercharged and easier-to-use Outline Load Utility in Hyperion Planning - and the ability to refresh the spreadsheet from a deployed cube is a good feature that shouldn’t have been taken for granted. I know Excel is regarded as the Devil’s work in some BI quarters…I personally don’t feel that way until it is being used as a database (or as some form of primary data storage)…but in this context, it is quick & easy to use, on most people’s desktops straightway, and is intuitive.

Still in the Excel corner, on the Smartview side, the addition of the Cube Designer extension (requiring Smartview 11.1.2.5.700) to be able to consider & change the more generic aspects (not members) of the ‘cube maintenance’ spreadsheets is a nice touch that makes this more straightforward and removes the need to pay strict attention to the spreadsheet layout. The ‘treeview’ style hierarchy viewer also helps make sense of the parent-child members that need to be detailed on the individual dimension tabs.

One issue that has flitted across my mind at this early stage is that of rules files. Whilst the Import facility creates these for you (as with creating a cube from Essbase Studio) which is welcome, and rules files created in an on-prem system can be uploaded (again, welcome), the on-board rules file editor is text based:

I’m not too sure how many people have created or edited rules files like this before (although I’d hazard a guess), but whilst the presence of any means to create, amend, or even tweak a file is good, it remains to be seen how usable this approach is. The alternative is to resubmit from the maintenance spreadsheet thus getting it created / amended for you or to maintain in on-prem system…but seeing as this platform is an alternative to (rather than an augmentation of) on prem for a lot of people, I’m not sure how practical this is.

Whilst the existing tools look really promising, I can’t help but think there will be occasions going forwards where it might be advantageous to be able to create a rules file to run an uploaded file outside of them: time will tell.

The Command Line Tool (downloadable from OAC-Essbase / Utilities) is a little limited at the moment, but goes some way towards filling the potential gap left by the absence of client-side EssMsh and can only grow with further releases: from the Oracle OAC documentation...

In conclusion, first impressions are very favourable. There are changes (eg Security), new features (eg Sandboxing), and I am sure there will be gaps for those considering moving from existing on-prem applications - for example, as I have seen someone else reference, there does not seem to be any reference to partitions in the front end or the import spreadsheet layout - so whilst there is a lot with which we will quite quickly feel familiar, there are also going to be new areas and new practices for us to get into step with: as above, we will look to explore some of these in future posts.

Categories: BI & Warehousing

Overview of the new Cloudera Data Science Workbench

Rittman Mead Consulting - Fri, 2017-06-02 09:07

Recently Cloudera released a new product called Cloudera Data Science Workbench(CDSW)

Being a Cloudera Partner, we at Rittman Mead are always excited when something new comes along.

The CDSW is positioned as a collaborative platform for data scientists/engineers and analysts, enabling larger teams to work in a self-service manner through a web browser. This browser application is effectively an IDE for R, Python and Scala - all your favorite toys!

The CDSW is deployed onto edge nodes of your CDH cluster, providing easy access to your HDFS data and the Spark2 and Impala engines. This means that team members can immediately start working on their projects, accessing full datasets and share analysis and results. A CDSW Project can include reusable code and snippets, libraries etc helping your teams to collaborate. Oh, and these projects can be linked with Github repos to help keep version history.

The workbench is used to fire up user session with R, Python or Scala inside a dedicated Docker engines. These engines can be customised, or extended, like any other Docker images to include all your favourite R packages and Python libraries. Using HDFS, Hive, Spark2 or Impala the workload can then be distributed over to the CDH cluster, by use of your preferred methods, without having to configure anything. This engine (virtual machine, really) runs for as long as the analysis. Any logs or output files need to be saved in the project folder, which is mounted inside the engine and saved on the CDSW master node. The master node is a gateway node to the CDH cluster and can scale out to many worker nodes to distribute the Docker engines

(C) Cloudera.com

And under the hood we also have Kubernetes to schedule user workload across the worker nodes and provide CPU and memory isolation

So far I find the IDE to be a bit too simple and lacking features compared to e.g. RStudio Server. But the ease of use and the fact that everything is automatically configured makes the CDSW an absolute must for any Cloudera customer with data science teams. Also, I'm convinced that future releases will add loads of cool functionality

I spent about two days building a new cluster on AWS and install the Cloudera Data Science Workbench, just an indication of how easy it is to get up and running. Btw, it also runs in the cloud (Iaas) ;)

Want to know more or see a live demo? Contact us at info@rittmanmead.com

Categories: BI & Warehousing

First Steps with Oracle Analytics Cloud

Rittman Mead Consulting - Thu, 2017-06-01 07:43
Preface

Not long ago Oracle added a new offer to their Cloud - an OBIEE in a Cloud with full access. Francesco Tisiot made an overview of it and now it's time to go a bit deeper and see how you can poke it with a sharp stick by yourself. In this blog, I'll show how to get your own OAC instance as fast and easy as possible.

Before you start

The very first step is to register a cloud account. Oracle gives a trial which allows testing of all features. I won't show it here as it is more or less a standard registration process. I just want highlight a few things:

  • You will need to verify your phone number by receiving an SMS. It seems that this mechanism may be a bit overloaded and I had to make more than one attempts. I press the Request code button but nothing happens. I wait and press it again, and again. And eventually, I got the code. I can't say for sure and possible it was just my bad luck but if you face the same problem just keep pushing (but not too much, requesting a code every second won't help you).
  • Even for trial you'll be asked for a credit card details. I haven't found a good diagnostics on how much was already spent and the documentation is not really helpful here.
Architecture

OAC instances are not self-containing and require some additional services. The absolute minimum configuration is the following:

  • Oracle Cloud Storage (OCS) - is used for backups, log files, etc.
  • Oracle Cloud Database Instance (DBC) - is used for RCU schemas.
  • Oracle Analytics Cloud Instance (OAC) - is our ultimate target.

From the Cloud services point of view, architecture is the following. This picture doesn't show virtual disks mounted to instances. These disks consume Cloud Storage quota but they aren't created separately as services.

Architecture

We need at least one Oracle Database Cloud instance to store RCU schemas. This database may or may not have a separate Cloud Storage area for backups. Every OAC instance requires Cloud storage area for logs. Multiple OAC instances may share one Cloud storage area but I can't find any advantage of this approach over a separate area for every instance.

Create Resources

We create these resource in the order they are listed earlier. Start with Storage, then DB and the last one is OAC. Actually, we don't have to create Cloud Storage containers separately as they may be created automatically. But I show it here to make things more clear without too much "it works by itself" magic.

Create Cloud Storage

The easiest part of all is the Oracle Cloud Storage container. We don't need to specify its size or lots of parameters. All parameters are just a name, storage class (Standard/Archive) and encryption.

20-create_ocs.gif

I spent some time here trying to figure out how to reference this storage later. There is a hint saying that "Use the format: <storage service>-<identity domain>/<container>. For example: mystorage1-myid999/mybackupcontainer." And if identity domain and container are pretty obvious, storage service puzzled me for some time. The answer is "storage service=Storage". You can see this in the top of the page.

30-OCS_naming.png

It seems that Storage is a fixed keyword, rurittmanm is the domain name created during the registration process and demo is the actual container name. So in this sample when I need to reference my demo OCS I should write Storage-rurittmanm/demo.

Create Cloud DB

Now when we are somewhat experienced in Oracle Cloud we may move to a more complicated task and create a Cloud DB Instance. It is harder than Cloud Storage container but not too much. If you ever created an on-premise database service using DBCA, cloud DB should be a piece of cake to you.

At the first step, we set the name of the instance and select the most general options. These options are:

  • Service Level. Specifies how this instance will be managed. Options are:

    • Oracle Database Cloud Service: Oracle Database software pre-installed on Oracle Cloud Virtual Machine. Database instances are created for you using configuration options provided in this wizard. Additional cloud tooling is available for backup, recovery and patching.
    • Oracle Database Cloud Service - Virtual Image: Oracle Database software pre-installed on an Oracle Cloud Virtual Machine. Database instances are created by you manually or using DBCA. No additional cloud tooling is available.
  • Metering Frequency - defines how this instance will be paid: by months or by hours.

  • Software Release - if the Service Level is Oracle Database Cloud Service, we may choose 11.2, 12.1 and 12.2, for Virtual Image only 11.2 and 12.1 are available. Note that even cloud does no magic and with DB 12.2 you may expect the same problems as on-premise.

  • Software Edition - Values are:

    • Standard Edition
    • Enterprise Edition
    • Enterprise Edition - High Performance
    • Enterprise Edition - Extreme Performance
  • Database Type - defines High Availability and Disaster Recovery options:

    • Single Instance
    • Database Clustering with RAC
    • Single Instance with Data Guard Standby
    • Database Clustering with RAC and Data Gard Standby

Database Clustering with RAC and Database Clustering with RAC and Data Gard Standby types are available only for Enterprise Edition - Extreme Performance edition.

40-create_obdc-1.gif

The second step is also quite intuitive. It has a lot of options but they should be pretty simple and well-known for anyone working with Oracle Database.

60-create-odbc-dc.png

The first block of parameters is about basic database configuration. Parameters like DB name (sid) or Administration Password are obvious.

Usable DataFile Storage (GB) is less obvious. Actually, in the beginning, it puzzled me completely. In this sample, I ask for 25 Gb of space. But this doesn't mean that my instance will take 25 Gb of my disk quota. In fact, this particular instance took 150 Gb of disk space. Here we specify only a guaranteed user disk space, but an instance needs some space for OS, and DB software, and temp, and swap, and so on.

65-db-disk.png

A trial account is limited with 500 Gb quota and that means that we can create only 3 Oracle DB Cloud instances at max. Every instance will use around 125 Gb of let's say "technical" disk space we can't reduce. From the practical point of view, it means that it may be preferable to have one "big" instance (in terms of the disk space) rather than multiple "small".

  • Compute shape specifies how powerful our VM should be. Options are the following:
    • OC3 - 1.0 OCPU, 7.5 GB RAM
    • OC4 - 2.0 OCPU, 15.0 GB RAM
    • OC5 - 4.0 OCPU, 30.0 GB RAM
    • OC6 - 8.0 OCPU, 60.0 GB RAM
    • OC7 - 16.0 OCPU, 120.0 GB RAM
    • OC1m - 1.0 OCPU, 15.0 GB RAM
    • OC2m - 2.0 OCPU, 30.0 GB RAM
    • OC3m - 4.0 OCPU, 60.0 GB RAM
    • OC4m - 8.0 OCPU, 120.0 GB RAM
    • OC5m - 16.0 OCPU, 240.0 GB RAM

We may increase or decrease this value later.

  • SSH Public Key - Oracle gives us an ability to connect directly to the instance and authentication is made by user+private key pair. Here we specify a public key which will be added to the instance. Obviously, we should have a private key for this public one. Possible options are either we provide a key we generated by ourselves or let Oracle create keys for us. The most non-obvious thing here is what is the username for the SSH. You can't change it and it isn't shown anywhere in the interface (at least I haven't found it). But you can find it in the documentation and it is opc.

The second block of parameters is about backup and restore. The meaning of these options is obvious, but exact values aren't (at least in the beginning).

70-create-odbc-brc.png

  • Cloud Storage Container - that's the Cloud Storage container I described earlier. Value for this field will be something like Storage-rurittmanm/demo. In fact, I may do not create this Container in advance. It's possible to specify any inexistent container here (but still in the form of Storage-<domain>/<name>) and tick Create Cloud Storage Container check-box. This will create a new container for us.

  • Username and Password are credentials of a user who can access this container.

The last block is Advanced settings and I believe it's quite simple and obvious. Most of the time we don't need to change anything in this block.

80-create-odbc-ac.png

When we fill all parameters and press the Next button we get a Summary screen and the actual process starts. It takes about 25-30 minutes to finish.

When I just started my experiments I was constantly getting a message saying that no sites available and my request may not be completed.

It is possible that it was again the same "luck" as with the phone number verification but the problem solved by itself a few hours later.

Create OAC Instance

At last, we have all we need for our very first OAC instance. The process of an OAC instance setup is almost the same as for an Oracle DB Cloud Instance. We start the process, define some parameters and wait for the result.

At the first step, we give a name to our instance, provide an SSH public key, and select an edition of our instance. We have two options here Enterprise Edition or Standard Edition and later we will select more additional options. Standard edition will allow us to specify either Data Visualisation or Essbase instances and Enterprise Edition adds to this list a classical Business Intelligence feature. The rest of the parameters here are exactly the same as for Database Instance.

90-oacs-1st-step.png

At the second step, we have four blocks of parameters.

100-oacs-2nd-step.png

  • Service Administrator - the most obvious one. Here we specify an administrator user. This user will be a system administrator.

  • Database - select a database for RCU schemas. That's why we needed a database.

  • Options - specify which options our instance will have.

    • Self-Service Data Visualisation, Preparation and Smart Discovery - this option means Oracle Data Visualisation and it is available for both Standard and Enterprise Editions.
    • Enterprise Data Models - this option gives us classical BI and available only for Enterprise Edition. Also, this option may be combined with the first one giving us both classical BI and modern Data discovery on one instance.
    • Collaborative Data Collection, Scenarios and What-if Analysis - this one stands for Essbase and available for Standard and Enterprise Editions. It can't be combined with other options.
  • Size is the same thing that is called Compute Shape for the Database. Options are exactly the same.
  • Usable Storage Size on Disk GB also has the same meaning as for the DB. The minimum size we may specify here is 25 Gb what gives us total 170 Gb of used disk space.

Here is a picture showing all possible combinations of services:

110-oacs-editions.png

And here virtual disks configuration. data disk is the one we specify.
130-oacs-storage.png

The last block - Cloud Storage Configuration was the hardest one. Especially the first field - Cloud Storage Base URL. The documentation says "Use the format: https://example.storage.oraclecloud.com/v1" and nothing more. When you know the answer it may be easy, but when I saw it for the first time it was hard. Should I place here any unique URL just like an identifier? Should it end with v1? And what is the value for the second instance? V2? Maybe I should place here the URL of my current datacenter (https://dbcs.emea.oraclecloud.com). The answer is https://<domain>.storage.oraclecloud.com/v1 in my case it is https://rurittmanm.storage.oraclecloud.com/v1. It stays the same for all instances.

All other parameters are the same as they were for DBCS instance. We either specify an existing Cloud Storage container or create it here.

120-oacs-cloud-storage.png

The rest of the process is obvious. We get a Summary and then wait. It takes about 40 minutes to create a new instance.

Note: diagnostics here is a bit poor and when it says that the instance start process is completed it may not be true. Sometimes it makes sense to wait some time before starting to panic.

Now we may access our instance as a usual. The only difference is that the port is 80 not 9502 (or 443 for SSL). For Data Visualisation the link is http(s)://<ip address>/va, for BIEE - http(s)://<ip address>/analytics and for Essbase http(s)://<ip address>/essbase. Enterprise Manager and Weblogic Server Console are availabale at port 7001 which is blocked by default.

What is bad that https uses a self-signed certificate. Depending on browser settings it may give an error or even prevent access to https.

Options here either use HTTP rather than HTTPS or add this certificate to your local computer. But these aren't the options for a production server. Luckily Oracle provides a way to use own SSL certificates.

Typical Management Tasks SSH to Instances

During the setup process, we provide Oracle with a public key which is used to get an SSH access to instances. Cloud does nothing special to this. In the case of Windows, we may use Putty. Just add the private key to Pageant and connect to the instance using user opc.

140-pageant.png

150-putty.gi

Opening Ports

By default only the absolute minimum of the ports is open and we can't connect to the OAC instance using BI Admin tool or to the DB with SQLDeveloper. In order to do this, we should create an access rule which allows access to this particular ports.

In order to get to the Access Rules interface, we must use instance menu and select the Access Rules option.

150-access-menu.png

This will open the Access Rules list. What I don't like about it is that it opens the full list of all rules but we can create only a rule for this particular instance.

160-access-rules-list.png

New rule creation form is simple and should cause no issues. But be careful here and not open too much for a wild Internet.

170-new-rule.png

Add More Users

The user who registered a Cloud Account becomes its administrator and can invite more users and manage privileges.

180-access-users.png

Here we can add and modify users.

190-users.png

When we add a user we specify a name, email and login. Also here we set roles for the user. The user will get an email with these details, and link to register.

Obviously, the user won't be asked about a credit card. He just starts working and that's all.

Summary

My first steps with Oracle Analytics Cloud were not very easy, but I think it was worth it. Now I can create a new OBIEE instance just in a few minutes and one hour later it will be up and running. And I think that's pretty fast compared to a normal process of creating a new server in a typical organisation. We don't need to think about OS installation, or licenses, or whatever else. Just try it.

Categories: BI & Warehousing

ORA-12154 in Data Guard environment

Amardeep Sidhu - Wed, 2017-05-31 10:54

Hit this silly issue in one of the data guard environments today. Primary is a 2 node RAC running 11.2.0.4 and standby is also a 2 node RAC. Archive logs from node2 aren’t shipping and the error being reported is

ORA-12154: TNS:could not resolve the connect identifier specified

We tried usual things like going to $TNS_ADMIN, checking the entry in tnsnames.ora and then also trying to connect using sqlplus sys@target as sysdba. Everything seemed to be good but logs were not shipping and the same problem was being reported repeatedly. As everything on node1 was working fine so it looked even more weird.

From the error it is clear that the issue is with tnsnames entry. Finally found the issue after some 30 mins. It was an Oracle EBS environment so the TNS_ADMIN was set to the standard $ORACLE_HOME/network/admin/*hostname* path (on both the nodes). On node1 there was no tnsnames.ora file in $ORACLE_HOME/network/admin so it was connecting to the standby using the Apps tnsnames.ora which was having the correct entry for standby. On node2 there was a file called tnsnames.ora in $ORACLE_HOME/network/admin but it was not having any entry for standby. It was trying to connect using that file (the default tns path) and failing with ORA-12154. Once we removed that file, it started using the Apps tnsnames.ora and logs started shipping.

Categories: BI & Warehousing

ACE Alumni

Tim Tow - Tue, 2017-05-23 23:08
Today, I asked Oracle to move me from Oracle ACE Director status to Oracle ACE Alumni status.  There are a number of reasons why I decided to change status.  When I started answering questions on internet forums years ago, I did it to share what I had learned in order to help others.  The same goes for this blog which I originally started so that I could give better and more complete answers to questions on the forums.

After the Hyperion acquisition by Oracle, I was contacted by Oracle who asked if I would be interested in becoming an "Oracle ACE".  It was an honor.  But over time, things have changed.  As more people found out about the ACE program, more people wanted to become an ACE.  If you have ever monitored the OTN Essbase and Smart View forums, they have become cluttered with copy and paste posts from people obviously trying to increase their points.  As the ACE program grew, it also become harder for the OTN team to manage and now require a formal activity reporting - a time report if you will - to track contributions to the community.  As I am already extremely pressed for time, I decided that tracking my contributions to the community - in exchange for a free pass to Open World, just didn't make sense.

All of that being said, just because I have moved to Oracle ACE Alumni status doesn't mean that I will stop contributing to the community.  My company will continue to provide free downloads and support for the Next Generation (Essbase) Outline Extractor and the Outline Viewer along with free downloads of Drillbridge Community Edition.  And maybe, just maybe, I will finally have time to write some new blog posts (maybe even some posts on some new Dodeca features inspired by our work with Oracle Analytics Cloud / Essbase Cloud!)

Categories: BI & Warehousing

Users of Analytics Applications

Dylan's BI Notes - Sun, 2017-05-21 15:08
Business User who are consuming the data and the report.  They see the information pushed to them.  They can see alerts in their phone.  They see emails.  They add the page to a bookmark in their browser and periodically look at them.   They are executives, managers, busy users who have other duties.   They don’t […]
Categories: BI & Warehousing

Delivery to Oracle Document Cloud Services (ODCS) Like A Boss

Tim Dexter - Wed, 2017-05-17 11:53
p { margin-bottom: 0.1in; direction: ltr; color: rgb(0, 0, 10); line-height: 120%; text-align: left; }p.western { font-family: "Liberation Serif",serif; font-size: 12pt; }p.cjk { font-family: "WenQuanYi Micro Hei"; font-size: 12pt; }p.ctl { font-family: "Lohit Devanagari"; font-size: 12pt; }

We have moved to a new blogging platform. This was a post from Pradeep that missed the cut over ...

In release 12.2.1.1, BI Publisher added a new feature - Delivery to Oracle Document Cloud Services (ODCS). Around the same time, BI Publisher was also certified against JCS 12.2.1.x and therefore, today if you have hosted your BI Publisher instance on JCS then we recommend Oracle Document Cloud Services as the delivery channel. Several reasons for this:

  1. Easy to configure and manage ODCS in BI Publisher on Oracle Public Cloud. No port or firewall issues.

  2. ODCS offers a scalable, robust and secure document storage solution on cloud.

  3. ODCS offers document versioning and document metadata support similar to any content management server

  4. Supports all business document file formats relevant for BI Publisher

When to use ODCS?

ODCS can be used for all different scenarios where a document need to be securely stored in a server that can be retained for any duration. The scenarios may include:

  • Bursting documents to multiple customers at the same time.

    • Invoices to customers

    • HR Payroll reports to its employees

    • Financial Statements

  • Storing large or extremely large reports for offline printing

    • End of the Month/Year Statements for Financial Institutions

    • Consolidated department reports

    • Batch reports for Operational data

  • Regulatory Data Archival

    • Generating PDF/A-1b or PDF/A-2 format documents

How to Configure ODCS in BI Publisher?

Configuration of ODCS in BI Publisher requires the  URI, username and password. Here the username is expected to have access to the folder where the files are to be delivered.


 

How to Schedule and Deliver to ODCS?

Delivery to ODCS can be managed through both - a Normal Scheduled Job and a Bursting Job.

A Normal Scheduled Job allows the end user to select a folder from a list of values as shown below

\

In case of Bursting Job, the ODCS delivery information is to be provided in the bursting query as shown below:

Accessing Document in ODCS

Once the documents are delivered to ODCS, they can be accessed by user based on his access to the folder, very similar to FTP or WebDAV access.

That's all for now. Stay tuned for more updates !

 

Categories: BI & Warehousing

Pages

Subscribe to Oracle FAQ aggregator - BI &amp; Warehousing